So The Next Time Your Service Provider Assures You That...........
.......your stuff has REALLY been "deleted".........
.......you will know for sure that the claim is a bare-faced lie!!!!
Tape – as a digital storage medium – has been considered dead for your correspondent's entire 29-year career. But that didn't stop manufacturers behind the Linear Tape-Open (LTO) standard shipping 152.9* exabytes worth of the stuff last year. HPE, IBM, and Quantum are the only three LTO Program Technology Providers, and last …
Google archives, or used to archive, data on media that could not be overwritten or deleted. The way they handle the legal requirement that data must be deletable is by encrypting the data and storing the decryption key separately. Instead of deleting the data, they delete the decryption key, so the archive becomes unreadable for all practical purposes.
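For anyone curious what that looks like in practice, here is a minimal sketch of the crypto-shredding idea - not Google's actual system, just an illustration using Python's `cryptography` package, with all names made up:

```python
# Minimal sketch of "crypto-shredding": encrypt the archive, keep the key
# elsewhere, and "delete" the data by destroying the key. Illustrative only;
# not how any particular provider actually implements it.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def archive_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt an archive blob; return (key, ciphertext). The key goes into a
    separate key store, never alongside the copy written to tape/media."""
    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = nonce + AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext

def archive_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:12], ciphertext[12:]
    return AESGCM(key).decrypt(nonce, body, None)

# "Deleting" the archive = deleting the key record from the key store.
# The bits on the media remain, but are unreadable for practical purposes.
```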
> Why not just destroy the media?
Because one lump of media can hold a lot more data than just the stuff that is to be "deleted"?
So destroying the whole lump would mean copying everything but the bad stuff to a new lump. Repeat when another small amount has to be destroyed, which gets costly quite quickly.
Instead, consider filling the lump with, say, n zip files, each one with its own password. Delete two passwords and (n-2)*100/n percent of the lump's value is retained (with n = 100, deleting two passwords still leaves 98 per cent of it usable).
Since the YouTube content is already H.26x-compressed, or equivalent, I expect that makes a mockery of the tape vendor's claimed capacity. Why not just claim the actual capacity and let users work out what it means for their data? Quoting a "compressed" capacity strikes me as borderline fraudulent, even if there is a defence that "everyone knows" the details.
I always wondered much the same thing and agree with the sentiment.
Though - as you note - the fact that this appears to have been established practice in professional tape storage for decades means it's very unlikely that anyone in a position to be buying such things will be misled.
It just seems pointless, especially as LTO is apparently an open standard with standard-sized tapes and hence no obvious issues of "we'll look bad if we're the only ones not doing it" rivalry.
Dear Simon,
You have been mocking Oh Great Vendor products. The LTO revolution is now upon you and you are sentenced to fill one tape manually and compressed with the text(*): "I Will Not Mock LTO.".
The tape will be checked and you will be informed of any failures to restore.
Thank you for your article.
Signed HPE, IBM and Quantum.
(*) In alternating UTF-8 and EBCDIC, of course
Does it?
I would have thought the rate limiting step would be getting the bits (compressed or not) on or off the magnetic media? Unless the cpu and memory inside the tape drives are seriously underpowered or slow, I would think a modern cpu could more than keep up.
Even then, if the drive can operate with hardware compression turned off, host based compression (and encryption) is an option. If you were going to use host based encryption you need to compress first anyway. (Would you really trust encryption in tape hardware?)
If you were to use asymmetric encryption (public key) to encrypt your archives and kept each decrypting key in a "tamper-proof" device, then destroying the device effectively deletes the archives so encrypted. The encrypting key doesn't need to be secret.
In practice I imagine you would use ephemeral symmetric encryption, with those keys protected by the asymmetric cryptography.
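Something like this, perhaps - a rough sketch under the assumptions above (ephemeral AES data keys wrapped with an RSA public key; the private key would live only inside the tamper-proof device), using Python's `cryptography` package. Nothing here is anyone's production scheme:

```python
# Sketch of the hybrid scheme described above: each archive is encrypted with
# an ephemeral symmetric key, and only that key is encrypted with the public
# key. Destroy the device holding the private key and every archive wrapped
# this way becomes unreadable. Illustrative, not a hardened implementation.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# In real use the private key is generated inside the tamper-proof device and
# never leaves it; the public (encrypting) key need not be secret.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
public_key = private_key.public_key()
oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def wrap_archive(data: bytes) -> tuple[bytes, bytes]:
    """Return (wrapped_key, ciphertext); both can safely sit on the tape."""
    dek = AESGCM.generate_key(bit_length=256)        # ephemeral data key
    nonce = os.urandom(12)
    ciphertext = nonce + AESGCM(dek).encrypt(nonce, data, None)
    wrapped_key = public_key.encrypt(dek, oaep)      # only the device can unwrap
    return wrapped_key, ciphertext

def unwrap_archive(wrapped_key: bytes, ciphertext: bytes) -> bytes:
    dek = private_key.decrypt(wrapped_key, oaep)     # requires the device
    nonce, body = ciphertext[:12], ciphertext[12:]
    return AESGCM(dek).decrypt(nonce, body, None)
```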
It's ironic that after more than 50 years, archive storage is still magnetic-tape based. "Breakthroughs" in optical/photographic/holographic technologies haven't got much past the headlines. I vaguely recall, decades ago, a proposal to store such data on silicon(?) wafers using the same photolithographic(?) technology used in IC fabrication, yet in 2024 we still trust our data to rust on great lengths of plastic. :)
I'm not an expert, but as far as I'm aware you're correct about the compression.
The limiting factor is the storage medium, and most vendors actually indicate that read and write speeds scale (almost) 1:1 with the compression ratio (a 2:1 compression ratio would theoretically double the read and write speeds). Source (IBM)
As far as encryption goes, LTO says "Native hardware encryption typically affects less than 1% of tape drive performance." That page also goes into the specifics of the encryption used; for how they manage the keys you'd need to look up the vendor documentation - IBM should have those publicly available.
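For the curious, here's a back-of-the-envelope model of that 1:1 scaling claim. The native rate and the 2x ceiling are assumed, illustrative numbers, not any particular drive's spec:

```python
# Back-of-the-envelope model of how drive-side compression affects effective
# throughput: the tape moves at its native rate, so if the drive compresses
# 2:1 it accepts host data twice as fast -- up to whatever ceiling the drive's
# compression hardware can sustain. All numbers here are illustrative.
NATIVE_MBPS = 400          # assumed native (uncompressed) drive rate, MB/s
COMPRESSION_CEILING = 2.0  # assumed max speed-up the drive hardware can sustain

def effective_host_rate(compression_ratio: float) -> float:
    """Host-visible transfer rate for a given achievable compression ratio."""
    return NATIVE_MBPS * min(compression_ratio, COMPRESSION_CEILING)

for ratio in (1.0, 1.5, 2.0, 2.5):
    print(f"{ratio}:1 compression -> ~{effective_host_rate(ratio):.0f} MB/s at the host")
```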
Just considering compression, and done by the CPU, you can easily find[1] recommendations to enable compression as a speedup. Then the tech changes (IDE to SATA) and the needle swings, then the tech changes (CPUs speed up, the stage in that link), needle swings, tech changes (SSD) and needle swings...
Unless you are using some pretty dang amazing tapes, at great speed and parallel track densities (i.e. really expensive, which totally negates the point of using tape in the first place), I'd be willing to bet that on-the-fly (de)compression can easily keep up.
Come to think of it, I'm sure I recall being told to use compression when PIP'ing data on a marvellous new dual eight-inch floppy CP/M box, because that would be faster.
[1] just the first hit I got, there will be others with long tables of timing experiments etc if you want
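If you want to settle that bet on your own hardware, a quick-and-dirty sketch like this gives a rough idea of whether host-side compression can keep pace. The target rate and the synthetic data are placeholder assumptions; real backup streams compress very differently:

```python
# Quick-and-dirty check of whether host-side compression keeps up with a tape
# drive's native rate on this machine. Target rate and data are placeholders.
import os
import time
import zlib

target_mbps = 300                                   # assumed native tape rate to beat
chunk = os.urandom(1 << 20) + b"\0" * (1 << 20)     # 2 MiB, half incompressible

start = time.perf_counter()
total = 0
for _ in range(256):                                # push ~512 MiB through the compressor
    zlib.compress(chunk, 1)                         # fast level, as a backup tool would use
    total += len(chunk)
elapsed = time.perf_counter() - start

mbps = total / elapsed / 1e6
verdict = "keeps up with" if mbps >= target_mbps else "is slower than"
print(f"compressed {total/1e6:.0f} MB in {elapsed:.1f}s -> {mbps:.0f} MB/s "
      f"({verdict} {target_mbps} MB/s)")
```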
About 15 years ago when I was working with LTO3 tapes, compression added basically no time penalty in our setup.
On the other hand, even then, there wasn't much data that wasn't already compressed, so I think we usually got compression ratios of 95%+ (i.e. we saved less than 5% of the space). Still, not having to copy that 5% probably sped up the total backup by 10-20 mins (in an eight-hour job).
I sleep better knowing our data is being carved into sophisticated linear rust* and taken offsite**
It scares me when I hear of companies that think replication is a sufficient backup, and long-term archive? Whassat?
It depresses me that requesting a tape library in our datacentre raises eyebrows and sarcastic comments.
* Yes, we do regularly test restore too.
** Yes, it is encrypted.
Anti-Ransomware is how we sold our tape storage to upper management but in reality it was the mountains of structured and unstructured data that need to be retained long term for compliance purposes. Data that has long term retention but will never be recalled is the bane of the storage industry!
I can remember, about 30 years ago, being asked by a project manager for a copy of their team's project data (source code, binaries, development tools, ...) on tape(s), so that a final set of tapes could be stored long term (25+ years) at an underground storage location (under a mountain) in case any customers ever wanted/needed to bring the project back from the dead. I think it was DAT tapes back then, before LTO. And I always remember thinking, and pointing out at the time, that they should have three copies of each tape, and also archive three tape drives, three servers, three copies of the backup software and three copies of the OS on the servers, and at least nine client machines, to be able to partially restore the environment in 25+ years' time. But even though they did agree with my thinking, at least in principle, there was no allowance for that in the shutdown budget for the project. I really do wonder where those tapes are now, and how long until DAT drives become Unobtainium. And I think the development clients were SPARC-based, and they are nearly Unobtainium now! I guess with M.A.M.E. (M.E.S.S.) it would be possible to emulate some of the environment today. But it would not be easy.
The longer that a project is mothballed the harder it is to bring it back to life, even with one set of fully working tapes.
I think I still have a DAT drive somewhere, and one of my older machines still has an Adaptec 2940 SCSI controller in it, even though there's nothing currently connected to it.
I would like to be able to justify an LTO drive of adequate capacity for home use - said DAT drive, in the days when tape capacity was larger than a hard disk's, used to do weekly full backups, spit the tape out and prompt for a new one, then do incremental backups for the rest of the week, then spit the tape out and prompt for a new one, ready for the following full backup. All done with scripts, too. I even did a successful restore once, when the disk abruptly died on me. Every few months I'd take a tape to work and store it there, bringing home an older one.
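For flavour, here's a minimal sketch of that sort of rotation logic (full dump one day a week, incrementals the rest, prompting for a tape change before the full). The device, paths and schedule are assumptions for illustration, not the original scripts:

```python
# Minimal sketch of the rotation described above: a full backup once a week,
# incrementals on the other days, with a prompt to load a fresh tape before
# the weekly full. Device, paths and schedule are illustrative assumptions.
import datetime
import pathlib
import subprocess

TAPE_DEVICE = "/dev/nst0"              # assumed no-rewind tape device
SNAPSHOT = "/var/backups/home.snar"    # GNU tar incremental snapshot file (assumed path)
SOURCE = "/home"                       # assumed backup source
FULL_DAY = 6                           # Sunday (Monday == 0)

def run_backup(today: datetime.date) -> None:
    full = today.weekday() == FULL_DAY
    if full:
        input("Weekly full backup: load a fresh tape and press Enter...")
        # Removing the snapshot file makes GNU tar write a level-0 (full) dump.
        pathlib.Path(SNAPSHOT).unlink(missing_ok=True)
    subprocess.run(["tar", "--create",
                    f"--listed-incremental={SNAPSHOT}",
                    f"--file={TAPE_DEVICE}", SOURCE], check=True)
    print("full" if full else "incremental", "backup written to", TAPE_DEVICE)

if __name__ == "__main__":
    run_backup(datetime.date.today())
```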
A lot of storage is on tape because it is still relevant.
Scientific data from 150 years ago is still important in astronomy. Not sure if recopying reel-to-reel tape to Exabyte to DAT to LTO is easier than just rescanning the plates.
And in the less real world, a lot of corporate / tax / insurance data has to be kept for decades.
And don't even start thinking about drivers....
And just when you need to get at it, you can't remember how to get into the mountain (or even which mountain) nor find the (physical) key.
Speaking of keys, I was once taking part in a product launch in the mid-1990s of a certain very early multi-function peripheral (fax/scanner/printer/copier) made by a certain Japanese printer manufacturer. The launch was in Dublin (mmmm, REAL Guinness!) at some country house hotel.
To keep the four demo systems (with host PCs) safe overnight the hotel staff secured the doors to the function room using a massive motorbike chain and lock - the doors didn't actually have a lock.
Come the morning, could they find the bloody key? NO!
We got some rather odd looks from the press as they wandered into this function room with a huge great D shaped brass handle on one door with the handle for the other door (no longer attached to the other door) hanging down from it by the chain.
Mmmmm....
My first dev machine in my current employment (~1990/1) was a SPARC box, with a DAT drive. I've still got the tapes, but it would cost more than I've got to get any data off them. And the data has exactly zero value anyway. Still, it would be interesting - I've backed up stuff on USB sticks that was unreadable after 2 years.
"And again, remember that read and write times – and therefore recovery efforts – take even longer when there's compression or decompression to be done."
This is incorrect (and, in fact, exactly backwards). Compression/decompression is done by a dedicated CPU on the drive itself (it's been that way the entire time I've been in storage, which goes back to Gen 1 LTO), and it's designed to keep up with about twice the "native" data rate of the physical tape. Therefore, data transfer rates are usually *increased* by the amount of the compression. The limiting factor for tape is more often how fast you can get the data off the disk, and/or congestion on your storage network. (This is less bad than it used to be, with flash replacing many HDDs.)
Not for nothing does modern high-performance flash perform native data compression right there on the flash module.
(Source: Nearly a quarter century in Enterprise Storage, much of it spent worrying about tape and SAN performance limits.)
Thank you -- I was hoping someone would get that on the table.
I had a HUGE restore of historical data to be stuffed into a "data lake" and had to use an older Sun box. The only thing that made it useful was the fact that it had a TON of extra RAM - virtual disks and a 10Gb link to the data lake hosts cut, I seem to recall, six months off the process. The 10G card was initially a no-go with the OS in place, but since the host was EOL and was for this purpose only, I was allowed to upgrade and kludge in the driver.
In addition to the cost of storage media, their capacity, and their potential speed for writing and reading data, there is the matter of storage durability. This concerns the intended timescale for the preservation of archives. Individuals, businesses, and archivists of human knowledge and digitally expressed culture, will have differing perspectives and requirements.
Since the post-WW2 boom in digital data storage, various technologies have emerged. Some, e.g. hard drives, have advanced apace. Meanwhile, tape remains a reliable backup. Hard drives are in competition with solid-state storage devices. Some media are obsolete, e.g. floppy disks, and information stored upon them may no longer be retrievable. CDs and DVDs are a modern version of inscribing information on tablets of stone, yet their resistance to the ravages of entropy appears negligible in comparison. Paper/cardboard-based media - leaves of paper, punched paper tape, and punched cards - can under proper storage conditions survive many centuries, but their data capacities are tiny by modern standards. Also, at the touch of a flame, paper libraries can become bonfires.
The above underpin the very important question of how mankind can preserve information and (non-transient) digital cultural artefacts for the long term. Some materials necessitate preservation for a fixed time only, e.g. records of a deceased individual's bank transactions. Many others are too voluminous to merit indefinite preservation; for example, only samples of content on Twitter and Facebook justify permanent archives.
All present stores of digitised culture/knowledge are susceptible to sudden large-scale entropic disruption arising from natural disaster, civil disorder, or warfare. Moreover, apart from noble efforts by the likes of the Internet Archive, Lib-Gen, Sci-Hub, and Anna's Archive to both preserve and make readily available these materials, there is no incentive for present keepers of 'content' deemed proprietorial to participate in systematic archival preservation; indeed, they put tremendous effort - albeit with diminishing returns - into retaining informational/cultural fiefdoms.
Over the span of millennia the solution to survival has been the number of copies - and those may be in the form of being quoted by a later author rather than straight copies - and chance. In other words, we have the occasional copy of the occasional old text and know very well that more has been lost. That's why there's now great interest in recovering the contents of the carbonised scrolls from the Villa of the Papyri in Herculaneum, which might include texts unknown elsewhere. Ironically, being buried in volcanic ash has preserved them against the usual decay of organic matter. This is still not a recommendation for carbonising your records in order to preserve them.
Physical longevity is an issue but so is software longevity. Essentially being able to decode the bits into the original meaningful information. And then the compute hardware to run that software on.
I am reminded of the challenge in writing warning signs to protect future civilisations from the dangers of say, buried radioactive waste. Producing a sign that will still exist in a million years is one thing, but coming up with images or language that will still make sense then is a whole other challenge.
Apart from the durability of the storage medium itself, one of the problems with long-term recoverability is that LTO drives can only read two generations back, so those new LTO-9 drives can only read LTO-7. If you want to read older tapes, you have to have older drives ... which are no longer made or supported by the manufacturers. As a result, at my last job, we had a recurring task every few years to call back the tapes from archive and recopy them to a newer format, which was a time-consuming PITA. It did have the virtue, however, of making us ensure that the media were still good.
I have to wonder how many of the exabytes of tape capacity sold is to people performing the same exercise.
> we had a recurring task every few years to ... recopy them to a newer format
Hopefully, we are *all* doing that with our archives, both corporate and personal, if we want to have a chance of reading them again.
Digital photos moving from Floppy to Zip Disk to Winchester to IDE to SATA; from plastic box on shelf to second drive on PC to USB drive caddy to NAS to duplicate NASes as far apart as you can get them; from FAT to NTFS to ext4 to UFS to ZFS; from single drives to mdraid to GEOM striping to ZFS.
As I understand it, with regard to tape migration, what happens in a large tape library holding many LTO-X drives and LTO-X tapes is that they rip out the LTO-X drives, put in new LTO-(X+2) drives and a bunch of LTO-(X+2) tapes (which should hold about four times as much data per cartridge), and simply start reading data from the LTO-X tapes - which those drives can still read - and writing it to the new LTO-(X+2) tapes, eventually removing the old tapes. I believe much of this process is automated anyway, so it shouldn't be that much of a drama as far as human involvement goes. Rinse and repeat every half-dozen years or so. One advantage of doing this is that you never have to test the tapes' reputed longevity of several decades, as you've probably only got them on hand for perhaps a decade at most. The disadvantage is the cost of doing the migration, but again it would cost even more if you went outside the (X+2) window. This is probably the only real method of digital preservation for very large amounts of data. Interesting read here if anyone is interested: https://spectrum.ieee.org/the-lost-picture-show-hollywood-archivists-cant-outpace-obsolescence
Not just for chasing media formats. One of the background tasks that computer operators (remember them? :) ) did when things were quiet was Forth Bridging tapes - that is, copying old archive tapes onto other tapes so that the archive remained viable. If I remember correctly, the operating system (George 3 at the time) had a built-in job to keep track of everything and prompt which tapes needed to go on which drives, etc. Just leaving a tape to moulder in an archive is a sure way to have a write-only backup.
Longevity of the media is a different order of magnitude. The LTO drive itself is a different prospect... The tape might be fine but the ability to read it not so much.
I pratted around with some second-hand drives, a Seagate LTO-5 and an HP LTO-6, in the home lab. While I can't speak for the third manufacturer, the driver support from Seagate was excellent, whereas HP hide almost everything behind enterprise service contracts. HP drives are near-useless outside of business use and one of those contracts.
Linux support is a mixed bag; Red Hat RPMs work out of the box on CentOS, no questions asked. Getting the generic driver tarball working on another distro has so far eluded me.
Optical media is still proving incredibly useful for domestic backup purposes, though even a 50GB disc is now getting a little short in size for anything but the irreplaceable. And, being a reg reader, one assumes most of us have philosophical issues in relying on cloudy services...
Given the retrieval time for anything in ACME's deep storage can take a full day, I'm guessing that involves a guy driving a golf cart deep into the Blue Ridge Mountains to fetch a tape of encrypted data to which a known key may no longer exist. And the idea that such data can & should be deleted for "legal reasons" is absurd. A reasonable NDA (and any related retention clauses) will explicitly speak to the aforementioned archival factors.
A 3% increase in capacity shipped means a *decrease* in unit shipments, because over time more people are buying the higher-capacity LTO-9, hence fewer tapes are required.
Given that unit capacity is scaling by an average of 15% annually, unit shipments must be decreasing by at least 10% annually (1.03 / 1.15 ≈ 0.90). At some point, the decreasing revenue is just not going to support the R&D to release new media and drives. LTO-10 will probably happen, I'm dubious about LTO-11, and afterwards it's just legacy.
This post has been deleted by its author
[citation needed]
I have never had this experience, and I have run hundreds of SSDs in multiple classes of device: servers, storage arrays, PCs, and laptops. The failure rates were much lower than hard drives. LTO tapes tended to be pretty reliable, although drive failures were not uncommon.
https://arstechnica.com/gadgets/2023/05/hdds-typically-fail-in-under-3-years-backblaze-study-of-17155-drives-finds/
Maybe HDs instead of SSDs.
https://arstechnica.com/gadgets/2023/08/sandisk-extreme-ssds-are-worthless-multiple-lawsuits-against-wd-say/
May just be that brand.
https://arstechnica.com/gadgets/2023/03/hdds-arent-as-durable-as-they-used-to-be-study-of-2007-damaged-drives-suggests/
That's worrisome.
https://arstechnica.com/gadgets/2023/06/clearly-predatory-western-digital-sparks-panic-anger-for-age-shaming-hdds/
Overall, while there seems to be a bias against HDs in the data, it's not like SSDs don't fail either.
Definitely avoiding SanDisk from now on.
There are four extant copies of the eight-hundred-year-old Magna Carta. Pretty much worthwhile keeping this sort of cultural/legal document, don't you think?
By comparison Microsoft Word documents I wrote in 1990 can't be opened in M$ software today!! Are there cultural/legal documents in M$ Word worth keeping? Guess the answer is "Yes"!!!
So.....before we get wrapped around the axle over "magnetic media"........just remember that even if you can restore something, THERE MAY BE NO SOFTWARE TO READ IT!!!!!
Thirty years with Microsoft.......eight hundred years with parchment. And people tell me this is progress!!
Sigh!!
@AC
In 2004 I copied all my Osborne 01 CP/M-80 floppies (1983 to 1986 -- 92K and 192K) to an MS-DOS hard drive.
I moved the dBase-II data to dBase-III format, also on a MS-DOS hard drive.
So.....most of my 198x Osborne CP/M-80 data is available today under Linux:
- Wordstar documents still available: Linux/DOSBOX/Wordstar
- dBase data still available: Linux/Harbour
....not bothered about M$ BASIC or SuperCalc, since I never used those much.
The only niggle is that CP/M-80 files did not have a date stamp...I need to rely on dates (if any) within the files themselves.
So with a little bit of effort on file transfers, forty-some years old data is available today on my Linux server.....and there's still software available to read it.
Unfortunately, M$ Word (1990) and Micrografx Designer (1990) are both MUCH harder...maybe impossible!!
Sigh!!
"By comparison Microsoft Word documents I wrote in 1990 can't be opened in M$ software today!!"
But, as I'm sure you know and your comment suggests, you can easily get software that can read them, for free, which runs on your computer in a free VM program, which can convert them to something else which can be opened. We haven't lost that data. Not only could you have easily converted them at the time as people do with their backup media, but unlike that media, it's also easy to recover them on demand today. Recovery taking minutes instead of seconds is not the same as the permanent loss of the data.
I know of some long-lived contracts, usually between military organizations and commercial ones, for diagnose-and-fix and technical support and so on. Typically 20 to 40 years duration.
The original documents were in Wordperfect version zero point nought or something like that.
So they have to keep the documents either in paper form (scanned at 600 dpi when convenient) or "converted".
The big advantage to running a contract for 30 years is that the olden days had no terabyte or even gigabyte disc drives. If there isn't much data to archive, there isn't much of a problem.
If they're in 1980s Word Perfect format, they don't contain that much that can't be handled in plain text. Nowadays, PDF is a more expected format, and that will likely be easily read years from now. As much as I dislike it as a format, it's a format that, due to our powerful computers, puts a lot of weight on backwards compatibility and one for which lots of software exists. That is if they use the typical, unencrypted subset of PDF. If they insist on putting in the weird Adobe additions that only Adobe software* understands, then it's less likely to work.
* Well, one piece of Adobe software. Everything else will break. Well, a subset of versions of one piece of Adobe software.
Storing data in 'the cloud' does have the new sexy vibe going for it. The cloudy backup is stored forever as long as the monthly payment is made.
Not to mention the cloud provider could be required to share data with law enforcement under a gag order that forbids them from disclosing that to their customers. Backups are all about mitigating risk, so the risk of unauthorized access is a legitimate concern.
If an organization really wants to make sure their data always stays in their hands, the only solution is on-prem media they physically control. Okay, sure, take the tapes to a distant location to protect against facility destruction / natural disasters, but that is still maintaining physical control.
Realistically it should be rare that any one of us in IT during our career actually has to do a full facility restore from tape. Hit a local short-term disk based backup to get back the accidentally deleted spreadsheet, that is not what tape is good at, nor is that what a proper backup architecture should use it for.
For long-term offline storage, tape is still king in the dollars-per-bit category. A quick look online shows ~$2 per TB, which is about a tenth the price of hard drive storage. Tape sucks at a lot of other stuff, but for its price and physical control it cannot be beat.
Admitted guy who sells tape into Peta and Exascale environments... This is intended as a rant and not a sales pitch.
Batch backup on tape is dead and has been for a long time. Cold storage in the cloud is tape, and will be for a long time to come, as there are no real competitors that can store the bits more reliably, densely, cheaply and repeatably (that's the important part). The new technologies on the horizon haven't figured out the commercial-viability issues and may never.
Tape drives are actually quite fast, with LTO-9 native performance of ~390 MB/s per drive. Most hard drives cannot keep up with a single tape drive. Many customers' network interfaces (10GbE, roughly 1.2 GB/s of payload) can't support more than two tape drives at native speed, and only one if the customer has compressible data (a drive fed 2:1-compressible data wants ~780 MB/s from the host). As another commenter noted, compression actually speeds up reads and writes, as the compression occurs in hardware at the drive level as the data is written. If you're performing checksums, the core count of the server comes into play and you may only be able to support three or so drives on a single server anyway.
There is a perceived slowness of tape because of the time it takes for the robot to pick a tape, move, mount the tape, and seek to the appropriate location on the tape to retrieve the bits. That is going to be about 90 seconds, give or take. Small files read and write more slowly, but most backup formats will encapsulate the files into one or more large data streams to negate that issue.
IMHO, the main issue with tape is the logistics of tape handling and the costs associated with offsite data protection, though those costs are far less than those of another data center with a DR disk- or flash-based storage system.
The second major issue is data migration between generations of media, though unless you're storing an archive you wouldn't need to consider that for two to three generations of media, as the lifespan of tape media stored in the correct environment is claimed to be 30 years, with some tape formats being readable at over 50 with the correct tools. With LTO release schedules slowing from two, to three, to four years between generations, it is now realistic to think a customer can go 10+ years between migrations. When they do, they'll be able to store multiple previous-generation tapes on a single current cartridge. Backwards compatibility is fantastic, but it also prevents the innovation required to increase storage densities 2-2.5x each generation. What do customers do when their existing storage systems come to end-of-life? They migrate - unless it's object storage, in which case they add new nodes, offline old ones and repair the objects as part of normal operation.
A cartridge requires no power when not spinning in a drive; a rack of tape can hold tens of PBs and consume less than 1,000 watts at peak utilization and ~50 when idle. A rack of NVMe can take ~49,000 watts, and an Nvidia Blackwell rack ~120,000 watts.
A sound data management plan takes into account the necessary life of the data. Can the data be remonetized, or does it have enough value to justify storing it forever? Does it have a compliance or regulatory requirement? Some compliance requirements are seven years, while some may be 100 years past the life of a patient. To the point of the article: do you want to be able to retrain your model at a later date?
Rant over... I reserve the right to think of more things to extend my rant however.