But, but...
can't we just store all the data into The Cloud?
As data volumes increase in all industries and the challenges of information management continue to grow, we look for places to store our hoarded bytes. Inevitably the subject of archiving and tape comes up. It is the cheapest place to archive data by some way; my calculations give tape a four-year cost of something in the …
But no you can't see them...it's commercially sensitive information. :-(
You'll have to trust me; I have no vendor affiliation. Suffice to say, it is not quite as simple as taking the cost of an LTO tape: you have to factor in SAN, robot and tape library costs, both capital and operational expenditure.
You also have to factor in the software to handle the migration.
Sure, you can hack together some scripts*, but they don't give you calendar-based scheduling, use of drives only in downtime, node priority, etc.
*You could also hack together some scripts to do your backups or archives, but you don't for very sound reasons.
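For anyone who wants to run their own numbers, here is a minimal sketch of that kind of four-year calculation. The cost categories come from the post above; the `CostItem` class, the function names and every figure are placeholders made up for illustration, not real prices and not the poster's actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    capex: float        # one-off purchase cost
    annual_opex: float  # yearly running cost (support, power, staff time)


def total_cost(items, years=4):
    """Capital cost plus running cost over the whole period."""
    return sum(item.capex + item.annual_opex * years for item in items)


def cost_per_tb(items, capacity_tb, years=4):
    """Total cost over the period divided by the capacity it buys."""
    return total_cost(items, years) / capacity_tb


# Placeholder figures only, NOT real prices.
items = [
    CostItem("tape media", 10_000, 0),
    CostItem("tape library and robot", 50_000, 5_000),
    CostItem("SAN ports", 8_000, 1_000),
    CostItem("migration software", 12_000, 3_000),
]

print(total_cost(items))
print(cost_per_tb(items, capacity_tb=500))
```

Swap in your own quotes for each line; the point is only that media is one row among several.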
Depends on type of disk and type of tape and humidity and temperature and density of each.
Finding an interface to read the disk vs a working tape drive and interface in 30 years is easier maybe for Disk. I can still read any IDE/PATA drives easily. Ancient SCSI via PCI or PCMCIA adaptors. But MFM drives need me to boot a machine with DOS or Win 3.0 or Win 3.1 and full length ISA slot. But new Laptop doesn't take the PCMCIA SCSI adaptor, and some of the newer PCs have no PATA/IDE interfaces, nor PCI slots for older ones.
I do have some drives that are a "kind" of IDE (on board CPU, same connection) but used a custom "PIO" ISA like adaptor. Not readable on anything except original Wang Mobo with ISA riser card and Pseudo IDE connection.
So, having complete faith in El Reg to know what's best for me, I first tried to move my stuff into /dev/null, only to find out that this doesn't work:
[me@smtp ~]$ mv test.jpg /dev/null
mv: inter-device move failed: `test.jpg' to `/dev/null'; unable to remove target: Permission denied
SO then I asked our admin for the root password, but that didn't go too well either.... After he asked me why I wanted it I told him that I needed to move my data into /dev/null, then he only gave me a rude comment about lame jokes and hung up on me!
But I figured it out:
[me@smtp ~]$ cp test.jpg /dev/null
[me@smtp ~]$ rm test.jpg
However, there's one problem when I try to get my picture back:
[me@smtp ~]$ cp /dev/null test.jpg
[me@smtp ~]$ file test.jpg
test.jpg: empty
So I hope anyone can help, I need this picture back yesterday ;-)
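For the record, what the transcript above shows is standard null-device behaviour: writes are accepted and thrown away, and reads hit end-of-file immediately. A small Python sketch of the same thing (the payload string is just a stand-in for the picture):

```python
import os

# Writing: every byte is accepted and none of them are kept.
with open(os.devnull, "wb") as sink:
    written = sink.write(b"pretend this is test.jpg")

# Reading: the very first read reports end-of-file.
with open(os.devnull, "rb") as source:
    recovered = source.read()

print(written, recovered)
```

So the write reports success for the full length, and the read hands back nothing at all, which is why `file` calls the copied-back picture empty.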
I'm so sorry to hear you're missing the picture.
My advice to you is to locate the drive /dev/null, remove it from the computer and send it off to a professional data recovery company. I'm sure they will be most helpful.
Do remember to replace the drive once removed, (you will need to format it:
sudo dd if=/dev/random of=/dev/null
-- this may take some time) as otherwise you will not have any backup space to use.
Good luck!
'A wise, bearded and be-sandalled system administrator once told me
"A backup is not a backup until it's been read"
Wise words.'
Actually, a backup is not a backup until it has been read and you are able to use that data to replicate the system from which the backup was taken.
I had to learn that one the hard way. :(
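A rough sketch of the "has it been read" half of that, assuming you have restored the backup into a scratch directory and still have the live tree to compare against. The function names and the sample files are made up for illustration, and matching checksums only prove the files came back intact; they say nothing about whether you can rebuild the whole system from them.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory: fine for a sketch,
            # not for multi-gigabyte files.
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def restore_problems(original, restored):
    """Return (missing, changed) file lists for the restored copy."""
    want, got = tree_digests(original), tree_digests(restored)
    missing = sorted(set(want) - set(got))
    changed = sorted(n for n in want.keys() & got.keys() if want[n] != got[n])
    return missing, changed


# Tiny demonstration with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "live")
    dst = Path(tmp, "restored")
    src.mkdir()
    dst.mkdir()
    (src / "a.txt").write_text("payroll")
    (src / "b.txt").write_text("invoices")
    (dst / "a.txt").write_text("payroll")  # b.txt never came back
    missing, changed = restore_problems(src, dst)

print(missing, changed)
```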
No value got lost. Actually, a lot is saved with moving to /dev/null since most files are already copies from /dev/zero and gzip only allows for so much compression.
When we talk about management documents, then we can safely assume that they were retrieved from /dev/urandom (no, not /dev/random, management types are so predictable 50% of the time), so /dev/null makes a lot of sense. At least more sense than the management types do, have done and ever will do.
(sarcasm may be a part of this post, ymmv)
"Firms no longer seem to be able to categorise data into separate "keep" and "delete" piles."
Ain't that the truth. It seems that in the future a bigger and bigger percentage of stored data will be
1) "stuff we don't know if we need"
2) "stuff we couldn't be bothered going through"
3) duplicates of 1) & 2)
When it comes to tape storage, there is a bit of a snag here. You may well know that, of that backup, only a few files and databases are actually needed long-term, but short of unravelling the tape and snipping out the unwanted bits, there's not a lot you can do.
Usually cheaper to store the whole lot than it is to pull 'em all in and then tie up a load of lads and hardware in restoring 'em somewhere and resaving the "required long-term" bits.
In the Government department I subcontract for, a fellow well-known IT contracting company has a system where, when a user pulls a customer's record and makes an alteration, several pages are printed out and then amended. The pages are then posted across the country to a scanning facility, where they are scanned and returned to the database, in addition to their previous scans. So each time a record is altered its size doubles! And we are talking TIFF files here, not txt.
Glad I'm not in charge of the archive strategy for that.
Late last millennium, the establishment I was working for was transitioning to ISO9001, and outsourced the generation of their operating procedures to a third party, as is the way with such places. Clearly, the third party was paid by the kilo, as we received and had to duly implement and adhere to bookcases of glossy manuals.
Of course, there were reams of procedures ensuring that all our data was rigorously backed up to removable media; which could be archived safely off-site.
However, when the need arose, we discovered that there were no procedures outlined to ever get any of this data back...
Best thing about ISO9001 is that if you document your procedure as archiving everything to /dev/null, then so long as you do that you are ISO9001 compliant .... always amuses me when companies make a big deal of having ISO9001 accreditation when in theory it could just mean "we have documented that we are utterly useless but we rigorously adhere to that level of (in)competence"
While I generally agree with the idea that humans (and especially businesses run by humans) tend to be major pack rats when it comes to data, this really isn't such a new problem if you consider the warehouses of file cabinets that were common not that long ago (and in some cases still are).
As our ability to store information digitally increases, so does our desire to save everything. Nobody dares try to create an algorithm for sorting out what data is truly valuable for fear of making a mistake. Regardless, we really could use a storage technology that is reliable and whose means of access are adaptive to changing technology.
How about bioengineering a storage medium that uses already established data redundancy and integrity techniques to recover lost data or reject mutated storage. :P You'd just have to engineer new interfaces for the storage over time. Or maybe you could induce a forced mutation to 'upgrade' the entire system? The cost would be in feeding the thing I imagine. And it could scale to any size as needed. lol. Kidding. Kinda.
I've got a manager over one site who's never deleted an email in his life. The man literally saves it all, even the spam. Do you have any idea how much space 20 years worth of email takes up in backups? Thankfully he retires in a couple months. I think I'll do the final backup of his inbox to /dev/null.
So I've seen it now. Impressive collection of personal data, to be sure, what with the number of keystrokes, mails and steps taken. It's also an impressive show of analytical power that Mathematica demonstrates.
And ?
Apart from giving this guy a chart saying that he's worked a lot, what does one get out of it ?
I don't see much use in a chart telling me that I send more mail now than I did ten years ago. Does that mean I'm wasting time sending mail, or does it mean that I'm more occupied and doing more efficient things with email ? I know the answer already, and I didn't need to saddle my life with a bunch of stat collectors to tell me.
I salute the performance in monitoring, but frankly I don't see what it can tell me about what I might need to do in the future, and forecasting future needs is the only reason to collect all that data in the first place.
So no, I don't see that "everybody" will be doing that in the future. Actually, I don't see that anyone will be doing that in the future. Having a good beer is so much more fun.
The value of big data is mining it for information by subjecting it to whatever crafty analysis you can think up.
But if it takes months to simply read the data in for analysis, the eventual results will be hopelessly out of date.
The data that's valuable in and of itself without all that analysis is only a minuscule proportion of that data heap.
(ALL:)
Every byte is sacred
Every byte is great
If a byte is wasted
God gets quite irate
(BOFH:)
Let the heathen store theirs
Up in the wispy cloud
God shall make them pay for
Each byte that can't be found
(ALL:)
Every byte is sacred
Every byte is great
If a byte is wasted
God gets quite irate
(PFY:)
LTO, DDR, VXA
Record theirs just anywhere
But God loves those who treat their
Data with more care
(ALL:)
Every byte is sacred
Every byte is great
If a byte is wasted
God gets quite irate
Actually /dev/null is quite sensible. I've pointed out to people quite often that there's not much point in keeping stuff on tape that's going to take so long to get back that it'll be no use in a disaster. At least with /dev/null you find out quickly that there's no archive and do something else about the problem - instead of waiting three days to come to the same conclusion.
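A back-of-envelope way to check that before the disaster rather than during it. This is only a sketch: the function names, the throughput figure and the deadline in the example are assumptions for illustration, and the streaming estimate ignores tape mounts, seeks and fetching media from off-site unless you add them in as `overhead_hours`.

```python
def restore_hours(data_tb, drives, mb_per_s_per_drive, overhead_hours=0.0):
    """Rough hours to stream data_tb back from tape with all drives in parallel."""
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (drives * mb_per_s_per_drive)
    return seconds / 3600 + overhead_hours


def usable_in_disaster(data_tb, drives, mb_per_s_per_drive,
                       deadline_hours, overhead_hours=0.0):
    """True if the estimated restore fits inside the recovery deadline."""
    estimate = restore_hours(data_tb, drives, mb_per_s_per_drive, overhead_hours)
    return estimate <= deadline_hours


# Hypothetical example: 100 TB, 2 drives at an assumed 100 MB/s each,
# and the business wants it back within 72 hours.
print(round(restore_hours(100, 2, 100), 1))
print(usable_in_disaster(100, 2, 100, deadline_hours=72))
```

If the answer comes out as days and the business needs hours, you have learned the same thing /dev/null would have taught you, just with more notice.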
The more space we create, the more shit we find to fill it.
I used to be careful what I stored and where I stored it with 40MB of drive space; not so now that I have a couple of TB to play with.
It's amazing what you don't actually need, and never ever use more than once (if that!)
This is not the silliest stuff I've read, but it comes close. Anyone running a big data center is migrating data all the time as tech goes obsolete. The bigger the data center, the more you migrate off old disk to new. You do the same with a tape archive. Start with T10kA drives, run those for a few years, skip the T10kB drives and start writing new data to T10kC. Then, wait for it, you also start migrating those T10kA tapes to T10kC tapes. Please think about using an HSM, which has built-in migration functions, and if you have more than a PB you probably have more than 2 drives to do it with. I know I do. I also, like my disk purchases, plan for obsolescence and build migration plans into my costs. This fiction that tape is write once, read never is just that. I've also done the numbers on the cost: it's 4 times cheaper. Go read David Rosenthal's article on the cost of long term storage and the cost of cloud storage.