Insightful!
Keep stuff you need to keep; and throw the rest away.
Wow! Cutting edge insights R us.
Storage is a big deal for IT people and beancounters alike. For the IT team the story is pretty consistent: there's never quite enough, and the users seem to eat it up at an amazing rate. For the finance team it's a seemingly endless queue of IT people asking for funds for yet more storage because the rate of growth in stored …
1. What's this directory called 'RI-79-SBNLMC'?
2. Dunno, think it belonged to Fred who retired 5 years ago?
3. It's full of cryptically labelled .dat files - I'll just zap it.
4. Does anyone know anything about a project sometime in the late 1970s to add a new layer of security to our secret buried nuclear landmines? I think Fred might have worked on it.
They see 1TB of full disk as £75 of assets which can't be used for any new project and is therefore a waste. What they don't realise is that it takes much more than 1 hour of a project manager's time at £150/hr to sort through the data and kill off unnecessary files, maybe gaining 25% of the space back, because the rest will be required, as stated in "Every time!"
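To put rough numbers on that, here is a back-of-envelope sketch using only the figures quoted above (£75 per TB, £150/hr, 25% reclaimed). The function names are made up for illustration, and it ignores everything else that enterprise storage really costs.

```python
# Figures taken from the comment above; everything else is illustrative.
DISK_COST_PER_TB = 75.0     # GBP per TB of raw disk
PM_RATE_PER_HOUR = 150.0    # GBP per hour of project manager time
RECLAIMED_FRACTION = 0.25   # share of the space a clean-up wins back


def cleanup_net_saving(hours_spent, tb=1.0):
    """Value of the reclaimed space minus the labour spent reclaiming it (GBP)."""
    reclaimed_value = tb * DISK_COST_PER_TB * RECLAIMED_FRACTION
    labour_cost = hours_spent * PM_RATE_PER_HOUR
    return reclaimed_value - labour_cost


def break_even_minutes(tb=1.0):
    """Minutes of clean-up time after which the exercise starts losing money."""
    reclaimed_value = tb * DISK_COST_PER_TB * RECLAIMED_FRACTION
    return reclaimed_value / PM_RATE_PER_HOUR * 60


print(cleanup_net_saving(hours_spent=1.0))  # net result of one hour on 1TB
print(break_even_minutes())                 # minutes allowed per TB
```

On those figures the clean-up has to take well under ten minutes per terabyte before it pays for itself, which is the point being made.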
We almost need a data retention tag on all files and folders, created with the data; that way at a later date we can purge all "retain_24m" files at two years and a day.
"pi_0" would be the top level of personal information, running down to something like "pi_7" ("first name and email address") and beyond.
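A minimal sketch of how the purge side of such a tag might work, assuming tags of the form "retain_<N>m" and a creation date recorded with each file. The record layout and every function name here are hypothetical; nothing like this exists in any real filesystem as far as this thread is concerned, and "pi_" tags are simply left alone.

```python
import calendar
import re
from datetime import date, timedelta

# Hypothetical tag format: "retain_24m" means keep for 24 months.
TAG_PATTERN = re.compile(r"^retain_(\d+)m$")


def retention_months(tag):
    """Return the retention period in months, or None for any other tag."""
    match = TAG_PATTERN.match(tag)
    return int(match.group(1)) if match else None


def add_months(d, months):
    """Add calendar months, clamping the day to the end of short months."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def purge_date(created, tag):
    """First day on which the file may be purged: retention period plus a day."""
    months = retention_months(tag)
    if months is None:
        return None  # unknown or non-retention tag: never purge automatically
    return add_months(created, months) + timedelta(days=1)


def purgeable(records, today):
    """records is an iterable of (name, created, tag) tuples."""
    names = []
    for name, created, tag in records:
        due = purge_date(created, tag)
        if due is not None and today >= due:
            names.append(name)
    return names


records = [
    ("old_report.doc", date(2020, 3, 15), "retain_24m"),
    ("contacts.xls", date(2020, 3, 15), "pi_7"),
]
print(purgeable(records, today=date(2022, 3, 16)))
```

The easy part is the date arithmetic. Getting the tag set correctly when the data is created is where it falls down.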
With the amount of data users keep, going through the stuff later is a nightmare, and with the cloud inviting users to squirrel data away in any one of many hidden services it almost justifies my dislike of the cloud.
So many small companies could be opening themselves to data issues because users can get to OneDrive, Dropbox, Google Drive etc., upload data there and, on leaving the company, not flag which services they used or clean up properly. The cloud providers are partly to blame for this: the easy services that "will get through a firewall" (silently) are not aimed at data compliance.
> that way at a later date we can purge all "retain_24m" files at two years and a day.
You'd need to change every app for it to be meaningful, since it's at data creation that you need to set it. E.g. start working on a word doc and put the tag there. In an email, in Excel, in Pages etc etc etc.
And then you'd need to encourage users to use it properly. Which they won't, because they can't. Is someone going to train all the users that a doc they create where they paste in an email address can't have a date of more than X, but if it's a different type of doc then it can have Y?
And each country's DP rules are different, so a multinational would need to figure out how to train users about the country the data belongs to.
Or just store everything forever. I know which offers more 'visible' value to the business...
"Is someone going to train all the users that a doc they create where they paste in an email address can't have a date of more than X, but if it's a different type of doc then it can have Y?
And each country's DP rules are different, so a multinational would need to figure out how to train users about the country the data belongs to."
Isn't that already the case where personal data is involved? If the business no longer has a valid business reason to retain the data, then it must be deleted. Each jurisdiction has its own definitions of personal data and retention times.
Oh don't get me started on the "1TB Disk for £75". Bean-counters then wonder why enterprise storage costs loads more. RAID, controllers, snapshots, compression, deduplication, replication, and all them backups, plus the cost of the people keeping it all running smoothly. (you know, the ones who are immediately blamed on the rare occasion when it doesn't).
And then there's those who won't archive/delete old data from the active database. Do you really want 5-year-old data that hasn't been touched in the last 4 years clogging up your expensive All-Flash Array?
I store everything on punch cards. I store the cards in a cardboard box. I stack the boxes like masonry around my house. It acts like insulation as the layers of boxes accrete to an ever deeper depth. I started out on a tiny little pebble out in space that nobody noticed. Astronomers now call my pebble Saturn. I'll be starting a new ring soon. I'll never run out of Outer Space! =-D
*Runs away cackling in glee*
So, with the data density of a punched card (about 14 MB/m³, if I keyed the numbers into the stack correctly), you have about 10 Yottabytes of data. That is a fairly compulsive-obsessive hoarding habit. Makes my VCR collection of 'Great Open University Kipper Ties' look pathetic.
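For anyone who wants to re-key the numbers, a quick sketch of the sum. It takes the 14 MB/m³ density above at face value and assumes (not from this thread) a volume for Saturn of roughly 8.27e23 cubic metres and a yottabyte of 10^24 bytes. The function name is invented for the purpose.

```python
# The density is the figure quoted above. The volume and the definition of a
# yottabyte are assumptions added for this check, not from the thread.
DENSITY_BYTES_PER_M3 = 14e6   # 14 MB per cubic metre of punched cards
SATURN_VOLUME_M3 = 8.27e23    # assumed approximate volume of Saturn
YOTTABYTE = 1e24              # bytes, decimal definition


def punched_card_capacity_yb(volume_m3, density=DENSITY_BYTES_PER_M3):
    """Capacity in yottabytes of a solid block of punched cards."""
    return volume_m3 * density / YOTTABYTE


saturn_yb = punched_card_capacity_yb(SATURN_VOLUME_M3)
print(f"{saturn_yb:.3g} YB")
```

With those assumptions the answer lands in the region of ten million yottabytes, not ten, so the figure above presumably rests on a different volume or unit. Either way the hoarding diagnosis stands.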
In the context of data, this platitude has been proven to be bunk, a myth, a fool's paradise. You'll be stepping over pounds to pick up pennies. You'll lose money chasing that dream just as surely as you would at the craps table.
Data appreciates in value faster than the cost of the underlying storage. The only "savings" are associated with some form of de-duplication, whether you are talking at a physical-block or a logical-copy level. In terms of post-dedup'd unique data, the only winning strategy is to keep it and work it. Data appreciates in value over time, as the envelope of tools and metadata that surrounds it grows. Correlation allows data to make data more valuable; it's a domain where 1+1 > 3, it's not like any other asset. The cost of finding the rare data elements that didn't appreciate will exceed any potential cost savings from deleting them (losses that will be compounded by lost-opportunity costs of missed future correlations).
Beware the siren's song.