OK, we've got your data. But we really want to delete it ASAP

Storage is a big deal for IT people and beancounters alike. For the IT team the story is pretty consistent: there's never quite enough, and the users seem to eat it up at an amazing rate. For the finance team it's a seemingly endless queue of IT people asking for funds for yet more storage because the rate of growth in stored …

  1. BrowserUk

    Insightful!

    Keep stuff you need to keep; and throw the rest away.

    Wow! Cutting edge insights R us.

  2. Anonymous Coward

    Every time!

    1. What's this directory called 'RI-79-SBNLMC'?

    2. Dunno, think it belonged to Fred who retired 5 years ago?

    3. It's full of cryptically labelled .dat files - I'll just zap it.

    4. Does anyone know anything about a project sometime in the late 1970s to add a new layer of security to our secret buried nuclear landmines? I think Fred might have worked on it.

  3. Sebastian A

    Knowing which data is eligible for deletion...

    is most of the work. You have to make sure it's filed/classified/tagged correctly when it's active. Of course, most IT people aren't in a position to tell management from 5+ years ago how to tell their staff to manage documents.

  4. JassMan Silver badge
    Facepalm

    The real problem is the way bean counters work.

    They see 1TB of full disk as £75 of assets which can't be used for any new project and is therefore a waste. What they don't realise is that it takes much more than 1 hour of a project manager's time at £150/hr to sort through the data and kill off unnecessary files, maybe gaining 25% of the space back, because the rest will be required as stated in "Every time!"
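The trade-off in that comment can be checked with its own round numbers: a £75 1TB disk, at least one hour at £150/hr, and about 25% of the space recovered. A minimal Python sketch of the arithmetic follows; the function name is illustrative only, and the labour figure is a lower bound because the comment says "much more than 1 hour".

```python
def cleanup_pays_off(disk_cost_gbp: float, reclaim_fraction: float,
                     hours: float, rate_gbp_per_hr: float) -> tuple[float, float, bool]:
    """Compare the hardware value of the space a cleanup frees with the labour it costs."""
    value_reclaimed = disk_cost_gbp * reclaim_fraction  # share of the disk freed up
    labour_cost = hours * rate_gbp_per_hr               # time spent sorting the data
    return value_reclaimed, labour_cost, value_reclaimed > labour_cost


# Round numbers from the comment: £75 disk, 25% reclaimed, 1 hour at £150/hr.
value, labour, worth_it = cleanup_pays_off(75, 0.25, 1, 150)
print(f"£{value:.2f} of disk reclaimed for at least £{labour:.2f} of labour; pays off: {worth_it}")
```

On these figures the single hour of sorting costs eight times the hardware it frees, before counting the "Every time!" risk of deleting something that was still needed.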

    1. Anonymous Coward

      Re: The real problem is the way bean counters work.

      We almost need a data retention tag on all files and folders, created with the data; that way at a later date we can purge all "retain_24m" files at two years and a day.

      "pi_0" would be top-level personal information, going down to something like "pi_7" for "first name and email address".

      With the amount of data users keep, going through the stuff later is a nightmare, and with the cloud inviting users to squirrel data away in any one of many hidden services it almost justifies my dislike of the cloud.

      So many small companies could be opening themselves up to data issues because users can get to OneDrive, Dropbox, Google Drive etc., upload data there, and on leaving the company not flag which services they used or clean up properly. The cloud providers are partly to blame for this: the easy services that "will get through a firewall" (silently) are not aimed at data compliance.
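The tagging scheme in that comment could be enforced by a purge job along these lines. This is a minimal Python sketch that assumes tags of the form retain_<N>m meaning "keep for N calendar months from creation", and treats untagged files or other tags (such as the pi_* levels) as never eligible for automatic deletion; the tag grammar, the function names and that conservative default are all hypothetical. A retain_24m item becomes purgeable on the day after its 24 months end, i.e. at "two years and a day".

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical tag grammar based on the comment: "retain_24m" = keep for 24 months.
RETAIN_TAG = re.compile(r"^retain_(\d+)m$")


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (31 Jan + 1 month -> last day of Feb)."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    # Last day of the target month = first day of the following month minus one day.
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))


def is_purgeable(tag: Optional[str], created: date, today: date) -> bool:
    """True from the day after the retention period ends ("two years and a day")."""
    match = RETAIN_TAG.match(tag or "")
    if match is None:
        return False  # untagged or non-retention tag: never auto-delete
    expiry = add_months(created, int(match.group(1)))
    return today > expiry


# Example: a file tagged retain_24m on 1 March 2020.
print(is_purgeable("retain_24m", date(2020, 3, 1), date(2022, 3, 1)))  # False: last day kept
print(is_purgeable("retain_24m", date(2020, 3, 1), date(2022, 3, 2)))  # True: two years and a day
```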

      1. Paul Hargreaves

        Re: The real problem is the way bean counters work.

        > that way at a later date we can purge all "retain_24m" files at two years and a day.

        You'd need to change every app for it to be meaningful, since it's at data creation that you need to set it. E.g. you start working on a Word doc and put the tag there. Likewise in an email, in Excel, in Pages etc etc etc.

        And then you'd need to encourage users to use it properly. Which they won't, because they can't. Is someone going to train all the users that a doc they create where they paste in an email address can't have a date of more than X, but if it's a different type of doc then it can have Y?

        And each country's DP rules are different, so a multinational would need to figure out how to train users about which country the data belongs to.

        Or just store everything forever. I know which offers more 'visible' value to the business...

        1. John Brown (no body) Silver badge

          Re: The real problem is the way bean counters work.

          "Is someone going to train all the users that a doc they create where they paste in an email address can't have a date of more than X, but if it's a different type of doc then it can have Y?

          And each country's DP rules are different, so a multinational would need to figure out how to train users about the country the data belongs to."

          Isn't that already the case where personal data is involved? If the business no longer has a valid business reason to retain the data, then it must be deleted. Each jurisdiction has its own definitions of personal data and retention times.

    2. Fortycoats

      Re: The real problem is the way bean counters work.

      Oh, don't get me started on the "1TB disk for £75". Bean-counters then wonder why enterprise storage costs loads more. RAID, controllers, snapshots, compression, deduplication, replication, and all them backups, plus the cost of the people keeping it all running smoothly (you know, the ones who are immediately blamed on the rare occasion when it doesn't).

      And then there are those who won't archive/delete old data from the active database. Do you really want 5-year-old data that hasn't been touched in the last 4 years clogging up your expensive All-Flash Array?
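Picking out the data that comment describes, records untouched for years, is essentially an access-age filter. The minimal Python sketch below assumes each record carries a last-accessed date (many databases do not track this per row, so that field is an assumption), uses a hypothetical Record type with hypothetical keys, and approximates four years as 4 × 365 days, ignoring leap days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    key: str
    last_accessed: date  # assumed to be tracked per record


def split_for_archive(records: list[Record], today: date,
                      idle_days: int = 4 * 365) -> tuple[list[Record], list[Record]]:
    """Split records into (keep on fast storage, move to archive) by time since last access."""
    hot: list[Record] = []
    cold: list[Record] = []
    for rec in records:
        idle = (today - rec.last_accessed).days
        (cold if idle > idle_days else hot).append(rec)
    return hot, cold


# Hypothetical records: one idle for about four and a half years, one recently used.
records = [
    Record("order-2017-001", date(2018, 1, 10)),
    Record("order-2022-042", date(2022, 5, 3)),
]
hot, cold = split_for_archive(records, today=date(2022, 6, 1))
print([r.key for r in hot], [r.key for r in cold])
```

Whether the cold records are then moved to cheaper storage or deleted outright is a policy decision the sketch deliberately leaves out.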

  5. Flip

    Verification

    "...data...bin it. Irrevocably and verifiably."

    How do you verify that data does not exist anymore?

    1. Anonymous Coward

      Re: Verification

      Simples, keep a copy and once every year run a simple script.

      $ grep deleted_data.dat *.*
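That one-liner only searches file contents for the literal pattern deleted_data.dat, so it finds mentions of the filename rather than copies of the data. A slightly more serious take on "keep something and scan for it" is to keep only a SHA-256 digest of each deleted file instead of a copy, then hash everything in storage and look for matches. The minimal Python sketch below does that; the function names and the example paths are hypothetical, and it can only show the absence of byte-identical copies, not of edited, compressed, excerpted or backed-up versions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_surviving_copies(root: Path, deleted_digests: set[str]) -> list[Path]:
    """Return files under root whose contents hash to a digest recorded at deletion time."""
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and sha256_of(p) in deleted_digests)


# At deletion time, record the digest (not the data), e.g. (hypothetical paths):
#   deleted = {sha256_of(Path("customer_export.csv"))}
# Later, scan the estate:
#   leftovers = find_surviving_copies(Path("/srv/shares"), deleted)
```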

    2. Ken Hagan Gold badge

      Re: Verification

      The same way that you "verified" that you didn't need it anymore.

  6. Shadow Systems

    I have a cunning plan!

    I store everything on punch cards. I store the cards in a cardboard box. I stack the boxes like masonry around my house. They act like insulation as the layers of boxes accrete to an ever greater depth. I started out on a tiny little pebble out in space that nobody noticed. Astronomers now call my pebble Saturn. I'll be starting a new ring soon. I'll never run out of Outer Space! =-D

    *Runs away cackling in glee*

    1. Anonymous Coward

      Re: I have a cunning plan!

      So, with the data density of a punched card (about 14 MB/m³, if I keyed the numbers into the stack correctly), you have about 10 yottabytes of data. That is a fairly obsessive-compulsive hoarding habit. Makes my VCR collection of 'Great Open University Kipper Ties' look pathetic.
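Run backwards, that comment's two figures (about 14 MB per cubic metre of cards and about 10 yottabytes stored) fix the volume of the hoard. The Python lines below just do that division, assuming decimal SI prefixes (1 MB = 10^6 bytes, 1 YB = 10^24 bytes); the density and total are the commenter's estimates, not measured values.

```python
MB = 10**6   # decimal megabyte, in bytes
YB = 10**24  # decimal yottabyte, in bytes

density = 14 * MB   # commenter's estimate: bytes per cubic metre of punched cards
hoard = 10 * YB     # commenter's estimate of the total stored

volume_m3 = hoard / density
print(f"{volume_m3:.2e} m^3 of cards")  # 7.14e+17 m^3 of cards
```

That is roughly 7 × 10^17 cubic metres of cards, a cube about 900 km on a side.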

    2. Daggerchild Silver badge
      Boffin

      Re: I have a cunning plan!

      That reminds me of the storage capacity of empty space:

      There are laser reflectors planted on the Moon. Beam your data at them and delete it. When the data eventually returns, beam it back up again without storing it locally.

      1. Anonymous Coward

        Re: I have a cunning plan!

        Wot, every ~600ms?
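The Moon scheme is a delay-line memory: the amount of data "stored" is whatever is in flight, i.e. link rate × round-trip time, and the refresh interval is the round trip itself. At the Moon's mean distance of about 384,400 km, light takes roughly 2.56 s to get there and back. The Python sketch below works that out for a hypothetical 1 Gbit/s link; the link rate is an illustrative assumption, and in practice the reflectors return only a tiny fraction of the light sent.

```python
C = 299_792_458              # speed of light in vacuum, m/s
EARTH_MOON_M = 384_400_000   # mean Earth-Moon distance, m (approximate)


def in_flight_bytes(link_bits_per_s: float, distance_m: float = EARTH_MOON_M) -> float:
    """Delay-line capacity: data sent during one round trip is the data 'stored' in transit."""
    round_trip_s = 2 * distance_m / C
    return link_bits_per_s * round_trip_s / 8


round_trip = 2 * EARTH_MOON_M / C
print(f"round trip: {round_trip:.2f} s")  # round trip: 2.56 s
print(f"in flight at 1 Gbit/s: {in_flight_bytes(1e9) / 1e6:.0f} MB")
```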

  7. SeymourHolz

    “... and throwing it away when keeping it is no longer legitimate or relevant”

    In the context of data, this platitude has been proven to be bunk, a myth, a fool's paradise. You'll be stepping over pounds to pick up pennies. You'll lose money chasing that dream just as surely as you would at the craps table.

    Data appreciates in value faster than the cost of the underlying storage. The only "savings" are associated with some form of de-duplication, whether you are talking at a physical-block or a logical-copy level. In terms of post-dedup'd unique data, the only winning strategy is to keep it and work it. Data appreciates in value over time, as the envelope of tools and metadata that surrounds it grows. Correlation lets data make other data more valuable; it's a domain where 1+1 > 3. It's not like any other asset. The cost of finding the rare data elements that didn't appreciate will exceed any potential cost savings from deleting them (losses that will be compounded by the lost-opportunity costs of missed future correlations).

    Beware the siren's song.
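That comment treats deduplication, at either the physical-block or the logical-copy level, as the only real saving. The minimal Python sketch below illustrates the block-level case: split data into fixed 4 KiB blocks, store each distinct block once under its SHA-256 digest, and keep an ordered list of digests to rebuild the original. The block size, hash and function names are illustrative choices; production systems often use variable-size, content-defined chunking instead, so that an insertion does not shift every later block.

```python
import hashlib


def dedup_blocks(data: bytes, block_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Fixed-size block deduplication: store each distinct block once, keyed by its SHA-256."""
    store: dict[str, bytes] = {}
    recipe: list[str] = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep the first copy only
        recipe.append(digest)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original bytes from the block store and the recipe."""
    return b"".join(store[d] for d in recipe)


# Three identical 4 KiB blocks plus one different block.
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = dedup_blocks(data)
print(len(recipe), "blocks referenced,", len(store), "stored")  # 4 blocks referenced, 2 stored
```

Only the two distinct blocks, the "post-dedup'd unique data", actually consume space; the recipe is what lets every logical copy be rebuilt.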

Biting the hand that feeds IT © 1998–2022