Departments are each clinging on to 100 terabytes of legacy data

Some Whitehall departments are saddled with more than 100 terabytes of legacy data, and are wasting time recreating old work at a cost of £500m per year, according to a Cabinet Office report. The Better Information for Better Government report [PDF] said good information governance is critical for effective government. …

  1. AMBxx Silver badge


    That's an awful lot of buzzwords to explain the retention of cat videos.

  2. Dwarf

    I smell marketing BS

    I expect this is a report from a company specialising in handling unstructured data. Of course, any data without metadata is effectively lost, so I'd argue that there isn't a compliance issue, as nobody knows that the data is held. Do you really know what is on, or have the ability to read, those old DLT-1 tapes, DATs, QIC or other tapes, DVDs or similar from yesterday?

    Secondly, 100TB isn't a lot of data, given that we can freely go into any online store, buy 10TB disks for next to no money and stuff them into our PCs for our cat pics, data backups, porn stash, DVD library, music library, family photos, VMs or whatever else we want.

    I expect that most of the old data will be old backups, which they will need next time they get hit by a cryptolocker variant, at which point another consultancy company will release a report saying that governments don't take enough care in protecting their data.

  3. Norman Nescio Silver badge

    GCHQ expertise sharing?

    You might think GCHQ could be persuaded to lend its experience in searching through reams of unstructured data to find stuff. It ought to have developed some pretty nifty algorithms by now.

    1. macjules

      Re: GCHQ expertise sharing?

      The day you get one government department exchanging experiences with another is the day that:

      1) Capita and Steria's various multi-billion pound applications actually work.

      2) Theresa May manages a smile that doesn't look like she just feasted on human flesh.

      I love the concept that "Cabinet Office, the National Archives and Government Digital Service intend to seek to drive change across departments and government as a whole while retaining a supportive, collaborative engagement model." I am sure that Home Office, BIS, HMRC and Health will do what they always do: throw the memo in the bin and carry on doing exactly the opposite. Defence tends to pay Capita for an impact statement on what would happen if they read it.

  4. Colin Millar

    Much better solution

    Flush the lot.

    No-one knows what is there and it will cost loadsamoney to piss around "structuring" the data which no-one will ever refer to.

    I liked this bit

    "Access to searchable digital legacy information can also prevent civil servants recreating previous policy ideas that do not work or inventing new solutions that are not actually new, known as reinventing the wheel, it said."

    Nothing in this universe will stop civil servants ministers (FTFY) "recreating policy ideas that do not work" coz they are all convinced it would have worked last time if only everyone had just listened to them. As for re-inventing the wheel - that is the very core of the civil service (motto: "never do anything for the first time")

  5. This post has been deleted by its author

    1. John Smith 19 Gold badge

      Re: Hmm. GCHQ rebranded as the British Backup Company?

      It's not like they haven't got copies of most of this stuff.

      And a much more palatable BBC to Gauleiter May, I think.

  6. Anonymous Coward

    ...From £500M/year


  7. Anonymous Coward

    Sounds like an excuse for one big database.

    Probably stooges working for STASI May.

  8. Eric Olson

    Does this feel like a big number to scare people?

    'Cause it feels like a big number to scare people. 100 TB is all the space! Like... uh... 50 hard drives from Amazon. Bought for $69 apiece. But I swear it's a lot!
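The drive arithmetic in this comment is easy to check. A minimal sketch, taking the 50 drives and the $69 price from the comment and assuming the 100 TB is split evenly across the drives with no redundancy or formatting overhead (the function name is illustrative):

```python
def drive_purchase(data_tb: float, drive_count: int, price_usd: float) -> tuple[float, float]:
    """Return (capacity needed per drive in TB, total cost in USD).

    Assumes the data is split evenly across drive_count drives and ignores
    redundancy, spares and filesystem overhead.
    """
    per_drive_tb = data_tb / drive_count
    total_cost = drive_count * price_usd
    return per_drive_tb, total_cost


per_drive, cost = drive_purchase(100, 50, 69)
print(f"{per_drive:.0f} TB per drive, ${cost:,.0f} in total")
```

So the comment's figures imply 2 TB drives and a total spend of a few thousand dollars for a single, unprotected copy.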

    C'mon. I know people are programmed from birth to distrust the government (which hasn't helped itself much, regardless of citizenship) and assume the worst... but if they knew what IT policies looked like at "well-run multinationals", they would see things differently. Hell, the small company I'm at has 1TB tables in an MS SQL DB for moderate-sized clients. It sure sounds large when you remember ads used to boast about 1GB hard drives, but things have changed since Pentiums ruled the world.

    Sure, most of this legacy data could probably be jettisoned, but if it's anything like the US, there are retention policies, rules, or laws that dictate what can be done with it or how accessible it should be. The finance world routinely keeps a five-to-seven-year retention policy for audits and the like, not counting the numerous hard drives and email accounts under legal hold due to pending regulatory, civil, or criminal actions.
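The rule described above, where data may go once its retention period has passed unless a legal hold applies, can be written as a small predicate. This is a minimal sketch under assumptions: `Record`, `retention_expiry` and `eligible_for_deletion` are hypothetical names, the seven-year default is simply the upper end of the range mentioned in the comment, and real retention schedules vary by record type and jurisdiction.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """A hypothetical stored item: when it was created and whether it is on legal hold."""
    created: date
    legal_hold: bool = False


def retention_expiry(created: date, years: int) -> date:
    """Same calendar date `years` later; 29 February falls back to 28 February."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        return created.replace(year=created.year + years, day=28)


def eligible_for_deletion(record: Record, today: date, retention_years: int = 7) -> bool:
    """True only if the retention period has passed and no legal hold applies."""
    if record.legal_hold:
        return False  # a legal hold overrides the normal retention schedule
    return today >= retention_expiry(record.created, retention_years)


today = date(2019, 1, 1)
old = Record(created=date(2010, 6, 1))
held = Record(created=date(2010, 6, 1), legal_hold=True)
recent = Record(created=date(2015, 6, 1))
for name, rec in [("old", old), ("held", held), ("recent", recent)]:
    print(name, eligible_for_deletion(rec, today))
```

Checking the legal hold first means no retention setting, however short, can release data that is still needed for pending proceedings.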

    And am I right in thinking £500M/year isn't actually that much? It sounds like a lot, but here in the US, converted to USD, that would be... 0.03% of the annual budget for the entire government, including Social Security (income for old people) and Medicare (health care for old people).
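The percentage above depends on two inputs the comment doesn't state: the exchange rate and the budget figure used. A minimal sketch, assuming (for illustration only) about 1.3 USD per GBP and a US federal budget of about $4 trillion:

```python
def share_of_budget(cost_gbp: float, usd_per_gbp: float, budget_usd: float) -> float:
    """Return cost_gbp, converted to USD, as a percentage of budget_usd."""
    return cost_gbp * usd_per_gbp / budget_usd * 100


# Both inputs below are assumptions, not figures from the report or the comment.
pct = share_of_budget(500e6, 1.3, 4.0e12)
print(f"{pct:.3f}%")
```

With these inputs the result is roughly 0.016%, the same order of magnitude as the comment's estimate; a different exchange rate or budget year shifts it proportionally.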

  9. PNGuinn

    Traditional methods are best

    1. Just print the bloody lot out.

    2. Employ a few bods to sort through the data and categorise it into keep / discard.

    3. Employ a few more bods to take and file a photocopy of the latter.

    4. Employ some more bods to shred the now redundant data.

    5. As a check against loosing anything which might possibly be important, employ some specialist consultants to ensure that everything is cross-checked before destruction and archived - just in case.

    May I have my peerage now, sir?

    1. J P

      Re: Traditional methods are best

      Am I missing something here, or is it deliberate that you suggest taking copies of the "discard" pile?

      (I think there's also possibly a typo around losing/loosing which could trigger jokes about leaks of critical information, but I can't face working one out right now)

  10. EnviableOne Silver badge

    Jobs for the boys

    "Estimates suggest that wasted effort recreating old work might cost government nearly £500m per year."

    So 2000+ civil servants are employed to come up with old ideas .....

    "Some Whitehall departments are saddled with more than 100 terabytes of legacy data"

    so that's per department, so government-wide we are talking exabytes of data
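Extrapolating from the report's per-department figure is plain multiplication, and doing it explicitly shows which unit the total lands in. A minimal sketch, assuming (for illustration only) 25 departments each holding exactly the report's 100 TB lower bound, and decimal units:

```python
TB_PER_PB = 1_000       # decimal units: 1 PB = 1,000 TB
TB_PER_EB = 1_000_000   # 1 EB = 1,000,000 TB


def total_legacy_tb(departments: int, tb_per_department: float = 100) -> float:
    """Total holdings if every department held exactly tb_per_department."""
    return departments * tb_per_department


assumed_departments = 25  # illustrative assumption, not a figure from the report
total_tb = total_legacy_tb(assumed_departments)
print(f"{total_tb / TB_PER_PB:g} PB across {assumed_departments} departments")
print(f"{TB_PER_EB / 100:,.0f} departments of 100 TB would be needed for 1 EB")
```

The total scales linearly with both inputs, so a larger department count or bigger per-department holdings raise it proportionally.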

    No wonder May is trying to get Brexit through: holding this data might breach GDPR.

  11. Whitter

    Data Protection Act

    You mean nobody has applied the Data Protection Act to all this stuff? All those tricky bits about defining what personal data is collected for and then deleting said data once no longer needed. Who knew or even suspected!

  12. Nate Amsden Silver badge

    funny

    How some people see a number like 100TB and try to extrapolate it to some small number of hard drives you can buy from any number of places.

    More likely such data is stored in a dozen or more racks of storage arrays (with average disk sizes of maybe 300 to 450GB, enterprise FC or SAS) for online data processing.
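To put the disk sizes in this comment into numbers, here is a minimal sketch (decimal units assumed) of how many 300GB or 450GB drives it takes just to hold 100TB once. It deliberately ignores RAID parity, hot spares and formatting overhead, all of which would push the count higher:

```python
import math


def raw_disk_count(data_tb: float, disk_gb: int) -> int:
    """Disks needed to hold data_tb terabytes once, with no RAID, spares or overhead."""
    return math.ceil(data_tb * 1000 / disk_gb)


for size_gb in (300, 450):
    print(size_gb, "GB disks:", raw_disk_count(100, size_gb))
```

That is a couple of hundred drives or more before any redundancy, which fits the comment's picture of many shelves of disks rather than a handful.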

    1. Anonymous Coward

      Re: funny

      More likely a vast proportion is an estimate based on tape capacity, with no allowance for the success rate of restores from 20-year-old DATs (with some backups to cleaning tape) from a pile of cartridges in a box file in the basement, stored next to the big magnets.

    2. Eric Olson

      Re: funny

      You're assuming that it's in some kind of high-availability storage built for concurrency or I/O, rather than chucked onto old department shares slapped together in the early 2000s, or maybe numerous Outlook archives of departed employees that people can't bring themselves to delete... most of which is likely on tape or, as I alluded to, off-the-shelf HDDs, maybe even in ESD bags after being pulled from the ancient desktops they were once part of. Perhaps even a fancy SMB NAS was thrown into the mix a few years ago. And much of it is probably backups or is replicated/superseded elsewhere, but no one has the time to figure it out.

      That's the thing about the 100 TB figure... it doesn't take a lot of desktops and laptops turned in because of departure, termination, or upgrade to reach it. But with various regulations about data retention, requirements to scrub other types of data before disposal, and just the inertia of government (just like in business), the better assumption is that this is spread across a hundred or more separate storage media, devices, and systems... and the consulting firm probably did the same, grunted out a number, then used a boilerplate conclusion with the subjects changed to match the industry.

      You show me a company of more than 20 people that's been around for more than a decade dealing with data, even just emails and a website, and I'll show you where to find the TBs of non-operational or archived data that someone(s) can't let go.

      Of course, it is possible that some of these departments do have it on modern storage solutions... but if that was the case, it's likely they aren't suffering from the same issues that the consultant identified in their summary. It's probably reasonably searchable, has sufficient redundancy, and may even have coherent archiving and deletion policies. Then again, I've yet to work for any company that can do this across all levels.
