Now that's what we're Tolkien about: You need one storage system to rule them all and in the darkness bind them

One of the tech industry’s longest running quests is developing the notion of a single source of truth within organizations. That is, no matter who or where you are within a business, when it comes to running the numbers or making a decision, your applications are accessing the same information as everybody else internally. No …

COMMENTS

This topic is closed for new posts.
  1. Pascal Monett Silver badge

    A single version of the truth

    As usual, the hypothetical idea is well-grounded, but in a world where ransomware attacks are run-of-the-mill, companies big enough to have big storage and backup solutions are also going to be targets for data encryption.

    So a single version of the truth means a single target to encrypt to cripple the company and potentially extort millions.

    How does this pie-in-the-sky thinking cope with that threat?

  2. Ben Tasker

    The other approach is to access centralized primary source data through an abstraction layer that presents the underlying individual physical silos as a single virtual repository,

    The first thing that came to mind reading that was the thought of someone creating a git repository on top of Gluster (to allow easy scaling), and that's my weekend ruined.

    Good way to lose it all in one go though :)

  3. Lorribot

    Data sprawl is caused by the way projects are done in isolation, each needing its own copy of the data. Systems may also need to work with a particular version of the data, or with data at a specific point in time, which is difficult to accommodate but can be done if it is thought about in the design rather than bolted on as an afterthought.

    Most non-IT people (read: the Business) have difficulty understanding that you don't have to have all the data in one database. They try, and fail, to do this merging, which just creates more problems later when you sell off business units or otherwise need to manage data; it costs massive amounts of money, can fail, and can scar them for life. You need one copy of each piece of data, not one place to put it all, which is a massive difference and is much more achievable, and more manageable by IT for backups, restores and data abstraction.

  4. GreyWolf

    I've watched that crash and burn.

    In a building society long, long ago, and far, far away, the Movers and Shakers wanted to expand to be a full portfolio business, as big as they could be without becoming a bank (and all the regulation that would mean).

    So they looked at the idea that they had a captive audience to sell to, namely all the people who had mortgages with them. All those people need insurance for their houses, cars, and holidays, they said. Probably medical insurance, broadband, electricity, home security. And there was much licking of lips in the directors' corridor.

    And soon, in that land, there came a great cry, saying "How the hell do we talk to these people to sell them more stuff, yet not piss them off by sending the same flyer six times, which incidentally wastes money?"

    Thus was born the idea of a Customer Engagement System (very popular in those days), which would theoretically connect all the accounts of a single person together, so the sales droids could see exactly which of the Society's products a particular individual had, and which were missing and therefore to be sold to the poor mug.

    And they set to work, to design this beast. For beast it was. What's the meaning of an address in the mortgage system? Is it a mortgaged building (maybe not)? What is it in the insurance system? Is it an insured object (maybe not)? Why is the insured value different from the mortgage value? Are those two things the same for the same customer? Oh, and by the way, how do you know it's the same building, the same John or Jane Smith? The existing systems contained NONE of the data necessary to connect the concepts together.

    The Directors threw money, hardware and software at this. Yea, verily, it availed them nothing.

    That Building Society, once one of the best, is now no more than a brand name in somebody else's portfolio.

  5. cschneid

    not so much counterpoint as supplemental

    "A single version of the truth for your business is a lofty yet essential goal to maximize business opportunity." This is the author's conclusion, with which it is difficult to disagree. Some of what precedes this is, however, at odds with history as I experienced it.

    Many organizations started out with one single version of the truth in a centralized database. Then came the 1980s and relatively quickly many people had hitherto unheard of computing power on their respective desktops and the ignorance to equate capability with ability.

    Much has been made of the "PC Revolution" and the empowerment of the end user whilst slathering pejoratives on centralized IT. Suffice it to say that organizational culture kept these two essential parts of the whole at loggerheads.

    The move away from a single source of truth was due, not to caution, but to the perceived neglect from central IT. The end users needed to perform what-if analysis and central IT was not forthcoming with applications to do that. Enter Lotus 1-2-3, dBase, et al., and what came to be known as "shadow IT."

    There was no source code management, there was no test version of the "database"; these were not IT people and they knew not of these things. Ignorance can be remedied, but no one saw fit to do so.

    And just why was central IT not providing a single application to access the single source of data? Management prioritized those requests far enough down the queue that they were never addressed.

    Again, I don't disagree with the conclusion that "[...] it's understandable to have secondary data sprinkled everywhere. It's also a smart move to unify it into a single source of truth."

    And again, I think some things are missing from the provided two routes to the truth: data administration and governance. It is essential that whoever is accessing the one true source of data understands what it is they are getting. If there's a "current status" column, as of when is it current? Also, GDPR, PII, HIPAA, SOX, et al.

  6. Phil Bennett

    Bloody difficult to do

    Consider a simple business object like an order form. You'd think it would be straightforward to have a canonical order form, but you very quickly land in xkcd standards country as each system, grown over decades of mergers and development, has different requirements. Some need line-by-line detail. Some need the ability to mark orders partially complete. Some need different currencies supported. Some need suborders, backorders, etc.

    You can build up to a single source of truth field by field if all the applications can consult whatever you are using to hold the golden data, but in a large organisation you might have many, many different platforms and storage systems to work with, from mainframe to AS/400 to SQL or NoSQL databases to big data platforms to object storage and more.

    Not easy, unless you're essentially a greenfield site.
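    The field-by-field idea above can be sketched in a few lines: each field of the golden record is resolved from whichever silo is designated its system of record. This is a minimal illustration, not anyone's actual implementation; the silo names, fields and the `VirtualOrderRepository` class are all invented for the example.

```python
class VirtualOrderRepository:
    """Presents several physical silos as one logical order record."""

    def __init__(self, silos, field_owners):
        self.silos = silos                # silo name -> {order_id: {field: value}}
        self.field_owners = field_owners  # field -> silo that owns that field

    def get_order(self, order_id):
        # Assemble the golden record field by field from each owning silo.
        record = {}
        for field, silo_name in self.field_owners.items():
            row = self.silos[silo_name].get(order_id, {})
            if field in row:
                record[field] = row[field]
        return record

# Hypothetical silos: the mainframe owns the amounts, the CRM owns customer data.
silos = {
    "mainframe": {42: {"amount": 99.5, "currency": "GBP"}},
    "crm":       {42: {"customer": "J. Smith"}},
}
owners = {"amount": "mainframe", "currency": "mainframe", "customer": "crm"}

repo = VirtualOrderRepository(silos, owners)
print(repo.get_order(42))
# {'amount': 99.5, 'currency': 'GBP', 'customer': 'J. Smith'}
```

    The hard part in practice is everything this sketch waves away: agreeing who owns which field, and keying the same order consistently across silos.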

    1. The Pi Man

      Re: Bloody difficult to do

      Not easy, unless you're essentially a greenfield site

      This. For everyone else it’s an impossible panacea.

  7. revilo

    nice title

    The title of the article is genius (as are many in the Register). The article could have been condensed a bit more; it boils down to

    "in IT, never bet on one horse alone", especially if it involves burning your bridges to the past. And the allegory of the ring "which binds them all" is something to keep in mind.

  8. Neil Barnes Silver badge

    A single version of the truth?

    That'll make it handy when the 'we've encrypted all your data' boys come knocking...

  9. Twanky
    FAIL

    Single copy...

    Yeah, I know the article is not advocating not having backup copies but I thought this anecdote would be worth telling:

    About 10 years ago the company I worked for in central London had a standby 'business continuity' facility well outside the city. Servers running 24x7 with near realtime replica copies of data. Also desk space for an essential core of workers equipped with PCs, phones etc etc.

    Our new CIO liked the idea of 'cloud'. He never gave any decent reasons to the IT folk, but he must have convinced the board because he was able to spend money on the project. So a few new boxes appeared, into which our existing data was replicated, and in the background it synced to a number of cloudy data stores. The existing applications were then updated to read and modify the data in the new shiny storage. After a relatively short period of dual running, an instruction came down from on high to cease the data replication to the data stores in the business continuity centre in preparation for shutting it down completely.

    After a month or so, the fairly new shiny boxes stopped working properly and new ones were rapidly delivered. However, the guy who had set it up had forgotten to take a backup of the encryption key and they couldn't retrieve our data from the cloud stores to the new boxes. No matter: we'll use our local backups - except they had not been working properly either... In other words, a major cock-up.

    You can tell where this is going, of course: The IT bod (a very good friend of mine) who had been told to stop the replication had found something more useful to do with his time and had omitted to carry out the instruction. We got up-to-date data back from the 'clandestine' replica copy.

    My mate didn't get congratulated though... he got a telling off. A year or so later he got made redundant. Bastards.

  10. Anonymous Coward
    Anonymous Coward

    Runaway un-normalised data

    What about those poor buggers that work for major companies that treat every IT program as a project in isolation and never consider joining them back up?

    Or different naming conventions, not only between systems, but even within the same system?

    Worse yet, most of the critical regulatory reporting is done off a source not merely known to have errors, but known to be actively wrong. Or where the equipment on the books doesn't agree with the asset register - the worst examples being decommissioned equipment still featuring in one but not the other.

    I’m sold on the single source concept, but totally at a loss to get our manglement to see their folly. Tempted to invite an external audit of it. Scale of data teams is also interesting. Crossrail had 90 staff for one [large] project. We run a much bigger asset base and have barely 9 staff. Go figure.

    Anonymous for job security.

  11. Nick Ryan Silver badge
    Unhappy

    Office 365

    ...and then along comes Microsoft Office 365, making it near impossible to stop the creation of even more duplicated copies of data in unmanageable data silos.

    If there were granular control, or just an equivalent way to store data files in a local or a fixed and controlled location, then it would be OK. But currently it almost enforces the creation of many copies of siloed data.

  12. potatohead

    It's much worse than suggested

    Data is wrong and gets fixed (or, more usually, the missed upload sits in some failure queue for a few weeks before someone finally realises and uploads it).

    In a perfect world you'd maintain the old and new versions of the data so that you can re-run reports and get the same results. If the data keeps changing you'll never understand where differences come from. It's the equivalent of 'value date' in accounting systems, where you want to be able to separate when a report is run for, from what set of corrections to include (show me the year end as of the year end vs the year end with the corrections we've subsequently applied).
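    The "value date" idea above amounts to keeping two dates on every fact: the business date it applies to, and the date the figure was recorded. A report then fixes both, so "year end as of year end" and "year end with later corrections" are both answerable. A toy bitemporal sketch, with all figures and dates invented for illustration:

```python
from datetime import date

# Each entry: (business_date, recorded_date, value)
ledger = [
    (date(2023, 12, 31), date(2023, 12, 31), 1000),  # original year-end figure
    (date(2023, 12, 31), date(2024, 2, 15), 1250),   # correction recorded later
]

def balance(business_date, known_as_of):
    """Value for a business date, using only corrections recorded by known_as_of."""
    candidates = [(rec, val) for bd, rec, val in ledger
                  if bd == business_date and rec <= known_as_of]
    if not candidates:
        return None
    return max(candidates)[1]  # most recently recorded value wins

# Year end, as seen at year end:
print(balance(date(2023, 12, 31), date(2023, 12, 31)))  # 1000
# Year end, with the correction applied since:
print(balance(date(2023, 12, 31), date(2024, 6, 1)))    # 1250
```

    The key property is that nothing is overwritten: corrections are appended, so any historical report remains reproducible.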

  13. GXH

    Another approach . .

    One could also try creating a "data warehouse."

    Not a new idea. Also not cheap, not realtime (usually updated nightly via batch), but, depending on situation, can offer big benefits with little disruption of existing systems.

    Look it up.
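    The nightly-batch idea is simple enough to sketch: extract rows from each operational silo, map each silo's naming convention onto one warehouse convention, and load the result into a single reporting store. The silo tables, column names and `nightly_load` function here are all invented for illustration.

```python
# Toy operational silos, each with its own naming convention.
billing_rows = [{"cust_no": 7, "amt": 120.0}]           # from the billing system
crm_rows     = [{"customerId": 7, "name": "Acme Ltd"}]  # from the CRM

warehouse = {}  # customer_id -> unified reporting row

def nightly_load():
    """Extract from each silo, transform names to one convention, load."""
    for row in crm_rows:
        rec = warehouse.setdefault(row["customerId"], {})
        rec["customer_name"] = row["name"]
    for row in billing_rows:
        rec = warehouse.setdefault(row["cust_no"], {})
        rec["billed_total"] = row["amt"]

nightly_load()
print(warehouse[7])
# {'customer_name': 'Acme Ltd', 'billed_total': 120.0}
```

    The trade-off the comment mentions falls out directly: the source systems are untouched (the warehouse only reads from them), but the reporting copy is only as fresh as the last batch run.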
