UK uni KCL spunks IT budget on 'reputation management' after IT disaster headlines

While front-line techies at King's College London have been slaving away to solve the problems of last October's IT disaster, their embarrassed superiors have been dipping into the IT budget to “reputation manage” the incident. Last October, a one-fault-tolerant RAID array running the entirety of the UK university's IT estate …

  1. ArrZarr Silver badge
    Pint

    > You see that stable door, mate?

    > Yeah.

    > S' open.

    > Maybe we should close it.

    > Nah, horses are all gone now.

    > Oh.

    > Wanna grab a pint?

    > Sure.

    [Exeunt Omnes]

    1. Anonymous Coward
      Anonymous Coward

      Shurely:

      > You see that stable door, mate?

      > Yeah.

      > S' open.

      > Maybe we should close it.

      > Nah, horses are all gone now.

      > Oh. Does anyone know we lost the horses?

      > Yeah, it was all over the news

      > Oh, OK, let's hire someone to wallpaper over the doors and we can then deny we ever had any stables and horses in the first place.

      > Sounds like a plan.

  2. Alister
    Facepalm

    Maybe, instead of managing their reputation and depleting the existing IT budget, they should be considering an increase in the IT budget to allow the installation of a robust, tested, verified backup regime?

    Oh no, silly of me, this is the real world we're talking about.

    1. Anonymous Coward
      Anonymous Coward

      I think that was the point of the article.

    2. AMBxx Silver badge

      It's bizarre that there's no off-site backup or anything beyond a single RAID-5 array! Can't be right - need an insider to tell us if there's just been a problem with backup that was ignored.

      1. Anonymous Coward
        Anonymous Coward

        It was a firmware update that borked the arrays and killed all stores. Then the restores failed because they had not been tested for months and the backups had been failing.

      2. Anonymous Coward
        Anonymous Coward

        RAID was never supposed to be a backup solution, it's an availability solution. If you want RAID and backup, well, having a synchronised pair would be a good start, split between two sites if your network can handle it (although it can reduce network load too if everyone is using their local one). Although a real-time synchronised pair won't save you from a ransomware encryption (that will nuke both), you need a proper incremental or grandfather/father/son time based backup for that.

        Although a lot of backup solutions leave a lot to be desired too... I remember once (in a previous life) we had a problem and needed to load a backup from the tape system. Only to find that the tape system had a bug, and it needed an update to fix the bug... The update made it incompatible with the backups already made... Who the hell comes up with these things?!
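A minimal sketch of the grandfather/father/son rotation mentioned above. The function name `gfs_keep` and the retention counts (7 daily, 4 weekly, 12 monthly) are illustrative assumptions, not anything KCL or a particular backup product is known to use.

```python
from datetime import date, timedelta

def gfs_keep(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates a simple GFS policy would retain.

    Hypothetical policy:
      sons         = every backup from the last `daily` days
      fathers      = Sunday backups from the last `weekly` weeks
      grandfathers = first-of-month backups from roughly the last `monthly` months
    """
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore backups dated in the future
        if age < daily:
            keep.add(d)  # son
        elif d.weekday() == 6 and age < weekly * 7:
            keep.add(d)  # father (weekday() == 6 is Sunday)
        elif d.day == 1 and age < monthly * 31:
            keep.add(d)  # grandfather
    return keep

# One backup per day for 400 days, pruned as of 1 March 2017.
today = date(2017, 3, 1)
backups = [today - timedelta(days=i) for i in range(400)]
keep = gfs_keep(backups, today)
```

Unlike a synchronised pair, the older generations kept here are point-in-time copies, so a file encrypted or deleted yesterday can still be recovered from last week's or last month's set, provided those copies are stored somewhere the damage cannot reach.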

        1. TRT Silver badge

          There are offsite backups and there are archives and rotations etc. It's just that a lot of the more recent, and therefore useful, ones had either backed up shit or hadn't been tested for their ability to actually restore the digital-estate. Probably. I don't know. Looks that way anyway.

          1. John Smith 19 Gold badge
            FAIL

            "either backed up shit or hadn't been tested for their ability to actually restore "

            Here's the thing.

            Untested backups are not backups.

            They are a lucky charm you stroke for good luck. Management that does not require this to be tested regularly is incompetent to the point of delusional.

            They should work, but bitter experience taught me they don't always.

            Yes it's time consuming to prove what should be a null result (backup restores original data) but imagine how good you feel when you find out it doesn't before you need it?
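The "null result" described above (the restore matches the original data) can be checked mechanically. The following is a sketch only, assuming the original tree and a test restore are both reachable as directories; `tree_digests` and `verify_restore` are hypothetical helper names, not part of any backup tool.

```python
import hashlib
from pathlib import Path

def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    # read_bytes() loads whole files into memory: fine for a sketch,
    # but large files should be hashed in chunks.
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_restore(original, restored):
    """Compare a restored tree against the original.

    Returns (missing, corrupt): files absent from the restore, and files
    present in both trees whose contents differ.
    """
    src, dst = tree_digests(original), tree_digests(restored)
    missing = sorted(set(src) - set(dst))
    corrupt = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, corrupt
```

Two empty lists are the boring result you hope for; anything else is a finding you would much rather see in a drill than during an outage.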

    3. phuzz Silver badge
      Trollface

      Don't be daft! It's much more important that the management at KCL aren't embarrassed than to actually fix the underlying problem in the first place.

    4. veti Silver badge

      Or maybe they should spend a few grand on Dropbox. Just sayin'.

      It's strange, when outsourcing goes TITSUP, commentards are quick to point the finger. But when an in-house service explodes in far more costly and spectacular fashion, it's "maybe you should spend more money on it".

      Well, maybe they should. But they should also consider the possibility that, just maybe, someone else could do it better.

    5. dajames

      Oh no, silly of me, this is the real world we're talking about.

      No, it's a university ... it's where people go to avoid the real world.

      1. GruntyMcPugh Silver badge

        You've clearly never worked in a University research department. I supported a bunch of PhD Astrophysicists for several years, including some folks that designed and built satellite based X-Ray imaging systems, developments from which went on to improve imaging of humans and lessen their exposure. Real world benefits from space science... again.

      2. Anonymous Coward
        Anonymous Coward

        What are Universities For?

        The 'real world' is a garbage filler. In universities, people are employed to imagine and suppose, doing the tasks - clinical trials; peer review; trial and error; iteration - that advance and pass on knowledge to society. And, in many ways, it's only after this that many things in the world exist.

    6. Avatar of They

      Silly you, not like this involves a place of learning, where supposedly clever people work and learn on a daily basis. That prides itself on being filled with clever people who can help teach other people important stuff..

      Just saying.

  3. thesykes

    really?

    The entirety of the IT Estate running on one RAID array?

    I'm not a hardware expert but even I know that that isn't a really good idea.

    1. DaLo
      Headmaster

      Re: really?

      But how many arrays were they running? If they were using a Redundant Array of Independent Disk Arrays the question has to be asked whether the individual Arrays were also redundant, i.e. was it a Redundant Array of Independent Disk RAIDs or a Redundant Array of Independent Disk non-Redundant Arrays?

      Or is it just a bad case of RAS Syndrome?

      1. Anonymous Coward
        Anonymous Coward

        Re: really?

        It was a firmware update that borked the arrays and killed all stores. Then the restores failed because they had not been tested for months and the backups had been failing.

        1. stephanh

          Re: really?

          They should have put their stuff on Gitlab instead.

          1. Anonymous Coward
            Anonymous Coward

            Re: really?

            "They should have put their stuff on Gitlab instead."

            Absolutely! Everything's better in the cloud ... until it's not.

      2. jamie m

        Re: really?

        One array with everything on with single drive failure resiliency.

        1. Colin Millar

          Re: really?

          Yup - and because that wasn't quite nailbiting enough someone updated the firmware just for the LULZ

    2. TRT Silver badge

      Re: really?

      It wasn't quite everything. But it was a lot. Even took the phones out too, so I heard.

  4. A Non e-mouse Silver badge

    Blame

    The important information to glean here is the exact cause of the failure. Blaming a person, whilst being nice from a PR viewpoint as an easy scapegoat, isn't really a good answer. You have to identify the processes and systems which ultimately resulted in the catastrophic failure.

    1. thesykes

      Re: Blame

      You don't pay consultants to write a report that states that the fault lies with senior management who have repeatedly refused funding for a decent IT estate. They write reports absolving management of all blame, sack a few low ranking techies and say "lessons have been learned and you can trust that we will have policies in place to ensure this doesn't happen again in the future."

      And nothing will change.

      1. Anonymous Coward
        Anonymous Coward

        Re: Blame

        And nothing will change.

        Therein is the BIG problem.

        Until there is a complete removal of the top layer of the IT department management, including P45s and banning from any other job in the university, it will remain a potential disaster waiting to re-emerge at a later date (the leopard does not change his spots).

        1. Anonymous Coward
          Anonymous Coward

          Re: Until there is a complete removal of the top layer of the IT department

          There is of course one layer of staff at universities which is systematically made redundant, and on an ongoing, continual basis. Would anyone like to guess who they are?

        2. Anonymous Coward
          Anonymous Coward

          Re: Blame

          Problem with that is if you remove the top-layer of management, then next layer down becomes the top layer and their brains turn to mush, so you have to remove those too and so on until there's no-one left but the tea boy...

          1. Anonymous Coward
            Anonymous Coward

            Re: Blame

            until there's no-one left but the tea boy...

            Since he is the one working that could be the best outcome.

            You missed the point. If management is not doing its job, managing and making sure that everything works as it should, then why is it there sucking up money that could be better used in improving the systems? From the information that has been released it is obvious that the management did not know what they were doing and should therefore be handed their P45s.

      2. Sooty

        Re: Blame

        My experience of consultants is that they come in, speak to all the techies, write up what the techies have been telling management for years into a report, and submit it.

        Management look at it and go, oh, that's what the techies have been saying, we really should implement that.

        and never is the report heard from again! back to the status quo

        1. Anonymous Coward
          Anonymous Coward

          Re: Blame

          "My experience of consultants is that they come in, speak to all the techies, write up what the techies have been telling management for years into a report, and submit it." --- Sooty.

          As a consultant, I can confirm that is exactly what I do. However, I always credit the Techs in the document (by name if they're happy for me to do so), and I always tell the client they could probably have got that information by talking directly to the techs. Gratifyingly often, however, the management have told me that they think the additional synthesis, collation, analysis and - most importantly - business level summaries and strategies, are worth my fee; and similarly, many Techs tell me that they're glad it's me rather than them getting a grilling in the boardroom.

          I always make sure any significant concerns from the front line* are relayed to management. Depending on the preference of the Technical Staff I will either throw my weight behind theirs or I will front it up as my own concern. And I always make sure that management realise just how reliant they are on the Techs, and that I am really little more than a translator.

          * not "from below"

    2. John Smith 19 Gold badge
      Unhappy

      "The important information to glean here is the exact cause of the failure."

      Sometimes called a "root cause" analysis.

      The SW upgrade was the final event in the chain.

      Break that chain anywhere before that and it would not have happened.

      Examples being

      Why no regular testing of backups? Why no test SAN to check software updates? Why no hot backup system? A single SAN, even with data striped across multiple hard drives and with multiple PSUs, is still a single point of failure if the control software is bricked.

  5. frank ly

    Reputation Management

    "... their bosses brought in reputation management business RiskEye, which set about trying to expunge news of the incident from the web.

    After The Register declined to remove its coverage, ..."

    How does that work? Do they invite you to their club for dinner, drinks and a chat or do they send a couple of bruisers to your office to tell you how sad their client is feeling?

    1. Alexander J. Martin

      Re: Reputation Management

      I've heard stories of bruisers being sent over to chat to the editor of a publication in the olden days, though I don't think that happens so much now. RiskEye made a number of phone calls to every department at Situation Publishing except for editorial, aiming to panic financial and sales staff by claiming that an article was wrong and needed to be taken down. Fortunately, the article wasn't wrong and didn't need to be taken down, and we hire very sturdy folk who redirected the chap calling to us in editorial, where we said the article would not be taken down. I've been informed that the guy responsible is no longer with RiskEye too, although the firm didn't explain why.

      1. Chris King
        Facepalm

        Re: Reputation Management

        "I've been informed that the guy responsible is no longer with RiskEye too, although the firm didn't explain why"

        Aww, did they have to hire another reputation management service to deal with that incident ?

    2. Pliny the Whiner

      Re: Reputation Management

      "'After The Register declined to remove its coverage, ...'

      How does that work? Do they invite you to their club for dinner, drinks and a chat or do they send a couple of bruisers to your office to tell you how sad their client is feeling?"

      Nice pair of kneecaps you have there. It'd be a shame if they were broken.

      1. Ashley_Pomeroy

        Re: Reputation Management

        I remember Stuart Campbell writing about something similar happening in the days of Amiga Power, especially given the magazine's uncompromising stance. Basically publishers would threaten to pull advertising and refuse to supply pre-release copies; the other magazines at the time were publishing reviews based on unfinished demos.

        Does KCL advertise with The Register? In this case the college could threaten to dismiss anyone found browsing The Register using a university PC. There's probably a clause in the employment contract about using social media and news websites etc.

        Also I'm suddenly hungry for fried chicken.

        1. Anonymous Coward
          Anonymous Coward

          Re: Reputation Management

          King's respects the opportunity for its staff to engage with online communities to promote the work of the university. King's position is set out in its Social Media Communications Policy:

          http://www.kcl.ac.uk/governancezone/Assets/InformationPolicies/Social-Media-Communications-Policy.pdf

          The King's IT Acceptable Use Policy sets out the acceptable, and unacceptable, use of university IT facilities. The IT facilities are provided for academic and business purposes. Personal use should be incidental and modest, and must never interfere with university work:

          http://www.kcl.ac.uk/governancezone/Assets/InformationPolicies/IT%20Acceptable%20Use%20Policy.pdf

        2. TRT Silver badge

          Re: Also I'm suddenly hungry for fried chicken.

          No... that's KFC. KCL is a completely different bucket of vertebrates.

    3. veti Silver badge

      Re: Reputation Management

      I don't think £1,000 would buy much of a club dinner. Consider how many people they would have to cater for...

      I also don't think it would run to kneecapping, or threats of kneecapping. Or even, for that matter, a single hour of lawyer time.

      For that kind of money, I would expect someone to Google the story, and send off polite emails to everyone they find who's covered it - and that's about it. Even calling it "reputation management" is a stretch.

      1. TRT Silver badge

        Re: £1000 wouldn't run to kneecapping.

        The job goes to the lowest bidder. Or in the 'biting them on the kneecap' game, the shortest bidder.

  6. Anonymous Coward
    Anonymous Coward

    Mountain out of a mole hill

    £1,000 for a bit of rep management is small beer compared to the total IT budget for KCL which was approximately £20m in 2015. Let's also make no mistake, the borking that happened at KCL was down to KCL operations technicians. You could argue there was a lack of management oversight, but fundamentally someone wasn't testing the backups.

    Anonymous because the internets is bullies and I'm hiding behind big brother.

    1. Linker3000

      Re: Mountain out of a mole hill

      >> down to KCL operations technicians. You could argue there was a lack of management oversight, but fundamentally someone wasn't testing the backups.

      No, it was down to the Managers who either did not implement proper procedures or did not oversee any checks and balances to confirm that **working** backups were being performed. That's why you have Managers.

      1. Doctor Syntax Silver badge

        Re: Mountain out of a mole hill

        "That's why you have Managers."

        Managers might disagree. The reason you have managers is ... pay?...bonuses?

        1. JASmith

          Re: Mountain out of a mole hill

          unless of course they are being told it is all working? Do you tell your boss every little wrinkle that would just worry him? I don't, and my guess is the same for many other techies.

          1. Doctor Syntax Silver badge

            Re: Mountain out of a mole hill

            "Do you tell your boss every little wrinkle that would just worry him? I don't"

            1. Backups not working is more than a little wrinkle.

            2. Managers need to be told significant stuff, especially if it's bad news. Managers who discourage being told bad news are very bad managers and wide open to any passing disaster looking for a place to happen.

            1. Phil O'Sophical Silver badge

              Re: Mountain out of a mole hill

              Managers who discourage being told bad news are very bad managers

              True, but technicians who knowingly operate procedures that they know aren't correct just because their manager tells them to are very bad technicians. Whether the fault with the backups was due to technician error, or management orders, it doesn't reflect well on the techs.

              1. Prst. V.Jeltz Silver badge

                Re: Mountain out of a mole hill

                20 million??? For one year? And a single RAID, blah blah blah?

                How big is this college? How big is the data?

                I bet I know what happened: they are using some ancient (and shit) software "because we can't change now" for some core task, and are paying a million a year for support?

                Jeez, even that wouldn't make a dent.

                100 IT staff on 50k each? That's only 5 million.

                Maybe this year with the spare £10m they could nip down to Maplin and buy a shitload of external hard drives with USB leads on.

                1. Anonymous Coward
                  Anonymous Coward

                  Re: Mountain out of a mole hill

                  £10m at Maplin might get you four HDDs and an LED flashlight these days.

                  / And no more green, orange and white vouchers to save up for a discount off your next order

                  // I have a 4 digit Maplin customer account number you know.

                  /// Get off my germanium diodes

                2. Nick Ryan Silver badge

                  Re: Mountain out of a mole hill

                  KCL are often ranked within the top 10 universities in the world (depends on the measure and who's doing the ranking). They have 27,600 students, 10,500 post-grads (like students but either considerably more demanding or beyond caring/jaded - probably 50/50) and 6,800 staff to provide the support for this lot. Don't consider this a high number of staff, as the facilities and estate alone take up a lot of staff, let alone administration across a very wide range of areas, much more than a normal business would have or require.

                  Detailed numbers are available on their "KCL by numbers page".

                  1. Prst. V.Jeltz Silver badge

                    Re: "KCL by numbers page".

                    Interesting page

                    "A £1 billion redevelopment programme is transforming King's estate"

                    Perhaps that includes a server room and some usb memory sticks

                3. Anonymous Coward
                  Anonymous Coward

                  Re: Mountain out of a mole hill

                  On a positive note, there could be good opportunities for King's students to learn about big data:

                  http://www.kcl.ac.uk/bigdata/About-Big-Data-@-Kings.aspx

                  1. TRT Silver badge

                    Re: Mountain out of a mole hill

                    I wonder how many people went to that Big Data thing, given the date was exactly two weeks into the outage? I can picture it now...

                    "What are we doing with Big Data, here at KCL? Well, I would have had a slide at this point, but..."

              2. stephanh

                Re: Mountain out of a mole hill

                As was already discussed in the Gitlab case, test recovery of backups is a big problem. On one hand, it is very expensive; it essentially requires you to have a duplicate of your entire production hardware. On the other hand, if you don't do it, you probably actually don't have backups at all.

                Who prevented the costly test recoveries from happening? Did the engineers push hard enough? Did management listen enough?

              3. Triggerfish

                Re: Mountain out of a mole hill

                Yes but when a manager refuses to make a decision because the buck would now stop with them when an issue is brought up, I find it hard to blame a tech who has a choice between implement something and get fucked for it because you went past your remit, or leave it as is.

              4. Red Bren

                Re: Mountain out of a mole hill

                "Technicians who knowingly operate procedures that they know aren't correct just because their manager tells them to are probably more concerned about feeding and clothing their families, than taking a principled stand on an issue that they have sufficient arse-coverage for"

                FTFY

          2. Anonymous Coward
            Anonymous Coward

            Re: Mountain out of a mole hill

            > Do you tell your boss every little wrinkle that would just worry him? I don't,

            In that case your boss should fire you as you are a lying, arse-covering piece of shite and not worth the salary they are paying you.

            It is YOUR responsibility to report problems with the systems you work with to your management; then if something goes wrong it's their fault. If you don't and something goes wrong it's YOUR FAULT!!!

      2. Anonymous Coward
        Anonymous Coward

        Re: Mountain out of a mole hill

        Someone left. That someone had the backup reports going to their personal email. Those got black holed after leaving, so no-one noticed that they were failing. Now, you would argue that a proper monitoring system should have been in place with regular testing etc etc etc, but it wasn't and everyone knew that. So I think there is a professional responsibility of technicians to tell people that there is a risk and there is a responsibility of managers to respond to the risk. Seems to me that managers didn't know and technicians weren't telling. Fail all around.

        1. TRT Silver badge

          Re: Someone left. That someone had the backup reports going to their personal email.

          Which indicates that their system for staff departure and hand-over wasn't properly risk assessed. Which is a management activity ultimately.

          As is positive verification of DR on a regular basis.

          Not sitting there saying "No news is good news!"
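The failure mode described in this sub-thread (reports vanish into a dead mailbox and silence is read as success) is what a staleness check guards against. This is a sketch under assumptions: `backup_status` and the 26-hour threshold are hypothetical, and the check would need to run somewhere independent of the backup system and report to a shared channel rather than one person's inbox.

```python
from datetime import datetime, timedelta

def backup_status(last_success, now, max_age=timedelta(hours=26)):
    """Treat silence as failure: a missing or stale success report is an alert.

    last_success: datetime of the most recent *successful* backup report,
                  or None if none has ever been recorded.
    max_age:      assumed tolerance for a nightly job (24h plus some slack).
    """
    if last_success is None:
        return "ALERT: no successful backup has ever been reported"
    age = now - last_success
    if age > max_age:
        return (f"ALERT: last successful backup was "
                f"{age.days}d {age.seconds // 3600}h ago")
    return "OK"

now = datetime(2017, 2, 1, 9, 0)
print(backup_status(now - timedelta(hours=3), now))
print(backup_status(now - timedelta(days=40), now))
```

The design point is that the check is driven by the absence of good news rather than the arrival of bad news, so it still fires when the person who used to read the reports has left.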

    2. Anonymous Coward
      Anonymous Coward

      Re: Mountain out of a mole hill

      Don't blame the techs, it appears that they were only doing what they were told. It is management that sets the policy and it has to be management that checks that the policy does what it is supposed to do.

    3. Doctor Syntax Silver badge

      Re: Mountain out of a mole hill

      @A/C

      "Let's also make no mistake, the borking that happened at KCL was down to KCL operations technicians. You could argue there was a lack of management oversight, but fundamentally someone wasn't testing the backups."

      Are you by any chance in management?

  7. Anonymous Coward
    Anonymous Coward

    Has it occurred to anyone that the reputation management could have been bought to support a staff member erroneously fingered in public rather than the organisation? Wouldn't it be strange for the IT department to buy reputation management for the whole organisation? Feels like we're not hearing the actual story here yet.

    1. Mr Dogshit
    2. cd

      "Feels like we're not hearing the actual story here yet."

      Not so much a story as a portrait. A person in an influential position, largely incompetent and indifferent, perhaps even angry at KCL for past slights. Also greedy, and concerned about being outed, so very controlling. To the point that no one but their staff were allowed to do preservation backups. And the staff were limited to means understandable by the person, so they couldn't be tricked out of their skim. Now they are using some of the funds they were going to skim from to protect themselves further.

  8. Doctor Syntax Silver badge

    Point of information

    "departments across the university"

    Although in at least one place on its website Kings calls itself a University it is, in fact, still a college of the University of London.

    1. TRT Silver badge

      Re: Point of information

      It was but it was granted its own degree awarding powers. They've got a ceremonial mace to prove it an' evryfin. That makes it both a university in its own right and a college of the University of London.

  9. Dabooka
    FAIL

    The Streisand Effect in action

    Will they never learn, just get on with it and keep schtum.

    And what exactly would £1,000 get you? An intern on the phone for a week pleading with folk? Oh, my mistake, it gets you their 'Standard' protection service for 12 months.

    Maybe it was the awesome website which attracted KCL to hire them, either that or the outfit belongs to a member of the SLT's relative.

    1. TRT Silver badge

      Re: The Streisand Effect in action

      Oi! RiskEye are OK. Or at least, I've never heard a bad word said about them.

    2. Flywheel

      Re: The Streisand Effect in action

      "We use analysts in conjunction with our unique software 24/7/365"

      Bad luck if it's a leap-year then...

  10. Fruit and Nutcase Silver badge
    Coat

    Expert in Reputation Management

    Looking for new challenges.

    https://www.theregister.co.uk/2017/02/01/dido_of_carnage_steps_down_from_talktalk/

  11. creepy gecko

    RiskEye

    I wasn't fully aware of the IT disaster at KCL last October.

    I would like to give my sincere thanks to RiskEye for bringing these events to my attention.

    Well done RiskEye, you're doing a grand job.

  12. Anonymous Coward
    Anonymous Coward

    My uni need not worry ...

    All marking and student progress data outside of the central Oracle DB is held securely within Google Docs spreadsheets, forgetting that free and open-source software provides free SIS software.

  13. Anonymous Coward
    Anonymous Coward

    I worked for Kings College once

    Two contractors putting together a small piece of software for them, it was to hold their submissions for their grants. Simple web system that managed applications, not the actual research data. Maybe 10 years ago now.

    The project was 3 months but with every meeting pushed to a committee (who can't decide anything) it soon went to 6 months. The IT dept was dismissive of us entirely, ordering that no local databases could be used even in initial development. We had to use the single Oracle cluster and go through the least helpful person to do ANYTHING (even an insert statement).

    We had no access to anything sensitive, we hardly had access to the internet, using the guest wifi.

    Everyone is watching their own backs and will avoid a decision to avoid the responsibility. The easiest route is to form a committee!

    It was my last foray into academia, those folks are more up themselves than marketing and PR people!

    1. Halfmad

      Re: I worked for Kings College once

      That's a fantastic way of having nobody to blame whilst increasing the risk of something happening exponentially. Worked in environments like that, best practice is a pipe dream, you're working day to day hoping nothing needs a fast decision or funding as it'll take months to organise.

  14. EnviableOne

    IMHO, chances are IT pushed for 2 storage arrays for redundancy and Management baulked at the cost and said you can have one. Then RiskEye were hired by the manager to cover his behind.

  15. DeVino
    Headmaster

    Maybe KCL should have looked up what the acronym means ?

    Inexpensive ?

    1. TRT Silver badge

      Re: Maybe KCL should have looked up what the acronym means ?

      Oh! You know, you're going to laugh at this, but... in the meeting... I accidentally used the word "infallible".

  16. Tom Paine
    Pint

    bosses brought in reputation management business RiskEye, which set about trying to expunge news of the incident from the web.

    After The Register declined to remove its coverage, we filed a request to the public university under the Freedom of Information Act

    Good journalism; Ingrams would be proud. Take a virtual pint, and -- if I may be allowed to suggest a new meme: -- *snookerclap
