'Major incident' at Capita data centre: Multiple services still knackered

A major outage at a Capita data centre has knocked out multiple services for customers – including a number of councils' online services – for the last 36 hours. Some of the sites affected include the NHS Business Services Authority, which apologised on its website for the continuing disruption and said it hoped its systems …


  1. wolfetone Silver badge
    Coat

Capita has crapped out, now to be known officially as Crapita.

    1. Loyal Commenter Silver badge

      You're a couple of decades late to that party.

      I had the misfortune of working with them some fifteen years ago, and we called them that back then.

    2. Steve K

      Private Eye always has

      Already is - see Private Eye magazine passim!

      1. Derezed

        Re: Private Eye always has

        ad nauseam

        1. Mike Pellatt

          Re: Private Eye always has

          Your "Private Eye ad nauseam" is my "Private Eye told the world all about <x>, repeatedly, years before anyone else woke up to it"

    3. Anonymous Coward
      Anonymous Coward

I know someone who works for them, and they have had no internal e-mails or on-line tools since the outage as well. Also, their own staff call it CRAPITA.

    4. Anonymous Coward
      Anonymous Coward

      "The remainder of services are now being robustly tested"

      Translation "Sorry guys, but we pay such low wages that we get the lowest grade staff and they couldn't be bothered to test the generators. Rest assured that those responsible are now busy playing a datacentre sized game of switch it on and pray it comes up..."

      1. macjules

        Capita: "Our service outage is a minor impediment that we occasionally encounter on the road to providing better services to our clients"

        Translation: "Hey. its not OUR fault! We use the same maintenance company as British Airways."

    5. Oh Homer
      Trollface

      "Robustly tested"

      I hope they "robustly test" the cobwebs with a firm brush first.

  2. Lee D Silver badge

    Stop relying on one datacenter to be up.

This is WHY Windows Server and lots of other OSes have HA functionality.

    Hell, it's not even that hard to enable. Or just provide a secondary system somewhere else that does the same even if you don't have fancy connections between them.

    If your platform is not virtualised, why not?

    If your platform is virtualised, turn on the HA options so that the VM replica in another data center just starts up and becomes the primary and your domain names, etc. resolve to all IPs that can offer the services.

    I still don't get why ANY ONE FAILURE (one datacentre, one computer, etc.) is still a news item nowadays. It shouldn't be happening.

    Even if you deploy on Amazon Cloud or something, PUT THINGS ELSEWHERE TOO. It's not hard.
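
    A minimal sketch of the "resolve to all IPs that can offer the services" idea above, assuming invented hostnames and a hypothetical /healthz path; a real setup would push the result into DNS or a load balancer rather than just printing it:

    ```python
    # Sketch only: poll a health endpoint in each datacentre and keep the
    # sites that actually answer, so traffic is only pointed at live sites.
    # The hostnames and the /healthz path below are made up for illustration.
    import urllib.request

    DATACENTRE_ENDPOINTS = {
        "dc-north": "https://dc-north.example.org/healthz",
        "dc-south": "https://dc-south.example.org/healthz",
    }

    def healthy_sites(timeout: float = 3.0) -> list:
        """Return the names of datacentres whose health check answers 200."""
        up = []
        for name, url in DATACENTRE_ENDPOINTS.items():
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    if resp.status == 200:
                        up.append(name)
            except OSError:
                # Unreachable, slow, or returning an error: leave it out.
                pass
        return up

    if __name__ == "__main__":
        print("Serving traffic from:", healthy_sites() or "nowhere - call Facilities")
    ```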

    1. FrankAlphaXII

It seems that Crapita don't believe in Business Continuity, otherwise an outage at one datacenter wouldn't take down part of the NHS and a number of local governments. As you stated, there should be no such thing as a single point of failure in 2017. That doesn't bode well for UK emergency preparedness at the most important level. If something as simple as internet communications gets taken down that easily, what happens when more than one of their datacenters fails and can't/won't be restored for weeks or months?

I work in Emergency Management for a government agency at a local level, plus I develop BC/DR plans for SMBs on the side, so I see this kind of shit out of government outsourcing contractors all the time. Beancounters that run businesses like Crapita (looking at you, Serco, Egis and Leidos) don't get really simple preparedness and mitigation concepts, and if they do understand them, they'll be the first to balk at the price tag associated with them. Until they've had their "efficiencies" blow up in their faces. Thing is, in this day and age, fault tolerance and providing an emergency level of service for data when something does happen isn't hard or expensive, and it's really unforgivable that a supposedly first-in-class outsourcing contractor can't provide its expected level of service because their infrastructure's shit and their planning's worse.

      1. Alan Brown Silver badge

        "beancounters that run businesses like Crapita (looking at you Serco, Egis, and Leidos) don't get really simple preparedness and mitigation concepts"

        Then you ensure that the SLAs they sell you hold water and have penalty clauses.

    2. GruntyMcPugh Silver badge

      "Stop relying on one datacenter to be up."

Indeed, a couple of years ago I did an audit at a well-known bank, covering each of its datacentres, which were almost identical. For some reason the door on the gents in one had a glass panel and the other didn't, and the vending machines in the break area were further apart in one... but the IT equipment was mirrored exactly.

      1. Anonymous Coward
        Anonymous Coward

> a couple of years ago I did an audit at a well-known bank, covering each of its datacentres, which were almost identical.

N data centres cost N times as much.

        As an outsource provider, why would you do this when your liability to your customers is limited to giving them a free month's rental?

        1. TheVogon

          "N data centres costs N times as much."

No, it costs even more than that for full resilience. You need all the replication licenses for arrays, software, etc., plus the testing, the design, the recovery plans, the fast low-latency interconnects between DCs, etc., etc.

        2. CrazyOldCatMan Silver badge

N data centres cost N times as much.

It's actually a bit more: N data centres cost (N + extra kit to do the sync) times as much.
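
As a back-of-envelope illustration of the "more than N times" point in the two replies above, here is a tiny Python sketch; every figure in it is a made-up placeholder, not a real datacentre cost:

```python
# Illustrative only: invented weights, expressed relative to one
# unprotected datacentre costing 1.0 unit.
def resilient_estate_cost(n_sites: int,
                          per_site: float = 1.0,        # one DC = 1 unit
                          interconnect: float = 0.3,    # low-latency links
                          replication: float = 0.4,     # array/software licences
                          design_and_testing: float = 0.3) -> float:
    """Rough cost of n mirrored sites relative to a single unprotected DC."""
    if n_sites < 2:
        return per_site * n_sites
    return (n_sites * per_site
            + (n_sites - 1) * interconnect
            + replication
            + design_and_testing)

print(resilient_estate_cost(2))  # 3.0 with these made-up weights
```

With these invented weights a mirrored pair lands nearer 3x than 2x, which is roughly the ratio quoted further down the thread.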

        3. GruntyMcPugh Silver badge

Well, it's rather down to the procurement process, and to making sure there's a real financial disincentive wrt downtime. It wouldn't surprise me to learn it would just be paid in service credits, however; the relationship between Capita and our Govt is less than healthy.

    3. Anonymous Coward Silver badge
      FAIL

      But infrastructure redundancy eats into profit margins, so why would they?

    4. Anonymous Coward
      Anonymous Coward

      "Stop relying on one datacenter to be up."

      Having 2 DCs and designing for no single point of failure costs ~ 3 times the money. This is government IT we are talking about. The DR plan is probably to build a new DC!

      "If your platform is not virtualised, why not?"

Usually, the answer with these types of system is that they're so large they use the resources of complete physical servers.

      1. robidy

        > This is government IT we are talking about.

No, this is Crapita's overpriced, under-resourced... over-promised and under-delivered service in operation.

    5. Halfmad

Thing is with these companies: although the contract may include agreeing to have failover sites etc., when sh!t happens and those don't work they just say "hey, sorry, won't happen again until the next time it happens", and as the NHS is f*cking awful at contract law they have no monetary clause to hammer them with.

      Seen this so often in the past 10 years.

      1. CustardGannet
        Devil

        "Stop relying on one datacenter to be up."

        They would probably listen to your sound advice, if you put it on a sheet of 'Accenture' headed paper and charged them a 7-figure consultancy fee.

        1. handleoclast
          FAIL

Re: consultancy fee

          My county council pissed away 7 figures to Price Watercloset Coopers to come up with ways of saving money.

          My suggestions:

          1) Don't piss away 7 figures to PwC

          2) Hire staff capable of coming up with suggestions themselves (suggestions other than asking PwC what to do).

          Ooooh, where's the IT angle? My county council uses Crapita for their payment systems. Who'd have expected that?

    6. Tom Paine

      It's not hard, but...

      ...it does cost money. Twice the money, in fact, plus the design overhead.

      1. John Brown (no body) Silver badge

        Re: It's not hard, but...

        "Twice the money, in fact, plus the design overhead."

Surely Crapita have multiple data centres, so spreading and mirroring the resources should be part of the standard service. Except when it affects the bigwigs' bonuses.

        1. Linker3000

          Re: XML is so 1990's

          My rule for any situation that makes me want to start a sentence with "You'd think that...." is to STOP and take a reality check.

      2. Anonymous Coward
        Anonymous Coward

        Re: It's not hard, but...

        Especially when they're handcuffed to internal suppliers that are bleeding money. (SH)ITES have to claw it back somehow so Uncle Andy makes everyone play nice.

      3. SB37

        Re: It's not hard, but...

        It costs more than double the money. If you want true mirroring for disk storage you'll need four copies of the data - 1 at your source and 3 at your remote datacentre.

    7. Rob D.

      It's not hard but ...

Actually it is hard, because it isn't that simple. In the real world, most of the problems around business continuity come up because someone has tried to turn a tricky problem requiring attention to detail into something with a simple answer, which is easier to understand and by definition is cheaper.

      Commonly this sounds something like, "We paid to virtualise everything so we can just move it if we have a disaster to the other data centre. Easy - please explain why we have to pay for anything more?"

Reality bites early, in the requirement for up-front budget for the significant additional planning, design, implementation, testing, training and infrastructure costs. The details house many devils here. Throw in the time required for testing, operational training and operational proving in production, and by now the System Integrator is wishing you'd never shown up to explain what is missing, while they work out how they can get past User Acceptance without anyone realising the business continuity isn't really there.

  3. Anonymous Coward
    Anonymous Coward

    Probably got their own staff to install the back up generators

    And then to test them. What could possibly go wrong.

    1. m0rt

      Re: Probably got their own staff to install the back up generators

Bets on diesel in the generators being a couple of years old? The fact they are now having an issue with parts suggests that the sudden loss of power caused some great failures.

Today we shall mostly be Capitalising on the Capitulations of the PITA that is Crapita.

      1. Anonymous South African Coward Bronze badge

        Re: Probably got their own staff to install the back up generators

Not taking any bets, but regular testing of diesel generators needs to be done.

Heck, just kick out the mains CB and let the genny take over (for 30 minutes each week); this way you can weed out any old and dodgy UPSes as well.
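
          Sticking with that weekly-test discipline, a hypothetical Python sketch of a nagging check that flags an overdue genny run or a low tank; the thresholds, dates and inputs are invented, and a real site would wire this into its monitoring rather than a print statement:

          ```python
          from datetime import datetime, timedelta
          from typing import List, Optional

          TEST_INTERVAL = timedelta(days=7)   # weekly run, as suggested above
          MIN_FUEL_FRACTION = 0.5             # invented threshold: half a tank

          def generator_status(last_test: datetime, fuel_fraction: float,
                               now: Optional[datetime] = None) -> List[str]:
              """Return a list of problems; an empty list means all clear."""
              now = now or datetime.now()
              problems = []
              overdue = (now - last_test) - TEST_INTERVAL
              if overdue > timedelta(0):
                  problems.append(f"weekly genny test overdue by {overdue}")
              if fuel_fraction < MIN_FUEL_FRACTION:
                  problems.append(f"tank at {fuel_fraction:.0%} - order diesel")
              return problems

          # Example: last test 11 days ago, tank down to 20%
          print(generator_status(datetime(2017, 7, 1, 9, 0), 0.20,
                                 now=datetime(2017, 7, 12, 9, 0)))
          ```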

        1. Roger Varley

          Re: Probably got their own staff to install the back up generators

          "Not taking any bets, but regular testing of diesel generators need to be done."

But they did. They sub-contracted the testing to Atos, who declared them "fit to work"...

          1. handleoclast
            Pint

            Re: Atos

            ROFL.

            Too true.

            And a week after Atos declared them fit to work, they died.

            Have a pint for making me laugh.

          2. PNGuinn
            Mushroom

            Re: Probably got their own staff to install the back up generators

The gennys were tested weekly but no one thought to buy any fuel, so they ran dry after 3 mins?

            They did buy fuel but it was petrol / bunker oil because that was cheaper?

            They went green and bought a load of cooking oil cheap?

            No - Crapita aren't even **that** capable.

            >> This might have helped.

        2. Anonymous Coward
          Anonymous Coward

          Re: Probably got their own staff to install the back up generators

          "Heck, just kick out the mains CB and let the genny take over (for 30 minutes each week)" - but don't, as has happened else where, do this many, many times and forget to refill the tanks once in a while.

        3. Anonymous Coward
          Anonymous Coward

          Re: Probably got their own staff to install the back up generators

Ours are tested by the power failures hitting some weeks apart, lately... just last time, our small lab datacenter was kept alive by the UPS and its generator while the main one failed. Later they discovered scheduled maintenance was no longer active. Still, after asking several times, I don't know who's in charge of re-filling the diesel tank (I'm not authorised to do it myself; you know, the dangers of handling dangerous chemicals and operating machines I was not trained for...)

          1. katrinab Silver badge

            Re: Probably got their own staff to install the back up generators

            Isn't the refilling done by the tanker driver who delivers the stuff?

            1. Stoneshop

              Re: Probably got their own staff to install the back up generators

              Isn't the refilling done by the tanker driver who delivers the stuff?

As the Germans say, 'Jein' (a contraction of yes and no): first someone[0], having been notified by Facilities that the tank is running low, has to call the supplier for a delivery; then, when the tanker arrives, someone[1] has to unlock[2] the gate/hatch/trap door to the tank neck.

              [0] from Finance, or Contract Manglement[3]

              [1] from Security[3]

              [2] you don't really want someone peeing down the filler neck, or dropping sand or sugar in.

              [3] in extremely enlightened cases these responsibilities will have been delegated to Facilities as well.

            2. Alan Brown Silver badge

              Re: Probably got their own staff to install the back up generators

Which is fine until someone shuts off the feed to one of the tanks (vandalism) and said driver pumps N amount of fuel because that's what he's expecting to pump, instead of looking at the fill gauges and stopping when they say "stop".

Cue several thousand litres of diesel not being in the tanks but instead in the stormwater system, and lots of people asking "what's that smell?"

        4. Stoneshop
          Flame

          Re: Probably got their own staff to install the back up generators

          Heck, just kick out the mains CB and let the genny take over (for 30 minutes each week)

Ingredients: one power grid with regular shortish (30 minutes or less) outages, one computer room floor with various systems, one UPS powering the entire floor running at ~15% capacity, one diesel genny. Due to the regular power dips, we were quite sure the UPS and diesel were functioning as intended; fuel was replenished as needed. Then came the day that the power consumption of the computer room doubled due to an invasion of about 45 racks full of gear. And then came the next power dip. Which made the UPS (powering the computer room; the generator was hooked up so that it basically kept the batteries charged) suddenly work quite a bit harder. And longer, for a number of reasons. Which caused the temperature in the UPS room to rise quite a bit more than previously. Environmental monitoring went yellow, several pagers went off, and Facilities managed to keep the UPS from shutting down through the judicious use of fans scrounged from a number of offices.

          Moral of this story: cooling is important too, not just for the computer room, but also for the UPS room.
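
          The arithmetic behind that anecdote is easy to sketch. Assuming purely illustrative figures (a 100 kW double-conversion UPS at ~94% efficiency and a 20 kWh battery bank, none of which come from the story), doubling the load roughly doubles the heat dumped into the UPS room and halves the autonomy:

          ```python
          def ups_room_heat_kw(load_kw: float, efficiency: float = 0.94) -> float:
              """Approximate waste heat (kW) the UPS itself dissipates at a given load."""
              return load_kw * (1.0 / efficiency - 1.0)

          def rough_runtime_min(stored_kwh: float, load_kw: float) -> float:
              """Very rough battery autonomy, ignoring chemistry and ageing effects."""
              return 60.0 * stored_kwh / load_kw

          # ~15% vs ~30% of a hypothetical 100 kW UPS, as in the anecdote above
          for load_kw in (15.0, 30.0):
              print(f"{load_kw:>4.0f} kW load: ~{ups_room_heat_kw(load_kw):.1f} kW of heat in the UPS room, "
                    f"~{rough_runtime_min(20.0, load_kw):.0f} min on a 20 kWh battery bank")
          ```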

        5. This post has been deleted by its author

      2. Anonymous Coward
        Anonymous Coward

        Re: Probably got their own staff to install the back up generators

        "Bets on diesel in the generators being a couple of years old? "

        The staff probably nicked it all to fill their cars!

      3. Doctor Syntax Silver badge

        Re: Probably got their own staff to install the back up generators

        "Bets on diesel in the generators being a couple of years old?"

        Or the wrong sort of diesel.

      4. CrazyOldCatMan Silver badge

        Re: Probably got their own staff to install the back up generators

        Bets on diesel in the generators being a couple of years old?

Or, in a very old situation, the diesel in the under-carpark tank had seeped away into the subsoil because of a flaw in the tank...

Which was fun when the generator did kick in for real, but only ran for ~20 minutes before exhausting its local tank...

        No-one was checking the levels of diesel in the bigger tank. Oops.

  4. Chris G

    Just wondering

    If Wannacrypt has crapped Crapita

    1. Anonymous Coward
      Anonymous Coward

      Re: Just wondering

      You wish !

  5. batfastad

    Well!

Well, you don't think that the money their customers (NHS Trusts, Councils etc.) pay actually gets spent properly and proportionally on the infrastructure backing their services, do you?!

Look, it's contract renewal time... let's take the money and sweat the assets of our existing platform for a few more years. After all, we've got executive pay reviews coming up soon.

The fact that a DC has gone down and taken out production service is unforgivable in this day and age.

  6. adam payne

    Single point of failure again, well done.

    1. Terry 6 Silver badge

      All the eggs

      in one slimy rotten basket

