OVH services still not fully restored as boss rates ongoing recovery efforts a 'real nightmare'

OVH is yet to bring all customers affected by its Strasbourg data center fire back online – and the French cloud operator's CFO has described ongoing restoration efforts as a "real nightmare." The fire took place on March 10 and destroyed the SBG2 hall of the Strasbourg data center, damaged SBG1 so badly it won't be revived, …

  1. IGotOut Silver badge

    DR plans?

    This is a fantastic lesson on how not to run a data centre company.

    Still, I'm sure someone at the bottom will get the sack for not putting the lid on the pen correctly, or something like that.

    1. John Sturdy

      Re: DR plans?

      On the other hand, they seem to be fairly frank in keeping people informed of their progress, saying it's a "real nightmare" rather than corporate positive messaging. Still, not as good as better risk reduction in the first place.

      1. Anonymous Coward

        Re: DR plans?

        "On the other hand, they seem to be fairly frank in keeping people informed of their progress, saying it's a "real nightmare" rather than corporate positive messaging. Still, not as good as better risk reduction in the first place."

        You're apparently not reading the French press.

        There, the police and fire brigade have explained that the shipping containers holding part of the DC didn't help; nor, obviously, did the lack of an Inergen or similar system, nor the wooden boards used for some "DC" floors.

        This was not translated into English.

        PS: one non critical system of an organization I work for is impacted. Nothing restored. Web site still down and people don't even know if it will be restored. We're now going to find another provider.

    2. Peter-Waterman1

      Re: DR plans?

      If you don't have good DR plans in the Cloud then you have no business in IT.

      The Cloud makes things so simple, there is no excuse. Want a load balancer to run your load across two availability zones, yes, click this box, done. Want to replicate those backups to a different region, tick this box, done. Want to replicate that VM to another region, tick this box, done. There are zero excuses when compared to on-prem where you have to make massive investments to run active-active data centres, along with specialist knowledge required to run them.

      Of course, not all clouds are equal: not all clouds have Availability Zones, or some only have them in a few regions. I guess that's what you pay for, and you go with a lower-cost provider at your peril. I guess in the case of OVH, customers got burned here.

      1. Pascal Monett Silver badge
        Stop

        Re: customers got burned

        Only the customers that didn't check the appropriate box.

        OVH has a DR plan, there are a number of other datacenters over Europe. Anyone who had included a DR zone is not having any problem.

        Of course, it's the many, many OVH customers who looked at the price of the option and decided it wasn't worth it who are now crying in the corner.

        No sympathy from me.

        1. tip pc Silver badge

          Re: customers got burned

          Why are OVH not running those loads in other DCs while they fix the burnt facility?

          Are they not shelling out outage fees for the downtime?

          Are they making their customers go through pain to ensure the value and differential between single- and multi-homed is legitimate?

          1. Yet Another Anonymous coward Silver badge

            Re: customers got burned

            Because it's a business decision by the customer.

            They bought the cheapest service because it wasn't worth it to them to pay more.

            Arguing that multi-site DR should be included in all plans is like saying only mechanics that offer free courtesy cars should be allowed. Or in this case, only spambots that make enough revenue that they can afford 5-9s uptime should be allowed.

            1. Doctor Syntax Silver badge

              Re: customers got burned

              "They bought the cheapest service because it wasn't worth it to them to pay more."

              Or thought it wasn't.

            2. tip pc Silver badge

              Re: customers got burned

              I’m not suggesting DR should be included in all plans, just wondering why OVH are continuing to pay outage fees to customers when they could be up and running in a different DC.

              Interesting comment that some customers' DR plans may involve a separate provider. Getting outage refunds from OVH while not paying for DR options at their primary or DR solutions actually looks like the cheaper way to go, as well as the most diverse.

              If I was OVH I’d want those customers up and running ASAP so they don’t go elsewhere, and to stop haemorrhaging outage fees. I can only think the reason OVH are not running those loads in other DCs is that they don’t want to undermine their DR offering.

              Surely four days' outage on a non-DR product should be the maximum before it’s spun up in a different DC.

              I’d be off to Amazon by now.

        2. katrinab Silver badge
          Meh

          Re: customers got burned

          And of course, some of the customers who didn't tick that box didn't tick it because their DR plan involves failing over to a different hosting provider, which is probably a very sensible idea.

          But yes, others just looked at the € on the bill.

        3. Anonymous Coward

          Re: customers got burned

          "Only the customers that didn't check the appropriate box.

          OVH has a DR plan, there are a number of other datacenters over Europe. Anyone who had included a DR zone is not having any problem.

          Of course, it's the many, many OVH customers who looked at the price of the option and decided it wasn't worth it who are now crying in the corner.

          No sympathy from me."

          Even for Pro customers, a fire propagating like it did is no excuse.

          If it were for a plane crash, then yes, I'd agree. But plane crashes on DCs are quite unusual, even if they have happened (9/11, for example).

          You base your resiliency on highly improbable risks (disasters, tsunamis, those things), not on some shit company having removed any risk-removing measure that could cost any money, like OVH did.

          OVH was basically the risk, not the event itself.

          Different thoughts for resiliency.

          1. Anonymous Coward

            Re: customers got burned

            "Even for Pro customers, a fire propagating like it did is no excuse.

            If it were for a plane crash, then yes, I'd agree. But plane crashes on DCs are quite unusual, even if they have happened (9/11, for example).

            You base your resiliency on highly improbable risks (disasters, tsunamis, those things), not on some shit company having removed any risk-removing measure that could cost any money, like OVH did."

            Riiiiight, so your resiliency relies on the cloud hosting company doing your resiliency planning for you using services you don't pay for to mitigate the effects of an improbable event, oooh let's say a fire in a DC, happening?

            Just like people who actually have a clue don't.

      2. Potemkine! Silver badge

        Re: DR plans?

        "The Cloud makes things so simple, there is no excuse. Want a load balancer to run your load across two availability zones, yes, click this box, done. Want to replicate those backups to a different region, tick this box, done. Want to replicate that VM to another region, tick this box, done."

        If you pay for this.

        If you want the cheapest service, you get what you pay for, that is the minimalistic one.

    3. heyrick Silver badge

      Re: DR plans?

      "This is a fantastic lesson on how not to run a data centre company."

      They suffer so everybody else can learn.

      I wonder how many other providers have been quietly changing things behind the scenes as a result of this?

    4. tin 2

      Re: DR plans?

      I think quite the opposite.

      Yes, they've perhaps made some bad choices, and when they discovered and decided to move on from those bad choices, they didn't move quickly enough to remove anything that still leans on them (the well-documented power inadequacies, the "DC" made out of storage containers).

      However if I have a service that is basically one set of stuff on one server, I always run the risk that if it breaks, gets nicked, goes on fire, or a plane crashes into the building, it's gone. 100% gone. Not that it might come back or is backed up somewhere by someone who's not me, or will get restored in good time. That it's gone.

      That OVH is doing all this crazy reclaiming/cleaning/rehousing is, in most cases as I understand it, beyond what they're on the hook to do. That's good for those who bought a service that might evaporate into thin air, crossed their fingers it wouldn't when they really ought not to have, and then saw their thing did indeed evaporate into thin air.

      That there's people with services - critical or not - that they've not been able to yet rebuild elsewhere is striking. Not even a backup?! In fairness I have a service like this, and should it go pop it will be proper inconvenient, but I will cry to myself not to the provider.

      That there's a high %age chance their data at least will be available once again to do something with seems to me to be very good service. I could easily see something like 1&1 or HostEurope going "ahh well there's that lot gone, sorry about that" and inviting you to start afresh with perhaps a few days of contractual service credit at best.

  2. Mike 137 Silver badge

    Assessment of risk

    DR plans are (or should be) based on prioritised risks. But, as normally conducted, assessment of risk is still a highly subjective process. They probably thought the fire was very unlikely.

    The currently common assumption that minimum likelihood times maximum consequence is equivalent to maximum likelihood times minimum consequence is fundamentally flawed. In reality an extreme of either parameter should be considered as overriding, regardless of the level of the other parameter. So, provided it's realistically possible, a total FUBAR consequence should be controlled independent of its likelihood because you don't want to be wiped out, and a massively frequent event with even trivial consequences should be controlled because otherwise you'll spend all your time responding to it. Contrary to the model suggested by the infamous risk matrix, likelihood and consequence do not necessarily interact linearly.
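The point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-5 scales, the `extreme` cut-off, the `product_threshold`, the function names, and the example risks are all hypothetical and not taken from any standard or from OVH. It contrasts the plain likelihood-times-consequence score with a rule where an extreme value of either parameter overrides the product.

```python
# Illustrative sketch only: the 1-5 scales and both thresholds are assumptions.


def product_score(likelihood: int, consequence: int) -> int:
    """Classic risk-matrix score: likelihood times consequence."""
    return likelihood * consequence


def needs_control(likelihood: int, consequence: int,
                  extreme: int = 5, product_threshold: int = 10) -> bool:
    """Flag a risk if either parameter is extreme, else fall back to the product."""
    if likelihood >= extreme or consequence >= extreme:
        return True
    return product_score(likelihood, consequence) >= product_threshold


# Hypothetical risks as (likelihood, consequence) pairs on a 1-5 scale.
risks = {
    "data hall fire": (1, 5),      # rare but catastrophic
    "failed disk": (5, 1),         # trivial but constant
    "minor config slip": (2, 2),   # neither extreme
}

for name, (likelihood, consequence) in risks.items():
    score = product_score(likelihood, consequence)
    print(name, score, needs_control(likelihood, consequence))
```

With these assumed numbers the fire and the failed disk get the same product score, and both sit below the product threshold, so a product-only matrix would wave them through. The override rule flags both while leaving the minor slip alone.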

    1. Neil Barnes Silver badge

      Re: Assessment of risk

      No, and yet the whole of project management still teaches that horrendous completely subjective risk matrix.

      Have you ever seen anyone sit down and look at that matrix, and instead of 'low, medium, high' actually talk to an actuary and plug some real probabilities in? Do they talk to the financial officers and get an accurate cost to recover from a particular event? I suspect not (and back in the day when I was a PM I was as guilty as the rest: if it's got some green bits and some yellow bits it's probably gonna be ok).

      1. Anonymous Coward

        Re: Assessment of risk

        I am currently a practicing PM. I can be accused of not including a global pandemic in the risk log of my last project, but then again I don't include being hit by a meteorite either (I did work with someone who included that as his initial risk in every project).

        My project was within a social care department so was heavily impacted by the Covid outbreak. We had to pivot then pivot again as resources were re-assigned to urgent Covid-related developments, managing PPE provision, recruiting and utilizing volunteers etc. Then the council had to react to initially support our care providers on more favorable payment terms, which meant that I had to implement new functionality earlier than planned, to ensure that providers continued to receive the income they needed to stay afloat, then pivot yet again when government support was removed and we had to return to payment on services delivered.

        The infrastructure team scaled up over a few days from supporting 300-ish concurrent remote workers to supporting many thousands. This included deploying every available laptop to allow home working for people with desktop PCs, and then developing a logistics service to rebuild and deploy desktop systems when supplies of additional laptops became unavailable. Outstanding infrastructure projects like the electronic post room were completed within weeks rather than months, allowing more staff to work at home. The core infrastructure continued to cope well with a mix of on-prem servers (with DR split across physically disparate DCs) and cloud-based services with full DR.

        During this time the main office became a warehouse and distribution centre for PPE for our care workers, CCGs, care homes etc., with the council sourcing PPE directly and via government supplies.

        To put contingency plans in place for things like this at the project level is a complete waste of time. Quite appropriately, all resources which could be used to support the Covid response were taken up and used, and every project the council was running was either paused, pivoted to support the Covid response, or limped along for a few months with minimal resources. All operational resources were fully occupied in either direct service delivery or supporting our care providers to ensure that our vulnerable citizens continued to receive care when our service providers were experiencing high levels of sickness absence. I was very proud to have been able to contribute to the response.

        1. iron Silver badge
          Thumb Up

          Re: Assessment of risk

          I left a social care provider that employs thousands of care workers across Scotland about 4 months before lockdown. I'm glad I did because it sounds like they have had a hard time of it.

          Unfortunately I joined a company that writes software for the catering industry. Doh! Bad timing. Thankfully after several months of furlough followed by WFH on reduced pay I was able to move on to a company less affected by pandemics & lockdowns.

          Anyhoo, it sounds like you have done a great job keeping vital care services running.

        2. Yet Another Anonymous coward Silver badge

          Re: Assessment of risk

          >I can be accused of not including a global pandemic in the risk log of my last project

          But probably will be going forward.

          Not because another pandemic is any more likely, but because you know what the reaction to it will be.

          It's like terrorism, the risk of death is probably lower than the 1970s, but the risk that you won't be able to access your building in the middle of $MAJOR_CITY because of a terrorist 'alert' is real.

        3. Doctor Syntax Silver badge

          Re: Assessment of risk

          "I am currently a practicing PM. I can be accused of not including a global pandemic in the risk log of my last project ... My project was within a social care department"

          A global pandemic might not have been in the risk log but given the project maybe a national epidemic should have been. It might still not have accounted for initial difficulties in getting PPE but I'd have thought it would have covered a good deal of what happened.

        4. Greybeard_ITGuy
          Thumb Up

          Re: Assessment of risk

          As a fellow PM I salute you! I am in a completely different industry, but you navigated the waters in roughly the same way I did.

  3. Doctor Syntax Silver badge

    Having been in that situation myself, anyone trying to clean up from a major fire has my sympathy.

  4. rcxb Silver badge

    Fire protection

    It's almost as if it might be worthwhile to have proper fire protection in place. Because recovering after is very difficult and expensive.
