Talk about a Blue Monday: OVH outlines recovery plan as French data centres smoulder

Customers of European cloud hosting provider OVH have been told it plans to restart three data centres on its French campus in Strasbourg next week, following a massive fire on site this morning that destroyed one bit barn. The SBG1 and SBG4 data centres are scheduled to reopen by Monday 15 March and the SBG3 DC by Friday next …

  1. Danny 14 Silver badge

    Please insert tape number 363 of 4087

    1. chivo243 Silver badge
      Boffin

      363? 363? Why do I have a bad feeling?

    2. Doctor Syntax Silver badge

      It's bad enough when you get "Please insert tape 11" and you've only got 10.

      1. Anonymous Coward
        Anonymous Coward

        Or the first version of Office for Mac 4.2 that had a fault on disk 29 of 30 (I think, it was more than quarter of a century ago...).

        Fortunately Microsoft could provide a snail mail update and we were able to get up and working 2 months later.

        1. gerdesj Silver badge
          Gimp

          "Or the first version of Office for Mac 4.2 that had a fault on disk 29 of 30"

          I doubt it.

          Windows 3.11 for WG was six 1.44MB FD. OS2/Warp was about 30 floppies, Windows 95 about 25 or so. Word 2 was two floppies. Perhaps Macs had tiddly floppies back in the day (still do)

          My back is dim, my eyes are bent etc ...

          1. Anonymous Coward
            Anonymous Coward

            It was actually 36...

            ...and you can download them here:

            https://macintoshgarden.org/apps/microsoft-office-42

          2. Slabfondler

            OS2/Warp

            Disk 18 was, shall we say, sensitive. I think it took me a dozen tries to get the full installation to work. Ah, those were the days. My first Linux took a couple of weeks just to download (a diskette or two a day).

            1. teknopaul Silver badge

              Re: OS2/Warp

              My first Linux install involved a pop down to the Red Bricks in Hulme Manchester where all the flats had a lan, you could netboot slackware with no removable media required.

              I seem to remember that Manchester Council did not like that.

          3. Eclectic Man Silver badge
            Facepalm

            Insert next disc

            Yeah, well, I expect YOU were one of the smart Alecs who remembered to remove the other 28 discs before inserting the 29th!

            (I do remember thinking at the time that the slot was a bit thin for all that media, but the manual said...)

      2. John Brown (no body) Silver badge
        Joke

        If you can't ramp it up to 11, you're doing it wrong!

    3. Dan 55 Silver badge
  2. This post has been deleted by its author

  3. Norman Nescio

    Shipping containers?

    I wonder if the fire-resistance of the data-centre was affected by the design: the data-centre was basically stacked shipping containers.

    DataCenter Knowledge: Design: OVH Deploys "Container Cube" Data Center

    I thought as much seeing the pictures taken by the Sapeurs-Pompiers (Great name!)

    In principle, putting things in metal boxes sounds pretty fire-resistant, but once you start cutting holes in them for the D/C infrastructure, you might be building a giant brazier. I'll be interested to read the results of any investigation, if they are made public.

    NN

    1. joepie91

      Re: Shipping containers?

      SBG1 is made of shipping containers, SBG2 is not. It does use similar materials, but is actually a custom building design: https://pbs.twimg.com/media/BKfAgXZCEAEkJOl?format=jpg&name=large

      1. Norman Nescio

        Re: Shipping containers?

        Thank you for the clarification.

        Having read elsewhere that the facility had wooden floors (!), which I can't quite believe, the concept of wood inside a metal container with holes in it strikes me as a 'bold' design for a data centre.

        More pictures of the burnt out SBG2 in this article:

        DataCenter Knowledge: Fire Has Destroyed OVH’s Strasbourg Data Center (SBG2)

        A comment in that article says OVH have started swapping out C13/C14 power supply cables due to possible insulation defects. Perhaps they have an insight into the cause of the fire.

        Tweet with video of damping-down operations:

        https://twitter.com/abonin_DNA/status/1369538028243456000

        I'm really interested to see any investigation report.

        NN

      2. Roland6 Silver badge

        Re: Shipping containers?

        Some more under-construction pictures, giving a different view, in this discussion thread.

        https://lafibre.info/datacenter/incendie-sur-un-site-ovh-a-strasbourg/12/

        And another here which shows just how lightweight the construction was:

        https://twitter.com/olesovhcom/status/335448359525552128

        Aside: Octave Klaba's stream contains more information about what is going on (technical activities) to restore services. Maybe useful to those looking to see if there is anything they can learn.

    2. Danny 14 Silver badge

      Re: Shipping containers?

      metal containers with holes in them? We call those "braziers".

      1. Roland6 Silver badge

        Re: Shipping containers?

        > the facility had wooden floors (!)

        Many offices and data centers have "wooden" suspended floors - the main component of the floor tiles being an inch or so of plywood...

        The big question would seem to be whether there was a steel underfloor/ceiling and thus firebreak between floors.

        From the few pictures on the web, it seems that fundamentally the OVHCloud DC's are designed for passive airflow and thus the brazier design with a central chimney would seem to serve the purpose.

        Obviously, in a brazier, the airflow is used to enhance combustion rather than cool.

        What is going to be of general interest is what was it that caused the fire to burn for so long.

        Also it will be interesting to see the pictures of the burnt-out interior - have the floors collapsed?

        1. Xalran

          Re: wooden floors

          Here it's full plain, massive wooden floors... not just plywood raised floor tiles.

          And no fire control/prevention systems.

          ( since that's something OVH don't waste money on unless there's a local law that forces them to do it. )

          1. Roland6 Silver badge

            Re: wooden floors

            >And no fire control/prevention systems.

            Given the discussion elsewhere about Halon etc. I think the OVHCloud DC's could not use an inert gas as the building is designed to be leaky. Which would seem to leave water and foam as the only options, neither of which would be appealing in a DC, so wouldn't be surprised if it was decided to not fit...

  4. Androgynous Cow Herd

    my money is on a UPS fire

    I've seen one. They are impressive.

    1. Danny 14 Silver badge

      Re: my money is on a UPS fire

      I worked at British Leyland and saw a battery room fire. We feared a cook-off but the fire suppression worked; everything was coated in powder afterwards. The room was filled with 2V lead-acid batteries. It was recorded on the VHS CCTV.

      1. IGotOut Silver badge

        Re: my money is on a UPS fire

        You sure it was a battery fire and not just another bunch on strike with the burning bins to keep them warm?

        Still, they probably still shipped any fire damaged ones out.

  5. This post has been deleted by its author

  6. John Brown (no body) Silver badge

    Disaster recover?

    I think the marketing hype of "cloud" has just taken another beating.

    Or am I the only one who remembers that "cloud" was supposed to be invulnerable to outages according to the marketing wonks? "It's in the cloud and the cloud is dispersed and duplicated, it all 'just works'." They never mention that clouds can be stormy and cause lightning strikes, and that the "invulnerability" costs a hell of a lot more than the baseline advertised price.

    1. Danny 14 Silver badge

      Re: Disaster recover?

      indeed, asynchronous replication for the poor people and synchronous for the bigger players was the selling point.

    2. DrG

      Re: Disaster recover?

      yawn... "Damn you kids and your cloud!"

      Should people not use datacenters? Is the average website-needing punter expected to build its own fire-suppression system?

      Hosting your own website, on premise, is generally a pretty dumb idea. Having no backup is a dumb idea too...

      That chess website is a pretty good example of doing cloud correctly. On their own, they could never have been distributed like they are now, and barely bleep at a "once per generation" event like your datacenter being razed by fire.

      Doing it wrong and doing it right are concepts that will survive everything. Shaking your fist at the clouds does not accomplish much.

      1. Peter-Waterman1

        Re: Disaster recover?

        Not all clouds are equal, and you should consider the cloud provider's ability to maintain uptime in the event of a data centre outage. Availability Zones are designed for this and are miles apart, but close enough for synchronous replication and no data loss.

        Azure is starting to build Availability Zones into its regions, which allows synchronous replication between them and will allow you to keep going in the event of an outage in a single DC. They have about 20% coverage of their regions today but plan to increase this in the coming years.

        AWS has three availability zones in all of its regions

        Don't know about GCP

        So it's a case of you get what you pay for. There is a tonne of small cloud providers out there, but I don't know if I would trust them all to run my production workloads.

        1. teknopaul Silver badge

          Re: Disaster recover?

          Don't OVH generally rent boxen, rather than IaaS or SaaS solutions?

      2. John Brown (no body) Silver badge

        Re: Disaster recover?

        "Should people not use datacenters?"

        I didn't say that. Datacentres were around before "cloud". I was having a go at the marketing and media types.

      3. Anonymous Coward
        Anonymous Coward

        Re: Disaster recover?

        Woohoo! Lichess FTW!

    3. Anonymous Coward
      Anonymous Coward

      Re: Disaster recover?

      "Or am I the only one who remembers that "cloud" was supposed to invulnerable to outages according to the marketing wonks."

      Yes, I think you are the only one...

      According to the Amazon CTO, everything fails all the time. So you need to make use of things like AZs, load balancers and replication if you want uptime. Moving a VM to the cloud without this is not going to help you, and the cloud providers will tell you that, and warn you against it!

      https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html

    4. EnviableOne Silver badge

      Re: Disaster recover?

      Data centres are not a cloud, although a cloud is made up of them.

      At the end of the day, if you had all your data in SBG2 (or the part of SBG1 that burned) and nowhere else, "because it's cloud", I have little sympathy.

      That is not cloud, it's Other People's Tin.

      Now the ones we don't hear about are those who did it right, and because SBG went down their services came up in one of OVH's other campuses or someone else's cloud in some other country for added resilience, and saved themselves the downtime and red faces....

      Now if all the Boards could see the extra cash upfront is worth the savings if something like this happens...

  7. ChipsforBreakfast
    Alert

    It's not the incident that's important

    It's what you do afterwards that really counts. We're OVH customers, with servers in the destroyed SBG2 DC. We have redundancy (I've been playing this game far too long not to have!) but that depends on OVH's network actually passing packets correctly, which it isn't right now. I'm perfectly willing to give their support teams the benefit of the doubt for today but that runs out at 9.30am tomorrow because that's the time when I need to make the call on whether to initiate an expensive bare-metal restore to Azure.

    If we DO need to do that it'll be entirely down to OVH's lack of support and not down to the fire. It'll also be the end of our relationship with OVH.

    Accidents happen but falling over afterwards is avoidable....

    1. TonyJewell

      Re: It's not the incident that's important

      Yes, I was wondering about that. I'm a very minor OVH customer but was surprised that one of the services I use is currently offline and not just automatically routed to another data center. As you say, give it a day or so for the engineers to do their best in this difficult time for them. I can wait but for some this is more serious.

    2. ChipsforBreakfast
      Thumb Up

      Re: It's not the incident that's important

      And credit where it's due, OVH got in touch just after 9 this morning and the issue was resolved by 10am.

      Customer's happy, we're happy - can't really ask for more (and again, the level of flexibility in the OVH network is really quite surprising for the price point)

    3. Doctor Syntax Silver badge

      Re: It's not the incident that's important

      "It's what you do afterwards that really counts."

      It's amazing how reactions vary. I had experience of a fire in the '70s. The fire affected one wing of the building. The top level management came to the decision that those of us who occupied that wing had to remain on the site for security reasons, quickly arranged for alternative accommodation for the others and decanted them out, and arranged for a few portacabins. Almost every department set to work sorting themselves out in their new space allocations, cleaning up, replacing equipment etc. and got back to work ASAP. One department whose equipment had survived unscathed just piled it up in their allocated space and sat there for days, apparently waiting to be told what to do.

    4. Anonymous Coward
      Anonymous Coward

      Re: It's not the incident that's important

      If you want to migrate to Azure for better uptime, you might want to re-think that choice!

  8. anothercynic Silver badge

    6 hours...

    ... That's a whole lot of burning... 500 m²? Only 500 m²? Over 5 floors? That's what... 10 x 10 m? Sounds a little too... little.

  9. Dog Eatdog

    Those nice colourful panels on the outside of the building seem to have melted. I wonder if they were supplied by Arconic?

    1. anothercynic Silver badge

      Great minds... ;-)

  10. YetAnotherJoeBlow Bronze badge

    cooling

    I wonder what OVH used for cooling - I hope it was not oil...

    1. Down not across Silver badge

      Re: cooling

      Joking aside, as long as things don't leak (into air resulting in fumes) that shouldn't really be an issue. In fact oil has a higher boiling point than water.

      Nevertheless as far as I know OVH use water cooling at board level and try to avoid A/C by sucking air from outside, through servers into "hole in the middle of the building". So your answer is water and air (unless I've misunderstood their design).

  11. Claverhouse Silver badge
    Meh

    Worse Things Happen...

    I was hosting at Dreamhost when something went wrong [ nothing dangerous, something to do with email ? ] and various loonies went bitchcakes over the temporary loss of services.

    Bad things happen and I can't imagine webmasters whining about a gap in hosting rather than being thankful no-one was injured...

    .

    .

    Also, disgusting, hideous fugly modern architecture.

    1. john 103

      Re: Worse Things Happen...

      you mean webmainers

  12. Potemkine! Silver badge

    I was thinking that when something was 'in the cloud', it meant there were replications in different locations. So maybe not after all....

    It may be a wake-up call for many companies who were sold blatant lies by unethical salespeople (pleonasm, I know)

    1. Anonymous Coward
      Anonymous Coward

      It's down to the customer to order or arrange a suitable DR facility. We use one very good UK data centre for our primary site, and I would be very surprised if it ever failed or caught fire such is the very high quality M+E and the general processes and people involved (and the very well known names we share it with who have all conducted extensive checks), but we do still have a DR site in another data centre, owned and operated by a different supplier with networking from another UK supplier not related to the others and supplied by a different part of the national grid. We push encrypted backups of several kinds to AWS and the DR site.

      It costs a lot of money to maintain but if we were off-line for more than 8 hours, I suspect we'd lose a lot of customers, probably the most profitable ones. People have short memories when you let them down.

      1. Persona Silver badge

        People have short memories when you let them down.

        On the contrary, people can have very long memories when you let them down.

        1. Anonymous Coward
          Anonymous Coward

          I don’t know why I said short ... I meant long ! Bad day ....

          1. Hero Protagonist
            Coat

            That’s ok, we’ll have forgotten by tomorrow

            1. Doctor Syntax Silver badge

              Forgotten what?

        2. John Brown (no body) Silver badge

          "On the contrary, people can have very long memories when you let them down."

          But when you let them down twice, they quickly forget. See TalkTalk data breaches.

  13. Evil Auditor

    «Noooo!!!! F4ck!!!»

    I stopped counting the times a client tells me that their data and systems are safe 'cause it's all in the cloud - that is their disaster recovery plan. The "clever" ones of them even thought of having a mirrored site with the same cloud provider. Backup? Nothing they need to care about, 'cause it's in the cloud. Risk of the provider failing? Stop the crazy talk; these are big corps, they never fail.

    And literacy isn't widespread either, apparently. Time and again I find it clearly written in their SLA - and not in the small print - that e.g. backup is explicitly excluded and so are restoration tests. But the client didn't bother to read it. Or to think. Until the "noooo! fuck!" event.

    1. Anonymous Coward
      Anonymous Coward

      Re: «Noooo!!!! F4ck!!!»

      To be fair, their data remained in a cloud. The problem is that the black ones tend to be a one way ticket for data.

      Now, not to make light of what is a disaster for many people, but automated attacks on some of my sites are way down. I'm guessing they're moving over to AWS, whose IP addresses even pop up in our 404 list.

      I must see if I can't get a "fake WP" plugin for Joomla to keep the hackers distracted.

    2. Doctor Syntax Silver badge

      Re: «Noooo!!!! F4ck!!!»

      "Time and again I find it clearly written in their SLA"

      Why read the SLA when the salesman was so convincing?

  14. spireite Bronze badge

    Oh, we still have to backup in the cloud?

    It never ceases to amaze me that people think they don't have to consider backups themselves. I'm sure a few people will discover their backups don't exist as a result.

  15. Anonymous Coward
    Anonymous Coward

    Beautiful building! oh the colours!

    The famous architectural style called Bidonville Revival.

    ...

    Too bad it does not protect the equipment, the systems, etc...

    I wonder who approved this...

    I wonder who is the insurer?

    1. A.P. Veening Silver badge

      I wonder who is the insurer?

      At this scale that is not a real problem, I wonder who the re-insurer is as that will be the one footing this bill.

  16. Anonymous Coward
    Anonymous Coward

    On-prem off-prem?

    If el reg can't make up its mind I've got no chance.

  17. steelpillow Silver badge
    Coat

    How kind...

    ...of the EU to warn us Brits to get out before they firebombed themselves in the foot.

    1. Grease Monkey Silver badge

      Re: How kind...

      How is this "the EU"?

      Remember the massive outage in Harbour Exchange last August, the one that actually made the national news? Did you blame that on "the UK"?

      1. steelpillow Silver badge
        Facepalm

        Re: How kind...

        @GreaseMonkey You mean this one I presume: Outage: Faulty UPS at data centre housing London Internet Exchange causes grief for ISPs and telcos alike?

        "The incident was caused by a faulty UPS system followed by a fire alarm (there was no fire)"

        Yeah, a great firebomb that one. Really makes the joke work.

  18. Grease Monkey Silver badge

    You've got to love shit like this...

    "Noooo!!!! F4ck!!! Me like the most part of clients does not have any disaster recovery plan... My server is in Rack 70C09 - how to see if it is safe?"

    If you don't have a DR plan there's only one person to blame when there is a shit/fan interface in the DC.

    Clue: it's not the hosting company.

    1. Doctor Syntax Silver badge

      "Clue: it's not the hosting company."

      No, just all those who made reassuring noises about "the cloud".

  19. DWRandolph

    3 weeks to source 10,000 servers?

    The various corps I've been at take longer to get the purchase order approved. And the equipment to support the servers: racks, switches, breaker panels, cabling, ... Shipping all that from wherever. Okay, they have some ready to be deployed for normal growth, but that would at most be a few hundred in the staging areas?

    Then 10K units in 30K minutes (3 weeks * 7 days * 24 hours * 60 minutes = 30,240 minutes) means either my math is bad, or they are going to receive/rack/cable/provision a server every 3 minutes? Starting yesterday?

    What am I missing for a place to put them? Assuming 1U servers at 40 per rack, 250 standard racks. Okay that is not bad, at 600mm x 1070mm only 160 square meters of floor space? Should be no problem finding a gallery with free space, though still need aisles for access and air flow. Did not a building at this site just burn down? Two other buildings severely damaged. Only one building available to take new load? Maybe other sites, then. Increased power / cooling / network capacities at those sites? How long to get those provisioned from the local utility companies?

    Too bad for customers with legal reasons to stay within a certain region?

    Where therapists have talked about cynicism and negative attitude, I see realistic expectations from experience. Or do I just not understand data center processes at this scale?
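
For anyone who wants to re-run the sums in the comment above, here is a rough back-of-envelope sketch. Every input is taken from that comment (10,000 servers, 3 weeks, 1U servers at 40 per rack, a 600mm x 1070mm rack footprint); the variable names are just illustrative, and none of this reflects OVH's actual deployment process.

```python
# Back-of-envelope check of the figures quoted in the comment.
# All inputs are the commenter's assumptions, not OVH data.

SERVERS = 10_000                    # servers to source and deploy
WEEKS = 3                           # quoted timescale
SERVERS_PER_RACK = 40               # assumes 1U servers, 40 per rack
RACK_W_M, RACK_D_M = 0.600, 1.070   # rack footprint in metres

# Minutes available if work runs around the clock for the whole period.
minutes_available = WEEKS * 7 * 24 * 60
minutes_per_server = minutes_available / SERVERS

# Ceiling division: a partly filled rack still takes a full footprint.
racks_needed = -(-SERVERS // SERVERS_PER_RACK)

# Footprint of the racks alone; aisles and airflow space are extra.
floor_area_m2 = racks_needed * RACK_W_M * RACK_D_M

print(f"{minutes_available} minutes available")
print(f"{minutes_per_server:.2f} minutes per server")
print(f"{racks_needed} racks, {floor_area_m2:.1f} m^2 before aisles")
```

The sums in the comment hold up: roughly one server every three minutes, non-stop, and about 160 square metres of bare rack footprint. Changing `SERVERS_PER_RACK` or the working hours per day shows how sensitive the estimate is to those assumptions.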
