Talk about a Blue Monday: OVH outlines recovery plan as French data centres smoulder
Customers of European cloud hosting provider OVH have been told it plans to restart three data centres on its French campus in Strasbourg next week, following a massive fire on site this morning that destroyed one bit barn. The SBG1 and SBG4 data centres are scheduled to reopen by Monday 15 March and the SBG3 DC by Friday next …
COMMENTS
-
Thursday 11th March 2021 00:31 GMT Anonymous Coward
"Or the first version of Office for Mac 4.2 that had a fault on disk 29 of 30"
I doubt it.
Windows 3.11 for Workgroups was six 1.44MB floppies. OS/2 Warp was about 30 floppies, Windows 95 about 25 or so. Word 2 was two floppies. Perhaps Macs had tiddly floppies back in the day (still do).
My back is dim, my eyes are bent etc ...
-
This post has been deleted by its author
-
Wednesday 10th March 2021 20:43 GMT Norman Nescio
Shipping containers?
I wonder if the fire-resistance of the data-centre was affected by the design: the data-centre was basically stacked shipping containers.
DataCenter Knowledge: Design: OVH Deploys "Container Cube" Data Center
I thought as much seeing the pictures taken by the Sapeurs-Pompiers (Great name!)
In principle, putting things in metal boxes sounds pretty fire-resistant, but once you start cutting holes in them for the D/C infrastructure, you might be building a giant brazier. I'll be interested to read the results of any investigation, if they are made public.
NN
-
Thursday 11th March 2021 09:47 GMT Norman Nescio
Re: Shipping containers?
Thank you for the clarification.
Having read elsewhere that the facility had wooden floors (!), which I can't quite believe, the concept of wood inside a metal container with holes in it strikes me as a 'bold' design for a data centre.
More pictures of the burnt out SBG2 in this article:
DataCenter Knowledge: Fire Has Destroyed OVH’s Strasbourg Data Center (SBG2)
A comment in that article says OVH have started swapping out C13/C14 power supply cables due to possible insulation defects. Perhaps they have an insight into the cause of the fire.
Tweet with video of damping-down operations:
https://twitter.com/abonin_DNA/status/1369538028243456000
I'm really interested to see any investigation report.
NN
-
Saturday 13th March 2021 14:53 GMT Roland6
Re: Shipping containers?
Some more under-construction pictures giving a different view in this discussion thread.
https://lafibre.info/datacenter/incendie-sur-un-site-ovh-a-strasbourg/12/
And another here which shows just how lightweight the construction was:
https://twitter.com/olesovhcom/status/335448359525552128
Aside: Octave Klaba's stream contains more information about what is going on (the technical activities to restore services). Maybe useful to those looking to see if there is anything they can learn.
-
Thursday 11th March 2021 12:53 GMT Roland6
Re: Shipping containers?
> the facility had wooden floors (!)
Many offices and data centers have "wooden" suspended floors - the main component of the floor tiles being an inch or so of plywood...
The big question would seem to be whether there was a steel underfloor/ceiling and thus firebreak between floors.
From the few pictures on the web, it seems the OVHCloud DCs are fundamentally designed for passive airflow, and thus the brazier design with a central chimney would seem to serve that purpose.
Obviously, in a brazier, the airflow is used to enhance combustion rather than cool.
What is going to be of general interest is what it was that caused the fire to burn for so long.
Also it will be interesting to see the pictures of the burnt-out interior - have the floors collapsed?
-
Friday 12th March 2021 02:55 GMT Roland6
Re: wooden floors
>And no fire control/prevention systems.
Given the discussion elsewhere about Halon etc., I think the OVHCloud DCs could not use an inert gas, as the building is designed to be leaky. That would seem to leave water and foam as the only options, neither of which would be appealing in a DC, so I wouldn't be surprised if it was decided not to fit any...
-
This post has been deleted by its author
-
Wednesday 10th March 2021 23:26 GMT John Brown (no body)
Disaster recover?
I think the marketing hype of "cloud" has just taken another beating.
Or am I the only one who remembers that "cloud" was supposed to be invulnerable to outages, according to the marketing wonks? "It's in the cloud and the cloud is dispersed and duplicated, it all just works." They never mention that clouds can be stormy and cause lightning strikes, and that the "invulnerability" costs a hell of a lot more than the baseline advertised price.
-
Thursday 11th March 2021 15:40 GMT DrG
Re: Disaster recover?
yawn... "Damn you kids and your cloud!"
Should people not use datacenters? Is the average website-needing punter expected to build their own fire-suppression system?
Hosting your own website, on premises, is generally a pretty dumb idea. Having no backup is a dumb idea too...
That chess website is a pretty good example of doing cloud correctly. On their own, they could never have been distributed like they are now, and they barely blip at a "once per generation" event like your datacenter being razed by fire.
Doing it wrong and doing it right are concepts that will survive everything. Shaking your fist at the clouds does not accomplish much.
-
Thursday 11th March 2021 17:15 GMT Peter-Waterman1
Re: Disaster recover?
Not all clouds are equal, and you should consider the cloud provider's ability to maintain uptime in the event of a data centre outage. Availability Zones are designed for this and are miles apart, but close enough for synchronous replication and no data loss.
Azure is starting to build Availability Zones into its regions, which allows synchronous replication between them and will allow you to keep going in the event of an outage in a single DC. They have about 20% coverage of their regions today but plan to increase this in the coming years.
AWS has three availability zones in all of its regions.
Don't know about GCP.
So it's a case of you get what you pay for. There are a tonne of small cloud providers out there, but I don't know if I would trust them all to run my production workloads.
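For anyone wondering what "designed for a DC outage" looks like in practice, here's a rough boto3 sketch, purely illustrative (the region, AMI ID and subnet IDs below are made up, not anyone's real setup), of spreading identical instances across three Availability Zones so that losing one data centre leaves the others running:

```python
# Rough sketch only: spread identical instances across several AZs so that
# losing one data centre (AZ) leaves the others serving traffic.
# The region, AMI ID and subnet IDs below are placeholders, not real values.
import boto3

REGION = "eu-west-3"                      # assumed region
AMI_ID = "ami-0123456789abcdef0"          # placeholder AMI
SUBNETS = {                               # one subnet per Availability Zone
    "eu-west-3a": "subnet-aaaa1111",
    "eu-west-3b": "subnet-bbbb2222",
    "eu-west-3c": "subnet-cccc3333",
}

ec2 = boto3.client("ec2", region_name=REGION)

def launch_one_per_az():
    """Launch one small instance in each AZ; a real setup would put these behind a load balancer."""
    instance_ids = []
    for az, subnet_id in SUBNETS.items():
        resp = ec2.run_instances(
            ImageId=AMI_ID,
            InstanceType="t3.micro",
            MinCount=1,
            MaxCount=1,
            SubnetId=subnet_id,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "az", "Value": az}],
            }],
        )
        instance_ids.append(resp["Instances"][0]["InstanceId"])
    return instance_ids

if __name__ == "__main__":
    print(launch_one_per_az())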
-
Thursday 11th March 2021 17:29 GMT Anonymous Coward
Re: Disaster recover?
"Or am I the only one who remembers that "cloud" was supposed to invulnerable to outages according to the marketing wonks."
Yes, I think you are the only one...
According to the Amazon CTO, everything fails, all the time. So you need to make use of things like AZs, load balancers and replication if you want uptime. Moving a VM to the cloud without this is not going to help you, and the cloud providers will tell you that, and warn you against it!
https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
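To make the "everything fails" point concrete, here is a minimal Python sketch (the endpoint URLs are invented, not any real service) of a client that falls back to a replica in another DC when the primary stops answering:

```python
# Minimal failover sketch: try the primary endpoint, fall back to a replica
# in another data centre if it doesn't answer. The URLs below are invented.
import urllib.request

ENDPOINTS = [
    "https://primary.example.com/health",   # hypothetical primary DC
    "https://replica.example.com/health",   # hypothetical replica in another DC
]

def fetch_with_failover(timeout: float = 2.0) -> bytes:
    """Return the first successful response, trying each endpoint in order."""
    last_error = None
    for url in ENDPOINTS:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as exc:        # covers URLError, timeouts, connection errors
            last_error = exc          # primary is down (or burning); try the next one
    raise RuntimeError(f"all endpoints failed: {last_error}")

if __name__ == "__main__":
    print(fetch_with_failover())
```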
-
Thursday 11th March 2021 18:07 GMT EnviableOne
Re: Disaster recover?
Data centres are not a cloud, although a cloud is made up of them.
At the end of the day, if you had all your data in SBG2 (or the part of SBG1 that burned) and nowhere else, "because it's cloud", I have little sympathy.
That is not cloud, it's Other People's Tin.
Now the ones we don't hear about are those who did it right, and because SBG went down their services came up in one of OVH's other campuses, or in someone else's cloud in some other country for added resilience, and they saved themselves the downtime and red faces....
Now if only all the Boards could see that the extra cash upfront is worth the savings if something like this happens...
-
Wednesday 10th March 2021 23:31 GMT ChipsforBreakfast
It's not the incident that's important
It's what you do afterwards that really counts. We're OVH customers, with servers in the destroyed SBG2 DC. We have redundancy (I've been playing this game far too long not to have!) but that depends on OVH's network actually passing packets correctly, which it isn't right now. I'm perfectly willing to give their support teams the benefit of the doubt for today, but that runs out at 9.30am tomorrow because that's the time when I need to make the call on whether to initiate an expensive bare-metal restore to Azure.
If we DO need to do that it'll be entirely down to OVH's lack of support and not down to the fire. It'll also be the end of our relationship with OVH.
Accidents happen but falling over afterwards is avoidable....
-
Thursday 11th March 2021 05:58 GMT TonyJewell
Re: It's not the incident that's important
Yes, I was wondering about that. I'm a very minor OVH customer but was surprised that one of the services I use is currently offline and not just automatically routed to another data center. As you say, give it a day or so for the engineers to do their best in this difficult time for them. I can wait but for some this is more serious.
-
Thursday 11th March 2021 11:43 GMT ChipsforBreakfast
Re: It's not the incident that's important
And credit where it's due, OVH got in touch just after 9 this morning and the issue was resolved by 10am.
Customer's happy, we're happy - can't really ask for more (and again, the level of flexibility in the OVH network is really quite surprising for the price point)
-
Thursday 11th March 2021 12:36 GMT Doctor Syntax
Re: It's not the incident that's important
"It's what you do afterwards that really counts."
It's amazing how reactions vary. I had experience of a fire in the '70s. The fire affected one wing of the building. The top level management came to the decision that those of us who occupied that wing had to remain on the site for security reasons, quickly arranged for alternative accommodation for the others and decanted them out, and arranged for a few portacabins. Almost every department set to work sorting themselves out in their new space allocations, cleaning up, replacing equipment etc. and got back to work ASAP. One department whose equipment had survived unscathed just piled it up in their allocated space and sat there for days, apparently waiting to be told what to do.
-
Thursday 11th March 2021 12:23 GMT Down not across
Re: cooling
Joking aside, as long as things don't leak (into the air, resulting in fumes) that shouldn't really be an issue. In fact oil has a higher boiling point than water.
Nevertheless, as far as I know OVH use water cooling at board level and try to avoid A/C by sucking air from outside, through the servers and into a "hole in the middle of the building". So your answer is water and air (unless I've misunderstood their design).
-
Thursday 11th March 2021 08:28 GMT Claverhouse
Worse Things Happen...
I was hosting at Dreamhost when something went wrong [ nothing dangerous, something to do with email ? ] and various loonies went bitchcakes over the temporary loss of services.
Bad things happen and I can't imagine webmasters whining about a gap in hosting rather than being thankful no-one was injured...
.
.
Also, disgusting, hideous fugly modern architecture.
-
Thursday 11th March 2021 11:50 GMT Anonymous Coward
It's down to the customer to order or arrange a suitable DR facility. We use one very good UK data centre for our primary site, and I would be very surprised if it ever failed or caught fire, such is the very high quality of the M+E and the general processes and people involved (and the very well-known names we share it with have all conducted extensive checks). But we do still have a DR site in another data centre, owned and operated by a different supplier, with networking from another UK supplier not related to the others, and supplied by a different part of the national grid. We push encrypted backups of several kinds to AWS and the DR site.
It costs a lot of money to maintain but if we were off-line for more than 8 hours, I suspect we'd lose a lot of customers, probably the most profitable ones. People have short memories when you let them down.
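For what it's worth, the backup-push side of that doesn't have to be elaborate. A toy Python sketch of the idea (bucket name and file paths are placeholders; a real pipeline would also manage keys properly and test restores):

```python
# Toy sketch: encrypt a backup file locally, then push the ciphertext to S3.
# Bucket name and paths are placeholders; a real pipeline would also test restores.
from cryptography.fernet import Fernet   # pip install cryptography
import boto3                              # pip install boto3
import pathlib

BUCKET = "example-dr-backups"             # placeholder bucket name

def encrypt_file(src: str, key: bytes) -> pathlib.Path:
    """Encrypt src with Fernet (AES-128-CBC + HMAC) and return the .enc path."""
    data = pathlib.Path(src).read_bytes()
    out = pathlib.Path(src + ".enc")
    out.write_bytes(Fernet(key).encrypt(data))
    return out

def push_to_s3(path: pathlib.Path) -> None:
    """Upload the encrypted blob; the remote side never sees plaintext."""
    boto3.client("s3").upload_file(str(path), BUCKET, path.name)

if __name__ == "__main__":
    key = Fernet.generate_key()           # in reality, keep this somewhere safe offline
    push_to_s3(encrypt_file("db_dump.sql", key))
```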
-
Thursday 11th March 2021 13:17 GMT Evil Auditor
«Noooo!!!! F4ck!!!»
I stopped counting the times a client tells me that their data and systems are safe 'cause it's all in the cloud - that is their disaster recovery plan. The "clever" ones even thought of having a mirrored site with the same cloud provider. Backup? Nothing they need to care about, 'cause it's in the cloud. Risk of the provider failing? Stop the crazy talk; these are big corps, they never fail.
And literacy isn't widespread either, apparently. Time and again I find it clearly written in their SLA - and not in the small print - that e.g. backup is explicitly excluded and so are restoration tests. But the client didn't bother to read it. Or to think. Until the "noooo! fuck!" event.
-
Thursday 11th March 2021 13:48 GMT Anonymous Coward
Re: «Noooo!!!! F4ck!!!»
To be fair, their data remained in a cloud. The problem is that the black ones tend to be a one way ticket for data.
Now, not to make light of what is a disaster for many people, but automated attacks on some of my sites are way down. I'm guessing they're moving over to AWS, whose IP addresses even pop up in our 404 list.
I must see if I can't get a "fake WP" plugin for Joomla to keep the hackers distracted.
-
Saturday 13th March 2021 12:29 GMT steelpillow
Re: How kind...
@GreaseMonkey You mean this one I presume: Outage: Faulty UPS at data centre housing London Internet Exchange causes grief for ISPs and telcos alike?
"The incident was caused by a faulty UPS system followed by a fire alarm (there was no fire)"
Yeah, a great firebomb that one. Really makes the joke work.
-
Friday 12th March 2021 16:55 GMT Grease Monkey
You've got to love shit like this...
"Noooo!!!! F4ck!!! Me like the most part of clients does not have any disaster recovery plan... My server is in Rack 70C09 - how to see if it is safe?"
If you don't have a DR plan there's only one person to blame when there is a shit/fan interface in the DC.
Clue: it's not the hosting company.
-
Saturday 13th March 2021 05:50 GMT DWRandolph
3 weeks to source 10,000 servers?
The various corps I've been at take longer to get the purchase order approved. And the equipment to support the servers: racks, switches, breaker panels, cabling, ... Shipping all that from wherever. Okay, they have some ready to be deployed for normal growth, but that would at most be a few hundred in the staging areas?
Then 10K units in 30K minutes (3 weeks * 7 days * 24 hours * 60 minutes = 30,240 minutes) means either my math is bad, or they are going to receive/rack/cable/provision a server every 3 minutes? Starting yesterday?
What am I missing for a place to put them? Assuming 1U servers at 40 per rack, 250 standard racks. Okay that is not bad, at 600mm x 1070mm only 160 square meters of floor space? Should be no problem finding a gallery with free space, though still need aisles for access and air flow. Did not a building at this site just burn down? Two other buildings severely damaged. Only one building available to take new load? Maybe other sites, then. Increased power / cooling / network capacities at those sites? How long to get those provisioned from the local utility companies?
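Checking those back-of-envelope sums with a quick Python scribble, for what it's worth:

```python
# Quick sanity check of the numbers above.
servers = 10_000
minutes = 3 * 7 * 24 * 60                 # three weeks in minutes
print(minutes)                            # 30240
print(minutes / servers)                  # ~3.0 minutes per server

racks = servers / 40                      # 1U servers, 40 per rack
footprint_m2 = racks * (0.600 * 1.070)    # 600mm x 1070mm rack footprint
print(racks, round(footprint_m2, 1))      # 250 racks, ~160.5 m2 (before aisles)
```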
Too bad for customers with legal reasons to stay within a certain region?
Where therapists have talked about cynicism and negative attitude, I see realistic expectations from experience. Or do I just not understand data center processes at this scale?