Who knew data centres were tinder boxes?
In the olden days they used to shut off the power and the oxygen before the precious hardware could burn.
One of cloud provider OVH’s data centres is on fire and services are severely disrupted. An incident report posted at 2342 UTC on March 9, and updated at 0353 on the 10th, states: We are currently facing a major incident in our Strasbourg datacentre, with a fire declared in the SBG2 building. Firefighters intervened …
In the olden days it was /your/ data you were protecting.
Hardware is cheap and the data isn't theirs.
Customers look at price and tickboxes. For many, IT has been outsourced for that long there's no one in charge who even knows what questions to ask.
Fires will happen where there is a lot of electricity about.
"Losing DC2, Mr. Worthing, may be regarded as a misfortune; to lose part of DC1 and shut down DC3 & DC4 looks like carelessness."
The problem is systemic. You try not to have a fire or lose a DC - but containment is the be-all and end-all in fire situations. I would expect to see impregnable firewalls outside the box as well as within it. Failures will happen in IT and buildings. The trick is to make sure they don't cascade.
I had OVH on my bucket list as the first stop if I encountered issues at my current DCs. They just dropped off.
The problem is of course when the staff decide to turn them off.
This relates to the Ocado fire in Andover:
"A fire that destroyed an Ocado warehouse spread because a detection system failed and staff turned off the sprinklers, a report has found. The distribution centre for the online food retailer in Andover, Hampshire, burned for four days in February. Hampshire Fire and Rescue Authority's report found there was an hour's delay in dialling 999 and staff initially tried to tackle the blaze themselves."
That would suggest there was no fire suppression.
Maybe the fire was outside that data hall, hence the "tackle it themselves", and it quickly escalated. So many modern buildings appear to be death traps when it comes to construction materials these days that it is not inconceivable that a simple electrical failure in a kettle could have caused this. Fire suppression using gas only works if the room integrity is maintained, and such systems are usually concerned with a fire in the protected space, not one starting outside.
I am sure there will be much navel-gazing as people try to figure out what happened and went wrong.
It will be interesting to see if a fire suppression system had been installed.
According to at least one article in the French computer press they just had ordinary water sprinklers every two racks, no misting or gas system. Sounds more like a level of protection suitable for a shop or hotel than for a DC.
"Was the building purpose built as a DC?"
If it wasn't why was a large Cloud company using it as a DC?
Why are all their DCs in the same building on the same power supply?
Anyone that uses this company should move to another supplier and ask the questions that they should have asked before contracting with the OVH amateurs.
Water systems come in different flavors: dry and wet. It is hard to test a dry system. When a dry sprinkler system goes off, the pressure in the pipes drops, a valve opens, and water flows through the pipes and out to extinguish the fire. The whole system is dry until an automatic or manual valve is opened. That valve is a pretty expensive bauble to have break, and a possible single point of failure. If you water the servers at the wrong time, you have to plant new servers. ;-)
Not necessarily once personnel were evacuated. While at college I did work experience at a large company with a massive mainframe + couple of minis; the place was like a rabbit's warren of kit. The guy showing us the room explained about the halon system and pointed at the alarm and warning lights, and we were told that if those went off 1) there was a fire, and 2) we had a certain number of seconds to get out of that room before the halon turned on. Not that it would turn on once we had left; rather, it was turning on in n seconds (60 I think) and we REALLY didn't want to be in there when it did!
If it ever gets discharged though they won't be able to refill it with Halon. Last I looked a few years ago, you could keep existing installed systems, but if they were discharged then they had to be refilled with something else.
One DC we used to use, used Argon gas. Part of our induction tour was the same, "If you're in here at night and the alarm goes off, you've got 60s before the place is flooded with Argon and you won't be able to breathe". During the day when more people were about, the discharge was on a manual system.
One place I visited many years ago had a limit on the number of people allowed in areas protected with fire suppression systems. This was, according to the person I went to see there, a safety measure. They had masks/hoods which were supposed to allow you to exit these areas and then the building safely. There were only a certain number in each area, hence the limits on people numbers in those areas. Don't know/remember what the system was using as a suppressant. My contact said it really chilled the blood to be in one of those areas when the alarm goes off, even if it's just a test. Thankfully we didn't have to go there and there were no alarms during my visit.
I would like to see the risk assessment of anywhere that allowed personnel to enter a confined space with an active fire suppression system. All HV switchrooms that I have entered, and there have been more than a few daily over many years, had an interlock system to ensure you could not enter unless the fire suppression was locked off. A bit of a pain but a lot better than trying to breathe CO2.
This would explain why the system might have been switched off during maintenance.
Back when I were a trainee operator, one of the first things to be rammed home to me was to keep the computer room tidy at all times - no loose items left on top of various cabinets, desks, etc.
Not because of some OCD tendency of the chief operator, but because of the sheer hurricane-level force of the fire suppression gas being released.
A 2400' tape reel flying across the room and hitting you in the neck would severely hamper any attempt to get out in an emergency.
>Not because of some OCD tendency of the chief operator, but because of the sheer hurricane-level force of the fire suppression gas being released.
Yup. I was working in a comms room once when there was a dump in the computer hall. There were windows between and it was pretty spectacular. I only saw a bit of the dump heading for the fire exit. Fire brigade came, no fire found but apparently triggered by a fault. Interesting part was discovering a slight snag. Hall originally had windows, but those got covered with sheet rock, so there were a few halligan marks where they probed to find windows to break so the hall could be vented. Once declared safe, the hall was in quite a mess with printer paper & manuals blown everywhere, panels blown off some of the kit.
From chatting with the fire people, one of the biggest risks with being inside a room when that happens is the risk of getting disoriented by the sound & fury of a discharge, making it harder to find a fire exit, especially given the time constraint to find an exit before being suffocated.
I watched a halon discharge in the server room at Marconi Instruments, from the safety of outside, through the windows.
Our poor Computer Manager was caught inside, I think he had a discharge of his own, but he did escape. It was quite hilarious though watching him running around!
How about false floor tiles flying around?
I wasn't working on the computer test floor then, but they tested the new Halon system with some other gas. They said that some of the floor tiles "blew up". I think I'd rather be hit by a tape reel than a floor tile.
60s is generous. My first job the secure vault had a halon system on a 30s timer and was behind a safe door that took 20s to open. The door was required to stay shut when you were working in the room, so I don't think it would be possible to get to the door, open the door, get out, then close it behind to stop the halon getting out within the time available.
The same job also had a bomb drill plan that was basically "get everyone in one place so it's easier to account for the bodies". They tested it one Friday afternoon at 6pm after 3/4 of the people had gone home and they still couldn't fit everyone into the "safe areas".
60 seconds does sound like an awfully long time. My dazed memory from a large DC seems to recall 20 - when you hear the alarm, head for the exit and if you don't see the exit, hit the floor. Thinking of a domestic fire demonstration I once witnessed (less dazed memory), there might not be much left worth extinguishing in the room after 60 seconds. I'm far from being an expert in firefighting or data centre fires, but I found in-cabinet fire suppression systems with early detection quite convincing: as soon as some component starts to emit smoky stuff, the power to the affected cabinet is cut and the cabinet is flooded with fire suppression agent. I assume that is not what OVH had installed.
You don't hit the floor, that's the best way to get killed.
Modern fire protection systems use an inert gas that replaces the oxygen in the air... by flooding the area explosively. 800psi bottles release gas from the top and in some cases also under the raised floor and in the hanging ceiling, so lying on the floor will only get you to:
- die from suffocation,
- die from being hit by stuff falling from the ceiling,
- die from being thrown in the air with the raised floor tiles you were lying on, then falling back with all the debris from both above and below flying with you.
When the alarm is heard, there's only one thing to do: run to the nearest door, don't think, don't try to pick up stuff, just run.
Back in the early 80's when I worked on the KSA TEP4 contract a halon cylinder 'accidentally' discharged and came adrift from the wall it was securely bolted to. It wrecked a large part of the equipment floor where the new Jeddah International Switching Centre was being installed - an Ericsson AXE-10 derivative if my fading memory is correct.
>Of course, a chimney fire is less worse than a roomful of plastics and toxic chemicals going up in smoke.
The modern house is more flammable than a pre-war house with an open coal/wood fire, because of all the plastic. That's why you need good smoke alarms - in the home you don't have to get out quick because of the Halon, it's the toxic smoke.
"there is a 'holdoff' button next to the doors in case anyone can't get to them in the 60 seconds allowed."
Well if you've managed to get to the button, go through the door?
(I know, I know - it would hopefully be someone else who presses it for you)
"Which is itself overridden by the "deadlock doors and release halon now" button under the BOFH's desk."
I thought that switch supplied natural gas to the blaze. A BOFH can have all sorts of fun when a new DC goes in. An extra box of cable here, a few spare drives there and the new flight simulator at home starts competing with a Cray.
>...there is a 'holdoff' button next to the doors in case anyone can't get to them in the 60 seconds allowed.
Some I've worked in also have a key or control to disarm the system when there's people inside the room. Snag then is making sure people re-arm it when they've finished. One had an interlock system where there was a padlock to lock the switch safe, and that had to be removed to exit the hall.
Making the escape doors inoperable seems like a fundamentally bad idea. :| A previous employer had what was effectively an anti-thievery system on the fire door which prevented it being opened by not-particularly-obvious means; cue fire alarm and large crowd of people at the bottom of the stairs unable to get it open. I managed to do so (definitely a case of "stand aside, determined idiot on the scene" and lacerated my hand in the process) and complained to the building manager about it not being fit for purpose. She insisted she could see no problem with what I described. Sigh.
Ah yes, I remember a 'test' emergency evacuation at work once.
We were on the 2nd floor (3rd floor for USAfolk). And were all set to go. The employee designated to smash the glass tube keeping the emergency exit door shut could not be found (it was considered an honour so nobody else was allowed to touch it) for a while. Then it was decided that the padlock would be unlocked instead to avoid the broken glass and cost of replacement.
So 5 minutes after the alarm was sounded, the key was found and we all trooped to the bottom of the stairs to find the external door locked, and no key and no emergency override. (Oh, and no other way out, either.)
What we learnt was that there was a serious deficiency of management, and that someone had to do something.
What we didn't learn was whether the metal hammer provided to smash the glass cylinder keeping the door to the emergency exit shut would actually work in an emergency.
It was the only evacuation test we did. But on the bright side we never had a fire, so that's ok.
One place I worked at the fire alarms went off, which we presumed to be a fire drill, but this time there was a chap at the bottom of the stairs blocking the main entrance telling people that they weren't heading for the proper fire exit and couldn't use it, to head for the fire escape door instead. There were a fair number of people milling around unsure of what to do, to the point the stairs were backing up. I headed for the fire exit door, broke the glass tube and opened it, we then filed out.
There was a brief discussion that I might have done something wrong as it was "only a drill", which indeed it was, but they seemed to miss the whole point of a fire drill being to ensure that people do the right thing in a real one, which just might possibly include breaking some glass to save your life.
A data center had a sort of 2-stage fire exit to prevent it from being propped open. Standard fire door with crash bar to unlock it, then a second door a few feet beyond. I assumed the second door worked the same (never needed to use it myself).
Later, I talked with a fellow who used the fire exit. He forgot to grab his access card on the way out. And he wasn't carrying a cell phone. The halon didn't discharge, and someone eventually tracked down his banging and shouting and let him out of the man trap.
"He forgot to grab his access card on the way out."
Umm, when a fire alarm sounds you are only supposed to go back for other people and essential prescription medication. It is not supposed to be worth dying for an access card or mobile phone.
A colleague had a heart condition, and was on medication, eventually getting a pacemaker. I did ask him if he could evacuate the building from his 4th floor office were there to be a fire. He said it would be a struggle. I then told him that if he's not fit enough to use the emergency exit in an emergency then he is not fit enough to be at work.
>I then told him that if he's not fit enough to use the emergency exit in an emergency then he is not fit enough to be at work.
Depends on the location of the nearest emergency exit in relation to the desk he's usually at. At the office I would be 5 meters away from the door to the emergency stairs, and it's just one floor down to the street level exit. Even with an uncooperative leg I can manage that in less time than a fully fit person starting from the most unfavourable location on the third floor; we timed it. For disabled people on higher floors in another part of the building there are stair-chairs (and cow-orkers).
>For disabled people on higher floors in another part of the building there are stair-chairs...
These (the stair-chairs) are a real nuisance...
For a client with potentially lots of low mobility people, I suggested they investigate installing either a couple of airplane slides (the building is only 2 storeys with a reasonable amount of land around it), or an auto-belay based system. Both of these would be fun and likely to be used at other times and so staff would maintain their skills...
You have to consider that you might be on the toilet when the alarm sounds, so it is not just your normal desk location that matters. In any case he would clearly have struggled to descend the many flights of stairs from the 4th floor (there were no 'stair chairs' available as far as I know).
I use the past tense because he died suddenly, while on holiday abroad, about 5 years ago.
I used auto-belay devices at the climbing wall; they are ok, but you need to ensure the harness is properly used or you'll find yourself descending by free-fall.
(I'm curious for the reasons for the downvotes of my post above, but I guess they will remain a mystery.)
He was, and is, greatly missed. He was the person I asked when I did not know what to do (and so did quite a lot of other people). He had the knack of being a calm voice of reason and pragmatism when others were panicking. Good sense of humour and knew all the best pubs in London. Many were the times when we had a meeting in London, and afterwards he would usher us all past several perfectly respectable ale houses to the obscure one down some alley that did the best beer or cider in the locality. He once put in an expenses claim for two people for a lunch of six pints of beer and a lump of cheese!
When he had his pacemaker he was ok to be in the office.
Feeling quite sad now.
>Umm, when a fire alarm sounds you are only supposed to go back for other people and essential prescription medication. It is not supposed to be worth dying for an access card or mobile phone.
Absolutely! I think they installed a crash bar on the exterior fire door after that incident. (The card was only to re-enter the DC after he discovered that exit was locked.)
In the UK: "Where someone meets the definition of a disabled person in the Equality Act 2010 (the Act) employers are required to make reasonable adjustments to any elements of the job which place a disabled person at a substantial disadvantage compared to non-disabled people"
In the US: The Americans with Disabilities Act, and the Rehabilitation Act, provide the same protection.
If said colleague was unable to evacuate from the 4th floor, due to his condition, then it's the company's responsibility to make an adjustment in his work environment to allow him to continue. So move the team to the ground floor, insert fire boxes (refuge areas) in the escape path, provide assistance ...
Hence the downvotes
Thanks for the explanation. He was working in a purpose-built secure environment, which had to be approved by the relevant HMG agency, so it was not practical to move the facility. His condition was being treated, and once he had a pacemaker he was ok. My point was that if you are temporarily so unwell that you cannot use the emergency exit in an emergency then you should not really be at work (as this potentially puts yourself and other people at unreasonable risk).
>Making the escape doors inoperable seems like a fundamentally bad idea. :|
That's what intrigued me. It passed fire safety and worked a bit like the interlocks on some power/hazardous kit, i.e. being able to lock it in a safe position so someone couldn't wander along and turn the machinery on while people were working on it. Once they're done, they can unlock it and put it back into service. I guess the theory went that whoever disarmed the suppression system could re-arm it and still exit, plus there were other fire doors that could still be used.
>A previous employer had what was effectively an anti-thievery system on the fire door which prevented it being opened by not-particularly-obvious means;
Same as the Shirtwaist Factory Fire.
Thank God we learnt and progressed over 100 years
Ours was one of them, back in the day. I recall one of our ops had to be rescued from the back of the computer room after a halon dump incident. I admit I rather uncharitably wondered at the time if she was having a sneaky cigarette back there...
Halon is gone: production has stopped and use is banned, with a few exceptions. There are other gases, e.g. Novec 1230, but you cannot just change the gas. You need to replace the installation, as the new gases take more space.
The space/weight issue is why there are exceptions in transport.
Given the photos of the data centre, I have this strange sense of déjà vu. Grenfell anyone? A pretty DC wrapped in - what appears to be aluminium cladding, possibly with a pretty, flammable, polyethylene wafer in the middle, perhaps? Maybe like that Arconix cladding... you know, the stuff that failed flammability tests... the stuff that killed 78 people while the manufacturer squirmed its way through the inquiry trying to avoid getting the blame...
"the stuff that killed 78 people while the manufacturer squirmed its way through the inquiry trying to avoid getting the blame..."
Don't forget "light touch regulation" and the "bonfire of the red tape" (c) 2014-2019, the Cameron/Johnson Chumocracy.
That's the least of the problems. Arconic (not Arconix) should have withdrawn its product from market, yet they continued to flog the product... for another decade. A highrise in the UAE (the Tamweel Tower) went up in flames in 2012, yet... they kept on reselling that rubbish.
I feel that some beancounters may have been involved.
25 years ago, I used to work in large electricity substations, the large indoor ones were all protected by CO2 systems, all mechanical and a right pain in the arse to disengage when you were working in them.
Some beancounter worked out that the CO2 systems cost so much, it would be financially beneficial to take them all out, and if a substation burned it burned.
That bean counter didn't seem to give a shit that it would take over six weeks to rebuild such a big substation, and many electricity customers would be powered off for three or so weeks whilst alternatives were put in place.
"Customers told to activate DR plans"
Customer has an emergency board meeting:
'But...but... we're paying them to look after our data so data recovery is their responsibility!' screams an about-to-be-sacked exec.
Another customer: 'We outsourced all our IT. The backup schedule wasn't defined so our single backup dates to the start of the contract. Responsibility for restoring the backups wasn't defined and so they won't do it. We don't have anyone who even /knows/ how to do it. We're doomed'.
"we have loads of backups, but restore? What's that?"
In the 1980s I worked with a software development group with its own network. Backups were done religiously to TAPE each night and each Friday a week of backups again to TAPE. They were S L O W and a short straw was drawn for who would have to stay late and reset everything after the backup was finished.
A fumble fingered programmer deleted the wrong file one day - and 'it is OK it is backed up' - however although the tape listed the file as there - it would not restore. Nor would any of the tapes restore. Turns out that nobody had thought to test the restore would work!
From then on the person doing the backup would be required to choose a file at random (make a safety copy) - delete the file - then restore it.
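That restore drill is easy to sketch in code. The following is only a minimal illustration, assuming the backup is a gzip'd tar archive (it was tape in the story above) and that comparing SHA-256 hashes is an acceptable stand-in for the delete-and-restore step. The names `make_backup` and `restore_drill` are made up for the example, and the archive is assumed to live outside the directory being backed up.

```python
import hashlib
import random
import tarfile
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def make_backup(src_dir: Path, archive: Path) -> None:
    """Write every regular file under src_dir into a gzip'd tar archive.

    Assumption: archive is NOT inside src_dir.
    """
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())


def restore_drill(src_dir: Path, archive: Path, rng: random.Random) -> tuple[str, bool]:
    """Pick one live file at random, read it back from the archive,
    and report whether the restored bytes match the live copy."""
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    if not files:
        raise ValueError("nothing to test")
    chosen = rng.choice(files)
    name = chosen.relative_to(src_dir).as_posix()
    with tarfile.open(archive, "r:gz") as tar:
        try:
            member = tar.getmember(name)
        except KeyError:
            # Listed on disk but missing from the backup: drill failed.
            return name, False
        handle = tar.extractfile(member)
        if handle is None:
            return name, False
        restored = handle.read()
    return name, sha256_bytes(restored) == sha256_bytes(chosen.read_bytes())
```

Unlike the original procedure, this never deletes the live file, so a failed drill costs nothing. It only proves that one randomly chosen file is readable from the archive, not that a full restore would succeed.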
I feel lucky that my first job (Philips, around 30 years back) had an IT department staffed by a bunch of long-time mainframe beards who had a lot of experience with that sort of thing and took it very seriously. 24 hour ops did daily, full backups, the tapes were kept in a fire safe and alternated with another set (I think they kept four sets altogether) in their other nearby data centre on a weekly basis and the recovery procedure was fully tested. An engineer visited regularly to check the tape units were in proper working order and aligned correctly. My own contribution was to change the system to use incremental backups six days a week; given that each of the dozen minis needed 2-3 QIC tapes to do a full backup the ops seemed happy with the reduction in faffing about.
What's surprised me is that so many places I've been (or encountered) since aren't anywhere near as fastidious with their backup strategy. Considering so many organisations are just one incident away from a crisis I'm surprised disaster doesn't strike more often.
One thing I've noticed during the lockdowns thankfully is I don't get the usual callers to my front door. So the religious groups, the knife sharpeners, the mobile butchers/fishmonger, political canvassers etc. have all gone. My usual ploy of coughing profusely and saying
"Don't get too close I think it's contagious."
Would have been much more believable though.
Someone I know answered the door to a pair of Jehova's Witlesses, while in the process of turning a pig into freezable chunks, so, bloodied apron, bare chest and still holding the knife.
They didn't bother him ever again.
No need for knife sharpeners' services either.
GoDaddy is very distant from NameCheap. GoDaddy has more domains under their control and yet NameCheap is almost always ranked #1 for illegal activity by Spamhaus. If NameCheap is number 1 in a quarter, they will be again by the next one.
NameCheap does nothing about abuse reports. I just quit reporting them and go directly to the gTLD owner and they usually take care of it. ICANN does nothing about NameCheap and the failures of NameCheap to comply with ICANN requirements.
The world would truly be a better place without NameCheap. Then again their "abuse" team that they like to call their legal team is in Eastern European countries.
GoDaddy isn't even on the list, OVH is though.
namecheap.com Namecheap, as detailed earlier in this report, is the most abused domain registrar when it comes to botnet C&Cs. Sadly, Namecheap also managed to get into the Top 10 list of Networks hosting the most botnet C&Cs in Q2.
I did wonder about that. But considering OVH has nearly 30 DCs all over the world I bet that having your data in Strasbourg and your back-up DC in Strasbourg too sounds intuitively wrong enough for most people to have selected a different city for disaster recovery. Even if you did not know that all four Strasbourg DCs are effectively next to each other.
>But considering OVH has nearly 30 DCs all over the world
But only 11 physical locations/sites...
>Even if you did not know that all four Strasbourg DCs are effectively next to each other.
From the reports and pictures, it would seem the data centres are merely separate adjacent halls.
However, it is not clear whether OVHCloud had configured or sold the cluster of data centres as some form of DR offering. Just goes to show that even with cloud, physical location of your live and DR servers is still a necessary consideration.
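As a rough illustration of checking that point, here is a small sketch that computes the great-circle distance between a live site and a DR site and flags pairs that sit too close together. The coordinates, the site names and the 100 km threshold in the usage lines are invented for the example, not taken from any provider's documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sanity check


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def too_close(live: tuple[float, float], dr: tuple[float, float], min_km: float) -> bool:
    """True if the DR site is nearer to the live site than min_km."""
    return haversine_km(live[0], live[1], dr[0], dr[1]) < min_km


# Hypothetical sites: two halls on one campus versus a site in another city.
live_site = (48.585, 7.797)
same_campus = (48.586, 7.798)
other_city = (50.690, 3.180)

print(too_close(live_site, same_campus, min_km=100.0))  # campus neighbours
print(too_close(live_site, other_city, min_km=100.0))   # different city
```

Distance alone says nothing about shared power feeds, flood plains or flight paths, so treat it as a first filter only.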
OVH has some very clever networking capabilities that in most situations make life much easier for us poor buggers who have to make it all work without upsetting the bean-counters too much.
Said capabilities make it easy(ish) to have a primary site in one DC and replica in a completely different country. You'd think that would be enough DR for just about any reasonable situation... except...
That same clever networking is the Achilles heel. Long and short of it, it's broken. That DR site is now sitting there, up and laughing at my futile attempts to get it to talk to the world.
OVH support, naturally enough, can't give any sort of clue when this is likely to be fixed. I can't blame them for that - I really wouldn't want to be working for OVH's network engineering team right now.
Next up, restore from backup - to another provider!
Lesson : there is NEVER enough DR.
>That DR site is now sitting there, up and laughing at my futile attempts to get it to talk to the world.
If you want to scare your colleagues and managers, you could try asking: how long (with no IT and thus no business being conducted) before the company is unable to recover? Depending on the business, the window to get some form of IT back up and running again can be quite short and may also vary depending on when in the accounting month/period the outage occurs.
Useful information to have when trying to get sign-off for your seemingly expensive business continuity plans...
We have a (tested) 48-hr RTO for a full bare-metal restore from backup to an alternate provider (Azure in this particular case since these are all Windows VMs). Actually activating that process is expensive so the decision to do it will be made tomorrow depending on whether or not we can get any traction with OVH support.
To be honest, if we DO have to do it we're unlikely to return to OVH. Accidents happen, what's a lot more important to us is how the aftermath is handled (or not handled...)
"Our two main DCs were until very recently only a mile or so apart, [...]"
IIRC a customer had two DCs - but both in line with the local airport's approach path.
Another customer acquired a custom-built DC that was on a large river's flood plain. The exterior low-walled compound had large electric water pumps in the event of a storm surge. It was discovered that the pumps were only on the mains electricity supply - not on the site's emergency diesels.
.... is if after this we actually get to learn the stories of some of the businesses that go to the wall due to this fire.
Then I can put those stories in front of Manglers in the future when the demands to push everything into the Cloud come along.
Too obvious! ------------->
Many years ago, a professional body for the insurance industry, published a heavily reported paper on the outcome of a fire in a business.
They said that over 75 percent folded.
Now the fire is in someone else’s business but the outcome may well be the same.
"data centre destroyed by fire in Strasbourg" implies that a fire that was in Strasbourg happened to destroy a data centre.
"data centre in Strasbourg destroyed by fire" implies that a data centre which happens to be in Strasbourg was destroyed by fire.
Unless this fire was more widespread (which is not what your report says), you wanted the second of those, not the first.
-- UK Pedants unite!
Thank goodness you are not editing El Reg!
Headlines do not follow normal English conventions. Sometimes for brevity, other times (like this) to make it more interesting. You can't deny that "data centre in Strasbourg destroyed by fire" would NOT be a very good headline. Just deleting the "in Strasbourg" would have improved it a little but the chosen headline is much better than either.
The article then makes it clear the fire was not the whole of Strasbourg.
Writing isn't about rules: it is about goals and how best to achieve them. Violating rules requires good reasons, but it occurs very frequently for emphasis and other communications reasons. Fortunately El Reg has professional editors who understand that.
Unless there's more than one Strasbourg, and you're referring to the one which has been destroyed by fire...
Guess what, natural language is ambiguous (which is why you'll never get a true natural language parser that doesn't also understand context and cultural references). Move on.
Just out of curiosity, as English isn't my mother tongue:
"data centre destroyed by fire in Strasbourg" or "data centre in Strasbourg destroyed by fire" or "Strasbourg, data center destroyed by fire" or "By fire, a data center destroyed in Strasbourg": aren't they exactly the same?
English is pretty easy: even with grammar, conjugation or spelling errors, a sentence is understandable most of the time, unlike some other languages where strict rules are mandatory to be understood.
I once visited a sensitive HMG site. One of those where you park your car outside the gate, go to the gatehouse and tell them who you are and who you are visiting and they then check with them and if ok, let you (but not your car) in. Unfortunately their computer system was down at the time. The phones were working, but the directory ... was on the computer system, so unless you knew the extension number of your contact you couldn't get in.
I was there for a meeting to interview them and discuss the project, so working would not have been a problem. Also the really important equipment (coffee machines, microwave ovens) was still operational.
I am also able to perform the arcane and obscure 'witchcraft' known as 'handwriting', so was even able to take notes on something called 'paper' which I carried with me*.
*(Yes, I am a bit of a smart-arse, sorry, but it comes in handy sometimes.)
Ah yes ...
a long time ago in DEC-land there was a full power outage to the data center and they discovered that all of the backup and emergency procedures were documented... well you can guess where
Then there was the fire that took out the building in Basingstoke and the machine room with it, though the fire was not connected with anything IT.
Followed by the data centre sales and planning departments holding their heads in their hands as the IT ops manager, when asked about DR was quoted as saying "we have the no-plan plan" ...
We'll probably find that it's still popular
Re: "we have the no-plan plan"
I once did a BS7799 / ISO 27001 review of a small organisation (Govt. Dept.). They had a floor in an office run by another, much bigger, Govt. Department. When I asked what would happen if the building was permanently unavailable (e.g., a fire) they were of the opinion that the larger Department would find them somewhere. When I asked if that agreement was written down anywhere, they said it was not, they had a 'gentlemans' agreement.
Strangely, my report was not as complimentary as they were hoping.
>When I asked what would happen if the building was permanently unavailable ...
Yes, business continuity is difficult; it also seems to fall between traditional IT and "the business". I expect lockdown has enhanced many companies' ability to carry on working whilst the office is unavailable.
Yeah, I worked at a bank where the new (in 2005ish) data centre deliberately didn't have any company logos on it. It just looks like a warehouse building, although the gated security and barriers give a bit of a hint that it's something more serious. Thousands of people will know where it is (most current and ex IT bods, delivery drivers, 3rd party engineers, etc, etc, etc), so the obscurity is paper thin, but there you go.
Yeah, I worked at a bank where the new (in 2005ish) data centre deliberately didn't have any company logos on it. It just looks like a warehouse building, although the gated security and barriers give a bit of a hint that it's something more serious.
There's a bank DC up the road from me. The casual observer wouldn't know what it was (or at least whose it is if they made a guess from the cooling gear visible on Google Earth - at street level it's all very discreet and screened off with embankments and planting).
I have no business knowing either, I'm just a nosy git who poked around to see who owned the address!
The data centre used, until 3-4 years ago, is housed in a very tatty part of Milton Keynes, on an industrial estate (yeah, industrial estate in MK doesn't exactly narrow it down). To gain access it's a side door, like you'd have to get into your garage, with a letterbox and a doorbell. The little nametag under the doorbell simply says "private morgue".
I have avoided the cloud like the money-raising pandemic that it is. The reason? All of your eggs in someone else's basket is risky.
This fire will send a lot of ripples into a lot of ponds. As we speak there are board-level questions being asked about this incident such as "does it affect us?", "do we have backups?", "are there alternatives?", "are we insured?", "is this what 99.99% availability looks like?" and "what happened to our own data centre - and its staff?" Sadly, some businesses may even fail.
Sooner or later something like this was going to happen. I like my spinning rust and noisy power supply fans where I can see and hear them!
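For anyone facing the "is this what 99.99% availability looks like?" question, the arithmetic is short. The sketch below is purely illustrative: the function name is made up, and it assumes availability is measured over a plain 365-day year, which may not match how any particular provider's SLA defines it. On that assumption, 99.99% allows a little under 53 minutes of downtime per year.

```python
def allowed_downtime_minutes(availability_percent, period_hours=365 * 24):
    """Return the downtime budget, in minutes, implied by an availability figure.

    Hypothetical helper: assumes the percentage applies to one 365-day year
    unless a different period (in hours) is passed in.
    """
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_hours * 60


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common "nines".
    for pct in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per year")
```

A building that stays dark for days is several orders of magnitude outside that budget, which is presumably what the board will notice.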
The difference is that the DR centre is only activated when you need it, not all the time. One of the advantages of being prepared in that way is that you should have a regular rehearsal event as part of the contract. That enables you to test your backup-recovery cycle.
I worked for a client who simply had a hot standby at the opposite end of the factory. In the event of a fire big enough to destroy both ends of the factory losing IT was the least of their worries.
>The difference is that the DR centre is only activated when you need it, not all the time.
The only way to ensure the DR centre is there when you need it is to regularly use it.
I panicked one customer by proposing to run a new system in active-active-standby mode with additional capacity in each active DC to handle the full load, with the standby largely being offline capacity to be spun up if an active DC had failed. They, having mostly come from a batch background (hence had operated a pair of traditional lights-on and lights-off DCs), hadn't fully appreciated the service level issues associated with real-time processing of large numbers of online customer transactions and having a relatively minor backend outage being made visible to customers.
> I have avoided the cloud like the money-raising pandemic that it is. The reason? All of your eggs in someone else's basket is risky.
Er, putting some *clones* of your eggs in some distant baskets doesn't preclude you from keeping your own egg baskets.
You say you want to see your baskets at all times, but that limits you to keeping all your baskets where they are vulnerable to a single event, such as flooding, fire, stoat invasion, etc etc
I've always felt that a good setup would be to have most of it local, with "backup" going onto 3rd party services which can provide you DR, and for a bit of flexibility in scalability/high load etc.
Most people NEED some kind of DC setup - though don't want to have to deal with over and under provisioning problems.
>We always used to keep a set of trained attack dogs onsite! It was very successful at repelling the stoat invasions!
A grain of truth in that. Living in the countryside we have lots of small furry visitors. However, since we've had a Jack Russell, we've not seen any live ones and in recent years only occasionally a dead one. An additional advantage: the neighbours' cats only enter the garden in a dire emergency with no intention of lingering...
All of your eggs in someone else's basket is risky.
It depends on how highly you value your eggs. There are cloud providers who will allow you to replicate your eggs across geographically diverse, or even continentally diverse, data centres. They'll charge you more for a greater degree of resilience, but if you need that resilience it still likely works out cheaper than setting up multiple datacentres of your own.
There are cloud providers who will allow you to replicate your eggs across geographically diverse
And then one day the cloud company will decide that for some reason they dislike whatever service you provide and cancel your contract overnight, like AWS did to Parler. What are you going to do then?
Or just a little less.
Mid-1980s, and your data centre was just your computer room, although it would have a fire suppressant system, a UPS and offsite tape storage, but no backup DC elsewhere. So I happened to be inside when I noticed the soundscape changing, but it wasn't immediately obvious what the cause was. Took a couple of seconds before it clicked that the low rumble emitted by the aircon was gone, immediately followed by the sense of urgency caused by realizing that the systems were still belching close to 100kW total into the hall, but none of that was being taken out. Storming into the sysadmin pen I had no problem at all conveying that urgency to the ones present, half of whom were dispatched to round up any air moving device from the office areas, while the others shut down and switched off any system not absolutely necessary; I think they left three area routers and their routers for Northwestern Europe.
It was the only time I saw a thermograph pen move: Ten deg C in as many minutes. Luckily there was very little loss of hardware; three RA81 HDAs out of over a hundred, and two systems started throwing memory errors a couple of days after.
I had a very similar experience at about the same time. The on-site DEC engineer pointed out the RA81s would be out of warranty if their temperatures exceeded a certain value, so the Ops Manager got the loading bay doors open* and 'a number' of large fans running. No loss of service (it was some VAXclusters and an IBM mainframe clone) and no hardware loss.
*I'm pretty certain he had the building main entrance open, and the multiple security doors through to the machine room open as well to get airflow through - in through the entrance, out through the loading bay, with people stationed to prevent unwanted visitors.
Actually, recovering from 'no sound at all' is much, much easier than recovering from 'crackling and roaring'. I've had to recover from 'no sound at all' twice. The first was because of an unplanned UPS shutdown in the middle of the day. The second time was because I personally shut down every system in the room at 2am because it was over 85F and rising, in the computer room.
>I have avoided the cloud ...All of your eggs in someone else's basket is risky.
Actually it depends on who 'all' is.
From a national viewpoint, there is an issue with having a significant number of businesses using the same cloud provider. For an individual business, there is little difference: either you are responsible or you've outsourced that responsibility - typically to a single provider.
With only a handful of cloud providers whilst we might be using IT more efficiently, we have also increased our dependency on things working.
But where is your DNS? Switching email servers is fundamental to disaster recovery these days. In the old days you could just use the blower to get round any local difficulty, in theory.
Reminds me of the Rackshack incident when their only connection to the power grid blew up. The UPS kicked in just fine - not a blink on the servers. But then one of the generators started overheating. They had both been tested but not together - and the exhaust systems kinda got compromised. No problem, a quick ad hoc redesign, lots of plywood would sort it - but where were the carpenters?
Oh - the switchboard wasn't connected to the UPS and, being the US, there wasn't any mobile coverage in the area. They had to drive up a hill to get a signal and call in the troops. The users weren't told the amazing story until a couple of weeks later when the DC was finally reconnected to the grid. Amazing damage control by the bottom of the bargain server companies.
Sadly the architect of the first company to break the $100 server price point is no longer with us:
My DNS is at OVH.
I can still get in to change things, but I have to go to the billing section first, find the bill for the last renewal of that particular domain, then go into settings from there.
Going by the obvious route leads to error pages.
I tried a DNS query for a subdomain I don't frequently use, so shouldn't be cached anywhere, and it did respond reasonably quickly.
I remember when they decided to test the UPS & generator systems at a certain county council.
First the generator wouldn't fire up due to water contaminated fuel.
Once drained & resupplied, on the weekend of the rescheduled test the generator caught fire.
This is seriously weird - every DC I've ever been in had fire suppression systems to remove the oxygen element from the fire triangle, whether that was halon or one of the newer compounds - which is why of course humans have to get the hell out with extreme rapidity, as it isn't just fires that need oxygen. Be interesting to see how the building was fire compartmented as well, did someone skip on the proofing, like in the Twin Towers? Or do we have a cladding issue that's fuelled it and assisted the spread to the second DC?
The warning that I have seen in places I've visited that have something like this is along the lines of:-
"If you smell lemons then exit immediately."
Is this similar to the function of mercaptan to alert people of gas leaks, or does halon really smell of lemons?
It is a lemon scent added to the fire suppressant so that us humans can tell when there is a leak or activation of the system.
Of course pressurised systems usually also provide a 'whistle' distribution nozzle, so that there is an ear-splitting scream if the fire-suppressant system is activated, so anyone who cannot smell still has a sporting chance of surviving.
Well, you are assuming a lot, like, competent management, spending money on fire detection and suppressant technologies. Testing of said technologies on a regular basis, and of course maintenance of said technology as per the manufacturers' instructions. All of which costs $$$money$$$, and has to come off someone's budget.
Just remember that HMS Antelope was hit by an Exocet missile in the Falklands / Malvinas war. The warhead did not explode, but the engine and unspent fuel did cause a major fire in a 'fire-sealed' compartment. Unfortunately the engineers installing equipment on HMS Antelope had breached the sealing walls, so the fire spread and the ship sank.
And then a story: I once was asked to approve, as an IT Security consultant, a very large data centre for use by a govt Department. They claimed to have lots of other Departments' equipment there so it must be ok. They could not, however, provide any certificate from any HMG Accreditor as to the security status of the site, or the details of any Accreditor or Departmental Security Officer who would independently vouch for the datacentre. After about three weeks of me pestering the supplier, I had to formally advise my boss not to go with them as it was likely they did not have adequate security, despite having lots of HMG contracts and equipment there already.
Hi, Ozumo, you are right, the Antelope was hit by bombs. Very sorry to hear about your school-friend being killed.
The ship I was thinking of was HMS Sheffield:
"She was struck and heavily damaged by an Exocet air-launched anti-ship missile from an Argentinian Super Étendard aircraft on 4 May 1982 and foundered while under tow on 10 May 1982."
Wooden floors. So says LeMonde.
Thick, heavy hardwood floors are among the least likely to combust or weaken in an intense fire. Steel beams, however, deform and sag like warm taffy in a sufficiently hot fire. FPSE (Fire Prevention and Safety Engineering) people are interesting to talk with - you would much rather have oak beams and floors, and wool insulation than glass and steel with fiberglass insulation after a chat with them.
Well, yes and no. With sufficient ventilation hardwood burns quite nicely: just ask any wood-burning stove owner. With insufficient ventilation, it takes a while, but will make hardwood charcoal, which is 'rather brittle', so not something I'd like to trust structurally.
But the FPSE people do have a point: hardwood chars nicely, and char is rather a good insulator, so a structural element will form a layer of char on the outside while maintaining good strength inside for a long time. Which is why the Chinese used (impregnated) white oak re-entry shields for some of their recoverable surveillance satellites (Fanhui Shi Weixing).
The capsule for the FSW, like that of the US Discoverer/KH-1 spy satellite, was mounted heat shield-forward on top of the launch vehicle. The ablative impregnated-oak nose cap covered electrical equipment. The spherical aft dome contained the recovery parachute. The film reels for the camera were located in an intermediate compartment.
The FSW did have an oaken heatshield. I saw one on display at the bicentennial airshow (Richmond, Sydney, 1988). This craft was launched on 1987 August 5, and was previously known in the west as China 20, then variously called FSW 0-9 or FSW 0-10, depending on whose chronology you refer to. Cospar 1987 067A, NORAD 18306. The oak was charred, and some broke off when display attendants moved the spacecraft. They simply vacuumed most of it up so that the charcoal didn't mess the carpet! I was able to photograph both the exterior and interior (equipment had been removed).
I wonder if it was a 'sustainable' construction building, with wooden walls and wool insulation
Doesn't look like a traditionalist dream to me.
Rather more like a modernist's delight, with plenty of plastics, cladding, foam ceiling tiles and exciting new materials recommended in Bauhaus Weekly.
When the fire brigade turn up, all your previous bets around proximity to other local DCs are off. I bet some companies decided to back up to another DC on site in order to save on comms costs, so will be completely unavailable until the fire marshals agree to power up again. Of course if the power supply wasn't sufficiently compartmentalised then it would be OVH's fault.
Another potential issue is if any local safe holding backup tapes is operated electronically. Yes, some orgs are still glued to tape backups.
What if they also located the backup catalogs on line in one of the DCs - even cross connected makes no difference if it's all down.
However, in the defence of colo DCs such as OVH, some of the FUBARs from cloud providers with respect to DNS outages, storage configuration admin errors and so forth have caused entire regions to go down with poor/slow back-out options when it happened - an even bigger blast radius.
Tape backups are fine if the copies are remote (and not in the building next to the one on fire).
The reason people still use tape is that it's the cheapest medium and can be taken out of that expensive tape library and stored (in the correct environmental conditions) for years if needed.
In the St Mary Axe devastation, one of the problems was that tape backups were in a fire safe in the building. Although perfectly ok, the Fire Brigade refused entry until they were sure the building was structurally sound, which took several days, during which time, of course, service could not be resumed.
Along with the warning icons available to warn for Trolls, pedants and jokes, maybe there should be one for total lunacy:
There was a military site, when the Internet had just been invented, that had an internal network connected to the 'outside' via a firewall (just the one, it was before dual homing and DMZs). There was also a connection to the outside world bypassing the firewall "in case the firewall breaks".
"This begs the question - was there any space between the buildings to prevent the spread of a fire?"
My bet is they screwed up this part. Anti-fire compartmentation is very costly, particularly in the cabling columns of a building. And of course, you need to redo them every time you add cabling.
I expect Octave got a good deal on the land as it looks like it's in the old docks area. Also really close to the German border for a resilient fibre connection or two. At least there was a plentiful supply of water to suppress the fire...
I've got a cheap VPS with them, fortunately at Erith in Kent. It gets more activity from Digitalocean than other OVH IP addresses, although the subnet it's on occasionally gets blocklisted...
One of the main Birmingham Universities found recently that the fire suppression cylinders in one of its computer rooms were empty; there wasn't anyone maintaining the system and topping them up as required. It took quite a period to source the required gas in the quantity required and get them charged!
Good news about everybody being safe, but...
Update 5:20pm. Everybody is safe.
Fire destroyed SBG2. A part of SBG1 is destroyed. Firefighters are protecting SBG3. no impact SBG4.
— Octave Klaba (@olesovhcom) March 10, 2021
We appear to be reading this tweet before it has been posted. If people move quickly enough, they may be able to migrate away from SBG2 before the fire starts.
I'm not an architect, or construction engineer for that matter, but the video and pictures of the DC make it look like it was built out of shipping containers. Shirly that cannot be correct? Does anyone know how the building was constructed?
On a personal level I have received literally 0's of messages of concern and I can confirm that my local backup disk, sat on my desk just under my computer is perfectly safe and well, and my collected digital legacy is perfectly safe (although I should probably get around to checking that it is actually storing backup files some time, but hey, it's only been 5 years since I last checked, so it must be ok).
I wonder if they upgraded their power supply after that incident:
"The SBG site is powered by a 20kV power line consisting of 2 cables each delivering 10MVA. The 2 cables work together, and are connected to the same source and on the same circuit breaker at ELD (Strasbourg Electricity Networks)."
Looking from above at SBG2 the building looks like number 8 as displayed with a classic 7 segment display. Only these segments are the building.
There are two voids in this building as in the number 8. These voids go down from top floor to almost ground floor. The height of the floors is lower than expected.
Looking at pictures of the fire these inner voids acted as chimneys.
The building seems to be a metal structure mostly. Were the data halls only on ground floor level? Why this kind of design? Was it done this way to be energy efficient/ green?
The adjacent massive building looks like a typical data center building.
The building that burned down definitely appears to be of a modular metal construction:
Not your usual concrete floored 'old telephone exchange' concrete and brick cladding type building at all.
It seems to be a metal framework, with wooden floors, and then clad on the outside.
I imagine that the wooden floors provided very little fire containment once it was established - and looking at pictures, at least each side of that 8 was one long corridor.
In short... I don't think they had a chance once this fire got established.
I asked a friend of mine who worked in supporting physical infrastructure for a tech company. His assessment is interesting:
"...they seem to have stacked a number of UPS / generator containers right up tight to the building (give away is the chiller radiator on the outside). You would normally place these some distance from the build.......as they are a serious fire risk!!!!!! Either diesel or chance of combustable fumes from the batteries"
I imagine that even for those with backups, most of them are either on the same site (or even the same server) or using the OVH backup add-on service (which may well be hosted in the same DC).
Personally I implement a local backup which is also copied to a remote location, the remote location being an entirely different company, so I can be sure it's a different physical location.
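For what it's worth, a minimal sketch of the "local backup, also copied elsewhere" idea might look like the following. It is an illustration only: the function names are invented, and `remote_dir` stands in for wherever the second provider's storage happens to be mounted (an assumption, since a real setup would more likely push over the network). The one thing it does beyond copying is compare checksums, so that a silently corrupt copy is noticed before the day it is needed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, remote_dir):
    """Copy a backup file into remote_dir and check the copy matches the original.

    Hypothetical helper: remote_dir is assumed to be a mounted path that
    physically lives with a different company at a different site.
    """
    source = Path(source)
    remote_dir = Path(remote_dir)
    remote_dir.mkdir(parents=True, exist_ok=True)
    target = remote_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch for {target}")
    return target
```

This only proves the copy matches the original; it says nothing about whether the original backup can actually be restored, which still needs a periodic restore test.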
Just make sure that the connection between them is via diverse paths. A long time ago, in a galaxy far, far away (yes, that company), we had to get representatives of both ISPs on a call so they could confirm that their fiber paths didn't share the same interduct or conduit or trenches between our facility and each of their COs.
confirm that their fiber paths didn't share the same interduct or conduit or trenches
That reminds me of some conversations with $Major Investments when I worked for a vendor. They asked questions at a level of detail that my company hadn't considered. We delivered our service over leased lines with 2 separate carriers who used separate trenches, but we didn't know whether the lines might share a trench somewhere in the middle. And when other clients specified "redundant feed" they hadn't expected that the second feed came from a data center in a different geographic region.
To be fair, we all worked to a high standard. It's just that $Major's standards were exceptional. I learned a lot about operational rigor. Also about conflict management (PFY style).
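The "do the two lines share a trench somewhere in the middle" question is, at heart, a set intersection, provided each carrier will actually tell you which physical segments its route uses. A toy sketch, with entirely made-up segment names and a hypothetical helper function:

```python
def shared_segments(path_a, path_b):
    """Return the physical segments (trenches, ducts, COs) that two routes have in common."""
    return set(path_a) & set(path_b)


# Hypothetical route descriptions, as each carrier might report them.
carrier_a = ["trench-12", "duct-7", "co-north"]
carrier_b = ["trench-40", "duct-7", "co-south"]

overlap = shared_segments(carrier_a, carrier_b)
if overlap:
    print("Routes are NOT fully diverse; shared segments:", sorted(overlap))
else:
    print("No shared segments reported")
```

The hard part is not the code but getting accurate route data out of both carriers, which is why it took a call with representatives of each.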
LeMonde says the fire started at 1am.
"The fire quickly spread through the building. 'We put in place a large hydraulic device, using a high-powered pump-boat [which drew water from the Rhine], to prevent propagation to adjoining buildings,' Damien Harroué, commander of relief operations, told AFP. 'The floors are wood, and well-heated computer equipment will burn. These are plastics, it generates significant smoke and flames,' he added, to explain the large amount of smoke and the rapid spread of the fire." LeMonde also said it was not a Seveso-classified site meaning no dangerous chemicals were present and air-quality testing showed no pollutants had been released.
Depends on the rationale. I've had clients who have a number of "datacentres" adjacent to each other on the same 'production' site. In this instance they were referred to as "halls" and each had a different security designation and purpose. The DR versions of these halls being located across various locations around the country.
So I expect each OVHCloud DC is actually a specific size to allow for modular manufacture and easy kit out etc. The mistake (other than whatever caused the fire) seems to have been insufficient separation (in all its forms) between DCs/halls.
Are there other language translations of cloudtrastrophie?
My bet is 50% of big customers will have no backup, or only backups and DR that is at least one year out of date. Every DR exercise I saw, failed, but they told management all was good anyway. Meh, with 'agile' configuration done by waves of contractors and no documentation, there will be gnashing of teeth.
Your data will NOT get any 'priority'. The customers who will, will be the big ones - carsales, realestatesales will be at the front.
In addition INSURANCE will not pay out either. In a legal spat they will subpoena the company documents and minutes that had signoff that backups and DR were in place.
....in a universe far away, large companies with mission critical systems and data used to build their own data centres (plural):
- DC1 - A main data centre somewhere near the head office
- DC2 - A secondary data centre at least a time zone distant
Network connections from both DC locations had connections to a local POP and a backhauled connection to a remote POP.
In these archaic times there would be an annual disaster recovery trial, likely taking at least a week to complete and document.
Ah....but all this engineering on two sites COST FAR TOO MUCH MONEY!
So........we move everything to a CLOUD PROVIDER where the staff knew nothing about the business requirements, and had only a tenuous connection (if any) with staff within the business. BUT IT WAS CHEAP..........UNTIL.....................................