[citation needed]
From the article: And as humans need oxygen
See, that’s just shoddy journalism. Wild assertions, not backed up by any references or explanation. Pure speculation.
Poor show. I shall be canceling my subscription forthwith, etc etc.
A single availability zone in Amazon Web Services’ EU-Central region (EUC_AZ-1) has experienced a major outage. The internet giant's status page says the breakdown began at 1324 PDT (2030 UTC) on June 10, and initially caused “connectivity issues for some EC2 instances.” Half an hour later AWS reported “increased API error …
No fire suppression system is perfect. It is a business decision whether to resume operation with the system offline. As there is no reason to think a real fire is imminent, and the system has (presumably) not been activated for a long period of time, it seems a sensible decision to resume operation and reenable the suppression system as soon as reasonably possible.
I think that British buildings found to share the notorious Grenfell Tower's unfortunate feature of being clad in candle-wax were told to get guards in to patrol the place watching for fires until it could be fixed. And as far as I know, they still have the guards. So... if the data centre supports human life now, then they may do that.
It's interesting that, in an industry that can have just about redundant everything (switches, servers, firewalls, you name it), it would appear that nobody bothered to plan redundant aircon (at least, not that I can tell from the article).
I know aircon is expensive, but now the question is: how much more expensive is a day of 100% downtime?
You might want to design a second aircon system as backup, just to be sure.
I'm not sure that's directly practicable. What you need is resilient aircon, which might have different pumps on different power circuits, but a completely separate airflow path is difficult.
And, at some point, all that extra redundancy means additional complexity, particularly when it's at a single site: load-balancing across separate data centres, or at least separate buildings on a site, is probably easier.
I did see an alert from one service I know of (but don't manage) but the ops team said there was no downtime.
There are a lot of things that could go wrong within an Availability Zone that could cause an application to fail.
If you want your application to survive anything bad enough to take out a zone, then you need to architect it to use at least two zones (rough sketch below), and it's your problem to make sure you've addressed every component and service needed to make it work.
If that's still not good enough for you, then you need to look at using two separate regions, again making sure to address every component and service needed to make things work.
It quickly gets complex and expensive to cover off every possible risk. And the more complex it gets, the greater the risk of something going wrong.
And if there's an event bad enough to take out an entire region, something worse than a comms glitch, do you really think your staff are going to be rushing to sort the problem, or frantically combing through the rubble looking for their loved ones?
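To make that "at least two zones" point concrete, here's a rough boto3 sketch of the idea, nothing more. The region, AMI ID, instance type and zone names are placeholder assumptions, and a real setup would also need a load balancer, health checks and data replication, none of which are shown:

import boto3

REGION = "eu-central-1"                      # assumed region for the example
AMI_ID = "ami-0123456789abcdef0"             # placeholder AMI, use your own
ZONES = ["eu-central-1a", "eu-central-1b"]   # two distinct Availability Zones

ec2 = boto3.client("ec2", region_name=REGION)

for zone in ZONES:
    # One copy of the workload per AZ, so losing a single AZ (as with
    # EUC_AZ-1 here) leaves the other copy running.
    ec2.run_instances(
        ImageId=AMI_ID,
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "az", "Value": zone}],
        }],
    )

The same pattern extends to two regions: same code with a different region_name, plus whatever cross-region replication your data needs.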
Hetzner
OVHCloud
1&1
And quite a few more, according to https://www.websiteplanet.com/fr/web-hosting/
I used Hetzner and OVHCloud. Both worked very nicely and reliably.
And yes, always have a suitable backup strategy. Data centers do burn down now and then. You need at least three copies of each important record/file, each copy in a different location or, preferably, with a different service provider.
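For what it's worth, here's a rough Python (boto3) sketch of that "three copies, different providers" idea. The endpoint URLs, bucket names and file names are placeholder assumptions; Hetzner and OVHcloud both offer S3-compatible object storage, but check their docs for the real endpoints and credential handling:

import boto3

# (label, endpoint_url, bucket) - all placeholder values; in reality each
# provider needs its own access keys, which aren't shown here.
COPIES = [
    ("aws",     None,                                    "my-backups-aws"),
    ("hetzner", "https://objectstorage.example-hetzner", "my-backups-hetzner"),
    ("ovh",     "https://s3.example-ovh",                "my-backups-ovh"),
]

def backup(path, key):
    # Upload the same file to every provider, so one burning data centre
    # doesn't take all the copies with it.
    for label, endpoint, bucket in COPIES:
        s3 = boto3.client("s3", endpoint_url=endpoint)   # None = default AWS
        s3.upload_file(path, bucket, key)
        print(f"copy stored with {label}")

backup("accounts.sqlite", "2021-06-10/accounts.sqlite")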
Now that all their equipment has been stress-tested to, what, 45C or 60C while powered on, I wonder how long it will be before high failure rates show up.
Note that even (or especially) if you cut power at 40C ambient, the internal temperatures still rise due to heat flow from the memory sticks and CPU modules.