AWS Frankfurt experiences major breakdown that staff couldn’t fix for hours due to ‘environmental conditions’ on data centre floor

A single availability zone in Amazon Web Services’ EU-Central region (EUC_AZ-1) has experienced a major outage. The internet giant's status page says the breakdown began at 1324 PDT (2024 UTC) on June 10, and initially caused “connectivity issues for some EC2 instances.” Half an hour later AWS reported “increased API error …

  1. David 132 Silver badge
    Happy

    [citation needed]

    From the article: And as humans need oxygen

    See, that’s just shoddy journalism. Wild assertions, not backed up by any references or explanation. Pure speculation.

    Poor show. I shall be cancelling my subscription forthwith, etc etc.

    1. swm

      Re: [citation needed]

      "From the article: And as humans need oxygen"

      Clearly controlled tests are needed.

      1. KarMann Silver badge
        Holmes

        Re: [citation needed]

        Controlled tests, and quicklime. Lots of quicklime.

      2. NoneSuch Silver badge
        Coat

        Re: [citation needed]

        I thought Frankfurters were supposed to be hot.

        (ahem...)

    2. Velv
      Joke

      Re: [citation needed]

      Humans don't need oxygen, it's just highly addictive. One breath and you're hooked for life.

      1. Ozumo

        Re: [citation needed]

        And the withdrawal syndrome is invariably fatal.

    3. anothercynic Silver badge
      Alien

      Re: [citation needed]

      I know, many of us sysadmins are not humanoid in nature, but... ;-)

  2. Yet Another Anonymous coward Silver badge

    environment will be safe for re-entry within the next 30 minutes

    Currywurst?

  3. PeteS46

    And what happens if a fire does break out in the affected area while the protection system is offline? What then?

    1. The Mole

      Agreed, but I imagine part of the reason it is still offline is that it needs resealing and refilling with new gas.

    2. Graham Cobb Silver badge

      No fire suppression system is perfect. It is a business decision whether to resume operation with the system offline. As there is no reason to think a real fire is imminent, and the system has (presumably) not been activated for a long period of time, it seems a sensible decision to resume operation and re-enable the suppression system as soon as reasonably possible.

      1. Robert Carnegie Silver badge

        I think the British buildings found to share with the notorious Grenfell Tower the unfortunate feature of being clad in candle-wax were told to bring in guards to patrol the place watching for fires until it could be fixed. And as far as I know, they still have the guards. So... if the data centre supports human life now, they may do the same.

  4. Ken Moorhouse Silver badge

    As soon as the fire brigade arrives...

    ...they are in charge, you can kiss any grand plans you have goodbye.

    Demonstrates that resilience is not just about PC-to-PC failover; it is something much bigger than that.

    1. Anonymous Coward
      Anonymous Coward

      Re: As soon as the fire brigade arrives...

      A client had that happen with the London bus bombings.

      They had a fantastic hot site a few streets away - which the police also cordoned off and wouldn't let them into.

  5. Anonymous Coward
    Anonymous Coward

    "failure of a control system which disabled multiple air handlers in the affected Availability Zone."

    So, redundant control systems are necessary.

    1. Graham Cobb Silver badge

      No - that is what an Availability Zone means: no redundancy promises within the zone. If you need redundancy, pay for a second Availability Zone.
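
      As an illustration only, a minimal boto3 sketch of what "pay for a second Availability Zone" looks like in practice: pin instances to subnets in two different zones so a single-AZ failure leaves capacity running. The AMI and subnet IDs below are placeholders, and credentials for eu-central-1 are assumed.

      import boto3

      ec2 = boto3.client("ec2", region_name="eu-central-1")

      # Hypothetical subnets, one per Availability Zone.
      subnets_by_az = {
          "eu-central-1a": "subnet-aaaa1111",
          "eu-central-1b": "subnet-bbbb2222",
      }

      for az, subnet_id in subnets_by_az.items():
          ec2.run_instances(
              ImageId="ami-12345678",  # placeholder AMI
              InstanceType="t3.micro",
              MinCount=1,
              MaxCount=1,
              SubnetId=subnet_id,  # the subnet pins the instance to its AZ
          )
          print(f"requested one instance in {az}")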

      1. Ken Moorhouse Silver badge

        RE: pay for a second Availability Zone.

        But then, in theory, there could be an internal replication problem between Availability Zones, meaning data is out of chronological sequence.
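
        A toy illustration of that worry, assuming asynchronous replication: fail over to a lagging replica and the surviving history skips entries, leaving the record out of chronological sequence.

        # Primary in the failed AZ had applied three writes; the replica
        # in another AZ had only applied the first when the AZ went down.
        primary = [(1, "order placed"), (2, "order paid"), (3, "order shipped")]
        replica = primary[:1]  # replication lag: only entry 1 applied

        # Replica is promoted and accepts new writes after the failover.
        replica.append((4, "order cancelled"))

        print(replica)  # [(1, 'order placed'), (4, 'order cancelled')]
        # Entries 2 and 3 happened chronologically but are missing.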

  6. Pascal Monett Silver badge

    Once again, Single Point of Failure failed

    It's interesting that, in an industry that can make just about everything redundant (switches, servers, firewalls, you name it), it would appear that nobody bothered to plan redundant aircon (at least, not that I can tell from the article).

    I know aircon is expensive, but now the question is: how much more expensive is a day of 100% downtime?

    You might want to design a second aircon system as backup, just to be sure.

    1. Anonymous Coward
      Anonymous Coward

      Re: Once again, Single Point of Failure failed

      Cooling systems do tend to be resilient - where I work is not a big-budget kind of place and we have A+B cooling in our datacentre. You can still get a common-mode failure even in a resilient system - e.g. if a control is set incorrectly.
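
      A minimal sketch of that common-mode failure, with made-up numbers: two redundant air handlers both obey one shared setpoint, so a single bad value defeats the A+B redundancy.

      SHARED_SETPOINT_C = 45.0  # mis-set: should have been, say, 22.0

      def should_cool(room_temp_c, setpoint_c):
          # Each handler cools only when the room is above its setpoint.
          return room_temp_c > setpoint_c

      room_temp_c = 38.0  # already far too hot for a data hall
      for name in ("handler_A", "handler_B"):
          # Both units consult the same (wrong) setpoint: neither cools.
          print(name, "cooling:", should_cool(room_temp_c, SHARED_SETPOINT_C))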

      1. Ken Moorhouse Silver badge

        Re: Did you say Commode Failure?

        Is this what the Victorians talked about prior to the invention of the modern-day fan?

        (See Airplane! for a visual demonstration).

    2. Charlie Clark Silver badge

      Re: Once again, Single Point of Failure failed

      You might want to design a second aircon system as backup, just to be sure.

      I'm not sure that's directly practicable. What you need is resilient aircon that might have different pumps on different power circuits, but a completely separate airflow path is difficult.

      And, at some point, all that extra redundancy means additional complexity, particularly when it's at a single site: load-balancing across separate data centres, or at least separate buildings on a site, is probably easier.

      I did see an alert from one service I know of (but don't manage), but the ops team said there was no downtime.

      1. Ken Moorhouse Silver badge

        Re: all that extra redundancy means additional complexity

        That goes with the territory. Anyone providing Cloud infrastructure will (should?) be intimately familiar with the risks.

    3. The Basis of everything is...

      Re: Once again, Single Point of Failure failed

      There are a lot of things that can go wrong within an Availability Zone and cause an application to fail.

      If you want your application to survive anything bad enough to take out a zone, then you need to architect it to use at least two zones, and it's your problem to make sure you've addressed every component and service needed to make it work.

      If that's still not good enough for you, then you need to look at using two separate regions, again making sure to address every component and service needed to make things work.

      It quickly gets complex and expensive to cover off every possible risk. And the more complex it gets, the greater the risk of something going wrong.

      And if there's an event bad enough to take out an entire region, something worse than a comms glitch, do you really think your staff are going to be rushing to sort the problem, or frantically combing through the rubble looking for their loved ones?
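
      For illustration, a client-side sketch of the two-region idea, using hypothetical endpoints; real deployments usually push this into DNS failover (e.g. Route 53 health checks) rather than the client.

      import urllib.request
      import urllib.error

      ENDPOINTS = [
          "https://api.eu-central-1.example.com/health",  # primary (Frankfurt)
          "https://api.eu-west-1.example.com/health",     # secondary (Ireland)
      ]

      def first_healthy(endpoints, timeout_s=2.0):
          for url in endpoints:
              try:
                  with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                      if resp.status == 200:
                          return url  # serve from the first region that answers
              except (urllib.error.URLError, OSError):
                  continue  # region down or unreachable: try the next one
          return None  # both regions out: a much bigger problem

      print("serving from:", first_healthy(ENDPOINTS))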

  7. Logiker72

    European Alternatives to Oligopoly

    Hetzner

    OVHCloud

    1&1

    And quite a few more, according to https://www.websiteplanet.com/fr/web-hosting/

    I used Hetzner and OVHCloud. Both worked very nicely and reliably.

    And yes, always have a suitable backup strategy. Data centres do burn down now and then. You need at least three copies of each important record/file, each copy in a different location or, preferably, with a different service provider.
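
    A minimal sketch of that three-copy rule, assuming boto3 credentials and placeholder bucket names: two copies in different AWS regions plus one with a different provider entirely.

    import boto3

    # Copies 1 and 2: different AWS regions.
    for region, bucket in [
        ("eu-central-1", "backups-frankfurt-example"),
        ("eu-west-1", "backups-ireland-example"),
    ]:
        s3 = boto3.client("s3", region_name=region)
        s3.upload_file("records.db", bucket, "nightly/records.db")

    # Copy 3: a different provider. Many independent providers offer
    # S3-compatible object storage, so the same client can be pointed at
    # a hypothetical third-party endpoint via endpoint_url.
    other = boto3.client("s3", endpoint_url="https://storage.example-provider.com")
    other.upload_file("records.db", "backups-offsite-example", "nightly/records.db")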

    1. mhoneywell

      Re: European Alternatives to Oligopoly

      OVH Cloud - have you missed something?

      https://www.theregister.com/2021/03/10/ovh_strasbourg_fire/

      To be fair, they are now probably a step ahead of AWS in terms of understanding the implications of DR.

  8. UrbaneSpaceman
    Big Brother

    All the smart switches in my house stopped working

    I had to get up and press a button to switch off the bedroom light - LIKE A SAVAGE!!

    1. AndrewB57

      Back to The Dark Ages for you

      I'll get my coat

      1. TimMaher Silver badge
        Coat

        I’ll get my coat

        Looks like it’s not where you hung it up.

        It’s over there—————>

  9. Bitsminer Silver badge

    cooked datacentre?

    Now that all their equipment has been stress-tested to, what, 45°C, 60°C while powered on, I wonder how long it will be before failure rates start to climb.

    Note that even (or especially) if you cut power at 40°C ambient, internal temperatures still rise for a while as stored heat soaks out of the memory sticks and CPU modules.
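
    A back-of-envelope sketch of that heat soak, with made-up constants: a component sitting at 80°C keeps dumping stored heat into its case after power-off, so the case temperature overshoots well above the 40°C ambient before everything relaxes.

    ambient, part, case = 40.0, 80.0, 45.0   # temperatures in degrees C
    k_in, k_out = 0.30, 0.05                 # arbitrary heat-flow rates

    for minute in range(10):
        flow_in = k_in * (part - case)       # stored heat arriving at the case
        flow_out = k_out * (case - ambient)  # heat lost to the room
        case += flow_in - flow_out
        part -= flow_in
        print(f"t={minute}min  case={case:.1f}C  part={part:.1f}C")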

  10. Anonymous Coward
    Anonymous Coward

    "the building needed to be re-oxygenated"

    Ah, ye'll be wanting to open a window, so ye will.

  11. aldolo

    Fart-in-a-jar Martin works there

    Now his turn is over
