Google, Oracle clouds still affected by UK heatwave

Cloud outages at Google and Oracle caused by the UK's heatwave have ended, but users have been warned some problems persist. Google's incident report states that the "cooling related failure" was resolved at 2043 Pacific Time on Tuesday (0343 UTC). But the ad giant's update also reports: "A small number of HDD backed …

  1. Pascal Monett Silver badge

    "pride themselves on over-engineering to ensure resilience"

    Pride themselves ?

    More like talk about it.

    Over-engineering costs money. The Cloud™ is about making money, not spending it.

    And, as this episode has proven, they didn't over-engineer their cooling systems, they put in the bare minimum to ensure "normal" operations.

    Well they're going to have to go back and over-engineer that part a bit, because I'll wager that next summer, they'll be seeing those temperatures again.

  2. Anonymous Coward

    The Cloud can't work without clouds?

    Maybe it's time to re-launch the Sun server line?

    1. Neil Barnes Silver badge

      Re: The Cloud can't work without clouds?

      Set the controls for the heart of the sun! https://www.youtube.com/watch?v=8RbXIMZmVv8

      1. Ken Moorhouse Silver badge

        Re: Set the controls for the heart of the sun

        I wonder how the servers in New Guinea are coping...

        https://www.youtube.com/watch?v=16V-wNwlTw0

        Obscured By Clouds indeed

  3. elsergiovolador Silver badge

    Idling

    All those servers running idle "just in case there is a surge", surely don't help the environment?

    1. Tom Chiverton 1 Silver badge

      Re: Idling

      They're not drawing much power till switched on. Why do you think AWS is always banging on about how quick their VMs boot?

    2. localzuk Silver badge

      Re: Idling

      That's not really how it works. "Spare" capacity will mostly be in a low power mode, aka switched off. As demand grows, batches of servers will be spun up in preparation for services to run on them.

      So, only a small number, comparatively, will be online doing very little.

      Compared to running it all in house, where you end up only using 10% of the resources 90% of the time.

      1. elsergiovolador Silver badge

        Re: Idling

        No, you are wrong. If a server is running a hypervisor, you can't really run it in low power mode if it has at least one VM up.

        Sure, a completely empty server may be put to sleep, but do corporations like AWS release their VM distribution strategy?

        You wouldn't fill one server to the brim with VMs if you want to impress clients with performance. The VMs would be distributed somewhat uniformly, which means many servers running low numbers of VMs as opposed to few servers running the maximum number of VMs.

        Then you have storage servers that, if used by at least one VM, can't really go to sleep either.

        1. Anonymous Coward

          Re: Idling

          >You wouldn't fill one server to the brim with VMs if you want to impress clients with performance...

          Actually this is almost exactly what you do. Almost all serious large-scale schedulers make use of scheduling strategies that are based on (or at least influenced by) gang-scheduling or co-scheduling. This is because the hypervisor already strongly constrains the CPU use of the guest OSs - you don't want your tenants getting "free" CPU, and nor do you want their performance to degrade based on factors beyond their control. So there is no benefit to naive, even distribution of VMs.

          In fact it's worse, because that near-even distribution will maximise any cross-VM communication costs, potentially bottlenecking your VMs, which are mostly sold per-CPU or per-RAM, on a factor that is neither CPU nor RAM.
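As a rough illustration of the packing-versus-spreading argument in this sub-thread, the sketch below places the same set of VMs on identical hosts in two ways and counts how many hosts end up with at least one VM (and so cannot be powered down). Everything in it is an assumption made for illustration: the host size, the VM sizes and both placement functions are hypothetical, and real cloud schedulers are far more involved than either.

```python
from typing import List

HOST_VCPUS = 32  # hypothetical capacity of each host, in vCPUs


def place_packed(vm_sizes: List[int], n_hosts: int) -> List[int]:
    """First-fit decreasing: fill each host before touching the next one."""
    load = [0] * n_hosts
    for size in sorted(vm_sizes, reverse=True):
        for i in range(n_hosts):
            if load[i] + size <= HOST_VCPUS:
                load[i] += size
                break
        else:
            # No host had room for this VM
            raise RuntimeError("not enough capacity")
    return load


def place_spread(vm_sizes: List[int], n_hosts: int) -> List[int]:
    """Naive even spread: always pick the currently least-loaded host."""
    load = [0] * n_hosts
    for size in sorted(vm_sizes, reverse=True):
        i = min(range(n_hosts), key=lambda h: load[h])
        if load[i] + size > HOST_VCPUS:
            # If the least-loaded host cannot take it, no host can
            raise RuntimeError("not enough capacity")
        load[i] += size
    return load


def active_hosts(load: List[int]) -> int:
    """Count hosts that carry at least one VM and so must stay powered on."""
    return sum(1 for used in load if used > 0)


if __name__ == "__main__":
    vms = [8, 8, 4, 4, 4, 2, 2]  # made-up VM sizes, 32 vCPUs in total
    for name, place in (("packed", place_packed), ("spread", place_spread)):
        load = place(vms, n_hosts=4)
        print(f"{name}: load per host = {load}, active hosts = {active_hosts(load)}")
```

With these made-up numbers the packed placement leaves three of the four hosts empty, while the even spread keeps all four powered on at a quarter of their capacity. Whether tenants on the packed host notice any difference depends on how strictly the hypervisor caps each guest, which is the point being argued above.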

    3. Anonymous Coward

      Re: Idling

      >All those servers running idle "just in case there is a surge", surely don't help the environment?

      They don't, but given the "spare" capacity of a cloud operator is its operating margin they are strongly incentivised to keep the figure as small as possible. They do a damned sight better job of it than anyone else on the planet.

  4. Mike Parris

    Fried Hardware

    It's looking bad.

    5 of my 6 sites running on Siteground have been down since yesterday and they cannot give me any ETA on a fix

    Siteground are using backups to transfer sites to their Amsterdam server farm.

    1. Mike Parris

      Re: Fried Hardware

      Ha, in the time it took to write my original comment, my sites started to run again. Thanks Siteground.

  5. Nate Amsden

    doesn't make sense

    Article says

    "Hard disks are seldom rated to run at more than 50 degrees Celsius, and as the mercury topped 40 degrees in London yesterday it’s not hard to imagine that temperatures became so hot that mechanical disks in a densely packed device faced an extreme environment and suffered more than solid state components."

    It's very hard to imagine, in fact. It's not as if these systems are running under a big tree outside; there is air moving in the facility, and at the very least air moving in the chassis.

    Add to that most modern spinning drives (at least the ones I have from Western Digital which are "Enterprise" SATA) are rated for 60C / 140F AMBIENT. My own personal drives I have run as hot as 91.5F ambient and the drive temps ranged from 107-120F. Not sure how hot the drives would be running 50F hotter ambient than that.

    Add to that such systems must have thermal shutdown features to protect the systems from damage regardless.

    Seems like many failures here.

    1. David Nash

      Re: doesn't make sense

      And for the rest of the world, so we can compare with the numbers in the article...

      91.5F ambient=33.06 C

      107-120F=41.7 - 48.9 C

      50F hotter = about 27.8 degrees C hotter (I think! This one is trickier: I have seen YouTube FAILs from converting an absolute F to C when what was really needed was a difference)
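The distinction drawn above between converting an absolute temperature and converting a temperature difference can be checked with a few lines of Python. This is only a sketch, and the function names are made up for illustration.

```python
def f_to_c(temp_f: float) -> float:
    """Convert an absolute Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def f_delta_to_c_delta(delta_f: float) -> float:
    """Convert a Fahrenheit *difference* to Celsius (scale only, no 32 offset)."""
    return delta_f * 5.0 / 9.0


if __name__ == "__main__":
    print(f"91.5F ambient = {f_to_c(91.5):.2f} C")
    print(f"107F to 120F  = {f_to_c(107):.1f} C to {f_to_c(120):.1f} C")
    print(f"50F hotter    = {f_delta_to_c_delta(50):.1f} C hotter")
    # Treating the 50F difference as an absolute reading gives the wrong answer
    print(f"(wrong) f_to_c(50) = {f_to_c(50):.1f} C")
```

The offset of 32 cancels out when two readings are subtracted, so only the 5/9 scale factor applies to a difference: 50F hotter is roughly 27.8 C hotter, not 10 C.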

    2. NeilPost

      Re: doesn't make sense

      Sounds like hw (fan?) failure in a storage chassis causing it to cook. Perhaps misconfigured for auto-shutdown on temperature thresholds?

  6. Anonymous Coward

    What kind of idiot designs a data center that isn't able to be cooled to below 60F when it's 120F outside?

    If you don't have to put on a coat to go in the server room, your server room is too hot.

    1. Ken Moorhouse Silver badge

      Re: If you don't have to put on a coat to go in the server room, your server room is too hot

      Office I used to work in was on the same A/C circuit as the server room (that's the one I've written about before, where the smell of curry becomes evident as the day wears on). Even on the hottest days I had goose-bumps. At the end of a working day I'd go out the building wearing jumper and jacket. People on the train undoubtedly thought I was a nutter (people in the office thought I was a nutter, but that's a different story). It was only when I was nearly home that the heat finally registered.

    2. Martin an gof Silver badge

      If you don't have to put on a coat to go in the server room, your server room is too hot.

      I've never really understood that. Surely the key metric is component temperature within the server and supply air flow through the server. A single server with plenty of air flow could conceivably work perfectly well in a room at 25 or 30C. Surely the job of the room climate control is to ensure the server doesn't overheat, and while in the Bad Old Days the easiest way to do that was to keep the whole room at coat-on temperatures, these days there must be better solutions? A warmer room can also save a heck of a lot of energy.

      The trick then is to make sure that when heat load does increase there is enough capacity in the a/c to cope.

      Disclaimer: not a server bod and never designed a data centre :-)

      M.

      1. Nate Amsden

        Most server components are designed to operate up to 40C / 104F, and have been for a very long time (10-15+ years). Some components can go well beyond that (and some others can't run at 40C).

        I think Amazon did a test back probably before 2008 running I think a rack of HP hardware literally outside under a tree or something for some period of time just to see how it handled the temperatures/humidity and even dust etc, and from what I recall it worked fine.

        Microsoft had an incredibly innovative data center pod design (IT PAC) many years ago (I don't know if they ever used it at any sort of scale).

        https://www.youtube.com/watch?v=S3jd3qrhh8U

        Many hyperscalers, at least at one point (and probably still in some cases), like(d) to run their stuff at 30C+ / 90F to reduce their cooling costs since the hardware can handle it, but of course that gives less margin for error when there is an issue. Certainly would suck to work in such a facility.

    3. Anonymous Coward

      I take it you haven't been in a modern datacentre recently?

  7. NeilPost

    Rust Pedantry

    “Google understands the frailties of spinning rust so will presumably have replicated data on multiple devices and understand how to recover the devices. If data is lost, it will dent the G-Cloud's reputation.”

    Although it was a jovial sound-bite... aluminium drive platters don’t rust - esp. sealed in a drive unit.

    1. Anonymous Coward

      Re: Rust Pedantry

      The rust referred to isn't the platter itself rusting. It refers to the thin magnetic coating on the platters, which was originally iron(III) oxide (i.e. rust).

      As far as I can tell it has been a long time since it's been actual rust, and I believe modern platters are coated with a cobalt-based alloy.
