Microsoft admits 'power issue' downed Azure services in West Europe

Microsoft techies are trying to recover storage nodes for a "small" number of customers following a "power issue" on October 20 that triggered Azure service disruptions and ruined breakfast for those wanting to use hosted virtual machines or SQL DB. The degradation began at 0731 UTC on Friday when Microsoft spotted the …

  1. elsergiovolador Silver badge

    Future

    If we don't upgrade our grid and infrastructure, the future is clear:

    - Our database cluster is down!!!

    - Oh where?

    - In the UK region.

    - What time is there?

    - It's nearing 5.

    - Damn, they put their kettles on!

    1. Missing Semicolon Silver badge

      Re: Future

      Kettles are at 3:30. (and 10am of course).

      1. spireite

        Re: Future

        In my office, kettle time came around every 15 minutes on average.

    2. Captain Scarlet
      Coat

      Re: Future

      Halfway through Emmerdale, EastEnders and Coronation Street is when we have peak kettle demand.

      1. Anonymous Coward
        Anonymous Coward

        Re: Future

        Less so in these days of streaming. When Dinorwig was built, it was a massive thing.

        1. John Brown (no body) Silver badge
          Coat

          Re: Future

          "When Dinorwig was built, it was a massive thing."

          Has it got smaller with use? Worn down a bit? It was still massive when I last visited a few years ago :-)

    3. Zippy´s Sausage Factory
      Meh

      Re: Future

      Thing is, all the power companies are privatised now, meaning the shareholders get paid first and improvements are only achieved by going to the government with the begging bowl in hand.

      Which is much the same as before they were privatised, but now they're more "efficient". For the shareholders, obviously, not the customers.

      1. Anonymous Coward
        Anonymous Coward

        Re: Future

        Agreed. The current electricity market is insane. And then there's the constant battle between the generators trying to find new ways to game the balancing system and ESO/Ofgem trying to stop them.

  2. Anonymous Coward
    Anonymous Coward

    I wonder if this explains the extreme lag and multi-second freezes on our Azure VMs on Friday, and the need to get one of them rebooted by our overworked hardware guys, as the remote desktop session was stuck saying "Please Wait".

    If these go down completely, then at least I have an excuse to stop working, but intermittent slowdowns are just like water torture: I have to slow my brain down to the speed of someone in sales and marketing so that the connection has time to handle the mouse clicks.

  3. Pascal Monett Silver badge
    FAIL

    "A subset of those generators supporting that section failed to take over as expected"

    That sounds suspiciously close to "we designed a failover system, but we didn't test it".

    That subset of generators, might they have been missing fuel because nobody thought to fill the tanks?

  4. Eye Know

    If only it were so simple

    We had a VM go down and Azure support were non-existent: a manager emailed after some time to say they had no one available to help. This went on for most of the day, and I gave up at 7pm on Friday night.

    In the end I found a fix. The VM would not start because the load balancer's IP address was marked as already in use by the downed VM itself.

    After taking some screenshots of the load balancer I deleted and recreated it, and the VM started. Two minutes later I finally got a call from Azure support, so I compared notes and closed the case.
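
    In case it helps anyone in the same hole, here is a minimal sketch of those recovery steps using the Azure Python SDK. The resource names are placeholders, and the assumption that a plain delete-and-recreate of the load balancer is what clears the "IP already in use" state comes from the account above rather than any official guidance.

        # pip install azure-identity azure-mgmt-network azure-mgmt-compute
        from azure.identity import DefaultAzureCredential
        from azure.mgmt.network import NetworkManagementClient
        from azure.mgmt.compute import ComputeManagementClient

        SUB_ID = "<subscription-id>"                      # placeholder
        RG, LB_NAME, VM_NAME = "my-rg", "my-lb", "my-vm"  # hypothetical names

        cred = DefaultAzureCredential()
        net = NetworkManagementClient(cred, SUB_ID)
        compute = ComputeManagementClient(cred, SUB_ID)

        # 1. Capture the load balancer's current configuration (the "screenshots" step).
        lb = net.load_balancers.get(RG, LB_NAME)

        # 2. Delete the load balancer that still claims the downed VM's address.
        net.load_balancers.begin_delete(RG, LB_NAME).result()

        # 3. Recreate it from the saved configuration.
        net.load_balancers.begin_create_or_update(RG, LB_NAME, lb).result()

        # 4. With the address no longer marked in use, start the VM.
        compute.virtual_machines.begin_start(RG, VM_NAME).result()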

  5. Anonymous Coward
    Terminator

    An upstream utility disturbance?

    "Due to an upstream utility disturbance, we moved to generator power for a section of one datacenter at approximately 0731 UTC. A subset of those generators supporting that section failed to take over as expected during the switch over from utility power, resulting in the impact."

    Shouldn't these storage nodes be designed to self-recover in the event of an “upstream utility disturbance”?

    To speculate: the diesel generators failed to kick in because their fuel had been contaminated with water. Water collects in the bottom of the fuel tank and, when its level rises to the level of the fuel take-off pipe, gets sucked into the diesel engine. I guess no-one was tasked with regularly checking the tanks. Generally uninterruptible power supplies (UPS) have about five minutes of battery power before the generators kick in. Maybe this should be extended to a good few hours or a whole day.

    "Decades of innovation, investment and better management mean that, overall, critical IT systems, networks and datacenters are far more reliable than they were."

    If you knew how unreliable critical infrastructure really was, you wouldn't sleep at night. It comes down precisely to lack of investment. You end up with something like this:

    The passenger info boards went down at a major airport because a digger dug up the fibre cable to the nearest cloud provider. Why didn't they have a back-up cable, you may ask? They did, except the back-up cable went through the same pipe as the primary.

    1. Anonymous Coward
      Anonymous Coward

      Re: An upstream utility disturbance ?

      I think it's simpler. I bet the power control systems are based on Windows, and somewhere a modal popup window had appeared, as usual hidden BEHIND all the windows on the screen (because that's where it can do the most damage), which basically stopped the controls from working until someone found it and clicked it. As it's Microsoft, it was probably over something useless like 'do you want to see more advertising?'.

      Or the controls were not updated and it got stuck on Windows Vista's "You moved the mouse. Accept Y/N?"

      The physical aspects are fixable. Microsoft's products are not.

    2. John Brown (no body) Silver badge

      Re: An upstream utility disturbance ?

      "Maybe this should be extended to a good few hours or a whole day."

      There was a story on El Reg sometime over the last week or three saying MS or AWS was looking at replacing diesel gennies with larger battery banks for their backup UPS. Not a bad idea on the face of it, but a battery backup will have an extremely sharp cut-off point when the batteries run out, whereas a diesel genny can be refuelled indefinitely in the case of a prolonged outage.
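
      To put rough numbers on that cut-off, here is a back-of-envelope sketch; the 2 MW section load and the 90 per cent usable-capacity figure are assumptions for illustration, not anything from the article.

          # Back-of-envelope battery sizing for different ride-through times.
          # The load and usable-capacity figures are illustrative assumptions.
          section_load_mw = 2.0    # assumed IT load for one datacenter section
          usable_fraction = 0.9    # assumed fraction of battery capacity usable

          for label, hours in [("5 minutes", 5 / 60), ("4 hours", 4.0), ("24 hours", 24.0)]:
              required_mwh = section_load_mw * hours / usable_fraction
              print(f"{label:>10}: ~{required_mwh:.1f} MWh of battery")

          # Roughly 0.2 MWh for five minutes versus ~53 MWh for a full day:
          # about 300x the storage, and still a hard stop when it runs out,
          # whereas a genny keeps going for as long as the fuel bowsers do.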

  6. DCdave
    Joke

    Any suggestion

    ...that the control systems were running on an Azure VM affected by the outage are pure speculation, if not entire fiction on my part.

  7. PeterM42
    FAIL

    Ah! - the pleasures of cloud computing.

    You know it makes sense!

    - OH! WAIT!........
