Kubernetes kicks down Azure Front Door

If you struggled to access the Azure Portal or Microsoft Entra this morning, you weren't alone – Microsoft has blamed a Kubernetes crash for the outage. The Windows giant noted problems from 0740 UTC, with multiple regions around the world reporting issues with its services. "Our monitoring detected a significant capacity …

  1. Alister

    It wasn't just accessing Portals or Entra, any website behind a FrontDoor was suffering outages as well.

    I have a lot of unhappy clients today.

    1. IFYates

      Yes, for a service designed to give "higher availability, reduced latency, increased scalability", it's caused us many hours of grief today. We won't know how much business we've lost until EOD, but it's a real blow to our trust in MS.

      It didn't help that it took them 4 hours to even report it on the Azure Service Health page, leading us to think we had networking issues until I spoke to our MS handler, who immediately told us it was a known issue.

      1. Jibberboy2000

        Trust???

        … is that what you had in Microsoft, ahhhh bless …. obviously you have not been working long enough with them yet!

        Prepare for more ongoing disappointment… sorry for your loss

        1. Screepy

          Re: Trust???

          Yeah it whacked our org website as well.

          Also had issues with our Intune portal.

          Was messy indeed. Web dev lobbing tickets at infrastructure team for two hours this morning. Infra team desperately trying to work out if it's us or a wider problem.

          MS really need to sort out their status pages, it's infuriating troubleshooting something that they just haven't admitted to yet.

          1. OhForF' Silver badge
            Boffin

            Re: Trust???

            A dedicated professional infra team with access to all the internal information and documentation needing hours to figure out if the problem is on their side or the network or a service provider speaks volumes about the design of modern systems.

            1. Anonymous Coward

              Re: Trust???

              OTOH it deprecated our BOFH excuse calendar.

              If some of the (non-MS) sites show MS problems, we can always say: there is an MS outage. Not our problem.

              Then we can verify that it really is not our problem, while the curses head in the Redmond direction.

              Win-Win

  2. Essuu
    Holmes

    Bitten by Bitnami?

    We've had a lot of fun dealing with Broadcom's changes at Bitnami causing problems in Kubernetes deployments.

    Maybe Microsoft engineers didn't read the small print...

  3. Folda

    Not Microsoft's fault?

    The fault took down their Front Door CDN services. I can understand using Kubernetes for the control planes, but why does AFD run on Kubernetes? A CDN needs to have low level hardware access so that it can control TCP settings, optimise for hardware offloading, etc. The whole machine is dedicated to the job, so there isn't really a need for any containerisation. This fault should have been an inconvenience at best - not allowing AFD policies to be updated. It shouldn't have taken down the entirety of Microsoft's CDN infrastructure.

    1. Rob F

      Re: Not Microsoft's fault?

      This reminds me of the VMware PSOD from the E1000 NICs, where the VM sent wonky commands to the kernel and caused the node to topple over. Got even better when the same problematic VM recovered onto the next node and caused that to pop. One client ended up having the entire production environment dirty crashed because of a VPN concentrator. That was a fun explanation when it happened twice.

      1. Just a geek

        Re: Not Microsoft's fault?

        There was a similar issue on Broadcom FC cards. They had some sort of counter bug: once you sent a certain amount of data, the card would crash, bringing the host down. The traffic load would fail on one card, the next would pick it up, that would fail so the host would crash, the VMs would fail over, then the cycle would start again.

    2. Charlie Clark Silver badge

      Re: Not Microsoft's fault?

      Actually, I don't think it matters which bits of software were running on which systems. Microsoft contracts to manage all this for its paying users and pretends to ensure that it has the developers and testing programmes to make sure this is the case. It failed, should apologise to all, and offer some kind of recompense to those affected.

      Is anyone keeping score of outages at the various gatekeepers? I seem to recall that Microsoft is handily out front…

  4. cb7

    Put everything in the cloud. It's more resilient. We won't have to buy and maintain the hardware. Or the software. Someone else will have the headache of keeping it all running.

    The cloud is the future.

    What could possibly go wrong?

    1. Yorick Hunt Silver badge
      Trollface

      Don't worry, they just need to sprinkle some more AI onto it; that'll fix it for sure!

    2. Mike007 Silver badge

      Obviously a 2D chess player...

      If an airline loses their IT systems and has to cancel all flights, they have a major PR problem. When every airline goes down at the same time because an AV vendor pushed an untested update, they just take a small hit to their short term profits. People don't even really blame them for the fact that they are still offline a week after everyone else has fully recovered.

      This is why 3D chess players put their eggs in whatever basket everyone else is using.

    3. Jibberboy2000

      Cloud is the future ….

      …. just legacy code and archaic thinking patterns of always-on bloated servers don’t fit into the cloud, and they should all be retired as soon as possible

  5. Just a geek

    We had cert issues across the Intune page and I did wonder if that caused part of the stack to fall over.

  6. IGnatius T Foobar !

    Incompetent penguin wranglers

    If you struggled to access the Azure Portal or Microsoft Entra this morning, you weren't alone – Microsoft has blamed a Kubernetes crash for the outage.

    That means they're running it on Linux. Good for them, but leave it to Microsoft to find a way to make even Linux unreliable.

    1. Sparky7

      Re: Incompetent penguin wranglers

      Huh? Kubernetes runs on Windows as well

  7. Cloudia Shiffer

    My Azure portal sessions were fine - logged in at 08:15. However, the web cert showed it expired in July25 yet carried on working. Odd?
