Microsoft Azure challenges AWS for downtime crown

Microsoft Azure has been experiencing a global outage since around 1600 UTC, or 0900 PDT on Wednesday, October 29, 2025. The company expects that services will be fully restored by 23:20 UTC, or about 16:20 PDT this afternoon. The outage is occurring somewhat inconveniently as Microsoft reports its FY26 Q1 earnings, during …

  1. David 132 Silver badge
    Black Helicopters

    How many outages is that now?

    Are we at the "happenstance", "coincidence", or "enemy action" stage of the count now?

    1. Anonymous Coward
      Anonymous Coward

      Re: How many outages is that now?

      Regardless of the stage, brittle & fragile systems with critical dependencies globally …. that seem too complicated to manage.

    2. phils

      Re: How many outages is that now?

      I'm still at Hanlon's razor. Assuming stupidity, not malice.

    3. Dave559

      Re: How many outages is that now?

      So, where can we place bets as to which of Oracle or Google that it will be next week…?

      (If they’re being hacked failing in alphabetical order, then it’ll be Google next, right?)

    4. Rich Harding

      Re: How many outages is that now?

      You left out the intentional use of "friendly fire" to wake a few important people up to the current, unsustainable situation.

  2. IGotOut Silver badge

    Reverted to last known good.

    Customers are advised to roll back their clocks to 2003 in order to access the services.

    1. Strahd Ivarius Silver badge
      Facepalm

      Re: Reverted to last known good.

      redirect traffic from Azure Front Door to their own servers

      2003 being the last time they had their own servers...

  3. Phil E Succour
    FAIL

    They should have asked co-pilot to handle the reconfiguration for them..

    Oh wait, perhaps they did?

    1. elsergiovolador Silver badge

      Re: They should have asked co-pilot to handle the reconfiguration for them..

      They almost certainly didn't add "This time ensure the change works."

  4. spireite
    Trollface

    They'll both employ Trump to promote them.

    After all, nobody knows cloud like he does - it contains many recoverable versions of the Epstein files.

    1. John Brown (no body) Silver badge

      Yeah, Trump is good at shouting at clouds!

  5. MatthewSt Silver badge
    Mushroom

    SPOF

    So much for rolling out changes region by region...

    1. Anonymous Coward
      Anonymous Coward

      Re: SPOF

      Well, it was nice of them to mostly create borkage during the USA region's main working hours for a change (apologies to Canada, Mexico and South America for the collateral damage), rather than screwing up everyone in Europe's morning coffees as the unwilling 'beta testers' (when leftpondia is blissfully still asleep), as usually seems to happen!

      On the other hand, it seems that the USA potentially no longer needs to actually physically invade a country to bring down governments and inflict regime change: they can just bring down their parliamentary voting systems instead (who on earth thought that it would be sensible to outsource - and presumably remotely host - something as mission-critical as that?!!).

      "Meanwhile, business at the Scottish Parliament was suspended because of technical issues with the parliament's online voting system.

      The outage prompted a postponement of debate over land reform legislation that could allow Scotland to intervene in private sales and require large estates to be broken up.

      A senior Scottish Parliament source told BBC News they believed the problems were related to the Microsoft outage."

      1. mirachu Bronze badge

        Re: SPOF

        If a parliament is using a cloud-based voting system without a backup (paper still exists), they deserve what they get.

        1. Anonymous Coward
          Anonymous Coward

          Re: SPOF

          And they should check afterwards that nobody did change the votes...

  6. williamyf Bronze badge

    Homogeneous Hybrid+Multicloud is the answer

    The kerfuffle with AWS and Microsoft is further evidence that most companies are doing "cloud" wrong.

    1.) A company needs to start with a solid HYBRID cloud strategy, where your most critical workloads never leave your DCs, and non-critical workloads can move seamlessly between public cloud (to free up resources in your DC for critical workloads) and private cloud (to save on costs).

    2.) You need to have a multi-cloud strategy for your public cloud, where your workloads can move seamlessly from public cloud A to public cloud B.

    Problem is, many companies do not do this, and most of the ones that do, do it wrong.

    There are very few solutions that allow this to happen, and Amazon and Google ain't in the list. The biggest contenders are OpenStack and Azure (yes, even with this outage).

    If you do multicloud with AWS and GCP, you need to target the lowest common denominator AND re-invent multiple wheels. OpenStack? "Mostly" the same everywhere. Azure? The same everywhere.

    People say OpenStack is hard, and that's true (I know, I was an OpenStack technical trainer), but it is not harder than all the code and procedures needed to make two dissimilar clouds (like AWS and GCP) dance together, let alone put them in your DC for the HYBRID part...

    Telcos and small cloud providers worldwide are very invested in OpenStack, it should not be hard to find two good ones for your Multi-Cloud needs.

    Ditto for Azure providers that use their own servers (and therefore were not affected today), instead of reselling Microsoft services.

    Good luck in your migration, and if you need more help, contact me.

    1. Nate Amsden Silver badge

      Re: Homogeneous Hybrid+Multicloud is the answer

      that sounds like a lot of work and costs to try to make things seamless, just keep it simple if your critical stuff is on prem keep the non critical stuff there too, it won't cost much more, you'll probably end up saving a bunch anyway because you won't need all that extra work to wrangle multiple clouds and complex things like OpenStack and keep in mind you can't oversubscribe in any of the hyperscale IaaS clouds, you pay for what you provision not what you use. Unless you have a really good handle on provisioning stuff and deprovisioning when it is not in use on cloud providers(most don't, hell there are often dedicated roles for people that do nothing more than cost analysis/management for public cloud at some companies), vs on prem you can just let stuff sit idle, CPU/disk isn't used much so the capacity can be used elsewhere, memory is still used to some extent.

      At the end of the day it depends on the situation; no company I have worked at in the last 25 years would benefit from anything other than on prem. Though there have been PLENTY of people at different companies that WANTED to use public cloud, really for no other reason than that they thought it was cool and "on trend" (same sort of folks pushing for kubernetes, which solves problems that we didn't have). Others WANTED to use public cloud because they thought it would be cheaper, but in the end they were proven wrong (by a hysterical amount of money).

    2. Claptrap314 Silver badge

      Re: Homogeneous Hybrid+Multicloud is the answer

      I realized, about twenty-five years ago, that the answer to EVERY interesting problem in IT is "it depends". I don't know if this has ever been more true than for the case of cloud architectures.

      1) Early stage startups are going to see a substantial negative ROI for doing a hybrid cloud. Where does the line cross? A long, long way out. Hybrid cloud means that you need one team who knows what they are doing for each "style" you are using. You're bleeding $100k/month to avoid a problem where the MTBF is more than a year? Not happening.

      2) Is Netflix still 100% AWS? It's been a while, but they certainly were well past the point that the cloud-avoidant would consider reasonable. Their CTO was not about wasting money.
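
      The arithmetic behind point 1 can be made explicit. This is only a rough sketch using the figures quoted above ($100k/month, an MTBF of at least a year); the function name is made up for illustration, and it ignores partial outages, reputational damage and everything else a real business case would include.

```python
def breakeven_outage_cost(monthly_spend: float, mtbf_months: float) -> float:
    """Outage cost at which the extra resilience spend pays for itself.

    Between two outages you expect to pay monthly_spend * mtbf_months,
    so a single avoided outage must cost at least that much to justify it.
    """
    if monthly_spend < 0 or mtbf_months <= 0:
        raise ValueError("monthly_spend must be >= 0 and mtbf_months > 0")
    return monthly_spend * mtbf_months


# Figures from the comment: $100k/month, MTBF of (at least) 12 months.
threshold = breakeven_outage_cost(100_000, 12)
print(f"Break-even outage cost: ${threshold:,.0f}")
# prints "Break-even outage cost: $1,200,000"
```

      In other words, unless one outage costs you north of $1.2M, the hybrid spend loses money, and a longer MTBF only pushes that threshold higher.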

    3. GNU Enjoyer
      Angel

      Re: Homogeneous Hybrid+Multicloud is the answer

      Someone else's computer is always more expensive than hosting yourself if you are hosting something for more than 3 months.

      If a workload is non-critical, I'm sure it can run fine on your computer(s) as well.

      Clown hosting is only ever useful for temporary testing if you don't have a suitable computer available at the moment.

    4. Anonymous Coward
      Anonymous Coward

      Re: Homogeneous Hybrid+Multicloud is the answer

      Ha ha hah ha, haven't laughed that loud in a long time. According to you the answer to increasing up time is to make things more complicated. Oh my. And let's create multiple single points of failure so that your apps fail if any one of the major players has an outage. Hah ha ha.

      Remind me not to call you when I want to increase reliability of my apps.

      Multi-cloud is a BS idea that increases cost, complexity, and potential failures at soooo many levels. By all means do multi-cloud, but not for those reasons.

      Architect things properly for the reliability you need, and forget the utopia of 100% up time - you don't need it and you certainly can't afford it. Plan for failure regardless of where your systems are and you'll be in a good (better) place.

  7. Pascal Monett Silver badge

    "due to their reliance on the Microsoft Azure platform"

    Well there's your problem. Stop relying on someone else's server and bring back your expertise in-house and secure.

    Oh, that would cost you too much ? Ain't that a shame. Too bad you didn't think about that when you shut down your own servers and fired your admins.

    I wonder if you're going to go back to that really smart guy who pushed for using The CloudTM and ask him for his bonus back ?

    Nah ? Didn't think so . . .

    1. Fred Daggy

      Re: "due to their reliance on the Microsoft Azure platform"

      ... no ... there's more.

      Burstable workloads are much better in the cloud. Specifically, a baseload "on-premises" and then overloading to the cloud.

      But that probably only works for the mega-corps. Anyone doing less than ca 10-billion zorkmids is not going to burst that much traffic. Not an edge case, but not far off.

    2. Anonymous Coward
      Anonymous Coward

      Re: "due to their reliance on the Microsoft Azure platform"

      Why does owning hardware increase reliability? Spoiler alert, it doesn't, but it does make some people 'feel' more secure. Don't care if it is AWS, Azure or some GSI, or even my own company...owning hardware has absolutely zero to do with reliability, other than perhaps being inversely proportional to.

      Own the reliability of your app and the people/skills to keep it running and recover it WHEN it DOES go wrong, because it will!! Owning hardware is for people who are insecure in their ability to manage their own systems properly.

      1. Anonymous Coward
        Anonymous Coward

        Re: "due to their reliance on the Microsoft Azure platform"

        Owning and properly managing hardware is for companies that keep seeing outages every week (looking at you, MS) that impact their business, from cloud providers who don't care at all about them.

        When you are in a factory, you can't have deliveries delayed because someone half the world away made an oopsie and now you can't print the order form because some bigwig decided to go all cloudy.

  8. Claptrap314 Silver badge
    FAIL

    Azure has never given up the crown

    A single dies horribilis for AWS doesn't even begin to challenge Azure.

  9. Felonmarmer

    "Microsoft says it has reverted to its "last known good" configuration"

    If only they did the same with the operating system.

    1. Roland6 Silver badge

      It probably is the same as the OS, just as we know restoring to a previous checkpoint isn’t without problems and far too frequently only restores the system so that you can do a repair reinstall rather than a factory reset.

    2. WolfFan Silver badge

      They’d lose too much stuff if they stepped back to 2009.

      1. seven of five Silver badge

        but nothing of value

  10. Anonymous Coward
    Anonymous Coward

    I assume there’s an Azure Back Door for completeness?

    1. Anonymous Coward
      Anonymous Coward

      That would have to be called Azure Back Orifice, I guess… >:-)>

  11. Blue Screen of Bleurgh

    "Rohit Chopra, a former FTC Commissioner and a former director of the Consumer Financial Protection Bureau, in a social media post, said the recent AWS and Azure outages have created chaos in the business community.

    "We need to accept that the extreme concentration in cloud services isn't just an inconvenience, it's a real vulnerability," he said."

    -----------

    Who needs a nuclear arms race these days, when all you have to do is pull a plug, flick a switch or upload some non-QA code into an already flaky cloud service, and wait for the House of AWS/Azure Cards to come tumbling down with an already mighty crash and taking half the world's major websites with it.

    But not to worry - Microsoft and Amazon top brass still feel the need to bin thousands of white-collar coders and data centre grunts with shedloads of hands-on experience, and replace them with AI bots and freshers. What could possibly go wrong!

    1. Strahd Ivarius Silver badge
      Devil

      Why would MS keep American white collars managing the DoD cloud, when Chinese (not Taiwanese) ones are cheaper?

  12. telveer

    There is a subtle difference.

    AWS's recent outage was in us-east-1, a specific region. Our apps operating in us-east-2 were not affected.

    We also use Azure. Azure's outage yesterday and the one a few weeks ago were global because their global service, Front Door, broke. This means you are impacted regardless of the Azure region you operate from.
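
    To illustrate the difference, here is a toy model (not anyone's real architecture): a deployment is reachable only if its own region is up and every global service in front of it is up. The app names, the Azure-side region labels and the "front-door" dependency tag are placeholders I made up for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    region: str
    # Global (non-regional) services this deployment sits behind.
    global_deps: frozenset = frozenset()


def is_reachable(dep, failed_regions, failed_global_services):
    """A deployment is up only if its region and all its global deps are up."""
    if dep.region in failed_regions:
        return False
    return not (dep.global_deps & failed_global_services)


def survivors(deployments, failed_regions=frozenset(),
              failed_global_services=frozenset()):
    """Names of the deployments still reachable under the given failures."""
    regions = frozenset(failed_regions)
    services = frozenset(failed_global_services)
    return [d.name for d in deployments if is_reachable(d, regions, services)]


# Hypothetical estates: two regional apps, and two apps behind a global edge.
aws = [Deployment("app-a", "us-east-1"), Deployment("app-b", "us-east-2")]
azure = [
    Deployment("app-c", "region-1", frozenset({"front-door"})),
    Deployment("app-d", "region-2", frozenset({"front-door"})),
]

print(survivors(aws, failed_regions={"us-east-1"}))             # ['app-b']
print(survivors(azure, failed_global_services={"front-door"}))  # []
```

    A regional failure leaves the other region standing; a failure in the shared global service takes out everything behind it, however many regions you spread across.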

    1. Expect Great Things
      Headmaster

      The impact of a us-east-1 Route 53 outage can be global, though. To quote Amazon:

      Several AWS services create resources that provide a resource-specific DNS name(s). For example, when you provision an Elastic Load Balancer (ELB), the service creates public DNS records and health checks in Route 53 for the ELB. This relies on the Route 53 control plane in us-east-1. Other services that you use might also need to provision an ELB, create public Route 53 DNS records, or create Route 53 health checks as part of their control plane workflows. For example, provisioning an Amazon API Gateway REST API resource, Amazon ELB load balancer, or an Amazon OpenSearch Service domain all result in creating DNS records in Route 53. The following is a list of services whose control plane depends on the Route 53 control plane in us-east-1 to create, update, or delete DNS records, hosted zones, and/or create Route 53 health checks. <long list of Amazon services follows>
