How many outages is that now?
Are we at the "happenstance", "coincidence", or "enemy action" stage of the count now?
Microsoft Azure has been experiencing a global outage since around 1600 UTC, or 0900 PDT on Wednesday, October 29, 2025. The company expects that services will be fully restored by 23:20 UTC, or about 16:20 PDT this afternoon. The outage is occurring somewhat inconveniently as Microsoft reports its FY26 Q1 earnings, during …
Well, it was nice of them to mostly create borkage during the USA region's main working hours for a change (apologies to Canada, Mexico and South America for the collateral damage), rather than screwing up everyone in Europe's morning coffees as the unwilling 'beta testers' (when leftpondia is blissfully still asleep), as usually seems to happen!
On the other hand, it seems that the USA potentially no longer needs to actually physically invade a country to bring down governments and inflict regime change: they can just bring down their parliamentary voting systems instead (who on earth thought that it would be sensible to outsource - and presumably remotely host - something as mission-critical as that?!!).
The outage prompted a postponement of debate over land reform legislation that could allow Scotland to intervene in private sales and require large estates to be broken up.
A senior Scottish Parliament source told BBC News they believed the problems were related to the Microsoft outage.
The kerfuffle with AWS and Microsoft is more evidence that most companies are doing "cloud" wrong.
1.) A company needs to start with a solid HYBRID cloud strategy, where your most critical workloads never leave your DCs, and non-critical workloads can move seamlessly between public cloud (to free up resources in your DC for critical workloads) and private cloud (to save on costs)
2.) You need to have a multi-cloud strategy for your public cloud, where your workloads can move seamlessly from public cloud A to public cloud B.
Problem is, many companies do not do this, and most of the ones that do, do it wrong.
There are very few solutions that allow this to happen, and Amazon and Google ain't in the list. The biggest contenders are OpenStack and Azure (yes, even with this outage).
If you do multi-cloud with AWS and GCP, you need to target the lowest common denominator AND re-invent multiple wheels. OpenStack? "Mostly" the same everywhere. Azure? The same everywhere.
People say OpenStack is hard, and that's true (I know, I was an OpenStack technical trainer), but it's not harder than all the code and procedures needed to make two dissimilar clouds (like AWS and GCP) dance together, let alone putting them in your DC for the HYBRID part...
Telcos and small cloud providers worldwide are heavily invested in OpenStack, so it should not be hard to find two good ones for your Multi-Cloud needs.
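To illustrate the "same (or mostly the same) everywhere" point, here is a minimal openstacksdk sketch; the cloud names, image and flavor are made-up entries that would live in your clouds.yaml, and a real deployment would also need networks and keypairs:

    # Minimal sketch: the same openstacksdk calls work against any OpenStack
    # provider defined in clouds.yaml (cloud/image/flavor names are made up).
    import openstack

    def boot_server(cloud_name, server_name):
        # Credentials and endpoints come from clouds.yaml; only the name differs.
        conn = openstack.connect(cloud=cloud_name)
        image = conn.compute.find_image("ubuntu-22.04")
        flavor = conn.compute.find_flavor("m1.small")
        server = conn.compute.create_server(
            name=server_name, image_id=image.id, flavor_id=flavor.id)
        return conn.compute.wait_for_server(server)

    # Same workload, two unrelated OpenStack providers:
    boot_server("telco_cloud_eu", "app-frontend-1")
    boot_server("regional_provider_us", "app-frontend-2")

The point is not that this is production-ready, but that nothing in it is provider-specific, which is exactly what you lose when stitching AWS and GCP together.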
Ditto for Azure providers that use their own servers (and therefore were not affected today), instead of reselling Microsoft services.
Good luck in your migration, and if you need more help, contact me.
That sounds like a lot of work and cost to try to make things seamless. Just keep it simple: if your critical stuff is on prem, keep the non-critical stuff there too. It won't cost much more, and you'll probably end up saving a bunch anyway because you won't need all that extra work to wrangle multiple clouds and complex things like OpenStack. Keep in mind you can't oversubscribe in any of the hyperscale IaaS clouds: you pay for what you provision, not what you use, unless you have a really good handle on provisioning things and deprovisioning them when they're not in use (most don't; hell, at some companies there are dedicated roles for people who do nothing but cost analysis/management for public cloud). On prem you can just let stuff sit idle: CPU/disk isn't used much so the capacity can be used elsewhere, and memory is still used to some extent.
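To put rough, purely illustrative numbers on the "pay for what you provision" point (the hourly rate below is assumed, not quoted from any provider):

    # Illustrative only: hypothetical hourly rate for a mid-sized cloud VM.
    hourly_rate = 0.20            # $/hour, assumed
    hours_per_month = 730

    always_provisioned = hourly_rate * hours_per_month   # ~$146/month, even if idle
    business_hours_only = hourly_rate * 10 * 22          # ~$44/month, but only if
                                                          # someone actually tears it
                                                          # down every evening
    print(always_provisioned, business_hours_only)

The saving only exists if somebody (or some automation) does the deprovisioning; an idle box in your own rack costs nothing beyond what you already paid for it.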
At the end of the day it depends on the situation; no company I have worked at in the last 25 years would benefit from anything other than on prem. Though there have been PLENTY of people at different companies who WANTED to use public cloud for no other reason than that they thought it was cool and "on trend" (the same sort of folks pushing for Kubernetes, which solves problems we didn't have). Others WANTED to use public cloud because they thought it would be cheaper, but in the end they were proven wrong (by a hysterical amount of money).
I realized, about twenty-five years ago, that the answer to EVERY interesting problem in IT is "it depends". I don't know if this has ever been more true than for the case of cloud architectures.
1) Early-stage startups are going to see a substantial negative ROI for doing hybrid cloud. Where does the line cross? A long, long way out. Hybrid cloud means you need one team that knows what it's doing for each "style" you are using. You're bleeding $100k/month to avoid a problem where the MTBF is more than a year? Not happening.
2) Is Netflix still 100% AWS? It's been a while, but they certainly were well past the point that the cloud-avoidant would consider reasonable. Their CTO was not about wasting money.
Someone else's computer is always more expensive than hosting yourself if you are hosting something for more than 3 months.
If a workload is non-critical, I'm sure it can run fine on your computer(s) as well.
Clown hosting is only ever useful for temporary testing if you don't have a suitable computer available at the moment.
Ha ha hah ha, haven't laughed that loud in a long time. According to you the answer to increasing uptime is to make things more complicated. Oh my. And let's create multiple single points of failure so that your apps fail if any one of the major players has an outage. Hah ha ha.
Remind me not to call you when I want to increase reliability of my apps.
Multi-cloud is a BS idea that increases cost, complexity, and potential failures at soooo many levels. By all means do multi-cloud, but not for those reasons.
Architect things properly for the reliability you need, and forget the utopia of 100% up time - you don't need it and you certainly can't afford it. Plan for failure regardless of where your systems are and you'll be in a good (better) place.
Well there's your problem. Stop relying on someone else's server and bring back your expertise in-house and secure.
Oh, that would cost you too much? Ain't that a shame. Too bad you didn't think about that when you shut down your own servers and fired your admins.
I wonder if you're going to go back to that really smart guy who pushed for using The Cloud™ and ask him for his bonus back?
Nah? Didn't think so...
... no ... there's more.
Burstable workloads are much better in the cloud. Specifically, a baseload "on-premises" with the overflow bursting to the cloud.
But that probably only works for the mega-corps. Anyone doing less than ca. 10 billion zorkmids is not going to burst that much traffic. Not an edge case, but not far off.
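As a toy sketch of the baseload-plus-burst idea (the capacity figure and pool names are hypothetical): keep jobs on the on-prem pool until it is saturated, then spill the overflow to the cloud.

    # Toy spill-over policy: base load stays on prem, peaks go to the cloud.
    # The capacity figure and pool names are made up for illustration.
    ON_PREM_CAPACITY = 100    # concurrent jobs the on-prem pool can absorb

    def place_job(on_prem_running):
        """Decide where a newly submitted job should run."""
        if on_prem_running < ON_PREM_CAPACITY:
            return "on_prem"       # already paid for, so fill it first
        return "cloud_burst"       # pay per hour only while the peak lasts

    print(place_job(42))     # -> on_prem
    print(place_job(150))    # -> cloud_burst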
Why does owning hardware increase reliability? Spoiler alert: it doesn't, but it does make some people 'feel' more secure. Don't care if it's AWS, Azure, some GSI, or even my own company... owning hardware has absolutely zero to do with reliability, other than perhaps being inversely proportional to it.
Own the reliability of your app and the people/skills to keep it running and recover it WHEN it DOES go wrong, because it will!! Owning hardware is for people who are insecure in their ability to manage their own systems properly.
Owning and properly managing hardware is for companies that keep seeing outages every week (looking at you, MS) that impact their business, from cloud providers who don't care at all about them.
When you are in a factory, you can't have deliveries delayed because someone half the world away made an oopsie and now you can't print the order form because some bigwig decided to go all cloudy.
"Rohit Chopra, a former FTC Commissioner and a former director of the Consumer Financial Protection Bureau, in a social media post, said the recent AWS and Azure outages have created chaos in the business community.
"We need to accept that the extreme concentration in cloud services isn't just an inconvenience, it's a real vulnerability," he said."
-----------
Who needs a nuclear arms race these days, when all you have to do is pull a plug, flick a switch or upload some non-QA'd code into an already flaky cloud service, and wait for the House of AWS/Azure Cards to come tumbling down with a mighty crash, taking half the world's major websites with it.
But not to worry - Microsoft and Amazon top brass still feel the need to bin thousands of white-collar coders and data centre grunts with shedloads of hands-on experience, and replace them with AI bots and freshers. What could possibly go wrong!
There is a subtle difference.
AWS's recent outage was in us-east-1, a specific region. Our apps operating in us-east-2 were not affected.
We also use Azure. Azure's outage yesterday, like the one a few weeks ago, was global because their global service Front Door broke. This means you are impacted regardless of the Azure region you operate from.
The impact of a us-east-1 Route 53 outage can be global, though. To quote Amazon:
Several AWS services create resources that provide a resource-specific DNS name(s). For example, when you provision an Elastic Load Balancer (ELB), the service creates public DNS records and health checks in Route 53 for the ELB. This relies on the Route 53 control plane in us-east-1. Other services that you use might also need to provision an ELB, create public Route 53 DNS records, or create Route 53 health checks as part of their control plane workflows. For example, provisioning an Amazon API Gateway REST API resource, Amazon ELB load balancer, or an Amazon OpenSearch Service domain all result in creating DNS records in Route 53. The following is a list of services whose control plane depends on the Route 53 control plane in us-east-1 to create, update, or delete DNS records, hosted zones, and/or create Route 53 health checks. <long list of Amazon services follows>
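The asymmetry shows up in the SDK as well: Route 53 is a global service, so control-plane writes go through a single endpoint, and per the quote above that control plane lives in us-east-1. A hedged boto3 sketch (the hosted zone ID, record name and address are made up):

    # Sketch: a Route 53 control-plane write. The zone ID, record name and IP
    # are hypothetical; the region_name below does not move the control plane.
    import boto3

    r53 = boto3.client("route53", region_name="us-east-2")

    r53.change_resource_record_sets(
        HostedZoneId="Z0000000EXAMPLE",
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            }]
        },
    )
    # If the us-east-1 control plane is impaired, this write fails even though
    # nothing running in us-east-2 has changed; existing records generally keep
    # resolving (the data plane is separate), but you cannot create or update them.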