It kinda shows how we now have a few mega corps running large parts of the internet (Google, Microsoft, AWS, Cloudflare), so an outage at one of them causes problems for lots of sites and services. And the whole selling point of the cloud is that these things aren't meant to happen.
Google Cloud goes down, takes Cloudflare and its customers with it
Google Cloud went down hard on Thursday, and took Cloudflare and some of its customers with it. Google published its first status update at 11:46 Pacific Daylight Time (PDT), when it reported that over 40 of its locations and 26 services were “experiencing impact due to Identity and Access Management Service Issue.” Google’s Cloud …
COMMENTS
-
Friday 13th June 2025 02:22 GMT Anonymous Coward
Re: SPOF
"Distributed sprockets." ¿Qué? I must be from Barcelona. ;)
The temperature here is an unusually low 16°C, which slows the little gray cells somewhat, and I didn't immediately grok SPoF, although I vaguely recalled that about fifty years ago spof(f) was a euphemism, in a remote antipodean part of the Empire, for the English euphemism 'toss'. I'm unsure whether that's related to the cricketer Fred Spofforth.
-
Friday 13th June 2025 16:44 GMT MachDiamond
Re: Obligatory XKCDs
I'm seeing a lot of those things from companies that have based their software products on Google APIs. For all intents and purposes, those vendors are indistinguishable from Google. I'm setting up yet another single-application computer to run Chrome, since I've boxed myself into a corner by offering a popular service based on one of those vendors, and the Google APIs are written to require Chrome. I picked up a ChromeBox at an estate sale for $1, so I'm not out real money for the hardware. I've stopped pushing the services, so I only do it by request, mainly for my longer-term customers. The downtime and other failures make it hard to know whether those services will work when I need them. I don't have time to install an unannounced update when I'm out in the field with limited time on site to get the work done.
-
Thursday 12th June 2025 21:42 GMT billdehaan
The cloud is just someone else's computer
Unfortunately, that someone else is in Nebraska, and he went out to eat just before his machine crashed, so your internet won't be back until he finishes his lunch.
Whenever a customer refused to pay for redundancy/backup because "we've never had a problem before/how likely is that to happen", I named their server spof.companyname.com or projectspof.companyname.com. Invariably, some curious executive would ask what "spof" meant, and I'd explain that it meant "single point of failure". They'd then ask what would happen when, or if, that computer failed; I'd do a rendition of Monty Python's dead parrot sketch, and they'd demand to know why I had "allowed" such an oversight.
Then I'd show the emails ordering me to not install a redundant setup, there would be a flurry of activity, and a few days later, budget would suddenly appear for redundancy.
It looks like Google managers are at the dead parrot sketch phase.
I'm sure they have huge redundancy accounted for, but it's always the weakest link in the chain.
-
Friday 13th June 2025 06:23 GMT Claptrap314
Re: The cloud is just someone else's computer
Not likely. The core tenet of SRE at Google when I was there was that outages happen, both planned and unplanned. So plan to have an unplanned outage during your planned outage. (Yo, dawg, I heard you like outages...) This is also known as "N+2" (redundancy).
The problem is that you practically have to be as good as Google at resilience to get it right. Almost none of their customers are, so as GCP spun up, there was a lot of effort in the direction of making N+1 good enough.
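To make the N+1 versus N+2 distinction concrete, here is a minimal sketch in Python. It is an illustration only, not Google's SRE tooling: the `survives` helper is hypothetical, and it assumes every replica is one identical unit of capacity and that `needed` replicas must stay up to carry the load.

```python
def survives(needed: int, spares: int, planned_down: int = 1, unplanned_down: int = 1) -> bool:
    """Return True if N+spares replicas still cover `needed` after the given outages.

    Hypothetical helper: treats every replica as one identical unit of capacity.
    """
    provisioned = needed + spares  # N + spares
    remaining = provisioned - planned_down - unplanned_down
    return remaining >= needed


if __name__ == "__main__":
    # The load needs 4 replicas; one is down for maintenance when another fails.
    for spares in (1, 2):
        print(f"N+{spares}: {survives(needed=4, spares=spares)}")
```

With the defaults (one planned plus one unplanned outage), N+1 leaves 3 of the 4 needed replicas and fails, while N+2 leaves exactly 4 and holds. N+1 only covers one event at a time.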
Certainly, SOMETHING went wrong, but I would be quite surprised indeed if it turned out to be a SPoF. Far more likely someone fat-fingered a configuration change, or mistuned an internal watcher which triggered a bad capacity change.
--
OTOH, I should probably add CDNs to EMS as pretty much the only businesses for which it makes sense to be multi-cloud. If AWS has a big outage, Cloudflare doesn't want GCP-based systems to suffer, and vice versa. In other words, this feels like Cloudflare may have been insufficiently redundant.
-
Friday 13th June 2025 13:50 GMT Cris E
Re: The cloud is just someone else's computer
Agree, although there's no multi-cloud architecture to save you when Google's own IAM fails and you can't log in to use one of their services.
BTW, our Google rep owned this one very early on, well before anyone online had an explanation. Some sort of caching code change went in and sent everything into an immediate tailspin. They rolled it back promptly, but it took a long time for the rollback to propagate. To make it worse, we're in us-central, so our recovery took longer. They told us to move to us-west or wait it out, which was unwelcome. Like Cloudflare, I think a lot of companies may have learned a lot about the realities of their architectures this week. There should be some good lessons drawn from this in the coming weeks.
-
Friday 13th June 2025 08:42 GMT Steve Button
Re: The cloud is just someone else's computer
"The cloud is just someone else's computer"
That's just so funny and original. I'm going to write it down and use it myself.
I'd love to know what company you work for, because you sound like an absolute legend. A proper BOFH who really sticks it to the stupid management. Perhaps Dilbert?
A legend in your own lunch hour?
You don't need to bother to *convince* the management to put in redundancy by working with them and putting together a *business case*. You simply keep hold of an email to Cover Your Ass, to prove that you were *ordered* not to do it the correct way.
Well done.
-
Friday 13th June 2025 16:56 GMT MachDiamond
Re: The cloud is just someone else's computer
"Worse yet, a bunch of computers and networks managed by someone else."
A "somebody else" that doesn't care that your company is dead in the water and losing money every minute. If you aren't at least 1% of their business, they can afford for you to go to somebody else without even noticing, so there's little point in spending money on customer support. Besides, when they are down, all of the phone lines are down, and their people can't access Xitter to announce they are having an issue and give some sort of recovery estimate. Of course their own web site is down too, if they even maintain a System Status page, as that's so old-fashioned. Even if you switch, chances are that the company you switch to is a reseller of their services anyway.
In the US, there are three major operators of mobile network hardware. Everybody else resells those services. There have been court reviews of buyout proposals in which two would merge, leaving only two tower operators, and those have so far been swatted down. The cost for another company to come along and compete is too high a bar, so any reduction in the number of players would be permanent. Adding one more now might slice the pie too thin for any of them to survive, since they've already raced to the bottom on pricing and would struggle to absorb any hits to their business.
-
Friday 13th June 2025 17:07 GMT Anonymous Coward
Re: The cloud is just someone else's computer
Totally agree with your tack ... BUT ... It can be somewhat difficult to convince C-Level people to spend more on the basis of a 'Might happen' !!!
We the techies know that it is the right thing to do and the impact of getting it wrong is huge ... BUT ... It is still difficult to convince 'others' who do not understand the real world in IT.
Rather than snipe and talk down to the OP ... maybe suggest how to convince people who do not want to be convinced !!!
:)
-
Thursday 12th June 2025 23:15 GMT BinkyTheMagicPaperclip
Re: So...
You claim on your business interruption insurance, or against an SLA with the service provider that actually involves them giving you cold hard cash and the right to terminate the contract with them without penalty.
That's assuming the SLA is actually any better than 'LOL, we'll do better next month'
-
Friday 13th June 2025 08:32 GMT Jamie Jones
Re: Eggs and Baskets
Totally missing the point there. Maybe this will help: https://dictionary.cambridge.org/dictionary/english/put-all-eggs-in-one-basket
The point is, all 6 of them are really just hobbyist machines, appropriately sized for their task, and there's no way one problem will take them all down at the same time.
Back when I was responsible professionally for hundreds of servers spanning the UK and upwards of tens of thousands of users, we didn't have anything like the budget of these huge cloud providers, and whilst individually these machines couldn't be guaranteed 100% uptime, we could easily guarantee we wouldn't lose the lot at once, and departments that had the budget for proper redundancy setups never lost anything. But then, we weren't using opaque terms like "cloud" and we were real people who could be directly shouted at by our employer. And never did someone who didn't even work for the company manage to take our systems offline.
I'm nothing special - most here would have similar experiences, which is why most would recommend not putting your critical stuff on someone else's computer.
-
Friday 13th June 2025 12:23 GMT cookiecutter
Time to turn the phone off
What CFOs need to understand is...
If it's someone else's computer....it's someone else's problem.
Especially with the level of fuckwittery you get from Indian & South African 1st line support, whose only interest is not admitting anything is going wrong, not escalating you to 3rd line, and closing the call ASAP.
-
Friday 13th June 2025 14:14 GMT JasonT
Yeah, it's someone else's computer...
I must have had relatively bad luck, but the companies I have worked for that run their own "data centers" forget the part about depreciation where you are meant to dispose of aged assets and replace them. Cloud providers have their problems, but they aren't hanging on to kit forever.