Google Cloud goes down, takes Cloudflare and its customers with it

Google Cloud went down hard on Thursday, and took Cloudflare and some of its customers with it. Google published its first status update at 11:46 Pacific Daylight Time (PDT) when it reported over 40 of its locations and 26 services were “experiencing impact due to Identity and Access Management Service Issue.” Google’s Cloud …

  1. mark l 2 Silver badge

    It kinda shows how now we have a few mega corps running large parts of the internet (Google, Microsoft, AWS, Cloudflare) that an outage at one of them brings problems for lots of sites and services. And the whole selling point of the cloud is that these things aren't meant to happen.

    1. ChrisElvidge Silver badge

      Wasn't the whole idea of the Internet (not Cloud, not World Wide Web) to be a distributed system so that it could route around problem nodes? The likes of Cloudflare running large parts of it was seen as a problem years ago.

      1. ecofeco Silver badge
        Pirate

        Well, it WAS... but you know, tech douche bros and all. ------------------------>>>>>>>>>>>

  2. Anonymous Coward

    Another month, another Google Cloud outage.

    It's not as bad as it was a few years ago but they still have the worst uptime of the Big Three(TM).

    1. Anonymous Coward

      No, they haven't. The worst by far is Azure, which is down almost every other day, followed by AWS. GCP has been surprisingly resilient compared to those two.

      (We know because we run lots of services on all three).

      1. cookiecutter

        Maybe but who deleted the entire environment for an $80 billion hedge fund?

  3. Camilla Smythe

    SPOF

    Distributed sprockets.

    1. Anonymous Coward

      Re: SPOF

      "Distributed sprockets." ¿Qué? I must be from Barcelona. ;)

      The temperature here is an unusually low 16°C which slows the little gray cells somewhat and I didn't immediately grok SPoF although I vaguely recalled about fifty years ago spof(f) was a euphemism in a remote antipodean part of the Empire, for the English euphemism 'toss.' Unsure whether that's related to the cricketer Fred Spofforth.

      1. Camilla Smythe

        Re: SPOF

        Hosting the sprockets for your gear box in separate locations may result in sub optimal or no performance.

  4. Steve Foster
    Thumb Up

    Obligatory XKCDs

    https://xkcd.com/908/

    and

    https://xkcd.com/2347/

    1. MachDiamond Silver badge

      Re: Obligatory XKCDs

      I'm seeing a lot of those things from companies that have based their software products on Google APIs. For all intents and purposes, those vendors are indistinguishable from Google. I'm setting up yet another single-application computer to run Chrome since I've boxed myself into a corner by offering a popular service based on one of those vendors and the Google APIs are written to require Chrome. I picked up a ChromeBox at an estate sale for $1 so I'm not out real money for the hardware. I've stopped pushing the services so I only do it by request and that's mainly for my longer term customers. With the downtime and other failures, it's hard to know if those services will work when I need them. I don't have time to install an unannounced update when I'm out in the field and have limited time on site to get the work done I need to do.

  5. billdehaan

    The cloud is just someone else's computer

    Unfortunately, that someone else is in Nebraska, and he went out to eat just before his machine crashed, so your internet won't be back until he finishes his lunch.

    Whenever a customer refused to pay for redundancy/backup because "we've never had a problem before/how likely is that to happen", I named their server spof.companyname.com or projectspof.companyname.com. Invariably, some curious executive would ask what "spof" meant, and I'd explain it meant "single point of failure". They'd then ask what would happen when, or if, that computer failed. I'd do a rendition of Monty Python's dead parrot sketch, and they'd demand to know why I "allowed" such an oversight.

    Then I'd show the emails ordering me to not install a redundant setup, there would be a flurry of activity, and a few days later, budget would suddenly appear for redundancy.

    It looks like Google managers are at the dead parrot sketch phase.

    I'm sure they have huge redundancy accounted for, but it's always the weakest link in the chain.

    1. Claptrap314 Silver badge
      Boffin

      Re: The cloud is just someone else's computer

      Not likely. The core tenant of SRE at Google when I was there was that outages happen, both planned and unplanned. So plan to have an unplanned outage during your planned outage. (Yo, dawg, I heard you like outages...) This is also known as "N+2" (redundancy).

      The problem is that you practically have to be as good as Google at resilience to get it right. Almost none of their customers are, so as GCP spun up, there was a lot of effort in the direction of making N+1 good enough.

      Certainly, SOMETHING went wrong, but I would be quite surprised indeed if it turned out to be a SPoF. Far more likely someone fat-fingered a configuration change, or mistuned an internal watcher which triggered a bad capacity change.

      --

      OTOH, I should probably add CDNs to EMS as pretty much the only businesses for which it makes sense to be multi-cloud. If AWS has a big outage, Cloudflare doesn't want GCP-based systems to suffer, and vice versa. In other words, this feels like Cloudflare may have been insufficiently redundant.

      1. m4r35n357 Silver badge

        Re: The cloud is just someone else's computer

        Tenet?

        1. KarMann Silver badge
          Terminator

          Re: The cloud is just someone else's computer

          David Tenet? Wait, no, that's not right…

      2. Cris E

        Re: The cloud is just someone else's computer

        Agree, although there's no multi-cloud architecture to save you when google's own IAM fails and you can't log in to use one of their services.

        BTW, our Google rep owned this one very early on, well before anyone online had an explanation. Some sort of caching code change went in and sent everything into an immediate tailspin. They rolled it back promptly, but it took a long time to get that propagated. To make matters worse, we're in us-central so our recovery took longer. They told us to change to us-west or wait it out, which was unwelcome. Like Cloudflare, I think a lot of companies might have learned a lot about the realities of their architectures this week. There should be some good lessons drawn from this in the coming weeks.

      3. Meeker Morgan

        Re: The cloud is just someone else's computer

        Google Cloud is your enemy's computer, even if you work there.

        The only exception is if you are on their board of directors.

    2. Steve Button

      Re: The cloud is just someone else's computer

      "The cloud is just someone else's computer"

      That's just so funny and original. I'm going to write it down and use it myself.

      I'd love to know what company you work for, because you sound like an absolute legend. A proper BOFH who really sticks it to the stupid management. Perhaps Dilbert?

      A legend in your own lunch hour?

      You don't need to bother to *convince* the management to put in redundancy, by working with them and putting together a *business case*. You simply keep hold of an email to Cover Your Ass to prove that you were *ordered* not to do it the correct way.

      Well done.

      1. m4r35n357 Silver badge

        Re: The cloud is just someone else's computer

        Good advice does not go out of "fashion".

        To reiterate: "The cloud is just someone else's computer".

        1. Cris E

          Re: The cloud is just someone else's computer

          Worse yet, a bunch of computers and networks managed by someone else. Plenty of ways to mess that up, much worse than individual boxen.

          1. MachDiamond Silver badge

            Re: The cloud is just someone else's computer

            "Worse yet, a bunch of computers and networks managed by someone else."

            A "somebody else" that doesn't care that your company is dead in the water and losing money every minute. If you aren't at least 1% of their business, they can afford for you to use somebody else and not even notice so there's little point to spending money on customer support. Besides, when they are down, all of the phone lines are down, their people can't access Xitter to announce they are having an issue and state some sort of recovery estimate. Of course their own web site is down if they even maintain a System Status page as that's so old fashioned. Even if you switch, chances are that the company you switch to is a reseller of their services anyway.

            In the US, there are three major operators of mobile phone hardware. Everybody else resells those services. There's often some court review of a buyout proposal where two will merge leaving only 2 tower operators remaining and those, so far, have been swatted down. The cost for another company to come along and compete is too high of a bar so any reduction in the number of players will be permanent. Adding one more now might slice the pie too thin for any of them to survive as they've raced to the bottom of pricing to be able to absorb any hits to their business.

      2. Anonymous Coward

        Re: The cloud is just someone else's computer

        Totally agree with your tack ... BUT ... It can be somewhat difficult to convince C-Level people to spend more on the basis of a 'Might happen' !!!

        We the techies know that it is the right thing to do and the impact of getting it wrong is huge ... BUT ... It is still difficult to convince 'others' who do not understand the real world in IT.

        Rather than snipe and talk down to the OP ... maybe suggest how to convince people who do not want to be convinced !!!

        :)

    3. Anonymous Coward

      Re: The cloud is just someone else's computer

      In this particular case, the single point of failure seems to have been the source code and a config change

  6. captain veg Silver badge

    my heart bleeds

    Oh no, actually it doesn't.

    Cloudflare is a supplier of "bulletproof" hosting to criminals. I'm not inclined to sympathy to its customers.

    -A.

    1. Kevin McMurtrie Silver badge

      Re: my heart bleeds

      The very large customer you're probably referring to wasn't impacted. They recently moved their backend from Google Cloud to Huawei Cloud. Their domain name services and front ends are still alive and phishing on Cloudflare.

      1. Bob H

        Re: my heart bleeds

        Cloudflare hosts the majority of pirate websites and Cloudflare makes very little effort to do anything about that, unlike their competitors.

        1. Dinanziame Silver badge
          Pirate

          Re: my heart bleeds

          Ahoy! Good for them!

        2. desht

          Re: my heart bleeds

          But, but, I thought mass-scale piracy was OK now?

          Or is that just for AI corps with deep pockets and friends in government?

    2. IGotOut Silver badge

      Re: my heart bleeds

      "I'm not inclined to sympathy to its customers."

      Then you may as well close your account on here, and a huge amount of other sites.

      You do know The Reg is a Cloudflare customer?

      1. captain veg Silver badge

        Re: my heart bleeds

        Yes. It disappoints me.

        -A.

  7. Tron Silver badge

    So...

    Where do users apply for compensation for consequential loss of income from when services were down?

    Because a Google Fail is not an 'act of God'.

    1. BinkyTheMagicPaperclip Silver badge

      Re: So...

      You claim on your business interruption insurance, or against an SLA with the service provider that actually involves them giving you cold hard cash and the right to terminate the contract with them without penalty.

      That's assuming the SLA is actually any better than 'LOL, we'll do better next month'

    2. Nick Stallman

      Re: So...

      Your SLA has all the details. You do have a SLA right? If it's that important to you, of course you got a SLA?

      And yes, I do indeed have a Cloudflare SLA.

    3. Phil O'Sophical Silver badge

      Re: So...

      Because a Google Fail is not an 'act of God'.

      Yet. Give them time...

    4. MachDiamond Silver badge

      Re: So...

      "Because a Google Fail is not an 'act of God'."

      As the personification of Evil®, it would be an "act of satan", wouldn't it?

  8. Gene Cash Silver badge
    FAIL

    Ve haff implemented ze mitigation for ze issue

    Can we get any more nonspecific? Can you vague that up a little for me?

    1. VonDutch

      Re: Ve haff implemented ze mitigation for ze issue

      We might find out details in Monday's Who, Me?

      1. Mr Dogshit

        Re: Ve haff implemented ze mitigation for ze issue

        In twenty years' time, perhaps.

  9. Lon24 Silver badge

    Standards should Triumph

    Dear Reg, Please save a google check and put times in UTC - or at least Anglesey time. (Sorry, forgot the proper name for it nowadays).

    1. Unoriginal Handle
      Coat

      Re: Standards should Triumph

      Ynys Môn. Because I'm a geek.

    2. ForthIsNotDead

      Re: Standards should Triumph

      I humbly request that all times be expressed in terms of Shropshire time. Things tend to move pretty slowly around here, and consequently, there are only 4 times of day that are actually relevant:

      * about now

      * before

      * after - no rush

      * dunno

      :-)

      1. Phil O'Sophical Silver badge

        Re: Standards should Triumph

        A Spanish tourist wandering around Ireland was intrigued by signs in Irish, and after chatting with a local asked what the equivalent of "mañana" was. After some thought the local admitted "I don't think we have anything with the same sense of urgency."

        1. Don Bannister

          Re: Standards should Triumph

          That was a good gag that Irish comedian Frank Carson had in his repertoire!

          1. Phil O'Sophical Silver badge
            Thumb Up

            Re: Standards should Triumph

            It's the way I tell 'em

      2. Blitheringeejit
        Pint

        Re: Standards should Triumph

        You forgot "opening time". Happy Friday!

  10. Jamie Jones Silver badge

    Eggs and Baskets

    I can validly claim that my servers have better uptime than Google and Cloudflare, and they aren't even running mission-critical operations.

    1. Claptrap314 Silver badge
      Facepalm

      Re: Eggs and Baskets

      Both of them, huh?

      1. Jamie Jones Silver badge

        Re: Eggs and Baskets

        Totally missing the point there. Maybe this will help: https://dictionary.cambridge.org/dictionary/english/put-all-eggs-in-one-basket

        The point is, all 6 of them are really just hobbyist machines, appropriate sizes for their task, and no way will one problem take them down at the same time.

        Back when I was responsible professionally for hundreds of servers spanning the UK and upwards of tens of thousands of users, we didn't have anything like the budget of these huge cloud providers, and whilst individually these machines couldn't be guaranteed 100% uptime, we could easily guarantee we wouldn't lose the lot at once, and departments that had the budget for proper redundancy setups never lost anything. But then, we weren't using opaque terms like "cloud" and we were real people who could be directly shouted at by our employer. And never did someone who didn't even work for the company manage to take our systems offline.

        I'm nothing special - most here would have similar experiences, which is why most would recommend not putting your critical stuff on someone else's computer.

    2. Penguinista
      Joke

      Re: Eggs and Baskets

      Put some mission-critical operations on them, and they'll be up and down more times than a prostitute's knickers...

      1. Jamie Jones Silver badge
        Happy

        Re: Eggs and Baskets

        Sod's law, eh?

        Don't confuse mission-critical for under-utilised. There are some busy things running on there, but I don't really give a shit if they broke for a few hours!

  11. xyz Silver badge

    IKEA went titsup and...

    I was saved from my girlfriend buying more cushions! Thank you Google > Cloudflare > IKEA whoever. The internet was supposed to survive a nuclear war but can seemingly now be fucked by a perms issue... Yay for progress.

  12. Pete 2 Silver badge

    Rain on your parade

    > Google Cloud goes down

    Surely clouds precipitate?

    1. captain veg Silver badge

      Re: Rain on your parade

      They're certainly not the solution.

      -A.

  13. This post has been deleted by its author

  14. cookiecutter

    Time to turn the phone off

    What CFOs need to understand is...

    If it's someone else's computer....it's someone else's problem.

    Especially with the level of fuckwittery you get from Indian & South African 1st line support whose only interest is not admitting anything is going wrong, not escalating you to 3rd line and closing the call asap

  15. JasonT
    Devil

    Yeah, it's someone else's computer...

    I must have had relatively bad luck, but the companies I have worked for who run their own "data centers" forget the part about depreciation where you are meant to dispose of the aged assets and replace them. Cloud providers have their problems, but they aren't hanging on to kit forever.
