Not just Microsoft: Auth turns out to be a point of failure for Google's cloud, too

Google has posted more details about its 50 minute outage yesterday, though promising a "full incident report" to follow. It was authentication that broke, reminiscent of Microsoft's September cloud outage caused by an Azure Active Directory failure. In an update to its Cloud Status dashboard, Google said that: "The root cause …

  1. 2460 Something

    Redundancy

    And this is another classic example of why you should never settle on just one cloud provider for services that are critical. It isn't a question of 'if' but of 'when' they have an outage, and for a problem to even count as an outage it has to be massive (for the smaller stuff they have internal redundancies anyway, so the end user likely never even knows).

    1. richardcox13

      Re: Redundancy

      No way to do that for things like Docs and Gmail (or any other file store or mailbox) without adding a lot of complexity.

      Adding some sort of synchronisation between providers on top of third-party services has a serious possibility of making the whole thing less reliable.

      For things that naturally scale out it is different, but those are special cases.

      1. sgp Bronze badge

        Re: Redundancy

        Rules out any SaaS. Don't really get the "add resilience" either. Clouds are complex systems. Complexity <> resilience.

        1. FlamingDeath Silver badge

          Re: Redundancy

          I’m not sure complexity is the right word

          I mean, I’m sure if I were to add cake mix upon cake mix, I would end up with what you might describe as a “complicated cake”

          However, I would describe it as a fucking disaster and it would not make the bakeoff finals

      2. Warm Braw Silver badge

        Re: Redundancy

        It's certainly difficult to see how you'd replicate between two different providers without some form of mutual authentication - which means instead of needing one authentication provider working in order for the service to be available you'd need at least two.

        1. Graham Dawson Silver badge

          Re: Redundancy

          You can kinda sorta do it with a third party sync like InSync (which I shall not link, but you can search for it, I think), which can sync between google drive and office 365 by synchronising both services to a local folder. Rough and ready, but serviceable.

          1. SecretSonOfHG

            Re: Redundancy

            So you proceed to move to the cloud so that you can save yourself the hassle of backups, only to find yourself making backups on your local disk just in case the cloud goes down. Sort of defeats the purpose of the cloud move.

            1. Alister Silver badge

              Re: Redundancy

              So you proceed to move to the cloud so that you can save yourself the hassle of backups

              That has never been true except in the minds of the cloud salesmen.

              Moving stuff to the cloud is not an automatic panacea. It needs to be planned just as thoroughly as, if not more thoroughly than, setting up a physical environment, and backups and redundancy have to be added; they are not there by default.

            2. SuperGeek Bronze badge

              Re: Redundancy

              Not really. You should have offline backups of data whether it is on the cloud or not. It is a common misconception that cloud=backup. It doesn't, really. Whether the provider does backups or not it's faster and easier for recovery to have offline copies.

            3. Graham Dawson Silver badge
              Big Brother

              Re: Redundancy

              Anyone who uses cloud storage as a backup is a lunatic.

  2. macjules Silver badge
    Meh

    Thanks be to God

    For a second there yesterday I thought Google might have employed Baroness Dido Harding as their new CEO.

    1. James O'Shea

      Re: Thanks be to God

      She does such a cracking good job that she needs a promotion. No longer should she be a mere Baroness. No, she should be a _Duchess_. I do believe that the Duke of York is currently between engagements. They deserve each other. And then the newly hitched pair should be sent to be co-Governors of the Falklands.

      1. Mark #255

        Re: Thanks be to God

        That seems unduly harsh... on the Falklanders.

        Why not Rockall (with mandatory residency)?

        1. James O'Shea

          Re: Thanks be to God

          Too close.

    2. The Griff

      Re: Thanks be to God

      People are harsh on Dido, I like that one she did with Eminem.

  3. Kobus Botes
    Meh

    GMail out again today...

    GMail fell over at least twice today. According to downdetector it was out from 6:48 EST (no info on how long it lasted). When I tried to log on to GMail at 15:30 (GMT + 2), it was unable to connect. Downdetector did not report any problems here at that time and I was able to log on a minute or so later.

    Downdetector still shows outages for GMail though, mostly in Western Europe and the UK and the North-Eastern parts of the USA and Canada (Washington to Boston and Chicago to Toronto), plus Florida.

    1. Marjolica

      Re: GMail out again today...

      Authentication was also problematic yesterday evening: I don't use gmail but sent two messages to friends' gmail accounts around 23:00 and got:

      I'm sorry to have to inform you that your message could not

      be delivered to one or more recipients.

      ...

      <redacted@gmail.com>: host gmail-smtp-in.l.google.com[173.194.69.26] said:

      550-5.1.1 The email account that you tried to reach does not exist. Please

      try 550-5.1.1 double-checking the recipient's email address for typos or

      550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1

      https://support.google.com/mail/?p=NoSuchUser cw4si6981ejb.196 - gsmtp (in

      reply to RCPT TO command)
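
What makes the bounce above painful is its reply class: a 5xx SMTP reply is a permanent failure, so the sending server gives up and bounces, whereas a 4xx reply is transient and would normally be retried. A rough sketch of that distinction follows; `classify_smtp_reply` is a made-up helper, and the 451 line in the usage note is a hypothetical example, not something from the thread.

```python
def classify_smtp_reply(line: str) -> str:
    """Classify an SMTP reply line by the first digit of its three-digit code."""
    code = line.strip()[:3]
    if len(code) != 3 or not code.isdigit():
        return "unparseable"
    classes = {
        "2": "success",
        "3": "intermediate",
        "4": "transient failure",   # sender is expected to queue and retry
        "5": "permanent failure",   # sender is expected to bounce the message
    }
    return classes.get(code[0], "unknown")


# The reply quoted in the bounce above:
bounce = "550-5.1.1 The email account that you tried to reach does not exist."
print(classify_smtp_reply(bounce))
```

Under this reading, a hypothetical `451 4.3.0 try again later` during the outage would have been classified as transient and left the messages queued rather than returned.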

      1. Anonymous Coward

        Re: GMail out again today...

        > 550-5.1.1 The email account that you tried to reach does not exist.

        Just got one of those ten minutes ago! (It is around 2020-12-15T22:20Z now)

        I know the recipient forwards a copy of his email to a GMail box. For a moment I thought he might have seen the light and deleted his stupid Google account.

        1. Marjolica

          Re: GMail out again today...

          Agree. I re-sent one and it got through earlier, but it's now down again at 22:25:36 +0000 (GMT), same message, for two more recipients.

        2. Ken Moorhouse Silver badge

          Re: the recipient forwards a copy of his email to a GMail box

          I once had a customer who set that up on the mail server that I supplied them with. Their reasoning was they were concerned about the reliability of my on-prem solution. What happened was that a few spams got forwarded to gmail and, as a consequence gmail blocked my customer's IP. Which meant they were unable to send to gmail accounts. "See, I was right to be suspicious about your tin-pot email system" was their reaction.

          (Try going through the hoops to unblock an IP from gmail's blacklist: it was easier just to use a different one from my customer's IP pool, and to black-hole that user's forwarding abilities so that it wouldn't happen again).

  4. Snowy
    Facepalm

    Fall down

    When authentication breaks and things carry on working then you have other more serious problems.

    Edit: Spelling.

    1. sabroni Silver badge

      Re: When authentication breaks and things carry on working you have other more serious problems.

      Unless all you're using authentication for is targeting ads. A video player can still play a video, but the provider can't charge as much for showing you the ad before it if they don't know your demographic.

  5. HildyJ Silver badge
    Facepalm

    Damned if you . . .

    If you trust in the cloud, any cloud, Auth will eventually bite you in the butt.

    OTOH, if you keep it in-house, Auth will eventually bite you in the butt.

    Auth ain't easy and there's no solution that avoids it.

    1. SecretSonOfHG

      Re: Damned if you . . .

      Amen

  6. This post has been deleted by its author

    1. This post has been deleted by its author

  7. Arthur the cat Silver badge
    Facepalm

    The advance of technology

    Once upon a time the critical need detectors were in printers and stopped you printing anything when you desperately needed to. Now they're outsourced to the cloud and take down a whole range of services when you need them.

    Progress - we've heard of it.

    1. FlamingDeath Silver badge

      Re: The advance of technology

      It's what the salesman promised, do you think they were lying? Sales people never do that, right?

      I mean, it's not as if their company would purposely keep their basic salary low in order to entice sociopathic tendencies, would they?

      If you’re in sales, chances are you’re a sociopath.

      Do society a favour and get that shit checked out

      Or better yet, take Bill Hicks's advice!

    2. hoola Silver badge

      Re: The advance of technology

      But for some reason even though the impact is greater, people just appear to accept the outage as a normal part of running a system in the cloud.

      If we had the sort of issues seen here on our internal AD there would be management screaming everywhere. When this happens in a cloud service, the same people make a few calls, shrug their shoulders and head for Starbucks.

  8. Mage Silver badge
    Mushroom

    Resilience isn't enough for some things.

    "Cloud services in general may be more reliable, on average, than on-premises services, but the impact when they fail is huge. It is in all of our interests if efforts to further improve their resilience succeed."

    NO!

    It can never be reliable enough. On premises takes out only one company. A small number of Cloud providers with monoculture is an eventual apocalypse.

    This fairy tale explains why: https://www.smashwords.com/books/view/716453 also Amazon, Google Playbooks, Apple, Kobo, Barnes & Noble. Soon on paper in the local bookshop via ISBN ordering.

    Also some people's own services are more reliable than the cloud.

    1. John Robson Silver badge

      Re: Resilience isn't enough for some things.

      Meh - on prem takes out one customer at a time, but it can still do so very widely... what happens if an MS server update gets pushed that has a critical failure that manifests on clock change day...

      Lots of "one customer" failures happen at the same time...

      With a cloud provider you are likely to have a fairly competent staff engaged in fixing the issue, and empowered to do so.

      There are a number of services that we all depend on, and sometimes they fail. At least this failed in the correct direction (denying access rather than admitting everyone)
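
Failing in the "correct direction" is usually called failing closed. A minimal sketch of the idea, assuming a hypothetical `verify` callable that raises a hypothetical `AuthBackendDown` exception when the auth service cannot be reached; it illustrates the principle only and is not how Google's systems are written.

```python
class AuthBackendDown(Exception):
    """Hypothetical error raised when the auth service is unreachable."""


def is_authorised(token, verify):
    """Fail closed: if the auth backend cannot answer, deny access."""
    try:
        return bool(verify(token))
    except AuthBackendDown:
        # Denying everyone is an outage; admitting everyone is a breach.
        return False


def broken_backend(token):
    # Stand-in for an auth service that is down.
    raise AuthBackendDown("auth service unavailable")


print(is_authorised("alice-token", broken_backend))
```

The fail-open alternative would return `True` in the `except` branch, which keeps the service up at the cost of letting anyone in.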

  9. Anonymous Coward

    I beg to differ

    "Cloud services in general may be more reliable, on average, than on-premises services," my exchange platform has been running without an outage for 3 years for 4k people (3 years since I changed the certs and broked it) How many times has EOL fallen over in that time?

    1. SecretSonOfHG

      Re: I beg to differ

      Taking your uptime claim at face value, there's one missing factor to consider: how much your Exchange platform costs, both in capex (hardware, physical facilities, setup, air conditioning, security) and ongoing (electricity, off-site backups, license support, support staff) forms.

      Tell us the cost per user, and if it is less than Google's, no doubt you'll have a queue of customers lined up in no time!

      1. Anonymous Coward

        Re: I beg to differ

        > Tell us the cost per user

        Cost is hardly just monetary. E.g., if his organisation happens to be a major defence contractor, internal emails will not be going through the internet one way or another so that precludes the use of a third party host, unless they *really* like living dangerously.

        And even in monetary terms, I would find it hard to believe that an organisation with a well-manned IT department (i.e., they know what they're doing) would get a cheaper deal from a third party than from on-premises hosting.

    2. Cuddles Silver badge

      Re: I beg to differ

      One service run in one place for 4k users has a 3 year MTBF. 1000 services run in 1000 places for 4 million users have an MTBF of about one day. How many services are the big cloud providers effectively running? They really are pretty damn reliable. It's just that a failure wipes out everyone at once, rather than each person individually having a failure that hardly anyone else ever knows about.

      Obviously that doesn't mean you can just throw stuff in a cloud and forget about it. You still need your own backups and your own plans about what to do when a failure does inevitably happen. But on average, cloud services really are more reliable than on-premises in a great many cases.
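
The arithmetic behind the "about one day" estimate can be sketched as follows. It assumes every service fails independently and at the same constant rate, which the comment does not state explicitly; `fleet_mtbf_days` is a made-up helper name.

```python
def fleet_mtbf_days(per_service_mtbf_years: float, services: int) -> float:
    """Mean time between failures anywhere in a fleet, in days.

    Assumption: failures are independent with a constant rate, so the
    fleet's failure rate is simply `services` times the single-service rate.
    """
    days_per_year = 365.0
    return per_service_mtbf_years * days_per_year / services


single = fleet_mtbf_days(3, 1)
fleet = fleet_mtbf_days(3, 1000)
print(f"one service:   a failure somewhere every {single:.0f} days")
print(f"1000 services: a failure somewhere every {fleet:.2f} days")
```

With a 3 year MTBF per service, 1000 services give roughly 1095 / 1000, or a little over one day between failures somewhere in the fleet, which matches the figure quoted above.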

  10. Danny 2 Silver badge

    when authentication breaks, everything breaks

    [I always get downvoted for doing this but I don't care because young people need to know how great 1986 was]

    The Woodentops - Everything Breaks

    I took the only girl technician in our class to their first gig, I'd seduced her by playing 'Good Thing'. Her first concert, she grabbed my hand when The Woodentops came on because she was scared. I tried to get in contact with her a few years ago and she'd died from a brain aneurysm.

    The worst thing about ageing isn't ageing, it's that everyone else ages too. I apologise to everyone who can't know how great 1986 was.

    1. Anonymous Coward

      Re: when authentication breaks, everything breaks

      > [I always get downvoted for doing this but I don't care because young people need to know how great 1986 was]

      True, I remember it well. England got eliminated in the quarter finals.

  11. Claptrap314 Silver badge

    Should auth be subject to quota?

    That's actually a strange question. The routers and network cables can only handle so much. There is therefore an absolute limit at layer one regarding traffic, and if there is no traffic on your network but auth when that limit is hit, auth WILL be limited. (In practice, other service limits are almost certain to kick in first...)

    Quota is a way of limiting traffic just a tad higher in the stack--and earlier in the call. Everything, and I mean absolutely EVERYTHING needs to be quota'ed if you want to avoid catastrophic degradation.

    While at Google, I worked on modifying the quota system used for Hangouts. Got a real close view of what was going on, and formed some opinions about what needed fixing. One day, I had a flash of insight. If my quota system is returning 100% 500s to protect my back end, that's a good thing. We can bring traffic back in a controlled fashion. Then, I found myself hearing, "My job is to keep the network up. It is only my good nature that allows users to be on it at all." Just so.

    Having said that, the quota system that I had available at Google struck me as very naive. It appears that they are still working that one out. You will recall that it was also implicated in their previous major outage.
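
For readers who have not met quota systems, here is a minimal token-bucket sketch of the general idea: reject early and cheaply so the back end never sees more than it can handle. This is a generic textbook technique, not the quota system described above; `TokenBucket`, `handle` and the choice of a 503 status are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allow `rate_per_sec` requests on average, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock          # injectable so the bucket can be tested
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def handle(bucket, backend):
    """Shed load with a 5xx before the request ever reaches the back end."""
    if not bucket.allow():
        return 503, "quota exceeded"
    return 200, backend()
```

Raising `rate_per_sec` step by step after an incident is one way to bring traffic back "in a controlled fashion", as the comment puts it.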

  12. Anonymous Coward

    Cloud

    Daft twats use it.

    1. Anonymous Coward

      Re: Daft twats use it.

      Another well thought out and reasoned post, add a "Fucking" to the start to make it phd level!!

      1. Anonymous Coward

        Re: Daft twats use it.

        It looks right to me. Giving control over your IT to someone you have no control over looks pretty fucking stupid to me.

  13. FlamingDeath Silver badge

    The internet is broken

    What is the point of a distributed network if all you’re gonna do is a monoculture?

    Businesses flocking and congregating to a few select clouds

    Datacentres concentrated near conduits

    I guess the internet isn’t as resilient to nuclear attack as has been suggested in the past

    “As so often, Twitter proved more reliable for status information”

    Why else do you think it's a favourite for C&C messages?

  14. MrNigel
    Happy

    Not our fault!!

    Exchange Online service alert: 16/12/2020 00:15

    Incident information

    Title: Users unable to send email to Gmail recipients

    ID: EX230052

    Status

    Investigating

    Details

    Title: Users unable to send email to Gmail recipients

    User Impact: Users may be unable to send email to Gmail recipients.

    Final status: The investigation is complete and we've determined the service is healthy. A problem didn't occur within the Microsoft-managed environment and is a result of an issue with the affected third-party email provider.

  15. thondwe

    DISK FULL

    I read that failure as "Disk Full" (OK, so an attempt to allocate more space failed), but in effect Google fails because "DISK FULL".

    And just to wind up the "Local is Better than Cloud" brigade - how many orgs can afford dual everything + hot spares on tap? Most orgs can deal with a 45 min outage from a cloud provider - they're less well equipped to deal with a (very expensive) main router failure on a Monday morning with a "time to fix" of at least a day whilst a van brings a replacement from Germany and installs it...

  16. Kevin McMurtrie Silver badge
    Terminator

    Of nearly 100000 employees

    You'd think a few could be paid to troll the logs for errors. I know it has to be Google scale log trolling, but there are systems that can be trained to send only deviations to the caretaker meatbags.

    1. Anonymous Coward

      Re: Of nearly 100000 employees

      Troll the logs?

      1. Kevin McMurtrie Silver badge
        Facepalm

        Re: Of nearly 100000 employees

        Trawl (I should not post while drinking)

  17. bigtreeman

    single sign in

    And when authentication breaks for single sign in for other web sites

    things get really forked...

    don't they Facebook,

    but it could be blamed on the other site.


Biting the hand that feeds IT © 1998–2021