Google BigQuery TITSUP caused by failure to scale-yer workloads

A four-hour outage of Google's BigQuery streaming service has taught the cloud aspirant two harsh lessons: its cloud doesn't always scale as well as it would like, and it needs to explain itself better during outages. The Alphabet subsidiary's trouble started last Tuesday when a surge in demand for the BigQuery authorization …

  1. Anonymous Coward
    Anonymous Coward

    Google's service outage notifications are woeful PR fluff

    "There is an issue with X affecting only 0.00Y% of customers."

    "The issue is resolved for most users of X."

    "The issue with X is resolved, we value your business, we hope you have warm fuzzy feelings, we strive to be awesome."

    That is NOT communication with customers about an outage; it's PR BS on par with "due to higher than normal call volumes" and "your call is important to us".

    I work for a large multinational where, every time there's an outage, we put out a business outage notification through the technical operations centre, and the lead team needs to write up an "RFO" (Reason-For-Outage). Regular notifications go out to stakeholders as the outage happens: what the symptoms are, what the cause is determined to be, when and how it's mitigated, and when and how it's resolved. It's then all written up, chapter and verse, in the RFO and made available to stakeholders.

    The whole "oops, but it's only affecting 0.000x% of people" thing is PR BS on par with "95% fat free" (5% fat, people; do the math). At cloud provider scale, that could still be many, many, many people and businesses (quick sums below).

    I accept problems happen, but geez, when they happen, enough of the fluffy "your call is important" crap. Detail please.
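
    A quick back-of-the-envelope sketch of that "only 0.000x%" point; the customer count below is hypothetical, purely to illustrate the scale, not a Google figure:

      # Back-of-the-envelope: "only 0.001% of customers" at cloud-provider scale.
      # The customer count is hypothetical, purely to illustrate the point.
      customers = 10_000_000               # hypothetical customer base
      affected = customers * 0.001 / 100   # "0.001%" from a typical status update
      print(int(affected))                 # 100 businesses, each with their own end users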

    1. Anonymous Coward
      Anonymous Coward

      Re: Google's service outage notifications are woeful PR fluff

      Here are the _actual_ updates they gave out during the outage:

      https://groups.google.com/forum/#!topic/bigquery-downtime-notify/We-PRncjM4U

      I think this (which I received by email as it happened) is actually pretty decent, in that it told us enough to know the impact on our service.

  2. Anonymous Coward
    Coat

    …zipped lips gave users the … heebie jeebies

    I'll bet it did… surely they can use something more modern than that DOS relic ZIP like xz or something… Ohh you mean that kind of "zipped"… never mind.

  3. Anonymous Coward
    Anonymous Coward

    umm

    One of the supposed characteristics of public cloud is elasticity and that relates to capacity on demand. If they have an architecture that hits the capacity wall easily, it isn't much of a public cloud. This tells me Google isn't in the big leagues like Amazon or Microsoft or even IBM. Sorry Diane, but enterprise cloud my ass.

    1. Adam 52 Silver badge

      Re: umm

      You got down-voted, but I'd tend to agree, for different reasons. The whole arrangement, but especially the commercial and support side, just isn't in the same league as AWS.

      This incident highlights the support side, and you only have to look at the Cloud Service agreement to realise that they just aren't mature enough to work in the Enterprise space (e.g. "if we lose all your customer data we'll pay you $5 compensation", or "this contract is with Google Ireland but under US jurisdiction").

  4. Anonymous Coward
    Anonymous Coward

    The Cloud...

    Other people's computers you have no control over.

    1. Anonymous Coward
      Anonymous Coward

      Re: Other people's computers you have no control over.

      No control? Not full control, obviously, but if I can spin up an instance of a server and configure it in Google's cloud that counts as having some control.

  5. Pascal Monett Silver badge
    Trollface

    "the premise of cloud is that it will just scale as demand increases"

    Cloud theory is like military strategy: as soon as the battle starts, you can throw the plans out the window.

  6. monty75

    If only there were some kind of documentation on how to avoid causing yourself an embarrassing DoS: http://www.theregister.co.uk/2016/11/10/how_to_avoid_ddosing_yourself/
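
    The usual advice in that vein comes down to retrying with capped exponential backoff plus jitter rather than hammering the endpoint. A minimal sketch, assuming any flaky zero-argument callable rather than a real BigQuery client:

      import random
      import time

      def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=60.0):
          # Retry a flaky call with capped exponential backoff plus full jitter.
          # `request` is any zero-argument callable that raises on failure;
          # the defaults here are illustrative, not tuned values.
          for attempt in range(max_attempts):
              try:
                  return request()
              except Exception:
                  if attempt == max_attempts - 1:
                      raise  # out of retries, surface the error
                  # 1s, 2s, 4s, ... capped at max_delay, randomised so that
                  # thousands of retrying clients don't all stampede in sync.
                  delay = min(max_delay, base_delay * (2 ** attempt))
                  time.sleep(random.uniform(0, delay))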

  7. yoganmahew

    All hands in the way...

    The all hands on deck response is part of the problem with MTTR. 200 people on a call, running in different (or no) directions. What happened to expertise in operations? Oh yeah, it got outsourced and now requires senior VP approval before anything can be fixed...

    1. Dave Pickles

      Re: All hands in the way...

      In my BOFH days, when there was an outage the least useful PFY du jour would be tasked with answering the phones and keeping visitors out of the way.

  8. batfastad

    60% of the time

    ... it works every time!
