Google cloud glitch hits at Beer O'Clock Friday, fix coming Monday

Google's got a problem with its Cloud SQL service – about seven per cent of instances using the service's first-generation code aren't backing up properly. The problem started at 16:54 on Friday afternoon and by 17:34 the company announced it was “forcing” backups “as short-term mitigation.” There's no suggestion data is in …
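The report doesn't say exactly how those "forced" backups are being kicked off, so here is a minimal, purely illustrative Python sketch of the sort of thing a DBA might run: it shells out to the gcloud CLI's on-demand backup command for a list of instances. The project ID and instance names are hypothetical, and this is not Google's actual tooling.

    # Illustrative sketch only -- not Google's tooling. Triggers an on-demand
    # Cloud SQL backup for each listed instance via "gcloud sql backups create".
    import subprocess

    PROJECT = "example-project"            # hypothetical project ID
    INSTANCES = ["legacy-gen1-sql-01"]     # hypothetical gen1 instance names

    def force_backup(project: str, instance: str) -> None:
        """Start an on-demand backup; raise if gcloud reports a failure."""
        subprocess.run(
            ["gcloud", "sql", "backups", "create",
             "--instance", instance,
             "--project", project],
            check=True,
        )

    if __name__ == "__main__":
        for inst in INSTANCES:
            force_backup(PROJECT, inst)

Automating the job, as Google says it will do by Monday, is in principle just a matter of running the same step on a schedule instead of by hand.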

  1. veti Silver badge

    The point of SLAs is to define exactly what they promise to deliver, and what compensation I get if they don't.

    Provided they can deliver what they promised, why do I care how they do it? They could be writing the whole thing out on vellum with quill pens for all I know. Or care.

    1. Anonymous Coward

      Agree. This seems like a non-story. I understand why Google would want to, and will, automate the process, but the whole idea is that Google owns the SLA and they can meet it however they want. If there is an issue, there is an issue... and financial compensation follows. If there is no issue, then it is really not anyone's problem or concern how they choose to meet the SLA.

      This is gen1 Cloud SQL, which is, frankly, out of date and only around for people who have not yet moved to gen2, so it is probably a non-issue in any case.

      1. This post has been deleted by its author

      2. Adam 52 Silver badge

        Have you read the standard Google T&C and the SLA? If you're expecting any meaningful compensation for any sort of outage, data breach, etc., then you'll be sorely disappointed.

        Suppose Google loses your Cloud SQL backup and you get fined €10 million by the regulator: Google will compensate you about $3000, and it'll cost you thousands in Californian lawyers to get it.

    2. Doctor Syntax Silver badge

      "The point of SLAs is to define exactly what they promise to deliver, and what compensation I get if they don't."

      This is the MBA version. The reality could be that a failed backup leads to a failed business.

      If you're a DBA or sysadmin working for the company that owns the database, you know that a failure could lead to the loss of your job and those of your colleagues. If a problem happens on a Friday you work over the weekend to fix it; you don't leave it until Monday.

      If you work for a 3rd party provider you know the worst is that it costs your employer whatever the SLA says and nothing else. What's more, your employer has already taken the decision to carry the risk themselves or to insure against it, and if they don't care enough to tell you to work the weekend, why should you care any more?

      1. Anonymous Coward

        "If you work for a 3rd party provider you know the worst is that it cost your employer whatever the SLA says and nothing else."

        That's not true. If there is some major issue, I doubt the only consequence for a provider would be whatever the SLA says and nothing else. They will probably lose the business and the account. The SLA is a minimum. It's not like Google said "we're not doing any backups until further notice, have a good weekend"... they have a fix in place, they're doing backups, it's working. It just isn't automated... and, in theory, manual DBA intervention is slightly riskier than automation. It is also what pretty much everyone does on-prem: manual or partially manual DBA backups.

        What the cloud providers do is a step beyond what any on-prem vendor would do... if Oracle or MSFT doesn't patch something in a timely fashion (just in theory :) and something goes wrong as a result, you are owed nothing and Oracle or MSFT will issue the patch when they are good and ready. They don't own *any* SLA... they are just dropping off code and, if there is an issue, they will fix it when they feel like it.

        That is the part I don't get about people's concerns over cloud SLAs. They are concerned because now you are in the cloud provider's hands and its competency is paramount... but that is already the case with on-prem today. It's not like people are building their own servers and writing their own databases from the ground up. If there is an issue anywhere, you are basically on the phone with Oracle, IBM, Cisco, EMC, MSFT, etc., waiting for them to get a fix out. It is the same situation on-prem as in the cloud, except that in the cloud the provider owns the overall SLA and it is on them.

        1. Anonymous Coward

          Here's the scenario:

          Customer is concerned that cloud is not 100% automated and manual intervention is required (only on backups, only until next week when it will be automated, and only on one legacy service). Ergo, they need to keep the workload on-prem, where nearly everything is manual with little automation, bolted together from 10 different vendors who all hate each other, and where patch levels are rarely at the current rev because patching usually requires outages.

          It is like a person who has been drinking all night at a bar reading about how much safer self-driving cars are than professional car-service drivers... so they decide to drive themselves home after eight drinks rather than risk a professional driver, on the grounds that a self-driving car would have been optimal.

  2. ratfox

    Pushing an emergency fix on a Friday evening sounds like a good recipe for disaster. Doing manual backups over the weekend might entail a bit of risk, but probably much less.

    1. Doctor Syntax Silver badge

      "Pushing an emergency fix on a Friday evening sounds like a good recipe for disaster."

      Pushing an emergency fix on a Friday evening and waiting until Monday morning aren't the only alternatives.

    2. Anonymous Coward

      An emergency fix on any day is a recipe for disaster. Perhaps the fix itself takes just 15 minutes to code, but they still need to do QA and then let it propagate to their many DCs. If they are promising (?) that the fix will be finished on Monday, it still sounds like a bit of a rush job, unless the error was an obvious one and can't have an adverse effect on other parts of the system.

      Google can very well afford a couple of well-paid techs to identify the root cause and start on the fix, even on a Friday evening.

      But as they didn't, you can draw your own conclusions about where Google's priorities lie.

      1. Anonymous Coward

        They did get a fix out... nothing was ever broken. They just haven't automated backups for the DBs on a legacy service, gen1 Cloud SQL. The SQL services were never down and kept working, the backups kept running; they just have not yet automated those backups.

        I would love to see your reaction if your boss asked you to work through the weekend to get DB backup automation working so that zero manual intervention is required... even though the DBs are performing perfectly and the backups are running; this is a nice-to-have automation piece.

        In other words, Google's legacy SQL environment is degraded to the point where it is working exactly like your on-prem DBs, which you would consider to be working perfectly, without issue.
