GitHub rolls back database change after breaking itself

If you can't or couldn't access GitHub today, it's because the site broke itself. The Microsoft-owned code-hosting outfit says it made a change involving its database infrastructure, which sparked a global outage of its various services. The biz is now in the process of rolling back that update to recover. "We are …

  1. bemusedHorseman
    Megaphone

    Take your bets!

    Which was it this time? Was it DNS? Wrong window syndrome? Regex gone wild? Far fingered an rm -rf? Or did someone forget to pay the onion bhaji tax to the local BOFH???

    1. Andy Mac
      Pint

      Re: Take your bets!

      You fat fingered “fat fingered”, have a beer on me

      1. Pierre 1970
        Coat

        Re: Take your bets!

        Love these recursive jokes (sort of)... I would have said that far fingers usually aren't allowed to make these mistakes, taking into account the distance to the keyboard.

        1. TheRealRoland
          Happy

          Re: Take your bets!

          Professor Farnsworth might sue you for copying his idea of the fing-longer (even though he didn't build it)

  2. Claptrap314 Silver badge

    30 minutes to decide to do a rollback?

    Classic. When I started a chain of actions that triggered a Hangouts outage (mostly internal to Google, thankfully, but still really bad), it took me less than 10 minutes to decide that. Seriously, if your last change corresponds to the start of an outage, now is NOT the time to say "correlation does not imply causation". You aren't Aristotle. Be Beyes instead. Undoing that change is far more likely to fix things than to harm them.
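
    (For anyone who wants the back-of-envelope version of that Bayes point, here's a minimal sketch in Python. The numbers are made-up assumptions purely for illustration, not anything from GitHub's actual incident.)

        # Rough Bayesian sanity check: did my change cause the outage?
        # All figures below are illustrative assumptions.
        prior_fault = 0.5            # before any evidence, a fresh change is a prime suspect
        p_timing_if_fault = 0.9      # outages caused by a change tend to start right after it
        p_timing_if_innocent = 0.01  # unrelated outages rarely land in that exact window

        posterior = (p_timing_if_fault * prior_fault) / (
            p_timing_if_fault * prior_fault
            + p_timing_if_innocent * (1 - prior_fault)
        )
        print(f"P(my change did it | outage started right after it) ~ {posterior:.2f}")  # ~0.99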

    1. Bebu Silver badge
      Windows

      Re: 30 minutes to decide to do a rollback?

      I am thinking it might have taken 30' for the changes to propagate to the full extent of github's infrastructure.

      From bitter experience, hastily reversing an undesirable change midway through that change taking full effect is often just adding another locomotive to the train wreck.

      Taking time to collect the evidence and think through everything before potentially compounding the felony is one of the hallmarks of experience.

      BTW Bayes for the typo Beyes.

      1. Claptrap314 Silver badge

        Re: 30 minutes to decide to do a rollback?

        If they did not have a rollout cancel capability, I can see that. As a possibility. Hopefully, they will release an RCA or similar. Of course, if it's like the others I've read from m$, I expect nothing good...

    2. Anonymous Coward
      Anonymous Coward

      Re: 30 minutes to decide to do a rollback?

      This comment stinks of a junior developer who wants to modernise a decades-old system in their first week

      1. teknopaul

        Re: 30 minutes to decide to do a rollback?

        & yours smells of a greybeard who hasn't upgraded in 10 years. ;{)

        While there is nothing wrong with that, GitHub, given its core functionality, ought to err on the side of bleeding edge.

  3. JamesTGrant Bronze badge

    I’ve worked on some service windows where the plan has a point at which ‘going back now will exceed the service window length’. At that point it is VERY tempting to press on and fix any issues rather than roll back - and many times there’s a good chance that you win. I’ve worked on some where there are cabling changes, equipment changes and configuration changes carefully coordinated between people, where even the thought of considering rollback is anxiety-inducing.

    But I do not subscribe to the ‘don’t pull the knife out from the wound’ approach. A sensible plan has criteria at each stage that, if missed, dictate the start of the (documented) rollback procedure (that you have tested yourself!) from that very point; something like the sketch below.

    All very standard - or is in many industries. Seems to be based on consequence to the business as to how much effort is put into a proper plan and review process.
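
    (A minimal Python sketch of that plan shape, with the stages, checks and rollback hooks left as placeholders; it's an illustration of the structure, not any real deployment tooling.)

        # Sketch: each stage of a change window carries its own go/no-go check
        # and a rollback step that has been tested from that exact point.
        # Everything here is hypothetical scaffolding.
        from dataclasses import dataclass
        from typing import Callable, List

        @dataclass
        class Stage:
            name: str
            apply: Callable[[], None]      # do the work for this stage
            check: Callable[[], bool]      # go/no-go criterion for this stage
            rollback: Callable[[], None]   # documented, pre-tested rollback from here

        def run_window(stages: List[Stage]) -> bool:
            done: List[Stage] = []
            for stage in stages:
                stage.apply()
                done.append(stage)
                if not stage.check():
                    # Criterion missed: start the rollback from this very point,
                    # unwinding completed stages in reverse order.
                    for completed in reversed(done):
                        completed.rollback()
                    return False
            return True

        if __name__ == "__main__":
            ok = run_window([
                Stage(
                    name="schema migration",
                    apply=lambda: print("applying migration"),
                    check=lambda: True,   # pretend the health check passed
                    rollback=lambda: print("reverting migration"),
                ),
            ])
            print("window completed" if ok else "rolled back")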

  4. simonlb Silver badge
    Trollface

    What's That?

    Another Microsoft-owned cloud service failed? I see a trend forming.

  5. Lonpfrb

    Scalable Cloud, but...

    So m$ wants to convince customers that their cloud is scalable and demand-flexible, yet they are too cheap to provision a preproduction instance for final change deployment and validation before going to production.

    Leadership is do as I do, not just do as I say...

    If you don't live best practice, don't expect to be taken seriously in enterprise IT...

  6. teknopaul
    FAIL

    Microservices

    I heard that microservices are a good idea to stop the whole shitshow from coming tumbling down when you bork a database update.

    GitHub is famously one of the last remaining monoliths, and its bork rate proves the microservice architects' point.

    1. Anonymous Coward
      Anonymous Coward

      Re: Microservices

      Does not sound like a git service failure. It sounds more like insufficiently tested infrastructure code. Stuff like this is avoidable with those budgets.
