This typo sparked a Microsoft Azure outage

Microsoft Azure DevOps, a suite of application lifecycle services, stopped working in the South Brazil region for about ten hours on Wednesday due to a basic code error. On Friday Eric Mattingly, principal software engineering manager, offered an apology for the disruption and revealed the cause of the outage: a simple typo …

  1. yetanotheraoc Silver badge

    Oh, really?

    "Azure DevOps has tests to catch such issues"

    I call BS.

    1. Fruit and Nutcase Silver badge
      Coat

      Re: Oh, really?

      Brazilian Shave

      Brazilian Shaped

    2. Anonymous Coward

      Re: Oh, really?

      DevOops.

    3. Helcat Silver badge

      Re: Oh, really?

      Test one: Peer review.

      Works fine if the reviewer pays attention rather than skim reads (if that) and gives things a green light.

      Then it needs testing. Even a process, or ad-hoc script is first run against a test system to ensure it a) works, b) works as expected and c) you've a roll back that also works where possible. Then the backup before putting something live. That's SOP in most places.

      That's the most basic form of tests. However, it doesn't matter what tests you have if people skip them thinking 'what could possibly go wrong'. Now they know.

      1. MrDamage

        Re: Oh, really?

        Given how often MicroBork take the piss, shouldn't it be pee-er review?

      2. Peter Gathercole Silver badge

        Re: Oh, really?

        I see this as one of the biggest problems with Agile and to some extent DevOps. Because they are a regular deployment method, they rely on automated testing that is often good at trapping issues that have previously been seen, but are not that good at discovering new problems.

        It's only when you roll out to the production environment that you get to see some of these new problems, and often by then it's too late to prevent the problem happening, and you're into recovery mode.

        Another thing that I've noticed is that quite often the provided backout plan does not cover major unexpected issues, merely trying to undo the change that was made, which again means that you're back in panic mode when things go wrong.

  2. Howard Sway Silver badge

    Various fixes and reconfigurations have been put in place to prevent the issue from recurring.

    Looks like Azure is getting more and more complex, combined with a desire to change it frequently with agile "sprints".

    Frequent change combined with increasing complexity is only ever going to go one way: more disasters and decreasing reliability. And it sounds like their response to this problem is to keep applying sticking plasters. A more fundamental rethink of approach is going to be necessary at some stage to prevent it falling apart.

    1. Doctor Syntax Silver badge

      Re: Various fixes and reconfigurations have been put in place to prevent the issue from recurring.

      One of the great things about software sticking plasters is that you never run out of them.

    2. Anonymous Coward

      Re: Various fixes and reconfigurations have been put in place to prevent the issue from recurring.

      In our environment we've rebranded agile to fragile...everything breaks easily now.

  3. Inventor of the Marmite Laser Silver badge

    Ah, the Cloud. Wherein lies Cuckoo Land

    1. Anonymous Coward

      can't stand clouds

      Expensive, consultancy driven crap.

      I've still to hear of one organisation that's made a significant saving, yet many out there have shit the bed after the 1st Cloud invoice came in.

  4. Anonymous Coward
    FAIL

    A large pull request of mechanical changes swapping out API calls.

    What triggered the outage?

    “This resulted in a large pull request of mechanical changes swapping out API calls.”

    Sounds complicated. Why does swapping out API calls cause mechanical changes?

    1. Excused Boots Silver badge

      Re: A large pull request of mechanical changes swapping out API calls.

      Indeed, although what the actual f**k is a ‘mechanical change’?

      Is it some made up, BS marketing term which actually has no meaning at all?

      1. Colin Bull 1
        WTF?

        Re: A large pull request of mechanical changes swapping out API calls.

        Excuse my ignorance, but WTF are "application lifecycle services".

        Just asking for a friend.

      2. deadlockvictim

        Mechanical Changes

        Mechanical Changes are what happens when you treat your hardware like Lego.

        Lego, though, doesn't have to deal with the complexity of hot-swapping.

        It might even be that servers have to be off when you replace NICs.

    2. Mike007 Silver badge

      Re: A large pull request of mechanical changes swapping out API calls.

      I think they mean a boring task of going through replacing DeleteDatabase() with DelDB() or whatever.

      And someone accidentally replaced Database.Delete() with Instance.Delete() or some equivalent.
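
To make the guess above concrete, here is a minimal sketch of how a behavioural test against a fake could catch that kind of swap. Every name in it (`FakeServer`, `cleanup_snapshot`, `delete_database`, `delete_server`) is hypothetical and only stands in for whatever the real code calls; nothing here is taken from Microsoft's code or SDKs.

```python
class FakeServer:
    """In-memory stand-in for a database server (hypothetical, for illustration)."""

    def __init__(self, databases):
        self.databases = set(databases)
        self.deleted = False

    def delete_database(self, name):
        # Removes one database and leaves the server alone.
        self.databases.discard(name)

    def delete_server(self):
        # Removes the server and everything hosted on it.
        self.databases.clear()
        self.deleted = True


def cleanup_snapshot(server, snapshot_name):
    """Intended behaviour: drop only the snapshot database."""
    server.delete_database(snapshot_name)


def cleanup_snapshot_after_bad_swap(server, snapshot_name):
    """The slip described above: the wrong delete call was swapped in."""
    server.delete_server()


def survives_cleanup(cleanup):
    """Return True if the cleanup removed the snapshot and nothing else."""
    server = FakeServer({"customers", "customers-snapshot"})
    cleanup(server, "customers-snapshot")
    return not server.deleted and server.databases == {"customers"}
```

Here `survives_cleanup(cleanup_snapshot)` holds and `survives_cleanup(cleanup_snapshot_after_bad_swap)` does not, but a check like this only helps if some test environment actually contains a snapshot to clean up.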

      1. AVee

        Re: A large pull request of mechanical changes swapping out API calls.

        That's pretty much it. And Microsoft has a habit of reimplementing APIs like that, either because their first try had issues, or because there is some new latest and greatest way to do things. This leads to huge changes like this, where nothing really changes but pretty much all code is modified.

        1. Vincent Ballard
          Boffin

          Re: A large pull request of mechanical changes swapping out API calls.

          It's interesting to note that they're doing this now. When I tried to make the same library upgrade on some tooling I'd made, I had to largely abandon it because the changes made to the auth mechanism were so severe that I couldn't figure out whether it was even possible to use the new API. I ended up replacing a fully automated system with a half-manual one. I wonder whether this means that it's now time to try to persuade my boss to allocate some time to revisit the issue and myself that my sanity could survive the attempt.

      2. Anonymous Coward

        Re: A large pull request of mechanical changes swapping out API calls.

        I wonder if Copilot had any hand in suggesting this change. We'll never know as MS would never throw its new AI hotness under the bus.

  5. jake Silver badge

    "Sprints"

    I think I see the culture that causes problems.

    1. b0llchit Silver badge
      Coat

      Re: "Sprints"

      Yes, walking at a normal pace is already hard enough.

      1. John Brown (no body) Silver badge
        Coat

        Re: "Sprints"

        ...especially if the Azure devs are trying to chew gum at the same time :-)

    2. captain veg Silver badge

      Re: "Sprints"

      As a colleague once observed "we can't be sprinting all the time". To blank stares from the scrum people.

      My observation is that a scrum is a situation in which everyone pushes in opposite directions, goes nowhere and then collapses in a heap, possibly resulting in severe injuries.

      I've always assumed that the S word comes from Rugby. If you know otherwise, reply below.

      -A.

    3. FatGerman

      Re: "Sprints"

      "We will do all our work in three week sprints"

      "What if it takes 4 weeks to test it?"

      "We will do all our work in three week sprints"

      This religious devotion to the Agile process is becoming a cult. And we all know where that leads.

      1. captain veg Silver badge

        Re: "Sprints"

        > This religious devotion to the Agile process is becoming a cult.

        Ah. I hadn't realised that there was an actual process.

        Our version of your wall-head interface is the insistence that the number of "story points" for a "ticket" can only be a Fibonacci number. Our "sprints" are nominally two weeks (10 working days) long, so if a story point is one day (which, mysteriously, it both is and isn't) then a "ticket" can only take a maximum of 8 days, systematically rounded down to 5.

        What can I do in 5 days?

        Well, plenty, of course. I could fix some outstanding bugs. Develop a new feature that has suddenly become urgent. Move a button three pixels to the left. Oddly enough, it is literally only the last that this "process" addresses.

        What should I really be doing?

        Rewriting a VB6 app in a "modern" language which permits deployment online (or at least to Macs). How many story points? Hmm, at least a hundred. Doesn't fit in a "sprint". Doesn't get done.

        Agile my arse.

        -A.

        1. Jimmy2Cows Silver badge
          Facepalm

          Re: Doesn't fit in a "sprint". Doesn't get done.

          Classic case of manglement totally misunderstanding the purpose of a sprint.

          1. captain veg Silver badge

            Re: Doesn't fit in a "sprint". Doesn't get done.

            The purpose of a "sprint" is to micromanage productive employees and stop them getting ideas above their station.

            Management understands this only too well.

            -A.

        2. teebie

          Re: "Sprints"

          I too work with a fibonacci wanker. He doesn't seem to be able to express what problem this solution solves.

  6. jake Silver badge

    As for ...

    "customers can't revive Azure SQL Servers themselves"

    --and--

    "Even after databases began coming back online, the entire scale unit remained inaccessible even to customers whose data was in those databases due to a complex set of issues with our web servers."

    Hands up those who remember the days of the Service Bureau, and later timesharing.

    And why we don't do that anymore.

    I won't ask WTF web servers have to do with databases ... I don't want to know. Sounds unnecessarily painful.

    1. Yet Another Anonymous coward Silver badge

      Re: As for ...

      Everything on MS is a database: Azure is a database, Teams is a database, the filesystem is a database.

      Fortunately databases at scale are really trivial systems to understand and manage AND Microsoft has the best database

      1. Someone Else Silver badge

        Re: As for ...

        Can't tell if I missed a <sarcasm> tag there...

        1. Yet Another Anonymous coward Silver badge

          Re: As for ...

          What site are you on ?

        2. Jon 37 Silver badge

          Re: As for ...

          The sarcasm tag goes around the second paragraph. But not the first.

  7. Vader

    Oh dear and people are flapping over AI. We don't need AI to destroy things we can do it ourselves.

    1. Paul Crawford Silver badge
      Trollface

      Yes but AI can destroy your projects without the added cost of human salaries. Cheaper, what is there not to like?

      1. b0llchit Silver badge

        AI debugging AI: You have a bug, says one AI to the other. We both have an effective virtual bug spray says the other, while pulling both of their plugs...

    2. Sudosu Bronze badge

      We already have Artificial Intelligence, it's called management.

      1. captain veg Silver badge

        Artificial Intelligence

        I think you capitalised too many words there.

        -A.

  8. CowHorseFrog Silver badge

    The most scarely thing about the cloud and devops is that most files are, shall we say, rather terse: just lots of magic key/value entries that don't actually mean anything in a code review.

    1. Anonymous Coward
      Linux

      Cloud values are shall we say rather terse

      > The most scarely thing about the cloud and devops is that most files are, shall we say, rather terse: just lots of magic key/value entries that don't actually mean anything in a code review.

      What's new is old again and they used to complain about Bash scripts.

      1. that one in the corner Silver badge

        Re: Cloud values are shall we say rather terse

        Bash scripts can be commented - can these key/value pair files?

        I've seen far too many data files being "parsed" by some uber-trivial sscanf()-like calls instead of at least using a simple lexer that can allow comments and whitespace handling so the files can be laid out in a readable fashion.

        And for anyone who leaps in and says they are all using JSON files now, so there is a proper parser, comments and all: that means they're probably using something that is vast overkill! So are they correctly limiting the complexity of the data structure allowed, 'cos now the parser will happily let you define a list but your code only eats a single value: bet it uses the first item and doesn't raise an error that it is ignoring the rest!
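
The "define a list but the code only eats a single value" trap can be closed with an explicit shape check. The following is a minimal sketch using Python's standard `json` module; the function name `read_single_string` and the `region` key used in the usage note are made up for illustration.

```python
import json


def read_single_string(raw, key):
    """Return config[key] only if it is exactly one string; otherwise raise."""
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("top level must be an object")
    value = config[key]  # raises KeyError if the key is missing
    if not isinstance(value, str):
        # Refuse lists, numbers and nested objects rather than quietly
        # taking the first item and ignoring the rest.
        raise ValueError(f"{key!r} must be a single string, got {type(value).__name__}")
    return value
```

Calling it with `'{"region": "brazil-south"}'` returns the string, while `'{"region": ["brazil-south", "east-us"]}'` raises `ValueError` instead of silently using the first entry.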

        1. JollyJohn54

          Re: Cloud values are shall we say rather terse

          I retired in 2014 and now all my programming is for personal use and exclusively in Excel VBA. I had a data source which supplied a simple json string of key-value pairs, dead easy to read and to parse by humans and computers.

          Now that data source is supplied as a complex json string, with many nested layers and full of arrays. Not only that but if a value is not available then the key is still supplied. Trying to loop through this mess whilst testing for null or empty values properly has occupied my brain for over a week now. The only way I can read it is to dump the string into an online json parser. Easy to read by humans and computers? No chance! Complexity for the sake of it.

          1. FatGerman

            Re: Cloud values are shall we say rather terse

            There is no sense in which JSON is parseable by humans. The syntax is too restrictive.

            YAML FTW.

            1. that one in the corner Silver badge

              Re: Cloud values are shall we say rather terse

              INI files are still plenty good enough :-)

            2. Claptrap314 Silver badge

              Re: Cloud values are shall we say rather terse

              YAML is read-only. If you ever need to generate or significantly modify a YAML file, read it into a REPL, modify the structure & spit it back out. But not in Go. Go doesn't actually support the spec, which is why it blew up on Billion LOLs.

            3. CowHorseFrog Silver badge

              Re: Cloud values are shall we say rather terse

              YAML is a terrible format. Using whitespace to control indentation is a terrible idea, it's just too easy to misalign text. Try and review text columns on GitHub: simple answer is you can't.

              Even GitHub doesn't support or report any schema-type-related errors in its YAML files for GitHub Actions, and that's a simple example of how a massive org can get something so basic wrong. Put the wrong key at the wrong level, and it's silently ignored.
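
A key at the wrong level does not have to be ignored silently. As a rough sketch, assuming the file has already been loaded into plain nested dicts by whatever YAML loader is in use, an unknown-key check can be a few lines. The `SCHEMA` below is invented for illustration and is not the real GitHub Actions schema.

```python
def unknown_keys(data, schema, path=""):
    """List every key in `data` that `schema` does not allow at that level.

    `schema` maps each allowed key to either None (a leaf value) or a nested
    schema dict. Both arguments are plain dicts, e.g. what a YAML loader returns.
    """
    problems = []
    for key, value in data.items():
        where = f"{path}.{key}" if path else key
        if key not in schema:
            problems.append(where)
        elif isinstance(schema[key], dict) and isinstance(value, dict):
            problems.extend(unknown_keys(value, schema[key], where))
    return problems


# Hypothetical workflow-like schema: "timeout" is only valid inside "job".
SCHEMA = {"name": None, "job": {"runs_on": None, "timeout": None}}
```

With this schema, a document that puts `timeout` at the top level instead of under `job` is reported rather than accepted.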

          2. An_Old_Dog Silver badge

            "Simple" JSON string

            I've yet to see a "simple" JSON string. Keyword/value pairs are as simple as it gets, both for computer parsing and human reading.

        2. Richard 12 Silver badge

          Re: Cloud values are shall we say rather terse

          JSON doesn't officially support comments.

          It's just key-value pairs with a bit of structure.

          1. CowHorseFrog Silver badge

            Re: Cloud values are shall we say rather terse

            Who needs comments, right?

            At least with properties files you can comment out or provide samples for all available keys, but with YAML this is impossible.

        3. Hans 1
          FAIL

          Re: Cloud values are shall we say rather terse

          Standard JSON does not include comments. Some parsers accept C-style comments; if you only ever need those parsers, then I guess it is fine. jsonlint does not support comments.
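
This is easy to check against a strict parser. A small sketch using Python's standard `json` module, which follows the standard and rejects comments; the sample documents are made up:

```python
import json


def parses_as_strict_json(text):
    """Return True if Python's standard json module accepts `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


PLAIN = '{"retries": 3}'
WITH_COMMENT = '{\n  // retry count\n  "retries": 3\n}'
```

`parses_as_strict_json(PLAIN)` is true and `parses_as_strict_json(WITH_COMMENT)` is false, so a config file that works with a lenient parser can fail as soon as another tool reads it.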

          1. that one in the corner Silver badge

            Re: Cloud values are shall we say rather terse

            Oh for pity's sake - you mean I've been fooled into thinking JSON is actually *better* than it officially is because of sheer luck in which parser library was chosen by the project?

            Thanks for putting me straight on that.

            What a totally moronic decision - taking a pure text format derived from JavaScript and deliberately stripping out comments!

            1. Jon 37 Silver badge

              Re: Cloud values are shall we say rather terse

              It was never designed to be human readable. As a way for a program to talk to a web server, JSON is fine. It was only later that it got used for configuration files, where the lack of comments is a nightmare.

              1. that one in the corner Silver badge

                Re: Cloud values are shall we say rather terse

                > It was only later that it got used for configuration files, where the lack of comments is a nightmare

                Dunno about anyone else[1], but when I've chosen the format for files that are used "only" to talk between an application and a server, as often as possible[2] they are text-based *and* damn well allow for comments!

                There is this little thing we like to call "testing" where we keep lots and lots of files in the repo with subtle differences between them, all commented up to the nines to point out those differences and the expected results[3]. Which are fired off/caught back using curl.

                For that matter, last paid gig where I created such data (used to send a few values between a set of peer servers, "no human intervention required") the *generated* messages contain comments (telling the reader what server subsystem created the file and guiding them to the docs in the repo - 'cos I'd wasted so much time trying to figure out similar info from other traffic in that project!); the "waste" in doing so was well within the available bandwidth for the task and the payback in live tests made *my* life easier[4], so win.

                [1] in particular, it seems, the people who came up with JSON and *all* of the early users who should have been screaming at them!

                [2] i.e. if the time and processing resources allow - e.g. between two 8 bit MCUs with 2K RAM I'll probably forego the comments (and shorten the keywords).

                [3] what, put those comments into another file? It is hard enough to get people to change comments when they change the primary file, trusting a secondary will be up to date is lunacy!

                [4] my bit is working, so it *is* your code going mad; toodles.

            2. captain veg Silver badge

              Re: Cloud values are shall we say rather terse

              I upvoted you, but Crockford was clearly trying to make something that would work with any language, not just JS.

              -A.

              1. that one in the corner Silver badge

                Re: Cloud values are shall we say rather terse

                Please take the following as aimed at Crockford et al not your good self:

                <rant>

                JavaScript can just eval JSON, any other language is going to *need* a separate parser (and, of course, in real life, JavaScript *ought* to be using a separate parser! Eval'ing text from an unknown source...).

                If anyone is writing a parser for a text format, using *any* programming language (even VB6 or, gasp, JavaScript) and can not figure out how to allow for comments then - well, let us just say they should not be doing so professionally or for anything that is expected to be released for more than one other person[1] in the world to use.

                Even if the claim was that JSON was meant to be viable on a low-resource MCU in an IoT device (bleeugh), JSON is complex enough that the extra states to handle comments are trivial.

                </rant>

                [1] their lab partner in the exercise after the second lecture on basic compiler techniques.

                1. captain veg Silver badge

                  Re: Cloud values are shall we say rather terse

                  You're right.

                  Fortunately JavaScript does have a separate parser. It is named JSON.parse(). As you would expect.

                  -A.

      2. CrazyOldCatMan Silver badge

        Re: Cloud values are shall we say rather terse

        What's new is old again and they used to complain about Bash scripts

        And config.sys..

        (Yes - I remember hand-optimising them so that you could load all the required network drivers and bindings to actually work - and that was just netbeui! Then some bright spark wanted to add TCP/IP to the stack. Ah, the joys of working out which of the various bits were happy with loadhigh and the ones that would either crash the PC immediately or, most fun of all, work for a while *then* lock it up..)

        1. An_Old_Dog Silver badge

          CONFIG.SYS network-related commands

          My fun one was a super-fast (at the time) server to be used as a desktop PC by a VIP. Ethernet card, Novell IPX/SPX, and TCP/IP networking. If you executed the CONFIG.SYS commands by hand, it worked. If you stepped through CONFIG.SYS, it worked. If you let it run automatically, it did not work. It was a timing issue. My workaround was to replace protocol-handler-name.SYS in CONFIG.SYS with this series of commands:

          protocol-handler-name.SYS

          protocol-handler-name.SYS /U (unloads the handler)

          protocol-handler-name.SYS

      3. Someone Else Silver badge

        Re: Cloud values are shall we say rather terse

        What's new is old again and they used to complain about Bash scripts.

        ...and INI files...

    2. Eclectic Man Silver badge
      Happy

      "Scarely"

      Upvoted: I really like that word.

  9. CowHorseFrog Silver badge

    The article claims there was a typo, but it doesn't actually detail the actual line of code with the error. The post mortem also doesn't even say what caused the problem. A lot of obviously prepared statements, that say a lot of words but don't actually mean anything.

    Sounds a lot like most *.yaml files...lots of values but most people aren't sure what half of them mean.

    1. TheWeetabix Bronze badge

      Ahh but it does!

      In the same way that a missing jar in your pantry leaves a space on the shelf, there’s a large set of spaces in their story.

      First, when they talk about testing, it sounds more like they’re referring to code coverage or build-time coverage as opposed to an actual test suite; otherwise they would have tested the snapshot with a snapshot and seen the server get removed.

      Secondly, they talk about how this was some edge case, while at the same time the support team uses this mechanism daily. That leads me to the exact same place: they tested the code for “single digit typos” and spelling, but since they ran no actual unit testing, they didn’t notice that the words server and service, or something to that effect, had been swapped. I’m not even sure that counts as a typo, and why someone would design an API where the words for the hosting service and the unit service are one or two keystrokes apart blows my mind.

      Third, the thing about the exponential backoff tells me that they didn’t know how to modify their own code in an emergency to either remove that dependency or restart the service. It sounds an awful lot like they had to block everyone at the firewall level, which makes me think again that they didn’t have a team capable of performing unit testing, because that’s a fairly complex operation in an operation like Azure.

      Just spitballin’

      1. Yet Another Anonymous coward Silver badge

        Re: Ahh but it does!

        They had tests, the command to blow away the database was correctly formatted and had all the correct options and did precisely what it was supposed to do - test passed

      2. david 12 Silver badge

        Re: Ahh but it does!

        That bit where they said "ring 0" is a reference to the test group. Where the patch is tested on a small group of servers (ring 0) before releasing it to a larger group (ring 1), before pushing it to general scheduled release (ring 2). And ring 0 was "internal", and contained more than one server, but .... did not contain any servers which had the use-case covered by this bug. Specifically, did not contain any servers with "snapshot databases".

        This is the standard test-and-patch service used by Microsoft internally, also released for enterprise customers some time ago.

        I would hope that finding that their patch-test environment of internal servers is not representative of their DevOps customer base should encourage some thought and reflection.
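
The gap described here, a ring 0 that never exercises something later rings depend on, can be checked mechanically if you know what each server uses. The following is only a sketch with invented data; the feature names and the `RINGS` layout are assumptions, not Microsoft's actual ring configuration.

```python
def uncovered_features(rings):
    """Return features used in later rings that ring 0 never exercises.

    `rings` is a list of rings; each ring is a list of servers; each server is
    a set of feature names. All data here is made up for illustration.
    """
    ring0 = set().union(*rings[0])
    later = set()
    for ring in rings[1:]:
        for server_features in ring:
            later |= server_features
    return later - ring0


RINGS = [
    [{"sql"}, {"sql", "web"}],        # ring 0: internal servers
    [{"sql", "snapshot-db"}],         # ring 1: larger group
    [{"sql", "web", "snapshot-db"}],  # ring 2: general release
]
```

For this made-up layout `uncovered_features(RINGS)` reports `snapshot-db`, which is the signal that ring 0 cannot vouch for changes touching that feature.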

        1. CrazyOldCatMan Silver badge

          Re: Ahh but it does!

          And ring 0 was "internal"

          Or, in the real world, reserved for the most privileged processor instructions..

          (I wish people like MS wouldn't repurpose specific jargon in order to make themselves look clever!)

          1. TimMaher Silver badge
            Trollface

            Re: Ring 0 was internal.

            Could mean something completely different.

            Bunch of arseholes.

  10. o p

    staging

    So Brazil is the staging environment for azure. Good to know.

    1. that one in the corner Silver badge

      Re: staging

      They move the staging around all of the South American countries, just to keep everyone in their place. Just remember who is top dog, ok!

      1. Eclectic Man Silver badge
        Facepalm

        Re: staging

        My former company had a service desk in South America, but they needed some server space, so, on top of the single storey service desk building they built another floor - for the servers, not understanding that lots of computers are heavier than people. Fortunately they noticed the ceiling over the ground floor rooms bulging before anything actually collapsed.

        1. Anonymous Coward

          Re: staging

          Over here in Brazil they placed a famous gym franchise on the upper floor of a shopping mall. Barbells, and stuff, in a building with long free spans for the shops below.

          Luckily the owners were well aware of the issue, and placed all the weights right next to the structural beams, and following along them and the walls. The real issue wasn't the total weight - there was also a movie theater up top, something that gathers a lot of people - but the fact that all those machines with weights and barbells were tightly concentrated on the center of the spans, in such a way that exceeded the architectural limits of the building.

          What set people off were the vibrations, a single barbell hitting the floor would send an audible thud across the building.

          This time, they had people with common sense on key positions...

          I would have used the IT? icon, but y'know, server rooms are important too.

          1. Eclectic Man Silver badge
            Joke

            Re: staging

            A scene in 'Peter's Friends' springs to mind:

            https://www.imdb.com/title/tt0105130/characters/nm0748852

            Collecting their luggage from baggage reclaim at Heathrow airport:

            Andrew : [Struggling with Carol's suitcase] What the fuck have you got in here? Weights?

            Carol : Yes.

            Andrew played by Kenneth Branagh, and Carol by Rita Rudner.

    2. F. Frederick Skitty Silver badge

      Re: staging

      The Monroe Doctrine in action.

    3. captain veg Silver badge

      Re: staging

      When my team first deployed web-based apps we set up a test environment in parallel with production. This seemed like a terrible waste, but OK.

      To this day I haven't seen a test deployment not make it through to production unmolested.

      Our scrummy/devops overlords insisted on inserting "staging" and "qa" as well (and some others). I still don't really understand the difference.

      When we started out we were used to testing code to destruction on our workstations before releasing for desktop deployment on the great unwashed. Pushing apps on to the web made a testing environment somewhat useful. The others? I can only imagine that virtualisation and cloud instances induce laziness.

      Latterly the corporate accountants have started bleating about the cost.

      Well, yes.

      -A.

  11. Nate Amsden

    one of the many reasons I hate cloud

    is they never stop fucking with it. I don't want stuff changing constantly for no reason.

    Same goes for SaaS services that feel the need to update their UIs and force the changes on the customers (vs on prem where you can opt to delay any such upgrades until you are ready for them). One exception to that in my world was Dynect, who maintained (as far as I could tell) an identical user experience spanning from when I first started using it in about 2009 until I migrated off late last year (Oracle acquired them and put their technology into their general cloud DNS offering years ago, and cut the price by 98% and I assume shut the original Dyn infrastructure off in the past couple of months if they stuck to their schedule). I haven't had the pleasure of dealing with IaaS in over a decade, my on prem stuff hums along perfectly, and I have been successful in defending against folks who wanted to bring cloud back again and again in the last decade (when they see the costs they have always given up since they don't have unlimited money).

    1. Anonymous Coward

      Re: one of the many reasons I hate cloud

      I openly denigrate Cloud, here, on LinkedIn, in the office which is CloudFirst. It's a joke!

      Even something as simple as the portal interface....it changes almost WEEKLY. I tried to do some Azure learning stuff, thinking about doing the certifications, before I thought sod it, stick with VMware. Even the courses are out of date before they're published, because some DevOps chimp has decided to move the controls in an "agile way" because THEY will never need to use the front end and THEY will definitely not need to use the front end in an emergency when 1/2 the C-Suite are screaming at you!

      I've always referred to Developers as children who love the new shiny shiny, but giving them control of your infrastructure in the way that you do in a Cloud Environment, ESPECIALLY when they are hidden behind the layers of what Microsoft hilariously call "customer support" is lunacy

    2. Someone Else Silver badge

      Re: one of the many reasons I hate cloud

      IaaS : Idiocy as a Service?

      And people actually pay for that?!?

  12. Don Casey
    Unhappy

    It's the data, stupid!

    As a retired big shop guy, with a focus on things like D/R and database, I worry that unless your role in IT is data-focused you are insufficiently paranoid when it comes to making changes that impact data.

    Losing data can be an extinction event for a company.

    That's why ransomware is so effective.

    That is why I once had to let an insufficiently paranoid DBA go.

  13. M.V. Lipvig Silver badge

    Still think it's a good idea?

    I'm sure that somewhere deep in the legalsleeze is a line saying something to the effect of "If we screw up and lose all your data, our financial responsibility begins and ends with 'Oops, sorry about that.' " So off your data goes to the cloud, perhaps never to be seen again, and there goes your business WHEN, not if, it disappears.

  14. Kevin McMurtrie Silver badge
    Devil

    Ah, the good old...

    "This deployment is consuming more resources than usual. Better let it finish."

  15. Mishak Silver badge

    "only runs under certain conditions and thus isn't well covered under existing tests"

    Test coverage metrics, anyone?

    1. FatGerman

      Re: "only runs under certain conditions and thus isn't well covered under existing tests"

      Tests that don't cover edge cases are not tests.

      Tests that aren't functional tests on real hardware in simulated production environments are not tests.

      If you don't have those, your product is not being tested.

      30 years in QA, seen this blow up at every company I've ever worked for.

  16. breakfast
    Flame

    Not a gambling man but...

    I bet that even after ten hours of failing to restore their databases the Azure status page still said "Everything is fine and all services are running great!"

  17. Anonymous Coward

    Been Reading El Reg For A While Now.......

    ....and this was relevant (and interesting) in 2016!!!

    Link: https://www.theregister.com/2016/02/05/cloud_butterfly_effect/

    .....but then the M$ people I've worked with are a) arrogant and b) they know everything.........

  18. Sparkus

    I still have some muscle memory for...

    rm -r *

  19. Anonymous Coward

    Somewhere we went from

    Working software over comprehensive documentation

    to

    Mindlessly going through the rituals over working software or comprehensive documentation

  20. Anonymous Coward

    MS Weasel-Wording

    "Severity (X) Unhealthy"

    https://status.dev.azure.com/_event/392143683/post-mortem

  21. MercArchitect

    This is a wonderful example of when your test environment doesn't replicate your production environment.

    The number of changes that fail because of this is spectacular, yet environment management and data refreshing are often given only lip service.

  22. Anonymous Coward

    Numbers of customers

    Does anyone have the number of impacted customers & indication of loss of revenue to those customers?

  23. donk1

    So this cloud stuff is engineering systems that are more reliable than on-prem?

    Lol
