How to have a more positive 'outage experience' according to Microsoft: Please don't rely on the Azure Status page

Microsoft Azure CTO Mark Russinovich, together with principal program manager overseeing "outage communications" Sami Kubba, has posted about advancing the "outage experience" – not the phrase that usually comes to mind when a cloud failure ruins your day. Outages are "an unfortunate inevitability of the technology industry", …

  1. Anonymous Coward
    Anonymous Coward

    Isn't the whole point of 'the cloud'

    is that apart from it supposedly being cheaper than an in-house DC, it is available 24/7/52.

    If it isn't available then it costs the people/companies using possibly billions of ££££.

    Will any beancounters see the fairly frequent outages and call a halt to the stampede into Cloud Computing with 3rd parties?

    Somehow, I doubt it.

    I guess it will take an outage that actually causes a business to go TITSUP before they take notice.

    1. Anonymous Coward
      Anonymous Coward

      Re: Isn't the whole point of 'the cloud'

      "is that apart from it supposedly being cheaper than an in-house DC, it is available 24/7/52."

      No no... haven't you done MS's AZ900 or AWS's CCP?

      The cloud is all about being cheaper by not needing to have those expensive fleshy infrastructure engineer things on staff to manage your expensive hardware.

      Instead you rent the hardware and MS/AWS deals with the management of them.

      (and it can be resilient, but no one pays for it, just like they didn't when it was on-prem)

    2. Mark 110

      Re: Isn't the whole point of 'the cloud'

      Microsoft provide a 99.9% availability SLA for most services (not all). So they can be down for nearly 9 unplanned hours a year (0.1% of 8,760 hours is 8.76) without penalties. You need to go in eyes open. If you are expecting 100% then you don't have a contract for that.
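For anyone who wants to check the arithmetic behind an availability figure, here is a minimal sketch. The function name `allowed_downtime_hours` and the 365-day year are assumptions made for illustration; a real SLA is usually measured per billing month and has its own definition of "downtime", so treat this as a rough guide rather than a reading of any contract.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime an availability target permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    # The unavailable fraction is whatever is left after the promised uptime.
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Illustrative targets only; check the actual SLA for the service you use.
    for target in (99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

At 99.9% this works out to about 8.76 hours a year, which is where the "8 hours" rule of thumb comes from. Each extra nine cuts the budget by a factor of ten.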

  2. Doctor Syntax Silver badge

    Outages are "an unfortunate inevitability of the technology industry"

    Outages are an inevitable consequence of that attitude.

    1. RM Myers
      FAIL

      If you don't believe outages are inevitable, then I have a bridge you might be interested in buying. The next questions I would ask any cloud provider, after "what is your availability percentage?" and "what measures do you take to continue meeting that percentage?", are "what is your procedure to recover from an outage?" and "what are your procedures to move workloads to other data centers if the outage cannot be immediately resolved?" Shit happens, get over it.

      1. jake Silver badge

        I don't believe they are inevitable.

        In fact, quite the opposite.

        My personal email/FTP/Usenet/Shell account system has been online, and available to whoever has an account, since Flag Day. That was January 1st, 1983, when we changed from NCP to TCP/IP. It had already been running for a number of years, and probably would have survived the change, but I chose to reboot everything at midnight, just to come up from scratch.

        Note I said "system" ... it's multi-homed, multi-OS, multi-hardware, multi-MTA (etc.) ... redundancy is fitted in everywhere I can fit it. It started as a Thesis platform when I was at Uni (three locations: at SAIL, under Bryant Street in Palo Alto, and at MAEWest), and now is spread out on six continents.

        Over-kill for a home system? Absolutely. But as a research platform, she's mostly tax deductible. It scales well, and parts of the concept are in place at several Fortune 500s. They should see similar uptimes for the bits they use, barring the almost inevitable catastrophic human maliciousness ... and even then, systems are in place to minimize that kind of damage. Maintenance at this stage of the game is on the order of minutes per month, and that's mostly just scanning the logs for anomalies.

  3. Pascal Monett Silver badge
    Windows

    What a great way to blame the customer for problems

    "Outages are 'an unfortunate inevitability of the technology industry'"

    The Cloud (TM) is sold as being always on, always there, so you have no right to make that argument.

    "Azure has operated core compute services at 99.995% average uptime across our global cloud infrastructure"

    So you're boasting about the fact that you've got 4 nines performance when you sell at five nines and, on top of that, you're talking about how your cores are functioning, not about how reliably your customers are accessing said cores.

    "more than 95 per cent of our incidents" do not appear there

    Then your Status page is worthless and you should do better. A Status page that only says there's a problem when every single customer can see that there's a problem is just a cover-up.

    there is a separate Azure DevOps status page

    Which proves that there is a cover-up. If you have to separate your failure warnings over multiple web sites, you're just diluting the information willfully.

    And this last one is a beaut: "reliability is a shared responsibility"

    Not when you're making your customers pay for said reliability.

    This entire piece is just an "it's your fault, we're doing everything we can" puff piece.

    Despicable.

    1. Excellentsword (Written by Reg staff)

      Re: What a great way to blame the customer for problems

      It's not a puff piece then, is it? We've reported how Microsoft is thinking about these topics. We haven't praised them for it.

      1. stiine Silver badge

        Re: What a great way to blame the customer for problems

        I don't think he was referring to the article, but to MS. That's how I read it.

        1. Excellentsword (Written by Reg staff)

          Re: Re: What a great way to blame the customer for problems

          Ah gotcha. Well, it goes without saying that anything out of Microsoft is a puff piece for... Microsoft.

        2. Pascal Monett Silver badge

          Re: What a great way to blame the customer for problems

          Indeed, I was absolutely not criticizing El Reg for the article, I was bashing the MS spokesdrone for what was being said.

          I should have been clearer. Now that I re-read myself after a day, I realize that confusion is possible.

          Sorry for that.

    2. Ben Tasker

      Re: What a great way to blame the customer for problems

      > "more than 95 per cent of our incidents" do not appear there
      >
      > Then your Status page is worthless and you should do better. A Status page that only says there's a problem when every single customer can see that there's a problem is just a cover-up.

      This. There's a reason their customers keep going to the status page, and that's because that's where information about known issues is supposed to be exposed.

      It's no good if every provider puts that information somewhere different. When your supplier is going TITSUP, the very least they can do is give you some consistency so it's easy to find the information.

      1. big_D Silver badge

        Re: What a great way to blame the customer for problems

        Exactly, if it isn't showing the status of Azure, it isn't doing its job. Either it should be renamed or it should, you know, do what it says on the tin!

    3. Anonymous Coward
      Anonymous Coward

      "The Cloud (TM) is sold as being always on"

      The cloud is always on somewhere but that doesn't mean your little bit of it will be working exactly how you need it to be. No-one working in IT should be confused by this.

    4. Mark 110

      Re: What a great way to blame the customer for problems

      "when you sell at five nines"

      They sell at 3 9's.

  4. Dan 55 Silver badge
    Devil

    What's Mark Russinovich been drinking lately?

    The same as Vint Cerf?

  5. Dinanziame Silver badge
    Meh

    Oh yes please give me three different places to look for possible issues, said no one ever.

    1. Ben Tasker
      FAIL

      If you look at his original post, under their "key principles" they've got "Discoverability"

      They want information to be discoverable, but have fragmented it across three different places. You couldn't make it up.

  6. whitepines
    Facepalm

    How to have a more positive 'outage experience'

    Step 1: Download Linux

    Step 2: Install Linux

    Step 3: Use your computer for work while everyone else is on forced holiday....

    ....errmmmm....

    Oh, wait. That sounds like no fun at all. Down to the pub then, and thank Microsoft for the paid time off?

    1. jake Silver badge
      Pint

      Sounds like a plan.

      I'll get this round in :-)

      1. big_D Silver badge
        Pint

        Ooh, pub o'clock, I'm on my way.

  7. fidodogbreath

    "Despite this, we constantly find that customers visit the Azure Status page to determine the health of services on Azure," [...] it is not much use since "more than 95 per cent of our incidents" do not appear there, according to Kubba.

    Some less-advanced companies might show the status of a service on that service's status page. But that's not the Microsoft Way.

    1. Anonymous Coward
      Anonymous Coward

      re: But that's not the Microsoft Way.

      The only Microsoft way they understand is 'Do it our way or not at all'

      or in this case,

      We don't show the status on the status page simply because there is no need. It is always 100% available.

      Excuse me while I puke.

    2. big_D Silver badge
      Facepalm

      Status page? Why would you look for the status on the status page? That's hardly logical, is it?

      You want https://azure.com/cellar/no-stairs/no-lights/disused-lavatory/locked-filing-cabinet/beware-of-the-leopard

  8. Robert Grant

    Architecting reliable applications?

    Are MS advocating multi-cloud architectures? It's a good idea, but I'm surprised they're advocating it.
