Network driver issue shaves 12 more hours off Microsoft's '365' infrastructure, and yeah, it was Exchange Online again

Microsoft's cloudy email service, Exchange Online, decided to have an early night last night, and then enjoyed a lie-in this morning. Traditionally a night for fireworks, 5 November saw some sort of detonation within the Microsoft 365 infrastructure in the form of a borked update or, as the company delicately put it: "an issue …

  1. johnnyblaze

    Down we go

    Microsoft 358 - and counting (down)!

    1. aregross

      Re: Down we go

      [Overpaid M$ Bigwig 1]: We can't keep calling it 365 if we keep having these break-downs! We need to come up with another name. We've become the laughing-stock of IT World!

      [Overpaid M$ Bigwig 2]: This was the best Marketing could come up with. How 'bout we ask our World-Famous AI software to come up with a new name!

      [World-Famous AI]: Office Sometimes

      1. sgp

        Re: Down we go

        Tay, is that you?

  2. Hubert Cumberdale Silver badge

    "Traditionally a night for fireworks, 5 November saw some sort of detonation..."

    Well, I know it's just a bit of reporting fun, but to be pedantic, fireworks should only ever be undergoing deflagration rather than detonation. This is written into the Explosives Regulations 2014:

    “pyrotechnic substance” means an explosive substance of a kind designed to produce an effect by heat, light, sound, gas or smoke, or a combination of any of these, as a result of non-detonative, self-sustaining, exothermic chemical reactions;

    There has always been a bit of a question surrounding the speed of the shock wave in flash powder, but nobody wants to measure it too carefully in too many scenarios, just in case it turns out to sometimes be supersonic. This would result in fireworks being a lot less fun.

    1. Excellentsword (Written by Reg staff)

      Bet you're fun at fireworks parties.

      1. Hubert Cumberdale Silver badge

        Point taken. I'll have you know that my firework parties are fun: the great thing is that you don't have to talk to me because I'm busy setting them off.

      2. TRT Silver badge

        Oh come on, surely... just this once you can let him off.

        1. AndrueC Silver badge
          Joke

          ..as long as he doesn't bottle it.

          1. TRT Silver badge

            He won't but I heard Catherine wheel.

      3. Korev Silver badge
        Joke

        >Bet you're fun at fireworks parties.

        Yeah, he's *that* guy

        1. AndrueC Silver badge
          Joke

          Hope he wears asbestos clothing then :)

      4. Michael Wojcik Silver badge

        Actually, this is just the sort of thing I'd like to hear at fireworks parties.

        No, I'm not any fun either.

    2. Ochib

      It always says light blue touch paper, but the ones I used last night had dark blue touch paper

      1. Hubert Cumberdale Silver badge

        I'm just going to carry on being annoying and tell you that BS 7114-1:1988 replaced the use of blue touch paper with safety fuse...

    3. el kabong

      Fascinating, never thought of it that way...

      but now that you mention it, it does make sense.

      Thanks for the instructive post, I will never watch fireworks again without also thinking of those wise words I've just read.

  3. OssianScotland

    Bowl of Petunia moment...

    <Petunias>

    Oh no, not again.....

    </Petunias>

  4. Anonymous Coward

    UK Reseller Confusion

    These outages confuse me. Are these just the "buy direct from Microsoft in the US" users?

    Being a very small UK reseller of the UK hosted stuff my clients don't seem to suffer. I was directly working with one this morning in Outlook changing some settings around. So we would have noticed if it was down.

    I don't pretend to know how the UK \ Europe hosting works, but does this imply UK & Europe have the sense to wait and ignore the Left Pondians doing their updates and let them test the changes first?

    I don't want to defend MS on this stuff, but these reports are often a puzzle to me when I never hear complaints from my clients.

    1. AMBxx Silver badge

      Re: UK Reseller Confusion

      In the same boat here. I get loads of admin warnings from MS, but never have any problems. Last time my email stopped working was back in the days of Blackberry BIS.

    2. Ken Moorhouse Silver badge

      Re: I never hear complaints from my clients.

      Ok, I've succumbed to the ubiquitous Smart Alec response, forgive me:-

      Do they have your phone number as well as your email address?

      ===

      On a serious note though, how does one plan for such an eventuality? With an on-prem mail server I have been able to resolve all catastrophes within four hours of notification (by telephonic or Short Message Service means) and copious proactive updates - via the same medium - to the afflicted customer. Incidentally without loss of messages. Flooding, burglary, internet outage, faulty hardware, ransomware, yep seen 'em all. DNS issues are more difficult because the failure modes are less predictable and, again that's because they rely on outsiders for their resolution.

      Main vehicle for communicating MS365 failures would appear to be Twitter.

      1. Anonymous Coward

        Re: I never hear complaints from my clients.

        Not really "Smart Alec", it just shows more of this assumption of knowing everything without actually knowing any facts. Anyone can guess what may happen.

        Seriously - I would never want to defend Microsoft, and only supply O365 because my clients have asked for it. I would love to pile in on the "it is always down" jokes but just fail to see it happening as often as reported with my small sub-set of clients.

        It was a genuine question about the difference between the "buy from MS and be hosted in the US" and "buy from the European-based operations" routes. It just seems like the European-hosted stuff gets fewer problems. Personally I just put it down to different people running the actual servers. I would not want to credit MS with anything here.

        1. Ken Moorhouse Silver badge

          Re: I... only supply O365 because my clients have asked for it.

          As someone who has worked in environments where "Stay until it is fixed" is the expectation, I am not comfortable with acceding to such requests, even with my advice set out in writing. The risk analysis does not stack up favourably. I prefer to be in control of a situation such that I can sleep at night.

          One argument I have heard a lot is that "If O365 goes down then everyone else is in the same boat." Whatever happened to the USP that "we have resilience where others fail"?

          Incidentally, if I am not mistaken O365 in decimal is 245.
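
          For anyone wanting to check that arithmetic, here is a minimal sketch that reads the digits 365 as an octal number, both with Python's built-in `int` and by expanding the place values by hand. The variable names are just illustrative.

```python
# Read the digits "365" as base 8 (octal) rather than base 10.
digits = "365"

# Built-in conversion: int() accepts an explicit base.
octal_value = int(digits, 8)

# The same thing by hand: 3 * 8^2 + 6 * 8^1 + 5 * 8^0.
by_hand = 3 * 8**2 + 6 * 8 + 5

print(octal_value, by_hand)
```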

          1. hoola Silver badge

            Re: I... only supply O365 because my clients have asked for it.

            I think the biggest issue is that when you have something running on your servers, even if the overall reliability is better than a service, manglement still persist in believing that "Cloud" is better. This is partly down to smart salespeople, being "on-trend" with the ridiculous Gartner advice and crucially, responsibility.

            It does not matter how good your own Exchange may have been, these idiotic managers see a hosted service as a way of avoiding responsibility.

            So, a service goes down and users are inconvenienced, and these manglement idiots can with complete truth say "we are doing everything we can to fix it". In these cases, that means sending an email to the appropriate account manager for the cloud service that has gone down. Full marks for effort, zero marks for effect. Board rooms get some spiel about SLAs and a few charts, and everyone is happy except the real people that matter, the users who are trying to work.

          2. Anonymous Coward

            Re: I... only supply O365 because my clients have asked for it.

            0x365, got it! El Reg, are you listening? We have a corrected product name for you!

      2. Adelio

        Re: I never hear complaints from my clients.

        I understand the reasoning for getting this stuff hosted "in the cloud" but when your cloud provider seems to be as incompetent as Microsoft then maybe you should think again!

    3. Giles C Silver badge

      Re: UK Reseller Confusion

      Looking at sites such as Downdetector, I think the time zones help us in the UK. The outage of the 5th was reported as mostly hitting from 7pm, when most UK offices are shut for the night. If it took 12 hours to be resolved, that would take it to 7am, which is before we get into work.

      Not defending the system, but it seems likely to be why we don't notice outages in the UK that much.

    4. MatthewSt Silver badge

      Re: UK Reseller Confusion

      I'd recommend this video from 2015 - https://channel9.msdn.com/Events/Ignite/2015/BRK3186. 15 seconds in they say it's running on 65,000 servers (but that includes SharePoint I think).

      Deployments are done in batches (about 54 minutes in), monitored for results and then continued. That means that problems won't always affect everybody, and may affect different people (if the order of the batches is rotated).
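
      A rough sketch of that idea, not Microsoft's actual tooling: the rollout walks the batches in a rotated order, checks each one after deploying, and halts at the first unhealthy result. Everything here (`rotate_batches`, `staged_rollout`, the `deploy` and `healthy` callbacks, the batch names) is hypothetical and only there to illustrate why a bad update may hit different customers on different occasions.

```python
from collections import deque
from typing import Callable, List, Sequence


def rotate_batches(batches: Sequence[str], offset: int) -> List[str]:
    """Return the batches in the same cyclic order, starting `offset` places in."""
    ring = deque(batches)
    ring.rotate(-offset)  # a negative rotation shifts items to the left
    return list(ring)


def staged_rollout(
    batches: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    offset: int = 0,
) -> List[str]:
    """Deploy batch by batch and stop at the first unhealthy batch.

    Returns the batches that received the update, including the one
    that failed its health check.
    """
    touched: List[str] = []
    for batch in rotate_batches(batches, offset):
        deploy(batch)
        touched.append(batch)
        if not healthy(batch):
            break  # halt the rollout; the remaining batches never get the update
    return touched


# Hypothetical batches and a deliberately broken update that fails every check.
BATCHES = ["batch-a", "batch-b", "batch-c", "batch-d"]


def noop_deploy(batch: str) -> None:
    pass


def always_broken(batch: str) -> bool:
    return False


first_run = staged_rollout(BATCHES, noop_deploy, always_broken, offset=0)
later_run = staged_rollout(BATCHES, noop_deploy, always_broken, offset=2)
print(first_run, later_run)
```

      With a broken update, only the first batch in the rotation is affected each time, so which customers notice depends on where the rotation starts.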

      1. Ken Moorhouse Silver badge

        Re: I'd recommend this video from 2015

        Thank you for the link.

        Interesting that the system they've built is inspired by Kanban. I can understand the attraction, but can imagine that there would be times when the system becomes deluged with too much information, which I feel would be a big problem during outages.

        The Red Alert concept seems to treat visits to the System Status portal as indicative of a potential failure, and is used as a factor in trying to reduce MTTR (mean time to resolution). Maybe true, but I'm a bit uneasy with the overall logic.

        Resolution of hardware failure seems to be driven by SLAs dependent on that specific hardware, rather than by the event of hardware failure being reported. It makes sense to replace hardware in batches for efficiency reasons, but interesting that SLA trumps all else to the extent that it was mentioned in the presentation.

        Funniest moment was when he exclaimed "I think I saw an Edge browser in there!" looking through the browsers logged on to the service.

  5. fidodogbreath

    Borkity Bork Bork, Borkity Bork Bork

    Look at Office go

    Borkity Bork Bork, Borkity Bork Bork

    Is it up yet? No.

  6. O RLY

    Today's reminder that cloud = "someone else's computer." You'd think MS would be better at drivers than the average punter by now...

    1. DoctorNine

      But, but... 'cloud' sounds so soft.. and calming... Besides, we all know that every cloud has a silver lining! How could something go wrong in such a perfect environment? We all should just trust the puffiness of clouds to bring us release from the drudgery of running these bloody server farms all day long!

      Set your worries free! Send them to The Cloud!

      1. Ken Moorhouse Silver badge

        Re: But, but... 'cloud' sounds so soft.. and calming...

        Hmm, what about mushroom clouds?

  7. Anonymous Coward

    Test environment, I do my testing in production.

    1. Imhotep

      You get immediate feedback that way, and it identifies problems that may not show up in Test.

  8. Anonymous Coward
    Joke

    Remind me again...

    Remind me again, why did we vote for MICROS~1 instead of Borkzilla?
