Snow day in corporate world thanks to another frustrating Microsoft Teams outage

Corporate communications ground to a halt for many Office 365 subscribers around the world on Friday after a network outage left Microsoft Teams unresponsive for them for several hours. The IT goliath said it became aware of the breakdown around 1455 UTC, with a service bulletin reporting that at least some customers may …

  1. biddibiddibiddibiddi Bronze badge

    I thought "Cloud" was supposed to make all this stuff nigh-unbreakable, with seamless failover in the event of an issue, everything tested to destruction before being put into service, absolutely safe for us to give up all control and put all our eggs into their single basket at a much higher cost than just doing it ourselves.

    1. aerogems Silver badge

      Doesn't help if the network backbone went down and no one can reach your cloud servers. From the sound of things, that's what happened. Once upon a time the Internet would automagically reroute traffic if node(s) went down, but that has long since been done away with. Progress, amirite?

    2. Doctor Syntax Silver badge

      This isn't just Cloud, it's MS Cloud.

    3. elsergiovolador Silver badge

      The "cloud" is just a vapour over a pile of warm faeces.

    4. abend0c4 Silver badge

      Group communication could be robust and decentralised and, under those circumstances, take advantage of additional resilience from cloud services.

      Resilience, in the end, depends to some extent on keeping the solution reasonably simple - as you add more components the chances of a fault in one bringing down the whole edifice quickly multiply.

      Many cloud systems are actually just distributed single points of failure, despite their marketing.
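The point about added components can be put in numbers. The sketch below is a simplified model, not anything stated in the thread: it assumes component failures are independent, that a "series" system needs every component up, and that a "parallel" system needs only one replica up. The function names are made up for illustration.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a chain where every component must be up.

    Assumes independent failures (a simplification).
    """
    return prod(component_availabilities)


def parallel_availability(replica_availabilities):
    """Availability of a redundant group where any one replica suffices.

    Assumes independent failures; a shared dependency (one network,
    one certificate) breaks this assumption and the benefit with it.
    """
    return 1 - prod(1 - a for a in replica_availabilities)


if __name__ == "__main__":
    # Ten components in series, each individually "three nines".
    chain = series_availability([0.999] * 10)
    print(f"10 x 99.9% in series: {chain:.4%}")

    # Two independent 99% replicas of the same service.
    pair = parallel_availability([0.99, 0.99])
    print(f"2 x 99% in parallel:  {pair:.4%}")
```

Under these assumptions, each extra required component lowers overall availability, while redundancy only helps if the replicas do not share a hidden dependency, which is the "distributed single point of failure" case.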

      1. Notas Badoff

        Abend's Observation - you're on the way to eponymous fame

        "Many cloud systems are actually just distributed single points of failure"

      2. trindflo Silver badge

        decentralised

        Yes! Decentralised is what we want to do. That is not what Microsoft wants to do. Microsoft likes to gather up our eggs and brood on them.

  2. billdehaan

    I was wondering why things were so quiet today

    Also, much more productive.

    I can't help but be amused by all of these outages. IT and IS departments convinced CTOs to spend massive amounts of money to outsource all of their infrastructure to the cloud, so that it would be more reliable, and yet many companies are experiencing more downtime and data loss.

    It reminds me of the time some execs ordered us to save money by getting rid of those "pointless" co-located backup servers and the "useless" in-house redundant server, and just put everything into one really big box. Simple, clean, none of that "replication" nonsense that slowed things down.

    It wasn't until it was fully in production (which we did under protest) that I was asked what the machine name spof.companyname.com meant. When I explained that SPOF means "single point of failure", the CEO (the CTO's boss) went white as a sheet, and wanted us to explain what would happen if it were to fail.

    One rendition of Monty Python's dead parrot sketch ("it's pinin' for the fjords; it's ceased to be; it shall be an ex-server") later, he demanded we explain and justify "our" decision to do this. Several CYA emails were displayed, and the new CTO that arrived the next month promptly reversed the decision, and we were able to restore multi-site before there was any disaster.

    Today, "SPOF" is becoming synonymous with "the cloud". AWS, Office 365, and the like mean that if your net connection goes down, so do you.

    1. aerogems Silver badge

      Re: I was wondering why things were so quiet today

      To be fair, there are service agreement contracts which usually mean companies get some kind of refund from Microsoft/Google/whoever if the cloud is down more than like 1% of the time. So, if it were in-house and it went down, they get nothing. It goes down and they outsource it to Microsoft, they can get a bit of money back, and no one but the beancounters and a few people who think a little too much for the corporate world will stop to realize that it still probably costs them more than running it all in-house.

      1. Anonymous Coward

        Re: I was wondering why things were so quiet today

        It's not a refund, it's credit for future purchases.

        1. Alumoi Silver badge

          Re: I was wondering why things were so quiet today

          It's not even that. It's hush money so corporations won't start raising hell.

      2. Darkk

        Re: I was wondering why things were so quiet today

        Good luck getting any big money (credit) back from your cloud providers. When they do it's not much compared to your loss of productivity and downtime which can cost the company big bucks. I know there's always a risk of losing access to these services but it's supposed to be extremely rare given all the redundancy that's out there. Ah well. I guess the system admins these days just don't have what it takes to really build a solid infrastructure. Microsoft is no exception.

      3. Richard 12 Silver badge

        Re: I was wondering why things were so quiet today

        $1000 is poor compensation for $1000000 (or more) in losses.

        A $1000 credit note is even worse.

      4. billdehaan

        Re: I was wondering why things were so quiet today

        Oh, there are SLAs (Service Level Agreements) all over the place.

        The problem isn't just the outages themselves, it's that things that shouldn't be moved to the cloud in the first place are.

        Before the cloud, there were internal backup servers, where users' Office documents were backed up. If there was an outage of the backup server on Wednesday night, it meant that the most recent backup was Tuesday. If it didn't come back up until Thursday, that meant users were working without a net for two days. Not great, but work was still getting done.

        With the move to the cloud, when the net connection goes down, that's it. No more Office access until it comes back on. Customers don't just lose backup capability, they lose access to everything, hence the term single point of failure.

      5. Shalghar Bronze badge

        Re: I was wondering why things were so quiet today

        "some kind of refund from Microsoft/Google/whoever if the cloud is down more than like 1% of the time"

        Which means 3.65 days of downtime a year until some sort of pseudopayout. And we are talking downtime that the "cloud" guys have grudgingly accepted to be their fault.

        Now what's "downtime"? Nothing works at all? Comms down to acoustic coupler level, semaphore level, signal fire level (1 bit/minute)? After all, even when only the overhead crawls around you are somewhat "connected", you just can't use it.

        And what about network outages not related to your cloudy guys? Not sure if anyone will pay for that, but I am certain that those will not be accounted for under "downtime" by your cloudies.
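The 3.65-day figure follows from applying 1% to a 365-day year. A minimal sketch of that arithmetic is below; the helper name and the extra availability tiers are illustrative assumptions, not terms from any actual provider contract.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year, matching the 3.65-day figure


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted before an availability target is missed.

    Hypothetical helper for illustration; real SLAs define their own
    measurement periods and what counts as "downtime".
    """
    return period_hours * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% uptime allows {hours:.2f} h/year ({hours / 24:.2f} days)")
```

At 99% the allowance works out to 87.6 hours, or 3.65 days, a year, and that only counts downtime the provider itself acknowledges.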

    2. Pascal Monett Silver badge
      Flame

      Re: if your net connection goes down, so do you

      Or, if your provider's network goes down, so do you.

      I can't help but be thankful for all this downtime because that means that, one day, some bright MBA spark might actually start spreading the gospel of "Cloud is bad for your reliability, in-house means you control things".

      And we'll get back to in-house servers that don't need fat-fingered admins from another company to fuck things up. Because the wheel keeps turning.

      Once upon a time, if your network was down, you and your clients were the only ones impacted. Today, we're all on the bandwagon of "we hope Borkzilla won't fuck up today because otherwise, we're toast".

      And to think that the people who pushed for this earn at least 10 times the salary of the people who actually work . . .

    3. JimboSmith

      Re: I was wondering why things were so quiet today

      Someone from another company I met at a meeting said that their cloud provider had gone down and they were having a COTM at that point. I asked what that was and she said Cup Of Tea Moment/Minute/Minutes depending on how long it lasted. Apparently they had quite a few of them.

  3. Bilby

    Outrage

    It's called a network outrage.

    1. Anonymous Coward

      Re: Outrage

      It's Microsoft, all of their error messages used to contain the word 'network'. Besides, they probably let an internal certificate lapse again.

      1. Shalghar Bronze badge

        Re: Outrage

        "Its Microsoft, all of their error messages used to contain the word 'network'"

        Really ?

        I am tempted to resurrect and mistreat GWBASIC just to see a "syntax network error in 10". ;)

  4. desht
    FAIL

    This cloud thing...

    Bit shit innit?

  5. JimmyPage
    FAIL

    The Peter Principle

    Everything gets promoted above its level of competence. At which point you have to rely on the unpromoted parts.

    I give you ... the cloud.

    We also need to bear in mind John Glenn's famous observation about sitting atop a bomb built by a collective of lowest bidders.

    You think your PHBs were tight? Imagine how much these "cloud corporations" are trying to get away with not spending.

  6. cookieMonster Silver badge
    Trollface

    Ha

    “….including one that occurred almost exactly a year ago yesterday”

    Oops, I thought I deleted that cron job last time.

    1. Anonymous Coward

      Re: Ha

      That would explain why the certificate didn't auto-renew...good job, well done.

  7. Chloe Cresswell Silver badge

    Client of mine was recently purchased by a larger competitor.

    Small company: VoIP phones, Teams for chat type interactions.

    Large company: Teams for everything.

    Yesterday afternoon the small part lost internal chat functions for 2 hours, but everything else they needed was working.

    But when they tried to contact their official tech support, they hit the issue that they couldn't message them (Teams), couldn't telephone them (remote end telephones are via Teams), etc.

    It was interesting to watch from the outside.

    1. Eclectic Man Silver badge
      Facepalm

      I once visited a 'Government Agency' (no, not the one in Cheltenham). Parked my car outside the gate and found a veritable horde of visitors wanting to gain entry. Trouble was that the computer system had 'gone down', taking the phone directory with it (no paper backup copy). The phone system was running, but you needed to know the extension of the person you were visiting to get in.

      Which is one reason why I insisted that all of my clients printed out their DR / BC plans on paper and had them to hand, just in case.

      1. Chloe Cresswell Silver badge

        I was doing some work at a local NHS office which is one of the area disaster centre locations.

        I'm not sure what they will do soon, as one of the jobs I did for them was to route a PSTN connection to their main meeting room, for disaster use.

        Why? Well, all their phones are VoIP, and step one in their disaster plan for anything is: disable the internet connections and inter-site connections on all sites.

        So the first step in their disaster plan was to cut the disaster management offices off from the world.

        Hence asking when I was there if I could do something to get this one left over analogue line wired in.

        Thankfully, this is SEP.

        1. Eclectic Man Silver badge
          Joke

          And here I was visualising the installation of a searchlight with a silhouette of a bat, or maybe a Boy Scout, to summon local fleet-of-foot messengers to their aid.

          https://dcau.fandom.com/wiki/Bat-signal?file=Bat-signal.png

          (Guessing copyright protection prevents a Bat Signal icon.)

          1. Hazmoid

            Maybe they could beam a version of the scout logo, after letting all the local groups know that they are required to attend their local DR centre for allocation

            https://en.wikipedia.org/wiki/File:World_Scout_Emblem_1955.svg

  8. elsergiovolador Silver badge

    Correlation

    There must be a study buried somewhere due to inconvenience that shows correlation between "the cloud" going down and productivity.

    I mean, people will get much more stuff done if they don't have to attend Teams meetings that could have been an email.

  9. Innique

    I wonder if this affected the Activision COD servers, as the kids were complaining the servers were lagging yesterday. Games being down is one thing, but a work environment going down is not OK, not with the internet connectivity issues that working from home already brings. Bad weather causes water to expand when it freezes and a lot of cables and hardware don't like it. Rats are also a problem here in the Big D; one of my outside connectors was chewed a couple of years ago.

    1. Eclectic Man Silver badge
      Unhappy

      A friend of mine with two children said he couldn't get their school Teams session up to get at their homework last Friday. And (worse) he couldn't get onto the fortnightly social Teams call for friends and ex-'workplace proximity associates', where we have a chat, bemoan management and generally set the world to rights without actually having to be in the same room or pub at the same time together.

  10. martinusher Silver badge

    Wasn't Horizon a sort-of cloud?

    Obviously not a cloud in the MS365 sense, but I thought that one of the key elements in the Horizon SNAFU is that the system upgrade did away with local journaling of transactions when it moved everything to the branch offices. The result was no resilience when the network went down -- no network, no transactions could be processed and, furthermore, when timing glitches occurred in the software there was no way to easily spot them and sort them out.

    You would have thought that someone, somewhere, would have noticed the obvious flaws in this setup. But obviously not, because the scandal dragged on, becoming a serious scandal because of the need to CYA. Now we're constantly being told that 'the cloud' (aka "someone else's computer") is totally foolproof and utterly reliable. Except it quite obviously isn't. I know it makes business sense to lease hardware and software -- the mainframe world was built around this model -- but we moved away from it precisely because of cost, reliability and flexibility (choose two from three). Why are we returning to it?

  11. Tron Silver badge

    It's not just the cloud.

    SAAS makes you reliant on a third party (and one which you probably don't trust), which is a real risk. If you operate from executable apps and use local storage, you can implement as many backups as you like, including cloud storage if you want, but you have some control over your digital doings. Worst case scenario, you fire up a spare PC, install the software and backup data, and away you go. With SAAS and the cloud, all you can do is read a book, twiddle your thumbs and wait. You have placed your balls on someone else's chopping block and are hoping for the best. Hoping they won't withdraw an option, an entire package or require you to upgrade to Windows EvenWorse.

    The reason the cloud and SAAS have become popular (OK, bad term, instead: common) in corporate land, is that no employee gives two shits if something bad happens to their employer. They still get paid. Worst case, they get a settlement and move on.

    1. Chloe Cresswell Silver badge

      Re: It's not just the cloud.

      Not just that. From the employer's side: they want it to not be capex, but running costs.

      I've had clients say that to me. They don't want to buy things, as that comes from the capex budget, they are happy to lease services at a much higher cost over time, because it's a different budget...
