I thought "Cloud" was supposed to make all this stuff nigh-unbreakable, with seamless failover in the event of an issue, everything tested to destruction before being put into service, absolutely safe for us to give up all control and put all our eggs into their single basket at a much higher cost than just doing it ourselves.
Snow day in corporate world thanks to another frustrating Microsoft Teams outage
Corporate communications ground to a halt for many Office 365 subscribers around the world on Friday after a network outage left Microsoft Teams unresponsive for them for several hours. The IT goliath said it became aware of the breakdown around 1455 UTC, with a service bulletin reporting that at least some customers may …
COMMENTS
-
-
Saturday 27th January 2024 11:28 GMT abend0c4
Group communication could be robust and decentralised and, under those circumstances, take advantage of additional resilience from cloud services.
Resilience, in the end, depends to some extent on keeping the solution reasonably simple - as you add more components the chances of a fault in one bringing down the whole edifice quickly multiply.
Many cloud systems are actually just distributed single points of failure, despite their marketing.
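A back-of-the-envelope sketch of how that multiplication plays out, assuming the components fail independently and the service needs every one of them up at the same time (a simple series chain). The 99.9% per-component figure is made up for illustration and is not taken from any provider's numbers.

```python
from math import prod

def series_availability(availabilities):
    """Availability of a service that needs every component up at once.

    Assumes independent failures, which real systems rarely honour.
    """
    return prod(availabilities)

# Hypothetical figure: each component is up 99.9% of the time.
for n in (1, 5, 10, 50):
    total = series_availability([0.999] * n)
    print(f"{n:>2} components: {total:.4%} available")
```

Every extra component in the chain can only lower the total, which is the argument for keeping the chain short.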
-
Saturday 27th January 2024 04:40 GMT billdehaan
I was wondering why things were so quiet today
Also, much more productive.
I can't help but be amused by all of these outages. IT and IS departments convinced CTOs to spend massive amounts of money to outsource all of their infrastructure to the cloud, so that it would be more reliable, and yet many companies are experiencing more downtime and data loss.
It reminds me of the time some execs ordered us to save money by getting rid of those "pointless" co-located backup servers and the "useless" in-house redundant server, and just put everything into one really big box. Simple, clean, none of that "replication" nonsense that slowed things down.
It wasn't until it was fully in production (which we did under protest) that I was asked what the machine name spof.companyname.com meant. When I explained that SPOF means "single point of failure", the CEO (the CTO's boss) went white as a sheet, and wanted us to explain what would happen if it were to fail.
One rendition of Monty Python's dead parrot sketch ("it's pinin' for the fjords; it's ceased to be; it shall be an ex-server") later, he demanded we explain and justify "our" decision to do this. Several CYA emails were displayed, and the new CTO that arrived the next month promptly reversed the decision, and we were able to restore multi-site before there was any disaster.
Today, "SPOF" is becoming synonymous with "the cloud". AWS, Office 365, and the like mean that if your net connection goes down, so do you.
-
Saturday 27th January 2024 05:58 GMT aerogems
Re: I was wondering why things were so quiet today
To be fair, there are service agreement contracts which usually mean companies get some kind of refund from Microsoft/Google/whoever if the cloud is down more than like 1% of the time. So, if it were in-house and it went down, they get nothing. It goes down and they outsource it to Microsoft, they can get a bit of money back, and no one but the beancounters and a few people who think a little too much for the corporate world will stop to realize that it still probably costs them more than running it all in-house.
-
-
Saturday 27th January 2024 17:59 GMT Darkk
Re: I was wondering why things were so quiet today
Good luck getting any big money (credit) back from your cloud providers. When they do it's not much compared to your loss of productivity and downtime which can cost the company big bucks. I know there's always a risk of losing access to these services but it's supposed to be extremely rare given all the redundancy that's out there. Ah well. I guess the system admins these days just don't have what it takes to really build a solid infrastructure. Microsoft is no exception.
-
Saturday 27th January 2024 20:02 GMT billdehaan
Re: I was wondering why things were so quiet today
Oh, there are SLAs (Service Level Agreements) all over the place.
The problem isn't just the outages themselves; it's that things that shouldn't be moved to the cloud in the first place are.
Before the cloud, there were internal backup servers, where users' Office documents were backed up. If there was an outage of the backup server on Wednesday night, it meant that the most recent backup was Tuesday. If it didn't come back up until Thursday, that meant users were working without a net for two days. Not great, but work was still getting done.
With the move to the cloud, when the net connection goes down, that's it. No more Office access until it comes back on. Customers don't just lose backup capability, they lose access to everything, hence the term single point of failure.
-
Sunday 28th January 2024 19:00 GMT Shalghar
Re: I was wondering why things were so quiet today
"some kind of refund from Microsoft/Google/whoever if the cloud is down more than like 1% of the time"
Which means 3.65 days of downtime a year until some sort of pseudopayout. And we are talking downtime that the "cloud" guys have grudgingly accepted to be their fault.
Now what's "downtime"? Nothing works at all? Comms down to acoustic coupler level, semaphore level, signal fire level (1 bit/minute)? After all, even when only the overhead crawls around you are somewhat "connected", you just can't use it.
And what about network outages not related to your cloudy guys? Not sure if anyone will pay for that, but I am certain that those will not be accounted for under "downtime" by your cloudies.
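For anyone who wants to check the arithmetic, here is a minimal sketch that turns an uptime percentage into a yearly downtime allowance. It assumes a 365-day year and counts every minute of outage; actual contracts define their own measurement window and exclusions, which this ignores. The 99.9% and 99.99% targets are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years

def allowed_downtime_hours(uptime_percent):
    """Hours per year a service can be down and still meet the uptime target."""
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100

for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.2f} h/year ({hours / 24:.2f} days)")
```

At 99% that works out to 87.6 hours, which is the 3.65 days above.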
-
-
Saturday 27th January 2024 10:56 GMT Pascal Monett
Re: if your net connection goes down, so do you
Or, if your provider's network goes down, so do you.
I can't help but be thankful for all this downtime because that means that, one day, some bright MBA spark might actually start spreading the gospel of "Cloud is bad for your reliability, in-house means you control things".
And we'll get back to in-house servers that don't need fat-fingered admins from another company to fuck things up. Because the wheel keeps turning.
Once upon a time, if your network was down, you and your clients were the only ones impacted. Today, we're all on the bandwagon of "we hope Borkzilla won't fuck up today because otherwise, we're toast".
And to think that the people who pushed for this earn at least 10 times the salary of the people who actually work . . .
-
Sunday 28th January 2024 18:22 GMT JimboSmith
Re: I was wondering why things were so quiet today
Someone from another company I met at a meeting said that their cloud provider had gone down and they were having a COTM at that point. I asked what that was and she said Cup Of Tea Moment/Minute/Minutes depending on how long it lasted. Apparently they had quite a few of them.
-
-
Saturday 27th January 2024 09:05 GMT JimmyPage
The Peter Principle
Everything gets promoted above its level of competence. At which point you have to rely on the unpromoted parts.
I give you ... the cloud.
We also need to bear in mind John Glenn's famous observation about sitting atop a bomb built by a collective of lowest bidders.
You think your PHBs were tight? Imagine how much these "cloud corporations" are trying to get away with not spending.
-
Saturday 27th January 2024 11:06 GMT Chloe Cresswell
Client of mine was recently purchased by a larger competitor.
Small company: VoIP phones, Teams for chat type interactions.
Large company: Teams for everything.
Yesterday afternoon the small part lost internal chat functions for 2 hours, but everything else they needed was working.
But when they tried to contact their official tech support, they hit the issue that they couldn't message them (Teams), couldn't telephone them (remote end telephones are via Teams), etc.
It was interesting to watch from the outside.
-
Sunday 28th January 2024 15:26 GMT Eclectic Man
I once visited a 'Government Agency' (no, not the one in Cheltenham). Parked my car outside the gate and found a veritable horde of visitors wanting to gain entry. Trouble was that the computer system had 'gone down', taking the phone directory with it (no paper backup copy). The phone system was running, but you needed to know the extension of the person you were visiting to get in.
Which is one reason why I insisted that all of my clients printed out their DR / BC plans on paper and had them to hand, just in case.
-
Sunday 28th January 2024 20:33 GMT Chloe Cresswell
I was doing some work at a local NHS office which is one of the area disaster centre locations.
I'm not sure what they will do soon, as one of the jobs I did for them was to route a PSTN connection to their main meeting room, for disaster use.
Why? Well, all their phones are VoIP, and step one in their disaster plan for anything is: disable the internet connections and inter-site connections on all sites.
So the first step in their disaster plan was to cut the disaster management offices off from the world.
Hence asking when I was there if I could do something to get this one left over analogue line wired in.
Thankfully, this is SEP.
-
-
-
-
Saturday 27th January 2024 15:31 GMT Innique
I wonder if this affected the Activision COD servers, as the kids were complaining the servers were lagging yesterday. Games being down is one thing, but a work environment being down is not OK, not with the internet connectivity issues that already come with working from home. Water expands when it freezes in bad weather, and a lot of cables and hardware don't like it. Rats are also a problem here in the Big D; one of my outside connectors was chewed a couple of years ago.
-
Sunday 28th January 2024 15:29 GMT Eclectic Man
A friend of mine with two children said he couldn't get their school Teams session up to get at their homework last Friday. And (worse) he couldn't get onto the fortnightly social Teams call for friends and ex-'workplace proximity associates', where we have a chat, bemoan management and generally set the world to rights without actually having to be in the same room or pub at the same time together.
-
-
Sunday 28th January 2024 02:35 GMT martinusher
Wasn't Horizon a sort-of cloud?
Obviously not a cloud in the MS365 sense, but I thought that one of the key elements in the Horizon SNAFU is that the system upgrade did away with local journaling of transactions when it moved everything to the branch offices. The result was no resilience when the network went down -- no network, no transactions could be processed and, furthermore, when timing glitches occurred in the software there was no way to easily spot them and sort them out.
You would have thought that someone, somewhere, would have noticed the obvious flaws in this setup. But obviously not, because the scandal dragged on, becoming a serious scandal because of the need to CYA. Now we're constantly being told that 'the cloud' (aka "someone else's computer") is totally foolproof and utterly reliable. Except it quite obviously isn't. I know it makes business sense to lease hardware and software -- the mainframe world was built around this model -- but we moved away from it precisely because of cost, reliability and flexibility (choose two from three). Why are we returning to it?
-
Sunday 28th January 2024 19:32 GMT Tron
It's not just the cloud.
SAAS makes you reliant on a third party (and one which you probably don't trust), which is a real risk. If you operate from executable apps and use local storage, you can implement as many backups as you like, including cloud storage if you want, but you have some control over your digital doings. Worst case scenario, you fire up a spare PC, install the software/backup data and away you go. With SAAS and the cloud, all you can do is read a book, twiddle your thumbs and wait. You have placed your balls on someone else's chopping block and are hoping for the best. Hoping they won't withdraw an option, an entire package or require you to upgrade to Windows EvenWorse.
The reason the cloud and SAAS have become popular (OK, bad term, instead: common) in corporate land, is that no employee gives two shits if something bad happens to their employer. They still get paid. Worst case, they get a settlement and move on.
-
Monday 29th January 2024 18:11 GMT Chloe Cresswell
Re: It's not just the cloud.
Not just that. From the employer's side: they want it to not be capex, but running costs.
I've had clients say that to me. They don't want to buy things, as that comes from the capex budget, they are happy to lease services at a much higher cost over time, because it's a different budget...
-