
Another day....
Another Azure failure
O362.5 for the win!
Microsoft Azure's UK South storage region developed a distinct wobble today. The company reckons only a subset of customers have been affected by the "availability issue", which began at 13:19 UTC, according to the status page.
@AzureSupport We're getting timeouts/failures both in the Azure portal and in applications (e.g. …
"Starting at 13:19 UTC on 10 Jan 2019, a subset of customers leveraging Storage in UK South may experience service availability issues. Engineers have identified that a single storage scale unit experienced availability issues for a subset of storage nodes, and are working with other teams to confirm the mitigation path for this issue. Resources with dependencies on this scale unit, may also experience downstream impact.
The next update will be provided in 60 minutes, or as events warrant. Last updated at 19:54 UTC"
"Engineers have identified that a single storage scale unit experienced availability issues for a subset of storage nodes, and are working with other teams to confirm the mitigation path for this issue."
And therein lies the problem with the hyperscale cloud.
It's all well and good with 99.999999999999999999% availability and all that jazz....
Until it spectacularly fails in an unexpected fashion. Then it becomes a real beast to troubleshoot, because of all the complexity normally hidden behind layers of automation. That leaves you needing whole "other teams" worth of man-hours instead of a handful of sysadmins.
AWS, Azure, GCP: none of those guys have "99.9999999999999999%" availability.
In fact, they offer three 9s (99.9%), which works out to about 44 minutes of downtime a month, or almost NINE hours a year. The twelve or fifteen 9s people quote aren't availability at all; they're durability for their storage. You won't LOSE data, but you might not be able to access it!
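The arithmetic behind those numbers is easy to check. Here's a minimal Python sketch, assuming an average 365.25-day year and a "month" of one twelfth of that (about 30.44 days); the function and constant names are made up for illustration:

```python
# Minimal sketch: turn an availability figure ("number of nines") into an
# allowed-downtime budget. ASSUMES a 365.25-day year and month = year / 12.
MINUTES_PER_YEAR = 365.25 * 24 * 60
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12


def availability_from_nines(nines: int) -> float:
    """Return availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1 - 10 ** -nines


def downtime_minutes(availability: float, period_minutes: float) -> float:
    """Minutes of downtime permitted over a period at the given availability."""
    return (1 - availability) * period_minutes


if __name__ == "__main__":
    a = availability_from_nines(3)
    monthly = downtime_minutes(a, MINUTES_PER_MONTH)
    yearly_hours = downtime_minutes(a, MINUTES_PER_YEAR) / 60
    print(f"{a:.3%}: {monthly:.1f} min/month, {yearly_hours:.2f} h/year")
```

At three nines this gives roughly 43.8 minutes a month and 8.77 hours a year. It only covers availability; durability (the odds of actually losing data) is a separate figure and doesn't convert into downtime like this.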
Oh, get me the tank!!! I am laughing so hard it is breaking my ribs. Trust Micro$hit to make a mess, sweep it under the carpet and then try to sell you the broom! Our fantasy football site (I mean the one my mates and I play on) is screwed at the moment. 18 hours after it went down, nothing whatsoever has changed. I advised my mates in the league to open a fine bottle of French red, sit back in a comfortable chair, relax and laugh as this one gets worse. There's nothing we can do about it except pray to Bacchus that our season isn't over.
People saying Azure availability sucks, or whatever, seem to be missing a rather large point.
Azure is a huge collection of services, spread across a large number of locations. A subset of users of a single service at a single location had an issue for a few hours.
The rest of the Azure platform carried on as normal.
Like any IT system, Azure is just a platform; you, as the end user, need to make the right choices when deploying to it, such as deploying your services across multiple locations. Realistically, this downtime should not have affected anyone who had designed their systems properly!
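What "deploying across multiple locations" buys you can be sketched in a few lines of Python. This is a generic illustration, not the Azure SDK: `read_with_failover`, `combined_availability` and the per-region reader functions are hypothetical. The availability maths also assumes regional failures are independent, which real outages (shared dependencies, shared control planes) often aren't, so treat that figure as optimistic.

```python
from typing import Callable, Sequence

# HYPOTHETICAL: a "reader" fetches data for a key from one region and raises
# on failure. Region names used with it are illustrative only.
Reader = Callable[[str], bytes]


def read_with_failover(key: str, readers: Sequence[tuple[str, Reader]]) -> tuple[str, bytes]:
    """Try each region in order; return (region, data) from the first success."""
    errors: list[str] = []
    for region, reader in readers:
        try:
            return region, reader(key)
        except Exception as exc:  # real code would catch the client's timeout/IO errors
            errors.append(f"{region}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))


def combined_availability(availabilities: Sequence[float]) -> float:
    """Chance at least one replica is up, ASSUMING failures are independent."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


if __name__ == "__main__":
    def primary(key: str) -> bytes:
        raise TimeoutError("scale unit unavailable")  # simulate the outage

    def secondary(key: str) -> bytes:
        return b"league-table"  # stand-in for a replica in another region

    print(read_with_failover("league.json", [("primary", primary), ("secondary", secondary)]))
    print(f"{combined_availability([0.999, 0.999]):.6f}")
```

Failing over on the client side means one region's storage outage degrades into a slightly slower read rather than an error, provided the data really is replicated to the second region. Under the independence assumption, two regions at 99.9% each come out at about 99.9999%.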
Biting the hand that feeds IT © 1998–2021