Another Wobbly Service...
And how unexpected is it that there would be a cascade of additional events related to the first?
Amazon Web Services has revealed that its efforts to recover from the massive mess at its US-EAST-1 region caused other services to fail. The most recent update to the cloud giant’s service health page opens by recounting how a DNS mess meant services could not reach a DynamoDB API, which led to widespread outages. Down the …
The counter argument, that you get a bunch of highly-skilled operations staff working 24/7, a service you could not individually afford, unsurprisingly doesn't seem to have got much of an airing over the last few hours.
The biggest problem, it seems to me, is not the outsourcing per se (there will be a significant body of customers for whom it makes economic sense in principle if the implementation is right), but the small number of providers. The potential international economic effect of such a huge chunk of infrastructure going out simultaneously is a risk that transcends any one particular customer's interests.
These vast enterprises must be broken down into smaller units. Not only for resilience, but to encourage competition, necessitate the development of standards that allow services and data to be migrated and to restore the balance of power between consumer and provider - indeed, between nation state and provider. This is a warning that requires an urgent response.
Breaking them up without some form of legislation only kicks the can down the road. They'll just grow and merge into different incarnations of the same thing again. Just look at the so-called "baby Bells" in the USA and what they are now. A few large conglomerates with more or less defined monopoly regions and very little competition under new names.
You are better off with your stuff on prem if you are operating an intranet and never connecting to the public internet. And have good back up policies. It could be armageddon out there, and your intranet would keep on humming on a UPS, back up and then neatly shut down.
Those who outsource are better off if there is only a small number of services, not lots of services of varying sizes, or national ones (run by governments, spying on everything you do).
With a few vendors, it is immediately apparent that there is a problem and skilled people can be thrown at it. Plus, credit to Amazon for the transparency.
If you increase the number of vendors you dilute your chances of getting it fixed (or even noticed, if it happens at 3am). And if it is run by govt, don't expect any transparency.
It is not unusual for individual services to fall over and for there to be no public recognition that they have for hours, sometimes days. If the big services fall over, everyone notices and fixes begin quickly.
Anything operated on nation state lines immediately becomes a target, so that is something to avoid.
"And have good back up policies"
What stuns me is how few businesses have planned how they will work without the Internet. It's so bloody obvious that one day it could all get brought down by either a natural event or war. In the early days of online payment, shops used to keep those manual card readers, but no longer. They rely purely on low cost Internet services. A few don't even take cash now. Large businesses seem to have given up any hope of survival without the Internet. It's hard to get many to even have validated backups and tested redundancy.
So where does it leave us? If there is a world war millions may die, even without nukes, not from bullets but starvation if all the Internet links are severed. With EMP nukes it would be billions dying of starvation or fighting for the last crumbs with most electronics gone. Even if you live in a rural area people will go mad, damage crop cycles and simply take animals with no thought of replenishment unless you can get communities to work to defend it. One farmer alone, even if armed, won't stop it.
Elon Musk is deluded if he thinks going to Mars will help us survive a future nuclear war. By the time you can support a sizable colony on Mars they will be involved too. But I'll concede it is a first step to humanity's resilience.
So let's not let our stupid and ridiculous leaders lead us into conflagrations. It's about time we stopped listening to the constant propaganda, especially from the EU globalist bureaucrats who just want absolute control. The mad idiots. I'm all for strong defence and more military spending but I emphasise for DEFENCE.
>So where does it leave us? If there is a world war millions may die, even without nukes, not from bullets but starvation if all the Internet links are severed. With EMP nukes it would be billions dying of starvation or fighting for the last crumbs with most electronics gone. Even if you live in a rural area people will go mad, damage crop cycles and simply take animals with no thought of replenishment unless you can get communities to work to defend it. One farmer alone, even if armed, won't stop it.
It doesn't even need to be a nuclear war. A decently sized Carrington Event would take down much of civilisation and have a similar effect on the human population...
The big outages on cloud services aren't the problem for customers. Amazon throws everyone they have at these kinds of problems and no one goes home until it's fixed.
The problem is when it's a smaller outage and only affects some of their clients. Then you get regular staff working on it and at end of shift, if you are lucky, it gets handed off to someone else who has to get up to speed on the problem. If you were working this problem on site your company would be sparing no resources to get you back online. With a cloud provider the urgency is based on how big of a customer you are. And note, if you spread your business across multiple cloud providers you become a smaller customer even though you are buying the same, or more, resources.
Problem is we have a habit of behaving like sheep and rushing headlong into the latest trend. Cloud is still considered the place to be, although it is well over the peak of hype. Combine that with a lack of professionalism in assessing architectures and environments, and decisions are made on emotion and convenience. The hyperscalers, whilst warning that shit happens, don't seem very upfront about their internal dependencies. If your service matters, you need to pay more to avoid outages by taking more time to build it to be resilient.
Sounds a bit anti-capitalist of you... Shouldn't the "market" be sorting this out? If companies lose trust in AWS with enough of these events, won't customers hop over to other platforms / providers?
Obviously it was a financial motivation rather than a technical one, but you can look at VMware's market share predictions since the licensing changes.
Exactly @abend0c4, the problem isn't using a cloud service, but rather a significant proportion of the planet using one region of one cloud service provider.
Breakup would be consistent with previous monopolies (Standard Oil)
One potential solution: the cloud service industry could be regulated for certain non-negotiables (like resilience).
Another: the cloud service industry could attest independently to the level of resilience it has (giving the option of you contracting to a less costly service at the expense of flakiness)
What we have today is the illusion of resilience, but the real-existing experience of it falling over.
Because every other supplier who answered the request for tender apart from AWS and Azure shat the bed. As one of those suppliers was Fujitsu with their cloud offering (since throttled in its cradle) you can guess how badly the bed linen was soiled. Some of the other suppliers weren't as good as Fujitsu.
With something like HMRC, it ought to be run in-house for security and accountability. They are a big enough organisation that they can resource it at the appropriate scale - it's not like some small company using cloudy services because it allows far more functionality than they could provide in-house.
We are led by selfish leaders from the top down. It has spread the culture to the institutions of government. Maybe this is all by intent to bring about the reset, but nonetheless the people are not being served by institutions that are supposed to serve them. We pay for HMRC as we pay for Parliament. They are supposed to be our employees but they have managed to reverse the intentions of democracy. Ask yourselves: when dealing with a government institution, do you feel like the boss or the bossed?
"If" they truly served us we could get rid of 80% of the nonsense and there would be plenty of money to build professional and highly resilient systems for the remaining 20% of services that we actually need to thrive. Most of the stuff done is for control and politics. We do not need a complex tax system, we do not need digital id or digital money we already have it and it works. When they say digital they mean under their control. We are being treated as cattle, we need to be the farmers.
“Over time we reduced throttling of operations and worked in parallel to resolve network connectivity issues until the services fully recovered. By 3:01 PM, all AWS services returned to normal operations, meaning problems persisted for over a dozen hours after resolution of the DynamoDB debacle.”
Is that East Coast time?
Nope. We were still having problems until around 19:00 Eastern. We did not have full and complete access until 19:33 Eastern. Subtract three hours for West Coast time. Add five hours for GMT. Some of our systems were operational at 08:00 but failed by 11:00, some were dead at 08:00. So that's eight to eleven hours of no service, which could have been avoided if only our stuff was on prem.
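For anyone wanting to check the "eight to eleven hours" figure, here is a rough sketch of the arithmetic in Python. It assumes all the quoted times are same-day Eastern wall-clock times, and `outage_hours` is just a helper name made up for this illustration.

```python
from datetime import datetime


def outage_hours(start: str, end: str) -> float:
    """Hours between two same-day HH:MM wall-clock times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600


# Times as quoted in the comment above (assumed same day, same time zone).
# Best case: systems that failed by 11:00 and were usable again around 19:00.
best_case = outage_hours("11:00", "19:00")
# Worst case: systems already dead at 08:00, full access only at 19:33.
worst_case = outage_hours("08:00", "19:33")

print(f"roughly {best_case:.0f} to {worst_case:.1f} hours without service")
```

So the worst case is slightly over eleven and a half hours rather than a flat eleven.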
Mmmm, not necessarily, but there are certainly more dominoes than admitted. But let's not forget that things go wrong wherever they are hosted. The bad thing about hyperscalers is the cross-border dependencies, which are not clarified. Even if they were, not all would even be known. Ideally they would move towards full regional independence, so the failure of any service elsewhere only affects its own region unless the client has designed otherwise. But money matters.
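To make the "dominoes" point concrete, here is a small Python sketch that walks a dependency map and lists everything that falls over when one service fails. The service names and the map itself are entirely hypothetical, not AWS's real internal dependencies (which, as noted, are not published).

```python
from typing import Dict, List, Set


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    affected: Set[str] = set()
    changed = True
    while changed:  # keep sweeping until no new casualties are found
        changed = False
        for service, deps in depends_on.items():
            if service == failed or service in affected:
                continue
            if any(dep == failed or dep in affected for dep in deps):
                affected.add(service)
                changed = True
    return affected


# Hypothetical dependency map: service -> services it needs in order to work.
DEPENDS_ON = {
    "dns-us-east-1": [],
    "database-api-us-east-1": ["dns-us-east-1"],
    "global-identity": ["database-api-us-east-1"],
    "app-eu-west": ["global-identity"],  # hosted in Europe, but not independent
    "app-eu-isolated": [],               # genuinely regional, no outside deps
}

print(sorted(affected_by("dns-us-east-1", DEPENDS_ON)))
```

In this made-up map the European app goes down with the US region purely because of the shared identity service, while the isolated one survives. That is the kind of hidden link customers would need disclosed to design around.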
The entire internet, intranet, your on-prem infrastructure is ALL built on a line of dominoes known as DNS! When DNS fails, everything fails!
Dork up your DNS in Active Directory and watch your entire enterprise come to a halt!
All of these services and their dependencies rely on functioning DNS to work because IP addresses change as instances are spun up, swapped over, etc.
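A minimal Python sketch of that dependency, for illustration only: a lookup wrapper that remembers the last good answer and serves it if the resolver fails. `CachingLookup` and `system_resolver` are names invented here, and this is not how AWS or Active Directory handle it. It also only helps while the stale addresses are still valid, which is exactly the catch when instances are being swapped around.

```python
import socket
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]


def system_resolver(name: str) -> List[str]:
    """Resolve a hostname to IP addresses using the operating system resolver."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class CachingLookup:
    """Serve the last known good addresses when name resolution fails."""

    def __init__(self, resolver: Resolver = system_resolver) -> None:
        self._resolver = resolver
        self._last_good: Dict[str, List[str]] = {}

    def lookup(self, name: str) -> List[str]:
        try:
            addrs = self._resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            if name in self._last_good:
                return self._last_good[name]  # stale, possibly wrong by now
            raise  # never resolved before: nothing to fall back on
        self._last_good[name] = addrs
        return addrs
```

A name that has never been resolved successfully still fails outright, so this softens a DNS outage rather than removing the dependency.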
It kills me how you server jockeys think you can provide the complex interdependent infrastructure that AWS provides On-Prem.
Yup...... computers are fantastic, and productive, and useful....................
...............until they aren't!! Just ask:
- Jaguar Land Rover
- Asahi
- Cloudflare
- SolarWinds
And now:
- Alexa services
- Ring doorbell services
- Signal
- Lloyds Bank
- HMRC
I wonder if any of the accountancy types who promised that "the cloud" would be a fantastic replacement for "on premises" data centres are available for comment?
....or Gartner?
[quote] I wonder if any of the accountancy types who promised that "the cloud" would be a fantastic replacement for "on premises" data centres are available for comment?
....or Gartner? [/quote]
we can all wonder, but I will not be holding my breath :o)
It just shows hyperscaler datacenters are not as isolated as they claim and people need to think carefully top down what is important before rushing headlong into using vendor specific services. They're great if a day's outage doesn't matter too much but I heard UK Gov's OneLogin was affected - doh! Don't you worry about Digital ID, it will keep you safe from living. The hyperscalers need to be VERY clear about what is reliant on global services and how so people can make sensible decisions. To not be transparent will limit their market.
Let me corrupt all the SRV records in your internal DNS and see how long your wonderful on-prem infrastructure stays working!
The problem was not the cloud, it was not the architecture, it was a human fucking something up! Which can happen on-prem just as well as in the cloud!
True, if I cock up a load of DNS service records, down goes my on-prem infrastructure.
Except that’ll just be ‘my’ on-prem infrastructure, not yours, not Ring, not Snapchat, not Signal etc.
And not a significant fucking chunk of the global infrastructure. That’s the difference, ‘embrace the cloud’ is pushed and marketed as ‘vastly more resilient and reliable than going it yourself’, which is probably true, although the more, err, delusional, advocates will claim it’s ‘perfect and never goes wrong, promises 100% reliability’. Which, of course is complete bullshit, but it impresses CEOs!
Obviously the claim that everyone is distributed and has no single point of failure is demonstrably wrong. Suppose North Virginia suffered an earthquake, volcanic eruption or meteor impact and US-EAST-1 wasn't coming back in the near future, or it was a simple human error but fixing it took a lot longer than 12 hours. Then what? Why do some UK government sites go down, even though they are hosted in local data centres, because of an issue in N Virginia?
This shit has not actually been properly thought through, has it?
Just an old gray tech pensioner here, waiting out my last days on the net and learning so much, as always, from all the Reg comments. I was a marketing sod who worked inside some Sacred Holy Engineering companies, now long gone or zombified beyond recognition, such as Honeywell Information Systems (did Engineering Design Reviews on Multics terminals), Digital Equipment (Ken Olsen was a hoot even if he pretended to be an Ogre), and LTX (once had Sol Max explain to me how to measure timing signals using a delay line). I took the trouble to get my CCNA because I eventually realized that the real Alchemy was done in the network, and learned to love IPv4 and doing subnets manually. So my question, dear brothers and sisters, is what happened to the OSI 7 layer model and having everyone talk together peacefully over this internet thing? We learned all about BGP and DNS etc etc and how fucked up it all is, but it more or less worked, right? Is this the lesson of AT&T, Bell Labs, and SS7 all over again? I can pick up a phone in Nutley NJ and call my cousin in Dublin, right? Ok, ok I know, it's all about scale, and only AWS can solve that. Or is this the pseudo-philosophical battle between SNA and DECnet Phase V? For the love of Cthulhu can anyone get us back to just talking to each other, with some packets?
Cloud only where it makes sense. That's about 5% of things. Ppl I work with are laughing all the way to the bank today and yesterday.
Hey y'all!
If you think "cloud" is just the bee's knees, then you're really gonna love you some of this "AI chatbot!"
Buy buy buy! you mindless consumers.
Let's have a global network of computers with access to all information ever created. Let's store personal details on these machines. Auntie Ethel who is in her 90s suddenly needs a degree in computer science to work these things, as does 7 year old Herbert. Let's have VPNs where you can appear to be in another country. You can view stuff that's neither intended for nor relevant to you! Online banking/bill paying! Entertainment and sports of your choice anywhere in the world, available 24/7! Binge watch TV! Remote Access to any of these machines! Make your home smart, for ultimate convenience! Send messages to family and friends wherever they are in the world! Blue light keeping you awake when you should be sleeping! Simple password access! Share pictures of your dinner with the world! Be liked by everyone! No bad stuff! Great ideas with the best of intentions! What could possibly go wrong?
We are under pressure from our owning company to get moved off prem and onto cloud based asap. I'm dragging my feet somewhat.
We did recently move from Sage on prem to Xero. Guess which cloud provider hosts Xero....
Our MSO that we use for 3rd line support were dead in the water as their monitoring, ticketing and VoIP are all hosted on AWS. Some of our sister companies have custom apps, many of which it seems run on AWS.
We had an urgent heads of IT meeting, where there was a lot of flapping going on. I just stayed muted and carried on working as normal. The overall head of IT at the owning company did ask if I was feeling smug. I just said that accounts are annoyed as they can't access Xero, but apart from that, no, not smug, as pride comes before a fall.
I just reminded him that there are a few good reasons we haven't moved from on prem and aren't rushing to do it now. It was actually nice to have a quiet day without being bombarded with data export requests etc.