The perfect AWS storm has blown over, but the climate is only getting worse

When your cabbie asks you what you do for a living, and you answer "tech journalist," you never get asked about cloud infrastructure in return. Bitcoin, mobile phones, AI, yes. Until last week: "What's this AWS thing, then?" You already knew a lot of people were having a very bad day in Bezosville, but if the news had reached an …

  1. Acrimonius

    Future on a wing and a prayer

    Systems have grown far too complex for anyone with a hand on heart to say the system design, the coding and the testing can be relied upon. More resilience and redundancy just add more complexity that fewer and fewer can grasp or unravel. Buggy software added on top of buggy software. Complexity unfair on coders as they are only human. Will be like headless chickens soon. Cause and effect too deep to analyse. Root cause never found. Fixes become temporary and actually add even more fragility.

    1. cookiecutter Silver badge

      Re: Future on a wing and a prayer

      systems are complex because each project is done in isolation.

      "we want to do X" - OK get in Accenshite or tati consultancy services & pay them a shit load of money to do it in isolation - then they leave

      "we want to do Y" - OK get krappyMG or infoshite & pay them a shit load of money to do it in isolation - then they leave

      "we want to do Z" - OK get craptia or shitpro in and pay them a shit load of money to do it in isolation - then they leave

      At no point is anyone looking at the big picture. At no point is anyone doing the 20-30 year career overview across the environment and at no point does anyone come down and plan this stuff.

      Someone comes in, they don't sit there for 6 months (which is what they SHOULD do), see what is already there, see how it fits together, see where it makes sense to keep things the same and where it makes sense to change things.

      At no point does anyone ask "What is it that you ACTUALLY WANT TO DO?" - it's shovels in the ground and let's go so we can justify our high costs & as they actually work it out, charge them more.

      There's a reason 99.5% of large projects across the entire world, across every country, across every industry over the last 200 years have failed on cost, deadlines or actual benefit of the program.

    2. The Organ Grinder's Monkey Bronze badge

      Re: Future on a wing and a prayer

      A wise (and now long retired) man once told me that the space shuttle's computer was the last significant system ever built that had had every possible combination of input & output tested. Increasing complexity made that effectively impossible from then onwards.

    3. DS999 Silver badge
      Devil

      Don't worry

      There are companies willing to sell you a solution to that problem. They will tell you to hand over all that too complicated for humans work to AI!

  2. Bluck Mutter

    it's too late baby, now, it's too late

    Launch dates for the big three:

    aws - 2006

    azure - 2010

    google cloud - 2008

    Thus these are mature businesses with associated legacy debt.

    On top of that they have all moved fast to provision ever new services, thus amplifying legacy debt.

    And on top of that, they regularly undergo large purges of people, with little logic as to what knowledge the fired employees might have so important IP and institutional knowledge goes out the door.

    And on top of that, on-prem tech staff are passionate about the systems they design, deploy and maintain; they have a sense of pride cause they have skin in the game... cloud staff are in many cases an offshore commodity that has neither the mandate nor the time to care.

    And finally, due to time to market pressures, the old waterfall project delivery that I grew up with is dead, it's all [FR]Agile... don't need to worry about all the nitty gritty ... just get something out the door (an MVP) and we can patch it later but they never do.

    Summary: if they didn't account for every potential gotcha upfront as you might in a waterfall deployment then there are close to two decades worth of unseen "stuff" that can trip them up.

    Bluck

    PS. Do these cloud providers ever do DR tests....probably not cause it's all too big so the probability of it turning to shit is huge so why risk it.

    1. ecofeco Silver badge

      Re: it's too late baby, now, it's too late

      All of that and then some.

      Just like people failing to know what the words trickle and down meant, so too did no one realize what "move fast and break things" meant.

  3. Dr Who

    When a butterfly flaps its wings ...

    The internet as a thing could be compared to the weather, or the climate. Chaos reigns and there are tipping points everywhere. And to those who insist on saying "that's the cloud for you - on prem only for me", you may as well say the same of the electricity grid, or the road network. Whether we like it or not, it is woven into our lives in a myriad of ways.

    1. Bluck Mutter

      it's not the internet....

      The internet is the plumbing between the end points (AWS, your PC etc).

      Even when it's DNS, it's their internal DNS not the internet's DNS.

      Bluck

      1. Dr Who

        Re: it's not the internet....

        Fair comment. Replace "internet" with "cloud" or whatever term you want to use to describe the internet and everything that is connected to, and depends on, it and the point still stands.

  4. Caver_Dave Silver badge
    Angel

    A resilient edge

    Back last century I built a system (H/W and S/W) to conduct the front end transactions (transparently multi-protocol) with hundreds of pharmacists ordering drugs from a wholesaler (some ordering many times a day). '486SX with 50 modems talking to a mainframe (it was an early '486SX PC and so that dates it for you). The mainframe was so unreliable that the front end had to transparently buffer all of the transactions until the mainframe appeared again and was updated with the orders, on a very regular basis. And because the IT Manager of the wholesaler had recently heard about TSRs, the contract insisted that the front end ran as a TSR! It ran without interruption 24/7 for nearly 10 years until decommissioned. A resilient edge has been a thing for a long time.

    1. Bluck Mutter

      been there, done that.

      The old "store and forward" messaging system.

      Send the message, don't remove it from the queue until you know it reached the end point ok, and then archive it just in case you need to roll forward.

      Bluck
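
      The store-and-forward pattern described above can be sketched roughly as follows. This is a minimal in-memory illustration, not anyone's production design: the names `StoreAndForwardQueue`, `FlakyMainframe`, `submit` and `flush` are all hypothetical, and a real system would presumably persist the queue and archive to disk so they survive a restart.

```python
from collections import deque
from typing import Callable, Deque, List


class StoreAndForwardQueue:
    """Buffers messages and removes each one only after delivery is confirmed."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        # `send` is assumed to return True only when the end point acknowledged.
        self._send = send
        self.pending: Deque[str] = deque()
        self.archive: List[str] = []  # delivered messages, kept for roll forward

    def submit(self, message: str) -> None:
        """Accept a message locally, whether or not the end point is up."""
        self.pending.append(message)

    def flush(self) -> int:
        """Try to deliver pending messages in order; return how many got through."""
        delivered = 0
        while self.pending:
            message = self.pending[0]  # peek: do not remove before confirmation
            try:
                acknowledged = self._send(message)
            except OSError:
                acknowledged = False
            if not acknowledged:
                break  # end point unavailable, keep everything and retry later
            self.archive.append(self.pending.popleft())
            delivered += 1
        return delivered


class FlakyMainframe:
    """Hypothetical end point that only accepts messages while it is up."""

    def __init__(self) -> None:
        self.up = False
        self.received: List[str] = []

    def receive(self, message: str) -> bool:
        if not self.up:
            return False
        self.received.append(message)
        return True


if __name__ == "__main__":
    mainframe = FlakyMainframe()
    queue = StoreAndForwardQueue(mainframe.receive)
    queue.submit("order-1")
    queue.submit("order-2")
    print(queue.flush())   # mainframe is down, nothing is delivered or lost
    mainframe.up = True
    print(queue.flush())   # both orders go through and are archived
```

      The key design choice is peeking at the head of the queue rather than popping it, so a failed send never loses a message; the archive is what you would replay if the end point later needed to roll forward.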

  5. Anonymous Coward

    It's always going to be a problem, because bean-counters are at the helm. C-Suite are beholden to the shareholders and their next big bonus payout. Getting rid of technical (aka expensive) expertise is always the go-to to make whatever insane profit margins are being demanded. As the technical knowledge pool diminishes, quick hacks to get services back up and running become BAU, and lack of time/investment just ends up leaving those hacks in place (documented or otherwise), just waiting for a time to rear their ugly heads.

    Now I'll be the first to admit I am getting old and jaded - but I have seen this play out countless times. I don't see that changing during my career.

    1. This post has been deleted by its author

    2. ecofeco Silver badge

      Slight correction: CxOs are the MAJORITY shareholders.

      So yes, a thief does indeed put themselves first above all else, like rules and laws.

    3. David Hicklin Silver badge

      > getting old and jaded

      Just get yourself over the retirement line - you never look back !

  6. cookiecutter Silver badge

    Enshittification and lock in

    This was totally predictable and completely avoidable.

    The fact that idiots are STILL going on about Cloud First is madness, especially as they put stuff in Oracle Cloud after decades of seeing how Oracle treat their customers.

    I was able to get off Facebook easily, something a LOT of people can't seem to do because of the perceived societal lock in.

    However, the hyperscalers are using exactly the same methods of enshittification to get idiots into their clouds and now they're locked in. The figure that I heard was that every $1,000 you spend going into the cloud will cost you $10,000 getting out.

    Stage One: Be good to users - Cloud was cheap, easy to get into and you could move your loads in easily.

    Stage Two: Good to Business customers - Cloud made it easy for vendors to sell it, other vendors to create virtualised versions of their products and sell them to customers in the cloud

    Stage Three: A Giant Pile of Shit - Where we are now.

    We have seen Amazon become more or less useless and they are still making money. Same with Anything Microsoft.

    Azure had the Russians and Chinese bouncing around for 6 months before they noticed, Chinese engineers working on DoD machines, a 10 hour SQL outage across the whole of South America.

    AWS is well on the way to the kind of nightmare Azure is. OCI will ramp up its costs as soon as various governments have loaded their stuff on there. Google regularly deletes people's data and yet developers are still allowed to put their stuff there!

    Genuinely on some pricing, I could have migrated the last project I was on at 1/2 the price if not 1/3 of the price that the project was costing by doing a VMware migration and in 1/3 of the time rather than them sticking it into the cloud, but everyone makes more money for their shareholders sticking it in the cloud & "decision makers" get to look "up to date" being cloud first. The fact that the morons are probably now going on about somehow "Using AI" makes it even more depressing.

    1. ecofeco Silver badge

      Re: Enshittification and lock in

      I've said it before, most large companies don't choose a vendor by price, service and reputation. They choose by stock performance.

      Because conflict of interest is the name of the game these days.

  7. glennsills@gmail.com

    Too big to fail

    The technical problem is daunting but there might be non-technical remediation. If instead of hosting everything on Amazon, Google, and Microsoft cloud platforms, why not encourage lots of smaller cloud platforms with different teams. Currently lawyers of the big three would say that there is plenty of competition, but why not legally define competition as dozens or perhaps hundreds of cloud hosting companies? Each of these new companies would have the same problems as the big three, but one cloud hosting provider having problems would cause a lot fewer problems.

    1. Jellied Eel Silver badge

      Re: Too big to fail

      The technical problem is daunting but there might be non-technical remediation. If instead of hosting everything on Amazon, Google, and Microsoft cloud platforms, why not encourage lots of smaller cloud platforms with different teams.

      Why not get the C-suite's heads out of the cloud (or other orifice) entirely? Suggesting smaller providers is really SS;DD with the added bonus that a smaller provider might be more likely to go titusup.com. But it can be a hard sell, or just a matter of demystifying what clouds actually are. So showing C-suite pics of me leaning against a cloud. Yes, it's a rack of HPE kit. Yes, you could buy that yourself. Or lease it. But then the really tricky bit.. can you manage it, which means having capable IT staff rather than just paying an invoice produced by the RNG at the 'cloud' provider.

      But then all that business critical stuff would actually be under the business's control, rather than that of some third party who might be less motivated if your business becomes unreachable or gets hacked, is very unlikely to agree to consequential losses, and will probably just offer some worthless service credits by way of compensation. CFO tends to be the hardest sell, given OPEX impact of having IT staff vs outsourcing, but at one fun client, the CEO was reluctant to put their business into the hands of a cloud provider and solved this problem by just bluntly telling the CFO that this was their decision.. Who promptly folded and they went with a private 'cloud' solution in-house. CFO grumbled, but we massaged their ego by pointing out they could play financial engineering and cross-charge IT services to shunt money around, with the help of their Treasurer. Amazon, MS, AlphaGoo do this with your money, so why not use the same tax shenanigans yourself?

    2. Duncan Macdonald
      Unhappy

      Re: Too big to fail

      Unfortunately politicians are corrupt. If a proposed measure would harm the profits of big companies then a few well placed backhanders will ensure that it does not pass (or at least is delayed for decades).

      For a couple of examples look at the tobacco industry delaying laws that would require health warnings and the auto industry fighting the ban on lead in petrol.

  8. thrillster

    I think we are headed or have reached 'Peak Cloud'.

    1. Jellied Eel Silver badge

      I think we are headed or have reached 'Peak Cloud'.

      I think/hope so. Software isn't really my thing, but I'm curious. If someone is competent at driving/managing AWS, I'm thinking they should probably also be easy to transition into a private cloud environment. So buy HP, Dell, IBM racks and roll your own 'cloud'. Then if doing that would have other business benefits, like the potential to do things you can't do, or do easily/cheaply in an AWS or MS environment. Then I guess that would be offset by the carrots dangled to keep businesses locked into 'cloud' solutions.

      I'm thinking thanks to Moore's Law, a business can get a lot of 'cloud' in say, 8 racks, doubled for diversity and in the right locations, the networking shouldn't be that expensive either. It certainly need not be very complicated.

      1. Graham Cobb

        Unfortunately, it doesn't really work like that. The services the cloud vendors are offering, and the competences needed to drive them, are really not the same as building your own. And driving the Amazon cloud to spin up new instances, or create new services (let alone buy the right amount of capacity to handle workloads and allow for failures) is not the same as doing the same thing in Google, for example.

        However, it could be made much simpler if the (big) commercial cloud users forced cloud vendors to standardise. Then we could see real competition (and even workload mobility) between clouds, and even see the hardware vendors offering the same tools and APIs for on-prem deployments.

        The power is in the wrong place at the moment - it is with the big 3 cloud vendors, not their customers.

  9. retiredFool

    Cloud like insurance

    Seems like a good idea until there is a problem. State Farm recently demonstrated to me that all those premiums I paid turned out to be worthless. What is the phrase, deny, delay, defend. I'm in stage 3, lawyer seems to be the only one winning.

  10. Gene Cash Silver badge

    Howzabout

    Don't fire the people that know WTF is going on?

    Is that a thought? Nope, didn't think so.

    1. Anonymous Coward

      Re: Howzabout

      It's worse than that.

      The latest fad (AI) means fewer younger admins learning from the current ones. The idea that AI can replace engineers as opposed to just being a tool in an engineer's tool bag is accelerating the decline.

  11. JWLong Silver badge

    AWS

    Just read today AWS is kicking another 30,000 to the door!

    Guess that big AI spend has to be paid for someway.

  12. I could be a dog really Silver badge

    Fortunately, resilience can be improved from the bottom up rather than waiting for top-down to happen. As responsible individuals, within departments, at board level, or as industry groups, what-ifs can be wargamed. You know what a power outage means, and what level of power pack, UPS, or backup generator is worth having. If AWS was to go away for weeks instead of hours, what would it look like? What would redundancy look like? Can you afford an experiment or two? Can you afford not to?

    If you've never had this sort of conversation about any or all of your core technologies and services, then you're part of the problem.

    I appreciate the sentiment, but let's look at what those conversations look like in reality :

    Techie: We really should have a UPS, it'll cost £X, and as a minimum allow us to shut down servers cleanly and avoid data loss.

    Beancounter: We've not had a power cut for $time, waste of money, no you can't have it.

    Then when the power does go off, it's "WTF hadn't you planned for that ?"

    I've had a very similar conversation at one employer. We literally did go years without power cuts (underground cable plant, urban environment), so the UPS was allowed to fester and eventually die. Conversely, at another before that, we had many short power cuts (overhead line plant, rural environment) so no problem justifying a working UPS.
