Developer made one wrong click and sent his AWS bill into the stratosphere

Welcome to another week of work, a moment The Register celebrates with a new installment of Who, Me? It's the reader-contributed column in which you 'fess up to follies, false moves, and faux pas – and explain how you escaped. This week, meet a reader we'll Regomize as "Chase" who develops free open source software for …

  1. PRR Silver badge

    > Amazon did the right thing and forgave 40 percent

    The Right Thing would be a bit of AI (or even BASIC) which would look at past use and predict new charges. "This change will increase your bill by about $38,590 a month, is that what you want?"
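
A check of the kind suggested here could be sketched as follows. This is only an illustration: the function name, the "twice the usual bill" trigger, and the projected figure of $40,090 are assumptions chosen to reproduce the quoted warning, and the past bills are stand-ins for the $1-2K monthly spend mentioned elsewhere in the thread. It is not how AWS estimates charges.

```python
from statistics import mean

def confirmation_prompt(past_monthly_bills, projected_bill, factor=2.0):
    """Return a warning string when the projected bill is far above the
    historical average, or None when it looks normal.

    'factor' is an assumed trigger: warn above factor x the average bill.
    """
    baseline = mean(past_monthly_bills)
    if projected_bill <= baseline * factor:
        return None
    increase = projected_bill - baseline
    return (f"This change will increase your bill by about "
            f"${increase:,.0f} a month, is that what you want?")

if __name__ == "__main__":
    # Hypothetical history of roughly $1-2K a month, then a runaway projection.
    print(confirmation_prompt([1000, 1500, 2000], 40090))
```

The hard part in practice is producing the projected figure before the change is saved; the comparison itself is trivial.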

    1. Anonymous Anti-ANC South African Coward Silver badge

      And use a cheerful Clippy to deliver the news...

      1. Someone Else Silver badge

        Having the news delivered by Clippy would cause the news to be summarily ignored by anyone with an IQ above room temperature.

        Oh, wait! That can exclude large swaths of what passes for management these days...

    2. simonlb Silver badge

      If the machine image is using a template then a confirmation email listing the changes along with highlighting anything which could increase the costs when saving it would be nice to have. Oh, if only there were some sort of intelligent software you could use to 'innovate' mundane stuff like this...

    3. SVD_NL Silver badge

      And make your customers aware of the insane costs your service is about to incur? No thanks, people might actually stop making these mistakes!

    4. Roland6 Silver badge

      There is this: https://www.theregister.com/2020/09/28/aws_cost_anomaly_detection/ from 2020.

      From memory, when investigating a client's AWS spend, the tool wasn’t particularly easy to set up to give meaningful/useful data; although that was probably not helped by not having much history to enable the creation of a baseline.

    5. This post has been deleted by its author

    6. sketharaman

      This is the kinda thing that 800 lb chimpanzees typically leave for 8 lb add-on software providers.

  2. Korev Silver badge
    Coat

    Sounds like he needed to find a way to block storage...

    1. David Newall

      need to put a different spin on it

      1. Korev Silver badge
        Coat

        Yeah, he's gone off on the wrong track

        1. b0llchit Silver badge
          Coat

          and kept seeking

          1. FIA Silver badge

            Ah, the Register comment section, where you can always count on people to HAMR the puns home.

            1. Jou (Mxyzptlk) Silver badge

              I must park my head now...

    2. Red Ted
      Coat

      Perhaps he needed a delay line in the code?

  3. This post has been deleted by its author

  4. SVD_NL Silver badge

    WHAT?!?

    Sponsor: "Wait, you don't have cost alerts and budget caps set up?!?!"

    Chase: "I do now!"

    1. Roland6 Silver badge

      Re: WHAT?!?

      On AWS these aren’t particularly helpful, other than perhaps to remind you to look at the console ASAP, as costs will continue to be incurred over and above your budget cap.

      Not aware that anything has changed on AWS to actually give customers hard, must-not-be-exceeded budget caps since the last time an unexpected budget overspend on a free tier account happened a few years back.

      1. FIA Silver badge

        Re: WHAT?!?

        Still though, if someone else is being nice enough to pay your bills, taking the time to check them now and then might also be prudent.

      2. doublelayer Silver badge

        Re: WHAT?!?

        True, and if AWS could add more options including a method of automatically stopping some things when a budget was hit, that would be better. You can script that, but you would have to plan what you wanted to do in an unexpected funding situation ahead of time. However, if all they have is alarms, then you should at least set the alarms.

        1. collinsl Silver badge

          Re: WHAT?!?

          Sadly AWS are never going to introduce a hard cap when it's not in their interests to do so - unintended overspend like this is how they make profit and they're not going to turn that off unless they absolutely have to. Note that they forgave 40% of the cost, meaning they still had 60% of the unintended revenue paid to them for basically doing no work. Yes some persistent disk space was consumed but I'm sure with their advanced systems they were quick to archive this off somewhere to cheap storage so most of that 60% was pure profit.

  5. BartyFartsLast Silver badge

    Apart from all the "check your own work" stuff, surely such a large increase in usage in such a short time could indicate a compromise and trigger some sort of email to query it?

    Or, silly question I guess, is AWS that profit focused they'd rather ignore it and hope the invoice gets paid.

    1. Jellied Eel Silver badge

      Or, silly question I guess, is AWS that profit focused they'd rather ignore it and hope the invoice gets paid.

      I assume that's a rhetorical question? $40k is less than an hour's running costs for Bezos's yacht fleet. It's a simple automation process for AWS to send any invoices past due to collections, or just sell the debt. It's something that surprised me when talking to people who have their heads in the clouds, willingly or unwillingly, that there didn't seem to be any easy way to set a max budget. But then that feature wouldn't align with Amazon's business model.

      1. Anonymous Coward

        I can see the headline now, "AWS stopped my business scaling on our busiest day of the year because we set our max budget incorrectly".

        There is nothing about this story that couldn't have been addressed by a VERY, VERY simple cost alert, which takes about 30 seconds to set up, regardless of which provider you use. Set it up at $3k and this problem goes away.

  6. Yorick Hunt Silver badge
    Facepalm

    $1-2K per month?

    That sort of money would buy you 1-2 beefy computers (or 4-5 mediocre ones) which you get to keep forever, with money still left over for your internet connection.

    Tell me again why "cloud" is always the go-to?

    1. richardcox13

      Re: $1-2K per month?

      How much does it cost to:

      - Somewhere to put them (rent etc.)

      - Host (electricity, cooling, ...)

      - Backup

      - Admin (lots of updates in even quite a simple tech stack these days).

      For those "one or two computers".

      (This is open source using AWS, so reasonable to assume they don't have an office.)

      1. Rjan

        Re: $1-2K per month?

        That work would be covered as a base workload for an IT manager - which you must still have anyway to manage the cloud providers, as well as the remainder of your onsite equipment.

        As with AI, clouds essentially consist of being sold a crock about how you can sack all the technical staff, and then rehiring them once cloud bills have soared and you've relearned why you needed your own staff.

        1. SecretSonOfHG

          Re: $1-2K per month?

          Ah what a nice solution. Hire a competent IT admin whose salary cost is 10 times your cloud cost so that you can set up and pay even more for having equipment you will not be even using all the time.

          Cloud haters clearly live in their own world.

      2. Anonymous Coward

        Re: $1-2K per month?

        I don't see temporarily spun up installs (remember, the cost was for temporary build environments, to be deleted soon after use) requiring a lot of backing up and updates. You'd have images you copy to disk as needed.

        1. richardcox13

          Re: $1-2K per month?

          I don't see temporarily spun up installs [...] requiring a lot of backing up and updates.

          Updates to tools happen a lot. Typically weekly. These need to be incorporated into the images you use to build temporary VMs.

          Backups will depend on how much you end up on the pet vs. farm animal scale.

          Your reference VM (apply updates, build new image, repeat for ever) you would likely want to keep around, and thus backup.

    2. SVD_NL Silver badge

      Re: $1-2K per month?

      Without knowing more about the situation, this sounds like a service that could potentially get a lot of spikes in activity. In those situations it might not be worth the investment to get servers to catch those spikes, while doing basically nothing most of the time.

      Another consideration could be location, it may be beneficial that you're able to spin up an instance basically anywhere you want.

      And don't forget power costs. I recently got two servers running Proxmox (both dual-socket Xeon E5-2680 v3, not too old, somewhat beefy I guess), their average CPU usage is under 1% (one is running light workloads, the other is pretty spiky, spinning up and shutting down Windows VMs), and it's still adding almost €100 per month to the electricity bill!

      1. Anonymous Coward

        Re: $1-2K per month?

        100€? Just how expensive is your electricity?! That sounds to me like several kilowatts on average, which doesn't sound like "one is 1% and other is spiky".

        1. SVD_NL Silver badge

          Re: $1-2K per month?

          €0,25 per kWh, which amounts to roughly 550W constant usage to get to €100 a month. It's not quite there; my guess is that it's using 400-500W. With CPUs like that, idle usage of 50-100W or so isn't unheard of, and it has quite a lot of spinning disks too. Add a bunch of RAM sticks and fans, and it's a pretty reasonable power consumption for a server like this.
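
The arithmetic in this reply can be checked with a quick sketch. The €0.25 per kWh tariff and the €100 monthly figure come from the thread; the 30-day month and the function names are assumptions made for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost_eur(avg_watts: float, eur_per_kwh: float) -> float:
    """Cost of a constant electrical load over one 30-day month."""
    kwh = avg_watts / 1000 * HOURS_PER_MONTH
    return kwh * eur_per_kwh


def watts_for_budget(eur_per_month: float, eur_per_kwh: float) -> float:
    """Constant load, in watts, that would cost eur_per_month."""
    kwh = eur_per_month / eur_per_kwh
    return kwh / HOURS_PER_MONTH * 1000


if __name__ == "__main__":
    # Load implied by a EUR 100 monthly bill at EUR 0.25 per kWh.
    print(round(watts_for_budget(100, 0.25)))
    # Cost of the guessed 400 W and 500 W average draws.
    print(monthly_cost_eur(400, 0.25), monthly_cost_eur(500, 0.25))
```

On these assumptions a €100 bill corresponds to about 556 W of constant draw, and a 400-500 W draw would cost somewhere around €72-90 a month, nowhere near "several kilowatts".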

      2. An_Old_Dog Silver badge

        Re: $1-2K per month?

        it might not be worth the investment to get servers to catch those spikes, while doing basically nothing most of the time.

        Maybe you could sell the otherwise-unused CPU cycles of your servers to other people. You could be like an old-time, small dial-up ISP, selling limited "cloud" services.

        As to your monthly power bill: did you think AWS, Azure, or GCS would not include (somehow) those 'leccy costs in your bill?

        1. doublelayer Silver badge

          Re: $1-2K per month?

          "Maybe you could sell the otherwise-unused CPU cycles of your servers to other people."

          Sounds great. How much are you willing to pay me for the residual cycles on my computer? It's run by some random nobody you don't know, hopefully I have any security in place for both our sakes, you get preempted whenever I get spikes, and it can't scale above two servers at absolute maximum. I'm expecting a competitive price.

          "As to your monthly power bill: did you think AWS, Azure, or GCS would not include (somehow) those 'leccy costs in your bill?"

          Of course they do, while I'm using the resources I provisioned. When I am not using them, then that part is paid by the people who are using them instead. The question is whether their markup on the times when I am using it is greater or less than my wasted spend on self-hosted hardware when I'm not.

          1. Teal Bee

            Re: $1-2K per month?

            It's run by some random nobody you don't know, hopefully I have any security in place for both our sakes, you get preempted whenever I get spikes, and it can't scale

            You just described AWS.

            1. doublelayer Silver badge

              Re: $1-2K per month?

              No, I didn't, and you know that. Unless I used spot instances, I don't get preempted for other users because I'm paying. It does scale very quickly unless you need tons of expensive capacity right now. And although we could argue on the quality of their security, they do employ a bunch of security people and have more secure defaults. None of those would apply to any attempt to rent out unplanned unused capacity on a two-server setup.

              1. FIA Silver badge

                Re: $1-2K per month?

                To be fair, if they were willing to leave both computers on you could probably scale fairly quickly.

                And in this case 'expensive capacity' is a third computer. There's always going to be a lead time on that.

                Plus, if you pay a bit extra I bet you could get exclusive access to the keyboard and mouse... no pre-emption!

    3. Bebu sa Ware Silver badge
      Angel

      "Have you made a mistake that made your cloud costs soar?"

      No. I suffer from incurable congenital nephophobia.

      For research purposes and for development purposes the loads are typically well defined (usually 100% of whatever is on offer if students are involved) owning or leasing the hardware will invariably be cheaper and provides a hard, physical "stop loss."

      A few years ago I was an onlooker in the purchase of a very large memory (>1TB), multisocket server costing ~$400,000 required for scientific computing which was questioned by the chief PHB as to why the workload wasn't being moved into the cloud.

      So the prospective purchasers toddled off to the cloud providers for quotes—the lowest was ~$1.00 million per year.

      These boxes could be realistically run for up to eight years in that environment with loss of vendor maintenance and support being the main reason for decommissioning.
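
The comparison in this anecdote can be laid out as a rough sketch. The ~$400,000 purchase price, the ~$1.00 million per year quote, and the eight-year lifetime come from the comment; the function names are hypothetical, and the owned-hardware side deliberately ignores power, hosting, staff, and maintenance contracts, so it understates the real cost of ownership.

```python
def cumulative_costs(purchase: float, cloud_per_year: float, years: int):
    """Yield (year, owned_total, cloud_total).

    Assumption: the owned box has no running costs, which flatters it.
    """
    for year in range(1, years + 1):
        yield year, purchase, cloud_per_year * year


def break_even_years(purchase: float, cloud_per_year: float) -> float:
    """Years of cloud spend that equal the one-off purchase price."""
    return purchase / cloud_per_year


if __name__ == "__main__":
    for year, owned, cloud in cumulative_costs(400_000, 1_000_000, 8):
        print(f"year {year}: owned ${owned:,.0f} vs cloud ${cloud:,.0f}")
    print(f"break-even after {break_even_years(400_000, 1_000_000):.1f} years")
```

Even if the ignored running costs doubled or tripled the owned figure, the quoted cloud price would overtake it within the first couple of years for a workload that runs flat out.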

      Production environments with highly volatile loads, requiring ~100% availability and geographic diversity are a different kettle of fish. Cloud hosting probably is a decent fit for at least part of the solution.

      1. Korev Silver badge
        Boffin

        Re: "Have you made a mistake that made your cloud costs soar?"

        We had a similar thing where we needed a tonne of local SSD for a specialised database[0]. The costs were frightening, but we still had to fight off a few attempts to move it to the Cloud.

        [0] The application was latency-bound

      2. Roland6 Silver badge

        Re: "Have you made a mistake that made your cloud costs soar?"

        >” Production environments with highly volatile loads, requiring ~100% availability and geographic diversity are a different kettle of fish.”

        And on AWS can be expensive if you get the timings wrong. A client saw a big reduction in their AWS bill by simply maintaining an idle running instance for longer - for their usage style 15 minutes was found to be the sweet spot, prior to this practically every visitor caused an instance to be loaded, so got hit with the instance start up charges (and delay) …

        I think as observed in another recent ElReg article, AWS is very powerful, just that you need to be an expert to even do the basic stuff reasonably well. In some respect AWS has parallels to C, it’s very powerful, but there are no safeguards - caveat emptor.

    4. FIA Silver badge

      Re: $1-2K per month?

      That sort of money would buy you 1-2 beefy computers (or 4-5 mediocre ones) which you get to keep forever, with money still left over for your internet connection.

      I would love to know the workload that was costing that a month too.

      I rent a server in a colo, it costs me 35 quid a month. It's not too flash, but as a build server it would work well. I could build a nice build farm for a grand a month.

    5. Donn Bly

      Re: $1-2K per month?

      Tell me again why "cloud" is always the go-to?

      Because cloud services are expenses that are paid with pre-tax money and written off immediately, which lowers your annual taxes. On-premise equipment is a capital expense for an asset that you are probably borrowing money to buy, and is still on the books and incurring property taxes well after it has outlived its useful life -- sometimes even before the loan is paid. Choosing Cloud vs On-Premise is often more of a balancing act of the company's books than it is a technical decision.

  7. Anonymous Coward

    Tell me again why "cloud" is always the go-to?

    Again?

    Really?

    Right:

    Because some idiot beancounters spun it to the higher-ups / board and the imbeciles bought it before looking it up in the dictionary.

    Cloud - / klaʊd / - noun

    A visible collection of particles of water or ice suspended in the air, usually at an elevation above the earth's surface.

    Synonyms: vapour

    .

    1. Anonymous Anti-ANC South African Coward Silver badge

      vapourware trying its darnedest to milk customers...

    2. FIA Silver badge

      Tell me again why "cloud" is always the go-to?

      Because it offers computing as a utility.

      If that's not your use case, or not a requirement you have, then it's probably the wrong solution.

      Oh, sorry... you said 'Always'... that's because people be stupid YO! :)

  8. goblinski Bronze badge
    Headmaster

    As someone who has forgotten to check or uncheck options countless times, I am thrilled to see a full bag of comments blaming everything from hemorrhoids to hurricanes, but not blaming forgetting to uncheck an option, which should be part of...errr...a CHECKlist :)

    The world is still a nice place to be.

    1. Jou (Mxyzptlk) Silver badge

      Well, if you go down the "blame the admin" path, you should have said "why not automate this so those checkbox patterns are always right?". Checklist is something for pilots on a plane.

      1. goblinski Bronze badge

        ...Checklist is something for pilots on a plane...

        I wouldn't like you to be my sysadmin, surgeon, car mechanic, trip organizer...fortunately there's always pub-mate, so we're good.

        1. Jou (Mxyzptlk) Silver badge

          Ah, I forgot the irony tags again? I still haven't learned that all the worst possible jokes got real the last ten years...

          1. Someone Else Silver badge
            Trollface

            You sure it's irony, and not sarcasm?

            1. Jou (Mxyzptlk) Silver badge

              Irony or sarcasm? Yes.

  9. 8BitGuru

    "... but the entity that sponsored the project's cloud bills was deeply unhappy and moved it to another cloud."

    Wait, what? That was the wrong response for so many reasons:

    1. This wasn't Amazon's fault. And whilst you might quibble about the amount they wrote off, they did nevertheless recognise the obvious customer error and made a gesture of goodwill.

    2. Per item 1, this was entirely Chase's (or his organisation's) fault. They told the computer what to do, and it did it. Switching provider won't change that, nor guarantee to avoid repetition.

    3. The new provider is almost certainly an unknown quantity, with unfamiliar technology and conventions, and no established relationship with Chase.

    4. Implementing usage budgets and warnings is the customer's responsibility, and the choice of provider is irrelevant.

    I would have requested mandatory budgets and warnings with the current provider, rather than spitting my dummy and jumping into an entirely new frog-boiling pot.

    1. l8gravely

      Since Chase had a consistent monthly spend of $1-2k, he should have just bought a couple of beefy servers and put them into a colo. Then suddenly you have saved a bunch of money over time. Now if this $40k is a large part of the sponsor's monthly cost, then sure, maybe they want to make a statement to AWS about their shitty controls. And moving that spend to somewhere else might be a good idea.

      But honestly, running your own infrastructure, when even a basic 2 CPU and 8GB of RAM VM can run you $120/month, it quickly makes sense for consistent loads to move onto your own hardware in a lot of cases. But hey, the PHBs all believe that purple money (opex) is free, and that capex (red money) costs way too much, so they go with the more expensive option. Perverse incentives.

    2. FIA Silver badge

      Wait, what? That was the wrong response for so many reasons:

      You forgot option number 5.

      5. It's the 2020s and people don't take personal responsibility anymore, it's all someone else's fault.

      1. Jou (Mxyzptlk) Silver badge

        6. People take personal offense when asking about the responsibility.

    3. goblinski Bronze badge

      "... but the entity that sponsored the project's cloud bills was deeply unhappy and moved it to another cloud..."

      It is not stated whether said other clowd had anything to do with Chase and his organization anymore, is it.

  10. Jou (Mxyzptlk) Silver badge

    I ask the other way around...

    Why is the default made in such a way as to favour such accidents in the first place? It should have been the "safe cheap" option by default, not the "missed one check mark you always have to set to avoid high cost" way. Commonly known as a "dark pattern". All larger cloud providers I know work that way, always leaving it up to the admins to not forget to distrust and check again. And again. And on the third check somewhere ten layers in the config menu, at a place no one expects, there is the box you have to check.

    1. FIA Silver badge

      Re: I ask the other way around...

      You want 'automatically delete the data' as the default? ;)

      1. Jou (Mxyzptlk) Silver badge

        Re: I ask the other way around...

        Trust me, I know what I'm doing.

        1. Claptrap314 Silver badge

          Re: I ask the other way around...

          https://www.youtube.com/watch?v=rp8hvyjZWHs

          1. collinsl Silver badge
            Joke

            Re: I ask the other way around...

            I think I'll put this thing right here...

            https://www.youtube.com/watch?v=rp8hvyjZWHs

            FTFY

    2. doublelayer Silver badge

      Re: I ask the other way around...

      Because the alternative is that when the system shuts down, it gets automatically deleted. There was a person who deleted some old VMs because they were probably unused. Take a look at what the comments said about that. Do you think the dev or Amazon would have been let off the hook if the problem was that important systems got wiped because the opposite was the default?

      The problem is that no matter what that setting defaults to, something can go wrong. In neither case is it the fault of the cloud provider. The same thing could happen with owned servers, although it would be less a cost overrun and more a problem when all the disks filled up with unneeded images and new ones couldn't start. Depending on how expensive downtime is, that could even be worse. Oh, but there'd be warnings if the disks filled up, just as there could have been but evidently weren't alarms on cost usage. Computers have lots of options and unfortunately, some of them can have important effects and need to be treated with care.

    3. phuzz Silver badge

      Re: I ask the other way around...

      Why is the default made in such a way as to favour such accidents in the first place?

      Because this way Amazon get more money.

  11. GenuineArticle

    When you don't return from lunch to log out

    Years ago the guy in the cube next to mine spun up an AWS instance for an experiment and went to lunch on his motorcycle. He proceeded to experimentally prove that he could not punch a hole in a lorry with his (thankfully) helmeted head. ("Man, I gave it my best shot though" -him). 2 months later he hobbled back to the office to find that instance still happily chugging along and a $50,000 bill.

    1. FirstTangoInParis Silver badge

      Re: When you don't return from lunch to log out

      Ouch (in all senses of the word)! Surely you can set a “stop after N hours” checkbox? But no, guessing there isn’t because that would stop Jeff getting another yacht.

      1. doublelayer Silver badge

        Re: When you don't return from lunch to log out

        You can easily do that with a script, but if you want to do it with a GUI, you can. It's more complicated than it needs to be because AWS, but you can do it. If you expect to need to, which this person probably didn't.
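
The "stop after N hours" script mentioned here might look something like the sketch below. It only covers the decision logic: `Instance` and `overdue` are hypothetical names, and the 24-hour limit is an assumption. In a real script the launch times would come from the provider's API (with boto3, for example, `describe_instances` reports a `LaunchTime` per instance) and the returned IDs would be handed to a stop call such as `stop_instances(InstanceIds=...)`, scheduled to run periodically.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    instance_id: str
    launched_at: datetime  # timezone-aware launch time


def overdue(instances, max_hours: float, now: Optional[datetime] = None):
    """Return the IDs of instances that have run longer than max_hours."""
    now = now or datetime.now(timezone.utc)
    limit = timedelta(hours=max_hours)
    return [i.instance_id for i in instances if now - i.launched_at > limit]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    fleet = [
        Instance("i-experiment", now - timedelta(days=60)),  # forgotten over lunch
        Instance("i-build", now - timedelta(hours=1)),
    ]
    print(overdue(fleet, max_hours=24, now=now))
```

As the comment says, this only helps if it is set up before the instance is forgotten.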

  12. Anonymous Coward

    Cancelled

    This is exactly the reason I cancelled both my Azure and AWS subscriptions. I once read about this developer who ran a workload on AWS during the weekend and because of a programming error racked up a $60,000 bill. Luckily for him Amazon forgave him the bill.

    As a small ISV I can't afford to suddenly receive a $100,000 bill because that would send me into dire straits. I'm only using my own VPS servers and running whatever I need on them and still only paying $50 a month or so.

    No cloud for me.

  13. Tron Silver badge

    You pay for convenience.

    If you have unlimited cash, pay for that convenience.

    If you do not, do stuff yourself, on your own silicon with your own storage. Because the tech industry only cares about one thing - acquiring as much of your cash as it can. Outsourcing to GAFA is a really bad idea.

  14. gosand

    One sure-fire way to make your cloud costs soar...

    The one sure-fire way to make your cloud costs soar is to use it.

    About 10 years ago a small company I was at got a new CEO (1st red flag) who was convinced we needed to get to the cloud. We were a successful pure MS shop that produced an on-prem solution for hospitals to do cost accounting. The new CEO decided our new product would be completely in AWS, and hired a team to do it. The existing team would still support the old product. (2nd red flag)

    The new management team was in San Francisco (expensive) and the development team was in Ukraine (cheap). Tech stack and architecture was AWS/Java/Scala/Linux/etc. The proof-of-concept of the new platform went well. Then the offshore developers and management team were let go, and the existing team had to take it over. (3rd red flag) Oh, and we had to migrate the existing product to AWS. (4th red flag)

    Months were spent in training, learning the new tech stack, etc. The existing platform (Windows/SQL-Server) was lift-and-shifted to AWS since that was the only viable option to meet the imposed deadline.

    When the CEO started seeing the AWS bills he was livid. "I thought you could spin up servers when needed, and spin them down when they weren't being used?!" "Well, yes - if the platform is architected to do that."

    Of course, it didn't matter much. We had made it to the cloud. And for what? So the CEO could sell off the company to our biggest competitor.

    (I had figured that part out before it happened, and had already moved on)

  15. Apocalypso - a cheery end to the world Bronze badge
    Joke

    Latency

    > "They wanted to know what changed because our cloud charges for the last two months were $40,000."

    AWS storage latency: 2 milliseconds for the data; 2 milli-centuries for the bill

  16. jonsg
    Facepalm

    Change to another cloud? WHY?

    The problem here was not setting budget alarms.

    Billing and Cost Management -> Budgets and Planning -> Budgets. Create a Budget, and then add Alerts. You can attach Actions to an Alert, for instance to temporarily deny the automation that's launching new instances the IAM permission to do so.
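
For anyone who would rather script the budget than click through the console, the request could be assembled roughly as below. This is a sketch under stated assumptions: the keys follow the shape of the AWS Budgets `CreateBudget` request as exposed by boto3's `create_budget`, but should be checked against current documentation; the helper name, budget name, account ID, email address, and the 80% threshold are all hypothetical. It sets up an email alert only and does not attach any Action.

```python
def budget_request(account_id: str, monthly_limit_usd: float,
                   alert_email: str, threshold_pct: float = 80.0) -> dict:
    """Build keyword arguments for a monthly cost budget with one email alert."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-cost-cap",  # hypothetical name
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",  # alert on actual, not forecast, spend
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": alert_email},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and address; with boto3 installed and credentials
    # configured, the dict could be passed on as:
    #   boto3.client("budgets").create_budget(**request)
    request = budget_request("123456789012", 3000, "ops@example.com")
    print(request["Budget"]["BudgetLimit"])
```

The $3,000 limit borrows the figure suggested elsewhere in the thread for a project that normally spends $1-2K a month.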

    Changing to a different cloud provider only gives a little satisfaction out of spite, but it means a whole lot of work refactoring all the workflows and creating new images on the new provider. Total waste of time. I hope the benefactor paid for all that lift-and-shift work.

  17. jonfr400

    The cloud is a money scam

    The cloud is nothing but a money scam. It can be seen in how and for what they charge.

  18. OllieJones

    Remember when the hard drive on the build machine would fill up?

    Useta be a mistake like that would fill up the hard drive on the build machine, and somebody would troubleshoot it and fix it. Rather than spend enough money to buy a few thousand more hard drives.
