> Amazon did the right thing and forgave 40 percent
The Right Thing would be a bit of AI (or even BASIC) which would look at past use and predict new charges. "This change will increase your bill by about $38,590 a month, is that what you want?"
Welcome to another week of work, a moment The Register celebrates with a new installment of Who, Me? It's the reader-contributed column in which you 'fess up to follies, false moves, and faux pas – and explain how you escaped. This week, meet a reader we'll Regomize as "Chase" who develops free open source software for …
If the machine image is using a template then a confirmation email listing the changes along with highlighting anything which could increase the costs when saving it would be nice to have. Oh, if only there were some sort of intelligent software you could use to 'innovate' mundane stuff like this...
There is this: https://www.theregister.com/2020/09/28/aws_cost_anomaly_detection/ from 2020.
From memory, when investigating a client's AWS spend, the tool wasn't particularly easy to set up to give meaningful/useful data; although that probably wasn't helped by not having much history to enable the creation of a baseline.
On AWS these aren't particularly helpful, other than perhaps to remind you to look at the console ASAP, as costs will continue to be incurred over and above your budget cap.
Not aware that anything has changed on AWS to actually give customers hard, must-not-be-exceeded budget caps since the last time an unexpected budget overspend on a free tier account happened a few years back.
True, and if AWS could add more options including a method of automatically stopping some things when a budget was hit, that would be better. You can script that, but you would have to plan what you wanted to do in an unexpected funding situation ahead of time. However, if all they have is alarms, then you should at least set the alarms.
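The "you can script that" part can be sketched roughly like this. A hedged example, not official AWS tooling: the `auto-stop` tag, the function names, and the idea of triggering it from a budget alert (say, via an SNS topic feeding a Lambda) are all my assumptions. The `boto3` calls themselves (`describe_instances`, `stop_instances`) are the real API.

```python
def stoppable_instance_ids(reservations, tag_key="auto-stop", tag_value="true"):
    """Pick running instances carrying the opt-in tag out of a
    describe_instances-shaped response. Pure function, so it can be
    tested without touching AWS."""
    ids = []
    for res in reservations:
        for inst in res.get("Instances", []):
            if inst.get("State", {}).get("Name") != "running":
                continue
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if tags.get(tag_key) == tag_value:
                ids.append(inst["InstanceId"])
    return ids


def stop_tagged_instances():
    """Call this from whatever your budget alert triggers.
    boto3 is imported lazily so the helper above works offline."""
    import boto3
    ec2 = boto3.client("ec2")
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    ids = stoppable_instance_ids(resp["Reservations"])
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids
```

The opt-in tag is the "plan ahead" part the comment mentions: you decide in advance which workloads are expendable in a funding emergency, rather than blindly stopping everything.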
Sadly AWS are never going to introduce a hard cap when it's not in their interests to do so - unintended overspend like this is how they make profit and they're not going to turn that off unless they absolutely have to. Note that they forgave 40% of the cost, meaning they still had 60% of the unintended revenue paid to them for basically doing no work. Yes some persistent disk space was consumed but I'm sure with their advanced systems they were quick to archive this off somewhere to cheap storage so most of that 60% was pure profit.
Apart from all the "check your own work" stuff, surely such a large increase in usage in such a short time could indicate a compromise and trigger some sort of email to query it?
Or, silly question I guess, is AWS that profit focused they'd rather ignore it and hope the invoice gets paid?
I assume that's a rhetorical question? $40k is less than an hour's running costs for Bezos's yacht fleet. It's a simple automated process for AWS to send any past-due invoices to collections, or just sell the debt. It's something that surprised me when talking to people who have their heads in the clouds, willingly or unwillingly: there didn't seem to be any easy way to set a max budget. But then that feature wouldn't align with Amazon's business model.
I can see the headline now, "AWS stopped my business scaling on our busiest day of the year because we set our max budget incorrectly".
There is nothing about this story that couldn't have been addressed by a VERY, VERY simple cost alert, which takes about 30 seconds to set up, regardless of which provider you use. Set one up at $3k and this problem goes away.
How much does it cost for:
- Somewhere to put them (rent etc.)
- Hosting (electricity, cooling, ...)
- Backups
- Admin (lots of updates in even quite a simple tech stack these days)
for those "one or two computers"?
(This is open source using AWS, so reasonable to assume they don't have an office.)
That work would be covered as a base workload for an IT manager - which you must still have anyway to manage the cloud providers, as well as the remainder of your onsite equipment.
As with AI, clouds essentially consist of being sold a crock about how you can sack all the technical staff, and then rehiring them once cloud bills have soared and you've relearned why you needed your own staff.
I don't see temporarily spun up installs [...] requiring a lot of backing up and updates.
Updates to tools happen a lot. Typically weekly. These need to be incorporated into the images you use to build temporary VMs.
Backups will depend on how much you end up on the pet vs. farm animal scale.
Your reference VM (apply updates, build new image, repeat for ever) you would likely want to keep around, and thus backup.
Without knowing more about the situation, this sounds like a service that could potentially get a lot of spikes in activity. In those situations it might not be worth the investment to get servers to catch those spikes, while doing basically nothing most of the time.
Another consideration could be location, it may be beneficial that you're able to spin up an instance basically anywhere you want.
And don't forget power costs. I recently got two servers running Proxmox (both dual-socket Xeon E5-2680 v3, not too old, somewhat beefy I guess). Their average CPU usage is under 1% (one is running light workloads, the other is pretty spiky, spinning up and shutting down Windows VMs), and it's still adding almost €100 per month to the electricity bill!
€0,25 per kWh, which amounts to approximately 600W constant usage to get to €100 a month. It's not quite there; my guess is that it's using 400-500W. With CPUs like that, idle usage of 50-100W or so isn't unheard of, and it has quite a lot of spinning disks too. Add a bunch of RAM sticks and fans, and it's a pretty reasonable power consumption for a server like this.
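The arithmetic behind that estimate is easy to check. A quick sketch, taking 730 hours as an average month and the €0,25/kWh price above:

```python
def monthly_power_cost(avg_watts, eur_per_kwh=0.25, hours_per_month=730):
    """Convert an average draw in watts to kWh over a month, then to euros."""
    kwh = avg_watts / 1000 * hours_per_month
    return kwh * eur_per_kwh

# A constant 600W at €0.25/kWh comes to roughly €110/month;
# a steady ~550W lands almost exactly on €100.
```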
it might not be worth the investment to get servers to catch those spikes, while doing basically nothing most of the time.
Maybe you could sell the otherwise-unused CPU cycles of your servers to other people. You could be like an old-time, small dial-up ISP, selling limited "cloud" services.
As to your monthly power bill: did you think AWS, Azure, or GCS would not include (somehow) those 'leccy costs in your bill?
"Maybe you could sell the otherwise-unused CPU cycles of your servers to other people."
Sounds great. How much are you willing to pay me for the residual cycles on my computer? It's run by some random nobody you don't know, hopefully I have some security in place for both our sakes, you get preempted whenever I get spikes, and it can't scale above two servers at absolute maximum. I'm expecting a competitive price.
"As to your monthly power bill: did you think AWS, Azure, or GCS would not include (somehow) those 'leccy costs in your bill?"
Of course they do, while I'm using the resources I provisioned. When I am not using them, then that part is paid by the people who are using them instead. The question is whether their markup on the times when I am using it is greater or less than my wasted spend on self-hosted hardware when I'm not.
No, I didn't, and you know that. Unless I used spot instances, I don't get preempted for other users because I'm paying. It does scale very quickly unless you need tons of expensive capacity right now. And although we could argue on the quality of their security, they do employ a bunch of security people and have more secure defaults. None of those would apply to any attempt to rent out unplanned unused capacity on a two-server setup.
To be fair, if they were willing to leave both computers on you could probably scale fairly quickly.
And in this case 'expensive capacity' is a third computer. There's always going to be a lead time on that.
Plus, if you pay a bit extra I bet you could get exclusive access to the keyboard and mouse... no pre-emption!
No. I suffer from incurable congenital nephophobia.
For research and development purposes the loads are typically well defined (usually 100% of whatever is on offer if students are involved), so owning or leasing the hardware will invariably be cheaper and provides a hard, physical "stop loss."
A few years ago I was an onlooker in the purchase of a very large memory (>1 TB), multi-socket server costing ~$400,000, required for scientific computing, which the chief PHB questioned: why wasn't the workload being moved into the cloud?
So the prospective purchasers toddled off to the cloud providers for quotes—the lowest was ~$1.00 million per year.
These boxes could be realistically run for up to eight years in that environment with loss of vendor maintenance and support being the main reason for decommissioning.
Production environments with highly volatile loads, requiring ~100% availability and geographic diversity are a different kettle of fish. Cloud hosting probably is a decent fit for at least part of the solution.
>” Production environments with highly volatile loads, requiring ~100% availability and geographic diversity are a different kettle of fish.”
And on AWS can be expensive if you get the timings wrong. A client saw a big reduction in their AWS bill by simply maintaining an idle running instance for longer - for their usage style 15 minutes was found to be the sweet spot, prior to this practically every visitor caused an instance to be loaded, so got hit with the instance start up charges (and delay) …
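That sweet-spot hunt can be modelled with a toy cost function. All the prices and timings below are invented placeholders, not the client's real numbers; the point is just that when per-start charges dominate, a longer idle timeout wins, and when they're cheap, shutting down fast wins:

```python
def hosting_cost(arrivals, idle_timeout, run_cost_per_s, start_cost):
    """Cost of serving a sorted sequence of request timestamps (seconds)
    when an instance is kept warm for `idle_timeout` seconds after each
    request, then shut down. Prices are illustrative placeholders."""
    if not arrivals:
        return 0.0
    starts = 1
    running = 0.0
    for prev, cur in zip(arrivals, arrivals[1:]):
        gap = cur - prev
        if gap <= idle_timeout:
            running += gap           # stayed warm across the whole gap
        else:
            running += idle_timeout  # idled out, shut down...
            starts += 1              # ...then cold-started for the next hit
    running += idle_timeout          # final idle period before shutdown
    return starts * start_cost + running * run_cost_per_s
```

With a chunky cold-start charge (say 2.0 per start vs 0.001 per running second), requests spaced a few minutes apart are cheaper served by one warm instance with a 15-minute timeout than by three cold starts with a 1-minute timeout — which matches the client's experience above.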
I think as observed in another recent ElReg article, AWS is very powerful, just that you need to be an expert to even do the basic stuff reasonably well. In some respect AWS has parallels to C, it’s very powerful, but there are no safeguards - caveat emptor.
That sort of money would buy you 1-2 beefy computers (or 4-5 mediocre ones) which you get to keep forever, with money still left over for your internet connection.
I would love to know the workload that was costing that a month too.
I rent a server in a colo; it costs me 35 quid a month. It's not too flash, but as a build server it would work well. I could build a nice build farm for a grand a month.
Tell me again why "cloud" is always the go-to?
Because cloud services are expenses that are paid with pre-tax money and written off immediately, which lowers your annual taxes. On-premise equipment is a capital expense for an asset that you are probably borrowing money to buy, and is still on the books and incurring property taxes well after it has outlived its useful life -- sometimes even before the loan is paid. Choosing Cloud vs On-Premise is often more of a balancing act of the company's books than it is a technical decision.
Tell me again why "cloud" is always the go-to?
Again?
Really?
Right:
Because some idiot beancounters spun it to the higher-ups / board and the imbeciles bought it before looking it up in the dictionary.
Cloud - / klaʊd / - noun
A visible collection of particles of water or ice suspended in the air, usually at an elevation above the earth's surface.
Synonyms: vapour
As someone who has forgotten to check or uncheck options countless times, I am thrilled to see a full bag of comments blaming everything from hemorrhoids to hurricanes, but not blaming forgetting to uncheck an option, which should be part of...errr...a CHECKlist :)
The world is still a nice place to be.
"... but the entity that sponsored the project's cloud bills was deeply unhappy and moved it to another cloud."
Wait, what? That was the wrong response for so many reasons:
1. This wasn't Amazon's fault. And whilst you might quibble about the amount they wrote off, they did nevertheless recognise the obvious customer error and made a gesture of goodwill.
2. Per item 1, this was entirely Chase's (or his organisation's) fault. They told the computer what to do, and it did it. Switching provider won't change that, nor guarantee to avoid repetition.
3. The new provider is almost certainly an unknown quantity, with unfamiliar technology and conventions, and no established relationship with Chase.
4. Implementing usage budgets and warnings is the customer's responsibility, and the choice of provider is irrelevant.
I would have requested mandatory budgets and warnings with the current provider, rather than spitting my dummy and jumping into an entirely new frog-boiling pot.
Since Chase had a consistent monthly spend of $1-2k, he should have just bought a couple of beefy servers and put them into a colo. Then suddenly you have saved a bunch of money over time. Now if this $40k is a large part of the sponsor's monthly cost, then sure, maybe they want to make a statement to AWS about their shoddy controls, and moving that spend somewhere else might be a good idea.
But honestly, when even a basic 2-CPU, 8 GB RAM VM can run you $120/month, it quickly makes sense in a lot of cases to move consistent loads onto your own hardware. But hey, the PHBs all believe that purple money (opex) is free and that capex (red money) costs way too much, so they go with the more expensive option. Perverse incentives.
Why is the default set up to prefer such accidents in the first place? It should have been the "safe, cheap" option by default, not the "miss one check mark you always have to set and incur high costs" way. Commonly known as a "dark pattern". All the larger cloud providers I know work that way, always leaving it up to the admins not to forget to distrust and check again. And again. And on the third check, somewhere ten layers deep in the config menu, at a place no one expects, there is the box you have to check.
Because the alternative is that when the system shuts down, it gets automatically deleted. There was a person who deleted some old VMs because they were probably unused. Take a look at what the comments said about that. Do you think the dev or Amazon would have been let off the hook if the problem was that important systems got wiped because the opposite was the default?
The problem is that no matter what that setting defaults to, something can go wrong. In neither case is it the fault of the cloud provider. The same thing could happen with owned servers, although it would be less a cost overrun and more a problem when all the disks filled up with unneeded images and new ones couldn't start. Depending on how expensive downtime is, that could even be worse. Oh, but there'd be warnings if the disks filled up, just as there could have been but evidently weren't alarms on cost usage. Computers have lots of options and unfortunately, some of them can have important effects and need to be treated with care.
Years ago the guy in the cube next to mine spun up an AWS instance for an experiment and went to lunch on his motorcycle. He proceeded to experimentally prove that he could not punch a hole in a lorry with his (thankfully) helmeted head. ("Man, I gave it my best shot though" -him). 2 months later he hobbled back to the office to find that instance still happily chugging along and a $50,000 bill.
You can easily do that with a script, but if you want to do it with a GUI, you can. It's more complicated than it needs to be because AWS, but you can do it. If you expect to need to, which this person probably didn't.
This is exactly the reason I cancelled both my Azure and AWS subscriptions. I once read about a developer who ran a workload on AWS over the weekend and, because of a programming error, racked up a $60,000 bill. Luckily for him, Amazon forgave the bill.
As a small ISV I can't afford to suddenly receive a $100,000 bill, because that would send me into dire straits. I'm only using my own VPS servers, running whatever I need on them, and still paying only $50 a month or so.
No cloud for me.
If you have unlimited cash, pay for that convenience.
If you do not, do stuff yourself, on your own silicon with your own storage. Because the tech industry only cares about one thing - acquiring as much of your cash as it can. Outsourcing to GAFA is a really bad idea.
The one sure-fire way to make your cloud costs soar is to use it.
About 10 years ago a small company I was at got a new CEO (1st red flag) who was convinced we needed to get to the cloud. We were a successful pure MS shop that produced an on-prem solution for hospitals to do cost accounting. The new CEO decided our new product would be completely in AWS, and hired a team to do it. The existing team would still support the old product. (2nd red flag)
The new management team was in San Francisco (expensive) and the development team was in Ukraine (cheap). The tech stack and architecture were AWS/Java/Scala/Linux/etc. The proof-of-concept of the new platform went well. Then the offshore developers and management team were let go, and the existing team had to take it over. (3rd red flag) Oh, and we had to migrate the existing product to AWS. (4th red flag)
Months were spent in training, learning the new tech stack, etc. The existing platform (Windows/SQL-Server) was lift-and-shifted to AWS since that was the only viable option to meet the imposed deadline.
When the CEO started seeing the AWS bills he was livid. "I thought you could spin up servers when needed, and spin them down when they weren't being used?!" "Well, yes - if the platform is architected to do that."
Of course, it didn't matter much. We had made it to the cloud. And for what? So the CEO could sell off the company to our biggest competitor.
(I had figured that part out before it happened, and had already moved on)
The problem here was not setting budget alarms.
Billing and Cost Management -> Budgets and Planning -> Budgets. Create a Budget, and then add Alerts. You can attach Actions to an Alert, for instance to temporarily deny the automation that's launching new instances the IAM permission to do so.
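In boto3 that console path is a short script. A hedged sketch: the account ID, budget name, threshold, and email are placeholders of my choosing, but the `budgets` client and the `create_budget` call shape are the real API.

```python
def budget_request(account_id, name, limit_usd, alert_pct, email):
    """Assemble the create_budget arguments: a monthly cost budget with an
    email alert when actual spend passes alert_pct percent of the limit."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": alert_pct,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }],
    }


def create_monthly_budget(**kwargs):
    import boto3  # lazy import so the builder above is testable offline
    boto3.client("budgets").create_budget(**budget_request(**kwargs))
```

Thirty seconds of this, with a $3k limit and an 80% threshold, and the $40k surprise becomes a $2.4k email instead.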
Changing to a different cloud provider only gives a little satisfaction out of spite, but it means a whole lot of work refactoring all the workflows and creating new images on the new provider. Total waste of time. I hope the benefactor paid for all that lift-and-shift work.