
Repeat after me: Use the right tool, not the fashionable tool, for the job.
Web software biz 37signals has started to migrate its data out of the cloud and onto on-prem storage – and expects to save a further $1.3 million (£980,000) a year after completing its high-profile cloud repatriation project and getting off AWS once and for all. We'll delete our entire AWS account and finally say goodbye to …
...and the right tool for the job when they were scaling up probably was S3. It's pay-as-you-go, and you don't need to go out and spend $1.5M up-front on 18PB of storage that you may or may not end up needing.
Once you're mature enough to evaluate what it really takes to run your business at scale, then the right tool may well be an on-prem solution. That's fine. It might be a hybrid solution, or still a cloud-based solution. That's fine too.
What pisses me off about DHH and all these 37signals articles is that they always gloss over the first part and focus solely on the second part. Would his business have been as successful and profitable in the first place without the flexible economics of the cloud allowing rapid growth without significant CapEx? Maybe? Maybe not...
I am a cloud architect, and I can explain better for people like you why management is interested in the cloud.
Let's say you run a server (be it a VM or just bare metal) in your data center. Let's say you can run it for $100 a month, taking into account the OS licensing, whatever other software you've got licensed on there, and electricity. That VM is probably going to be $200-300/mo. on the cloud or more (depending).
That seems crazy. Why would business leaders make such a devil's deal? It's the externalities, you see.
You have to pay to operate the data center itself. Either paying lease, or property taxes. You have to pay for cooling (i.e. water). You have to pay for data center staff to maintain the machines as they break. You have to pay for internet, via some dedicated fiber link, and of course different kinds of engineers to manage the network. Now the really expensive part... You have to amortize your hardware purchases as part of your tax reporting - a $10k server depreciates in value every year you operate it and you get to write that off in taxes as capital expenditures (cool right?), except it's reallllly expensive to pay a finance department to do that kind of work.
When you move to the cloud, you don't need as big of an accounting department, and your vendor relationships are fewer and simpler (after all, you can pay your cloud vendor for certain things and they pay their own vendors out of what you pay them). Your budget becomes operational expenditures, which are far simpler. You don't have to hire as many staff to maintain a cloud environment - for some small solutions, your dev team can also be your ops team, your QA team, your network team, your security team, etc. because the tooling is very rich and enables doing more with less. You don't pay for a fiber link, but you do pay for bandwidth.
Now - what I wrote above is still simplistic - there are more layers to it. But rest assured, people can and do run the numbers and realize that including all externalities, it's a net win on lower operational costs. This is highly dependent on your business model and operational practices though!
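To make that concrete, here's a deliberately simplistic back-of-the-envelope sketch in Python. Every overhead figure in it is an invented placeholder, not a benchmark; the only point is that the comparison has to be fully loaded on both sides.

# Toy fully-loaded cost comparison. All figures are invented placeholders.

def on_prem_monthly(vm_direct=100, facility_share=75, staff_share=120,
                    network_share=30, finance_share=20):
    # Direct cost of the VM plus a per-VM share of the externalities above
    # (facility, staff, connectivity, accounting/depreciation work).
    return vm_direct + facility_share + staff_share + network_share + finance_share

def cloud_monthly(vm_price=250, bandwidth=30, remaining_ops_share=20):
    # Cloud list price for a comparable VM, plus bandwidth and a smaller
    # share of the ops effort that doesn't go away.
    return vm_price + bandwidth + remaining_ops_share

print(f"on-prem, fully loaded: ${on_prem_monthly()}/mo")
print(f"cloud, fully loaded:   ${cloud_monthly()}/mo")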
tl;dr - the cloud is massively more complicated than "someone else's computer". Perhaps you're thinking of "server co-location". You've been corrected. You're welcome. Further reading:
https://cloud.google.com/learn/paas-vs-iaas-vs-saas
Leaving aside that you're vastly overestimating the workload that writing off on-prem equipment creates for the accounting department of any larger business, you didn't include the cost of the business risk that comes from putting all your eggs in an SLA with a cloud provider - an SLA that is mostly written in a way that absolves them of any real responsibility for downtime and the resulting financial losses.
Also, what about the business risk posed by the usually exponential annual price increases most cloud providers have been pushing through for years, which are only possible because of the difficulty and expense of migrating somewhere else?
Unless you're spending an absolute fortune, I think you'll find that most on-premises solutions have some weak link where at least the vast majority of your eggs are in one basket in terms of some kind of reliance on a supplier with a shitty SLA.
I've certainly only come across one data centre implementation in 20+ years where I could hand-on-heart say it was engineered to hit its required internal SLA in spite of the real-world performance of any one supplier. And that was incredibly expensive and complex, and it was still acknowledged that there was a chance some internal design edge case could trip things up in certain unexpected failure modes.
Many suppliers of hardware, software, and connectivity promise SLAs they can only actually deliver on if the wind's blowing in the right direction, and most of their penalties still wouldn't cover potential consequential losses. They take calculated business risks when it comes to offsetting the potential financial and reputational penalties versus the cost of delivery.
The only third party contract I've seen that truly would have covered all potential consequential losses for that particular customer had basically had the monthly cost inflated ten-fold to offset the risk it posed to the supplier, and the supplier was effectively paying an insurance company a fair chunk of that inflated cost to share the risk.
It's a fallacy that most on-prem SLAs are better than cloud SLAs both in terms of how likely they are to be met, and in terms of how much you'll actually be compensated if an SLA is breached.
It's also a fallacy that cloud providers are increasing costs "exponentially" - there may be some edge cases but the majority of costs are relatively stable. On the flipside Broadcom are gouging plenty of companies who have significant on-prem deployments, and electricity costs have gone through significant spikes in recent years. You're not at all insulated from cost pressure on-prem, it's just different cost pressures, from many different providers.
> most of their penalties still wouldn't cover potential consequential losses
which cloud suppliers have penalties that will cover consequential losses?
> vast majority of your eggs are in one basket in terms of some kind of reliance on a supplier with a shitty SLA.
Cloud suppliers also have quite a high level of failures. How many times has AWS US East gone down in the last few years? There are LOTS of smaller incidents like this: https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/
What about failures of individual servers? Ah, yes, you have to design and implement an architecture across multiple zones and regions to get something that is really reliable. Still mostly up to you.
"Cloud suppliers also have quite a high level of failures. How many times as AWS US East gone down in the last few years?"
The people's definition of "Technology" is: Stuff that doesn't quite work yet.
Expect things to go wrong. The real issue is that with an outside supplier, you get no insight into what went wrong or why. You get no clear view of how long it will take to get fixed. If being down is costing you loads of money, you can't even throw money at a solution to have the services back up as quickly as possible. You'd be lucky (or a very large customer) to be able to reach somebody to get any information (when they go down, their VOIP is down, their system status page is offline, etc). TL;DR: you have no control.
Yes, but you don't have to go "all in" on one region with one provider. That is on prem thinking applied to a cloud situation.
Cloud services are not an "online comms room"...the idea is that you distribute things for resilience and redundancy. If you throw all your infra on one provider in one region it's absolutely going to be at best, the same as your on prem setup...but that's not why you shift things to the cloud...you shift to the cloud for redundancy and resilience...so you don't have to rely on a monolith in one location...that additional layer of protection is worth the premium in my opinion...not only do you get better resilience (if you know what you're doing) you can take advantage of different pricing in different regions...for example, keep your front end stuff in regions close to your customers, but put your backend stuff (which is usually the pricier stuff) in cheaper regions like Canada, Stockholm, Seoul etc etc...yes, there is latency between regions, but there are technical solutions for that, like query caching etc which are dirt cheap to setup...
The divide between on prem and cloud is not a philosophical one, it's a skills-based one...you can absolutely save money and build a more resilient system on the cloud than you can on prem, but it requires a different set of skills. If you're an on prem guy, it's highly likely that you've never had to worry about query caching technology because your database is on the same network as your frontend etc etc, and the only resilience you've ever had to worry about is a UPS and a secondary internet connection.
I don't think either can objectively be called better than the other, because it's all in the execution and planning (or lack thereof)...it's all subjective.
For example, a small business with less than 50 employees at a single site is always going to benefit from the added resilience the cloud can bring, but a larger company with multiple sites might be able to pull off similar resilience to the cloud and not need it...it's all subjective.
What I do know though, is that for a lot of techies their hatred of the cloud comes from a place of fear...because shifting to the cloud takes their system away from being an esoteric monolith locked in a cupboard that only they can access to a platform that can be managed from anywhere by anyone...competition is scary for a lot of people.
Going from being the only techie in a 20 mile radius that can look after an on prem setup in the East Midlands, to being one of millions of techies on the whole planet that can manage the infrastructure from anywhere is a very scary prospect if you're a shitty, average sysadmin....it brings the wolves to your door...that is why a lot of people have a hardline opinion on the cloud and resist it with every ounce of their being.
> ...look after an on prem setup in the East Midlands
oof, close to home.
> Yes, but you don't have to go "all in" on one region with one provider.
There are legal matters concerning putting your data across international boundaries - our data remains within the country for exactly this reason.
If skills on one provider are over-represented because most of your stuff is already there, it becomes a self-fulfilling cycle: services stay on that platform because knowledge of the systems already in use is what gets prioritised. From limited experience, I'd guess this is the normal mode of operation for your average company, especially outside of the computing technology sphere.
> Cloud services are not an "online comms room"
Try telling that to the boss when he's seen stories about resilience and reduced costs & doesn't think he'll need to train his team, or can't get the sign-off to, or, better still, gets a contractor in to facilitate the move who buggers off as soon as the last server is powered down - leaving the environment minimally supported by a team that's already supporting a business at full stretch & has been given only the barest information. It's unlikely the management will feel the heat when something breaks.
The tool may be right, but the implementation just sawed halfway through the branch the business is sitting on.
> if you're a shitty, average sysadmin
Someone, SOMEONE! Get this man a bigger hat!
The context is clear. They were not saying that cloud is better because it actually provides that. They said that avoiding cloud doesn't provide it either, thus that the decision of which one to use shouldn't rely on incorrect assumptions that one will compensate you if it breaks and the other won't.
One other thing to consider:
Not that it has happened yet (or has it?) but what position is your organization in the queue when a cloud provider has a major prolonged outage incident?
With on premise you have some control over what you bring back up first, if you have built it reasonably well and have decent testing procedures.
Who forces you to use just one cloud provider? Unless the business is very small, it's rare for me to see businesses relying on just one provider. The only "cloud" service that essentially forces you to use a single provider is Office 365...but that's not really a big deal...Microsoft are generally pretty swift at compensating if they cause actual damage to a business...I've seen it a couple of times. A couple of businesses I've supported in the past reported record weeks when Microsoft paid out even though no business was done.
With a DC setup, all your eggs are very much in a single basket...it's much more expensive to span several DCs than it is to span several Cloud providers.
Also, if Office 365 goes down, the company gets to sue Microsoft. If your shitty AD + Exchange setup goes down the company gets to sue you...Microsoft can withstand large payouts without even a blink...you could lose your house.
> If your shitty AD + Exchange setup goes down the company gets to sue you...
If a company is suing its own employees something has gone very, very wrong. I'd call this an edge case.
> Who forces you to use just one cloud provider?
I suspect it is quite common.
- Bean counters are always looking to justify cost; go to the cheapest, reliable, big name supplier (outages also cost money).
- Skills within the organisation may only exist around one ecosystem; services will gravitate towards that platform.
- Management see having multiple suppliers as messy; services are migrated into a simpler to manage infrastructure.
Current operating models within business are not built for wide adoption (maximising shareholder returns is a major driver here). Sometimes justifying redundancy is a chore, as non-tech finance & management need to be repeatedly told, or sold, the need for a fallback which may just sit there, ticking over, consuming resources - financial or otherwise.
You like to use the word 'shitty' when referring to not-the-cloud. The technology is the tool, wherever it is hosted and however it is used; it is just a tool, and as versatile as a screwdriver is, it struggles with nails - use the tools appropriately.
Yeah this sounds about right.
I think the vast majority of people that compare DC costs to cloud costs are comparing, say, a single rack (or less) or a COLO situation. When you have a full row, it's a lot less clear cut in terms of price. In the absence of any issues or maintenance...a DC can be cheaper in certain circumstances...but when you've got infrastructure that spans half a dozen or more racks...it tends not to be cheaper.
I moved a client of mine to the cloud about a decade ago...they had 8 racks in a DC up in Milton Keynes as well as a smaller "satellite" setup at their offices in 3 racks. Growing to span 8 racks is a bit like growing coriander...it's rare to just be immediately massive in a DC, you typically start out with one rack or something, and over time as you scale your infra, you increase the number of racks, but after a certain point, one of your older racks might need a complete refresh because by the time you've scaled to 8 racks or more, you've got kit in the original rack that might be a decade old, the next rack might have kit in it that is 9 years old etc....so past a certain point, every year, you're buying new kit to "refresh" a rack to keep things reliable. So each year, you could be spending £50k-£100k on new kit for a rack as well as all the ongoing running costs. In the cloud, you just don't need to do this, you just change the instance type on your instances to upgrade to the newest kit (which usually keeps your costs down, because the older an instance type gets, the more expensive it tends to be relative to what you get).
Anyway, the client I moved to the cloud...I saved them £1.2m in the first year...each subsequent year was about £500k-£750k...I massively simplified the infrastructure as well...instead of needing 8 racks, I managed to drop them down to around a dozen app servers and a handful of backend service boxes, all load balanced through 3 proxies with 3 backend databases...costs about £120k a year. Previously, the DC was costing them well over £800k...plus all the maintenance and hardware refreshes - I don't know the exact costs there...but they were easily spending £500k a year.
The main knock on effect, which I suspect is the knock on effect that most people experience, is that with simplicity, you need fewer people. So they were also able to massively downscale their staff. They obviously kept most of the old hands and people with a large amount of experience working with and developing the platform...the highly skilled technical folks, but they no longer needed a large support team bumbling about swapping disks, deploying new kit etc etc...due to the complexity dropping significantly, they also needed fewer developers because it became much less difficult to update things and the codebase didn't need to be as complex.
I walked in on day 1 as a contractor and they had 20 developers, 10 engineers, and a team of 6 support staff for running about doing onsite maintenance.
About a year later, they had 3 devs, 3 engineers and me. I had replaced the CTO.
I've seen cloud audits where 20% of booked resources (not including accounts of former employees) turned out to be forgotten, and quite an amount of "just in case" on top. There is no difference, chaotic companies just have the same chaos in the cloud. For a higher price.
I totally agree.
However, the vast majority of "on prem" evangelists are this guy:
https://www.youtube.com/watch?v=02a723LsoFA
You all know the type, quite a few of you are this guy...and it's this guy that makes on-prem more expensive and shittier than it needs to be which allows consultants to come in and sweep everything to the Cloud.
The problem isn't that cloud is better than on prem or vice versa (I don't think that's even a debate, it's a case of the right tool for the job), the problem is that either solution is executed badly and leaves room for a switch to one or the other. The vast majority of businesses (50 users or less) have a twat like the Bruiser IT guy knocking about who bases his decisions on opinion rather than fact and you end up with a shitty system. He may technically be right, but he hasn't a clue how to properly execute the opinions he read in a forum.
The trouble with building solutions around opinions is that everyone has an opinion, you're eventually going to have someone rock up with a counter opinion and you've got a battle on your hands. If you build your solutions around objective facts, you're not going to be constantly under threat of switching to one or the other...because you have objective facts to back up your case not just opinions...you can always beat an opinion with facts, but you can rarely beat facts with an opinion.
I bet, and I've done this, that if you go around a typical "on prem" evangelist's network and you note down all the model numbers for everything and the year they were released, you'll find all of their kit on some kind of top 10 list for the given year it was released; it'll always be number 2 (which is usually the "good value" option, because the top slot is usually "the best" but also "most expensive"). The kit probably wasn't bought because it was objectively the best choice to solve a problem...it was bought because it was on a CNET top 10 list, and therefore probably cost two or three times more than the business needed to spend, because the appropriate, boring solution - the objective choice - wasn't in the top 10 list.
I see it all the time especially with things like network switches. How many times have any of you been into a small business to do them a quick favour and found a rack with 3 fucking 48 port Cisco switches in it (with full Smartnet cover), one of them POE (for the wifi) for a 30 user network...that is egregious...one 48 port managed TP-Link with an 8 port unmanaged POE switch for wifi would have cost 75% less and done exactly the same job with no ongoing contracts / subscriptions...objectively the right solution. It's stuff like this that makes the cloud seem a lot cheaper than on prem because from the point of view of the CEO and CFO...they're able to get rid of a load of ongoing support contracts etc...that's what they see.
Part of the difference with what you did, compared to any others I have witnessed, is that you performed a validation and cleanup before or as part of your cloud migration.
Most places do a "forklift" move and migrate all the old unused and redundant crap as well so they can get their cloud checkbox marked as quickly as possible.
On premise garbage becomes cloud garbage.
“ The main knock on effect, which I suspect is the knock on effect that most people experience, is that with simplicity, you need fewer people. So they were also able to massively downscale their staff. They obviously kept most of the old hands and people with a large amount of experience working with and developing the platform...the highly skilled technical folks, but they no longer needed a large support team bumbling about swapping disks, deploying new kit etc etc...due to the complexity dropping significantly, they also needed fewer developers because it became much less difficult to update things and the codebase didn't need to be as complex.”
“Obviously” - LOL... Normally the old/domain-knowledge - read: expensive - staff are the first against the wall to be shot by transient CTOs.
"I think the vast majority of people that compare DC costs to cloud costs are comparing, say, a single rack (or less) or a COLO situation. When you have a full row, it's a lot less clear cut in terms of price."
If you have heavy-duty compute needs that require a full row of racks in a data centre - that is, they are actually doing real work and not just sitting there idle most of the time - then this is the case that can work out *much* cheaper than cloud.
Say you are using AWS c5.metal (96vCPU, 192GiB RAM, $4.08 per hour): it only takes a few months before you've covered the cost of buying an equivalent server. Even if you got a 50% discount by taking, say, a 1 year reservation, you've still covered the cost in less than a year; after that you're just paying for the colo. And that's before looking at things like data egress pricing, which is the main gouging point.
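Rough break-even arithmetic, for illustration only - the server price, the ~50% reservation discount, and the colo fee below are assumptions, not quotes:

# Break-even sketch for the c5.metal comparison above.
# The purchase price, discount and colo fee are assumed figures.

HOURS_PER_MONTH = 730

on_demand_hourly = 4.08                    # c5.metal on-demand, as above
reserved_hourly = on_demand_hourly * 0.5   # assume ~50% off for a 1-year reservation

server_purchase = 12_000                   # assumed cost of a comparable server
colo_monthly = 400                         # assumed colo fee (space, power, cross-connect)

for label, hourly in (("on-demand", on_demand_hourly), ("1-yr reserved", reserved_hourly)):
    cloud_monthly = hourly * HOURS_PER_MONTH
    break_even = server_purchase / (cloud_monthly - colo_monthly)
    print(f"{label}: ~${cloud_monthly:,.0f}/mo, break-even after ~{break_even:.1f} months")

With those assumed figures you break even in well under a year on-demand, and in roughly a year even with the reservation discount; after that, the colo fee is the only recurring cost you're comparing against.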
> That seems crazy. Why would business leaders make such a devil's deal? It's the externalities, you see.
You then go on to list a lot of costs. Externalities are costs borne by society - e.g. the pollution caused by the electricity you use, the noise nuisance to neighbours caused by building your datacentre, etc.
I am pretty sure 37signals are including all these costs.
> You have to amortize your hardware purchases as part of your tax reporting - a $10k server depreciates in value every year you operate it and you get to write that off in taxes as capital expenditures (cool right?), except it's reallllly expensive to pay a finance department to do that kind of work.
You already have a finance department (or at least an external accountant for a small business) and they already do this work, so the additional cost of doing it for a bit more equipment is negligible. It has to be done for everything from office furniture to cars to factories.
> Your budget becomes operational expenditures, which are far simpler.
Harder to manage, and requires specialist skills - that is why there are jobs for FinOps people.
> That VM is probably going to be $200-300/mo. on the cloud or more (depending).
A VM that you can get for $200 is not going to require all the costs you list. It could be replaced by a single server. You can co-locate it in an existing data centre so you get a single bill for bandwidth, cooling etc. Or you can rent a server for less than the VM and you still have simple to manage operational expenses - even simpler as you can get a fixed fee.
If you are big enough to be actually spending lots on cloud (you might notice that 37signals spends rather more than $200/month) you have economies of scale on all those costs, so it's still going to be a lot cheaper.
Overall these conversations leave me with the feeling that the finances of using cloud at any scale are being defended by tech people with a poor understanding of finance.
> because the tooling is very rich and enables doing more with less
Lots of complex tooling that requires people who understand that particular supplier's product (good job security for cloud architects, I admit) and a great way of tying you in to them too. Loss of option value.
"A VM that you can get for $200 is not going to require all the costs you list."
I see it as like renting a home. Your rent includes the mortgage, insurance, taxes, repairs and maintenance along with a margin of profit for the owner. If the owner has hired a management firm to look after the property, you are paying for that too. If you buy something, while you are paying for everything except profit and management, you are also in control of repairs and maintenance. Each time you make a payment on the mortgage, you accumulate some equity. Every year the home is likely to increase in value which creates more equity. With all of the advantages of ownership, there are people that claim that renting avoids having to pay taxes and upkeep as if the owner is going to take a loss each month even if some do wind up in that position. What renting does is take away the upfront investment at the expense of a long term cost.
If your company can afford to make the investment in their own data services, that's going to be a better value over the long term once it's at a certain size. That doesn't mean that hiring outside services to shove work to as a buffer when needs balloon is a bad idea. A sole trader can be better off hiring an outside bookkeeper so they can concentrate on doing paying work, but would save money by directly hiring somebody if it's a full time task. There's the cost and also the control over what the bookkeeper does and how.
The problem with that analogy is that, for a lot of the time, if you build equity in a house, you can sell it for more than you paid for it, and in many of the cases where that didn't apply, it was at least worth the same amount. After you've paid for a server, it decreases in value until you can't sell it at all and the remaining value is just continuing to operate it until it breaks. Owning a house means that, at the end of the process, you have an asset that someone else will probably buy. Even if you smashed up the house, someone will probably buy the land to build a new house on it. At the end of a server's life, you have some scrap metal and need to buy a replacement.
Renting might still be more expensive, but not necessarily. There are two other things you need to consider in your comparison. One is the different things you get when renting and owning. Renting means the power, space, and some types of hardware maintenance are included. Make sure to include those costs for owning the server when you compare them. The rental option may come with extra costs too. For instance, you're probably paying for data transfer by the gigabyte whereas you may not have any metering on your local connection. Add that one too. In many of these cases, those prices are not identical to what the other side would be paying so you can't eliminate them with the logic that "you're paying them either way". If you're paying indirectly for electricity usage which the cloud provider has optimized for because they run lots of electrical equipment and don't want to pay too much, they may have lower costs than you would. If you've got a lot of traffic to send, their bandwidth pricing might be an order of magnitude more than it costs you to rent usage of a sufficiently wide pipe. You have to know what the numbers are and only then can you compare them. No shortcut will give you that power.
"The problem with that analogy is that, for a lot of the time, if you build equity in a house, you can sell it for more than you paid for it, and in many of the cases where that didn't apply, it was at least worth the same amount. After you've paid for a server, it decreases in value until you can't sell it at all and the remaining value is just continuing to operate it until it breaks. "
There is a failing in the analogy comparing an appreciating asset against a depreciating one. There's still a cost of entry for both when purchasing either. You can run the wheels off of a server you own if it's suiting your needs and shift more demanding work to newer iron as you swap things out. A service is going to be 'upgrading' their hardware all of the time and customers will be paying for that even if they have no need for it. I do the same thing at home by getting older computers, often for free, and using them for tasks where monster speeds aren't required. If I ever get around to building a CNC router, a computer to run it can be ancient as it's not very demanding. I've also thought of getting a small lathe and slapping some stepper motors on it to do some basic turning such as balls or concave faces. There are manual tools for doing those things, but they are very expensive and dedicated. The computer to run that would be very simple. My media server is an old cheese grater Mac Pro. Maybe I could get $100 for it if I found the right buyer, but in its current role, it works a treat.
If you don't know how big your data is going to get, or how long it will take to get big, then yes, cloud storage makes sense for startups.
Just ensure as best you can, that you have a reasonably-fast and inexpensive way of getting your data back off of the cloud when you do go on-prem.
Even more importantly, no startup in its first couple of years of operation should be making financial decisions based on 3 or 5 year time horizons. 3 and 5 year horizon problems are firmly in the nice-to-have category when the average life of a startup is below 18 months.
I guess what you call growth is those companies that allow free user subscriptions, like FB, and speculate and capitalize on those numbers. Personally I do not see real business growth that requires scaling up to what is provided by the hyperscalers. Not even a lot of big blue chip companies have that need to scale on hyperscalers - only companies whose business model is based on billions of users for capitalization. So yes, there is a big push down everyone's throats that the cloud is the way to go. Some sell it as a managed service as well. But the reality is, if you have a savvy and nimble team it is better to stay on premise. And even then you can implement cloud-like availability using standard Linux and Kubernetes on premise.
Sounds like they got the hump with AWS or was that the way the journalist wrote it?
Once you reach a certain scale, cloud is just another profit margin to be paid. Where that point is depends on the use case. Government should easily be of the scale where in-house would be cheaper, but they always seem to do that wrong and run into problems. They'll probably do cloud wrong as well, in a different way, and as a citizen I don't like dependence on foreign companies. You don't really know the dependencies and have limited control, so I hope we don't have any critical infrastructure in a foreign cloud.
"and you don't need to go out and spend $1.5M up-front on 18PB of storage that you may or may not end up needing."
Indeed you do not, because operating your own hardware, just like using a 'cloud' service, allows you to gradually build out your deployment as you scale up. You don't actually need Amazon to get that property.
I'm no fan of DHH, but this is one point that he is correct on - cloudycloud providers like AWS have falsely convinced everyone that 'cloud' is the only way, to the point that these providers have almost mythical properties attributed to them and people don't even really *think* anymore about what's possible by running your own infrastructure.
You've somewhat missed the point here that if you start off small on-prem, and have to grow, there is either additional up-front cost (i.e. buying more expensive chassis and/or controllers that can handle potential future expansion), or significant additional cost as you go both in terms of replacing kit that is no longer fit for purpose and migrating data across.
Additionally, you can't just click your fingers and magic new kit into your datacentre. Procurement, shipping, and commissioning take time, and if latency stymies your ability to scale then you're potentially leaving money on the table, or you're delivering a suboptimal experience to your existing customers.
I could launch a startup in the cloud tomorrow and literally be paying pennies for storage for months until subscribers ramped up. And if they ramped up rapidly (e.g. after a rave review online or by some influencer) the cloud would just keep scaling the storage for me, and I'd only be charged after the fact.
Yes, once I've got a zillion users it might be cheaper then to repatriate my data on-prem, and because I'd then be buying hardware in bulk it'd likely be cheaper overall than if I'd bought all the hardware piecemeal. I've still probably paid more overall than if I bought all that hardware up-front, but crucially I didn't have to gamble any CapEx on day one, and started off with extremely limited OpEx, giving my startup the best chance of survival.
I haven't missed the point at all, I've understood it perfectly well. I just do not believe it to be true.
To put it bluntly, if you are trying to start a tech company that does something with hosted services, and you cannot afford to purchase servers as you grow, then you should not be in business. Servers cost absolute peanuts compared to most everything else, even if you include maintenance and setup costs. You *should* be investing in the core infrastructure for the service you are providing.
And as for "latency stymieing your ability to scale", when does that *actually* happen? Because it sure is an argument that cloudycloud providers love to trot out, but I've just never found it to actually hold true in a real-world business setting. The scaling problems are pretty much always on the (non-maintenance) staffing side, not on the hardware side, and it doesn't actually take that long to get a bunch of hardware shipped to a datacenter with ample room to expand, if you've done even the most minimal planning ahead.
You're running a business. You're expected to have some long-term planning skills to turn it into a success. "Scaling instantly with zero planning or foresight" is a completely unrealistic expectation that *will* wreck your business on another front, if not on the technical infrastructure side.
AKA "ordering a server or two with delivery originally slated for 4 weeks out, oh, oops, sorry, those SSDs you specced got melted in a fire, we can't ship your order for another 4 weeks, Ah, DARN IT, we've had a production oopsie and you're holding the short stick so now it's 10 weeks, oh, dear, there was a shipping labour dispute so your hardware is stuck in a container somewhere and won't move for at least 2 more weeks."
And now your new hardware to scale up/out is 2 months later than expected and you're scrambling to throw anything you can find into the rack to prop things up in the meantime.
Supply chain limits for getting your own physical hardware are a factor a lot more organizations are keeping an eye on.
...I have used it for nearly ten years with only software updates. It hosts some VMs, one read-only NFS share, and has a few CPUs crunching numbers for LHC@home. Running du /etc/ as root shows me that it is using 13040 kilobytes. 13 MB of plain text files to configure a simple home server. Linux, *BSD, Illumos - these things may seem obscure but they are not hard, because you don't have to (and absolutely should not) touch 99% of what is in /etc for your server to work. Sane defaults, anyone?
Yep. I discovered several servers with years of uptime. Record is > 1280 days on two Server 2012 R2 Hyper-V cluster hosts in 2019. No one would have noticed until I came along and pointed it out, and it had > 100 updates in the Windows Update queue. Server 2008 R2, Hyper-V too, > 800 days seen. If you don't give the OS stupid application tasks which fail to de-allocate (hello IIS and Exchange!) it can chug along for a surprisingly long time with no issues at all. I've seen NT 4 / 2000 / 2003 servers with uptime greater than a year too, but there you usually notice a performance degradation.
Insert Schrödinger's Windows Server experiment here, since you never know what you will find until you look.
But, given the current way the world is, you need to update much more regularly. Every OS, no exception allowed any more. Don't try the "security by obscurity" thing, that does not work any more.
Well, no, hot patching without a reboot is a relatively new concept for Linux too. And you still have to reboot for kernel updates.
There are systems out there which can do that stuff without any reboot ever, literally for many decades. But those are special, narrow, specific-purpose OSes, not general purpose like Windows/Linux/Mac/most other OSes.
While kind of true, that's misleading.
Linux can install updates to huge parts of the system without rebooting, but the applications themselves have to be restarted to actually start using the patched version.
Installing generally takes a lot longer than restarting, so the application downtime can be kept very low - often such a short time that nobody even notices.
Same with the kernel. You can indeed install module updates and even kernel updates, but they will not actually be used until the updated part is restarted.
Hence rebooting is safest, because then you know that everything is using the new stuff.
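For what it's worth, here's a minimal sketch of how you might spot the stragglers (Linux-only, relies on /proc, run as root to see everything): it lists processes still mapping shared libraries that have been replaced on disk, which is the usual sign something needs a restart.

#!/usr/bin/env python3
# List processes still mapping .so files that have been deleted/replaced
# on disk, i.e. candidates for a restart after package updates.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/maps") as maps:
            stale = sorted({line.split(maxsplit=5)[-1].strip()
                            for line in maps
                            if ".so" in line and line.rstrip().endswith("(deleted)")})
        if stale:
            with open(f"/proc/{pid}/comm") as comm:
                name = comm.read().strip()
            print(pid, name)
            for lib in stale:
                print("   ", lib)
    except OSError:
        continue  # process exited, or we lack permission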
Hence rebooting is safest, because then you know that everything is using the new stuff.
You also know it will boot back up cleanly after that reboot has succeeded - useful when you later get a forced reboot from a power interruption, watchdog reset, accidental command in the wrong terminal, etc...
I've got a Mac Mini at home; its day job before I acquired it was as a file server for the art classroom at my last school. It had been running for a little over 10 years without being updated or powered down.
It's now my file server at home, and has been running for 3 years after I updated it to Catalina...
Ubuntu 24.04 server - Birth: 2011-07-19
Well... that's the problem with 'birth', isn't it? You're running a distro which was released 13 months ago but the root directory was created 14 years ago. In those 14 years your computer has probably been power-cycled, restarted, and crashed a zillion times. You've probably even replaced the drive with an SSD and kept the same birth date. So the birth date isn't really of much value.
And loads of people in the other comments seem to be confusing it with uptime. I used to think I was doing well with an uptime of a couple of years, but nowadays I'm getting 2 or 3 weeks with security update restarts.
> Ubuntu 24.04 server - Birth: 2011-07-19
When I use my main machine: Birth was 2011 too. I installed Windows 7 64 bit in UEFI mode. At that time on an i7-2500K OC @4.8 GHz air cooled (silicon lottery luck, Hyperthreading makes 100% speed increase at that clock instead of about 30% at ~3.5 GHz).
Upgrades: Win 8.0, Win 8.1, Win 10 (1511), then nearly every Windows 10 build, then Windows 11 21H2 (I wanted Nested V for AMD), skipped 22h2 and nearly 23h2 due to shadowcopy bug, now 24h2.
Upgrades²: i7-4960x (with other mainboard), then Ryzen 2700x (mainboard swap again), then 3900x, now 5950x for quite a while since AMD refuses to put 3d cache on BOTH chiplets - I would have to go Threadripper.
But birth is still 2011! I'd say you installed it 'round 2011, by that time probably Ubuntu 10.something. Then you upgraded. So pretty much the same as here. And I guess you upgraded the hardware as well.
If you start reading those plain text files you will see that many are commented to an extraordinary degree. More than a few, in fact, are all comment with the sensible defaults listed (but commented out) and prefixed by a paragraph explaining what it does and why the default was chosen.
Your 13MB is probably nearer to 13KB in terms of actual configuration data, and most (all?) of these files are only read at startup or if they change.
> Pure Storage kit that Hansson wrote will cost less than $200,000 a year to operate
Everybody knows that AWS S3 is costly and if you just want raw storage, it's not the appropriate tool for this.
But there's absolutely no way that operating 18 petabytes of storage could cost "less than $200,000 a year".
In salary alone, having the competent workers operating it would cost probably five times that.
And then if one wants to be fair, they have to compare what they actually get. And again, I doubt they'll have S3 SLAs with their in-house solution.
Which they probably don't need, and that's fine, but just don't say "look how much less I paid for this M&S sandwich compared to this meal at a Michelin restaurant!"
It's annoying on both sides: the cloud sycophants and the on-prem nuts.
Can't we just get pragmatism back? Use the right tool for the job?
On the SLAs, are you talking about the availability SLA for S3 standard that is 99.9% (designed for 99.99%) or something else?
If he doesn't achieve that, there are some big issues.....
More seriously, why does everyone think that cloud SLAs are great?
If your S3 standard has availability "less than 99.9% but greater than or equal to 99.0%" you get a whopping 10% service credit - on prem kit that didn't have a design of at least 5 nines of availability for a single site didn't look good against the competition in the 2010s.......
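For anyone who hasn't internalised the nines, here's what those availability figures translate to in permitted downtime over a 30-day month (a quick sketch, with no provider-specific credit logic):

# Nines-to-downtime converter for a 30-day month.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

for label, availability in (("99%", 0.99), ("99.9%", 0.999),
                            ("99.99%", 0.9999), ("99.999%", 0.99999)):
    allowed = MINUTES_PER_MONTH * (1 - availability)
    print(f"{label} availability -> up to {allowed:.1f} minutes down per month")

That's roughly 432 minutes at 99%, 43 minutes at 99.9%, 4.3 minutes at 99.99%, and about 26 seconds at five nines.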
5 nines has long been a gold standard for high-availability systems. But don't conflate design intent with real-world practicality and cost-efficiency.
S3’s design availability of 99.99% reflects the reality of running at massive scale, with global durability and geographic redundancy built in. That's something even the best on-premises solutions struggle to match without significant investment in infrastructure, personnel, and failover mechanisms. And that’s before considering patching, upgrades, hardware failures, and human error—all handled automatically or abstracted away in S3. Yes, the service credits may not feel generous, but the real value is in not needing them. Outages in S3 are rare and typically short-lived—compare that to diagnosing and resolving hardware or software issues in your own data center at 3 a.m.
I'm not sure I'd consider 5 nines to be the "gold standard" - it was once, for a single site, but if you care, you'll be doing a stretched cluster across two sites and potentially have a 100% guarantee from your storage vendor (which again likely pays out in service credits, but at least it's against any loss of availability, not a "write the day off and you'll get some pennies back" level of outage).
Regarding:
"S3’s design availability of 99.99% reflects the reality of running at massive scale, with global durability and geographic redundancy built in. That's something even the best on-premises solutions struggle to match without significant investment in infrastructure, personnel, and failover mechanisms. And that’s before considering patching, upgrades, hardware failures, and human error—all handled automatically or abstracted away in S3. Yes, the service credits may not feel generous, but the real value is in not needing them."
Surely the real value is delivering something that beats S3's designed 4 nines of availability at a fraction of the cost?
If "Outages in S3 are rare and typically short-lived" why don't AWS put their money where their mouth is and up the SLA to something that isn't shit?
Cloud SLAs mean absolutely nothing. Not only are MS and Amazon suitably vague about how THEY calculate them, but if anyone does a Google and deletes your entire organisation, production and backups...who are you going to take to court? Especially if your idiot senior management are of the "It's cloud and it's triple redundant, we don't need backup"
MS lost SQL across the whole of South America a few years ago, imagine being a non $billion firm whose SQL servers are down for 10 hours while MS restore more important customers. At least onsite you can walk up to your staff and taser the fucker who unplugged the SAN for the vacuum cleaner.
SLAs in general don't mean a whole lot unless the cost of the service is high and the compensation for the SLA violation is high. In most cases SLA violations (cloud or not) result in tiny compensation that probably wouldn't actually compensate for the issue. Better of course is track record of achieving high availability over an extended period of time.
In one on prem example I am a bit familiar with, HPE had/has some 100% data availability guarantees on some storage systems. I bet in almost all cases they hit that without an issue. But if you do hit some problem that is their fault that takes your storage system(s) offline for a few hours, the penalty for HPE is something like refunding you the cost of the support contract. My oldest HPE all flash 3PAR storage array for example has had 10 and a half years of 100% data availability (this system I think originally had a 99.999% guarantee), and keeps on runnin' - just one component failure in a decade (one SSD died in early 2023). Long past EOL at this point.
My favorite data center internet provider has a 100% uptime SLA (though there are exceptions for things like DDoS). For the most part they have been flawless in the almost 20 years I have been using them, but there was a time in 2016 I think when we had many brownouts due to DDoS of other customers(we were never the target). A couple of years ago they got acquired and the new company was trying to figure stuff out and did some bad things unplugging stuff that shouldn't be unplugged causing brief outages (minute or two), when the 2nd part of their circuit was already down from a fiber cut. Fortunately didn't cause much pain so didn't bother trying to claim any SLA stuff. I yelled at them and they got the message and stopped messing around. I'm sure other much bigger customers did the same.
I recall one cloud management company that one of my employers used a long time ago, had a super high SLA, probably not 100% but close. All of the company's cloud servers relied on their DNS for provisioning stuff. One time after I left their DNS went down for something like an hour or so, taking the company's servers out more or less(load balancers couldn't reach them or something) big issues. The provider issued a full refund for the service for the month which amounted to something like $200. I recall being told the words the CTO said at the time "Clearly we aren't paying enough for this kind of service". The company's cloud bills were in the $300-500,000/mo range while I was there. I put together a proposal to pull them out of the cloud(ROI was between 3 and 6 months), and everyone up to including CEO/CTO was on board but the "board" shot it down, I left within a week which started a mass exodus where many of us went to the same company as a result. Company died a couple of years later.
Cloud SLAs are kinda meaningless when the infrastructure and Cloud services are being continually overhauled, thus forcing continual development churn just to keep the AWS instances alive. Case in point: some of our internal clients migrated to a major Cloud three years ago - and basically they have an "incident" every other month, much of which is down to the churn and people making mistakes (the odd AWS outage or liquidity shortfall on our 'reserved' capacity happens too).
By contrast the on-prem clusters (even with rolling OS updates, service updates etc) have had *zero* downtime for two years solid (the last incident was literally caused by a back-hoe - but happily the workloads continued executing uninterrupted in the DC regardless). In addition to requiring less overheads within the internal client teams (no Cloud SAs/Operators/SREs needed) the on-prem clusters running costs are less than a third per CPU hour. It really depends on your workload at the end of the day, and ours are very definitely not the classic AWS use-case. Our workloads tend to run 18x6, saturating 100s-1000s servers at a time and the networks get eaten in bursts - we are very bad neighbours. The classic Cloud benefits of consolidating multiple workloads onto a physical host, and multi-region availability don't apply (our workloads are HA by default because you don't get a big enough maintenance window to bounce an entire cluster at once).
It is a case of the right tool for the job... For (our) workloads that span multiple hosts and saturate them for the majority of the day - Cloud has zero benefit - even the liquidity argument doesn't cut it as we found out when our internal clients tried to actually *use* their *paid for* reserved capacity (6 month lead time vs 3 months on prem - and of course that wasn't covered by the Cloud SLA). For the sake of fairness I have to point out that there are plenty of internal workloads that have happily moved to the cloud - just not the big ones we look after. :)
> But there's absolutely no way that operating 18 petabytes of storage could cost "less than $200,000 a year". In salary alone, having the competent workers operating it would cost probably five times that.
$1M in salaries to run perhaps a couple racks of storage in two (or more) physical locations for HA? After the initial design and set-up, taking care of the storage should not be a full time job at all and I'd expect the storage admins to also partake in other infrastructure work or vice versa.
Now, if the devs and other users are constantly asking for changes or new file shares, object storage, block LUNs and such, then it could be full time for a single person, but at that point automating a self-service portal or carving out limited management of their own piece of storage could be arranged. The hardware itself requires very little work.
"I doubt they'll have S3 SLAs with their in-house solution"
Amazon Computing & S3 SLAs allow several minutes of downtime each month without giving back any credits.
Reaching 100% with mid-range storage with proper infrastructure in place (power, cooling, cabling, monitoring etc) is not hard at all.
It's annoying on both sides: the cloud sycophants and the on-prem nuts.
Agreed.
Not wishing to be funny but some of us have dev rigs that are bigger than the whole infrastructure of 37signals, and looking after those is a part time job for one person alongside the other rigs plus other tasks.
It's not that complex or expensive a job if the kit is any good.
Funnily enough my latest personal dev rig is a new on-prem cloud because running tests in someone else's raises so many issues including controlling the ongoing costs.
> But there's absolutely no way that operating 18 petabytes of storage could cost "less than $200,000 a year".
The admins responsible are not responsible 24/7 for storage ONLY - possibly a few days a year in total. The rest of the time they do other things, like turning mice around so the cursor moves the way the user expects, plugging the wireless dongle back in, or swapping the batteries in said mice.
I, as a storage admin of 12 arrays and 4 fabrics, spend maybe 1 hour a week on average managing them, and that includes the time I have spent migrating to newer arrays. We have had 100% uptime over the 12 years we have had them, including while migrating to newer arrays. The arrays were 3PAR and Primera.
I do various other jobs, which takes up far more of my time.
I don't think that S3 runs itself at no extra cost. I mean, I'm quite sure that 37signals currently employs people to manage their cloud systems. This cost will not go away, of course, it will be moved to employing on-prem techs instead of cloud techs. Then there will probably be that extra 200K cost for energy, cooling, space, maintenance contracts, etc, and this is where the 200K figure comes from. It's not a total, it's a difference.
They'll need to learn some new skills, like swapping the right dead drive, and keep up to date on the VM host management software they chose.
On the other hand they don't need to continually relearn where the cloud provider moved things so they'll have time to do the above.
All their application stuff will be the same, no matter where it's hosted.
I saw the CTO say something like his staff is actually working less for the on prem stuff for the most part, a big headache for them in cloud was apparently forced upgrades with Amazon RDS. I haven't touched RDS since 2012, so my experience isn't recent, but it was a bad product back then anyway for anything but non critical casual workloads, I'm sure it's gotten better since of course...
I'm not sure the extra cost is really related to the SLA. Essentially, you can get that by renting a SAN in a data centre, with redundancy if necessary. Apart from "convenience", with cloud services you're effectively paying a premium for the option to expand (and contract) at very short notice. If you know your load then there is no need to pay for this hot standby contingency. But, also, if you know what you're doing, AWS isn't where you want to be: way too complicated.
I believe the CTO posted some pictures; the 18PB, from what I recall, is just two Pure storage systems, something like 6-8U each? One system at each of two different sites, with replication.
Operations overhead for such systems should be almost nothing - mostly set it and forget it. From what I recall they don't actually do backups though, just rely on the data on the systems themselves; there is always a small risk of software or hardware bugs that could wreak havoc even if the data is replicated. I have read of enterprise storage replication bugs that happily replicated the corrupted data (probably because it doesn't know it's corrupted). Also a remote chance of having a security incident, the storage systems compromised and the drives getting formatted or something.
Certainly a very low chance of such an issue happening, but I still think it would be a good idea for them to invest in a tape backup system with periodic (even monthly is better than nothing depending on their data churn rate I suppose) backups with that much data.
Tell us you've never admin'd Pure Storage without telling us.
One person can easily admin 18PB of Pure Storage as a subset of their duties. The interface is incredibly simple to use and everything is built-in, including S3. I work with people who admin PB-range of PureStorage and it's not their primary job.
$200K is probably mostly the hardware support cost from Pure.
Why wouldn't they have the same SLAs? Orgs plan on 100% availability for on-prem storage because that's how it's designed. We've used Pure for about 7 years and it's been zero downtime, and I recall other storage subsystems previously with the same track record.
I'm not trying to pimp Pure, but being a storage admin in 2025 is not rocket science. If the use cases is mostly S3, it's even easier.
Headline in six months? '37signals Loses All Customer Data After Someone Types the Wrong Thing Into a Linux Terminal, Backups Were Never Checked And Aren't Working'.
I certainly hope it never comes to that and they know what they're doing, but self-hosting your own stuff isn't a panacea either. In particular the 'whoops that was never backed up because of script issues' problem and the 'uh we never actually tried restoring the backups and... for some reason they don't work?' problem.
IF they are being super paranoid about data integrity and are still saving money, more power to them! I have just seen the above happen so many times.
I've seen that several times when using multiple providers. It's the shared data responsibility model.
It's the customer's responsibility to set up backups correctly; the operator's responsibility is only that you are able to access said backups (not that they will restore your data - you may have misplaced some required key, or the system software version changed and is no longer provided and you didn't update when they sent you all those pesky emails telling you that, etc).
Well, back in the day (which wasn't that long ago) on-prem was all there was (plus outsourcing, but you still did the operations) and companies coped very well with system resiliency.
I spent 45 years in the on-prem space as a server/database consultant to very large organizations, and as far as I can remember, in the on-prem space you never had crap like storage (say S3 buckets) being left wide open for the world to steal from cause wide open was the default config, or a customer's entire data/servers/backups being deleted cause accounting lost a payment, or customers thinking XXX service meant remote failover but it didn't, so they went down hard for weeks, or a DNS change unrelated to your systems taking them offline, or... (I could go on forever).
The BIG difference is your on-prem team CARES about the systems they manage... cloud teams not so much.
It's way easier to set up local/remote failover, set up and test backups, set up and test database/SAN replication, etc. when you have the primary servers in a room just down the hall from you... rather than trusting a cloud third party's stack/personnel to do it.
Bluck
"I spent 45 years in the onperm space as a server/database consultant to very large organizations"
Evidently much more professional ones than I've seen, because several of the things you've never seen are not so rare in my much shorter experience - also on prem, though cloud users get them too. For example, some of the stuff you've never seen:
"storage (say S3 buckets) being left wide open for the world to steal from cause wide open was the default config": I work in security. People leaving things open that they shouldn't because unauthenticated was on by default is really common. Where the server is makes no difference. That's down to config. People are often lazy with configs and it's often a problem.
"customers thinking XXX service meant remote failover but it didnt": Definitely seen it. People who thought that they had built redundancy into their system but they didn't happens whether you're using cloud services or not. Cloud services provide some tools that you have to build yourself for on prem, but it doesn't save you if you don't know how they work and what you have to do to use them properly.
"The BIG difference is your on perm team CARES about the systems they manage... cloud teams not so much.": You just made that up. I've dealt with on prem teams who didn't care, cloud teams who didn't care, dev ops teams that didn't care, managers who didn't care, and occasionally each of those things that did care. What they were doing was not a good indication of whether they would care about it. You can manage cloud services and care about them staying up and functional. You can manage your own servers incompetently. Your allegation that either of those doesn't happen is obviously wrong.
"The BIG difference is your on perm team CARES about the systems they manage... cloud teams not so much."
This is the absolute truth. When dealing with my own personal outages I'll be a bigger hardass than Patton but that circuit of yours I'm working on that your business loses thousands of dollars an hour on? For me that's just another circuit out of tens of thousands I've worked on in my career; I will sit here and watch your ticket clock without a care in the world while I eat my lunch. Your thousands vs my stomach? My stomach wins every time. And when I finally finish my lunch and get around to fixing it? You'll thank me profusely and remember my name with fondness and affection for years. Meanwhile, I forgot you existed as soon as I hit the ticket close button.
I also fully expect to be thumbed down by stone casters that did the equivalent for their job field during the last week. I, at least, am honest about it.
"I will sit here and watch your ticket clock without a care in the world while I eat my lunch. "
I expect that the smaller the client, the less the big Cloud companies are worried about getting them back up as quickly as possible. If you are getting pinged by the head of the company and Christmas bonuses are on the line in an on-prem situation, you might be typing with one hand and snacking with the other to keep your energy up. Human nature and all that.
Couldn't 37signals have done that in the cloud anyway - it's their data, and backup (and HA) would have been their responsibility anyway?
Cloud in its most basic form is just paying to use someone else's computer - it's not a magically better computer that is impossible to screw up and in some ways you take a lot on trust..... (ask UniSuper https://blocksandfiles.com/2024/05/14/google-cloud-unisuper/ ).....
Couldn't 37signals have done that in the cloud anyway - it's their data, and backup (and HA) would have been their responsibility anyway?
Sure, but you seem to be missing the point of the article. 37signals reckons it can do it for a lot less than Amazon are charging it, and probably with a better SLA.
You're missing what my reply to the original comment was about.
Irrespective of whether 37signals are in the cloud or on their own kit, 37signals were always responsible for their data, so needed to protect (and not delete) their data anyway - to paraphrase the OP, suggesting "you'll be sorry" as if 37signals will lose some amazing protection by not being on AWS in future is bollox.
As per the article, I've got no doubt that 37signals reckon they can do it for less than Amazon are charging.
I also happen to believe they'll pretty much save what they expect, as they're not a vendor pushing a position for an angle/extra sales, they're a company with the owners effectively spending what is in part their own money in the way they believe will generate the best outcome (or in this case, the same or better outcome at a significantly lower cost).
Forgot to mention in my other comment, but I believe I saw the CTO say that when they were using S3 they had no backups either (so, same as the situation with Pure Storage on prem, they still have no backups). Though I assume they backed up their other, non-S3 data. No idea what kind of stuff they do that stores that much data. Maybe that data isn't that critical, I don't know.
If you're considering the possibility of fat-fingered staff not doing their jobs properly, Google tells me that AWS S3 misconfigurations account for 16% of cloud security breaches.
Cloud is not a panacea, nor does it relieve you of the responsibility of basic housekeeping. Although you can punt the responsibility for a great many operational tasks (like backup), the convenience comes at a considerable cost and significant dependency - the final part of this project has had to wait on an Amazon decision to waive $250k in egress fees. VMware customers have seen what can happen when that dependency is exploited.
Too often, people are simply resorting to cloud vendors in order to pass the buck or because it makes their accounts look tidier. With the amount 37signals claim they will save, they have the opportunity to employ the right people in the right numbers to make it work. It's also a rather curious view of the IT industry that you can trust a company to develop complex software, but somehow it will simultaneously be incapable of managing its own backups. If they're not, at least they'll only have themselves to blame.
That does bring up an interesting point.
IT staff with a lack of skills exist everywhere, but in an on-prem situation (at least in my experience) this is weeded out with stringent testing and change control:
(1) dev->test->qa->staging->prod (noting not everyone has staging which is typically a full sized clone of prod including hardware... great for performance testing)
(2) the change being done goes through a change control board which approves the initial change and reviews it after each deployment level noted above
I get the feeling that in the cloud space, and with this new-fangled DevOps, a single person is "judge, jury and executioner" and changes get stuck in with little oversight (like the multitude of failures due to mangled DNS and other configs that we have seen).
I think they are taking the "move fast and break things" mantra a little too seriously... great for a startup, but not for a BAU/production situation.
Bluck
In Mission Control ...
# dd if=/dev/zero of=/dev/sdc bs=1M [Enter]
(time passes)
PFY: "Hey, Charlie ... why is the system so slow today?"
BOFH: (looks up from his onion bhajis and sees write activity lights flashing madly on the SAN controllers) "WHAT. DID. YOU. DO?!"
This sort of thing can happen at cloud providers as easily as it can on-prem.
What is this "Linux Terminal" you write of?
I open a terminal emulator and type in `echo $SHELL` and I get; /bin/bash
Is that Linux? It doesn't seem to be;
/bin/bash --version
GNU bash, version 5.2-release (x86_64-pc-linux-gnu)
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
... telling a CEO "I told you so!" before setting to work, pulling them back out of the quagmire that is called "the cloud".
Since 2009, I've made more money pulling companies out of the cloud than I have from any other single aspect of IT.
The cloud is a marketing meme that is well past its sell-by date.
Hands up those who remember the days of the Service Bureau, and later timesharing. And why we don't do that anymore.
It's not as black and white as you portray, but 'going cloud' is definitely not always the answer.
It depends on your use case and context. For some this works, but it should not be the default decision it has been for a while, and "because it's new and shiny" is IMHO not a valid argument :)
"Hands up those who remember the days of the Service Bureau. and later timesharing. And why we don't do that anymore."
That's where I started... back in the late 70s.
The company had a full suite of accounting/inventory management software, and cause only the big boys could afford their own in-house systems, this company sold the applications and the bureau services.
Part of the "magic" (much like this new fangeled cloud stuff) was the back end partitioning with related system resource accounting to bill a customer for actual usage and not just a flat fee.
Ahh... those were the days.. when I was slim and trim and I could drink all night!!!!
Bluck
Anyone with basic reading and maths skills could work out that cloud isn't exactly a cheap option, especially for anything persistent; if you need something permanently, don't rent it! There are plenty of cases where cloud does make sense, but some of those also apply to just running your own on-prem cloud.
Running your own kit also usually means you know exactly what the cost will be ahead of time and can have tight control, something a cloud subscription doesn't exactly guarantee. Very easy to get burned especially if someone gets careless with a deployment.
37Signals are just lucky they had what sounds like a relatively tiny setup to move and that it was set up so they could port without reinventing anything; very easy to become tied to a specific platform if you start using all those lovely features.
Because I'm guessing that there's going to be a nice bonus for this guy saving so much money for the company.
One question: the guy that convinced the Board/CEO that The Cloud™ was a good idea in the first place - is he still with the company, or has he moved on to get another fat bonus check from another company?
I'm guessing this type of story is not going to help AWS's revenue stream . . .
Reading the Wikipedia article suggests Hansson is part owner. It refers to him as a partner in Basecamp which looks like a stale entry from the time before the company changed its name back to 37Signals. When you own the company, or a large part of it, it makes decision making a lot quicker although it helps if you know what you're doing.
It depends a bit, despite being binary it's not quite black and white.
It is possible that it was a better option at the time. Don't forget that Cloudy setups ran the same volume building technique everyone has ever used since trade began: start with selling cheap until you have the dependency (read: lock in) and then crank the prices up to find the friction point where customers leave and stay just under it.
It depends where you evaluate costs in this cycle, and no provider is going to tell you in advance how expensive they will become - sometimes they don't even know themselves yet, nor is that a good acquisition strategy :).
That said, I commend this outfit for having the brains to re-evaluate and decide it was time to change. Some don't, because admitting that times have changed is harder for some than letting the company bleed money - it's not theirs anyway. Bad SHV management, but logic rarely features at that level.
We spend over £350k a year in Azure (and that's at UK HE rates); that could run our DC for years! Thankfully we are pulling some back - not enough, but some. As the infrastructure team it is really annoying being told to spin up VMs in Azure rather than on prem, knowing it is costing you £££££s per month for something that could sit in the DC and cost next to nothing.
My experience of HE is that the actual cost of your DC will be a small fraction of the booked cost once the institution has added its overheads. The accounting will make it look like the cloud is cheaper even though the institutional overheads are likely unrelated to either. Universities seem to specialise in a form of obtuse accounting that maximises their expenditure on everything but salaries.
Well, one would hope that the cloud deals that UK HE gets through the four-letter company from Bristol are at least decent... :-)
There are moments where being able to quickly spin something up in Azure or AWS is welcome, especially if you need to test something quickly. But production stuff? Nahhh, the 'on-prem' data centre is still useful, especially if you're plumbed into Janet... ;-)
Can anyone actually do the financials?
Requirements
Front Based App for Insurance
Used globally
Increases 1 TB per year of data - starting point 5 TB
Need redundancy so if site goes down, application is still available from another location.
You keep monthly backups for Accounting closing periods and backups for 5 days rotating
With cloud
You have costs for VM, Storage, movement of data, redundancy, network costs, computing costs, backups etc.
Assume all costs go up 10% per year
With on Prem
Hardware costs (with potentially a 10 year upgrade cycle)
Network infrastructure (routers/switches/etc)
Network costs to the Internet
Storage costs - SAN? NAS? others?
Security software, like CrowdStrike (which of course could itself cause an outage)
Another location that is ideally a hot site with the same equipment and data is being updated in both places
Cost of electricity/cooling
Cost of Sys Admins (assume wages go up 3%/yr)
Cost of Network Admins (assume wages go up 3%/yr)
If someone could lay out the costs overall for the base scenario and also for these additional scenarios
1) Ransomware encrypts your data (can happen in cloud or on prem) - how fast will this be replicated in either environment, and how do you recover?
2) Someone accidentally truncates all tables in the production database and you need to recover
3) Cloud provider goes down for 2 days due to attacks
4) On Prem goes down due to attacks (I know a large company took 5 days to recover production and a lot longer for all other servers)
5) On-prem equipment fails - do you have an SLA for this? Did you purchase redundancy?
Maybe one more scenario where hybrid MIGHT make sense. Although having your data on the cloud but using the Internet for moving data to and from on-prem might get costly!
This way, maybe we can see some logical arguments on cost vs availability.
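Nobody can do those financials properly without real quotes, but the shape of the comparison is easy to sketch. Every figure below is a placeholder for illustration (the £250k starting cloud bill, £600k of hardware and £200k of staff are assumptions, not anyone's real numbers); the structure - escalating opex versus amortised capex plus wage growth - is the point:

# Back-of-the-envelope model of the scenario above. Replace the placeholder
# figures with real quotes before drawing any conclusions.
def cloud_total(years: int = 10, first_year_cost: float = 250_000, escalation: float = 0.10) -> float:
    """Sum an annual cloud bill that rises by `escalation` each year (10% per the scenario)."""
    return sum(first_year_cost * (1 + escalation) ** y for y in range(years))

def onprem_total(
    years: int = 10,
    hardware_capex: float = 600_000,      # servers, SAN, network for two sites (placeholder)
    refresh_years: int = 10,              # the 10-year cycle from the scenario
    annual_fixed: float = 80_000,         # power, cooling, DC space, links, support (placeholder)
    staff_first_year: float = 200_000,    # sysadmins + network admins (placeholder)
    wage_growth: float = 0.03,
) -> float:
    """Sum capex refreshes plus fixed running costs plus wages growing 3% per year."""
    refreshes = -(-years // refresh_years)                 # ceil(years / refresh_years)
    wages = sum(staff_first_year * (1 + wage_growth) ** y for y in range(years))
    return hardware_capex * refreshes + annual_fixed * years + wages

if __name__ == "__main__":
    print(f"cloud, 10 years:   £{cloud_total():,.0f}")
    print(f"on-prem, 10 years: £{onprem_total():,.0f}")

The disaster scenarios (ransomware, truncated tables, multi-day outages) then become extra line items you price separately: expected downtime cost times likelihood, plus whatever recovery tooling each option needs.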
That answer is very simple: It works for some but not for all.
When we look at our customers: most enter the cloud not out of free will, but out of 100% pragmatism. And they only move the stuff into the cloud that they have to move. Some went full cloud, but that is not the norm.
I have done all that. It takes ages. Hardware is not a 10-year upgrade cycle; it varies. I once went into that in detail, and if you have a slick operation it can (not always) be cheaper to upgrade much more frequently, based on hardware capability improvements. No one does it except perhaps the cloud companies! At their scale and coherency of process it makes sense to get the latest and greatest when it appears. Anyway, you do all that and some manager says "I think we'll wait", or they just ignore you because they had a bad experience once.

I developed a better way of modelling costs - took a year between the real job. I took it to my boss, who was OK; he burst out laughing because it took account of variance and probability. He said, "You're going to take that to our finance department and tell them the probability of the cost being correct? They want you to tell them to 2 decimal places, so when it turns out different they have someone to point at. You'll be more welcome making a number up, as long as you state it with great precision."
Lessons learned! Most are not interested in reality, preferring the comfort of their delusions.
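For anyone curious what "taking account of variance and probability" can look like in practice, here is a hedged illustration (not the commenter's actual model, and every distribution below is invented): treat the big uncertain inputs as rough ranges and report a spread instead of a single figure to two decimal places.

# Tiny Monte Carlo cost model: draw plausible 5-year on-prem costs from
# triangular distributions (low, high, most-likely) and report percentiles.
import random
import statistics

def one_run() -> float:
    """Draw one plausible 5-year on-prem cost; all parameters are placeholders."""
    hardware = random.triangular(400_000, 900_000, 600_000)   # capex
    annual_ops = random.triangular(60_000, 120_000, 80_000)   # power, space, support per year
    staff = random.triangular(150_000, 300_000, 200_000)      # salaries per year
    return hardware + 5 * (annual_ops + staff)

if __name__ == "__main__":
    runs = sorted(one_run() for _ in range(10_000))
    p10, p50, p90 = runs[1000], runs[5000], runs[9000]        # rough percentiles
    print(f"5-year cost: p10 £{p10:,.0f}, median £{p50:,.0f}, p90 £{p90:,.0f}")
    print(f"std dev £{statistics.stdev(runs):,.0f}")

Whether finance will accept a range rather than a false-precision point estimate is, as the comment above suggests, a different problem entirely.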
While there may be a grain or two of truth to that, the laws they (the BOFHs) follow involve critical thinking, reasoning, well-thought-out implementations, due care and risk mitigation.
Rather a crusty old BOFH with an SSH session than some fancy DevOps guy in a suit with his mates Jenkins, Puppet, Ansible and co along for the ride.
Bluck
If you're running a vaguely stable workload then you don't need cloud. You can run all this stuff in your own DC. I priced up a UK council and on prem was 1/3 the price compared to Azure - and that was before MS's price rises and fuckwittery.
HPE will do you GreenLake, which means you can buy a load of kit, pay a baseline amount and then scale up into the existing hardware if you need it. I'm sure Dell and Lenovo will do something similar.
What really annoyed me was the consultants coming up with "we don't believe it will be cheaper", "did you take into account power costs" and "you should go into the cloud as your sysadmins will be able to innovate". No!
Not everyone needs to be in the Cloud, not everyone needs to be splitting legacy apps into kubernetes and doing 100000000 software updates a day.
You can get 7-year support contracts. Govt departments can get Crown Hosting, which is insanely cheap DC space. When this AI crap dies its inevitable death, there will be datacentre space everywhere going cheap.
I'm riding both horses here.
I think if you have a decent architecture then managing your own tin can offer real savings.
I also think there is a lot of value in building in the cloud, and then migrating to on prem.
The problem, as I see it, is that it's a lot easier to provide a secure self-service platform on AWS. If you have the staff to do it properly on-prem, you'll be fine on prem or in cloud, but most don't, and worse, don't know what good looks like.
They get rid of older admins, and slowly all the accumulated knowledge about reliable "systems engineering" disappears as you get software only people who don't understand why power and bandwidth matters.
AWS has a pretty good network. Once you've built a decent PaaS on containers and VMs, sure, run that on your own VM hosts, backed onto your own SAN.
Most people don't have the skills to run it properly - by design, their companies refuse to hire and retain skilled staff - and so they pay the cost in cloud rentals and software people to manage infra.
I will say that both on-prem and 'cloud' (i.e. commercial stuff like AWS/Azure) have their place. However, it requires some planning and careful evaluation to see what works better. Punting close to 2 TB into S3 storage to be processed there by a variety of things makes sense-ish, as does rapid prototyping of things. But production? No, on-prem will do nicely.
My biggest peeve is things like UDP fragmentation protection that Azure doesn't let you switch off (unless you really really really pester them about it and tell them that you're taking your business elsewhere if they don't switch it off), or doing some other stuff under the hood that annoys/upsets some protocols that are still in use across the world. Yes, RADIUS packet fragmentation is a thing, so stop making it difficult for people to run up stuff in their tenancy with network properties *they* need!!
You can get TBs of storage for peanuts and completely separate your intranet from the public internet if you avoid cloud storage and SaaS. Lower costs, lower risks, much more resilience. You can put airgapped internet-facing systems on every desk if you want to and still save money. And if your internet facing systems get hacked, they are an easy, fast replacement with no damage to your core systems and data.
If you are lazy and hand over sacks of cash to cloud/SaaS vendors, you have nobody to blame but yourself if they gut you of cash or you get done with malware.
The primary tech security threat is not China, Russia or malware groups, but your own tech people making a daft, lazy decision on how you organise your technology.
And companies that fail to keep that muscle struggle to get it back. Few companies, large or small, that have gone primarily cloud, could have managed moving back on-prem in less than two years (let alone switching providers, which is hard enough - by design). I'm impressed by what they accomplished.
I suspect that for many infrastructure support groups, a key focus becomes managing the cloud configuration personnel, to help them configure their resources better and with less error. However, those whose primary skillsets are solely in cloud configuration (as opposed to also having more general-purpose skills) tend not to understand the systems, or the underlying reasons why things might be structured to work in a particular way (I don't mean all - I'm just pointing out what many of us have run into over the years).

And it is inherently extremely dangerous to let those individuals operate without guardrails, so in general a lot of effort goes into making sure some dumb mistake doesn't wipe out the op-ex budget for next quarter. Bureaucracy and tooling thus enable a non-virtuous cycle: a rising percentage of barely-computer-literate individuals in IT, which causes a rise in guardrails, and so on.

So for some established companies with a short-term mindset, who don't want to waste investment on the future, the cloud just makes it easier to kick the can down the road. For others, the cloud is necessary for scalability and they treat their reliance on it with the appropriate long-term, cautious mindset, where their use of cloud enables growth and margin that can be invested in people who want to grow their skills. And for others like 37signals, moving on-prem turns out to be a cost-effective way to do all that at their scale.
AWS can be useful if you need to scale up quickly. Or have something small (although I'd get a DigitalOcean droplet for that). Or you want things geographically dispersed. But all too often, one is renting some pretty fixed amount of infrastructure in one location (and is a big enough business to have somewhere good to put their racks and get good internet connectivity to them), at which point they are probably better off on-prem.
One thing to throw in here: one could run on-site but use Kubernetes or the like; if they have admins used to AWS and applications written assuming S3 etc., then they can have an expandable on-site pool of compute and storage.
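To illustrate that point: code written against boto3 can usually be pointed at an on-site S3-compatible store (MinIO is one common example) just by overriding the endpoint. The endpoint, credentials and bucket below are made up:

# Sketch of the "written assuming S3" point: the application code stays the
# same whether this talks to AWS or to an S3-compatible store on local kit.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.internal:9000",   # hypothetical on-prem endpoint
    aws_access_key_id="LOCAL_ACCESS_KEY",        # placeholder credentials
    aws_secret_access_key="LOCAL_SECRET_KEY",
)

s3.upload_file("report.pdf", "app-data", "reports/report.pdf")
print(s3.list_objects_v2(Bucket="app-data").get("KeyCount", 0), "objects in bucket")

That endpoint override is doing a lot of the heavy lifting in many repatriation stories: the app-facing API stays the same while the storage underneath changes.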
They charge you to stick it in, charge you more to keep it there, charge you to make changes, and charge a boatload to take it back out, if it can even come out, and if something breaks while it's in there the most you can hope for is "Whoops, sorry about that, hope that lost data wasn't too important, by the way your bill is due."
I run my primary server (SMTP, IMAP4, DNS, HTTPS media/SVN/Git/Web/reverse-proxy) on prem via VDSL. All data is backed up every night via VPN to a server at my secondary site, on a 7-day, 5-week & 12-month incremental rolling scheme, and I pay Oracle the outrageous sum of ~£1.90 a month for two additional VMs running DNS resolvers & SMTP MTAs in London & Frankfurt, with the added bonus of a private VPN (that isn't blocked by providers the way most subscription VPNs are) to access German- & EU-only services via the 1TB/month allowance.
Of course I'm not exactly in the 200 petabyte range - more 10 terabytes.
As the icon says, "I'll get my...".
More seriously, cloud & other managed IT is great for start-ups and for companies whose primary business isn't IT-related but for large companies whose whole service is based upon large amounts of data and/or compute it's pretty obvious that on-prem - perhaps with some co-lo data centre - is going to be far more economical since you're already going to have staff with the requisite skill set.
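As an aside, the 7-day / 5-week / 12-month rolling retention described a few posts up is essentially a grandfather-father-son scheme. A minimal sketch of the retention logic (illustrative only - dates and naming are invented, not the poster's actual scripts):

# Keep the last 7 dailies, the last 5 weekly (Sunday) snapshots and the last
# 12 month-end snapshots; everything else is eligible for pruning.
from datetime import date, timedelta

def snapshots_to_keep(today: date, all_snapshots: list[date]) -> set[date]:
    """Return the snapshot dates retained under a 7-day/5-week/12-month scheme."""
    ordered = sorted(all_snapshots, reverse=True)
    keep = set(d for d in ordered if (today - d).days < 7)                       # dailies
    keep.update([d for d in ordered if d.weekday() == 6][:5])                    # weeklies (Sundays)
    keep.update([d for d in ordered if (d + timedelta(days=1)).day == 1][:12])   # month-end monthlies
    return keep

if __name__ == "__main__":
    history = [date(2025, 1, 1) + timedelta(days=i) for i in range(300)]
    kept = snapshots_to_keep(date(2025, 10, 27), history)
    print(len(kept), "snapshots retained out of", len(history))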
I've been waiting for you, Cloudy-Wan. We meet again, at last. The circle is now complete. When I left you I was but the learner. Now, I am the master.
I remember saying years ago that this cloud stuff was so much bunk, and that eventually people would realize that running your company on someone else's equipment would be an extremely expensive mistake.
Icon, because it's the Vaderiest.
I want the money, but I can't lie or just say what they want to hear for the money. This is why I'm never listened to. Our shop has been hybrid for a few years and it's worked perfectly well. Yet they decided, due to some knobby consultant, that cloud would be better and save them money. In our weekly meetings I'd point out it wouldn't. I was put down every time by our MSP: "no, it will save you money as you're getting rid of all the old kit, blah blah blah". The fuckwits who have no IT knowledge would never listen to me and listened to the MSP instead. It benefits the MSP as it means no more site visits required.
So I started to keep my mouth shut, and what happens? One weekly meeting it's discussed: "I thought you said this would be cheaper. It's costing us more money." I smirked; the MSP finally admitted, "Of course it's going to cost you more money - it's the convenience though". I left with a smile on my face, but still annoyed that I'm always ignored.
Well at least you had a few moments of feeling smug; but as you discovered, it’ll do you no real good, you’ll still be ignored.
From the management’s position, the MSP and their suited and booted reps are paid more than you and so hence ‘must’ know better!
I know I have posted this before but I think it is worth repeating; some years ago I was on an official MS sponsored Azure training course. The trainer, an MS employee, made a somewhat profound statement during the course; ‘cloud (in this case Azure but I suspect it applies globally), is never, ever, ever cheaper than on-prem. You will NOT save money (medium to long term) by going all-in on cloud. Where it is useful is for startups, those who favour op-ex against cap-ex, and for those companies where demand is very variable - see scale up and scale out’.
Which I think does make a lot of sense. Yes there certainly is a place for moving workloads to ‘the cloud’, but not every time.
There's the common wisdom that when something is working, don't mess with it. OTOH, companies feel they always need to be coming out with new products/services/UI's/Etc. If you have your on-prem workflow humming along quite nicely, you can avoid the temptation to "improve" it. Hardware can go "splot" (not the beverage) so some redundancy there is not a bad idea.
I'm not going to claim the cloud is the silver bullet for all IT, but depending on 37signals' investment in people, knowledge and networking hardware, it will be interesting to see whether their system uptime stays similar to the cloud's. There are a lot of gotchas in running your own data centre stuff. Scaling this stuff is hard.
If they have very well-executed architecture and patterns for the common stuff (load balancers, DNS, deploy tools, etc.), they'll be fine. If not, there'll be some humble pie to be eaten. Especially for those here wanking over the data centre.
If you are running at any kind of scale you will need a bunch of well-qualified SRE/sysadmin/operator types to keep your compute estate pointing in the right direction - wherever it runs. With respect to your last comment, I refer you to a paraphrase of a quote attributed to John Rollwagen: "A computer is like an orgasm - it's better when you don't have to fake it."
I was lucky in that my career was mainly when we led the world.
The Charlie Chaplin effect for IBM and Microsoft set back the IT world decades, but the lack of real management training and the proliferation of a little bit of financial knowledge led to a real dumbing down of company management.
I can remember a company external audit by a well known player where I was asked to define my role in the company.
They expected reams of items and were surprised when I stated I had one role, which was to ensure that my IT provision allowed everyone else in the company to fulfil their roles.
They were taken aback, but a couple of audits later their new auditor made the same general IT provision statement not realising that I was the original source. Had to chuckle there!
Meanwhile, earlier in my career, when I worked for a County Council, we had a brilliant workforce and led the way for putting computing on desks.
The in house teams wrote software that worked for the local CC staff and with minimal effort, sold it on as a commercial product for other Councils.
We developed other PC and hardware solutions and sold them on.
Then a consultant visited whilst I was on holiday and convinced the new management team that we were no good, so it was pointless relying on internal staff to deliver world-leading software - despite the fact that we also sold a product to UK police forces.
The new yes-men management agreed and wound down in-house provision, using outside contractors and providers with OTS solutions, leading to widespread County staff sickness epidemics due to stress, and coincidentally losing ALL the external users who contributed real money to the budget.
Last I heard, whereas a user could previously contact the development team and point out a bug which would then be fixed immediately, it was taking two years of negotiation with the provider before they would even look at one.
So you need a good management team, not one that relies on adages like "you can't be sacked for buying xxx" etc.
Then you need a good team of workers who actually know what they are doing rather than those "experts" who don't.
My old CC lost all the good workers, but the management went on to bigger things and higher wages, even though the relative provision was down to about 10% of what we used to provide.