Want a well-paid job in tech? You just need to become a cloud-native god

At KubeCon North America, I did a little exercise I've done before at major technology shows. I went around the booths in the exhibition hall and asked a very simple question: "Are you hiring?" The answer from two-person startups still building up from their personal credit cards to Fortune 500 companies was always the same: Yes …

  1. breakfast Silver badge
    Holmes

    Someone Else's Computer certification

    I'm looking at starting a new project for myself that I'd like to grow to a good size and it has been really interesting trying to figure out whether I should go cloud-first when it comes to hosting it. I've talked to friends who specialise in the area and the general conclusion I have come back to time and again is that simply renting a server to host it is going to be cheaper than using cloud services.

    That is really interesting to me - it has been hard to figure out the right architecture because any search for good practice leads to thousands of content-free articles of marketing dross so finding trustworthy sources has largely been a matter of asking people who know - but it does explain why there is such a push for Cloud everything - it is presumably far more profitable for the companies hosting it.

    From what I can tell, it is useful if you are likely to have very bursty traffic and you want to be able to scale out when a lot of requests come in then scale back afterwards. Running a server capable of handling those bursts is likely to be more expensive, but if you're expecting steady traffic levels then expect higher costs for equivalent cloud services.

    Is this right? I'm still trying to figure out how I can best make this work, but that's what I have gathered so far.

    1. elsergiovolador Silver badge

      Re: Someone Else's Computer certification

      Yes, a single dedicated server could typically serve millions of users without breaking a sweat, provided your services are not doing anything stupid.

      However, investors are unlikely to back such a setup, as they often have preconceived notions. They might ask, "What if a marketing campaign is hugely successful and you suddenly gain 100 million users? How will you scale up?" or "What if your server malfunctions?" or "You'll need to employ a full-time dev ops team."

      The responses to these concerns are straightforward. Firstly, such a user surge is rare, and if it does occur, there's a chance that your service hasn't been tested in the cloud for such a scenario and could fail regardless. Simply adding more machines isn't a solution; your entire application architecture must be designed to handle such a scenario from the beginning. Cloud or not, this requirement doesn't change.

      In case of server failure, having a failover system is essential. Setting this up isn't too complex, and cloud systems can fail too, often with less control on your part.

      Regarding the need for a dev ops team, even a small team can be more cost-effective than using the cloud.

      However, convincing investors to support self-hosting is challenging. They prefer having a third party, like a cloud provider, to hold accountable if things go wrong, rather than their direct investment.

      Traffic patterns typically fluctuate, often resembling a sawtooth wave, depending on the service. Therefore, your servers need to be equipped to handle these peak periods.

      1. Code For Broke

        Re: Someone Else's Computer certification

        Thank you ChatGPT for chiming in.

      2. Doctor Syntax Silver badge

        Re: Someone Else's Computer certification

        "However, investors are unlikely to back such a setup"

        But is the OP even looking for external investors?

        1. Anonymous Coward

          Re: Someone Else's Computer certification

          Yes, unless they plan to pluck money out of thin air or are willing to wait years for the revenue to pick up.

      3. Anonymous Coward

        Re: Someone Else's Computer certification

        "Firstly, such a user surge is rare"

        Depends on the service and what it does. I look after platforms that are related to sporting events, it can be quite easy to predict surges in certain circumstances...surges rarely come out of nowhere...they usually arrive as the result of marketing campaigns, events linked to your service, special offers etc etc.

        "there's a chance that your service hasn't been tested"

        100% this. It isn't just a chance that it hasn't been tested: if your entire team is made up of nothing but software developers, there is a 100% chance it hasn't been tested.

        "dev ops team"

        Guys let's keep this family friendly ok.

        You don't need DevOps...ever. In 2023 we still aren't entirely sure what it actually is.

        "However, convincing investors to support self-hosting is challenging. They prefer having a third party, like a cloud provider, to hold accountable if things go wrong, rather than their direct investment".

        Fully agree.

        I would typically argue against self hosting for the most part...at best a hybrid setup is worthwhile sometimes...entirely self hosting is far too expensive these days...unless you use a datacentre in the middle of nowhere like Milton Keynes...even then, it's not really competitive and the admin overhead of sending a guy out there once a week for routine maintenance, hardware replacement etc is rarely worth it...if there is a problem it's far quicker to respond to an issue with cloud infrastructure, than it is to send a guy to the DC.

        1. damienblackburn

          Re: Someone Else's Computer certification

          >entirely self hosting is far too expensive these days

          You had me agreeing up until this point.

          Even factoring in travel time and expenses to go to a remote datacenter, the costs weigh heavily towards self-hosting being cheaper. Whether that's in a dedicated datacenter in your office or doing colo at a nearby site. Colo costs are almost a rounding error to any medium or large business, and you can get a rack or two for almost nothing a month. The cost of actual iron can be amortized over their lifetime, making them an asset which is beneficial for taxes, rather than an expenditure like cloud would be. Trying to argue CapEx vs OpEx is just financial masturbation, it's still money coming out of the same bucket.

          The only time exclusively cloud can be cheaper is if your systems and architecture are built for it. To run as lean as possible and to scale up and down as needed. That can be cheaper, since you're paying for compute time, vs having a hypervisor or even bare metal run with a lot more empty cycles. But as I said, your system has to be built for it. If you cannot handle that kind of dynamic scaling without getting hands-on with the machine, you're at best going to want to do a hybrid setup with the parts that can rapidly scale in the cloud and those that can't on-prem.

          None of this is even remotely questionable. The costs of compute time, storage, bandwidth, and even other ephemeral services like Kubernetes on the cloud are known, documented, and publicly available. Doing a cost analysis is child's play and quickly shows where the benefits lie.
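          As a rough illustration of that kind of cost analysis, here is a minimal Python sketch. Every price in it is a hypothetical placeholder rather than a real quote, and it assumes one cloud instance does about the same work as one dedicated server. The function names are made up for the example.

```python
# Hypothetical prices for illustration only; plug in real quotes before drawing conclusions.
PRICE_PER_SERVER_MONTH = 100.0   # assumed flat monthly rent for one dedicated server
PRICE_PER_INSTANCE_HOUR = 0.30   # assumed on-demand price for one comparable cloud instance


def dedicated_cost(load_profile, price_per_server_month=PRICE_PER_SERVER_MONTH):
    """A fixed fleet has to be sized for the peak and is paid for all month."""
    peak_servers = max(instances for _, instances in load_profile)
    return peak_servers * price_per_server_month


def cloud_cost(load_profile, price_per_instance_hour=PRICE_PER_INSTANCE_HOUR):
    """Pay only for instance-hours used, assuming scaling tracks demand perfectly."""
    instance_hours = sum(hours * instances for hours, instances in load_profile)
    return instance_hours * price_per_instance_hour


# Each profile is a list of (hours, instances_needed) pairs covering a 720-hour month.
steady = [(720, 4)]             # constant load all month
bursty = [(700, 1), (20, 10)]   # mostly idle, with a 20-hour spike

for name, profile in (("steady", steady), ("bursty", bursty)):
    print(name, "dedicated:", dedicated_cost(profile), "cloud:", round(cloud_cost(profile), 2))
```

          With these made-up numbers the steady profile comes out cheaper on dedicated hardware and the bursty one cheaper in the cloud, which is the pattern described above. Real bills also include bandwidth, storage and staff time, which this sketch ignores.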

          1. Anonymous Coward

            Re: Someone Else's Computer certification

            Everything you say makes sense, but you're only approaching "cost" from the perspective of the outlay to get the infrastructure built and maintained...you're not factoring in other, far more expensive factors.

            For example, potential lost earnings due to down time.

            I was called into what is now a client of mine that was entirely self hosted because it was "cheaper". They had a problem with the site that could only be fixed by sending an engineer onsite. The engineers were London based but the datacentre was out in the sticks somewhere near the midlands. It took 3 hours for an engineer to get out there by which time, they had lost 5% of their subscribers...which came to around £500k of lost revenue for that year...at least. It picked back up again the next year, but it's hard to quantify the actual overall loss because it's impossible to tell which subscribers that left that year, came back the next year and which "new" subscribers were actually returning customers. So it's entirely possible the knock on effects of losing that 5% continues to this day...who knows?

            You could argue that it was worth their while to find a local engineer near the datacentre to look after things in the datacentre, but because of where the DC was located, that was nigh on impossible...they tried but they just couldn't find anyone.

            Ultimately, despite Cloud costs (at the time) being slightly higher than self hosted costs, they worked out that overall it would save them more money than the price difference in just resolving problems and cutting back on downtime.

            I was tasked with taking their 4 rack self hosted solution and making it cloud based. It is now spread across a couple of vendors (who you could consider to be the equivalent of additional racks) and their downtime has been zero ever since because if one provider is having problems, you still have 2 other providers that probably aren't. The chances of all three going down at once are negligible...whichever one is struggling the least, you ramp up your resources on, and ramp down elsewhere while it gets sorted out.
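            To put a rough number on "negligible", here is a small Python sketch of the usual back-of-envelope calculation. It assumes provider outages are independent, which is optimistic (shared DNS, a common upstream or a bad deploy can take out several at once), and the 99.9% per-provider figure is a made-up example rather than anything from an SLA.

```python
HOURS_PER_YEAR = 8760


def combined_availability(availabilities):
    """Probability that at least one provider is up, assuming independent failures."""
    p_all_down = 1.0
    for availability in availabilities:
        p_all_down *= (1.0 - availability)
    return 1.0 - p_all_down


def expected_downtime_hours(availability, hours=HOURS_PER_YEAR):
    """Expected hours per year with nothing available to serve traffic."""
    return (1.0 - availability) * hours


single = 0.999                                # hypothetical per-provider availability
triple = combined_availability([single] * 3)  # three providers, any one of which can serve

print("one provider, expected hours down per year:", expected_downtime_hours(single))
print("three providers, expected hours down per year:", expected_downtime_hours(triple))
```

            The same sum applies to racks in separate datacentres, so it says nothing about cloud versus self hosted as such, only about how many independent places you can run from.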

            For this particular client, since moving to cloud only, their customer retention has skyrocketed. They don't have anywhere near as many people unsubscribing now as they used to when the self hosted solution was in place...simply because the customer experience is better...it's incredibly rare for them to have to send out an email about "scheduled maintenance" etc these days because they don't have to take the whole solution down to perform maintenance on it. It's a much smoother experience all round.

            They also don't have to wait for "out of hours" maintenance windows to make changes / updates to the system...they can do it piecemeal, test it and if all is well...roll it out to the whole solution. If the test results come back negative, they can scrap the knackered box, spin up a pre-update box from a template and no harm done.

            Yes, you can do this sort of thing in your own racks, but there are certain things you just never have to deal with when using the cloud...disk failures, power supply failures, fucking holes in the roof that pigeons can get in then shit on your racks, get feathers in your fans, the roof falling in after a storm etc etc....managing racks in a datacentre is a pain in the ass...it's fucking cold (or fucking boiling), uncomfortable, takes ages to get there...it just sucks...and if your client is too cheap to use a datacentre and insists on having the racks at "the office" then you tend to find yourself in rat infested basements, you get shitty cooling, possible damp...all sorts of potential problems because of cutting costs.

            "Colo costs are almost a rounding error to any medium or large business"

            That largely depends on the business and how it models its forecasts and which growth stage it is in. Colo and datacentres come with long term contracts...if you're in a growth phase, then COLO or a datacentre can be a really poor choice because you might outgrow the contract too quickly...the last thing you want is a datacentre with you over a barrel, because they know it's a pain in the ass for you to move elsewhere...with the cloud, there are no long term contracts and you can easily move at anytime...

            Cloud is expensive short term, but in the long run saves you money.

            Self hosted is dirt cheap short term, but in the long run fucks you.

            1. damienblackburn

              Re: Someone Else's Computer certification

              >For example, potential lost earnings due to down time.

              That is a problem on both cloud and on-prem hosting. Cloud providers routinely shit the bed. Not even a few months ago Azure lost connectivity to South America. That's not an 'oopsie', that's catastrophic. We've seen multiple instances of S3 going down faster than a crackhead at the end of the night.

              >snip a lot of whinging about an on-prem client having downtime

              First, if 5% of your customerbase leaving due to a short downtime like that costs you 500k GBP a year, you've already fucked up your architecture extremely badly. That tells me they didn't have redundancy in place, or at the very least didn't test it to ensure that failover works. It doesn't matter if you are hosting on-prem or in the cloud, if you don't look at every layer of your architecture and see where SPoFs are and how to mitigate, and then *test* that mitigation, to quote Nigel Ng: "you fucked up".

              A lot of this sounds like they needed a redesign, which you did, building a system that actually has HA built into it, something they either A) didn't have or B) didn't test.

              ED: I didn't even realize, why did the DC not even have hands and eyes available? Yeah, they charge a lot. But they're still available. And if it's hardware, you should have service contracts from a vendor who can get out there quickly.

              >Yes, you can do this sort of thing in your own racks, but there are certain things you just never have to deal with when using the cloud...disk failures, power supply failures, fucking holes in the roof that pigeons can get in then shit on your racks, get feathers in your fans, the roof falling in after a storm etc

              You absolutely DO have to deal with that shit in the cloud, you're just third or fourth party to it. Even though it's "the cloud" it still is a virtual machine that exists on a hypervisor running on a real piece of equipment somewhere. Ephemeral doesn't mean it's all existing in the land of make-believe. Cloud provider DCs can and do suffer failures, whether those are manmade or natural. Again, you build HA into the system from the get-go, it doesn't matter if a single drive or PSU goes or a whole rack goes.

              >That largely depends on the business and how it models its forecasts and which growth stage it is in.

              If you're suffering from rampant growth, it's because you didn't plan shit out. That's the whole point of the SDLC. You should know before you go to prod what you need to achieve system stability and allow for growth. Your SIT/preprod should entirely be a clone of production and subject to all the full load testing you can throw at it, simulating real world consistent and peak load, and your resources adjust to meet that demand. You have rack space to grow as needed as well. Make sure you have enough CPU and memory on your hypervisors to spin up additional VMs as required. Make sure you have enough JBOD or SAN storage for databases and more. By the time you hit 50% load you should know how long before you get pegged out and can grow without everything being on fire.
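              The "how long before you get pegged out" estimate can be sketched in a few lines of Python. This assumes load compounds at a fixed monthly rate, which real traffic rarely does so neatly, and the function name and the 5% growth figure are made up for the example.

```python
import math


def months_until_full(current_load, capacity, monthly_growth_rate):
    """Months until load reaches capacity under steady compound growth.

    current_load and capacity share a unit (requests/s, CPU cores, TB, ...).
    monthly_growth_rate is a fraction, e.g. 0.05 for 5% per month.
    """
    if current_load >= capacity:
        return 0.0                # already pegged out
    if monthly_growth_rate <= 0:
        return math.inf           # flat or shrinking load never fills the capacity
    return math.log(capacity / current_load) / math.log(1.0 + monthly_growth_rate)


# At 50% load and a hypothetical 5% monthly growth there is a bit over a year of runway.
print(round(months_until_full(50, 100, 0.05), 1), "months")
```

              Comparing that runway against hardware lead times and contract renewal dates gives a rough idea of when more kit needs ordering.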

              1. Anonymous Coward

                Re: Someone Else's Computer certification

                "That tells me they didn't have redundancy in place"

                Exactly, the entire DC lost power and when it came back, loads of kit was fucked, quite a lot of it not in our racks and part of the DC infrastructure...old crappy switches that hadn't been tested or replaced for years etc etc.

                We had redundancy in our kit, but the DC itself (across multiple racks) was a single point of failure...replicating the whole setup to a second DC wasn't feasible.

                When you start exploring multiple DCs for added redundancy, the costs go well over the cost of the cloud. Multi-region cloud costs peanuts compared to a multi-site self hosted solution...and with cloud it's only 1 contract.

                "5% of your customerbase leaving due to a short downtime like that costs you 500k GBP a year"

                It wasn't short, it was nearly a week and mostly out of our control...we could have flipped to cloud quite easily, but the DC was incredibly vague on its status updates so we weren't armed with the right information to make that decision..."tomorrow, tomorrow, tomorrow".

                The fact of the matter is, if you are cloud first, it's really easy to spin up on a different region...or even a different provider if you need to. With a DC, you can't just rock up with a few Luton vans and shift all the kit on a whim...it's just not feasible or practical.

                Unless you are massive and absolutely made of money, self hosting makes no sense.

                Also, £500k isn't actually that bad in the grand scheme of things, it could have been a lot worse...£500k to this client is approximately 1,000 subscribers (more or less £50 a head a year)...they had around 20,000 subscribers total (at the time, it's much larger now)....so it wasn't a massive userbase at the time...it is quite a data heavy site though so more expensive to run than most sites per head.

                The major headache is that the site relies on sports statistics and users making decisions based upon those statistics week by week...so a week's worth of downtime ruins the platform for some people, because losing a week means you have to catch up a week with a lot of manual fixes to the data, so by the time you complete all that, the user has 1 day or less to make changes to their setup in time for the next set of sports fixtures...so one week of downtime for you translates into a few weeks of degraded experience for the user before things settle down again.

                "If you're suffering from rampant growth"

                I wasn't alluding to constant unpredictable growth, I was alluding to being stuck in a 10 year contract and having years with no growth as well as large amounts of growth...sometimes you have to scale down as well as up...outside of Silicon Valley it's not just grow, grow, grow...sometimes it's shrink, shrink, shrink!

                Even if you do plan for massive growth, your 10 year contract is still a fixed number of racks, power and bandwidth. It's often very difficult to renegotiate a datacentre contract halfway through, because they have you over a barrel...I defy anyone to be able to go to a DC and say "Mate, can you do us one more rack at the same rate and just tack it on the existing contract, ta!" and actually get it...you will always end up with a worse rate, and even worse than that, you'll seldom get a rack in the same row, unless you're lucky and the DC you're in has an empty rack right next to you, which brings its own massive headaches because now you've got a link between racks that isn't 100% managed by you and goes through kit you have no clue about.

                As an engineer do I love the concept of self hosting and having all my own kit to mess around with...hell yeah...does that matter from a business perspective and is it practical? Hell no.

                Heck, as a freelancer, I self host a lot of my own stuff...but none of it is customer facing...I self host my git repos, wiki, storage etc etc and I have infrastructure for modelling other people's networks etc on a Proxmox cluster...but for client facing stuff, I use the cloud, across several providers and several regions...because I'm not fucking mental...I need to spend my time doing business and sorting other people's problems out, I don't get paid to maintain my own kit...any time spent looking after my own stuff is time I could have billed someone else for.

                Similarly, my clients don't pay me to build the "sickest racks" known to man, they pay me to build something stable, predictable and scalable for a reasonable price.

        2. Fred Daggy
          Unhappy

          Re: Someone Else's Computer certification

          Agree ... and disagree.

          Surges rarely come out of nowhere, but short sighted thinking and a lack of communication by management often throw IT under a bus. Imagine this conversation "Shit, numbers are bad this week." "ah, do we launch a sales promotion, 30% off products x, y and z" "Great thinking".

          Communication goes out ... to the market, but not to IT. Sales go bananas ... or would have, except no capacity exists on the infrastructure that works well 99% of the time. Or could have been provisioned with just a little notice, but this communication hits after working hours.

          Also, no money for training or modernising a monolithic app. Dev team wants to modernise, Infrastructure wants to modernise, but bean counters win the argument "No money for training, no money for outside consultancy and we have to amortise this investment".

          Cloud native could have saved the day here. But on-prem works well 99% of the time. It's up to the management to work out what they want and the workers to make it happen.

    2. Code For Broke

      Re: Someone Else's Computer certification

      @breakfast: You nailed it: Fear. Uncertainty. Doubt; Fear. Of. Missing. Out.

      All the fear leads to FUBAR, which I won't break out here.

    3. Lyndon Hills 1

      Re: Someone Else's Computer certification

      You might want to read this by Troy Hunt of Have I Been Pwned. It is an article about bursty traffic and scaling.

      1. breakfast Silver badge

        Re: Someone Else's Computer certification

        This is really interesting, thank you!

        1. Dickie Mosfet

          Re: Someone Else's Computer certification

          Here's the argument for sticking with monolithic architecture (with extra humour and sarcasm)...

          "Death by a thousand microservices: The software industry is learning once again that complexity kills"

          https://renegadeotter.com/2023/09/10/death-by-a-thousand-microservices.html

          [Not sure how I'm supposed to hyperlink if I can't use HTML]

    4. Steve Button

      Re: Someone Else's Computer certification

      The answer is... it depends.

      Is your project going to be truly "steady traffic" 24x7, or does it have steady traffic only during the day? If you design for the cloud you could use much smaller servers and scale out as needed. Migrate to a bigger server instance size? Piece of piss.

      Also, what happens if your power goes out for several hours? Are you going to keep a UPS?

      Have you factored in the cost of electricity for this server?

      When you say "simply renting a server" do you mean at a colo, or in your own premises?

      Even with a colo, you are risking a long outage if they have a flood / fire / something you haven't thought of. So, you're gonna need two which are geographically apart.

      If your project is down for a week, are all your customers going to disappear and go somewhere else? Or will they not care?

      I personally would not dream of hosting something myself unless it was very small and I didn't care about downtime, or very big and not at all bursty. I actually can't think of a small project that I would host myself, but I guess there might be some.

      Cloud just takes away so many headaches. They have top class security (Not Azure though). Three AZs if you want it. Backups are super easy. Power supply is reliable. Database administration is super easy. As much bandwidth as you can handle. You can go "serverless" with FaaS, at least for some jobs / workloads, which means you can scale to zero.

      OTOH, it can be a confusing nightmare and they seem to release dozens of new services every month. IAM can be particularly mind blowing. If you want to even just list the tags on an EC2 snapshot with Python, you'll end up digging into some horrid JSON queries. Oh, and cost CAN get way out of control if you don't keep on top of things. It's just too easy to spin up a new xyz and forget to shut it down. But there are LOTS of things you can do to keep cost down.
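      For what it's worth, the snapshot tags case looks roughly like this with boto3. This is only a sketch: it assumes boto3 is installed and credentials are already configured, and both function names are made up for the example. The nested response layout ("Snapshots", then a "Tags" list of Key/Value pairs) is what `describe_snapshots` returns, which is where the digging comes in.

```python
def tags_by_snapshot(response):
    """Flatten a describe_snapshots response into {snapshot_id: {tag_key: tag_value}}."""
    result = {}
    for snapshot in response.get("Snapshots", []):
        # The "Tags" key is simply absent when a snapshot has no tags.
        tags = {tag["Key"]: tag["Value"] for tag in snapshot.get("Tags", [])}
        result[snapshot["SnapshotId"]] = tags
    return result


def list_own_snapshot_tags(region_name):
    """Fetch tags for every snapshot owned by the calling account (needs AWS access)."""
    import boto3  # third-party; imported here so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_snapshots")
    all_tags = {}
    # OwnerIds=["self"] restricts the listing to your own snapshots, not public ones.
    for page in paginator.paginate(OwnerIds=["self"]):
        all_tags.update(tags_by_snapshot(page))
    return all_tags
```

      Calling `list_own_snapshot_tags("eu-west-2")` (the region is just an example) would return a plain dict, which is easier to work with than the raw response.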

      1. breakfast Silver badge

        Re: Someone Else's Computer certification

        Most cloud services go down a few times a year, I don't think this would be significantly better or worse, but it's really helpful to have this point of view put forward. Pretty glad I asked this question here, honestly, because there's a lot of real-world experience on these forums.

        Hopefully however I host things, if I have my architecture well thought-through and everything containerised I should be able to change my mind if I need to, but both this and the article linked above make me think maybe it's possible to use the cloud in a way that isn't just throwing away money.

        1. jake Silver badge

          Re: Someone Else's Computer certification

          "Most cloud services go down a few times a year,"

          The systems I install go down maybe once per decade, and that's usually a fire, or direct hit by lightning, or malicious employee kind of event.

          Can you imagine what the reaction in Corporate America would have been if DEC or Burroughs or Sperry or IBM had made just one release that went down "a few times per year"? The company's stock would have tanked, they would never have been trusted again, heads would have rolled ... ugly wouldn't even begin to describe it. And yet, in today's "Clouds" Service Bureaus it's not only normal, it's touted as a benefit ... "ONLY GOES DOWN A FEW TIMES PER YEAR!!!!! WOW!"

          I blame Microsoft for making inherently broken computers and computing the norm in corporate culture.

          1. ecofeco Silver badge

            Re: Someone Else's Computer certification

            I blame Microsoft for making inherently broken computers and computing the norm in corporate culture.

            Where it properly belongs along with the cockwomble stan MS fellators who perpetuate it.

        2. Roo
          Windows

          Re: Someone Else's Computer certification

          I think the biggest win w.r.t. (a big) Cloud provider is the ability to serve content globally - and provide some redundancy in the case of an entire region going down. That said there is nothing stopping you from developing your system to be capable of being run in the cloud or on your own host(s), it's not rocket science (IMO).

      2. Anonymous Coward

        Re: Someone Else's Computer certification

        It's probably better to think of "self hosting" from the perspective of your setup having a "frontend" and a "backend".

        Typically, you wouldn't bother self hosting the "frontend" of your infrastructure, but some elements of your "backend" might be self hosted, even if they aren't publicly accessible.

        A lot of businesses will have their public facing infrastructure on the cloud, but have a self hosted "light weight" replica in house. For two reasons. Firstly, it makes development a lot easier if you have a couple of clones of your live site. One for staging and one as "semi-production"...secondly, if you have a "sane" in house clone that you have to maintain, it makes it easier to understand your setup and redeploy it somewhere else if you need to change provider or something.

        Having a self hosted setup isn't always about serving the visitors to your website. Sometimes it can just be there to keep your development team honest.

      3. damienblackburn

        Re: Someone Else's Computer certification

        >Is your project going to be truly "steady traffic" 24x7, or does it have steady traffic only during the day? If you design for the cloud you could use much smaller servers and scale out as needed. Migrate to a bigger server instance size? Piece of piss.

        And this is where cloud shines, if you'll pardon the pun. The *only* area it does.

        >Also, what happens if your power goes out for several hours? Are you going to keep a UPS?

        Depending upon colo, they'll have redundant power at the rack, the floor, or nothing. You can get a rackmount UPS, they're cheap. They may also have onsite batteries that take over before generators do. This is pretty common tech.

        >Have you factored in the cost of electricity for this server?

        I've not seen a ton of colos charge for power, they charge by the rack and then charge for bandwidth over a certain amount.

        >Even with a colo, you are risking a long outage if they have a flood / fire / something you haven't thought of. So, you're gonna need two which are geographically apart.

        You're risking the same thing with cloud. You need to put your application into different availability zones, georedundant, of course minding any laws applicable. Your application also needs to be able to handle that. Doing that without the cloud is also stupid simple, you can use a forwarding service like Akamai or Cloudflare, or even just DNS round robining. And you better hope that the regional outage, like S3 going down for all of South America, doesn't affect your multiple AZs.

        >Cloud just takes away so many headaches. They have top class security (Not Azure though).

        lmfao. Security in cloud is what you build into it. Also you have zero visibility into their internal layers, so you have no idea if it's paravirtualization or true virtualization, which can provide mitigating factors if someone bypasses the hypervisor. Plus the other parts like S3/Blobs/etc.

        >Backups are super easy.

        So are they with on-prem or even shared hosting.

        >Power supply is reliable.

        Ditto.

        >Database administration is super easy.

        Ah yes, because you can get a database on an engine that you have no direct control over.

        >As much bandwidth as you can handle.

        That you pay for. Same as with self-hosted.

        >OTOH, it can be a confusing nightmare and they seem to release dozens of new services every month.

        That doesn't scream reliability, that screams "sh!t changes all the time and you'll be scrambling to find the bug^H^H^H undocumented feature".

    5. Anonymous Coward

      Stackoverflow

      The history of IT is full of startups who did not build scalability in from day one and have had to pay the price later for their lack of foresight.

      Stackoverflow started as a .Net hack by people who came from the MS sphere. Now, they have maintenance windows every now and then and the servers are off and their infra is a significant constraint on their business model.

      Facebook started in PHP and had to migrate to faster technologies.

      You can still have a single server (as in "single machine", not as in "monolithic app") AND be cloud native (as in "container based micro-services"). Nothing prevents you from running in a small K3s env on some rented HW. When the day comes that your app enjoys significant traffic, you're ready to scale up. You can enjoy the benefits of ISSU (In-Service Software Upgrade) and multi region deployments. You can deploy on EKS, AKS or anything similar. You don't have to refactor from the ground up.

      1. breakfast Silver badge

        Re: Stackoverflow

        This is definitely something I'm trying to design for from day one, it seems that a well-separated and containerised application can be moved around as needed quite easily.

    6. Anonymous Coward

      Re: Someone Else's Computer certification

      It really depends who you speak to and there are a lot of factors at play depending on the type of site / service you're planning to run.

      I personally prefer the manual scaling approach because it allows finer control over costs and despite having to pay for a person to manually scale things you will save money in the long run.

      I have several clients with large sites that I manually scale. Sometimes I have to be up and about at 1am - 2am to scale things back down, and sometimes I have to be up at 6am to scale things up...but using previous information captured we are able to predict with a decent degree of accuracy what we need to scale to and from.

      One of my clients I snatched off a firm that insisted on auto scaling everything, they were so adamant that autoscaling was the way, that they went completely off the fucking wall when I was asked to step in to try and reduce costs...they called me all kinds of names...essentially calling me an old fashioned dinosaur (but not as polite). I've had this client on and off for about a decade, but I've recently had them back for about 18 months, and in that time, I've reduced their AWS bill by around 75%. Which comes to approximately £90,000 a year. I cost them around £12,000 a year. It doesn't take Stephen Hawking to calculate that even though they are paying me double what the previous guys cost, who left the scaling to happen automatically, they are still saving a whole bucket load of cash.
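
      The sums above can be sketched roughly as follows. This assumes the £90,000 is the annual saving (i.e. the 75%), which is one reading of the figures rather than something stated outright, and that "double what the previous guys cost" puts the previous firm at about £6,000 a year.

```python
# Figures quoted in the comment; reading 90,000 as the annual saving is an assumption.
annual_saving = 90_000        # GBP per year, described as roughly 75% of the old bill
saving_fraction = 0.75
consultant_cost = 12_000      # GBP per year
previous_cost = consultant_cost / 2   # "double what the previous guys cost"

old_bill = annual_saving / saving_fraction        # implied original AWS bill
new_bill = old_bill - annual_saving               # implied bill after optimisation
extra_fees = consultant_cost - previous_cost      # additional spend on people
net_saving = annual_saving - extra_fees           # saving after the higher fee

print(f"old bill ~ GBP {old_bill:,.0f}")
print(f"new bill ~ GBP {new_bill:,.0f}")
print(f"net saving ~ GBP {net_saving:,.0f} per year")
```

      On those assumptions the higher fee eats only a small slice of the saving.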

      The key thing though is proper testing...see not all of that saving was due to timing the scaling more effectively, I also trimmed a decent amount of fat by properly testing the site, finding the heaviest SQL queries and optimising them. I also tested out various tiers of EC2 instances and RDS instances to work out where the best bang for the buck was when choosing tiers to scale to. The previous guys were going straight to m5.12xlarge on all the RDS instances each weekend for 48 hours because they were lazy as fuck...with some testing, I was able to work out that the performance on the 12x was actually worse than the 8x, 8x is roughly half the price of 12x...not sure why, but I suspect the 12x instances run on different CPUs or something...I was also able to spot that the bottleneck on the database was storage...the original guys had only allocated a small amount of storage on bog standard GP2 storage, which meant no matter how much RAM or CPU you threw at the database, you would still be limited by IOPS. So I optimised the storage as well to bump up the IOPS (which is remarkably cheap compared to just throwing more RAM and CPU at the problem)...this allowed me to drop the RDS tiers another notch, which essentially halves the price again.
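
      The storage point follows from how gp2 baseline performance scales with allocated size. A minimal sketch, assuming the figures AWS documents for EBS gp2 volumes (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000); RDS storage may behave differently, and the function name is made up for illustration.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Approximate baseline IOPS for a gp2 volume of the given size.

    Assumes the documented EBS gp2 rule: 3 IOPS per GiB,
    never below 100 and never above 16,000.
    """
    return max(100, min(16_000, 3 * size_gib))


# A small allocation caps the baseline no matter how big the instance is.
for size in (20, 100, 1000, 6000):
    print(f"{size:>5} GiB -> {gp2_baseline_iops(size):>6} baseline IOPS")
```

      This is why allocating more storage can be a cheaper fix than a bigger instance tier when the database is I/O bound.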

      After all this optimisation, the scripts the previous guys wrote to "spin up more EC2 instances when needed" rarely kick in, because the CPU usage of the EC2 instances remains below the threshold that was set (60%) before additional instances are spun up...so we no longer stray into having tons of extra EC2 "app server" instances running. We end up with one extra box, at most.
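
      The behaviour described above can be sketched as a simple threshold rule. This is a hypothetical illustration, not the previous firm's actual scripts: only the 60% threshold comes from the comment, while the function name and the instance cap are assumptions.

```python
def desired_instances(current: int, avg_cpu_percent: float,
                      threshold: float = 60.0, max_instances: int = 10) -> int:
    """Add one app server when average CPU exceeds the threshold.

    Below the threshold the count is left alone, so a well-optimised
    site rarely triggers any scale-out at all.
    """
    if avg_cpu_percent > threshold and current < max_instances:
        return current + 1
    return current


# Before optimisation CPU sits above 60%, so extra boxes keep appearing;
# after optimisation the same rule simply never fires.
print(desired_instances(current=1, avg_cpu_percent=75.0))
print(desired_instances(current=1, avg_cpu_percent=45.0))
```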

      Circling back to the "it depends" comment...never let your software developers determine the infrastructure that their software runs on. They will always throw RAM and CPU at the problem, they will rarely blame their own code or lack of optimisation. Always find an infrastructure specialist that also has some reasonable software development skills so that they can overlap with the dev team and understand the software...yeah I know, we are rare...but we are out there...yes, we cost quite a bit of money...but that cost is often considerably less than the money you will save and often we will charge you based on what we think we can save you. The kind of clients I look for are spending over £100k a year on AWS, turnover at least twice that and I look to save about 50% in costs, of which I take 10-20%. I aim to eventually pay for myself out of your savings, so that I essentially cost nothing.

      At worst, someone like me will result in costing exactly the same as the "shitty guys" but the user experience on your website will be far better and result in fewer complaints, downtime and problems and at best will slash up to 80% of your costs. At worst, it will take 6 months for you to start seeing the savings, at best you will get savings immediately simply by moving to a manual scaling model.

      Automatic scaling can work well, but you will be paying through the nose for it up front, and you won't know if the resulting hosting bill is optimal or not. A lot of the time, you can end up with an autoscaling setup that is still profitable, so you might end up with no reason to optimise it, on the face of it, but a good infrastructure guy will still find you savings and make your business even more profitable...they are worth paying for.

      Not only that, but if you are looking for further investment to take the business forward, an infrastructure guy can help you properly describe the functionality of the site, the layout of the infrastructure and the usage patterns of the site to help you make forecasts to take to potential investors...a software developer will never be able to do that.

      Modern Software Developers come off the University production line, they are ten a penny. Infrastructure engineers like me typically don't...they are the guys that build stuff on the weekend for fun and have done for years because no matter how much you spend on going to University, you're never going to get your hands on a wide range of kit to experiment with...the only way you can build a decent knowledge in infrastructure, is to spend money buying second hand kit off eBay, messing with it, building actual infrastructure with it, then flipping it back when you're done.

      I think most of us in a certain age / skill bracket, probably got into the industry this way.

      This all said, not all Uni grads are fucking hopeless, they just don't know any better...I've taken quite a few "fresh from the factory" folks and turned them into machines...but you have to catch them early...because once they've been out there for a while, playing the same fucking dead song, it becomes impossible to tune them and they end up at "development agencies" churning out badly optimised, expensive bollocks.

      Digital Agencies out there, if you want to keep your clients beyond the last milestone of your development "roadmap", you need infrastructure guys...the very guys waiting to pounce and steal your clients as soon as you launch the product...you are incredibly vulnerable because CEOs talk to each other and people like me talk to CEOs...so it doesn't take long to sniff out badly run projects then swoop in for the kill...arguing that your large team is better than a one or two man band is easily drowned out by saving £100k-£200k a year and you can't compare a team of 10-20 juniors with a two man crew of highly experienced engineers. We will stomp you into the mud every time.

      1. martinusher Silver badge

        Re: Someone Else's Computer certification

        University degrees are not vocational qualifications. We were told this when I started my engineering degree -- the degree gives you the tools and background to learn and you learn by working hands on for a few years.

        So as far as graduates go, there's a big difference between "I aced my coursework" and "I got by -- just -- because I was messing with stuff all the time".

        1. Anonymous Coward

          Re: Someone Else's Computer certification

          "So as far as graduates go..."

          I dunno man, I recently helped my nephew with his comp sci coursework...he did manage to pass the course with a first...but my god was the course content basic...it's come a long way since I considered uni (I ultimately didn't go because at the time the internet was relatively new and the course material on most comp sci courses was insanely out of date, I essentially worked out that by the time I'd finished the course and the dotcom boom was over, I'd actually be over a decade behind everyone else, therefore I took the decision to go straight into work and grind it out for a few years, which paid off massively).

          Going back to modern Comp Sci courses...they seem to be very broad but not very deep and they appear to rely on students taking the time to mess around with technology to fill in the gaps, which a lot of students are not doing.

          The coursework for my nephew was essentially a technical pre-sales document. They gave him a spec and set of requirements and he had to go off, price it up, diagram it and prototype it. Boiling it all down, it was centred around the requirements for a 10 person business with a single server....so an absolutely bog standard Windows Server small business domain setup. One server, 10 PCs, a router, a switch, a WAP, a printer etc etc...it seems ridiculously basic to me because it set me off thinking about the Microsoft certs I did in my earlier years and "doing stuff" for Bluesky Airlines and Contoso involving massive networks with multiple sites etc etc...

          I would go as far as to say that anyone that managed to get the full Windows 2000 MCSE or the 2003 MCSA essentially studied more technical concepts and considered scenarios more complicated than a typical present day Comp Sci graduate...and back in the day, you could do an MCSE in a week or two at a boot camp.

          The one thing that none of the students leaving comp sci courses have is confidence...which after a few years, you fucking should have...I went into my Microsoft exams cocky as fuck, I did a couple of them rat arsed (due to scheduling problems, I was told my exam was 3 hours earlier than it actually was, so me and another guy I met on the course just went and played pool and got pissed in the intervening 3 hours)...both of us destroyed the exams...we still work as a partnership to this day, 20 years later and as a partnership we've never had a project that has caused us to break out in cold sweats...we've had some tough ones, but nothing that would ever cause us to lose sleep etc...that is all because we are both the kind of people that use resources to buy whatever kit we can to set it up and test it to destruction...we've recently equipped ourselves with a small fleet of Cisco Meraki kit and we've been hacking it left, right and centre to get a full understanding of it...and crucially...figure out how to continue using it when it goes EOL or a license expires...there is currently quite a lot of demand for that as a cost cutting measure right now. :P

          Anyway, my nephew and others on his course have been led to believe they are ready for management positions now they have their degrees...so they've all been applying for management roles and naturally, none of them are getting anywhere with that...so they are stuck...they believe that they are ready to manage teams of engineers and that hell desk work is beneath them...yet none of them have the first clue about how to handle projects or deal with customers.

          On the one hand, I'm pleased that more people are deciding to go into tech via university, but on the other hand I feel sorry for people that go through that system, because they're going to have tough careers and a lot of sharp wake up calls before they are comfortable in their profession.

    7. jake Silver badge

      Re: Someone Else's Computer certification

      "it has been hard to figure out the right architecture because any search for good practice leads to thousands of content-free articles of marketing dross"

      Most of which seem to have been written by an AI. They are usually full of misinformation, outright lies, and in some form of English that doesn't parse for shit.

    8. Displacement Activity

      Re: Someone Else's Computer certification

      @Breakfast - good question. It's next to impossible to find useful information, and these comments are better than anything else I've found online. I had a simpler problem recently. I've been renting (one) server for years, for personal stuff. I now have an app that I need to roll out to a few hundred customers, who have very predictable, and low, CPU and traffic requirements, and who don't need particularly high availability. How do I handle this? Do I stick with renting servers, and put everyone on their own VM? Or do I roll out a cheap VPS for everybody? The only fairly obvious thing was that any solution with the word "cloud" in it wasn't appropriate and would work out way too expensive. I'm trying the VPS route, which seems to be working.

      If you find out anything useful, you should post it somewhere. There's obviously a lot of interest.

  2. abend0c4 Silver badge

    So what do you need for these jobs?

    Far too much.

    In the beginning, developers wrote programs and operators changed tapes and emptied the printer when they weren't on the phone to the service engineer. Then along came minicomputers with the implication that you could do away with operators and the programmers could mind the machines and apply the occasional system update: never quite worked in production environments.

    Cloud operations are vastly - and, often, gratuitously - more complex. It's no longer enough to know how one vendor does things, but several. And the domain-specific languages that are used for "infrastructure as code" are just glaring examples of the old xkcd truism.

    And none of this stuff stays still - look at best practice two years ago and a lot of it has since changed, along with a growing number of deployment environments offering different characteristics and all, of course, distinctly managed. If you were really trying to stay on top of this stuff, what else would you get done?

    And security is a real nightmare and because it's completely opaque to the hosting environment it's effectively off by default so it only takes one microservice to fail to validate an access token and you've got a potential incident.

    And a lot of outfits think that "devops" once more means it's something that programmers can manage in their spare time.

    There aren't that many people in the world who are sufficiently close to the coalface that they're completely au fait with the current status of the tools - and they're probably too busy developing them to do much else.

    In short, it's all far too complex for the average business to manage with the staff they have or are likely to be able to recruit or afford. There will be places that need to tinker with the plumbing to get the best result, but for this technology to be usable in the majority of situations, it has to Just Work out of the box.

    1. Steve Button

      Re: So what do you need for these jobs?

      Isn't this just progress?

      150 years ago, your local blacksmith and carpenter (might be the same person) could put together a working vehicle. You can still have that now, if you want. But you'll need a horse, and it'll take you a long time to get to Manchester instead of just jumping in the car and going up the M6.

      OTOH, if you want a modern car it's going to take a team of hundreds (or thousands) to design and build it. And it's fucking complex.

      But we choose the car (or train or plane) over the horse and cart because it's so much more efficient.

      Isn't that the same thing with DevOps and Cloud Native, compared to building a server by hand?

      As an aside, there aren't THAT many Infrastructure as Code systems. Pretty much everyone uses Terraform, and you've got AWS CloudFormation, Azure ARM and GCP Deployment Manager. I don't count Puppet/Chef/Ansible as they are really primarily Config Management.

      Having said that, my job as a DevOps Engineer (or SRE or Platform Engineer) is SO much harder than it was just 10 years ago, as a humble Systems Administrator. (which was already quite hard at times, but in a different way). I no longer have to worry much about inodes and mapping out filesystems and disk blocks, or fsck, or VxFS drivers but I do have to build CI/CD systems and mangle YAML / JSON in weird esoteric ways.

      1. Throatwarbler Mangrove Silver badge
        Meh

        Re: So what do you need for these jobs?

        I believe this is what is now called a full-stack engineer position. Companies, especially smaller ones, have always looked for jacks of all trades. Back when I was in IT, that meant that I knew how to manage and troubleshoot storage, networking, server operating systems (*nix AND Windows, thank you very much), desktop systems, miscellaneous server applications, etc., while occasionally backstopping the help desk and helping developers figure out how to do things like check in their source code to a source repo instead of saving it to their local hard drive. These days, the specific skills have changed a bit, but the need for generalists is still alive and well.

        For my part, I'm glad I got out. Doing "cloud stuff" seems boring to me, but that might be because I always found programming boring and frustrating.

      2. abend0c4 Silver badge

        Re: So what do you need for these jobs?

        Isn't this just progress?

        I think the point is that it needs to progress further - it's not really yet in a suitable state for widespread deployment and the effort required to make it so would in principle be a good use of resources. Unfortunately we seem to have lost the intermediate step between invention and delivery where the product is made suitable for ordinary mortals.

        I'd draw an analogy with the vehicle industry. In the last couple of years a spate of people I know have encountered real difficulties in getting problems with cars and vans fixed because of firmware discrepancies between spare parts and originals or because new parts have to be coded to the control system or because the complexity of sensor systems makes faults almost undiagnosable. Main dealers would presumably have the specific knowledge and equipment, but smaller local garages are really struggling to deal with all the different makes and models that they encounter. This isn't really progress either - it's driving up the cost of both repairs and insurance and there does not seem to be a compensating benefit. Indeed if it's making vehicles impossible to repair at an affordable cost, it's self-defeating.

        Progress has to be deliverable which means it has to be incremental and built on existing knowledge or, where it is transformative, it has to be reasonably accessible without significant existing knowledge. The point about automation is that it reduces either the number of skilled staff required, or allows them to be replaced with lower-skilled staff. Automation that requires an army of experts is not progress.

        1. ecofeco Silver badge

          Re: So what do you need for these jobs?

          "Automation that requires an army of experts is not progress."

          Exactly. It's not even really automation, is it? But words mean nothing these days. It's all Humpty Dumpty:

          "When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less."

      3. 43300

        Re: So what do you need for these jobs?

        "Isn't this just progress?"

        Is it?

        What useful things does it allow which weren't possible before? Cloudy stuff (especially SaaS) is frequently LESS reliable than a well-designed and maintained on-prem setup, because they are constantly pissing around with it and breaking things, adding 'new features' which nobody really wants or asked for. On-prem setups get changed far less often.

        Is all this 'progress' actually adding anything to society anyway? It can certainly be argued that the internet (and even more so smartphones and social media) have huge negatives. The balance seems to be tipping more and more towards 'innovation' for the sake of it, ignoring any downsides.

      4. jake Silver badge

        Re: So what do you need for these jobs?

        "But we choose the car (or train or plane) over the horse and cart because it's so much more efficient."

        Efficiency is as efficiency does.

        Instead of one of the trucks, I took the buckboard to the feed store this morning. The truck runs on diesel and pumps 'orrible carbons into the atmosphere. At today's prices, the truck would have cost about ten bucks for the round trip. The horse runs on grass and water and pumps out fertilizer. Total cost for this morning's excursion? Zero dollars.

      5. Anonymous Coward

        Re: So what do you need for these jobs?

        "OTOH, if you want a modern car it's going to take a team of hundreds (or thousands) to design and build it. And it's fucking complex".

        Sure, if you want to build things at scale. If you wanted to build a one off, you could still do it in your garage using off the shelf components and scrap parts.

  3. frankyunderwood123

    Huge experience required...

    The cloud may be the marketing buzz-word of the moment and infrastructure as code may be the developer buzz-word, but the bottom line is you need some serious chops to be managing cloud infrastructure at a high level.

    The "cloud" and how services are deployed now is more aligned with automation of the grunt work - automation has taken out many jobs in the IT sector.

    However, the chops required to understand the interaction of services, to understand how to handle load, how to monitor and how to secure infrastructure is much the same.

    The level of knowledge required to excel and command a high salary is ridiculously steep - there's incredible depth required from an experience point of view.

    Anyone with a little bit of tinkering nous can deploy services to the cloud - the barriers to that have dropped considerably - but without years of experience, they will never cut it.

    You still need all the levels of knowledge that were around before "cloud" and before automation of grunt-work.

    Probably more these days.

    There's a reason competent devops engineers are in demand - it's a ridiculously difficult job!

    1. This post has been deleted by its author

    2. Anonymous Coward

      Re: Huge experience required...

      "it's a ridiculously difficult job"

      I would argue that DevOps, along with a lot of tech jobs, are actually pretty easy to learn, but extremely hard to master.

  4. Gene Cash Silver badge
    Pint

    "bursty traffic"

    Remember the days when we used to call that "getting Slashdotted"

    Ah, the good ol' days. Of course that's when a "lot of comments" on an article was over a thousand, instead of over a dozen.

    1. chivo243 Silver badge
      Trollface

      Re: "bursty traffic"

      Ah, yes, the Slashdot effect, when a story links to a resource not equipped to handle the incoming tsunami of clicks. Kind of a poor man’s DoS…

  5. Snowy Silver badge
    Megaphone

    Want a highly paid job

    Do not be in tech; be in management!!

  6. Anonymous Coward

    Want a well-paid job in tech?

    These days the best approach is to create app updates ... updates with bugs because then you can update the app to remove the bugs and replace them with new bugs that will keep you employed. Everything is updated all the time these days - create apps with all of today's languages so that you support other workers too. Nobody is getting well paid for FORTRAN because it hasn't been updated for 4 years now. COBOL is OK, it was updated only a year ago.

    1. jake Silver badge

      Re: Want a well-paid job in tech?

      "Nobody is getting well paid for FORTRAN because it hasn't been updated for 4 years now."

      I am. Very well paid, in fact. There are more lines of Fortran out there doing useful work in the corporate and scientific worlds than the average kid who never used a dial telephone can comprehend.

      "COBOL is OK, it was updated only a year ago."

      Competent COBOL coders can pretty much write their own paycheck. It is ubiquitous when it comes to moving around the world's money supply.

  7. jake Silver badge

    During the meanwhile ...

    ... I've been making pretty good money pulling companies back out of the cloud for over a dozen years now. There is no one-size-fits-all model of computing. Never has been, never will be. Clouds aren't for everybody.

    In fact, I'd go as far as to say that "the cloud" is a marketing meme that is long past its sell-by date. Just call it what it is: Centralized computing, where the user has absolutely zero control over the infrastructure that he is trying to run his business on.

    Do railroads not usually own their rolling stock, rails, and rights of way? (We never went the nationalization route here in the US.)

    1. Giles C Silver badge

      Re: During the meanwhile ...

      Not in the UK

      The rails and infrastructure are owned by one company, currently Network Rail.

      Then most of the train operating companies lease their rolling stock from other investment groups.

      1. J.G.Harston Silver badge

        Re: During the meanwhile ...

        Yes, the UK model is "public highway". The running route (road, rail) is owned and maintained by the state, anybody can have access to it by paying the required access fee (vehicle excise duty, rail access charge). Those people accessing the running route can be anybody (privately-owned National Express, publically owned LNER, privately owned Virgin, publically owned Ipswich Buses, privately owned Fred Bloggs in his car, etc.)

        1. 43300

          Re: During the meanwhile ...

          "Those people accessing the running route can be anybody (privately-owned National Express, publically owned LNER, privately owned Virgin, publically owned Ipswich Buses, privately owned Fred Bloggs in his car, etc.)"

          Not really these days - the capital requirements and the bond required have been so large for quite a few years that only a company with deep pockets can enter the market.

      2. 43300

        Re: During the meanwhile ...

        Network Rail is owned by the government. Most of the passenger train operating companies are currently either owned by the government or on such micro-managed outsourced contracts that they can't scratch their arses without approval from the Department for Transport.

        The only bits operated on a fully commercial basis are the very small number of open-access passenger operators, and all except one of the freight operators.

        Rolling stock is mostly leased, although the parent companies of some passenger operators do own a small amount. In the freight sector a lot is leased too, but a larger proportion is owned outright than is the case with passenger stock (the largest freight operator owns all their rolling stock, I believe).

  8. pip25
    IT Angle

    So, uh, you went to a conference about a cloud technology...

    ...and were shocked to realize the companies that bothered to be present were looking for people with cloud-related qualifications. How unusual indeed.

  9. gurugeorge

    Brits - What do you think of the salaries mentioned here?

    1. Roo
      Windows

      We're cheaper because we live in a 2 bit slumlord economy.

  10. Anonymous Coward

    what do we call this role? what are the levels? what are the skills at each level?

    As a first principle,

    - organisations want skilled people

    - people want to earn

    but

    - organisations can't find skilled people

    - people can't find the right skills to the right levels

    ...then perhaps the challenge is establishing | agreeing | accepting | rewarding which skills people should have.

    [my understanding is that this is a more genuine "skills shortage" than 'I can't find anybody to flip burgers for $£€1/hour', but if it was $£€20/hour there would be no "skills shortage"]

    Let's exclude Universities from this specific conversation; there's lots to talk about there, but that's the 'how' while I'm still trying to determine the 'what'

    Assuming it is a single skills domain, what do we call this skills domain?

    - DevSecOps

    - SRE

    - DevOps

    - something else?

    My understanding, from general knowledge, this article and the comments, is that DevSecOps - as in the separate and interconnected skills domains development, security and operations, not some vague, imprecise new concept - is the right term. But I'm asking, so I'd appreciate answers.

    Next, can we have a list of skill sets?

    AWS Certified Solutions Architect

    Microsoft Certified: Azure Administrator Associate

    Google Cloud Certified – Professional Cloud Architect

    Certified Kubernetes Administrator (CKA)

    Terraform

    AWS CloudFormation

    Azure ARM

    GCP Deployment Manager

    Puppet

    Ansible

    Chef

    Git

    Apache Web Server

    Apache Tomcat Application Server

    SQL

    Java

    C#

    Agile

    SCRUM

    Kanban

    PRINCE2 | PMBoK

    VMware | Hyper-V

    application delivery controllers (ADCs)

    YAML

    HTML | CSS | Java

    OIDC | CIAM | SAML

    [what else]?

    Next, is it possible | useful to have levels?

    - staff | principal | distinguished | fellow - the model for engineers

    - https://www.indeed.com/career-advice/finding-a-job/engineer-level

    Is it possible to state a skill level for each level? A staff DevSecOps Engineer may, for example, have a PRINCE2 Foundation certification, but a Fellow will probably just get things done.

    As another post suggested, there's an awful lot out there already. But emphasis on awful.

    As another writer contributed, something optimal from two years ago might no longer be so. If this stuff is 10 years old, then, in really simple terms, even the precise, correct and comprehensive stuff is now out of date - 80% of it!

    Sometimes 'architect' is used, but I don't particularly like the term - even Google allude to this in their SRE book.

    It does seem you need to be a master of all | full stack technologist

    There exists...

    https://sfia-online.org/en

    ...but I am guessing - based on it never being mentioned by anyone - it may be a fine approach, but irrelevant | out of touch | whatever

    I am guessing that - by its very nature - this is a truly global, universal career - the same skills, levels, etc are needed whether Australia or Zambia. Terms and conditions might be different, but a professional needs the same skills set.

    A current, specific and evolving global SFIA is what we need, I think!

    1. Anonymous Coward

      Re: what do we call this role? what are the levels? what are the skills at each level?

      HTML | CSS | Java

      should have been

      HTML | CSS | Javascript

      The skills list should perhaps include

      JSON | XML

      Visual Studio | Eclipse | Visual Studio Code

    2. Anonymous Coward

      Re: what do we call this role? what are the levels? what are the skills at each level?

      oh, it's 2023, so AI, obviously

      /s

  11. Anonymous Coward

    actually, the

    SFIA 8 The Framework Reference

    https://sfia-online.org/en/sfia-8/documentation/sfia-8-the-framework-reference-v8-0-sfiaref-en-210928.pdf/@@download/file/SFIA%208%20The%20framework%20reference%20v8.0.sfiaref.en.211101.pdf

    [free; registration required]

    ...seems to be a good idea.

    Systems and software life cycle engineering SLEN

    Establishing and deploying an environment for developing, continually improving,

    and securely operating software and systems products and services.

    seems to describe this skills domain

    but it is *far* too vague.

    It would benefit from

    - skills certifications (PRINCE2; PMBoK; etc)

    - technology certifications (Terraform; AWS; etc)

  12. EricB123 Silver badge

    "Economists, tired of playing Chicken Little and finally realizing that the sky has not fallen, are no longer certain that any recession is on the cards."

    The economists I listen to seem quite certain still.

    1. J.G.Harston Silver badge

      When economists stop predicting a recession, you know one's just around the corner.

  13. darklord

    Local cloud servers

    This just cracks me up: a lot of our customers are migrating to local cloud servers. Ahem, on-site server and data storage. Well bugger me, we've gone back 25 years to when they had on-site servers, storage and backup facilities, except now they don't want on-site support staff. Which is great until stuff breaks (which it does); then the whole world is down, as the network link is still a bit iffy in some parts of the world.

    How is that cloud progress? So yes, more cloud engineers are needed (ahem, server support staff), as it's not really a cloud in anything but name.

    1. 43300

      Re: Local cloud servers

      I think the term "private cloud" is sufficiently flexible to cover 'we have our own on-prem servers and some / all users access them remotely'!
