Google Cloud takes a gap year. It may come back with very different ideas

Taking a look at the latest financial results from Google/Alphabet made some of us do a double-take ... and not because of the $40bn+ in ad revenue. If you read closely, you'll see that Google Cloud has lessened its habitual loss by extending the operational lifespan of its cloud servers by a year, and stretching out some of …

  1. jake Silver badge

    Or perhaps ...

    "Maybe Google's gap year is an indication that business will not resume as usual."

    Or perhaps it's because people are starting to notice that all "cloud" means is that your business is being run on someone else's computer(s), and as such you have no control whatsoever over that very important part of modern business.

    I think the clouds they are walking on are wearing thin. Snake-oil is snake-oil.

    1. J.Teodor

      Re: Or perhaps ...

      As someone who had to deal with on-prem servers for a long time, I never want to go back.

      Begging the BOFH for another test server instance to be deployed, or getting the god-wannabe DBA to increase the size of your production data store. God forbid the request came in after three o'clock, because that is when the "gods" would be getting ready to go home, and you would be denied immediately.

      No, never again.

      My applications run on servers that get patched automatically, I can resize my DB as needed, I can spin up a new test server in a minute or a few... and the management is happy, because the costs are way, way down and stability and reliability are up.
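
      Something like this minimal boto3 sketch is all the "begging" it takes now (the AMI ID, instance type and DB identifier below are hypothetical placeholders, not anything real):

        # pip install boto3 -- a sketch, not production code
        import boto3

        ec2 = boto3.client("ec2")
        rds = boto3.client("rds")

        # Spin up a throwaway test server -- no ticket, no BOFH.
        ec2.run_instances(
            ImageId="ami-0123456789abcdef0",  # hypothetical AMI
            InstanceType="t3.medium",
            MinCount=1, MaxCount=1,
        )

        # Grow the production data store without waiting on the DBA.
        rds.modify_db_instance(
            DBInstanceIdentifier="prod-db",   # hypothetical identifier
            AllocatedStorage=500,             # new size in GiB
            ApplyImmediately=True,
        )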

      This may be a different story if your $JOB has hundreds or thousands of servers, with a dedicated team of data center professionals supporting them. But that is perhaps 1-2% of companies. For the rest, cloud is a godsend.

      1. Doctor Syntax Silver badge

        Re: Or perhaps ...

        What you describe is probably a people problem. The problem may lie with the wrong people administering the systems. Equally it might lie with the people holding the purse strings and your sysadmins & DBAs may have equivalent but different problems in getting the sign-offs to provide you with your test server. But ultimately you're using somebody else's computer as a sticking plaster to cover the self-inflicted wounds of a dysfunctional company.

        1. J.Teodor

          Re: Or perhaps ...

          Partially, it is a people problem, but partially it is a resource and cost issue.

          Up to a certain number of servers/resources, AWS, Azure and GCloud are a lot cheaper to use than on-prem. I do not know where the break-even point is; it may depend on service type. If you run mainly simple web APIs or processing jobs, the break-even point may be at tens of thousands of servers, but for databases (including licensing costs) it may be way below a hundred, especially if you are one of those unfortunate souls still using Oracle.
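
          As a back-of-envelope sketch of where that break-even sits (all figures below are made-up placeholders, not real cloud or data-center pricing):

            # Hypothetical monthly costs -- substitute your own numbers.
            CLOUD_COST_PER_SERVER = 300.0     # instance + storage + egress
            ONPREM_COST_PER_SERVER = 120.0    # power, space, amortized hardware
            ONPREM_FIXED_OVERHEAD = 25_000.0  # staff, racks, support contracts

            def cheaper_on_prem(n_servers: int) -> bool:
                """True once the fixed overhead is spread over enough servers."""
                cloud = n_servers * CLOUD_COST_PER_SERVER
                onprem = ONPREM_FIXED_OVERHEAD + n_servers * ONPREM_COST_PER_SERVER
                return onprem < cloud

            # Find the break-even point under these assumptions.
            n = next(n for n in range(1, 100_000) if cheaper_on_prem(n))
            print(f"break-even at ~{n} servers")  # ~139 with these numbers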

          And I am not just talking about the cost of the hardware, licenses, and running the service: the environmental footprint of a highly optimized cloud data center is probably a tenth of what those servers would have in the average "cubicle next to the toilet" on-prem setup. Then come the maintenance costs, and failover, and backups, and regionality, and the cost of setting up a new dev server, and... you get the point.

          There are hybrid models (we actually use Azure like that) and private clouds, both of which may suit you better than relying entirely on a big cloud provider. But YMMV.

      2. Nate Amsden

        Re: Or perhaps ...

        As someone who has dealt with on-prem servers since ~1998, I never want to go back either (back to cloud, that is). I was exposed to 100% cloud infrastructure for about two years across two different companies. We moved the second (current) company out almost exactly 10 years ago now (the company was "born" in cloud from day 1); I think it was Feb 23, 2012 when we moved our first infrastructure out, and about July 2012 when we moved the last of it.

        DBAs have always been on my team, and generally always reasonable when requesting resources. We have tons of metrics, so we can generally agree 100% on when something needs to change (I can even add CPU, memory or disk live to my systems with zero impact to the DBs). My team has always controlled administrative access to every system in the environment (all Linux systems, managed by Chef configuration management). Developers and others had no issues with that (except one guy; I told him he had to host his stuff on the IT side of the house if he wanted a Windows server with admin rights to it). Developers praised us on occasion for providing such a robust environment that just worked. Not a single server has had to be rebuilt as a result of hardware failure in a decade of operation. Before the ops team formed, the developers (with IT's help) ran their own VMs, and even they knew it was a shit show.
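
        That live resizing is just hypervisor hot-add under the hood. A minimal sketch with pyvmomi, the vSphere Python SDK (the vCenter host, credentials and VM name are placeholders, and the VM needs CPU/memory hot-add enabled):

          # pip install pyvmomi -- a sketch, assuming hot-add is enabled
          import ssl
          from pyVim.connect import SmartConnect, Disconnect
          from pyVmomi import vim

          ctx = ssl._create_unverified_context()         # lab use only
          si = SmartConnect(host="vcenter.example.com",  # hypothetical host
                            user="admin", pwd="secret", sslContext=ctx)

          # Find the VM by name (simplified lookup).
          content = si.RetrieveContent()
          view = content.viewManager.CreateContainerView(
              content.rootFolder, [vim.VirtualMachine], True)
          vm = next(v for v in view.view if v.name == "prod-db-01")

          # Hot-add vCPUs and memory while the guest keeps running.
          spec = vim.vm.ConfigSpec(numCPUs=8, memoryMB=65536)
          vm.ReconfigVM_Task(spec)

          Disconnect(si)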

        Speaking of databases, I remember to this DAY (this was over 10 years ago!!) a phone call with Amazon support about TERRIBLE database performance on our RDS at the time. I even took a screenshot and kept it all these years:

        http://elreg.nateamsden.com/rds-cloudwatch.png

        I remember a comment the support person made: "oh, we are getting great performance, look at those IOPS." Oh yeah, 3,000 IOPS is good, but look how much data was transferred... 200 KILOBYTES? Write latency over 150ms? CPU usage maybe 5%? WTF IS GOING ON.
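
        The arithmetic that sank that "great performance" claim (numbers taken from the anecdote, assuming a one-second sampling window):

          iops = 3_000         # operations per second, per the support rep
          bytes_moved = 200e3  # ~200 KB moved in the same (assumed 1s) window

          avg_io_size = bytes_moved / iops
          print(f"average I/O size: {avg_io_size:.0f} bytes")  # ~67 bytes/op

          # 3,000 tiny ~67-byte ops with >150ms write latency and ~5% CPU
          # is thrashing, not throughput -- lots of IOPS, almost no work done.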

        We were (at a previous company) a "beta" tester for Amazon's early "performance" EBS system. I forget the technical term (this was back in 2010), but basically you got more IOPS with more space. The idea was interesting, but the implementation (at the time) didn't work. I'm sure they've fixed that since, though.
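
        The general shape of that "more IOPS with more space" model, as a purely hypothetical illustration (the per-GB rate and ceiling are invented, not Amazon's actual figures):

          def provisioned_iops(size_gb: int, per_gb: int = 3,
                               cap: int = 16_000) -> int:
              """IOPS grows linearly with provisioned capacity, to a ceiling."""
              return min(size_gb * per_gb, cap)

          print(provisioned_iops(100))     # 300
          print(provisioned_iops(10_000))  # 16000 (capped)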

        Back to cloud: the lack of reliability, the lack of in-depth monitoring, the endless list of small failures, the forced reboots, the head-scratching moments of WTF is going on and why. The variability and constant manipulation of the infrastructure drove me mad. The lack of ability to precisely size systems, the lack of ability to oversubscribe. The lack of control. The INSANE COSTS.

        My former manager was talking with Google Cloud last year, and the cost of hosting our production databases (about 30 systems) was about as much as it costs us to run our entire datacenter operations (about 750 systems) - according to him; I never spoke with the Google people myself. That wasn't even taking into account the extra capacity we have waiting to be used (which could easily run another 500 systems). It's comical. 2-3 years ago we had a VP who wanted to go cloud (for no reason other than to pad his resume, I think); we told him it was too expensive. He said he "had a guy" who could make the numbers work. Well, ~6 months later the VP was gone, and we didn't really hear about the concept again.

        I have seen many people who love cloud stuff; those people also don't seem to care about the costs. Many others don't believe cloud is (generally) so much more expensive than running it yourself.

        At the last company I was at, I hit a wall convincing the board of directors to move out of cloud, despite having the CTO/CEO on board - and really the rest of the company - with a $1.6M savings in the first year of operations. I left shortly after that; my (original) hiring manager at that company hired me at the next (current) company. The previous company collapsed a couple of years later. Their cloud spend was upwards of $500k/mo at peak for a tiny startup (maybe 100 employees?); more common was the $200-250k/mo realm. The current company was at about $80k/mo when we moved out.

        At the current company I'd say conservative savings have been $10M, and more realistically over $15M, over the past 10 years - and that is with a peak of ~5 racks of equipment, currently about 3.5 racks. We're not talking hyperscale here.

        I remember hosting load balancing software called Zeus (at the time) in the Amazon cloud because the ELBs were such pieces of shit. The cost of running Zeus (as an appliance distributed via the Amazon marketplace), which was CRIPPLED because it could only have a single IP address, was huge. It alone came to about $10-20k/year for a single system, I think? That could pay for a real hardware load balancer very quickly. (My current load balancers ran upwards of about 450 IP addresses at peak, on several networks for various workloads, and fail over within 1 second; Zeus, as it used Elastic IPs, took something like 20 seconds.)
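
        You can measure that failover gap yourself with a crude probe; a sketch that checks a VIP once a second and reports the outage window (the address and port are placeholders):

          import socket
          import time

          VIP, PORT = "203.0.113.10", 443  # hypothetical virtual IP

          def is_up(host: str, port: int, timeout: float = 0.5) -> bool:
              try:
                  with socket.create_connection((host, port), timeout=timeout):
                      return True
              except OSError:
                  return False

          down_since = None
          while True:
              now = time.time()
              if not is_up(VIP, PORT):
                  if down_since is None:
                      down_since = now       # outage starts
              elif down_since is not None:
                  # ~1s for a hardware pair, ~20s for an Elastic IP move
                  print(f"VIP was down for {now - down_since:.1f}s")
                  down_since = None
              time.sleep(1)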

        I'd be all in on cloud if it provided a superior experience, or at least an equivalent one: control of my network and connectivity, end-to-end storage metrics down to the links and disks/SSDs, etc. That means offering the level of control and availability that on-prem can offer (including data center facilities where everything is N+1 power/cooling). Cost never being a concern would help too.

        Oracle said it pretty well at one point in the last year or so: they want their customers to go cloud because it makes Oracle so much more money.

        One area where cloud can make sense, though, is SaaS: abstracting most of the failings of the major cloud providers behind an application that is hopefully robust. But as we've seen with recent cloud outages, even that can fall apart.

        I've been told by multiple people over the years that nobody can run a data center operation like I can (at least in their experience), so I am somewhat unique in that ability. It is sad that companies have yet to realize that even if they need 3-5 people on a team to do the work, it's likely going to be done far cheaper and better on-prem.

        The cloud marketing bullshit hype cycle is deafening, though. People talk about "hybrid cloud" a lot. The current app stack I manage has nearly 20 microservices, and there had been talk in the past about running some of them in public cloud to provide more scalability. Despite the fact that we had no lack of capacity on site, people were clueless when it came to things like latency between services. Whether you are distributing an application across two different data centers, or across a data center and a cloud, there will be a huge latency hit unless that 2nd location is very close (say within 50 miles). But many people, for some reason, don't realize that. Some apps wouldn't care, but most transactional ones would care a ton.
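
        Rough numbers on why transactional apps care: light in fiber covers roughly 200 km per millisecond, and sequential service calls multiply the round trips (the call count and distances below are illustrative):

          # RTT from propagation delay only (~200 km/ms one way in fiber);
          # real RTTs are worse once switching and queueing are added.
          def rtt_ms(distance_km: float) -> float:
              return 2 * distance_km / 200.0

          SEQUENTIAL_CALLS = 20  # e.g. a transaction chaining ~20 services

          for label, km in [("same metro (~50 mi)", 80), ("cross-country", 4_000)]:
              added = SEQUENTIAL_CALLS * rtt_ms(km)
              print(f"{label}: +{added:.0f} ms per transaction")
          # same metro: +16 ms; cross-country: +800 ms -- and that is best case.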

        Most of the failings of the major cloud providers are BY DESIGN, and haven't changed in the past decade (and I'm not betting on them changing in the near future, either).

        But as with anything, you can do things on-prem very poorly and you can do things in cloud very poorly. On rare occasions you can do things in both very well.

        1. Anonymous Coward

          Re: Or perhaps ...

          Thanks hugely for that - I'm not in the business at all, but I found it a fascinating and apparently well-reasoned discussion. Hope it continues to go well for you!

  2. andy 103

    We're beyond needing to upgrade anything

    "sometimes even letting you pick the [CPU] you want"

    Here's the thing: most people deploying and running software they develop don't really know or care what underlying hardware it's running on. The old days of dedicated web servers were a great example of this. What's the difference for my use case between CPU "A" and CPU "B"? Unless you're doing something really specific and know how the underlying differences could affect your code, it's just not high on anyone's list of priorities.

    In the same way that a mobile phone from 3-4 years ago is probably still good enough for most people's needs, the cloud market has got to a point where upgrading all the hardware all of the time is simply counterproductive.

    One of the things I've learnt over the years is that when it comes to infrastructure there is a rule: Boring = Good. In other words, things running predictably isn't a bad thing at all, and frankly, if it works... it works. Personally, I'd prefer to use a cloud provider that steers clear of pointless upgrades.

    1. Nate Amsden

      Re: We're beyond needing to upgrade anything

      Boring = good, I like that.

      My production Ethernet network hasn't had a state change since Oct 2019, which was maybe the last time I did a software update on it; many of the devices are 8-10 years old now and the software is super stable/boring. Boring, good, no issues. If you asked the vendor how to deploy their equipment, they wouldn't suggest the way I deployed it, because it's not sexy - it's boring (the core relies on a unique protocol they haven't touted in 15 years).

      My primary all-flash storage array has had zero hardware or software failures since it was put into service in Nov 2014. Boring, good, no issues.

      I can probably count on one hand the number of hardware server failures I've had in the past 3 years, and on one hand the number of VMware support cases I've filed in the past 5 years. I don't even need to count the number of full or partial power outages at our primary data center in the past 10 years. (We had 2 partial power outages at a 2nd facility in Europe, maybe back in 2017? I had two network devices there with single power supplies, and of course lost one of them each time the power went out, but they were redundant, so no impact in the end.)

      The last time we had a major internet connectivity issue was either the first big DDoS against Dyn DNS years ago (a cloud provider...), or perhaps the DDoS against our ISP (mainly targeting other customers, gaming customers I think; it made the news here on El Reg at the time). Maybe that was in 2016 or 2017, I don't recall. I'm excluding CDN outages (cloud again...), as those only impacted our website, not the entire "datacenter". We've had probably 2-3 hours of CDN downtime over the past several years.

      What VMware am I running? ESXi 6.5 + vCenter 6.7. BORING = good. v7 seems like it needs more time to mature. I ran ESX 4.1 past EOL and ESXi 5.5 past EOL; I'm thinking 6.5 past EOL will be fine too. Upgrading even to 6.7 doesn't get me anything. If you think I'm worried about running EOL vSphere, I'm really not, given the track record - and given the number of other software products internally that are far more EOL than that (EOL years ago) which I don't have control over.

  3. Mellipop

    Waiting for ARM Server processors

    Or Alphabet is waiting to replace its servers with lower-power ARM chips. Think of the marketing coup that would be. It would save millions in electricity bills and stop heating the planet so much.

    Oh, you didn't mean to link the two stories together?

  4. Anonymous Coward

    Waiting for Google server chips

    Google has already designed a custom chip (the Tensor) for its phones and has announced it is designing one (presumably Tensor-based) for its Chromebooks.

    I would not be surprised if they are also designing a server focused variant.

  5. xyz123 Silver badge

    Google said Stadia was "taking a break from updates".

    Google said Nest was "taking a break from updates".

    Both were in the middle of cancellation (layoffs of all support and technical staff and management first), with the cancellation announced sometime in Q2 2022.

    Now Google Cloud is "taking a break from updates" - hmmmm

  6. elsergiovolador Silver badge

    Catch-22

    If people don't start running services on premises, the monopolies will not break.

    There is also a myth that you need to have plenty of hardware; it dates from the days when servers weren't powerful but were pretty expensive, and cloud was indeed helpful in mitigating that.

    Most services corporations run could be served from a single modern dedicated server without breaking a sweat.

    And for the same money they are spending on the cloud, they could afford a full-time devops team and redundant servers.
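
    As a back-of-envelope sketch of that trade (every figure below is a made-up placeholder; substitute your own bills):

      # Hypothetical monthly figures -- placeholders, not quotes.
      cloud_bill = 120_000.0      # what the corporation pays the hyperscaler

      dedicated_server = 800.0    # one beefy dedicated server, per month
      redundant_servers = 3 * dedicated_server
      devops_engineer = 12_000.0  # fully loaded monthly cost
      devops_team = 4 * devops_engineer

      on_prem_total = redundant_servers + devops_team
      print(f"on-prem: ${on_prem_total:,.0f}/mo vs cloud: ${cloud_bill:,.0f}/mo")
      # on-prem: $50,400/mo vs cloud: $120,000/mo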

    The difference is that when something goes wrong, you are not at the mercy of some nameless workers on the other side of the globe.
