Google porting all internal workloads to Arm, with help from GenAI

Google has revealed it’s ported around 30,000 of its production packages to the Arm architecture and plans to convert them all so it can run workloads on both its own Axion silicon and x86 processors. The search and ads giant documented its move in a preprint paper published last week, titled “Instruction Set Migration at …

  1. Tim99 Silver badge

    FTFY?

    "Doing so will likely save money... Axion-powered machines deliver up to 65 percent better price-performance than x86 instances, and can be 60 percent more energy-efficient... the web giant will need ~~fewer~~ almost no x86 processors in years to come".

  2. Dinanziame Silver badge

    "up to" 65 percent

    I was very impressed until those two words registered. I do wonder what the average saving is, but it's probably much lower than that.

    1. DS999 Silver badge

      Re: "up to" 65 percent

      Well, the price/performance is obvious. Google pays a foundry like TSMC to make their chips, rather than buying from AMD, who is paying TSMC to make their chips. That cuts out the middleman, a middleman with a very high gross margin thanks to the x86 duopoly. It's MUCH cheaper to license ARM Neoverse cores than to pay AMD/Intel-sized markups.

      1. Tom66

        Re: "up to" 65 percent

        Out of the frying pan and into the fire, though. ARM is the sole licensor of the ARM architecture, and whilst I'm happy to see a UK company succeed here, I wonder what will happen in a decade as ARM becomes the dominant architecture in servers.

        It would have been nice to see them embrace RISC-V. Admittedly the performance of the silicon is not quite there yet, but with Google throwing money at it, I imagine it could become pretty competitive.

        1. DS999 Silver badge

          Re: "up to" 65 percent

          Companies that are locked into x86 are locked in because of Microsoft. If you don't need to run Windows, nothing holds you to x86. Nothing ties anyone to ARM in the same way, so ARM can't extract as much as a middleman as Intel/AMD do from their x86 duopoly, or it will get left behind like x86 has been (for those not dependent on Windows).

          Getting RISC-V (or something else) to where it could replace ARM in servers would take significant investment, but that investment will appear if ARM gets too greedy. I could see Google, Amazon and Microsoft getting together with some others to spec a new ISA designed specifically for the needs of servers and hyperscale clouds. Yes, they could use RISC-V, which already exists, but it isn't exactly problem-free, so starting fresh might suit their needs better in the long run.

    2. Charlie Clark Silver badge

      Re: "up to" 65 percent

      Doesn't really matter: at the scale they operate, that could equate to millions (or many multiples thereof) per year. I bet they'd recoup their investment on a 1-2% improvement on YouTube alone. See their previous work on energy use.

      1. Anonymous Coward

        Re: "up to" 65 percent

        They could do that by pushing adverts less endemically and learning from the ‘Skip’ behaviour users demonstrate. Shame it can’t detect the TV being put on mute, coffee being made or balls being scratched whilst the ads spin through.

        Gemini AI certainly knows exactly the worst point in a video to drop the ads. It’s got an almost perfect record of annoyance.

        1. Charlie Clark Silver badge

          Re: "up to" 65 percent

          What ads? If you don't like them and haven't installed an ad-blocker or something like ReVanced, then that's your problem.

          The ads follow the money and they wouldn't do them like this if they weren't effective, though I should add the, ahem, old adage that fifty percent of adverts are ignored. The problem is figuring out which fifty percent.

  3. sarusa Silver badge

    God damn

    Jaysus Christ, the amount of broken fuckery this will cause! Well, I guess in 202x we are 100% committed to enshittification for shareholder value.

    1. DS999 Silver badge

      Re: God damn

      It would be hilarious - and timely - if it turned out that Amazon's recent downtime was due to this. I know they've said "DNS", but the thing that caused DNS to break could have been something that was AI-ported to ARM and had a subtle bug that caused the DNS borkage.

      1. FIA Silver badge

        Re: God damn

        It would indeed be funny if Amazon's recent downtime was due to this, mainly as this is a Google story.

        1. DS999 Silver badge

          Re: God damn

          Amazon also has its own ARM chips in use in its cloud, plus a lot of software that was originally written for x86, so it is in the same boat: it needs a way to port a ton of legacy x86 software to ARM, and may have turned to AI to help speed that up, just like Google. That was my point, not that Google's porting directly affects Amazon.

      2. Aaronage

        Re: God damn

        Nah, most Amazon services have been running on both x86 and aarch64 for years at this point.

    2. FIA Silver badge

      Re: God damn

      > Jaysus Christ, the amount of broken fuckery this will cause!

      The article implies they've migrated Gmail, YouTube and a significant number of others (30K is mentioned).

      This is an example of using AI as a tool, not thinking it's the second coming of Christ.

      This is the kind of thing AI is genuinely useful for. It's not going to replace all your devs, and it's not going to write your software for you, but it does have a use for these kinds of tasks, and so long as you use it responsibly and don't just blindly accept its output, it's useful.

      We see this time and time again in IT, useful technologies appear, they get vastly overhyped, but then people forget that they're also useful.

  4. A Non e-mouse Silver badge

    Am I being naive or is having AI "fix" your test code just a little bit scary?

    1. Charlie Clark Silver badge

      I don't think that's really how this works. There's certainly machine learning involved, but the process looks pretty deterministic.

    2. Anonymous Coward

      The line “Google found the agent succeeded about 30 percent of the time under certain conditions, and did best on test fixes, platform-specific conditionals, and data representation fixes.”

      That was the scary bit. That’s worse than an intern.

      (El Reg - what’s with all the *endemic* fucking pop-up ads today…! Worse than YouTube)

      1. FIA Silver badge

        > The line “Google found the agent succeeded about 30 percent of the time under certain conditions, and did best on test fixes, platform-specific conditionals, and data representation fixes.”

        > That was the scary bit. That’s worse than an intern.

        The scary bit is that you'd consider giving this kind of non-trivial task to an intern.

    3. munnoch Silver badge

      We've all worked with the guy who thinks that fixing the failing tests means either commenting them out or adjusting the test expectations to match what the code does. I'll admit to very occasionally being that guy...

      Seems like he's the one training the agent...

  5. that one in the corner Silver badge

    Amazing what maths can do

    > started with an assumption “that we would be spending time on architectural differences such as floating point drift, concurrency, intrinsics such as platform-specific operators, and performance.” ... "it turns out modern compilers and tools like sanitizers have shaken out most of the surprises"

    Sorry, what?

    They planned to be spending their time on dealing with minute details of instruction sets and opcode interactions? The things for which we've been building decades of maths - you know, the ONE area that can actually deal in proofs! - from formal theories of languages to planning and optimisation of completely defined[1] and constrained operations in hardware? Variances in floating point accuracy and stability, the bread and butter of published mathematicians and computer scientists; published, as in actually told everyone and provided worked examples as compiler patches to present to the examining board?

    But were then, apparently, surprised to find that their time was really needed on fixing things that were rather less rigorously created in the first place, like build and release systems?

    Worse (!), having discounted the fine and detailed maths described above, they hadn't planned to be tackling the bits of maths and stats that they should have been expecting to find in their own systems, such as overfitting tests. Or the issues inherent in keeping existing, running, interacting systems stable (hint: keep the machine room clear of butterflies).

    OK, it wasn't my best subject, but being sent out of the terminal room, away from the blinking lights, and into the lecture halls of the Department of Mathematics, Statistics and Operations Research was once seen as basic to producing a well-rounded geek.

    Let alone one with the title of "engineering fellow".

    And then ... they patch it all up with "AI". Run a learning model, get it to print out what it has discovered, sanity check that, then engineer the results into your improved systems, good idea. Letting it run wild on its own...

    1. MatthewSt Silver badge

      Re: Amazing what maths can do

      Someone's never tried adding 0.1 and 0.2 together...

      https://0.30000000000000004.com/
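      For anyone who hasn't tried it, a minimal Python sketch reproduces what that site shows (assuming, as on mainstream platforms, that Python floats are IEEE 754 binary64 doubles). Note that it is an artefact of binary representation rather than of a particular CPU: basic IEEE 754 addition is correctly rounded, so any binary64 platform should print the same thing.

```python
from decimal import Decimal

# 0.1, 0.2 and 0.3 have no exact binary representation, so each literal is
# rounded to the nearest double, and the rounded sum of the first two
# lands on a different double than the rounded literal 0.3.
total = 0.1 + 0.2
print(repr(total))   # 0.30000000000000004
print(total == 0.3)  # False

# Decimal(float) shows the exact binary value the double actually stores.
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
```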

      1. Richard 12 Silver badge

        Re: Amazing what maths can do

        IEEE 754 is explicitly specified. Every compliant toolchain will produce exactly the same output for the same input.

        It really will. It's in the actual standard.

        GCC, clang and others do have some "non-compliance for speed" compile-time options, however those only affect some pretty degenerate edge cases that your static analysis and GCC itself has long warned about if you're likely to go there.

        The real difference between ARM and amd64 is the memory barriers. ARM has several different types, amd64 only the one.

        So if you choose the wrong one, it won't matter on amd64, but it will on ARM.

        1. Bebu sa Ware Silver badge

          Re: Amazing what maths can do

          The preprint is worth a quick read just to appreciate the scale: 100,000 applications to port!

          To get the floating point output to align between arm64 and amd64, they appear to have rebuilt applications to use absl::float128.

          The ultimate problem isn't with the ticklish problems they catch and solve but with the subtle differences hiding in plain sight that they and their AI code buddies fail to detect. But that is likely no different from new code, and probably orders of magnitude less of a problem for code ported from well-tested applications.

          I imagine if I were a googol-sized cloud operation thinking of moving from a legacy architecture to a new one that I could specify, I would seriously think about retaining the commonalities that don't affect performance, or at least make the new architecture a runtime-configurable superset of the old architecture in respect of those features. Porting ginormous codebases might well be the largest cost component from now on.

        2. david 12 Silver badge

          Re: Amazing what maths can do

          > IEEE 754 is explicitly specified. Every compliant toolchain will produce exactly the same output for the same input.

          From one of the background papers for the 2019 standard:

          "Unfortunately, the IEEE standard does not guarantee that the same program will deliver identical results on all conforming systems. Most programs will actually produce different results on different systems for a variety of reasons."

          https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/addendum.html

          For simpler, more general information on IEEE 754, you can look at Wikipedia.
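          One well-known way two conforming systems diverge is whether a*b + c is evaluated with one rounding (a fused multiply-add) or two. IEEE 754 defines FMA as an operation, but whether a compiler contracts the expression into one depends on the target and flags (GCC and Clang expose this as -ffp-contract); a build for aarch64, where FMA instructions are always available, may fuse it while a baseline x86-64 build cannot. A minimal Python sketch of the two results, using exact Fraction arithmetic followed by a single rounding to emulate the fused case (an illustration, not how any compiler implements it):

```python
from fractions import Fraction

a, b, c = 0.1, 10.0, -1.0

# Unfused: a*b is rounded to a double first (it rounds to exactly 1.0),
# then the addition is rounded again. Two roundings in total.
unfused = a * b + c

# Fused: Fraction holds the exact value of each double, so computing the
# whole expression exactly and rounding once mimics what an FMA does.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print(unfused)  # 0.0
print(fused)    # 5.551115123125783e-17, i.e. 2**-54
```

          Both answers are correctly rounded results of legitimate IEEE 754 operations; they differ only in which operations were used.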

        3. This post has been deleted by its author

        4. Anonymous Coward

          Re: Amazing what maths can do

          > IEEE 754 is explicitly specified. Every compliant toolchain will produce exactly the same output for the same input.

          This doesn't mean what you're suggesting and is easily demonstrated to be false.

          Clause 9 has a list of operations which are expected to be rounded the same way, but a program doesn't have to comply.

      2. Anonymous Coward

        Re: Amazing what maths can do

        This was always coming up in finance - people would use floats for payments and exchange rates, then wonder why $1B deals would have crazy maths errors.

        My poor mate had to replace an old trading system full of floats - he used high precision types, and the business wouldn't sign it off because the output was so different. In the end he had to bring back floating point errors.
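        The sort of drift being described is easy to reproduce. A minimal Python sketch, assuming ten hypothetical payments of 10 cents each, compares binary floats with two common alternatives, integer minor units and a decimal type (which approach that trading system actually used isn't stated):

```python
from decimal import Decimal

# Ten hypothetical payments of 10 cents each, summed three ways.
float_total = sum([0.10] * 10)              # binary64 floats
cents_total = sum([10] * 10)                # integer cents, exact
decimal_total = sum([Decimal("0.10")] * 10) # decimal arithmetic, exact here

print(float_total)        # 0.9999999999999999
print(cents_total / 100)  # 1.0
print(decimal_total)      # 1.00
```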

  6. This post has been deleted by its author

  7. This post has been deleted by its author

  8. Bluck Mutter

    Confused

    The article states numbers like:

    - as do around 30,000 more applications.

    - 30,000 of its production packages to the Arm architecture

    - 30,000 applications,

    - 70,000 more apps in the conversion queue

    - another 70,000 packages to port.

    So which is it, applications or packages? One is not the other.

    I can believe that Google has tens of thousands of packages (i.e. small programs that are called from a mainline), but not applications.

    If this article is to be believed, the statement "70,000 more apps in the conversion queue" indicates that Google has more than 70,000 APPLICATIONS, which, with a total workforce of 180,000, means that broadly (rounding up a little) every second employee has their own unique application all to themselves.

    Poor reporting OR poor info from Google.

    Bluck

    1. Anonymous Coward
      Anonymous Coward

      Re: Confused

      Just wait 12 months and they'll have abandoned most of them anyway - so there's probably little point either to counting them or porting them!
