You're right not to rush into running AMD, Intel's new manycore monster CPUs

Intel recently teased a 128-core Granite Rapids Xeon 6 processor, and your humble vulture thinks you can ignore them – indeed, ignoring them might be your safest course of action. That’s because Intel, and AMD, will encourage you to put a lot of eggs in their manycore baskets. I’ve heard both argue that parts such as the 72-to …

  1. cyberdemon Silver badge

    Many cores on power-limited package = poor single-thread performance?

    I've often been skeptical whether a 200W TDP 16-core CPU can perform as well as a 200W TDP 4-core CPU for single-threaded work (let alone 64+ cores). I wonder what the maximum "TDP" of a single core is. It will be more than 200/16 of course, but how much more?

    Going to very high core counts - isn't that what GPUs are for, not CPUs?

    1. Philip Storry

      Re: Many cores on power-limited package = poor single-thread performance?

      My understanding is that GPUs scale well because they're only doing vector work.

      That is "single instruction multiple data", where one instruction is applied to multiple numbers.

      Think of it this way - if in a game you move a little away from a light source, you now need to recalculate the lighting. You could do this in a linear manner across the frame - which is what a CPU would have to do. Or you could stuff all the data into a bunch of GPU cores and ask them to calculate the light levels all at once.

      The CPU method can only be faster if you lack the bus bandwidth for the GPU approach. That's why - back in the day - 3D cards were never produced for the slower bus types. They couldn't be; they need high I/O to keep the cores fed.

      This has been a highly contrived and incomplete example, but hopefully serves to demonstrate things.
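
      To make that concrete, here's a rough Python/NumPy sketch (the array, its size and the lighting formula are all made up for illustration): the loop touches one value at a time, the way a lone core would walk the frame, while the array expression applies one operation to the whole batch at once, which is the SIMD model in miniature.

      import numpy as np

      distances = np.random.rand(1_000_000)          # made-up per-pixel distances to the light

      # Scalar approach: one value at a time, like a single core stepping through the frame
      scalar_light = [1.0 / (1.0 + d * d) for d in distances]

      # Vectorised approach: one operation applied to all the data at once
      vector_light = 1.0 / (1.0 + distances * distances)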

      Why do we still have CPUs? GPUs only compute. They rarely do branch evaluation - IF this THEN that. Looking at the results and deciding which branch to take is still the domain of the CPU in most cases. Good luck running an OS on only a graphics card - it lacks the branching logic to make it worthwhile.

      A suitable visualisation would be a factory floor full of people at calculating machines - these are the GPU cores. And a manager - or several if you like - controlling what numbers they will be calculating. The manager is of course the CPU core.

      And yes, you get diminishing returns by adding managers, and you need better communications if you're to make use of more calculators - or even more managers. But bad managers everywhere have already taught us this. ;-)

      1. Richard 12 Silver badge

        Re: Many cores on power-limited package = poor single-thread performance?

        GPU branching is rather ridiculous.

        It works by executing both branches, but marking the "branch not taken" as not being able to write to anything.

        Which of course indicates that loops are even more stupendously expensive, as they have to be faked by unrolling them...
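
        A rough way to picture that masking, in Python/NumPy terms (purely illustrative - this is not how a GPU actually exposes it): both branch expressions get evaluated for every element, and the selection mask simply throws away the results of the branch not taken.

        import numpy as np

        x = np.random.randn(1_000_000)

        # Both "branches" are computed for the whole array...
        taken = np.sqrt(np.abs(x))
        not_taken = x * 0.5

        # ...and the mask decides which result each lane keeps.
        # The work done for the losing branch is simply discarded.
        result = np.where(x > 0, taken, not_taken)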

        1. PenfoldUK

          Re: Many cores on power-limited package = poor single-thread performance?

          It depends on how you define "stupid".

          Yes, from a pure steps-executed point of view it is wasteful. And it must increase power consumption to a degree.

          But it can be incredibly useful for saving a CPU from sitting largely idle, and saving overall runtime. Especially if you write the code so that any rollbacks happen in a minority of cases.

          Of course, the downside is when the implementation of the rollback isn't 100.00%, which has allowed a number of exploits in recent years. So you have to decide whether to have speculative execution switched on or not, depending on your use case and security concerns.

          1. Mast1

            Re: Many cores on power-limited package = poor single-thread performance?

            "It depends on how you define 'stupid'."

            Agreed. Many years ago I was optimising code for an in-PC DSP card doing real-time audio (back when DSP chips out-performed x86).

            The code ran standard library routines with glue logic written by me. Near the end I profiled the runtime and found it was only spending about 8% of the time in the library code, and the rest in my glue logic (hand-optimised assembler). The biggest culprit was the branching. With block-based processing, the sledgehammer approach - i.e. compute for all cases - was more efficient at getting the code to complete on time than putting in conditionals for the edge cases, the otherwise "logical" way to save time.
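
            The same trade-off still shows up in vectorised code today; a tiny Python/NumPy sketch of the two styles (the block of "audio" and the clipping limits are invented):

            import numpy as np

            signal = np.random.randn(4096)            # one block of made-up audio samples

            # The "logical" way: branch per sample to handle the edge cases
            clipped_loop = np.empty_like(signal)
            for i, s in enumerate(signal):
                clipped_loop[i] = -1.0 if s < -1.0 else (1.0 if s > 1.0 else s)

            # The sledgehammer way: compute for all cases, no conditionals in the hot path
            clipped_block = np.clip(signal, -1.0, 1.0)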

      2. CowHorseFrog Silver badge

        Re: Many cores on power-limited package = poor single-thread performance?

        No, GPUs scale well because the scope of the data they work on is very local and there are very few external dependencies to cause deadlocks. It's all individual pixels; GPUs are not all trying to synchronise, and deadlock, on a few or a single global variable.

    2. rgjnk Bronze badge

      Re: Many cores on power-limited package = poor single-thread performance?

      The game is finding the solution with the best combination of single core performance and core count, and that is *really* tricky without spending all sorts of effort wading through the huge list of random alphanumeric CPU models that are options now.

      Plus the prices become ridiculous very quickly as you climb the spec sheet.

      Last time I played this game the sweet spot gave more cores *and* more single core performance for less money than what was initially specced, but at the cost of having to add other now necessary options into the base box.

      If you really want to get into the power-limited game, it's monster laptops where it becomes comical - very few have enough cooling to run flat out. I've found that 'full fat' luggable desktop-replacement machines could in reality outperform their four-years-newer, nominally much faster but more compact workstation cousins, because they weren't throttling anywhere near as much.

    3. Probie

      Re: Many cores on power-limited package = poor single-thread performance?

      I can answer that question - "I've often been skeptical whether a 200W TDP 16-core CPU can perform as well as a 200W TDP 4-core CPU" - at least as closely as I can with SPECjvm and CPU pinning.

      Not much in it.

      1) If you do a core-for-core comparison, the turbo function means the comparative cores will be running at the same speed. = Not much in it.

      2) If you do maximum bin packing on the CPU - so 32 single-thread workloads on a 4-core CPU and on a 16-core CPU - the 16-core machine will annihilate the 4-core one. At which point someone will point out it's not fair, as it's not a straight single-threaded workload, there's contention and switching - go back to point 1.

      A much better question is this - which is better for your cost of ops (i.e. putting the server into production and dealing with the configuration of the options)?
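
      For anyone who wants a crude version of that comparison without SPECjvm, something along these lines (Linux-only, and the core list and duration are arbitrary) pins one busy-loop worker per core and reports how much work each gets through, which makes the turbo and contention effects above fairly visible:

      import os, time
      from multiprocessing import Process

      def burn(core, seconds=5.0):
          os.sched_setaffinity(0, {core})    # Linux-only: pin this worker to a single core
          deadline = time.perf_counter() + seconds
          n = 0
          while time.perf_counter() < deadline:
              n += 1                         # trivial stand-in for a real workload
          print(f"core {core}: {n} loop iterations")

      if __name__ == "__main__":
          cores = range(4)                   # try 1, 4, 16... and watch the per-core numbers
          workers = [Process(target=burn, args=(c,)) for c in cores]
          for w in workers:
              w.start()
          for w in workers:
              w.join()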

    4. bazza Silver badge

      Re: Many cores on power-limited package = poor single-thread performance?

      These very high core count CPUs have become possible simply because the silicon process used to manufacture them lays down very power-efficient transistors. The result is a lot of cores that can all run at once somewhere near (or at) full bore and produce only 200 watts of heat.

      It's also allowed more things - memory controllers, cache - to be integrated on the same die(s) to help keep the cores fed.

      Both Intel and AMD have been pretty successful at judging a good balance between thermals, core count, cache spec, memory bandwidth, etc for the "average" compute workload, with AMD benefitting significantly in this quest for balance thanks to TSMC's very good silicon process.

      It's a good question: is this not what GPUs are for? Well, there is the already-given answer that GPUs are good for vector processing (so they're not well suited to general purpose compute). But CPU cores these days are also pretty well equipped with their own vector (SIMD) units, with extensions like AVX-512. It's not clear cut that GPUs always win on vector processing.

      CPUs are very well suited to stream processing. GPUs typically have to be loaded with data transferred from CPU RAM via PCIe, the GPU then does its number crunching (in the blink of an eye), and then the result has to be DMA'ed back to CPU RAM in order for the application to deal with the result. The load / unload time is quite a penalty. Whereas one can DMA data into a CPU's RAM whilst the CPU is busily processing data elsewhere in its RAM. Provided the overall memory pressure fits within the RAM's bandwidth, the CPU can be kept busy all the time. This quite often means that the GPU isn't the "fastest" way of processing data.
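
      A back-of-the-envelope way to see that load/unload penalty, assuming you have a CUDA card and the CuPy library installed (the array size is arbitrary): the kernel itself is quick, but the PCIe copies either side of it are what you actually pay for.

      import time
      import numpy as np
      import cupy as cp                      # assumes a CUDA GPU and CuPy are available

      x = np.random.rand(50_000_000).astype(np.float32)

      t0 = time.perf_counter()
      cpu = np.sqrt(x).sum()                 # CPU path: the data is already in RAM
      t1 = time.perf_counter()

      gx = cp.asarray(x)                     # host -> device copy over PCIe
      gpu = float(cp.sqrt(gx).sum())         # the "blink of an eye" compute, result copied back
      cp.cuda.Stream.null.synchronize()
      t2 = time.perf_counter()

      print(f"CPU: {t1 - t0:.3f}s   GPU including transfers: {t2 - t1:.3f}s")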

      One good example is the world's supercomputers: machines such as Fugaku and the K computer - which are purely CPU based - often achieve sustained compute performance close to their benchmark scores; they cope well with data streams. The GPU-based supers are also good, but only for problems where you can load data and do an awful lot of sums on it before moving on through the input data set.

      This is why NVidia have NVLink, to help join networks of GPUs together without total reliance on CPU hosts doing the data transfers for them.

  2. JRStern Bronze badge

    Risk, interesting, but there's more

    That "risk" factor is very interesting, needs some further treatment.

    On the RAM question I think that's off, RAM should be about constant per-app wherever it runs.

    And failover, in my experience, is less than magical, and recovery from failover worse than the original fail.

    Certainly these are meant for clouds. And I strongly question whether they gain or lose performance for individual apps. I hope everyone has the sense to go slowly and carefully on these.

    And I'm way out of touch on licensing issues, which were a major pain - Microsoft was doing only per-core licensing last I looked into it, almost ten years ago, whether the cores are idle or not. They SHOULD offer a volume discount, if not just a flat per-processor charge for small users; big users get flat pricing anyway.

    I've spent a lot of time on performance and scalability issues. It's great to have more cores to use for peak loads, if you can afford to have them sit idle 50% or 90% or 99% of the time. But most people don't even try such fancy stuff; they just hope paying for cores is the last thing they have to think about. Nope, doesn't work that way.

    1. Philip Storry

      Re: Risk, interesting, but there's more

      RAM may be somewhat constant, but everything running on the server needs it.

      And it needs power.

      Adding all those cores implies more RAM. I'm suddenly reminded of a decade ago when I was speccing a machine that needed (according to the vendor's recommendations) half a terabyte of RAM. That's getting to be a mundane figure today, but back then it had to sit on its own dedicated daughterboard. Which required its own PSU.

      My manager balked at the cost and halved the PSUs, thinking he was smart. This of course meant that the failure of a single PSU would kill the whole machine. I have no idea if the hardware team let him get away with that - as he wasn't the best boss, I was gone before the machine actually arrived.

      But it did teach me that RAM likes power a lot more than you'd think.

      And it scales linearly per MB. CPU power consumption doesn't quite do that per MHz, nor does GPU power. And storage - whether spinning rust or solid state - doesn't quite scale linearly either.

      If you're planning on just doubling the cores for your existing VMs, this isn't a problem. But if you want to run more VMs, it may well be a problem...

  3. Saigua

    Compute in memory is coming for us all.

    This reminder is a nice reset of the 'one does not simply refresh into a €230k CPU' mindset (or four of them, plus I suppose a proofing oven and/or district heating), and of improved schedulers that get along nicely.

  4. Kevin McMurtrie Silver badge

    Missing the point

    High CPU-core systems excel at scaling a single large task. Sometimes the static resources of an application (code, data tables, cache) are large compared to the dynamic resources (state, buffers, cache, compute, I/O). This is where high core counts are a huge money saver, by not replicating static resources. The argument about a painful failure is silly. You should always have spare processing power and graceful fault tolerance.

    Lower core systems are best when you have many smaller tasks that don't share static resources.

    1. botfap

      Re: Missing the point

      Absolute bollox, you clearly don't work in this arena. Super-high-core systems generally have very low clocks, which makes them poor systems for running single application images. There are pretty much zero applications that scale up to a 192-core EPYC Turin in a single application image. Those that do generally get better performance from running multiple images across lower core count, higher clock speed servers anyway.

      High core counts are for managing multiple, lower single-thread-performance application images on the same physical hardware; they generally don't scale up a single image. You have got this completely back to front.

      1. DaveLS

        Re: Missing the point

        Higher clock speeds don't help if you have to wait microseconds for results to be exchanged between parts of a large computation running on separate servers linked by, say, Infiniband. Some classes of large engineering and scientific simulations benefit from the lower-latency (tens to hundreds of nanoseconds) communication within a single many-core system.

      2. Kevin McMurtrie Silver badge

        Re: Missing the point

        All you're saying is that it doesn't work for your app. I'm betting that the people buying monster CPUs are running a different app.

  5. Philo T Farnsworth Silver badge

    More cores the merrier. . .

    . . . at least for me.

    I have a Ryzen Threadripper 3990X 64-core processor machine (128 threads), which is getting a little long in the tooth, sitting next to me at the moment happily running Ubuntu. I don't need all those cores all the time, just as I don't need a car that will get up to 110 MPH[1] without complaining, but in both cases it's darned nice to have it when you need it[2].

    But I'm admittedly the lunatic fringe, at least when it comes to computing. . . we won't tell anyone about my driving, now, will we. . .

    ________________

    [1] Unprofessional driver on a (somewhat) closed course, to paraphrase all the car ads on the telly.

    [2] Gignudo database jobs and passing a slow-moving semi and an RV on US 95 in Nevada, respectively.

    1. Roj Blake Silver badge

      Re: More cores the merrier. . .

      Your car can get up to a hundred and ten,

      You've nowhere to go but you'll go there again

      - Pulp, I'm a Man

      1. Ken Shabby Bronze badge

        Re: More cores the merrier. . .

        Mine goes to one hundred and 11

        Possibly Nigel Tufnel, Spinal Tap

  6. Anonymous Coward
    Anonymous Coward

    Meh.

    Don't worry, Microsoft is certain to find a way to waste all that extra CPU power...

    By the time they're finished it'll be just enough to run EDLIN.

    1. bazza Silver badge
      Holmes

      Re: Meh.

      It’s not CPU power that Edlin needs, it’s the user who needs the brain power (several kiloHolmes) to “envisage” the file they’re editing.

      I used to use Edlin (only when forced, and running throttled down to a single brain power) on a teleprinter connected from afar to a PC with the DOS console running down a serial port. It’s a desktop experience that gave carriage returns and bell real meaning!

    2. CowHorseFrog Silver badge

      Re: Meh.

      I heard Copilot and Recall will solve all mankind's problems...

  7. Fazal Majid

    Analytics

    One good use case for this kind of machine is analytics and data science, where simpler single-node tools like DuckDB outperform complex multi-node distributed big data systems like Spark or Hadoop, and are far more agile.
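
    By way of illustration, a minimal DuckDB session in Python (the Parquet file and its columns are stand-ins for whatever you actually have): the engine parallelises the scan and aggregation across every core it can see, which is exactly where a fat single box pays off.

    import duckdb

    con = duckdb.connect()                   # in-memory database, no cluster to stand up
    top_users = con.execute("""
        SELECT user_id, count(*) AS events
        FROM 'events.parquet'                -- hypothetical dataset
        GROUP BY user_id
        ORDER BY events DESC
        LIMIT 10
    """).fetchall()
    print(top_users)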

    1. Michael H.F. Wilkinson Silver badge

      Re: Analytics

      Exactly. I can see a good use for these CPUs in HPC tasks, especially those which do not work well under the SIMD-type parallelism that sits well on GPU architectures. Shared-memory parallel processing, which is where these CPUs should shine, is so much easier than distributed-memory parallel processing.
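
      A small sketch of why the shared-memory style is so much easier, using plain Python threads over NumPy (the array size and worker count are arbitrary): every worker sees the same array in the same address space, so there is no explicit message passing or halo exchange, as there would be with MPI across nodes.

      import numpy as np
      from concurrent.futures import ThreadPoolExecutor

      data = np.random.rand(64_000_000)      # one copy of the data, shared by every worker
      chunks = np.array_split(data, 32)      # carve it into per-core pieces

      def partial_sum(chunk):
          return np.sqrt(chunk).sum()        # NumPy releases the GIL for this

      with ThreadPoolExecutor(max_workers=32) as pool:
          total = sum(pool.map(partial_sum, chunks))
      print(total)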

  8. Doctor Syntax Silver badge

    "The chipmakers suggest that replacing your current servers with machines running their monster silicon will free as much as half your rack space"

    The unspoken implication is that you're to free up half the rack space to make room for more servers.

  9. Bob Whitcombe

    Less is More - Not

    I appreciate your concerns about consolidating risk into units that may fail, but the reality is that the very hyperscalers you cite have already shown how to scale density for profit. And they want more, not less. I am with them. Your argument boils down to: don't even think about using more than 64K of RAM. Think of the huge numbers of transistors, each of which could suffer a fatal flaw and kill the integrity of your memory systems. Given we now buy terabyte devices for under $100, that does seem silly.

    1. martinusher Silver badge

      Re: Less is More - Not

      Actually, I'm more interested in power density.

      The biggest enemy of any silicon is heat, and the more compact a device is, the more concentrated the power and so the greater the problem of getting rid of the generated heat. Since Big Silicon still tends to think of heat dissipation as "someone else's problem", without a designed-in cooling strategy I'd regard these parts as a failure waiting to happen.

      Put another way, if I was concerned about long term reliability I'd be a lot happier with 200 separate components each dissipating one watt than one large component dissipating 200 watts. The latter has a place but I wouldn't put it into anything mission critical.

      1. rcxb Silver badge

        Re: Less is More - Not

        I'd be a lot happier with 200 separate components each dissipating one watt than one large component dissipating 200 watts.

        Ah yes, the age-old question...

        https://external-preview.redd.it/pPTZUhoeh4yBV7CBhgHJz4lB7_BCSEgFSqHxFCc_mfg.jpg?auto=webp&s=65381afbab38be4c1803db6c6756402c811b56d0

        1. martinusher Silver badge

          Re: Less is More - Not

          It's more about single sourcing, the reliability of distributed systems, and all those little details that people seem to overlook these days.

          As someone else put it, "We've reinvented the mainframe" - except that mainframes were designed as complete systems. They had to be: they might have been incredibly slow by modern standards, but they certainly used the power, and so had to have both well-thought-out power supplies and thermal management.

        2. CowHorseFrog Silver badge

          Re: Less is More - Not

          But 200 separate components mean distance between each of them, which leads to delays in the electricity jumping around, which makes for a slower system.

  10. Bitsminer Silver badge

    Reliability?

    Fewer parts means the system is more reliable. It's just that simple. A CPU with 4 cores is not significantly more or less reliable than a 288-core CPU.

    Adding lots of RAM (in the terabytes) to a many-core CPU for the purpose of running VMs (or Docker containers) is more reliable than "n" copies across "n" CPUs. This is also simple to understand. So naturally you bought VMware and ran all VMs on a very big multi-core box.

    What I've seen with some VM systems is the irrational choice by admins to build a fleet of VMs each with 2 cores and 2GB of memory. Nope - you generally want only one core, and the memory allocation for the software should be measured. Giving a VM 1.3GB is not at all unreasonable when that's the right amount of RAM.

    For on-premise computing, if the computer goes offline (most likely due to GBICs or power supplies, not RAM or CPU) then the office and the workers are idle.

    But the same is true when the electricity cuts out or the proverbial backhoe does its thing to the fibre optic Internet.

    Almost no offices ever need 5-nines of reliability. If the CPU breaks, take the afternoon off. It'll happen, but only once every five years.

    1. bazza Silver badge

      Re: Reliability?

      Regarding resources for VMs, it depends. If the hypervisor can dedupe memory (some can) then the amount of memory allocated to a VM starts becoming irrelevant. If they’re all running the same OS, you end up with only one copy of the OS in memory, shared (unwittingly) between all the VMs.

      So the resource consumption equation then focuses more on workload compute requirements rather than how much memory is needed to host their static code.

      That approach can end up more efficient than a container tech that doesn’t dedupe.

      It’s the same with core count. Just because a VM has multiple cores defined doesn’t mean that it is actually exclusively using them, or using them at all. If the VM goes quiescent the cores can support other host workloads such as other VMs. Admittedly there is some memory consumption per virtual core, but not so much.

    2. Piro

      Re: Reliability?

      Rightsizing is something that never really happens.

      There are several reasons, one of which is that the system administrator who wants to optimise is not the one deploying the VMs.

  11. Grunchy Silver badge
    Happy

    Accelerated obsolescence

    I have not bought “new” equipment for many years now; to be honest I already have far more compute power than I will ever need. It is with immense pleasure that I drop by the local recycling depot and peruse the latest & greatest industrial computational equipment from a few years ago, often selling for 1% of original cost OR LESS.

    Don’t mind if I do!

  12. 0laf Silver badge
    Coat

    But...

    Can it run Crysis?

    1. Colin Miller

      Re: But...

      Imagine a Beowulf cluster of these!

      1. 0laf Silver badge
        Happy

        Re: But...

        I've not heard that term for a long time.

        It was always the coolest name for a piece of cobbled together IT infrastructure.

  13. Anonymous Coward
    Alert

    Worry not...

    Microsoft will be hard at work on a new version of Windows designed to suck up any additional capacity these new CPUs might have, and more; full of unfinished and corpulently bloated functionality you don't want, don't need and wouldn't dream about, even after a long evening of warm cheese and cold beer. I believe their marketing department still refers to these ongoing efforts as "Teams".

  14. Steve Davies 3 Silver badge

    Better not let Oracle know

    (If you are an existing Big Red customer)

    Otherwise they might subject your business to a surprise audit just in case said server is running one line of Oracle code.

    If you are, expect a bill the size of the UK national debt to be delivered by a courier within an hour.

  15. Roland6 Silver badge

    Mainframe reinvented, just missing all the stuff that made mainframes rock solid…

    Might be worth seeking out all the stuff from the late 1990s when “the mainframe is dead” was all the rage, and simply swapping the headings on the advantages and disadvantages columns when comparing mainframes with a cluster of commodity off-the-shelf systems…

  16. Anonymous Coward
    Anonymous Coward

    Imagine the licensing costs of running Oracle or MS SQL on those! hahahahahaha ;-)

  17. FensMan

    Why the sodding comma in the title instead of "or"? This ain't "Variety" mag!

    1. Roland6 Silver badge

      Suspect the title writer is an Intel fan and so wanted to imply Intel's manycore monster CPUs are better (than AMD's), hence read the article to find out why... i.e. it was clickbait.
