Imagine the Oracle licence costs for a 128-core chip - each DB cluster would mean a new yacht for Larry...
Oracle has teased a further tie-up with Arm server processor aspirant Ampere Computing, perhaps around the latter's upcoming Altra Max silicon that comes in 10 variants packing up to 128 CPU cores and running at speeds between 2.4 and 3GHz. "The cloud was built on x86 processors, but the promise of Arm-based cloud computing – …
Okay, I readily admit that I am not a hardware CPU geek with full credentials, but how is it justified to say that ARM CPUs scale linearly and imply that x86 CPUs don't?
The wiki page on ARM does not mention scalability in any way.
I've seen server motherboards with 4 CPU sockets for x86 CPUs. It seems to me that that means they scale.
Could someone enlighten me?
Well, "scalable" is not a property of the ARM architecture really, but of a specific implementation. It seems this is 128 cores with a single socket; if it runs 128 times faster than a single core then they can call it "linear scalability". Now admittedly I haven't read anywhere whether you can add 2, 3 or 4 sockets to a server.
For the 4 sockets with x86 CPUs: if four CPUs run faster than one, then the system is scalable. If they run 4 times faster than one, then it is linearly scalable. I have no idea how fast these four CPUs would be.
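To put numbers on that definition, here is a minimal Python sketch. The function names and the timings in the usage note are made up for illustration; nothing here is measured from real hardware.

```python
def speedup(time_one_cpu: float, time_n_cpus: float) -> float:
    """How many times faster the job runs on N CPUs than on one."""
    return time_one_cpu / time_n_cpus


def scaling_efficiency(time_one_cpu: float, time_n_cpus: float, n_cpus: int) -> float:
    """Speedup divided by CPU count: 1.0 means linear scaling, less means sub-linear."""
    return speedup(time_one_cpu, time_n_cpus) / n_cpus
```

With hypothetical timings, a job that takes 100 seconds on one CPU and 25 seconds on four has a speedup of 4 and an efficiency of 1.0, which is linear. If it takes 40 seconds on four, the speedup is 2.5 and the efficiency 0.625: scalable, but not linearly.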
(Non)linear scalability is dependent on so many things. If each core has sufficient cache memory and the data (and code) for your application fits in said cache, then you might get somewhere close to linear scalability. But that requirement pretty much excludes 99.9% of software out there, and the remaining 0.1% that it does apply to will still need to ship the results somewhere at some point, and that definitely won't scale linearly.
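One common way to reason about that "ship the results somewhere" step is Amdahl's law, which the comment above doesn't name but which fits the argument: whatever fraction of the work can't be spread across cores puts a ceiling on the speedup. A rough sketch, with the 95% figure in the usage note picked purely as an assumption:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction is the share of single-core run time that can be
    spread perfectly across cores; the rest stays serial.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

Under that model a perfectly parallel job gets the full 128x on 128 cores, but a job that is 95% parallel tops out at roughly 17x, and no number of extra cores gets it past 20x.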
Linear scalability for multiple core designs is hard to achieve because the cores end up sharing (and competing for) access to the same memory system.
Caches reduce the number of times that one core has to wait while another core is accessing SDRAM, but as the number of cores increases it becomes inevitable that the number of such waits increases.
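A toy model of that effect, in Python. Everything in it is an assumption for illustration: each core wants to do `per_core_rate` memory accesses per second, a fixed `miss_rate` of them miss the cache and go to SDRAM, and the memory system can serve at most `memory_bandwidth` accesses per second in total. Real memory systems are far messier than this.

```python
def aggregate_throughput(cores: int, per_core_rate: float,
                         miss_rate: float, memory_bandwidth: float) -> float:
    """Total accesses per second the whole chip completes in this toy model."""
    # Traffic that the caches cannot absorb and that must go out to SDRAM.
    sdram_demand = cores * per_core_rate * miss_rate
    if sdram_demand <= memory_bandwidth:
        # Memory keeps up, so every core runs at full speed: linear scaling.
        return cores * per_core_rate
    # Memory is saturated: cores stall, and total throughput is capped by
    # how fast the misses can be served.
    return memory_bandwidth / miss_rate
```

With a 50% miss rate and bandwidth for 8 accesses per second, 4 cores scale linearly (throughput 4), while 128 cores deliver only 16. Halving the miss rate, which is what a bigger cache buys you, doubles the ceiling.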
Various things in CPU designs, especially out-of-order execution, aim to mitigate the effects of delays accessing memory, but there are limits to how much delay can be mitigated. As the number of cores goes up, the delays go up, and at some point the performance of each core suffers. Various things affect the amount of out-of-order execution that a core can do, and the cost of implementing it.
64-bit Arm cores have a fixed-length instruction encoding of 32 bits per instruction, which makes it easy to decode multiple instructions in parallel, because you know where each instruction is located in memory without having to decode the instruction before it. x86 / AMD64 cores use a variable-length instruction set, so the length of each instruction has to be worked out in order to determine the starting location of the next one. This makes it easier to build wide-issue machines for the Arm architecture than for the Intel x86 / AMD64 architecture. "Issue width" is the number of instructions that can be decoded and issued into execution units each clock cycle.
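The difference in finding instruction boundaries can be sketched in a few lines of Python. The variable-length format below is a made-up toy (the first byte of each instruction holds its total length) and is not real x86 encoding; it only shows why one case needs a sequential walk and the other doesn't.

```python
def fixed_length_starts(code: bytes, width: int = 4) -> list[int]:
    """Start offsets for fixed-width instructions.

    Each offset is just index * width, so every one can be computed
    independently, without looking at any other instruction.
    """
    return list(range(0, len(code), width))


def variable_length_starts(code: bytes) -> list[int]:
    """Start offsets for the toy variable-length encoding.

    Hypothetical format: the first byte of an instruction is its total
    length in bytes. The next start is unknown until the current
    instruction's length has been read, so the walk is sequential.
    """
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]
        if length == 0:
            raise ValueError(f"zero-length instruction at offset {offset}")
        offset += length
    return starts
```

For 16 bytes of fixed-width code the starts are 0, 4, 8 and 12, known up front. For the toy variable-length stream `bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])` the starts are 0, 2, 5 and 6, and each one depends on the one before it.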
Details of the cache hierarchy design can also change the performance of multiprocessors by changing the number of clashes that cause waits: a memory access that can be handled by an on-chip cache is fast and uses less power, because it doesn't need to go out to SDRAM. As transistor sizes get smaller, it is possible to put more transistors on a chip, allowing larger on-chip caches. Intel's problems in moving to smaller geometry fabrication have caused them to fall a generation or more behind TSMC, which is one of many contributing reasons for Arm processors like Apple's M1 having such good performance and battery life compared to Intel solutions. It is similarly a benefit in Arm-based servers such as those using AWS Graviton2.
Biting the hand that feeds IT © 1998–2022