SK Hynix boss predicts CPUs and RAM will merge, chipmakers will hold hands to make it happen

The CEO of SK Hynix, the world’s second-largest memory manufacturer behind Samsung, has tipped the merger of RAM and CPUs, and the rise of the Compute Express Link standard. Delivering the keynote at the Institute of Electrical and Electronics Engineers’ International Reliability Physics Symposium (IRPS), CEO Seok-Hee Lee …

  1. John Savard

    A Good Idea

    Since a larger cache is one way to improve the performance of a CPU, having a wider-bandwidth pathway to memory would obviously bring benefits. Look at the innards of the NEC SX-9 supercomputer, which ties each of its CPUs directly to sixteen DRAM modules.

    So not having to worry about pin count, or about driving an external interface, would be a big benefit.

    The problem, though, with going all the way to putting everything on a single die instead of just some type of module, the way HBM does already, is that die sizes are limited. Putting, say, eight cores and 16 gigabytes of DRAM on one die isn't likely to be possible for some time.

    Of course, one thing chipmakers are looking for is a way to eliminate die size as a constraint. If you could build a multichip module where the connections between dies were essentially indistinguishable from on-die connections, imposing no additional delays or driving requirements, then major units like CPU cores would still have to sit within a single die, but cache and memory on other dies would be as good as on the same die.

    1. Pascal Monett Silver badge

      Re: A Good Idea

      Not to mention that 16GB may be enough for many, but I need 32GB and I'm looking forward to upgrading my PC to 64GB by next year.

      With that system, I will need to buy a 64GB version of whatever CPU I choose. I'm not sure they'll be making those.

    2. Mike 137 Silver badge

      Re: A Good Idea

      "connections between dies were essentially indistinguishable from on-die connections, imposing no additional delays"

      Difficult to achieve. Bond wires have significant inductance at the frequencies we're discussing, which is why DIL packages have largely been replaced by small surface-mount packages, and ultimately why we already need ball-grid CPU packages. So multiple chips get round the fabrication problems but can degrade speed. It also doesn't help much with dissipation, as the entire package is the radiator in most cases.
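      A rough way to see why bond-wire inductance bites harder as clocks rise: an inductor's reactance grows in proportion to frequency. The wire value below is an assumption for illustration only, about 1 nH, in line with the commonly quoted rule of thumb of roughly 1 nH per millimetre of bond wire.

```latex
X_L = 2\pi f L
% Illustrative values: L \approx 1\,\text{nH}, f = 5\,\text{GHz}
X_L = 2\pi \times (5 \times 10^{9}\,\text{Hz}) \times (1 \times 10^{-9}\,\text{H}) \approx 31.4\,\Omega
% Same wire at f = 100\,\text{MHz}
X_L = 2\pi \times (1 \times 10^{8}\,\text{Hz}) \times (1 \times 10^{-9}\,\text{H}) \approx 0.63\,\Omega
```

      A fifty-fold increase in frequency gives a fifty-fold increase in the impedance of the same piece of wire.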

  2. rtfazeberdee

    Transputer come to mind?

    Parallel computing again?

    1. Doctor Syntax Silver badge

      Re: Transputer come to mind?

      My thought exactly. My reaction was simply "Again?".

    2. Roland6 Silver badge

      Re: Transputer come to mind?

      Also wafer-scale integration, where many of the key patents have long since expired.

  3. Mike 137 Silver badge

    Ultimately, speed will increase further in Computing in Memory(CIM)

    Heat will increase too. Heat generation is a function of switching speed, and there is an ultimate limit beyond which a single die can't dissipate heat fast enough to stay within safe temperature limits, because heat generation is volumetric (it scales with the cube of size) while dissipation is areal (it scales with the square). This is well known; indeed, a colleague wrote a PhD thesis on it in the mid-80s, and I think we're getting near that limit already.
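    The cube-versus-square argument can be illustrated with a toy calculation. This is a sketch under a loudly simplifying assumption, an idealised cube whose heat output goes with volume and whose cooling goes with surface area, not a thermal model of a real die.

```python
# Toy illustration of the square-cube argument (assumption-laden sketch,
# not a thermal model of a real die): treat the heat source as a cube of
# side `side`, generating heat in proportion to its volume and shedding it
# in proportion to its surface area.


def heat_per_unit_area(side: float) -> float:
    """Volume divided by surface area for a cube; grows linearly with side."""
    volume = side ** 3
    surface = 6 * side ** 2
    return volume / surface


if __name__ == "__main__":
    for side in (1.0, 2.0, 4.0):
        # Each doubling of side doubles the heat each unit of surface must shed.
        print(f"side={side}: {heat_per_unit_area(side):.3f}")
```

    In this model the ratio is simply side/6, so every doubling in size doubles the load on each unit of cooling surface.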

    Admittedly, there are a few use cases where blindingly fast hardware is necessary, but if we improved the currently appalling efficiency of code, it would reduce the need for raw hardware speed by quite a lot in most cases.

    1. nintendoeats

      Re: Ultimately, speed will increase further in Computing in Memory(CIM)

      That's all well and good, but the fact that memory access is so slow is a problem for writing efficient code.

      Yesterday I was discussing with a colleague the idea of implementing a large data set as a linked list, so that it would be very efficient to break off chunks of it and move them to other objects. He pointed out that the disadvantages of memory fragmentation were so significant (largely because the data won't be loaded into cache together) that it was usually better to just put the data in an array and accept the cost of copying things when it happens.

      Yes, there is a lot of inefficient gibberish in the world (interpreted languages still exist), but even when you are trying, the unbalanced nature of modern hardware performance heavily favors some practices that should logically be slower.
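      The locality point can be made concrete by counting how many distinct cache lines a traversal touches. This is a toy model with hypothetical parameters: 8-byte elements, 64-byte lines, and linked-list nodes placed at random addresses to stand in for a fragmented heap (the per-node pointer overhead is ignored).

```python
# Toy model: count distinct 64-byte cache lines touched by a traversal.
# Assumptions (hypothetical, for illustration only): 8-byte elements,
# 64-byte lines, linked-list nodes scattered at random heap addresses.
import random

CACHE_LINE = 64  # bytes; typical on modern x86, assumed here
ELEM_SIZE = 8    # bytes per element, assumed


def lines_touched(addresses):
    """Number of distinct cache lines covered by the given element addresses."""
    return len({addr // CACHE_LINE for addr in addresses})


def array_addresses(n, base=0):
    """Contiguous layout: element i lives at base + i * ELEM_SIZE."""
    return [base + i * ELEM_SIZE for i in range(n)]


def scattered_addresses(n, heap_size=1 << 30, seed=0):
    """Fragmented layout: each node lands at a random 8-byte-aligned address."""
    rng = random.Random(seed)
    return [rng.randrange(0, heap_size // ELEM_SIZE) * ELEM_SIZE for _ in range(n)]


if __name__ == "__main__":
    n = 1024
    print("array:      ", lines_touched(array_addresses(n)))
    print("linked list:", lines_touched(scattered_addresses(n)))
```

      In this model the array packs 1,024 elements into 128 lines, while the scattered nodes cost roughly one line fetch each.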

      1. Brewster's Angle Grinder Silver badge

        Re: Ultimately, speed will increase further in Computing in Memory(CIM)

        x86 CPUs load memory 64 bytes at a time (a "cache line"). So if each chunk of your array is bigger than that, the linked-list approach would probably work. (But that data has got to be 64-byte aligned.)

        With modern CPUs, complicated algorithms that cram data into 64 bytes often perform better than algorithms that eat fewer cycles. Obviously, YMMV.
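        The alignment caveat is easy to check with a little arithmetic. This sketch assumes 64-byte lines and counts how many lines an object straddles; a 64-byte chunk that starts mid-line costs two line fetches instead of one.

```python
# Sketch: how many cache lines does an object of `size` bytes starting at
# `addr` straddle? A 64-byte line size is assumed (typical for x86).
CACHE_LINE = 64


def lines_spanned(addr: int, size: int, line: int = CACHE_LINE) -> int:
    """Count the cache lines covered by bytes [addr, addr + size)."""
    if size <= 0:
        return 0
    first = addr // line
    last = (addr + size - 1) // line
    return last - first + 1


def align_up(addr: int, line: int = CACHE_LINE) -> int:
    """Round addr up to the next multiple of line (line must be a power of two)."""
    return (addr + line - 1) & ~(line - 1)


if __name__ == "__main__":
    print(lines_spanned(32, 64))            # misaligned 64-byte chunk
    print(lines_spanned(align_up(32), 64))  # same chunk, aligned
```

        In C or C++ you would normally request this with `alignas(64)` rather than rounding addresses by hand.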

  4. Shadow Systems

    I'm left wondering...

    What about DIY system builders that find the best components to do a specific job, including the ability to add more RAM at a later point to maintain system viability ("future proofing"), but then won't be able to do that because the CPU+RAM is all one unit that would require upgrading the entire package rather than just the bit (RAM) that needs the upgrade?

    I mean, if I've got a 5GHz CPU that still does what I need, but I want to go from 32 to 64GB of RAM, will I end up needing to buy an entire new motherboard, CPU+RAM package, & all the newer add-on cards to go with the new motherboard "improvements" used to screw us out of our money for no damned good reason other than they can?

    I don't see DIY enthusiasts scrambling to acquire the CPU+RAM over separate CPU & RAM components if it means having to buy all new hardware every time they want to upgrade.

    *Hands a pint to whoever can answer this in terms we non-theoretical-electrical-engineers can understand*

    =-J

    1. ITengineer

      Re: I'm left wondering...

      I don't know what he is referring to, but it sounded to me like he may be talking about an architecture like the one Upmem have developed, where there are additional CPU cores on the DIMM. These cores are simple and distributed, with each group of cores only having access to a subset of the memory, but combined they have a very high aggregate memory bandwidth because they avoid the bus bottleneck.

      You could then still upgrade your CPU and RAM separately but your RAM upgrade comes with additional horsepower.
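      A minimal sketch of the near-memory idea, assuming a simple sum as the workload and a hypothetical bank count. It does not model Upmem's actual programming interface; it only shows why each memory-side core working on its own slice means little data has to cross the bus.

```python
# Sketch of near-memory processing: each (hypothetical) DIMM-side core only
# sees its own slice of memory, computes a partial result locally, and only
# the small partial results cross the memory bus to the host CPU.
# The bank count and the sum workload are assumptions for illustration.


def partition(data, banks):
    """Split data into `banks` contiguous slices, one per memory-side core."""
    size, extra = divmod(len(data), banks)
    slices, start = [], 0
    for b in range(banks):
        end = start + size + (1 if b < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def near_memory_sum(data, banks=8):
    """Each bank sums locally; the host combines one value per bank."""
    partials = [sum(s) for s in partition(data, banks)]  # runs "on the DIMM"
    return sum(partials), len(partials)                  # host sees `banks` values
```

      The host receives `banks` partial sums rather than every element, which is where the aggregate-bandwidth advantage comes from.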

    2. David Hicklin Bronze badge

      Re: I'm left wondering...

      Or your 16GB of on-chip RAM becomes part system memory, part huge cache.

    3. Roland6 Silver badge

      Re: I'm left wondering...

      > but I want to go from 32 to 64GB of RAM, will I end up needing to buy an entire new motherboard, CPU+RAM package

      There is paging... Yes, 64GB of on-die RAM is going to be slightly faster than 32GB of on-die RAM plus 32GB of off-die RAM (aka RAM disk), but I suspect that for most workloads the difference won't be noticeable.
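      A toy two-tier model shows why the split often goes unnoticed. It assumes a hypothetical fast tier standing in for on-die RAM, backed by slower off-die RAM, with least-recently-used pages demoted when the fast tier fills; tier sizes and traces are made up.

```python
# Toy two-tier memory: a small fast tier (standing in for on-die RAM) backed
# by a larger slow tier (off-die RAM). Pages are promoted to the fast tier
# on access, and the least-recently-used page is demoted when it is full.
# Tier sizes and access traces are hypothetical.
from collections import OrderedDict


class TwoTierMemory:
    def __init__(self, fast_pages: int):
        self.fast_pages = fast_pages
        self.fast = OrderedDict()  # page -> None, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def access(self, page: int) -> bool:
        """Touch a page; return True if it was already in the fast tier."""
        if page in self.fast:
            self.fast.move_to_end(page)
            self.hits += 1
            return True
        self.misses += 1  # served from the slow tier, then promoted
        self.fast[page] = None
        if len(self.fast) > self.fast_pages:
            self.fast.popitem(last=False)  # demote the LRU page
        return False
```

      A trace whose working set fits in `fast_pages` produces mostly hits; one that keeps cycling through more pages than the fast tier holds produces mostly misses, which is when the split would become noticeable.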

    4. Anonymous Coward

      Re: I'm left wondering...

      Or just add another CPU+memory module.

  5. Anonymous Coward

    Wot! Apple innovating?

    {looks at calendar} Nope... it's not April Fools' Day yet.

    Seriously, they do seem to have got this right and I'd expect many others to follow them. I just have some reservations about how it will scale up to what we know of as 'Big Iron'. In-memory databases will fly with this sort of thing if the memory is big enough.

    1. NeilPost Silver badge

      Re: Wot! Apple innovating?

      Apple's M1 has up to 16GB of memory in the package.

      It won't be long until M4 has perhaps 128GB.

      Apple could easily diversify into putting these in compute blades, throwing some OSX/BSD/Unix/Linux with Hadoop or Kubernetes/containers at them, and everyone else would be floundering.

      I think even AWS would be interested in some Apple-fabbed M1 ARM compute blades, as they would give its in-house Graviton 2 a run for its money.

  6. John Robson Silver badge

    For the relatively few people who upgrade RAM

    Because, let's face it, that's a minority most of the time...

    Directly exported PCIe lanes seem to offer a reasonable option - after all both Intel and AMD are already "sharing" the GPU memory when they can.

    It will be slower than the on chip stuff, but it has ever been thus - consider it 16GB of cache, and then plug in "RAM" as needed. Could even take that memory between systems.
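    Treating the on-chip memory as a cache, the usual average-access-time estimate applies, where h is the fraction of accesses served on-chip. The latencies below are made-up round numbers for illustration, not measurements of any real part.

```latex
t_{\text{eff}} = h\,t_{\text{on}} + (1 - h)\,t_{\text{off}}
% Illustrative numbers only: h = 0.95, t_on = 100 ns, t_off = 300 ns
t_{\text{eff}} = 0.95 \times 100\,\text{ns} + 0.05 \times 300\,\text{ns} = 95\,\text{ns} + 15\,\text{ns} = 110\,\text{ns}
```

    With these figures, a 95% on-chip hit rate keeps the average only 10% above the on-chip latency, which is the intuition behind treating the on-chip memory as a cache.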

    1. NeilPost Silver badge

      Re: For the relatively few people who upgrade RAM

      Given that 8-16GB is entry level these days, unless you are a bonkers power user, RAM isn't much of a constraining factor any more.

      My new Thinkpad came with an i7 and 16GB. No real need to upgrade. Similar for even entry-level M1 Macs now.

      1. Roland6 Silver badge
        Pint

        Re: For the relatively few people who upgrade RAM

        >My new Thinkpad came with an i7 and 16Gb. No real need to upgrade.

        Well, I would upgrade it to a Thinkpad with a Ryzen 7 Pro and 16GB... :)

  7. guyr

    HBM?

    I thought HBM was designed to address this problem. You'll never be able to solve this problem generally via merging memory into the CPU, because different applications need different amounts of memory. The latest Intel Xeon Cascade Lake has "only" 38.5 MB of onboard cache. So, figuring out how to expand HBM seems to be a better expenditure of time, energy and money.

    1. Brewster's Angle Grinder Silver badge
      Trollface

      Re: HBM?

      You mean if you need lots of memory we might have to sell you a rare, super-expensive CPU? I fail to see the downside here! The only snag is convincing buyers to accept it.

      We definitely need a "K-ching!" icon
