Highway to HBLL: The missing link between DRAM and L3 found

A new cache is needed between memory and the tri-level processor cache structure in servers in order to avoid CPU core wait states. That's the point of a Last Level Cache chip designed by Piecemakers Technology with help from Industrial Technology Research Institute of Taiwan and Intel. As data travels from slow storage to a …

  1. Alan Sharkey


    Why not just make the L3 cache a bit bigger and faster? Surely adding another layer, with its own prefetch and prediction, is going to make the chip that much more complex or (shock horror) slow it all down.


    1. Flocke Kroes Silver badge

      Re: Why?

      DRAM has been produced at successively smaller scales. This reduces the power requirement per Mb and the latency per Mb. Both improvements have been trashed by the increased capacity. The CPU cache could be made bigger or faster, but bigger would cost speed and faster would cost capacity.

      Putting the cache on a separate die allows using a process dedicated to DRAM, which is cheaper than using a process that can create DRAM and a CPU on the same die. Also, using a separate die means the DRAM does not suffer from having to work at the CPU die's temperature, allowing the cache to be bigger, faster or cheaper.

  2. Ken Hagan Gold badge

    Might DRAM be squeezed out altogether?

    With this stuff pushing from one side and XPoint pushing from the other...?

    1. bombastic bob Silver badge

      Re: Might DRAM be squeezed out altogether?

      At some point it might make sense, particularly on higher-end systems, to use the 17 ns HBLL in place of existing 30 ns DDR RAM. Sure, it will cost more. But it would (theoretically) be faster than an L4 cache, at least for edge conditions.

      And with volume sales and production comes the potential for rock-bottom pricing in a few years' time.

      1. Colin Tree

        Re: Might DRAM be squeezed out altogether?

        You've never bought fast static RAM, have you?

        Go on, how much more would it cost?
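To put rough numbers on the 17 ns versus 30 ns comparison in this sub-thread, here is a minimal sketch of an average-access-time calculation. It assumes a simple serial lookup (the cache is checked first, and a miss then pays the full DRAM latency on top); the function name and the hit rates are illustrative assumptions, not figures from the article or from Piecemakers. Only the 17 ns and 30 ns values come from the comment above.

```python
def average_access_ns(hit_rate, cache_ns, backing_ns):
    """Average latency of a cache placed in front of a slower memory.

    Assumed model: every access pays cache_ns, and a miss additionally
    pays backing_ns to fetch from the slower memory.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * backing_ns


HBLL_NS = 17.0  # latency quoted in the thread
DDR_NS = 30.0   # latency quoted in the thread

# Hypothetical hit rates, chosen only to show the trend.
for hit_rate in (0.0, 0.5, 0.9, 1.0):
    as_cache = average_access_ns(hit_rate, HBLL_NS, DDR_NS)
    print(f"hit rate {hit_rate:.1f}: {as_cache:.1f} ns as a cache, "
          f"{HBLL_NS:.1f} ns as main memory, {DDR_NS:.1f} ns for plain DDR")
```

Under this assumed model the extra cache layer only beats plain DDR once the hit rate exceeds 17/30 (about 57%), whereas HBLL used as main memory would always cost 17 ns. That is the performance side of the argument only; it says nothing about price.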

  3. Mage Silver badge


    What about a smaller CPU (fewer transistors, like ARM) and then a really big L1 cache?

    1. Colin Tree

      Re: Or

      Or, of course.

      Why not a very large number of much smaller cores, with immediate processing and no caching?

      Chain a large number of cores: each carries out a small subroutine, then passes the result on.

      100,000 cores simultaneously processing, minimal storage, data is always in transit.

      If you want a small core, use a serial single-bit processor: it handles unlimited word-length calculations with no overflow and no floating point, and a group of cores might perform any one calculation.

      Now the cores are so small you could afford millions of cores.

      They are so cheap you can afford to have many sitting around waiting for some more data to be crunched.

      This is just a different idea.

      You're flogging a dead horse with super hot, super big, super complicated processors.

      Think of something different.

      We're stuck on an architecture over 70 years old that's just been jazzed up to overcome bottlenecks. This is just another bit of jazz, interleaved, wide bus, static ram, fuck who cares, ho hum.
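The serial single-bit processor idea in the comment above can be illustrated with a small software model. This is only a sketch of bit-serial addition (one full-adder step per bit, carry held between steps), not a description of any real chip; the helper names `to_bits`, `from_bits` and `serial_add` are made up for the example. It shows why word length is not fixed: the adder simply keeps consuming bits until both inputs run out.

```python
from itertools import zip_longest


def to_bits(n):
    """Non-negative integer -> list of bits, least significant bit first."""
    bits = []
    while n:
        bits.append(n & 1)
        n >>= 1
    return bits  # zero is represented by an empty list


def from_bits(bits):
    """List of bits, least significant first -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(a_bits, b_bits):
    """Add two bit streams one bit at a time, as a 1-bit full adder would."""
    carry = 0
    out = []
    for a, b in zip_longest(a_bits, b_bits, fillvalue=0):
        out.append(a ^ b ^ carry)               # sum bit for this step
        carry = (a & b) | (carry & (a ^ b))     # carry into the next step
    if carry:
        out.append(carry)  # the result grows by a bit instead of overflowing
    return out


x = 2**100 + 12345
y = 2**90 + 999
print(from_bits(serial_add(to_bits(x), to_bits(y))) == x + y)
```

The trade-off is time: an n-bit addition takes n steps on one such core, which is where the comment's suggestion of chaining many cores would have to earn its keep.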
