Why not just make the L3 cache a bit bigger and faster? Surely adding another layer of complexity, with prefetch and prediction, will make the chip that much more complex or (shock horror) slow it all down...
A new cache is needed between memory and the tri-level processor cache structure in servers to avoid CPU core wait states. That's the point of a Last Level Cache chip designed by Piecemakers Technology with help from Taiwan's Industrial Technology Research Institute and Intel. As data travels from slow storage to a …
DRAM has been produced at successively smaller scales, which reduces both the power requirement per Mb and the latency per Mb. Both improvements have been swallowed by the increase in capacity. The CPU cache could be made bigger or faster - but bigger would cost speed, and faster would cost capacity.
Putting the cache on a separate die allows using a process dedicated to DRAM, which is cheaper than using a process that can create DRAM and a CPU on the same die. Also, using a separate die means the DRAM does not suffer from having to work at the CPU die's temperature, allowing the cache to be bigger, faster or cheaper.
At some point it might make sense, particularly on higher-end systems, to use the 17ns HBLL in place of existing 30ns DDR RAM. Sure, it will cost more, but it would (theoretically) be faster than an L4 cache, at least in edge cases.
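The latency argument above can be sketched with the classic average memory access time (AMAT) formula. The 17ns and 30ns figures come from the comments; the hit rate is purely an illustrative assumption, not a measured figure for this chip.

```python
# AMAT sketch: a 17 ns last-level cache in front of 30 ns DRAM,
# versus going straight to DRAM every time.
# The 80% hit rate below is an assumption for illustration only.

def amat(hit_time_ns, hit_rate, miss_penalty_ns):
    """Classic AMAT: hit time plus miss rate times miss penalty."""
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns

dram_only = 30.0                    # plain DDR access, ns
with_llc = amat(17.0, 0.80, 30.0)   # 17 ns cache, assumed 80% hit rate

print(f"DRAM only:      {dram_only:.1f} ns")
print(f"With 17ns LLC:  {with_llc:.1f} ns")  # 17 + 0.2 * 30 = 23.0 ns
```

The point of the sketch: even a modest hit rate makes the extra level pay for itself, but below roughly a 55% hit rate the cache is pure overhead - which is why replacing DRAM outright with the faster part, rather than layering it, is the tempting "edge case" argument.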
And with volume sales and production comes the potential for rock-bottom pricing in a few years' time.
Or, of course:
Why not a very large number of much smaller cores - immediate processing, no caching?
Chain a large number of cores; each carries out a small subroutine, then passes the result on.
100,000 cores processing simultaneously, minimal storage, data always in transit.
If you want a small core, try a serial single-bit processor:
it handles unlimited word length calculations with no overflow and no floating point, and a group of cores might perform any one calculation.
Now the cores are so small you could afford millions of them.
They are so cheap you can afford to have many sitting around waiting for some more data to be crunched.
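The bit-serial idea above can be sketched in a few lines: a tiny "core" that adds two numbers one bit per step, carrying state between steps, so word length is unlimited and overflow never occurs. The function names and structure here are illustrative only, not any real hardware design.

```python
# Bit-serial addition sketch: process one bit per step, little-endian.
# Word length is unbounded and there is no overflow - the result
# simply grows by one bit, as the comment above suggests.

def serial_add(a_bits, b_bits):
    """Add two equal-length little-endian bit lists, one bit at a time."""
    out, carry = [], 0
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)   # sum bit for this step
        carry = total >> 1      # carry flows into the next step
    out.append(carry)           # no overflow: just extend the word
    return out

def to_bits(n, width):
    """Integer -> little-endian bit list of the given width."""
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Little-endian bit list -> integer."""
    return sum(b << i for i, b in enumerate(bits))

print(from_bits(serial_add(to_bits(13, 8), to_bits(200, 8))))  # 213
```

Chaining many such cores, each holding only one bit of state and passing results downstream, is essentially a dataflow pipeline: the data really is "always in transit".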
This is just a different idea.
You're flogging a dead horse with super-hot, super-big, super-complicated processors.
Think of something different.
We're stuck on an architecture over 70 years old that's just been jazzed up to overcome bottlenecks. This is just another bit of jazz - interleaving, wide buses, static RAM - who cares, ho hum.