I see the 80s are back again.
Slots in the back face of microchips for cooling: see the Petersen review article, "Silicon as a Mechanical Material" (Proceedings of the IEEE). Stacked processing chips would be the Hughes airborne computer efforts. IIRC the ultimate goal was a sensor layer (UV, IR, visible, whatever) with multiple layers underneath for memory and processing. The plan was to put spots of tin on the chip surface and use a front-to-back temperature gradient to drive the tin in and create a front-to-back conductive channel.
They might also look at what happened when Gene Amdahl set up "Trilogy" to do wafer-scale integration. The key problem (which broke them) was the inability to find a way, at a reasonable cost, to make wafers which could cut the failed sections out of the path and leave the rest running.
Note that most current liquid cooling methods seem to use water in a heat pipe arrangement, which is controlled boiling.
I would suggest that all of this is a bit of a red herring. The ultimate source of *most* of that heat is the clock driver transistors needed to distribute the umpteen GHz system clock, *regardless* of whether or not that particular section is actually even operating.
Putting more chips closer together merely means they will waste even more energy in a confined space.
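For anyone who wants numbers to play with, here is a rough sketch of the clock point using the usual CMOS switching power relation, P = activity x C x Vdd^2 x f. Every figure in it (voltage, frequency, capacitances) is a made-up assumption for illustration, not a measurement of any real chip, and the function names are hypothetical. All it shows is that the clock term stays put when the logic goes idle.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """CMOS switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Illustrative assumptions only, not data from any real part.
VDD = 1.0          # supply voltage, volts
FREQ = 3e9         # system clock, Hz
CLOCK_CAP = 2e-9   # total clock network capacitance, farads
LOGIC_CAP = 20e-9  # total switchable logic capacitance, farads


def chip_power(logic_activity):
    """Return (clock_watts, logic_watts) for a given logic activity factor.

    The clock network switches every cycle, so its activity factor is 1.0
    no matter how busy or idle the logic is.
    """
    clock = dynamic_power(1.0, CLOCK_CAP, VDD, FREQ)
    logic = dynamic_power(logic_activity, LOGIC_CAP, VDD, FREQ)
    return clock, logic


if __name__ == "__main__":
    for activity in (0.0, 0.05, 0.2):
        clock, logic = chip_power(activity)
        print(f"logic activity {activity:.2f}: "
              f"clock {clock:.1f} W, logic {logic:.1f} W")
```

With these assumed values the clock network burns the same power whether the logic activity is 0.0 or 0.2. How big a share of the total that is on a real chip depends entirely on the real capacitances and any clock gating, which this sketch ignores.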
If you want lower power you implement asynchronous (clockless) systems like the Manchester ARM developments or the design libraries of Philips.
People will only start looking at this when someone works out a way to sell clockless processors while differentiating the different grades (i.e. cost) based on some sort of parameter people can compare. Some kind of agreed "throughput" measure would be reasonable, but the time from reading some values from off-chip RAM to writing the result back (to off-chip RAM) is likely to be rather longer than the sub-ns duration of a clock cycle.
AFAIK the limitations on the speed-up bought by parallel processing have not gone away. Somewhere in the 10-16 processor area is where the shared memory approach hits the skids. There were very good reasons why the transputer was conceived as it was. People who ignore them are asking for trouble.
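One textbook way to see a ceiling on parallel speed-up is Amdahl's law, sketched below. Note this is a swapped-in illustration: it models a fixed serial fraction of the work, not the shared-memory contention that the 10-16 processor figure above refers to, and the 90% parallel fraction is purely an assumed value.

```python
def amdahl_speedup(processors, parallel_fraction):
    """Amdahl's law: speed-up over one processor for a fixed workload.

    parallel_fraction is the share of the work that can be spread across
    processors; the remainder is assumed to run serially.
    """
    if processors < 1:
        raise ValueError("need at least one processor")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    PARALLEL_FRACTION = 0.9  # assumed for illustration only
    for n in (1, 2, 4, 8, 16, 64, 1024):
        print(f"{n:5d} processors: "
              f"speed-up {amdahl_speedup(n, PARALLEL_FRACTION):.2f}")
```

With the assumed 90% parallel fraction the speed-up can never pass 10 however many processors you add, and most of what is available has already been had by the time you reach a few tens of processors.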
As for "neuronal density," individual transistors are long past that in the 2D sense. The *huge* relative thickness of dead silicon substrate is the problem here. All the real action is within roughly 5 microns of the surface.
And of course making a unit mimic the actual action of a true net of neurons and axons is another matter.
Mine's the one with Carver Mead's Analogue VLSI circuit design in it.