Re: Core issues
"The one thing that they did not explain is exactly how TLBs on modern processors tend to be architected"
This tends to be part of the secret sauce in the chips, along with speculative execution and the like, that increases performance. As the paper notes, exactly how the TLBs work isn't fully documented, and they needed to train a classifier to predict it.
This is common: Intel won't tell you, for example, how addresses are distributed across the cache slices sitting around the CPU core ring bus in its large shared caches.
Also, as the paper states, the TLB design changes from microarch to microarch. So I tried to keep it general :-)
C.