Nice to see someone's awake.
I've already run into this on my systems. The server has dual 6-core Xeons with HT, 48 GB ECC RAM, a PCIe SSD, and at least four SSDs in RAID 0 (currently up to 8 on a PCIe x8 controller, btw), and you can see that there are some serious choke points in normal operation, let alone benchmarks. We'll have to revisit pretty much the entire kernel, component by component, to remove constraints introduced to compensate for impedance mismatches that either no longer exist or require rebalancing. Certainly Linux and the BSDs look good here since we can examine the code. Windows? Ouch.
Meanwhile, I'm looking to upgrade my storage server and the server above to 10 Gbps Ethernet. There's not enough oomph on the storage server to think about anything faster.
Oh yes, NUMA. Makes me reach for a couple of my 750 mg aspirins thinking about that in the mix. Still, the concept applied to storage makes sense. What we're down to is mixed IOPS and latencies, but nothing like the differentials we're used to seeing. What I do see, now, is that we may (Hell, no may about it!) need to revisit the assumptions and methods in all database products, especially when it comes to all the packages we've been toying with for "Big Data."
I need a beer.