I've been reading this back-and-forth with interest. It does not appear that you are using the term "NVMe" correctly.
To wit: "NVMe speeds blows away the fastest interconnects FC has to offer, so you're paying a lot of extra money to get that SAN bottleneck."
NVMe itself doesn't have any speed separate from its transport. Whether it's PCIe, InfiniBand, FC, RoCE, iWARP, or NVMe/TCP - each of these has a profound impact on the relative latency and performance envelope of any NVMe solution.
NVMe itself is completely agnostic to its underlying transport layer. That is by design. In fact, as we (the NVM Express organization) began refactoring the specification for 2.0, we noted that there are some table stakes that exist regardless of transport, so we are making it cleaner and easier to develop to that type of architecture.
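To make that layering concrete, here's a toy Python sketch (not anything from the spec or any real driver - the class names, commands, and round-trip numbers are all made up for illustration). The point it tries to show is that the NVMe command is identical no matter which transport carries it; only the transport's latency envelope changes:

```python
from dataclasses import dataclass

# Illustrative model only: the NVMe command is transport-independent;
# the transport wrapper is what determines the latency envelope.

@dataclass
class NvmeReadCommand:
    nsid: int   # namespace identifier
    slba: int   # starting logical block address
    nlb: int    # number of logical blocks to read

class Transport:
    """Hypothetical base class: every transport carries the same command."""
    name = "abstract"
    typical_rtt_us = 0.0  # made-up, order-of-magnitude fabric round-trip

    def submit(self, cmd: NvmeReadCommand) -> str:
        return (f"{self.name}: READ nsid={cmd.nsid} slba={cmd.slba} "
                f"nlb={cmd.nlb} (~{self.typical_rtt_us}us fabric RTT)")

class PcieTransport(Transport):
    name, typical_rtt_us = "PCIe", 5       # illustrative numbers only

class FcTransport(Transport):
    name, typical_rtt_us = "NVMe/FC", 20   # illustrative numbers only

class TcpTransport(Transport):
    name, typical_rtt_us = "NVMe/TCP", 50  # illustrative numbers only

cmd = NvmeReadCommand(nsid=1, slba=0, nlb=8)
for transport in (PcieTransport(), FcTransport(), TcpTransport()):
    print(transport.submit(cmd))
```

Same command, three very different performance envelopes - which is why quoting an "NVMe speed" without naming the transport doesn't really mean anything.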
It sounds as if you are confusing NVMe with PCIe, which is without question the most latency-performant transport for the protocol. However, PCIe falls down in terms of scalability. What SANs (assuming you are referring to FC SANs here) provide is a scalability factor and consistency at scale. That is, your 10th node is going to perform just as predictably as your 10,000th node. Other transports cannot even get to 10k nodes, much less deliver predictable performance at that level.
When you are talking about performance at the sub-100us level, your big concern is not latency, it's jitter. The *variability* in latency is where you are going to get into big problems. That's where the hero numbers fall apart, because the +/- variability is rarely (if ever) mentioned.
So, while the hero numbers do show varying degrees of point-to-point latency, the question will always need to be addressed: What's it like in production?
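A quick toy illustration of why the mean alone hides this (the two synthetic latency distributions below are invented, not measured from any product): both arrays average roughly 100us, but one has occasional stalls, and only the tail percentiles reveal it.

```python
import random
import statistics

random.seed(1)

# Two made-up latency populations with roughly the same ~100us mean.
# "tight" has low jitter; "jittery" is fast most of the time but stalls
# badly on ~5% of I/Os.
tight = [random.gauss(100, 3) for _ in range(100_000)]
jittery = [random.gauss(90, 3) if random.random() < 0.95
           else random.gauss(290, 30) for _ in range(100_000)]

def percentile(data, pct):
    """Return the pct-th percentile using 1000 quantile cut points."""
    return statistics.quantiles(data, n=1000)[int(pct * 10) - 1]

for name, data in (("tight", tight), ("jittery", jittery)):
    print(f"{name:8s} mean={statistics.mean(data):6.1f}us  "
          f"p99={percentile(data, 99):6.1f}us  "
          f"p99.9={percentile(data, 99.9):6.1f}us")
```

The "hero number" (the mean) looks nearly identical for both; the p99/p99.9 numbers are what your application actually feels in production.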
Either way, NVMe will inherit every advantage and disadvantage of its underlying transport, no matter how good or how bad, and does not compensate in any way for either.