Seems to me the latency reductions will initially come from the host-side stack.
An NVMe driver, with its streamlined command set and fewer instructions per I/O, should reduce latency compared with the traditional SCSI / FC HBA driver stack. This should be the easy part.
Next up would be the storage end. That would need NVMe all the way to the storage (flash or whatever) instead of NVMe-to-SCSI translation in front of the flash, which is probably what the first-generation implementations will look like.
Reducing latencies at either end will help, but there would still be the FC fabric limitations / latencies.
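To make that last point concrete, here's a toy additive model: end-to-end latency as host stack + fabric + storage end. The numbers are made up, in arbitrary units, purely for illustration (they are not measurements), and `IoPath` is a hypothetical name. The takeaway is that shrinking both ends lowers the total, but the fabric term stays put and becomes a bigger share of what's left.

```python
# Toy additive latency model. All values are hypothetical, in arbitrary
# units, chosen only to illustrate the argument; they are NOT measurements.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IoPath:
    host_stack: float   # host driver stack (SCSI/FC HBA vs. NVMe driver)
    fabric: float       # FC fabric traversal
    storage_end: float  # target side (NVMe-to-SCSI translation vs. native NVMe)

    def total(self) -> float:
        # End-to-end latency assumed to be a simple sum of the segments.
        return self.host_stack + self.fabric + self.storage_end

    def fabric_share(self) -> float:
        return self.fabric / self.total()


# Hypothetical baseline: traditional SCSI/FC stack at both ends.
baseline = IoPath(host_stack=10.0, fabric=5.0, storage_end=10.0)

# Step 1: NVMe driver on the host; storage end still translates NVMe to SCSI.
host_nvme = replace(baseline, host_stack=4.0)

# Step 2: native NVMe at the storage end too; the fabric term is unchanged.
end_to_end_nvme = replace(host_nvme, storage_end=4.0)

for name, path in [("baseline", baseline),
                   ("host NVMe", host_nvme),
                   ("end-to-end NVMe", end_to_end_nvme)]:
    print(f"{name}: total={path.total()}, fabric share={path.fabric_share():.0%}")
```

With these placeholder values the total drops from 25 to 19 to 13 units while the fabric share climbs from 20% to 26% to 38%, which is the "still stuck with the fabric" effect.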