NVMe is a protocol for block storage across the PCIe bus. Like SCSI, it is intended as a method of storing blocks in a direct-attached system and assumes lossless packet delivery. When Fibre Channel came around, SCSI drives could be placed in a central system, allowing the physical drives of a server to be located in a single box. FC was designed to deliver the SCSI QoS requirements across fiber.
A few brilliant engineers got together and found out they could provide virtual drives instead of physical drives over FC and iSCSI while still placing the same demands on the fabric to support SCSI QoS.
This is where things begin to go wrong... people wanted fabric level redundancy as well. This meant designing an active/standby solution for referencing the same block devices. The problem is, SCSI and now NVMe are simply not a good fit for this.
1) The volumes (LUNs) being accessed as block storage ARE NOT physical devices. They are files stored on file systems.
2) The client devices accessing the LUNs ARE NOT physical computers with physical storage adapters. They are virtual machines with virtual storage devices.
3) The computational overhead is wasteful. You simulate a SCSI controller in software, translate the block numbers from the virtual machine to a reference in a VMFS or NTFS file system, look up the virtual block in the virtual file system, convert that reference to a virtual file position, look up that block within a virtual file, translate that block to a physical block, and then perform everything in reverse. This consumes power and slows everything down. In addition, it severely limits scalability.
4) Dual-ported storage exists to compensate for limitations in block-based storage. It would be far more intelligent and cost-effective to plug a large number of single-ported drives into a PCIe switch and then multi-master the PCIe bus. This technology dates back 20 years and is solid and proven. The problem is, PCIe is too slow for this. With NVMe and newer storage technologies, the bus would max out at about 32 NVMe devices.
5) Scale-out file servers simply scale out better than controllers. SCSI and now NVMe really can't properly scale past two controllers, and since NVMe and FC lack multicast, performance is simply doomed.
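To make point 3 concrete, here is a rough Python sketch of the translation chain. The layer names follow the description above, but the fixed offsets are made up for illustration. Real hypervisors and file systems use extent maps and metadata lookups, not simple additions.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    offset: int  # hypothetical stand-in for a real mapping-table lookup

    def translate(self, block: int) -> int:
        # A real layer would consult an extent map or page table here.
        return block + self.offset


def resolve(block: int, layers: list[Layer]) -> tuple[int, int]:
    """Walk a guest block number down through every layer, counting lookups."""
    hops = 0
    for layer in layers:
        block = layer.translate(block)
        hops += 1
    return block, hops


# Illustrative layers only; the offsets are invented numbers.
LAYERS = [
    Layer("guest virtual SCSI controller", 0),
    Layer("virtual disk file in VMFS/NTFS", 2048),
    Layer("file position inside the LUN file", 4096),
    Layer("physical block on the drive", 8192),
]

physical_block, hops = resolve(100, LAYERS)
# Every I/O pays these lookups on the way down and again on the way back.
round_trip_lookups = 2 * hops
```

Even in this toy version, a single guest I/O costs eight translations round trip, and each of them is done in software.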
The solution is simple... build out either:
2) Windows Storage Spaces Direct
Build up each storage node with hottest (NVMe) / hot (SATA SSD) / cold (spinning disk) tiers
Build 3 or more nodes
FC (if needed)
Use proper time markers (not snapshots) for backup.
Be happy and save yourself millions.
PS - Hyper-V, OpenStack, Nutanix and more have this built in as part of their base licenses.
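
For anyone wondering what the hottest/hot/cold split means in practice, here is a minimal Python sketch of the idea. The thresholds and the `pick_tier` function are hypothetical, and this is not how Storage Spaces Direct or any other product actually decides placement. Those systems handle tiering themselves.

```python
# Tier names mirror the node layout above: NVMe, SATA SSD, spinning disk.
HOTTEST, HOT, COLD = "nvme", "sata_ssd", "hdd"


def pick_tier(reads_per_day: int,
              hottest_threshold: int = 1000,
              hot_threshold: int = 10) -> str:
    """Choose a tier from access frequency (thresholds are invented values)."""
    if reads_per_day >= hottest_threshold:
        return HOTTEST
    if reads_per_day >= hot_threshold:
        return HOT
    return COLD


# Example: place three pieces of data with very different access patterns.
placement = {
    "database index": pick_tier(50_000),
    "last month's logs": pick_tier(40),
    "old archive": pick_tier(0),
}
```

The point is only that frequently touched data lands on the fastest media and everything else drifts down to cheaper disks.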