Tintri, thrown on the El Reg grill: We'll support NVMe! We promise!

El Reg has been quizzing array vendor after array vendor on their views about the technology change from SAS/SATA to NVMe flash drives and to NVMe over Fabrics array access. Today it's the turn of Tintri, and it thinks NVMe is an important technology watershed, but not as huge as the change from disk to solid state storage – the …

  1. CheesyTheClown

    NVMe fail

    NVMe is a protocol for block storage across the PCIe bus. Like SCSI, it is intended as a method of storing blocks in a directly connected system and assumes lossless packet delivery. When Fibre Channel came around, SCSI drives could be placed in a central system, allowing the physical drives of a server to be located in a single box. When this happened, FC was designed to deliver the SCSI QoS requirements across fiber.

    A few brilliant engineers got together and found out they could provide virtual drives instead of physical drives over FC and iSCSI while still placing the same demands on the fabric to support SCSI QoS.

    This is where things begin to go wrong... people wanted fabric level redundancy as well. This meant designing an active/standby solution for referencing the same block devices. The problem is, SCSI and now NVMe are simply not a good fit for this.

    1) The volumes (LUNs) being accessed as block storage ARE NOT physical devices. They are files stored on file systems.

    2) The client devices accessing the LUNs ARE NOT physical computers with physical storage adapters. They are virtual machines with virtual storage devices.

    3) The computational overhead to simulate a SCSI controller in software, translate the block numbers from the virtual machine to a reference in a VMFS or NTFS file system, look up the virtual block to reference in the virtual file system, convert that reference to a virtual file position, then look up that block within a virtual file, translate that block to a physical block and then perform everything in reverse is wasteful: it consumes power and slows everything down. In addition, it severely limits scalability (see the sketch after this list).

    4) Dual-ported storage exists to compensate for limitations in block-based storage. It would be far more intelligent and cost-effective to plug a large number of single-ported drives into a PCIe switch and then multi-master the PCIe bus. This technology dates back 20 years and is solid and proven. The problem is, PCIe is too slow for this. When facing NVMe and new storage technologies, the bus would max out at about 32 NVMe devices.

    5) Scale-out file servers simply scale out better than controllers. SCSI and now NVMe really can't properly scale past two controllers, and since NVMe and FC lack multicast, performance is simply doomed.
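
    Here's the sketch promised in point 3: a rough Python model of the lookup chain a single guest read has to walk. Every table and name in it is invented for illustration; it only counts the translation hops and is not how ESXi or VMFS actually implement the path.

        # Hypothetical lookup tables: one per translation layer from point 3.
        GUEST_FS = {42: 1_042}        # guest file block -> guest "disk" block
        VDISK    = {1_042: 900_042}   # guest disk block -> block inside the virtual disk file
        VMFS     = {900_042: 77}      # virtual disk file block -> datastore (LUN) block
        ARRAY    = {77: 500_077}      # LUN block -> physical flash block

        def guest_read(guest_block):
            """Walk every translation layer for a single read, counting the hops."""
            hops = 0
            disk_block = GUEST_FS[guest_block]; hops += 1   # guest file system
            file_block = VDISK[disk_block];     hops += 1   # emulated SCSI/NVMe device
            lun_block  = VMFS[file_block];      hops += 1   # VMFS/NTFS extent lookup
            phys_block = ARRAY[lun_block];      hops += 1   # array's own mapping layer
            print(f"guest block {guest_block} -> physical block {phys_block} after {hops} lookups")

        guest_read(42)

    Four lookups for one toy read, before the fabric is even involved, and a write runs the whole chain in reverse.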

    The solution is simple... build out either:

    1) GlusterFS

    2) Windows Storage Spaces Direct

    3) Lustre

    Build up each storage node with hottest (NVMe) / hot (SATA SSD) / cold (spinning disk) tiers (toy sketch below)

    Build 3 or more nodes

    Run:

    NFS

    iSCSI

    SMBv3

    FC (if needed)

    Use proper time markers (not snapshots) for backup.

    Be happy and save yourself millions.
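
    As a toy illustration of that layout (all names and thresholds here are invented; real GlusterFS/S2D/Lustre tiering is policy-driven and far more involved), placement across three nodes and three temperature tiers might look like this in Python:

        NODES = ["node-a", "node-b", "node-c"]   # three or more nodes, as above

        def pick_tier(reads_per_day):
            """Crude temperature-based tier choice."""
            if reads_per_day > 1_000:
                return "nvme"          # hottest
            if reads_per_day > 10:
                return "sata-ssd"      # hot
            return "spinning-disk"     # cold

        def place(object_id, reads_per_day, replicas=3):
            """Spread replicas across nodes, all on the tier the temperature calls for."""
            tier  = pick_tier(reads_per_day)
            start = hash(object_id) % len(NODES)
            return [(NODES[(start + i) % len(NODES)], tier) for i in range(replicas)]

        print(place("vm-disk-0001", reads_per_day=5_000))   # three replicas on NVMe
        print(place("old-backup-42", reads_per_day=1))      # three replicas on spinning disk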

    PS - Hyper-V, OpenStack, Nutanix and more have this built in as part of their base licenses.

    1. taliz

      Re: NVMe fail

      "The solution is simple... build out either :

      2) Windows Storage Spaces Direct"

      Have you even tried S2D? It's a pile of garbage. Microsoft has never delivered anything proper in the storage world, and S2D is their latest failure.

      Hyper-V is terrible compared to ESXi as well. We recently had a whole Hyper-V host crash, taking all the VMs with it, just because a customer formatted a drive INSIDE their VM.

      We have a few thousand VMs on Hyper-V, and tens of thousands on ESXi. Guess which platform requires 90% of the admin's time?

  2. Anonymous Coward

    "1) The volumes (LUNs) being accessed as block storage ARE NOT physical devices. They are files stored on file systems."

    No. They *can* be stored as files on a filesystem, but they don't have to be. They could instead be logical volumes (LVM, for example) accessed directly, and the overhead of that is very small.

    "3) The computational overhead to simulate a SCSI controller in software"

    ... is irrelevant. If the clients are virtual machines then they'll expose block devices as virtio devices. There's no need to emulate a SCSI controller anywhere in the chain.

    "The problem is, PCIe is too slow for this. When facing NVMe and new storage technologies, the bus would max out at about 32 NVMe devices."

    What "bus" are you talking about? The throughput of PCIe is determined by how many lanes you use. You can in principle build a switch fabric with as many lanes "in" and "out" as you like.
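
    To put some back-of-the-envelope numbers on that (rough estimates, not spec quotes), in Python:

        # PCIe 3.0 carries roughly 0.985 GB/s per lane after 128b/130b encoding.
        GB_PER_LANE_GEN3 = 0.985

        def bandwidth_gbs(lanes):
            """One-direction bandwidth of a link or lane group, in GB/s."""
            return lanes * GB_PER_LANE_GEN3

        print(f"one x16 uplink    : {bandwidth_gbs(16):6.1f} GB/s")       # ~15.8 GB/s
        print(f"32 x4 NVMe drives : {bandwidth_gbs(32 * 4):6.1f} GB/s")   # ~126 GB/s of drive-side lanes

    The drives only "max out" anything if they all have to squeeze through one narrow uplink; give the fabric more lanes (or more uplinks) and the ceiling moves with it.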

    "since NVMe and FC lack multicast, performance is simply doomed"

    Riiiiight. We all use multicast to our storage servers. Not.
