I remember...
When FCoE was clearly the future
The future is unwritten.
A walk through the vast and spacious exhibition arena at HPE Discover in London can bring you to Cavium's stand. There Roberto Angelo Polacsek, a senior account exec, will tell you why he believes NVMe over Fibre Channel will be important. Cavium bought Fibre Channel HBA vendor QLogic recently, and Polacsek says its 16Gbps HBAs …
FC-NVMe is a protocol which can run on FCoE, or a traditional FC SAN. Trying to equate the two is like trying to equate 10GbaseT and cat7.
The case for FC-NVMe is probably better than for FCoE, which had major adoption roadblocks obvious to anyone with a clue, due to the tower model of administration that separates the storage and network teams in most organizations. But you'll need to run FC-NVMe alongside traditional FCP (SCSI over FC) for years, as there are strong cost reasons to have two storage tiers, with the second using SATA drives for capacity storage, despite the Reg going crazy on NVMe articles lately (kind of reminds me of how they went crazy on all the FCoE articles for a time...)
despite the Reg going crazy on NVMe articles lately (kind of reminds me of how they went crazy on all the FCoE articles for a time...)
This place seems to have a thing for storage in all flavours. Remember all those articles about Violin and its Pals...
For us mere mortals, we go WTF? when we see the price of a 2TB SATA SSD.
FCoE was not really that great. From a protocol perspective, it had tons of overhead. Reliable Ethernet was absolutely shit because it depended on a slightly altered version of the incredibly broken 802.3 flow control protocol. Add to that the fact that FCoE is still SCSI, which actually needs reliable networking, and it's a disaster compounded on top of another disaster.
iSCSI was about 10,000 times better than FCoE since the overhead was roughly the same and it implements reliability at layer 4 which is highly tunable and not network hardware dependent. Add good old fashioned QoS on top and it's better.
Better yet, why not stop using broken ass block storage protocols altogether and support a real protocol like SMBv3 or NFS? They are actually far more intelligent for this purpose.
The entire point of NVMe is latency and concurrency. How does mixing FC into the picture help this? NVMe latency is currently in the <50 us range, which is still pretty slow by IB standards, but what's the latency of FC fabrics? I have a hard time believing that FC, traditionally the domain of fat, slow enterprise setups, is going to suddenly become capable of dropping 2-3 orders of magnitude in its delivered latency.
Although fat old enterprise bods might be comfortable with FC, it's completely obsolete: it has no advantages (cost, performance) over IB. I'd be much more interested if Mellanox (the only IB vendor) or Intel (the only IB-like vendor) started letting you tunnel PCIe over IB, so you could have a dumb PCIe backplane stuffed with commodity NVMe cards and one IB card, connecting to your existing IB fabric. That would require some added cleverness in the cards, but would actually deliver the kind of latency and concurrency (and scalability) that we require from flash.
Wait, what?
NVMe over Fabrics is transport-agnostic. Regardless of whether you're using IB, FC, or some Ethernet-based RDMA protocol (or even emerging networking protocols such as Omnipath), NVMe is not "mixed" with FC; it sits atop it.
Untweaked FC-NVMe has been demoed (at Flash Memory Summit) at 27us, down from ~70us for SCSI environments. That's a tech *preview* number, so a GA'd, tweaked version may come in slightly different.
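To put the two demo figures quoted above side by side, here is a rough back-of-the-envelope sketch. It uses only the 27us and ~70us numbers from this comment (the SCSI figure is approximate); the function names are made up for illustration, and nothing here is a measured result.

```python
import math

# Figures quoted in the comment above. The SCSI number is approximate ("~70us").
SCSI_LATENCY_US = 70.0
FC_NVME_LATENCY_US = 27.0


def speedup(before_us: float, after_us: float) -> float:
    """How many times lower the new latency is than the old one."""
    return before_us / after_us


def percent_reduction(before_us: float, after_us: float) -> float:
    """Latency reduction expressed as a percentage of the old latency."""
    return 100.0 * (before_us - after_us) / before_us


ratio = speedup(SCSI_LATENCY_US, FC_NVME_LATENCY_US)
reduction = percent_reduction(SCSI_LATENCY_US, FC_NVME_LATENCY_US)
# log10 of the ratio tells us how many orders of magnitude the drop is.
orders_of_magnitude = math.log10(ratio)

print(f"speedup: {ratio:.2f}x")
print(f"reduction: {reduction:.1f}%")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

On those two numbers the improvement is well under one order of magnitude, so the demo does not claim (or need) the 2-3 orders of magnitude drop the earlier comment was sceptical about.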
"FC is the domain of fat, slow enterprise setups." I honestly don't know how to respond to something so blatantly false. 80% of AFAs are connected via FC, which would not be the case if there was anything even remotely accurate about that statement.
Comparing FC to IB is a true apples-to-oranges situation. In the past several years I have met only one customer who would even *consider* moving from FC to IB. The knowledge requirements for IB go in a completely different direction than most environments wish to take - which is why IB storage is always hidden from administrators. There is simply no interest from people to retrofit their servers from HBAs to HCAs. If they already have NVMe-capable HBAs in their environments, being able to run NVMe side-by-side with SCSI is a compelling argument.
"Started letting you tunnel PCIe over IB." Nope. Not even close. That's not how NVMe or NVMe-oF work. Tunneling PCIe over an RDMA-based protocol makes no sense whatsoever.
"This would actually deliver the kind of latency, concurrency, and scalability that we require from flash." Again, no, it wouldn't. Extending PCIe means that you are accepting all the limitations of PCIe - just putting it over an IB network. You're still limited to 126 devices (I believe 2 addresses are reserved), you're still extending a bus architecture, and you're still completely misusing the NVMe Submission Queue location (local to the host in NVMe, local to the target in NVMe-oF). It's so much unnecessary busywork that it makes no sense whatsoever.
Using native NVMe-oF with IB makes a lot more sense, if you're adamant about using IB. It works well, but you have to face 2 specific and very real problems: you have to find people who understand holistic storage architectures *and* IB (few and far between) and you have to cope with some form of a dynamic discovery protocol. All that is assuming that you happen to have the inclination for running a net-new storage hardware network in your data center, which most people do not.
Infiniband has nothing like the toolset available for FC. Can you even create a simple zoneset for an IB fabric? I'm sure that could all be added, but FC still has the advantage of compatibility with all the current stuff. Enterprises aren't going to toss out all their FC gear to make everything run IB just to get a small latency benefit - especially if FC, despite being slower, is still more than "fast enough".
It took many years for SCSI to be displaced by FC, and it happened from the top down. The same will be true for FC (and it may not be IB that displaces it; IB could turn into a dead end like FCoE). Whatever displaces FC will do it on high end projects that require the minimum possible latency, and slowly work its way down to more ordinary deployments as the feature set fills out to replace all the things FC can do.
Good to see this HPE announcement re FC-NVMe... Storage vendors being (like their enterprise customers) a bit more conservative than your average tech start-up, they wait until their products have seen some early testing before they start pitching... Add in the need for some back-end architecture tweaks and it's very predictable that FC-NVMe product announcements would lag other NVMe-over-Fabrics announcements. For sure, buyers need more than "marketing slobber", but real-world numbers (more stable than the respectable 27us J_Metz mentioned) cannot be demonstrated until a critical mass of real-world products are at least in beta. These are beginning to emerge; similar upcoming announcements will show that enterprise storage buyers will be able to smoothly transition from FC-SCSI performance to competitive NVMe performance without having to rip/replace their familiar, reliable FC SANs. But for a few more months, until more products are out, we have to live with the slobber.
The vast majority of FC installations are 8Gb/s with only newer ones going faster (we dropped FC a couple of years ago)
You're going to need a lot of FC ports on your devices to match the bandwidth of the average consumer NVMe drive, let alone the higher end stuff like the P3700 and friends.
WRT latency: FC on fibre has latencies which come close to matching NVMe, but any kind of fabric is going to impose a substantial penalty vs direct connect.
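The port-count point above can be roughed out with some simple arithmetic. This is only a sketch: the per-port throughput values are the nominal per-direction figures usually quoted for 8GFC and 16GFC, and the drive figure is an assumed sequential-read number for a P3700-class drive, not something stated in this thread. Swap in your own numbers.

```python
import math

# ASSUMPTION: nominal usable throughput per FC port, per direction, in MB/s.
FC_PORT_MB_PER_S = {"8GFC": 800, "16GFC": 1600}

# ASSUMPTION: sequential read throughput of a high-end NVMe drive in MB/s
# (roughly the class of the P3700 mentioned above; illustrative only).
DRIVE_READ_MB_PER_S = 2800


def ports_needed(drive_mb_per_s: float, port_mb_per_s: float) -> int:
    """Smallest whole number of FC ports whose combined throughput
    covers one drive's sequential read bandwidth."""
    return math.ceil(drive_mb_per_s / port_mb_per_s)


ports_by_speed = {
    name: ports_needed(DRIVE_READ_MB_PER_S, mb_per_s)
    for name, mb_per_s in FC_PORT_MB_PER_S.items()
}

for name, count in ports_by_speed.items():
    print(f"{name}: {count} port(s) to match one drive")
```

This ignores protocol overhead, multipathing and the fact that real workloads are rarely purely sequential, so treat the result as a floor rather than a sizing recommendation.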
Ok... this is 2016 almost 2017... WE DON'T SEND RAW BLOCK REQUESTS TO STORAGE!!!!
Let's make this very clear: SCSI and NVMe are the dumbest things you could ever put in the data center as an interconnect. Back when we used to connect the physical disks in an array to the fabric, they didn't suck so bad. But now, we have things like :
1) Snapshots
2) Deduplication
3) Compression
4) Replication
5) Mirroring
6) Differencing disks
There are tons of nifty things we have. SCSI and NVMe are protocols designed to talk to physical storage devices not logical ones. There are two needs when talking to a storage array :
1) a VM is stored on the array
2) a physical host is stored on the array
When you install 5-500,000 physical hosts with VMware, Linux or Windows, you will use the exact same boot image with a fork in the array. This is REALLY REALLY easy and with some systems (like VMware) which can do stateless boot disks, you can use the exact same boot image without forking at all.
When you install 5 or 50 million virtual machines you do roughly the same thing. Clone an image and run sysprep for example.
What does this mean? The hosts or virtual machines DO NOT talk directly to the disks and therefore don't need to use a disk access protocol. Instead, a network adapter BIOS or system BIOS able to speak file access protocols will be far more intelligent.
There is simply no reason why block storage protocols should EVER be on a modern data center's network. Besides being shit to begin with (things like major SCSI SNAFUs) block storage protocols generally don't provide good security, they don't scale and you end up building impressively shitty networks... I'm sorry fabrics in pairs because FC routing never really happened.
iSCSI almost doesn't suck... but it's just an almost.
People are saying "NVMe is about latency..." blah blah blah... no it isn't. It's about connecting Flash memory to motherboards. It's basically PCIe. It's a system board interconnect. It is not a networking protocol and should never be used as one.
If QLogic is actually bent on making something that doesn't suck... why not make an Ethernet adapter which supports booting from SMBv3 and NFS without performance issues? I should be able to saturate a 100Gb/s network adapter on two channels when talking to GlusterFS or Windows Storage Spaces without using any CPU.