Posts by J_Metz

7 posts • joined 28 Dec 2011

The shifting SANs of enterprise IT: You may have been burned in the past, but live migration is and will be your friend


I've been reading this back-and-forth with interest. It does not appear that you are using the term "NVMe" correctly.

Namely, to wit: "NVMe speeds blows away the fastest interconnets FC has to offer, so you're paying a lot of extra money to get that SAN bottleneck."

NVMe itself doesn't have any speed separate from its transport. Whether it be PCIe, InfiniBand, FC, RoCE, iWARP, NVMe/TCP - all of these are going to have a profound impact on the relative latency and performance envelope for any NVMe solution.

NVMe itself is completely agnostic to its underlying transport layer. That is by design. In fact, as we (the NVM Express organization) began refactoring the specification for 2.0, we noted that there are some table stakes that exist regardless of transport, so we are making it cleaner and easier to develop to that type of architecture.

It sounds as if you are confusing NVMe with PCIe, which is without question the most latency-performant transport for the protocol. However, PCIe fails in terms of scalability. What SANs (assuming you are referring to FC SANs here) provide is a scalability factor and consistency at scale. That is, your 10,000th node is going to perform as predictably as your 10th node. Other transports cannot even get to 10k nodes, much less have predictable performance at such a level.

When you are talking about performance at the sub-100us level, your big concern is not latency, it's jitter. The *variability* in latency is where you are going to get into big problems. That's where the hero numbers fall apart, because the +/- variability is rarely (if ever) mentioned.
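To make the latency-versus-jitter point concrete, here is a minimal Python sketch. The two sample sets are hypothetical values chosen purely for illustration (they are not measurements of any transport), and the name `summarize` is my own. Both sets have the same mean, so a single "hero number" would not tell them apart.

```python
import statistics

# Hypothetical latency samples in microseconds. Illustrative only,
# not measurements of any real fabric or device.
steady = [48, 50, 52, 49, 51, 50, 50, 49, 51, 50]
spiky = [20, 22, 21, 23, 20, 22, 21, 23, 20, 308]


def summarize(samples):
    """Return the mean, the jitter (population std dev), and the worst case."""
    return {
        "mean": statistics.fmean(samples),
        "jitter": statistics.pstdev(samples),
        "worst": max(samples),
    }


for name, samples in (("steady", steady), ("spiky", spiky)):
    s = summarize(samples)
    print(
        f"{name}: mean={s['mean']:.1f}us "
        f"jitter={s['jitter']:.1f}us worst={s['worst']}us"
    )
```

The "spiky" set looks better on most individual I/Os, yet its one outlier is what an application waiting on that I/O actually experiences.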

So, while the hero numbers do show varying degrees of point-to-point latency, the question will always need to be addressed: What's it like in production?

Either way, however, NVMe will inherit every advantage and disadvantage of an underlying transport, no matter how good or how bad, and does not compensate in any way for either.

The Fibre Channel NVMe cookbook: QED from a storage whizz's POV


Almost there

Greg is a great guy and very knowledgeable about storage technology. Unfortunately, there are still a few things that need some correction/clarification.

'NVMe has no concept of controller and LUN; the two concepts are merged in an NVMe "Name Space".' I'm afraid this is simply incorrect. The NVMe Controller is a major part of the NVM Subsystem. It is the NVMe controller that not only creates and maintains the access to the NVMe Namespace (the equivalent of a LUN in NVMe-speak), but is also the structure in an NVMe solution that handles the Queue Pair (QP) relationships.

Not only are NVMe Controllers not equivalent to Namespaces (or combined), it is possible to have "public" namespaces that are accessed by more than one NVMe Controller.
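As a rough illustration of that separation, here is a minimal Python sketch of the object relationships. It is a toy data model, not anything from the NVMe specification: the class names, field names, sizes, and the example NQN string are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    nsid: int
    size_blocks: int


@dataclass
class Controller:
    cntlid: int
    attached: dict = field(default_factory=dict)      # nsid -> Namespace
    queue_pairs: list = field(default_factory=list)   # (queue id, depth)

    def attach(self, ns):
        # The controller provides access to a namespace; it is not the namespace.
        self.attached[ns.nsid] = ns

    def create_queue_pair(self, qid, depth):
        # Queue pairs belong to the controller, independent of any namespace.
        self.queue_pairs.append((qid, depth))


@dataclass
class Subsystem:
    nqn: str
    controllers: list = field(default_factory=list)
    namespaces: list = field(default_factory=list)


shared = Namespace(nsid=1, size_blocks=1 << 20)
private = Namespace(nsid=2, size_blocks=1 << 18)

ctrl_a = Controller(cntlid=0)
ctrl_b = Controller(cntlid=1)
ctrl_a.create_queue_pair(qid=1, depth=128)
ctrl_b.create_queue_pair(qid=1, depth=128)

# One namespace reachable through two controllers, one through a single controller.
ctrl_a.attach(shared)
ctrl_b.attach(shared)
ctrl_a.attach(private)

subsystem = Subsystem(
    nqn="nqn.2014-08.org.example:subsys1",  # hypothetical name
    controllers=[ctrl_a, ctrl_b],
    namespaces=[shared, private],
)

for ctrl in subsystem.controllers:
    print(ctrl.cntlid, sorted(ctrl.attached), ctrl.queue_pairs)
```

The point of the model is only that controllers and namespaces are distinct objects with a many-to-many relationship, and that the queue pairs hang off the controller.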

From there, many of the optional features in NVMe (and NVMe-oF) are managed by several controller functions that are intrinsic to the relationship between the host and the NVM Subsystem and are independent of the Namespace(s).

"For an array controller, they will layer services behind each NVMe name space (software and hardware)". Given the previous reference to layering services "on top" by way of the storage controller (i.e., not NVMe Controller, but the array controller), this is also incorrect. The architecture for the NVM Subsystem does not place services in between a namespace and the corresponding media, (i.e., "behind the namespace.").

It is true, however, that several companies (both startup and established) have created mechanisms for enhancing services both within and outside of the NVMe controller, but no "array functionality" is inserted between namespaces and media to my knowledge. (Note: this is not to say that someone couldn't create, or possibly doesn't create, a "meta-namespace" in which array functionality could be inserted in between, but this is not a standards-based approach).

"For the array case, for all practical purposes they will advertise multiple Name Spaces, typically one per exported storage area (read this as Controller + LUN). This will be virtualised storage, so each exported storage area will be RAIDed, replicated or erasure-coded." This is mostly true, but with one minor tweak. Storage arrays that do this are likely to treat itself as both the host and the storage target.

That is, the multiple namespaces that it manages will not be presented directly to the hosts. Instead, what happens is that the array controller will often establish individual namespaces with each NVMe drive, which in turn will be aggregated by the array controller (just like you would find with any other disk volume). Then, on the host-facing side, the array will maintain a separate target functionality and present an NVMe target to the corresponding host.

Why is this important? Because the QP relationship between the actual host and the storage media is not contiguous. There are trade-offs - both positive and negative - to doing this, so it's neither a 'good thing' nor a 'bad thing.' It's just an architectural implementation choice, that's all. But it is important when comparing to the "distributed file storage" mechanisms that are also discussed here.
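A minimal Python sketch of that non-contiguous QP relationship follows. It is a toy model under my own assumptions: the class names, the `connect_host` method, and the host, array, and drive labels are all hypothetical and do not correspond to any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    initiator: str
    target: str


@dataclass
class Array:
    name: str
    drives: list
    backend_qps: list = field(default_factory=list)
    frontend_qps: list = field(default_factory=list)

    def __post_init__(self):
        # Back end: the array plays the host role toward each NVMe drive.
        self.backend_qps = [QueuePair(self.name, d) for d in self.drives]

    def connect_host(self, host):
        # Front end: the array plays the target role toward the real host.
        qp = QueuePair(host, self.name)
        self.frontend_qps.append(qp)
        return qp


array = Array("array0", ["drive0", "drive1", "drive2"])
host_qp = array.connect_host("host-a")

# Look for any queue pair that runs directly from the host to a drive.
end_to_end = [
    qp
    for qp in array.backend_qps + array.frontend_qps
    if qp.initiator == "host-a" and qp.target.startswith("drive")
]
print(host_qp)
print(end_to_end)  # [] because the host's QP terminates at the array
```

Every host I/O in this model is terminated at the array and re-issued on a separate back-end queue pair, which is exactly where the trade-offs mentioned above come from.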

With respect to FC-NVMe, "Essentially the effort is to export a virtualised LUN in the form of an NVMe NameSpace, and register this name with the FC switch NameServer; so it can be discovered by FC-NVMe Initiators." This is 100% correct. :)

I would also make a comment about the last paragraph ("Comment+"), but I see there is a clarification here in the comments already. :)

Hey blockheads, is an NVMe over fabrics array a SAN?


Re: No VMware?

Upvoted for the great response.

I agree with many of the points here, but I do wish to caution Hugo@Datrium on a couple of sleight of hand maneuvers. I would suggest going back and reading the first two paragraphs of the article:

'A Storage Area Network (SAN) is, according to Wikipedia, "a network which provides access to consolidated, block level data storage... A SAN does not provide file abstraction, only block-level operations. However, file systems built on top of SANs do provide file-level access, and are known as shared-disk file systems."

'Techopedia says: "A storage area network (SAN) is a secure high-speed data transfer network that provides access to consolidated block-level storage. An SAN makes a network of storage devices accessible to multiple servers."'

Discussing an Array capability in terms of the benefits that a SAN offers is misleading. Conversely, *criticizing* an array because it lacks those SAN features is disingenuous. What's worse, the fact that there are implementation differences within and without the protocol standards means that vendors can (and do) use the protocol as they require to achieve the results they desire.

For instance, let us take your point (2) above. "If one does not use NVMf [sic] to connect to a traditional array, it's controller standsa between the host and the NVMe storage device." It seems to me that if you are *not* using NVMe-oF to connect to an array, the question of whether or not NVMe-oF "is a SAN" is moot. There is no definition of SAN (of which I am aware) that requires an array storage controller.

[Which brings up another point: Part of the problem here is that the word "controller" is used so many times that it becomes difficult to keep these items separated. There is a controller for the FTL, there is a controller for the NVMe subsystem, there can be a controller for the storage array, etc. In SDS implementations, there is a controller for the control plane too. Controllers, controllers everywhere! :)]

Resiliency in NVMe-oF systems is indeed a major question, and I completely agree with most of the caveats Hugo mentions in (3). Once again, though, we have a "shifting goalposts" argument. "I'm not aware of such a

- distributed volume manager

- with erasure coding efficiencies

- in widespread use today."

I break this down specifically like this to illustrate that at any point Hugo can point to that last prepositional phrase, "in widespread use today," as the clincher. Then again, with the nascent timeline of development for NVMe-oF drivers to begin with, claiming *any* NVMe-oF deployment as "widespread use" would be, at best, wild-eyed optimism.

Nevertheless, the question at hand is one of the technology, not one of the deployment status. I would argue that at the core, the implementation of NVMe-oF as a SAN is indeed possible (technically speaking), and that such features and benefits of SAN deployments can be transferred over to NVMe-oF - even if it happens to be an eventual achievement.

One note on NVMf vs. NVMe-oF. I agree that NVMf is shorter, and while I prefer the shortened form myself, I've come to learn that many companies using the non-standard nomenclature also have non-standardized NVMe-oF implementations. It's becoming a very quick litmus test for whether or not the implementation is the standard version. So much so, that I automatically make the (often correct) presumption that companies using NVMf in their literature are *not* using standards-based NVMe-oF. Again, it's my perception having worked with and talked to many NVMe-based companies.


No VMware?

Datrium's claim of not having VMware support will be news to VMware, considering they've had support and drivers for NVMe for at least two releases (http://www.nvmexpress.org/vmware/).

The question itself about whether NVMe-oF (the correct abbreviation, by the way) is a SAN is like asking whether a shipping container is a ship. NVMe (and NVMe-oF) are protocols, and a SAN is an implementation of those protocols.

Fibre Channel is a prime example of how the protocol itself can be configured as a SAN, as a Point-to-Point, or as an Arbitrated Loop. NVMe over Fabrics falls in that same realm of "implementation strategies" which *could* be a SAN, or it *could* be a DAS device, etc.

This becomes incredibly important when weighing and measuring the pros and cons of the protocol (NVMe-oF) over the transport mechanism (RDMA-based) versus the architecture type (SAN, NAS, DAS, etc.).

Disclosure: I am on the Boards of Directors for NVM Express, SNIA, and FCIA, but speak only for myself.

Excelero stumbles, squinting, into sunlight clutching its remote direct-access NVMe kit


This is great news. Excelero has some really great tech and some very impressive numbers. Moreover, they've managed to avoid some common mistakes in their architecture, which gives them some extremely good insight into their mesh fabric. Kudos to the Excelero team!

Well, FC-NVMe. Did this lightning-fast protocol just get faster?


Re: show us the numbers, not the marketing slobber

Wait, what?

NVMe over Fabrics is transport-agnostic. Regardless of whether you're using IB, FC, or some Ethernet-based RDMA protocol (or even emerging networking protocols such as Omnipath), NVMe is not "mixed" with FC; it sits atop it.

Untweaked FC-NVMe has been demoed (at Flash Memory Summit) at 27us, down from ~70us for SCSI environments. That's a tech *preview* number, so a GA'd, tweaked version would possibly differ slightly.

"FC is the domain of fat, slow enterprise setups." I honestly don't know how to respond to something so blatantly false. 80% of AFAs are connected via FC, which would not be the case if there was anything even remotely accurate about that statement.

Comparing FC to IB is a true apples-to-oranges situation. In the past several years I have met only one customer who would even *consider* moving from FC to IB. The knowledge requirements for IB go in a completely different direction than most environments wish to take - which is why IB storage is always hidden from administrators. There is simply no interest from people to retrofit their servers from HBAs to HCAs. If they already have NVMe-capable HBAs in their environments, being able to run NVMe side-by-side with SCSI is a compelling argument.

"Started letting you tunnel PCIe over IB." Nope. Not even close. That's not how NVMe or NVMe-oF work. Tunneling PCIe over an RDMA-based protocol makes no sense whatsoever.

"This would actually deliver the kind of latency, concurrency, and scalability that we require from flash." Again, no, it wouldn't. Extending PCIe means that you are accepting all the limitations of PCIe - just putting it over an IB network. You're still limited to 126 devices (I believe 2 addresses are reserved), you're still extending a bus architecture, and you're still completely misusing the NVMe Submission Queue location (local to the host in NVMe, local to the target in NVMe-oF). It's so much unnecessary busywork that it makes no sense whatsoever.

Using native NVMe-oF with IB makes a lot more sense, if you're adamant about using IB. It works well, but you have to face 2 specific and very real problems: you have to find people who understand holistic storage architectures *and* IB (few and far between) and you have to cope with some form of a dynamic discovery protocol. All that is assuming that you happen to have the inclination for running a net-new storage hardware network in your data center, which most people do not.

Cisco: We are NOT lagging behind Brocade


Product Manager, FCoE

One quick correction. I did not say that FCoE was outselling FC. I said that in Q3, in terms of switch ports, FCoE sold more than a quarter million switch ports compared to Brocade's 60k, according to Crehan Research. As a whole, FC continues to sell extremely well for both companies.

