* Posts by Steve Chalmers

42 publicly visible posts • joined 26 May 2015

Micron: We're pulling the plug on 3D XPoint. Anyone in the market for a Utah chip factory?

Steve Chalmers

Write Endurance was the failure

The whole point of persistent byte addressable memory is merging memory and storage. Do it right and the persistent memory chipmakers take tens of billions of dollars per year of revenue from the DRAM makers. At the application level, this means user code can write to memory and persist information in the time it takes to do a cache flush, not the time it takes to send a write down the NVMe storage stack.
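
In rough C, that difference is the whole story; a minimal sketch of the store-plus-flush persist path, assuming x86 flush intrinsics and a pmem pointer that already maps persistent media (the function is illustrative, not any product's API):

```c
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence: x86 intrinsics */
#include <stddef.h>
#include <string.h>

/* Persist = ordinary stores + cache-line flushes. No kernel crossing,
 * no NVMe stack. Assumes pmem maps byte-addressable persistent media
 * and is 64-byte aligned; the function is illustrative, not a real API. */
static void persist_write(char *pmem, const char *src, size_t len)
{
    memcpy(pmem, src, len);                 /* plain stores to memory    */
    for (size_t i = 0; i < len; i += 64)
        _mm_clflush(pmem + i);              /* push each line to media   */
    _mm_sfence();                           /* order flushes before done */
}
```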

When the Micron/Intel technology only hit about 10^6 cycles of write endurance, it couldn't be a DRAM replacement as seen from application software, because it needed wear leveling even more aggressive than flash does.
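
A back-of-envelope check shows why; the store rate below is an assumed figure for illustration, not a measured one:

```c
#include <stdio.h>

/* Why ~10^6 write cycles can't face application software directly:
 * the store rate to one hot line below is an assumption, not a datum. */
int main(void)
{
    double endurance  = 1e6;   /* write cycles per cell (figure above) */
    double writes_sec = 1e7;   /* assumed stores/sec to one hot line   */

    printf("unmanaged hot line wears out in %.1f s\n",
           endurance / writes_sec);         /* prints 0.1 s */
    return 0;
}
```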

That meant it sat in the CPU's I/O path rather than its memory path, with long latencies compared to DRAM.

It had failed at that point. I salute the Intel people who made a market for 3D XPoint anyhow. But the victory was hollow, and the technology should have failed the basic sanity checks before a fab was built.

'It's dead, Jim': Torvalds marks Intel Itanium processors as orphaned in Linux kernel

Steve Chalmers

The processor chess game was won long before...

Processor instruction set architectures have always been a critical mass game, not a technical merit game.

The x86 won even though its instruction set and memory model were a piece of shit (I designed with it in 1978). It won due to the IBM PC design-in, which in turn got the PC-compatible design-in, which in turn got the Compaq server line design-in.

In the chess game, Intel got one move from 32 to 64 bits (industry software development resources could only absorb one such transition -- the same constraint Windows exploited by getting there first, sending IBM's later OS/2 into oblivion). Presumably because the x86 memory model and instruction set were so clumsy, Intel spent its move on a fresh-sheet-of-paper architecture (I knew a lot of the Itanium architects at HP but was not one myself). But AMD almost immediately checkmated Intel by pushing an upward-compatible 64-bit extension of x86 -- a much easier compatibility story for customers and software developers -- and Intel had no choice but to respond by following. At the instant Intel made the decision to respond to AMD, Itanium predictably had no path to critical mass, for business reasons.

Linus Torvalds was very kind to wait until after Bill Worley's death to make this final decision.

Xerox to nominate up to 11 directors to HP's board in hostile takeover push – report

Steve Chalmers

Re: Has this ever worked?

Yes. Industry consolidation acquisitions, where the industry itself is shrinking (or at least not growing) and one of the companies involved is very clearly not going to be a long-term survivor in its segment.

HP (the combination of HP Ink and HPE) was very, very good at industry consolidation acquisitions, however painful those may be to the employees and to the communities which are affected.

It strikes me as chutzpah on Xerox's part to take the role of hostile acquirer rather than has-been acquiree. I hope that if this moves forward, the institutional knowledge of the "clean room" process used in the HP/Compaq merger is applied in the same evenhanded and objective (and make the damn decision now, so we can tell our customers and people on day 1 what's happening) way it was 20 years ago.

HPC botherer DDN breaks file system benchmark record

Steve Chalmers

Sometimes those results are about how much money you're willing to spend

I'm very impressed by the DDN result, but would point out that DDN was running 72 SSDs inside vs E8 running just 24.

The last time I ran a team doing a benchmark of this kind, about 20 years ago, (1) we had to tie up over a million dollars' worth of equipment for months, and (2) we had a competitor who had quietly made it clear that if anyone beat their number, they'd just come back with more equipment and win again. This looks like a much saner benchmark :)

Decoding the Chinese Super Micro super spy-chip super-scandal: What do we know – and who is telling the truth?

Steve Chalmers

Re: A couple of other points

Wonder if there's a peer exploit putting a little cell phone in, say, the power supply, to get to the outside world from an air-gapped system. Yeah, I know, a bit harder in a shielded room, but it sure seems like thinking one could "phone home" with IP traffic is a little optimistic if the target had any sophistication or was of any value at all.

Steve Chalmers

Re: It's simpler

The only case where this makes sense is if the attacker knows the server is going into a classified environment where, as a matter of course, every single bit of code on the motherboard will be wiped and reloaded before the server is installed. The hardware technique would allow the server to be re-compromised after it was thought to have been scrubbed.

Steve Chalmers

Re: Still Reason to Worry

Methinks the only time it makes sense to embed a chip would be if the server were destined for a classified facility which would wipe and reload (from trusted binaries) every single byte of code on the motherboard.

The hardware strategy would then allow the board to be re-hijacked after it was thought to have been wiped and reloaded.

We may be hearing true story #1 about what happened, and true story #2 about where something else like simple substitution of code for a management processor occurred, but the two stories are mashed up to signal to the perpetrators that the attack is known without disclosing to anyone else where the attack actually occurred.

Now if the perpetrator could only control the motherboard model supplied in a bulk order to SuperMicro, and only some of those boards went to my hypothetical classified site, then many other such boards could have gone to many other customers, either sitting silent or making mischief, which could be the source of a true but irrelevant statement on the number of end customers who got hardware compromised boards.

Just thinking and speculating, no inside knowledge (and no clearance any time in my life) here.

NetApp puts the pedal to the metal with Plexistor

Steve Chalmers

We live in interesting times

Would be interesting to see what ZUFS could do with shared Gen-Z memory across a rack or row (say 250 or 300ns latency with a native CPU interface in the early to mid 2020s, for read or write).

Of course, at those latencies, the difference between coming out the CPU's memory pipeline vs its I/O pipeline is significant, as is the management of cache hierarchy (flushes vs noncacheable regions, etc).

These are interesting times.

Up the stack with you: Microsoft's Denali project flashes skinny SSD controllers

Steve Chalmers

Full Circle, and control of tail latency

Amusing to think that when I started in computing 40ish years ago, disk drives were addressed by cylinder-track-sector and the operating system was responsible for bad blocks, address mapping, and the like. We, ummm, kinda got away from that because when servers went from 2 or 3 disks to 200 or 300 disks in the 1990s, the higher-level abstraction SCSI provided was a relief, and oh by the way, that OS code became a single point of failure (and never handled multiple concurrent disk failures well anyhow).

Of course, abstracting a lot of that detail into a disk drive got us a tail latency problem (my favorite was thermal recalibration of a drive: the app was running just fine, and then out of the blue one of the drives decided to take the better part of a second for internal housekeeping, and the app can wait, thank you, in an era when server OSes tended to blue screen if disks they depended on went away for mere seconds). And SSDs just magnify the tail latency, in an era when applications are far less able to tolerate it. Hence a server, with a handful of embedded SSDs, wanting to onload anything that causes tail latency back where it can be understood and controlled.

There be dragons here: I spent much of the last couple of years of my career helping think through how Gen-Z (www.genzconsortium.org) would be used by the combination of servers, storage, and networking in the data center. The best and highest use of byte addressable storage class memory, once the write endurance of parts is 10^15ish rather than 10^9ish, is to allow applications to read and write persistent memory directly (through hardware address mapping and protection, in the style of the way server memory has been protected for the last 30 years) rather than through a storage stack. (1) Creating a requirement for a kernel crossing to read or write persistent memory risks making the OS king of the hill while wear leveling is still required, and ossifying an obstacle to a massive performance change once write endurance improves; and (2) I remember all the code in OSes' "SCSI Services" layers and the like running elevator algorithms to reorder I/Os to optimize IO/sec by limiting disk seeks...still consuming CPU cycles in the 2000s, when disk arrays had had very good caches for decades. What a waste of path length and CPU cycles (which of course was finally optimized out for NVMe).
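
Linux's DAX path already gives a feel for the direct-access model; a minimal sketch, assuming a file on a DAX-capable pmem filesystem at an illustrative path, with error handling trimmed:

```c
#define _GNU_SOURCE             /* MAP_SYNC, MAP_SHARED_VALIDATE */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Illustrative path: a file on a DAX-mounted pmem filesystem. */
    int fd = open("/mnt/pmem/data", O_RDWR);
    if (fd < 0) return 1;

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED) return 1;

    p[0] = 42;    /* a plain store: no storage stack in the data path */

    munmap(p, 4096);
    close(fd);
    return 0;
}
```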

Ethics? Yeah, that's great, but do they scale?

Steve Chalmers

The IEEE code of ethics might be a good start

https://www.ieee.org/about/corporate/governance/p7-8.html

As a hardware engineer, I had an IEEE membership until retirement, with this code of ethics. Not perfect, but has evolved over decades and touches on all the pertinent topics save privacy and security.

Apeiron demos 'rocket ship' Intel Optane array tech

Steve Chalmers

The storage benchmark is dead. Long live the benchmark!

Congrats Apeiron on "breaking the benchmark" for storage, both the basic IO/sec and GB/sec measurements, which have been in place for about 50 years.

I would guess the next benchmark will be one of latency measured at the application (i.e. in user space, the way HPC communication is measured), and the winning number there will start at 1us for a storage read or commit-write. Any guesses who the competitors in this new age will be?
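
Such a benchmark would time individual operations from user space, roughly like this sketch, where an ordinary in-memory buffer stands in for the mapped storage region:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    static char region[4096];    /* stand-in for a mapped storage region */
    char dst[4096];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, region, sizeof region);   /* the "storage read" under test */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile char sink = dst[0];   /* keep the copy from optimizing away */
    (void)sink;

    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
            + (t1.tv_nsec - t0.tv_nsec);
    printf("user-space read latency: %ld ns\n", ns);
    return 0;
}
```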

Michael Dell? More like Michael in-Dell-nial: No public cloud, no future

Steve Chalmers

Re: Growth and Business Value Add

@ "Shameless Oracle Flack"

I hold Oracle in the highest regard, as a software company which has both served customers well and monetized that service well over a period of not just decades but generations.

That said, the pricing of Oracle's database on virtualized servers, much less converged infrastructure or cloud, has become difficult enough to police that perhaps offering Oracle's database as a SaaS product in its own cloud (which then provides IaaS for all the services built around and over Oracle's database) is a brilliant solution. It allows some of the hardware contributions the Sun engineering team has up its sleeve to come to market in a much more controlled (cheaper to release and support) way than a general release to Enterprise.

But let's not pretend that Oracle as a company has a business model which allows a large capital investment in a commodity IaaS business in direct competition with AWS, much less the culture or expertise to match the cost structure of that IaaS business. This is a SaaS business, at a much higher cost and pricing structure than IaaS, to the point where the hardware the SaaS runs on could be outsourced to a provider like AWS. Not generic VMs, of course, but rather an isolated subnet of suitably custom servers+storage.

I think Oracle's message would resonate better presented that way...

Steve Chalmers

Pendulums swing

C'mon Chris, you've been around the industry long enough to know that technologies and business models come and go, in a pattern where the new and exciting (well, the ones which don't fizzle, and most do fizzle) become darling growth businesses, reach maturity, and then gracefully decline over a period of decades.

Comparing public cloud (the current darling) with traditional server, storage, and network (all of which are clearly at maturity if not beyond) makes for good journalism if not click-bait headlines (you're better than that), but it's just what we all knew 5 years ago, playing out as expected.

Public cloud, if you look inside, is just the endgame for x86 processors, in-server storage, and cheap networking, delivered not just with an extraordinarily sharp pencil on equipment cost, but also with extraordinary automation and economies of scale on the management side, beyond what Enterprise was ever able to achieve. But doing this has required investing an enormous amount of capital in a narrow way of doing business, which will have to stay in place for a long time to earn the shareholders a proper return on that investment.

So what's next? What will leapfrog the thing that today's public cloud has fine-tuned, causing the public cloud vendors to need to extract marginal revenue from their now-captive customers, just as (say) IBM needs to in mainframes, or even Oracle needs to as on-prem Enterprise matures? Will the pendulum swing back from public cloud to on-premises, due to outages, or security risks, or attacks on the Internet backbone? Will byte addressable storage class memory chips enable new architectures which break the 50-year-old boundaries between server, storage, and network, as I have predicted? Will Intel make a series of pricing mistakes, ushering ARM into the data center? These are all hard to predict.

So I see three possible outcomes:

1. Dell is running the consolidation play, buying companies in mature segments for peanuts and then firing people and wringing profits out of them on the slow path to oblivion (I think you called this the Unisys play, but that takes a very backward-looking view of what the Unisys folks have achieved over the last 30 years).

2. Dell is using the legacy businesses as a source of cash (as a private company cash flow is way more important than P&L), and will use some of that cash to fund carefully selected breakthrough technologies in fields its sales force can sell. Glances at the changes in storage which could come from byte addressable storage class memory chips, used wisely.

3. Michael Dell won the first time around not on product (they were OK, but nothing special) but by building the best, most efficient supply and distribution chains in the world. It may be that he intends to build a supply and distribution chain for private cloud, to make private cloud competitive with AWS...or for some new approach that hasn't occurred to me yet.

But don't listen to me, I'm just another prematurely retired ex-HPer.

Stealth-cloaked startup claims to be developing super-fast arrays. How fast? Well...

Steve Chalmers

Speculation

How about this as sheer speculation: this is a native NVMe over Fibre Channel array, where an FPGA or ASIC (in the style of Apeiron or Kazan) is hardware-distributing incoming I/Os to a collection of NVMe drives on a collection of PCIe buses.

The network business made this transition over a decade ago, from routing packets in software to routing them in hardware: last year's leading-edge layer 3 switch ASIC routes about 3 billion (3 x 10^9) packets per second at a latency of about half a microsecond. Hardware of this type is very hard to design correctly, with all the corner cases right, and expensive to engineer, but quite doable.

The hard part is the software, both behind the scenes on data integrity and error recovery, and visible services. In this new world as in networking, this software will run as the "control plane", setting up the hardware tables which allow individual I/Os to execute entirely at hardware speed in the "data plane".

Oh, and if Vexata has any sense, they will not fall into the trap Violin did and design only for legacy access; they'll also offer a forward-looking programming model (my favorite is using the hooks in a supercomputer interconnect like InfiniBand to execute storage reads and writes directly from user space) which eliminates I/O path length at the server. Or maybe they've already figured out something better...

Symbolic IO CEO insists the IRIS i1 is more than a bunch of pretty lights

Steve Chalmers

Perhaps this isn't so complicated

Seems to me this is a server, integrating a storage system, which compresses data and then puts it in DRAM which is backed up to flash either continually or at power fail. So storage I/O happens at memory latency (plus whatever CPU time is used for the compression), and applications which are limited by storage latency run far, far faster.

This design will be significantly simplified when byte addressable, suitably fast SCM chips are available (and priced right, and reliable, and ...).

As an old storage system designer, I see two risks Symbolic needs to have mitigated, and it needs to be able to explain the mitigations in terms which are both understandable to a CIO and technically accurate.

First, when a storage system acknowledges that a write is successful (a SCSI status phase comes to mind), it commits to the world that at no time in the future will a read ever, under any combination of misfortunes, be able to see what was on the disk in that place before this write. In this case, it means that even if the power goes out, the server crashes, a chip dies, or the like in the millisecond following this write, it's durable. This is traditionally really hard to do with DRAM. It is clear the Symbolic folks have put a lot of time and thought on this, in the end building a proprietary NVDIMM-N type module with purpose built logic and some processing capability on board.

Second, which I would guess is out of scope, is that part of what makes a classic storage array "reliable" is dual controllers, and RAID or mirroring of disks -- and failing independent of any particular server. I think this design's intent is more along the lines of (say) a Microsoft Exchange active/standby configuration of servers, where the application makes sure a current copy of the Exchange data exists on both servers at all times.

Looking forward to future communication from the Symbolic technical folks which, after the patents are all filed, explains in clear language how the gains are achieved so customers can self-select based on the risks and benefits of the design.

Western Digital CTO Martin Fink refused El Reg's questions, but did write this sweet essay

Steve Chalmers

Re: "In a world where I might want Exabytes of memory..."

64TB is four top-end SSDs right now. That would be a good ceiling for storage in a client (a PC), but not for a server.

When we are talking about memory as storage, a suitable ceiling for a single very large database server is probably several thousand SSDs (perhaps 64PB). For a SAN of a thousand servers (not that SAN is the right technology to share memory), the right measure is exabytes. I would not want to lock in a shared address space smaller than 2^72 bytes or so for a shared memory semantic fabric for use in the 2020s.

This is a nuanced topic, and what I've said here is only a tiny part of what it will take to successfully design in this space. Assuming any of us has a crystal ball clear enough to see that far into the future...

Steve Chalmers

Re: @Steve Well said, Martin!

There are a lot of ways to build product. We don't know whether a distributed approach (put a modest amount of SCM in each server box, and use a lot of server boxes) or a centralized approach (looks like an EMC Symmetrix of 20 years ago, or more recently a DSSD box) will make the most business sense for the most applications. The market is probably big enough for both, so long as they use the same access and security mechanism on the wire (data plane), and the software agrees on who gets to read/write what the same way (control plane).

@FStevenChalmers

Steve Chalmers

Re: If SCM possible... then ...

That's what Gen-Z or some other memory fabric is for: delivering latency low enough, for an SCM read or write across a fabric the size of today's SAN, that the software can rationally wait for the response rather than releasing the processor to go do something else (whether by blocking the thread or by using async I/O and doing the something else in the application / database / filesystem).

The basic action of an Ethernet switch -- packet processing -- even in the fastest switch made today takes too long, relative to an instruction execution time in a processor, for software to wait while a request and a reply are each packet-processed. This doesn't mean shared SCM access over Ethernet/RDMA makes no sense; it just means it's for cases where the application knows it's reaching for "far" data and is willing to wait for it.

@FStevenChalmers

Steve Chalmers

Well said, Martin!

Chris, Martin's comments are well said and get to the heart of how we need to think about the next decade. Looking forward to what he and his team will accomplish at WD, and to what the industry can and hopefully will do to evolve to best and highest use of byte addressable persistent memory, rather than just looking at it as a new component and jamming it into an existing slot in an existing server or storage system.

(For context, the first time Kirk shared the concepts here with me almost a decade ago, I was very skeptical and tried to force the ideas into the existing concept of what a server is, what a storage system is, and how the two talk to each other. That's not the best and highest use of byte addressable persistent memory. It's OK for it to take a year or two for you to "get it". When you really grok what a memory fabric like Gen-Z is for, and how the application does kernel bypass for storage reads and writes, and instead of thinking in today's paradigm are asking how the industry can get to an affordable plan to evolve to the kind of end Martin has in mind, then you really get it.)

The best and highest use of byte addressable persistent (storage class) memory does not respect today's boundary between server, storage, and network in the data center.

@FStevenChalmers

("retired" from HPE last fall, still too excited about this area to take a job doing something else)

Hopping the flash stepping stones to DIMM future

Steve Chalmers

64 bits isn't close to enough

Today's high capacity SSD is about 16TB (2^44 bytes). Assuming perfect, dense use of address space with no set asides, a million of those drives would completely fill that address space. For a single system, again assuming perfect dense use of address space, I tend to agree that 64 bits will last a long time.

However, for shared memory semantic storage, it is not unreasonable to consider a modern Google-class data center as having 100,000 servers and at least 200,000 disk drives. That means in a shared environment it is entirely possible to exhaust a 2^64 address space in a data center built by 2020. I haven't done the analysis myself, but would tend to point to at least a factor of 2^10 more (i.e. a 74-bit-ish address space) in any chip or fabric design intended for the 2020s.
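
The arithmetic behind that claim is worth making explicit: with the figures above, dense packing alone lands within a factor of about six of 2^64, so any sparse allocation, replication, or growth blows through it.

```c
#include <math.h>
#include <stdio.h>

/* Sizing check using the figures above; compile with -lm. */
int main(void)
{
    double drive  = 16e12;             /* ~16 TB, about 2^44 bytes    */
    double drives = 200000.0;          /* drives in a Google-class DC */
    double total  = drive * drives;    /* 3.2e18 bytes                */

    printf("dense total: %.2e bytes, about 2^%.1f\n", total, log2(total));
    printf("2^64 headroom: %.1fx\n", pow(2.0, 64) / total);   /* ~5.8x */
    return 0;
}
```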

Note that for fault containment reasons I doubt the entire storage fabric space will be mapped into any one CPU's address space: this address is not the size of a pointer or index register used by applications, it's the size of the address coming out of the MMU or similar mapping table between the CPU and the memory semantic fabric. (Some supercomputer app will prove me wrong here, but I still think there's no rush to re-engineer CPUs the way we did going from 32-bit to 64-bit pointers some decades ago.)

@FStevenChalmers

Steve Chalmers

Re: Thanks for the Memory (article)

The SRAM is now on the CPU die, as L3 cache. The path from the CPU's memory pipeline, out the pins to the DRAMs and back has been so hyper-optimized around the way DRAMs work that the latency savings from using SRAM instead of DRAM at this point would be insignificant. That was absolutely not the case when I designed with SRAMs and DRAMs in the 1980s. Oh, and the static power consumption per bit in real SRAM designs of that era would simply melt a chip at today's densities, so it's not exactly SRAM...

It will be interesting to see what happens over the next decade as the various storage class memories emerge, first as storage, and ultimately (possibly) to displace DRAM. That will require a change in the interface between CPU and memory -- Gen-Z (which I worked on) is an example of a different interface. Will be interesting to see if latency-optimized (rather than density-optimized) memory devices using some emerging SCM technology, combined with a new interface (think hybrid memory cube stacked on the CPU die itself, with aggressive cooling), accomplish what you have in mind in, say, 2020 or 2023.

Steve Chalmers

Control plane for shared (networked) DAX storage?

The limit as the stack approaches zero instructions executed per persistent read or write, when the byte addressable persistent memory is shared by many applications running on many servers, is that we will have what network people would call a "control plane" spanning server, memory semantic network, and storage system.

The "control plane" (the drivers) would set up protection and mapping tables to give specific (user space) processes read and/or write access to specific regions of SCM in the storage system. (The "data plane" as a network person would call it is thus read and write operations being performed directly by the application (more likely its libraries) to the SCM itself, with no intermediaries executing lines of code -- not in the server, not in the network interface, not in the switch, and not in the storage system.)

There is an example of how such tables would work in the Gen-Z spec, but I would expect an industrywide control plane to work equally well with PCIe based fabrics, as well as memory windowing and its descendants on InfiniBand and Omni-Path, and other similar technologies reused from supercomputer fabrics.
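
A toy version of such tables, as a hypothetical illustration rather than the Gen-Z spec's actual formats:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical control-plane table: per-process, per-range permissions
 * for shared byte-addressable SCM. Not any real spec's format. */
typedef struct {
    uint32_t pid;         /* process granted access   */
    uint64_t base, len;   /* fabric address range     */
    bool     write;       /* read-only share if false */
} scm_grant;

/* The check the data-plane hardware would make on every access. */
static bool access_ok(const scm_grant *t, size_t n,
                      uint32_t pid, uint64_t addr, bool is_write)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].pid == pid && addr >= t[i].base &&
            addr < t[i].base + t[i].len && (t[i].write || !is_write))
            return true;
    return false;
}

int main(void)
{
    scm_grant table[] = {                   /* programmed by the control plane */
        { 1001, 0x100000, 0x10000, true  },
        { 1002, 0x100000, 0x10000, false }, /* same region, read-only share    */
    };
    printf("%d\n", access_ok(table, 2, 1002, 0x100800, true));  /* 0: denied */
    return 0;
}
```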

So who's working on a piece of code that can be the start of this control plane for storage? Hint: if done right, very efficient container storage falls out: storage access permissions aren't controlled per LUN per server as we did in Fibre Channel, they're controlled by memory range (more likely page table) and process, as is sharing of memory between processes within a single server. What we do not need as an industry is 100 venture-funded startups each coming up with their own proprietary way to do this...

@FStevenChalmers

XPoint: Leaked Intel specs reveal 'launch-ready' SSD – report

Steve Chalmers

First step, not best and highest use for byte addressable storage class memory

In all this discussion, don't forget this alleged product is using the new 3D XPoint memory technology in a device which is a form, fit, and function replacement for one based on today's NAND flash.

This allows the new technology to be shaken out, and if an unfixable problem occurs, the warranty cost is limited to giving a bounded number of customers flash products to replace these new ones.

The best and highest use of byte addressable storage class memory is, of course, presenting a DAX space directly to the application, with no storage stack. This brings with it the host of issues discussed in the SNIA NVM programming model documents. The advanced topic here, of sharing byte addressed storage class memory between servers, is what the Gen-Z memory semantic fabric is about. So any characterization of 3D XPoint as an insignificant performance boost, based on an early product that emulates today's flash drives, has missed the point.

@FStevenChalmers

Steve Chalmers

Re: Honest question...

Haven't worked on data integrity features for almost a decade, but 4228 would allow a 4K sector to have a T10 DIF field on each of 8 512-byte sub-sectors.

More likely it fell out of the implementation (which allows 512 byte sectors with T10 DIF) and got put on the data sheet as a potential differentiator.

Less likely: someone is playing with a capabilities architecture, access control tags, or the like, and simply wants to be able to reserve space at the sector level for future use. This is exactly what 520-byte sectors offered, used in proprietary ways ~20 years ago in high-end arrays like Symmetrix to ensure data integrity.

Hobbled by partners Dell and NetApp, where does Cisco go from here?

Steve Chalmers

Industry end state really isn't clear

This decade we have seen a pendulum swing from traditional on-premises Enterprise gear, from traditional Enterprise vendors, through traditional Enterprise channels, toward cloud services, based on the same core technologies as the Enterprise gear but brutally disintermediating both the box builders and the sales channels.

But what is the end state Cisco should invest towards? It is fundamentally an Enterprise selling play, its entire existence part of what I just said cloud is "brutally disintermediating". So does Cisco move toward being a technology provider (like Intel or Broadcom or Samsung), toward owning a software ecosystem (like Microsoft excluding Azure), or toward wrapping services around the cloud offerings of others? Or does Cisco double down on Enterprise, assuming there will be enough market left when the pendulum stops swinging to support itself in the style to which it is accustomed?

There are two technology inflection points coming which Cisco could "bet the company" on. One is user space access to shared byte addressable storage class memory -- which is a really good reason not to commit to the past by buying one of today's storage leaders who won't be able to evolve through this transition. The other, which I can't articulate as well, is an evolution of networking away from traditional learning-and-moving-packets (I can sit at my PC and "ping" any location in the world -- default permit) toward planned connections (I install the gmail app on my smartphone, and the app has credentials which my service provider recognizes, allowing the app to open a connection to a gmail server -- which would otherwise be default deny). Look at what Fastly is doing at the CDN edge, or what Cisco ACI or VMware NSX or Calico are doing in the data center.

The problem with both of these is they will require changes to the application -- a heresy in the Enterprise world -- and there is no certainty that a large investment will result in a critical mass of adoption by application writers (or at least those who provide storage and network services for containers or VMs). So a Cisco should probably sit back and wait for someone to win, then pay the huge premium to buy them, rather than trying to develop something revolutionary in house (or even in a captive spin-out-spin-in model as they did with Insieme).

Will be interesting to watch all this over the next 20 years.

@FStevenChalmers

RIP HPE's The Machine product, 2014-2016: We hardly knew ye

Steve Chalmers

Re: Think this through to endgame

There are at least three basic technologies, all persistent/nonvolatile, competing for the brass ring of replacing DRAM in the 2020s. Memristor by itself is no longer one of those three, but a descendant is. (ReRAM, or resistive RAM, is a category which includes memristor and several related technologies.)

Concur that what matters is who gets the revenue/margin. I honestly don't know what the market will look like in 2025, much less what share of what segment of that market HPE will have. But HPE invested in a very long view of driving technology for that era, not just following the herd, and we should respect that choice and keep an open mind for the medium and long term consequences of that choice.

Steve Chalmers

Re: The Machine was obsoleted by Intel's Purely Platform

It's not that simple.

A large pool of (storage class, nonvolatile) memory can be built either by putting some in each of many servers, or with memory boxes somewhat analogous to today's disk arrays.

However, a key point of sharing byte addressable storage class memory like this is, well, accessing it directly, inline in user space code. Like a DAX space, but shared at the rack level (or larger). Not calling an RPC, not calling a storage stack, just reading and writing (subject to access controls, of course).

Another key point is that the limit on the number of DRAM DIMMs in a server today is far too low, and the reach of DRAM connections is far too short, to replace rather than supplement storage.

Intel is a smart, resourceful company, but Purley was developed to run with today's software, not in the future software world The Machine envisions. So while Intel has the Cray PGAS software to draw from, and could probably share the storage within a multi-socket server over QPI or successors, there is no indication of user space (inline) access over Omni-Path.

Behold this golden era of storage startups. It's coming to an end

Steve Chalmers

An era ends, another era begins (this is tech after all)

An era started when we chose to network block storage by putting Fibre Channel under the host's SCSI software stack. So SANs and Fibre Channel disk arrays came to be. iSCSI chose to extend this to Ethernet.

As flash storage became cost-effective, and found the legacy SCSI stack too slow, we got NVMe over Fabric. Right now these are more extensions of local disks (think networked SAS storage) than they are storage systems in the model of Fibre Channel. Will be interesting to see how this plays out, particularly if Omni-Path takes hold in the data center.

Looking further out in time, the whole idea of nonvolatile memory in the server DIMM slots as storage is still taking shape, on multiple fronts. I'll leave device technology, where I have no expertise, to the device technologists, noting that they're all chasing the tens of billions of US$/year which the "brass ring" of becoming the DRAM replacement would bring.

On the product front, best and highest use of this technology requires repartitioning work at the basic boundary between hardware and software, with the current partitioning of products between "server", "SAN", and "storage system" as collateral damage. This is where flash was 14 years ago when the first flash-array vendor came to us and presciently told us how flash would take over storage...about 10 years ahead of when it did. I think there will be a similar race, with the usual wasteful VC funding of me-too startups starting perhaps 5 years from now, after the core technology foundation is better laid. Will be interesting to see how much ends up open source, how much is driven by the big Internet players, and how much follows the traditional commercial model.

So yes, venture investment in the earlier era ought to ebb and be very focused now. But there will be another wave.

Oh, and the storage business evolves glacially. We'll still be finding new customers for today's era products a decade from now, and still selling them two decades from now. An "abrupt" shift in storage buying patterns takes a decade.

Enterprise storage is a stagnant – and slightly smelly – pond

Steve Chalmers

Measuring the technology cycle in action

When we as an industry launched Fibre Channel two decades ago, we made a choice to reuse rather than replace the then decade-old server-side SCSI software stack. That choice in turn preserved a boundary between server and shared storage which made storage systems what they've evolved to over those decades.

Both flash-based storage -- which causes us to revisit the SCSI decision and leads to NVMe and NVMe over Fabric -- and the emergence of http-over-Ethernet accessed, highly cost-per-GB-sensitive object storage systems call for very different storage system designs. Those newer designs don't fit the model of traditional storage systems.

So we'll likely see analyses like this which capture sales in one storage technology approach but not the other approaches competing for those same customer dollars. If you think the numbers today are confusing, just wait for storage-in-memory to take hold. Then it will be really hard to divide the revenue between what is server revenue and what is storage revenue.

Intel and pals chuck money at another Fibre Channel killer

Steve Chalmers

California...

Kazan is in Auburn, California.

Congrats Joe and company on the funding round. Go for it!

Euro researchers more loyal and cheaper than Silicon Valley folk

Steve Chalmers

Transnational companies and R&D work, just out of vogue

30 years ago I was a young R&D engineer, leading a small team collaborating with peer teams in Grenoble, Boeblingen, Bristol, Guadalajara, and oh yes a few U.S. cities as well. It worked very well. I have the greatest respect for my colleagues of that era.

We had competitors at that time who chose to centralize their R&D on a single campus or in a single metropolitan area (typically near the U.S. East or West coast). Part of our worldwide success was grasping regional differences in customer preference and buying patterns, which is easy to miss in a team which lacks geographic diversity.

It's been disappointing to see the current generation of "star" companies choose the single-city approach, being open to this important talent pool only as expatriates.

@FStevenChalmers

SPC says up yours to DataCore

Steve Chalmers

The rule is there for a reason -- but SPC-1 has been broken

The rule which demands that the UPS be there in this particular circumstance exists for a reason: to ensure that what's being measured as a storage system in fact exhibits durability of writes. Otherwise someone could spin up a stateless AWS instance, run memcached-like software in it (knowing the data would be lost if the instance ever crashed or was reset), and claim a storage benchmark result.

I haven't been in the storage performance game for 14 years, or even in the storage business for 8 years. So take this with a grain of salt. But what we used to say is that the life cycle of a benchmark started with small numbers, grew to medium numbers as people learned to design and tune for the benchmark, and at some point someone figured out how to "break" the benchmark, turning in very large numbers which were no longer meaningful to the real-world customer use the benchmark intends to represent. Once broken, a benchmark needs to retire and/or be replaced.

On the surface, without doing deep study, the result in question tells me SPC-1 has been broken for large storage systems.

@FStevenChalmers

(speaking for self, not for employer, which happens to be HPE)

Cisco should get serious about storage and Chuck some cash about

Steve Chalmers

Cisco missed this bus, wait for the next one

Note: I work for a competitor.

Cisco is a supplier of Fibre Channel SAN switching and management, and today both cooperates with and meets in the channel a number of storage suppliers. As a competitor I was really concerned about what Cisco could do with Whiptail, but from the outside it seemed like the backlash that deal caused in Cisco's relationships with other storage companies impacted Cisco's commitment to making Whiptail work.

Given that storage, particularly the kind of storage you'd closely couple to the server or make part of the server cluster in the way Whiptail could have been, has a new entrant in the form of server-resident persistent memory, it seems to me that Cisco would be far better off entering this new segment as it emerges than risking the backlash from current partners if it acquires an established leader (NetApp, for example) or an emerging solid-state array company (Pure, Nimble, etc).

@FStevenChalmers

Storage with the speed of memory? XPoint, XPoint, that's our plan

Steve Chalmers

Real story is changing relationship between application code and storage

Chris,

Thank you for a very good analysis piece, which you had to put together from not enough public data. This kind of analysis has to use rough numbers, as you did. The comments of those who want everything accurate to two significant digits are valid, but XPoint (and its competitors) are still evolving.

jmbnyc in his comment raises an important issue which I think has been lost in the focus on Intel's excellent marketing of XPoint. Byte addressable persistent main memory in computers enables a fundamental change in how applications (either directly or through databases, file systems, etc) cause a particular data item to become durable.

It is applications using some new and different approach to reading and writing persistent data which will deliver the payoff. The way the persistent bits are stored is just an enabler.

@FStevenChalmers

(speaking for self, not employer)

Memory-based storage? Yes, please

Steve Chalmers

What you see today is stepping stones on a long technology journey

Application performance is often, but not always, limited by the latency of writes to persistent storage. For example, neglecting the optimizations of the last few decades, the inherent transaction limit of an Oracle-like database is set by the latency of a log write, each of which must be persisted before the next begins.
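
A minimal sketch of that commit path makes the ceiling obvious: one append plus one fsync per transaction, so transactions per second on a single log cannot exceed 1 / (log write latency). The log file path here is illustrative.

```c
#include <fcntl.h>
#include <unistd.h>

/* Each commit is an append plus an fsync, so one log's transaction rate
 * cannot exceed 1 / (log write latency). The path is illustrative. */
int main(void)
{
    int log = open("/tmp/txlog", O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (log < 0) return 1;

    for (int tx = 0; tx < 100; tx++) {
        if (write(log, "commit record\n", 14) != 14)   /* append record */
            break;
        fsync(log);   /* must be durable before the next transaction runs */
    }
    close(log);
    return 0;
}
```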

This whole concept of persistent memory in servers is about making available to the application tools which have been available within disk arrays for decades, and as a result allowing the application to persist data far faster than can be achieved using traditional storage devices through traditional storage software stacks (both on the host and in the storage system). Evolving software, including application software, to figure out what hardware is optimal and how software can in practice use it optimally is a really hard (research) problem. There is great value in that goal, and also value in the intermediate steps we will be taking as an industry along the way.

Enrico is right, we should be paying a lot of attention here. But as each of us draws our own conclusions, let's not confuse the research aspects with the technology competition to replace worldwide DRAM sales (comparable to the GDP of a respectably sized country), or with the stepwise spinoffs of those efforts coming to market now and over the next few years.

@FStevenChalmers

(speaking for self, works for Hewlett Packard Enterprise)

Public enemies: Azure, Amazon, Google, Oracle, OpenStack, SoftLayer will murder private IT

Steve Chalmers

Watch the storage, not the compute

When it comes to the cloud, pay attention to the data. Moving data into the cloud is hard. Getting it back is harder, particularly if your cloud service provider goes out of business on short notice.

Also pay attention to concentrations of risk. It's no big deal if some archive you rarely need access to is offline for a week. On the other hand, a big bank or trading firm would be put out of business by an outage that long, and prepares accordingly. Remember Amazon's outage a few years back, when a network partition within the piece of a data center which held a lot of customer data suddenly caused essentially all of the computers holding that data to think they were the last surviving copy, each then attempting at very high priority to copy itself?

I've often said it takes a decade to establish a new server-to-storage connection, and at least two decades for a server-to-storage connection to fade away once a replacement technology reaches critical mass. In the cloud, that connection is an API ecosystem like S3, not a nominally hardware-centered ecosystem like Fibre Channel. My point is not these details; it's that it would take decades to migrate all of enterprise IT to the cloud, even if the cloud were provably a better business choice today.

That is not to say that the cloud hasn't captured 100% of a new generation of applications, and some existing corporate IT as well. Shifts like this in customer choices were the nature of the technology world when IBM brought the System/360 mainframe to market 50 years ago, when Digital introduced the VAX series a little under 40 years ago, when Compaq created the plug-compatible, software-compatible PC 30 years ago, when Fibre Channel was introduced 20 years ago, and so on. Most of the time customers figure out where the new technology makes the most sense, where the older technology makes the most sense, and the market evolves accordingly over a period of years (sometimes even decades).

Back to the point: watch the data. If corporate customers (small, medium, and large) start moving all their data to the cloud, the compute can follow (the compute is the easy part).

Why NetApp shouldn’t buy Solidfire

Steve Chalmers

Boulder? Not a problem

StorageTek (long since part of Sun) established a huge Boulder site back in the late 1970s, and had no problem staffing it. Lots of spinoffs and startups since. There is not only a tech workforce there, it's a far cheaper and more pleasant (and more family friendly) place to live than Silicon Valley.

And yes, the workforce is more stable, as it is in the whole corridor from Boise through Colorado Springs.

Array with it: What's next for enterprise storage?

Steve Chalmers

This is about silo walls, not substance

What's really changing is that -- over the next decade or two, not the next year or two -- the boundary between "server" and "storage" which we hardened when we drove Fibre Channel into the industry two decades ago will soften again.

In a lot of cases, the storage is being absorbed back into servers. Sometimes all the servers. Sometimes just specialized ones.

As you walked around HPE Discover, you probably saw flash storage sitting on what we used to call memory DIMMs sitting in servers. That's a lot faster to read and write than storage in an NVMe or NVMe over Fabric device, which in turn is a lot faster than traditional storage. When application performance is proportional to (1 / storage latency), this kind of "faster" is extraordinarily important.

You probably also walked past the HPE Labs display, and learned about "The Machine". In "The Machine", a number of "servers" share a pool of nonvolatile memory -- that's the hardware for the enterprise storage for those servers.

Enterprise storage isn't really about hardware. It's about software. It's about mapping the addresses seen by the servers onto the disk drives, which is really about making disk failures, RAID, rebuilds, migrations, and the like transparent to the servers. It's about snapshots and backup, cloning and migration, thin provisioning, deduplication, and all the other features of a modern Enterprise array. These features, this (possibly hardware-assisted) software, don't go away in a world where the storage hardware has been reabsorbed into the servers; it's just called something different, such as "software-defined storage".

Observation: it takes a decade for a new storage protocol/interconnect to reach critical mass in the market -- very few proposed technologies have this level of success. Once established in that mass of customer sites, a technology like Fibre Channel has momentum that takes decades to fade away, even if a replacement technology has already reached its own critical mass.

I'm reminded of the old tradition: "The king is dead. Long live the king."

Or in this case, "Enterprise Storage (as we've known it for the last two decades) is dead. Long live Enterprise Storage (more tightly integrated with server hardware for the next two decades)." Only customers will still be buying today's Enterprise Storage for decades to come.

@FStevenChalmers

(works for HPE, haven't worked in storage in this decade, speaking for self)

Block storage is dead, says ex-HP and Supermicro data bigwig

Steve Chalmers

They're still linear arrays of bytes...

Agreed there are a lot of changes coming in storage, and I agree with Robert that the entrenched players may not be moving fast enough. What else is new in the tech industry?

As to storage, a disk drive is a linear array of bytes. Historically, we've used a file system to recursively divide that linear array of bytes into a collection of smaller linear arrays of bytes. Or disk arrays to stripe logical disks across physical disks (striping logical linear arrays of bytes across physical linear arrays of bytes). Object storage is yet another collection of smaller linear arrays of bytes (the objects).

A USB stick (which by the way is the replacement for the removable disk drive) is just another linear array of bytes.

Flash, or more generally the nonvolatile memory DIMM, is simply a new physical linear array of bytes in which to store smaller linear arrays of bytes (objects, files, whatever).

Agreed that how we organize and access the smaller linear arrays of bytes within the larger physical ones may change.

Oh, and accessing storage congests networks. Blindly using UDP is fatally flawed in nontrivial installations. Layering one of the emerging congestion control protocols over UDP is fine.

But I predict that in 100 years we will still have physical devices which store linear arrays of bytes, and within them we will store smaller linear arrays of bytes. If block storage is dead, long live block storage II!

@FStevenChalmers

You know what storage needs? More doughnuts to flatten us up

Steve Chalmers

Torus has its place, but there are downsides

The torus approach works best in fixed installations: imagine needing to add a few more devices to an existing, running torus. When we used even a basic ring back in the early days of Fibre Channel (Fibre Channel Arbitrated Loop, or FC-AL), the industry learned quickly that we needed to wire that loop in a star with wiring hubs to make adding/removing/failing/repairing devices straightforward. Even then, loop reconfiguration was disruptive enough that in the end low-cost switching (the exact wiring needed for leaf-spine) won out.
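
The reason resizing hurts falls out of the addressing; a small sketch of neighbor computation in a k-ary ring (one dimension of a torus) shows how adding devices silently re-wires existing links:

```c
#include <stdio.h>

/* In a k-ary ring (one torus dimension), a node's neighbors are a pure
 * function of its position and k, so growing k moves the wrap-around link. */
static void neighbors(int node, int k)
{
    printf("k=%2d node %d: left %d, right %d\n",
           k, node, (node + k - 1) % k, (node + 1) % k);
}

int main(void)
{
    neighbors(7, 8);    /* k=8:  left 6, right 0 (wraps to node 0)     */
    neighbors(7, 10);   /* k=10: left 6, right 8 (the wrap link moved) */
    return 0;
}
```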

The torus approach also needs to have the failure cases (immediate workaround, repair, return to normal) thought out well. A torus whose hardware forwards through a node in 80ns is doing so based on tables loaded into each ASIC by some sort of software control plane. There are a lot of "interesting" cases (black holes, forwarding loops, non-simultaneous changes to forwarding tables by software, ...) beyond the basic operation described in this article.

Will be interesting to see where this torus design finds its home, and among what customers.

Diskicide – the death of disk

Steve Chalmers

Good Thinking! But pay attention to...

Good reasoning on flash displacing spinning media for certain use cases in the data center!

I'd encourage some thought on:

1. The server-side software stack for accessing storage. The stack we've used for 50 years -- even in its current form as the SCSI stack for 30 years, or the Fibre Channel stack for 20 -- consumes far too many CPU cycles and far too much time to match up with what flash and emerging solid-state storage technologies can do. NVMe and NVMe over Fabric are a major step here, but by no means the end of the story.

2. In the data center, the interfaces used for a storage infrastructure change glacially. An IT shop which committed to Fibre Channel 15 years ago will probably be using at least some Fibre Channel 15 years from now, even if they decide today to start phasing it out. Which leads to...

3. A lot of these new storage technologies, whether it's Object or some creative use of server integrated flash, are great for (certain) new applications, but legacy apps would have to be ported at a minimum and rearchitected/rewritten at the extreme, to use these new storage models. That means the legacy apps will be on something which emulates traditional storage for a long time.

4. "Requiescat in Pace" (sorry, the 4 years of Latin I took 40 years ago stuck, have to call the typo)

@FStevenChalmers

Fibre Channel over Ethernet is dead. Woah, contain yourselves

Steve Chalmers

Looking at FCoE another way, it's been quite successful

Here's the comment I made on the author's blog, where this was first published:

Perhaps there's another way to look at FCoE, not as a product but as a catalyst for business change.

FCoE caused every sales discussion, worldwide, for networked block storage (server, SAN, storage) from 2007 to 2013 to become a discussion where just bringing Fibre Channel, or just bringing Ethernet, wasn't good enough. This changed the sales dynamic, worldwide, to favor not just Cisco products but Cisco's direct sales force and resellers.

FCoE also redirected the entire discretionary investment (and then some) of the Fibre Channel industry (server HBAs, SAN switches, disk arrays and other storage devices) for that same period. In some cases, companies which previously specialized in either Ethernet or Fibre Channel were combined by very disruptive M&A in order to have all the skills required to succeed building, selling, and servicing FCoE products.

In the end, FCoE turned out to be a very cost effective edge (last hop) (Access layer) for Fibre Channel networks. It was also the catalyst for my career shifting from Storage to Networking. In those two ways, FCoE was a big success!

(speaking for self, not for employer, which happens to be HP)