It's time to plug The Platform
High Performance Computing (HPC) is all terribly exciting. We get announcements about this supercluster or that one beating another's teraflop count (or, more likely these days, petaflops). Or maybe it's data throughput, or storage size, or even transfer rates. And, perhaps the most interesting way of thinking about …
...at exactly the time that HPC is going towards commodity parts.
I've been working with IBM IH supercomputers (P6 575s and P7 775s) for the last six years, and the machines coming in to replace the (IMHO) highly technical and innovative 775s are effectively packaged commodity Intel Xeon processors of the kind available in any high-end commercial server, with ordinary memory, built by a white-box manufacturer in the Far East.
The back-end storage is effectively a software-defined storage solution attached using ordinary InfiniBand, with SAS disks in standard disk shelves connected through off-the-shelf RAID adapters to normal rack-mounted Intel (actually off-the-shelf Dell) servers acting as storage servers.
The only thing that is remotely special in the hardware is the Aries interconnect (the new machines are Crays), which is an ASIC that is not beyond any large company's ability to design.
Compare this to the IBM P7 775, which was a marvel to behold (disclosure: I'm working under contract to IBM [hence the AC], and I like the P7 775s). Although the processors are normal Power 7 processors, the internal architecture put four P7 chips on a QCM, tightly coupled to the Torrent IO hub (a completely proprietary IBM design), which implemented the multi-tiered copper and optical to-the-chip interconnect that allowed these QCMs to be bound together into larger units and linked to the rest of the cluster. All of this was contained in a package that was water-cooled directly at the QCM, Torrent and memory DIMMs, and which was so densely packed that the floor had to be reinforced.
Even the storage was different from anything available elsewhere: it was distributed throughout the cluster, linked over the same interconnect the compute nodes used, and it used the same specialist hardware as the compute nodes to provide a distributed, software-defined storage solution with distributed RAID. The unique packaging allowed 384 drives to be housed in a 4U, 28-inch-wide drawer, something that was again quite different from other solutions.
Unfortunately, IBM appears to have stopped developing the IH series of systems. When they bid for the replacement at the customer here, they offered commodity Intel servers linked using InfiniBand, as did almost all of the other bidders (the rumour mill says they lost out because, the week after the tenders were submitted, IBM announced that NeXtScale was part of the Lenovo sale!).
I think that really big, special HPC systems will fast become a thing of the past, at least until some killer new technology arrives, and I cannot see that on the horizon. Bigger HPCs will become wider (more CPUs) rather than faster, which does not suit all HPC workloads. You may still see integrated GPUs, but I have been told by a source I trust, and who has a reputation in this field, that these hybrid systems are a nightmare to program.
I would have loved to have seen something like the 775 packaging appear in the ordinary data centre, because it was delivering the P and the R (performance and reliability) of the PERCS principles, and the ease of use was just a matter of management software, which would have matured over time.
"The backend storage is effectively a software defined storage solution attached using ordinary Infiniband,"
Funny you should mention that, because *I* was going to bring up InfiniBand as a prime example of a technology that WAS developed exclusively for use in supercomputers moving down to become (high-end) commodity hardware. The view 15 or 20 years ago of the supercomputer of the future was that it would use tech very similar to PCIe on-board, Fibre Channel-like technology to connect storage, and InfiniBand to connect CPU cabinets together. No Fibre Channel for you? Well, faster-than-gigabit Ethernet also cribs methods from these supercomputer interconnects, so instead of 10-gigabit being totally untested when it came out, the principles had already been proven to some extent.
The fact of the matter is, I think "trickle-down economics" has proven to be mostly crap... but technologies developed for HPC do indeed trickle down: first to higher-end servers and clusters, then the parts that make sense make it to (non-server) desktops, and eventually some of it makes it to portable systems.
That said, as far as I know GPGPU (general-purpose GPU) computing kind of came out of left field and did not stem from supercomputer developments. The Cray-1 had a vector processing unit, but that was closer in scale to MMX or SSE instructions (running the same operation on a block of numbers) than to the totally over-the-top processing of the GPU-based supercomputers of the last year or two.
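To make the comparison concrete, here's a minimal sketch of my own (assuming a CUDA toolchain; the names saxpy_cpu, saxpy_gpu and N are purely illustrative, nothing to do with Cray or IBM): the CPU loop is the "same operation on a block of numbers" that a Cray-style vector unit or MMX/SSE speeds up, while the CUDA kernel spells out the GPGPU take on it, one lightweight thread per element.

// Sketch only: contrasting a vectorisable CPU loop with a GPGPU kernel.
#include <stdio.h>
#include <cuda_runtime.h>

#define N (1 << 20)

// CPU version: a compiler can auto-vectorise this loop with SSE/AVX, much as
// a Cray vector pipeline would stream through the array. Shown for contrast.
void saxpy_cpu(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

// GPU version: the same arithmetic, but expressed as thousands of threads,
// each handling a single element.
__global__ void saxpy_gpu(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    float *x, *y;
    cudaMallocManaged(&x, N * sizeof(float));   // unified memory, visible to CPU and GPU
    cudaMallocManaged(&y, N * sizeof(float));
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy_gpu<<<(N + 255) / 256, 256>>>(N, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);                // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}

Same arithmetic either way; the difference is whether one vector pipeline streams through the array or thousands of threads each grab an element, which is roughly why the GPU approach scales so far beyond what MMX/SSE-style vectorisation ever did.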