WekaIO almost, but not quite, summits Summit supercomputer on storage performance

WekaIO has delivered 95 per cent of the IO performance of the Summit supercomputer's 40 storage racks using just half a rack's worth of hardware running its Matrix scale-out fast filer software. The result is documented in the IO-500 10 Node Challenge List, maintained by the Virtual Institute for I/O. WekaIO ran the IO-500 test on eight Supermicro BigTwin enclosures, …

  1. Anonymous Coward

    NVMe vs. SSD?

    As I recall, Summit's storage is conventional SSDs. And WeakIO managed to get close to that with NVMe, but couldn't beat it. That's the lede here?

    These benchmarks desperately need to be normalized by cost, don't they? I feel a bit dirty for saying it, but I miss TPC-C

    1. Anonymous Coward

      Re: NVMe vs. SSD?

      Summit is Nearline SAS HDDs (7200rpm)

    2. ElReg!comments!Pierre

      Re: NVMe vs. SSD?

      The final score is not the most informative thing here : I find the bw and md data much more interesting...

  2. Barbara Murphy

    IBM Summit - 2.4x more storage nodes, 3.2x more NVMe Drives, 12.5x more memory, 5% faster

    I agree with Anonymous Coward #1: we need more transparency about the systems used so that you can get a side-by-side cost estimate. Anonymous Coward #2, you are partially correct: each node has both HDD and NVMe drives. According to IBM's literature, the Summit storage system configuration uses 2 NVMe drives in every enclosure. It was a bit of a mystery hunt to find the system specification, but here is a link to the IBM-sponsored paper that outlines its build. I have taken that data and put it side by side with WekaIO's, and I encourage you to check it out, because the paper talks extensively about how they are using NVMe to improve small file I/O performance. Page 6 outlines the storage node build. They have 77 storage nodes in total, compared to WekaIO's 32-node system.

    https://public.dhe.ibm.com/common/ssi/ecm/75/en/75017375usen/sci-ibm1804-spectrum-scale-perf-v020_75017375USEN.pdf

    Summary:

    IBM Summit number of nodes 77, WekaIO 32 (2.4x more nodes used by IBM)

    IBM Summit memory per node 1,000GB, WekaIO 192GB (5.2x more memory per node used by IBM)

    ****** Total IBM cluster memory used 77TB, WekaIO 6.14TB (12.5x more memory overall used by IBM)

    IBM Summit NVMe drives per node 8 (2 per enclosure x 4 enclosures per node), WekaIO 6 (1.3x more NVMe used by IBM)

    ****** Total IBM cluster NVMe drives used 616, WekaIO 192 (3.2x more NVMe used by IBM)
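The arithmetic behind the summary above can be checked with a short sketch. It uses only the per-node figures quoted in this comment; the dictionary keys and helper names are made up for illustration, and 1TB is taken as 1,000GB to match the totals quoted.

```python
# Per-node figures as quoted in the comment above (not independently verified).
summit = {"nodes": 77, "mem_per_node_gb": 1000, "nvme_per_node": 8}
weka = {"nodes": 32, "mem_per_node_gb": 192, "nvme_per_node": 6}


def totals(cfg):
    """Cluster-wide totals derived from the per-node figures."""
    return {
        "nodes": cfg["nodes"],
        "memory_tb": cfg["nodes"] * cfg["mem_per_node_gb"] / 1000,
        "nvme_drives": cfg["nodes"] * cfg["nvme_per_node"],
    }


def ratios(a, b):
    """How many times more of each resource configuration a uses than b."""
    return {key: round(a[key] / b[key], 1) for key in a}


summit_totals = totals(summit)
weka_totals = totals(weka)
cluster_ratios = ratios(summit_totals, weka_totals)

print("Summit totals:", summit_totals)
print("WekaIO totals:", weka_totals)
print("Summit / WekaIO:", cluster_ratios)
```

The ratios come out at 2.4x nodes, 12.5x memory and 3.2x NVMe drives, matching the figures in the comment title.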

    1. Anonymous Coward

      Re: IBM Summit - 2.4x more storage nodes, 3.2x more NVMe Drives, 12.5x more memory, 5% faster

      A few quick questions Barbara (nice video with HPE by the way):

      1) does having flash based storage for data (not metadata) improve the result in the IO-500 test?

      2) WekaIO seems to be a bit HCI-like in terms of the "slider" diagram on your website - it actually implies that the software is running on the compute nodes. I can think of a number of potential downsides to that, but which of the things customers might assume are downsides actually aren't?

      3) Small file sizes are emphasised as being a challenge for some other HPC filesystems, but you design for the workload - if you're small and care about performance you go NVMe/SSD for metadata/bulk storage. If you care about performance, but you have to store a ton of stuff, cost effectively, you use NL-SAS for the bulk. Putting it another way, a motorbike doing 150 mph is impressive, but seeing an articulated truck doing 157 mph (within 5%) is even more impressive. They may have "X times" the hardware, but they're moving a lot more shit too.

      1. ElReg!comments!Pierre

        Re: IBM Summit - 2.4x more storage nodes, 3.2x more NVMe Drives, 12.5x more memory, 5% faster

        Exactly what I meant by my previous comment: the overall score is meaningless; what is interesting is the 10 vs 24 GiB/s and the 507 vs 170 kIOP/s. So, not at all designed for the same kind of load. About as comparable as a motorbike and an articulated lorry, as you put it.

      2. Barbara Murphy

        Re: IBM Summit - 2.4x more storage nodes, 3.2x more NVMe Drives, 12.5x more memory, 5% faster

        Hi,

        Re 1) Flash-based storage will help. My understanding from reading the IBM document, though, is that flash is used for small files as well. Given that IBM scored very high on IOPS (MDtest), I have got to believe that NVMe and the 1TB of memory were instrumental in this result.

        re 2) Both IBM Spectrum Scale and WekaIO have a POSIX client that runs on the compute nodes which then communicates with the storage nodes. This is standard across all the parallel file systems including Lustre. It is not using something like NFS, which would never be able to deliver the kind of performance numbers you see on the IO-500.

        re 3) No question that the Summit storage system is possibly one of the biggest in the world. The benchmark is all about performance per client, which is a different metric. Our software does not preclude putting disk behind the flash tier to get the great scalability. We leverage object storage and have customers with WDC ActiveScale behind us getting great performance and scalability.

        Regarding the motorbike vs. truck analogy, it sounds good, but that is not what is happening in the case of the IO-500 test. If you look at the results, you can see that WekaIO did over 2x the bandwidth of Spectrum Scale (27GiB/sec vs. 9.8GiB/sec).
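As a rough illustration of how the bandwidth and metadata figures traded in this thread combine into one number, here is a minimal sketch. It assumes the IO-500 overall score is the geometric mean of the bandwidth score (GiB/s) and the metadata score (kIOP/s), and it uses the rounded figures quoted in the comments rather than the official list values; the function name is made up for illustration.

```python
import math


def io500_score(bw_gib_s, md_kiops):
    """Overall score, assumed to be the geometric mean of the two sub-scores."""
    return math.sqrt(bw_gib_s * md_kiops)


# Rounded figures as quoted in the comments, not the official list entries.
weka_score = io500_score(27.0, 170.0)
summit_score = io500_score(9.8, 507.0)
ratio = weka_score / summit_score

print(f"WekaIO: {weka_score:.1f}")
print(f"Summit: {summit_score:.1f}")
print(f"WekaIO as a fraction of Summit: {ratio:.2f}")
```

Under that assumption the two systems land within a few per cent of each other overall, even though one leads on bandwidth by well over 2x and the other leads on metadata by about 3x, which is why the single score hides the difference in workload profile.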

        1. Anonymous Coward

          Re: IBM Summit - 2.4x more storage nodes, 3.2x more NVMe Drives, 12.5x more memory, 5% faster

          So on point 1, it's likely that the bandwidth was better due to the backend storage being flash?

          Everything is workload dependent (which is why the standard answer to any performance question is normally "it depends"), but if all requests are being fulfilled by NVMe, if you weren't better on at least one measure, you'd have real problems!

          On point 2, your website isn't particularly clear on where the storage is. In the diagram which is part of "Your Data Management Before and After WekaIO" it very much looks like it is in the compute nodes with links out to S3/Swift - S3/Swift aren't going to be about absolute performance. If it's not in the compute nodes, where is the performance storage?

        2. ElReg!comments!Pierre

          Re: IBM Summit - 2.4x more storage nodes, 3.2x more NVMe Drives, 12.5x more memory, 5% faster

          The benchmark is all about performance per client, which is a different metric.

          Exactly. Your system did perform very well on that benchmark, no question about that (congrats, btw). The standard issue with benchmarks still applies: they are only indicative of real-world performance if your real-world needs are reasonably close to the test conditions, which is, I guess, the point that many here are trying to make.

          you can see that WekaIO did over 2x the bandwidth of Spectrum Scale (27GiB/sec vs. 9.8GiB/sec)

          Analogies are always imperfect, but the bandwidth metric kinda reinforces this particular one: while the motorbike is very fast, it's not necessarily able to process large loads faster than a lorry (although in that case a better analogy would probably be "one Hayabusa vs a fleet of delivery scooters", but that doesn't sound remotely as good).

          Horses for courses...

          1. ElReg!comments!Pierre

            Re: IBM Summit - 2.4x more storage nodes, 3.2x more NVMe Drives, 12.5x more memory, 5% faster

            the point that many here are trying to make

            Plus, of course, the lack of information on the systems used makes it difficult if not impossible to estimate the real-world TCO.

  3. janella-barmer

    We need to have more transparency in the systems used so you can get a side by side cost estimate.


Biting the hand that feeds IT © 1998–2020