IO, IO, it's profiling we do: Nimble architect talks flash storage tests

We interviewed Dimitris Krekoukias, Nimble Storage's global technology and strategy architect, on the subject of storage array performance claims – he has some strong opinions, particularly about Pure Storage's approach to performance. Pure provided a response to Krekoukias' points, which has been added after the interview. …

  1. Anonymous Coward
    Anonymous Coward

    How about a benchmark test?

    Surely potential purchasers would believe benchmarks (a la SPC-1) rather than manufacturers' claims? How about Pure and Nimble having an SPC-1 shoot-out?

    1. madanko

      Re: How about a benchmark test?

      Why? Who pays attention to a benchmark that was designed over a decade ago and does not take modern architectural designs into consideration?

      1. Big_JM

        Re: How about a benchmark test?

        That's just an absurd comment. Incredibly absurd the more I think about it. My guess is you're either a Pure Storage fan or an employee.

        Modern architectural designs should thrive in old benchmarks. Technology is an evolution: as workloads have changed, technology has had to change to keep up. The SPC-1 is very much a workload test. It wasn't "designed" for anything in particular; it's a test that pushes multiple workloads at a system. The theory of the test is that the workload patterns will overrun caching algorithms, so we see what the backend is truly capable of (see the rough sketch below).

        It's very much a relevant test, and an intensive one at that. To say otherwise is irresponsible. Either run the test or don't. Denouncing the test, however, shouldn't be an option. It says more about your lack of knowledge than about anything else.
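
        As a back-of-the-envelope illustration of that "overrun the cache" idea (a deliberately simplistic uniform-random model, nothing to do with the actual SPC-1 generator):

          # Simplistic model: with uniform random access, once the active working
          # set dwarfs the cache, most I/O misses and lands on the backend.
          def expected_hit_rate(cache_gb: float, working_set_gb: float) -> float:
              return min(1.0, cache_gb / working_set_gb)

          cache = 64  # GB of cache - an illustrative figure, not any specific product
          for ws in (32, 256, 2048):  # GB of actively accessed data
              print(f"working set {ws} GB -> ~{expected_hit_rate(cache, ws):.0%} cache hits")
          # 32 GB -> 100%, 256 GB -> 25%, 2048 GB -> ~3%: a big enough mixed workload
          # measures the backend rather than the caching layer.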

        1. Anonymous Coward
          Anonymous Coward

          Re: How about a benchmark test?

          Sounds like you're not up to speed with the actual SPC-1 workload, nor do you know that systems with inline data reduction are not allowed to participate.

          "Data reduction is currently not allowed in any audited SPC measurements to become new SPC Results."

          "We are currently developing specifications for compressible content for the next iteration of the SPC benchmark tools and specifications. That activity will be followed by investigating how to incorporate deduplication within the benchmark specifications."

          The SPC-1 workload uses a highly dedupable and compressible dataset, so any one of these arrays would perform extremely well.

          In a day and age where data reduction has become table stakes, a benchmark that does not take it into account is, to say the least, dated.
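
          As a rough illustration of why the dataset matters (this is not the actual SPC-1 data generator, just a toy comparison), an array doing inline compression has far less to write when the test data is easily reduced:

            import os, zlib

            block = 64 * 1024  # 64 KB sample block; size chosen arbitrarily for the demo

            # Low-entropy block standing in for a highly compressible benchmark dataset.
            compressible = (b"customer_record_0000" * 4000)[:block]
            # Random bytes standing in for data that is already reduced or encrypted.
            incompressible = os.urandom(block)

            for name, data in [("compressible", compressible), ("random", incompressible)]:
                ratio = len(data) / len(zlib.compress(data))
                print(f"{name}: ~{ratio:.1f}:1 reduction")
            # The first case reduces by orders of magnitude, the second barely at all,
            # so a synthetic, easily reduced dataset can flatter any modern array.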

  2. This post has been deleted by its author

  3. klaxhu

    Honestly, who the hell cares about storage and IOPS and IO sizes anymore?

    1. Big_JM

      Everyone does. They just don't know it.

      The world is all about workloads now. Technologists don't examine the details as much as they used to, it's true, but the details still matter.

      Applications all issue I/O at particular sizes; that doesn't go away. Storage arrays still deliver IOPS; that doesn't go away. And a storage array is still limited by how many IOPS it can deliver at the lowest latency; that doesn't go away either.

      So, like it or not, all these things still matter. If Pure is overselling what its system is capable of, then that's something to be explored. The fact that you think you don't care about I/O sizes or IOPS shows just how dangerous their marketing is. Those things haven't gone away. They can't go away. That's just how data works.
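
      For a rough sense of how these quantities constrain each other (purely illustrative numbers, not any vendor's specs), Little's Law ties IOPS to latency and outstanding I/O:

        # Back-of-the-envelope figures - illustrative assumptions, not measurements.
        queue_depth = 64          # outstanding I/Os the host keeps in flight
        latency_s = 0.001         # 1 ms average service latency
        io_size_bytes = 8 * 1024  # 8 KB transactional I/O

        # Little's Law: concurrency = throughput * latency  =>  IOPS = queue depth / latency
        iops = queue_depth / latency_s
        throughput_mb_s = iops * io_size_bytes / 1e6
        print(f"IOPS ~ {iops:,.0f}, throughput ~ {throughput_mb_s:,.0f} MB/s")
        # Halve the latency and the same queue depth delivers twice the IOPS;
        # double the I/O size and the same IOPS moves twice the data.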

  4. dikrek
    Boffin

    Allow me to clarify

    Hi all, Dimitris here (recoverymonkey.org).

    In case it wasn't clear enough from the article:

    1. Pure is variable block; their own results show the bimodal nature of I/O (clearly split between small-block and large-block I/O), yet they keep talking about a 32K I/O size, which is the (correct) mathematical average of overall throughput. Nimble's research shows that using the throughput average isn't the best way to reason about I/O, since latency sensitivity varies hugely between I/O sizes and apps (see the worked example at the end of this comment).

    2. Nimble arrays are of course variable block and in addition application-aware. Even the latency alerting takes into account whether I/O is latency-sensitive or not, which increases alerting accuracy by 10x.

    3. Nimble's research shows that you need to treat different size and type I/O accordingly. For instance, large block sequential doesn't need to be given preferential latency treatment. Small block random I/O on the other hand is more latency-sensitive depending on the app. Nimble will automatically take into account app type, I/O size, latency sensitivity and randomness in order to intelligently prioritize latency-critical I/O in the event of contention to protect the user experience where it counts. This is an extremely valuable ability to have and helps automatically prevent latency spikes in the portion of I/O that actually cares about latency. See here: http://bit.ly/2cFg3AK

    4. The ability to do research that shows detailed I/O characteristics per app, instead of for the whole array, is what allows Nimble to be so prescriptive. The analysis we do is automated and cross-correlated across all our customers, instead of having to ask someone what apps an array is running.

    Final but important point: this is a disagreement about math, not about the actual I/O prowess of the two vendors. Both have excellent-performing systems. My problem is when the math starts to mislead customers.

    The other disagreement is about quoting hero numbers at what looks like 100% reads. Sure, they look good, but if performance drops by half under heavy writes then it's not that exciting, is it?
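
    To make the averaging point concrete, here's a worked example. The mix below is invented for illustration (it is not either vendor's telemetry), but it shows how a throughput-weighted average lands on an I/O size that almost nothing actually issues:

      # Hypothetical bimodal mix: mostly small-block transactional I/O plus some
      # large-block sequential I/O. Proportions are made up for illustration.
      mix = [
          (8 * 1024,   90_000),   # 8 KB I/Os per second
          (256 * 1024,  9_000),   # 256 KB I/Os per second
      ]

      total_iops = sum(iops for _, iops in mix)
      total_bytes = sum(size * iops for size, iops in mix)

      # "Average" I/O size = total throughput / total IOPS.
      avg_size_kb = total_bytes / total_iops / 1024
      print(f"average I/O size ~ {avg_size_kb:.0f} KB")
      # Prints roughly 31 KB, even though no I/O in the mix is anywhere near 31 KB.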

    Thx

    D

  5. Nate Amsden

    3par has been saying this for close to 15 years

    And built their architecture around it

    Picture describing this aspect

    http://www.techopsguys.com/wp-content/uploads/2013/05/3par-mixed-workload.png

  6. Anonymous Coward
    Anonymous Coward

    But Nimble's data is mostly SMB customers

    Surely the vast majority of data in InfoSight comes from SMB workloads, seeing as Nimble only had hybrid arrays for most of its lifetime and only introduced FC support a short(ish) while ago?

    In other words, their conclusions may hold for small to mid-sized customers, but not for true enterprise customers.

    On the other hand, the same caveat applies to Pure: most of their customer data (in their equivalent of InfoSight) comes from mid-range to high-end customers.

    1. dikrek
      Boffin

      Re: But Nimble's data is mostly SMB customers

      Not at all true. Nimble's install base ranges all the way from huge to SMB. We have detailed data for all kinds of enterprise apps - Oracle, SQL, SAP, plus more exotic ones like HANA, SAS, Hadoop.

      Plus the more pedestrian VDI, Sharepoint, Exchange, file services...

      The interesting thing is that Nimble's and Pure's stats actually agree.

      What differs is each company's reaction to the math and their ability to drill down in even more detail so that one knows how to automatically treat different kinds of I/O even within the same app.

      For instance, DB log files always expect extremely fast response times. That exact same DB doing a huge table scan expects high throughput but the table scan is not nearly as latency-sensitive as being able to write immediately to the log.

      Being able to differentiate between the two kinds of I/O is important. Being able to act upon this differentiation and actually prioritize each kind of I/O properly is even more important.

      It's far more than just letting random I/O through, as in Nate's drawing of what 3PAR does. It's about knowing, per application, how to deal with different I/O classes intelligently, and how to alert accurately, with very little noise, if there is a problem.
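
      As a minimal sketch of that kind of per-class prioritization (this is not Nimble's actual code; the app labels and thresholds are invented for illustration):

        from dataclasses import dataclass

        @dataclass
        class IORequest:
            app: str      # e.g. "sql-log", "backup", "vdi" - hypothetical labels
            size: int     # bytes
            random: bool  # random vs sequential access pattern

        def latency_sensitive(io: IORequest) -> bool:
            # Transaction-log writes and small random I/O care about latency;
            # big sequential streams mostly care about throughput.
            if io.app.endswith("-log"):
                return True
            return io.random and io.size <= 16 * 1024

        def schedule(pending: list) -> list:
            # Under contention, service latency-critical I/O first, then the rest.
            return sorted(pending, key=lambda io: not latency_sensitive(io))

        queue = [
            IORequest("backup", 256 * 1024, random=False),
            IORequest("sql-log", 4 * 1024, random=False),
            IORequest("vdi", 8 * 1024, random=True),
        ]
        for io in schedule(queue):
            print(io.app, "latency-sensitive" if latency_sensitive(io) else "throughput")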

      Thx

      D

      1. Yaron Haviv

        Re: But Nimble's data is mostly SMB customers

        This is exactly why block storage is not going to keep up with new DB designs.

        The right way is to put the DB log and indexes in NVRAM (no disk I/O) and to encode data in a way that avoids full scans and long tree traversals. It's time to leverage the random-access nature of NVMe flash; most databases are designed around decades-old HDD limitations and avoid random I/O at all costs with inflated memory caches.

        Even newer things like HBase, Cassandra, GoogleFS and AWS DynamoDB all copied the same LSM research paper, which is only really relevant to HDDs, and that's why they can't make optimal use of SSDs.

        So instead of reverse-engineering the I/O patterns of DBs, it's time to build DBs that run natively on flash and NVRAM, do caching, compression and dedup in the data layer, and build a metadata memory hierarchy for optimal search (a toy sketch of the idea follows below).
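
        As a toy sketch of the "log and index live in byte-addressable memory" idea (a plain dict and bytearray stand in for NVRAM here; a real system would need persistent memory and crash-consistency logic):

          # Append-only log region plus an in-memory index: reads never walk an
          # on-disk tree and writes are a single append. Purely illustrative.
          class NvramishKV:
              def __init__(self):
                  self.log = bytearray()   # stand-in for an NVRAM log region
                  self.index = {}          # key -> (offset, length)

              def put(self, key, value: bytes):
                  offset = len(self.log)
                  self.log += value                       # one append, no random disk I/O
                  self.index[key] = (offset, len(value))

              def get(self, key) -> bytes:
                  offset, length = self.index[key]        # no tree traversal, no table scan
                  return bytes(self.log[offset:offset + length])

          kv = NvramishKV()
          kv.put("order:42", b'{"total": 99.5}')
          print(kv.get("order:42"))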

        If DBs are fast enough, we can build searchable file systems on top of those DBs, rather than DBs on top of file systems.

        Watch for the iguazio announcement tomorrow.

  7. Anonymous Coward
    Anonymous Coward

    TOO LONG, DIDN'T READ

    TLDR...

  8. Big_JM

    You're right. Dimitris should invest in a chainsaw because his posts were TLDR for me too...

  9. storageer

    It's now easy to use your own production workloads when evaluating storage vendors

    The vendors do bring up good points, and both of their approaches have pros and cons. The bigger point, which is not fully discussed, is that storage buyers must test with their own application workloads. They should not be relying on outdated benchmarks like SPC/TPC, or even on tests run by the vendors themselves, who can “game” the benchmark. They need to use products like the Virtual Instruments Load Dynamix storage performance analysis and testing platform. It's the only professional, vendor-independent load-testing solution that extracts real-world production workload data from your data center. You can literally replay your production workloads in a test lab environment and load test them against any vendor, product or configuration. Load Dynamix users have done hundreds of “bake-offs”, often with our help, and the results vary by 5x or more between vendors on identical workloads. There is even a cloud-based service that offers workload profiling of your production networked storage for free at workloadcentral.com.

  10. This post has been deleted by its author

  11. Chris Mellor 1

    From a Nimble guy

    [Entered on behalf of Nimble employee...]

    I am a Nimble employee for full disclosure.

    Nimble has had variable block from day one on our hybrids (and now on our all-flash arrays, which run the same OS), and the hybrids came out a number of years before Pure released its product.

    The statement below about our product is 100% false:

    "Side note: Nimble’s data reduction, for example, operates on fixed block sizes that have to be set/tuned on a volume-by-volume basis….in our minds that’s simply not correctly architected for today’s cloud/mixed workload reality."

    On a second (and more political) note: Pure proves in their own diagram that very little I/O actually happens at 32K... Transactional apps operate at 4-8K blocks and sequential apps at larger 64-256K blocks, meaning two block-size ranges are representative of real-world workloads.

    It's intellectually insulting for Pure to insist, after showing their own numbers, that the mean of these is a valid benchmark figure. I would hope that such a fantastic team of analysts and journalists would call out this faulty logic.

    [Obviously not :-) ]
