That is pretty, er, Nimble. Storage firm claims 'six nines' availability

Nimble says its storage arrays (all-flash and hybrid) have reached six “nines” availability. That means less than 25 seconds of downtime a year, with the measured availability (according to the storage firm) being 99.999928 per cent. Nimble has roped in an IDC spokesperson, research director Eric Burgener, to help spread the …
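The "less than 25 seconds" figure is easy to check. Below is a minimal Python sketch that converts an availability percentage into downtime per year, assuming a 365.25-day year. On that basis, six nines proper (99.9999%) allows about 31.6 seconds a year, and the quoted 99.999928% works out to roughly 22.7 seconds, which is consistent with the sub-25-second claim.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year, 31,557,600 s


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the yearly downtime implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("five nines", 99.999),
                       ("six nines", 99.9999),
                       ("measured figure", 99.999928)]:
        print(f"{label:>16}: {downtime_seconds_per_year(pct):6.1f} s/year")
```

Using a 365-day year instead shifts the results only slightly (six nines becomes about 31.5 seconds).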

  1. This post has been deleted by its author

  2. dikrek
    Boffin

    The key thing is that this is measured

    Hi all, Dimitris from Nimble here (http://recoverymonkey.org).

    This is real measured data across over 9,000 customers running all versions of the code, new and old, unlike some vendors that say "designed for 7 nines" (and base that on a handful of customers and very specific code levels).

    I don't so much care what it's theoretically DESIGNED for, I care more about what it can DO ;)

    You can see the evolution of code uptime here:

    https://www.nimblestorage.com/blog/six-nines-availability-across-more-than-9000-customers/

    Thx

    D

    1. Lee D Silver badge

      Re: The key thing is that this is measured

      Technically I've worked in a number of places that have done 7 nines or more over the course of a handful of years. Literally zero downtime. Luck is like that sometimes. But I wouldn't sell those systems to others on that basis.

      It doesn't really matter unless you're guaranteeing it, and if you're guaranteeing it then it gets expensive should anything ever go wrong, even for only a few seconds.

      In the same way that we can all point out systems with thousands of days of uptime, pointing out customers who haven't experienced a failure isn't that difficult either.

      What matters is not what's happened historically, but what's going to happen tomorrow, and what you'll do when you don't hit those six 9s. My guess is that you'll shrug your shoulders, go "Oh well, never mind" and your customers will be no better off than with any other similar provider.

  3. Anonymous Coward
    Anonymous Coward

    Does Nimble exclude maintenance windows for hardware and software upgrades or is that accounted for in the six-nines they claim?

    1. fredesmite
      FAIL

      duh - what do you think scheduled downtime is?

      1. Anonymous Coward
        Anonymous Coward

        >duh - what do you think scheduled downtime is?

        What a moronic comment. The other AC (not me) is just asking if maintenance requires downtime. Calling it "scheduled downtime" is fucking useless if you can't have scheduled downtime. Modern storage systems should require no downtime whatsoever, with no requirement to reduce the load on the system during code upgrades.

  4. dikrek
    Boffin

    There are no maintenance windows for firmware and hardware upgrades; these happen non-disruptively (a few seconds for a controller failover), so they are not counted.

    In addition, if we detect an actual outage, we investigate. If it turns out the outage was because of something like a customer shutting down the gear in order to move it or other similar major site-wide maintenance activity, we will not count that as downtime. Otherwise it’s all counted.
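The counting rule described here (investigate every detected outage, drop those caused by customer-initiated site work such as moving the gear, count everything else) can be sketched as a simple filter. This is a minimal Python illustration, not Nimble's actual tooling; the `Outage` record and the cause labels are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cause labels; the comment does not list Nimble's real categories.
EXCLUDED_CAUSES = {"customer_shutdown", "site_move", "site_maintenance"}


@dataclass
class Outage:
    array_id: str
    seconds: float
    cause: str  # assigned after investigating the outage


def counted_downtime(outages: list[Outage]) -> float:
    """Sum outage time, skipping events traced to customer-initiated site work."""
    return sum((o.seconds for o in outages if o.cause not in EXCLUDED_CAUSES), 0.0)


if __name__ == "__main__":
    events = [
        Outage("array-a", 40.0, "controller_fault"),
        Outage("array-b", 3600.0, "site_move"),  # powered off to relocate it
        Outage("array-c", 4.0, "software_bug"),
    ]
    print(counted_downtime(events))  # only the fault and the bug are counted
```

The whole debate in this thread is essentially about what belongs in `EXCLUDED_CAUSES`.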

    1. John Robson Silver badge

      A few seconds for failover - when you're only allowing 25 seconds per year, each firmware upgrade takes a serious percentage.
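To put the failover point in numbers: the Python sketch below assumes a six-nines yearly budget (about 31.6 seconds over a 365.25-day year) and a few guessed failover pauses, since "a few seconds" isn't specified. It shows what share of the budget one failover would consume, and how many would fit, if failovers were counted (per Nimble's reply, they are not).

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def yearly_budget_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines."""
    return SECONDS_PER_YEAR * 10.0 ** (-nines)


def failovers_within_budget(nines: int, seconds_per_failover: float) -> int:
    """Whole failovers that fit in the yearly budget, if every second counted."""
    return math.floor(yearly_budget_seconds(nines) / seconds_per_failover)


if __name__ == "__main__":
    # Assumed failover pauses; the real figure depends on the array and workload.
    for pause in (2.0, 5.0, 10.0):
        share = pause / yearly_budget_seconds(6)
        print(f"{pause:4.1f} s failover: {share:5.1%} of a six-nines year, "
              f"{failovers_within_budget(6, pause)} fit in the budget")
```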

    2. Lusty

      So which is it then: 6 nines including maintenance, or 5 nines like every other array? Excluding maintenance is cheating, like it or not. That's the entire point of availability stats: if I run it and maintain it properly, how much downtime do I expect? Excluding maintenance, you're pretty much down to the luck of the draw, and a couple of drive failures will destroy those stats, so be careful how loudly you shout!

      1. Anonymous Coward
        Anonymous Coward

        Availability figures are as useful as IOPS figures: without the details, they are pretty meaningless. I don't care if you can do a million IOPS with 4K read cache hits, and I don't care if you can maintain X 9s availability on a system that never changes, because it's during changes that it's most likely to fail.

        I also don't care if _your_ system stays online during a code upgrade. I care about _my_ systems. If I start a code upgrade, am I going to suddenly get increased latency (the performance stat that actually matters) because you've had to disable your write cache? If that increased latency leads to timeouts in my applications are you going to assume responsibility? Or are you going to say "our system stayed up and is working as designed therefore we provided 100% uptime"?

        What if a disk or flash device fails? Will the extra IOPS caused by the rebuild slow down my applications?

        I should point out that this isn't directed at Nimble, but is a generalisation based on what actually happens all the time across many vendors and products. Some products can do code upgrades and survive disk failures without performance impact. For those which can't, this can generally be architected in, usually at the expense of over-engineering the solution, which your friendly vendor salesman will never do in order to keep the deal "sweet".

        Take uptime claims with a pinch of salt, and if you're talking to vendors, ask them for the details. And well done to the Nimble guy for coming on to back up his argument (even if I don't necessarily agree).

        1. Anonymous Coward
          Anonymous Coward

          Uptime measures

          I agree that details of how uptime is measured matter. One dimension of this is what is considered part of "downtime". Most vendors will measure the time that the storage array is "down". But what really matters is how long applications are "down", and that is usually much, much longer. Once the storage array comes back up, there is quite a bit of work to do to bring each and every application back up and to validate that they are working.

          I know at least one vendor (I work for them) that actually includes the time from the first application going down to the last application coming back up in their downtime measures. Of course, that can't be measured automatically.
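The application-level measure described here (from the first application going down to the last one coming back up) reduces to a min/max over per-application timestamps. A minimal Python sketch, assuming those timestamps are collected manually from operations records; the function name and the incident figures are hypothetical.

```python
from datetime import datetime


def application_downtime(app_events: dict[str, tuple[datetime, datetime]]) -> float:
    """Seconds from the first application going down to the last one coming back.

    app_events maps an application name to its (went_down, came_back) timestamps.
    """
    if not app_events:
        return 0.0
    first_down = min(down for down, _ in app_events.values())
    last_up = max(up for _, up in app_events.values())
    return (last_up - first_down).total_seconds()


def at(hms: str) -> datetime:
    """Helper for the example: a time of day on an arbitrary date."""
    return datetime.fromisoformat(f"2000-01-01T{hms}")


if __name__ == "__main__":
    # Hypothetical incident: the array itself was unavailable for 30 s,
    # but restarting and validating applications took far longer.
    apps = {
        "database": (at("10:00:05"), at("10:12:00")),
        "web-frontend": (at("10:00:10"), at("10:20:00")),
    }
    print(application_downtime(apps))  # 1195.0 s versus 30 s of array downtime
```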

          Another aspect - who gets to decide if the application is "down"? If latency shoots up by 5x and the application is no longer able to keep up with the workload - but is still puttering along doing the best it can - is that downtime? I know of at least one vendor that takes the customer's definition of downtime into account. If the performance problem is considered by the customer to make the application "down", then it is counted in the downtime tracking. Again, this is difficult to measure automatically.
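A "performance DU" rule like the one described could be approximated by counting sampling windows in which latency exceeds some multiple of a baseline. This minimal Python sketch takes the 5x multiple from the comment; the 60-second window and the sample values are assumptions, and in practice the threshold would be whatever the customer considers "down".

```python
def performance_du_seconds(latencies_ms: list[float], baseline_ms: float,
                           multiple: float = 5.0, interval_s: float = 60.0) -> float:
    """Seconds spent in windows whose latency exceeded multiple x baseline.

    Each entry in latencies_ms is the average latency of one interval_s-long window.
    """
    threshold_ms = baseline_ms * multiple
    return sum((interval_s for lat in latencies_ms if lat > threshold_ms), 0.0)


if __name__ == "__main__":
    samples = [1.0, 1.2, 6.5, 8.0, 1.1]  # hypothetical per-minute averages, in ms
    print(performance_du_seconds(samples, baseline_ms=1.0))  # two slow minutes
```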

          My conclusion is that Nimble is probably counting only storage downtime (not application downtime) and they probably don't count what we call "performance DUs" - performance levels unacceptable to the customer.

          One final point: there are different classes of outages. Some are limited to a single volume or application (e.g. the array returned an error response that wasn't properly handled by some version of an arcane operating system that is on the support matrix, causing one application to go down). Others are outages of the entire array, such as an upgrade gone bad. Are single-volume outages counted in the total, or only instances where the entire array is "down"?

          All this to say that the details absolutely do matter. They likely account for 2 of the 6 9s. And if the vendor skimps on those details, well, it is actually pretty easy to achieve only 4 real 9s of availability. Getting to 6 "real" 9s is rather difficult. I've watched the product I've been working on go from 4 real 9s to 7 real 9s over the course of 8 years, and it took very deliberate effort, and a lot of it.
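Since each nine is a factor of ten, "2 of the 6 9s" means the measurement details can hide a 100-fold difference in downtime. A minimal Python sketch of the nines arithmetic; the 30-second storage outage and the 100x application multiplier are invented for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def nines(availability: float) -> float:
    """Count of nines in an availability fraction: 0.9999 gives about 4.0."""
    return -math.log10(1.0 - availability)


def nines_from_downtime(downtime_s: float, period_s: float = SECONDS_PER_YEAR) -> float:
    """Nines implied by downtime_s seconds of outage within period_s seconds."""
    return -math.log10(downtime_s / period_s)


if __name__ == "__main__":
    storage_only = nines_from_downtime(30.0)     # hypothetical: array down 30 s/year
    app_level = nines_from_downtime(30.0 * 100)  # same incidents, 100x longer for apps
    print(f"{storage_only:.2f} nines vs {app_level:.2f} nines")
    print(f"quoted 99.999928% is {nines(0.99999928):.2f} nines")
```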

  5. Mike 16

    Better than my former employer

    After the startup (that you have never heard of) where I worked was acquired by a mega-corp (that you have definitely heard of), all of us plankton were herded into a room to be told of our bright future at BigCorp, and the wonderful products we would now be working on. At one point, the spokesdroid gushed about how they (now we) were totally committed to "Nine Fives" reliability. We figured that might be just about doable.

  6. Anonymous Coward
    Anonymous Coward

    It's funny how Nimble can get six nines availability, but can't get their stock price up to $9.99999

  7. Kevin McMurtrie Silver badge
    Coat

    Pfft!

    I can do an uptime with seven nines. There are other digits too, but at least seven are nines.

    1. chris coreline
      Happy

      Re: Pfft!

      'At least seven nines uptime. Also, we can't guarantee decimal point positioning.'

  8. DTrump

    With downtime it's more like 2 99s

    Right from their own docs! 99.4%

    http://info.nimblestorage.com/rs/nimblestorage/images/wp_nimble_storage_system_availability_5-9s.pdf

    1. dikrek
      Boffin

      Re: With downtime it's more like 2 99s

      You're referring to an obsolete doc from 2014, when we hit 5 nines. Since then we've done a lot of work to get to 6 nines.

      As someone else stated, this stuff isn't at all easy, and it takes years of hard work to get the extra "nines".

      This is the most aboveboard company I've worked at. When I first joined last March, I saw internal stats for the 2.3.9.2 codeline and newer. The uptime was already comfortably over 6 nines for those releases.

      I asked, "Why not tell the world?", and I was told that's not how Nimble measures: there's no cherry-picking and NO counting of only specific releases (unlike a competitor that recently also announced 6 nines).

      All releases are counted, including older ones that are at 5 nines.

      We waited until the true average uptime across ALL releases was over 6 nines before making this public.
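One plausible reading of "true average uptime across ALL releases" is an exposure-weighted average: total downtime across the installed base divided by total array-time, so a release running on more arrays for longer carries more weight. The comment doesn't say how releases are weighted, so treat this Python sketch as an assumption; the release labels and figures are invented.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


@dataclass
class ReleaseStats:
    release: str        # hypothetical release label
    array_years: float  # total installed-base runtime on this release
    downtime_s: float   # total counted downtime on this release


def fleet_availability(stats: list[ReleaseStats]) -> float:
    """Exposure-weighted availability: total downtime over total array-time."""
    total_s = sum(s.array_years for s in stats) * SECONDS_PER_YEAR
    down_s = sum(s.downtime_s for s in stats)
    return 1.0 - down_s / total_s


if __name__ == "__main__":
    # Invented figures for illustration only; they are not Nimble's data.
    older = ReleaseStats("older", array_years=500, downtime_s=500 * 300.0)  # ~5 nines
    newer = ReleaseStats("newer", array_years=9500, downtime_s=9500 * 5.0)  # ~6.8 nines
    print(f"older alone: {fleet_availability([older]):.7%}")
    print(f"whole fleet: {fleet_availability([older, newer]):.7%}")
```

Weighting by array-time rather than taking a plain mean of per-release percentages stops a small, rarely deployed release from counting as much as a widely deployed one.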

      What this means is that anyone running release 2.3.9 or later will most likely have 100% uptime (most of our customers are at 100%).

      And for whoever asked how many customers we have: the number keeps growing. It was about 9,500 last I checked, but it might be over 10K now. I haven't looked, but the actual number is not the important point.

      The important point is that we don't cherry-pick the reliable ones. If we did that, we would just claim 100% and be done with it ;)

      But, ultimately, our happy customers are testament to the reliability of the gear. Ask your local account team to get you in touch with various customers and hear the stories.

      And for those of you who work at competitors and are posting anonymously: grow a pair and state your affiliation. If nothing else, it's good manners.

      Thx

      D

  9. Anonymous Coward
    Anonymous Coward

    Nimblisms

    Nimble talking 6 x 9s is like quoting their NPS scores, which are compiled by organisations that allow the data to be manipulated.

    PS. My NPS score is 98
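For readers unfamiliar with NPS: the standard Net Promoter Score asks customers how likely they are to recommend a company on a 0 to 10 scale, and is the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). This minimal Python sketch, using invented ratings, shows why the concern about manipulated data matters: dropping detractors from the sample inflates the score sharply. It says nothing about how any particular vendor's survey is run.

```python
def net_promoter_score(ratings: list[int]) -> float:
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


if __name__ == "__main__":
    ratings = [10] * 6 + [8] * 2 + [3] * 2  # invented survey responses
    print(net_promoter_score(ratings))                        # 40.0
    print(net_promoter_score([r for r in ratings if r > 6]))  # 75.0 without detractors
```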

  10. Anonymous Coward
    Anonymous Coward

    Nimble fluff

    So Nimble is quoting over 9,000 customers, yet their employees are touting over 10,000 customers. Do they remove the customers who have downtime to bump up their figures, like they clean out the bad customers from their NPS scores?

    Having had first-hand experience with these guys, I wouldn't trust their chest-beating: when you peel it back, it's bullshit. Yes Gavin C, I'm talking about you :)

    1. dikrek

      Re: Nimble fluff

      Interesting, since we formally announced 10,000 customers (https://www.nimblestorage.com/blog/10k-customer-milestone-blog/). And the number keeps growing.
