back to article No biggie: EMC's XtremIO firmware upgrade 'will wipe data'

There's a rumor that upgrading EMC's XtremIO firmware to v3.0 will wipe all user data from the flash array. It's mentioned here. It seems going from firmware version 2.4.x to version 3.0, expected by early October, will cause data to be lost – and customers will need to back everything up before the upgrade and restore it …

  1. Matt Bryant

    First rule of upgrades....

    Regardless of vendor, regardless of claims of 'non-disruptive upgrades' - BACKUP THE DATA!

    1. Anonymous Coward

      Re: First rule of upgrades....

      Absolutely agree, but on other platforms the backup is typically only a precaution, with a recovery required only in an absolute worst-case scenario. Here, the data restore is a mandatory part of EMC's upgrade process.

      The fact they had to make such a drastic change to the underlying data structures so soon doesn't really fill you with confidence about the roadmap.

  2. Erik4872

    Scary

    I wouldn't want to be the storage admin who, even after triple checking the backups and praying to the array gods, lost data after this upgrade. If I were EMC, and had absolutely no way to make that update non-disruptive, I'd be sending teams of people and tons of free loaner units out. Not really knowing what's involved here, this kind of update must be so drastic that there would be no way to read data from the "old firmware format" disks. How could that be possible? No matter what form you store the data in on the physical disk, it still has to be in a logical format that can be interpreted. How hard would it be to write firmware that can read the old disks and do the conversion when the array reboots?

    I don't know, maybe EMC should just give away disks and have the customers migrate the data to a new array and give them back the old one when they're done.

    1. virtualgeek

      Re: Scary

      Disclosure - EMCer here.

      That's exactly what we (and our partners together) are doing. See the followup to the original post in Australia here: extreme-upgrade-pain-for-xtremio-customers.aspx

      Customers and partners first, always!

  3. Nate Amsden

    this isn't disruptive

    At least to me - 'disruptive' simply implies downtime; total data loss is in a different category from mere disruption.

    1. bitpushr

      Re: this isn't disruptive

      "Downtime" is fine. If you have a Windows box, every Tuesday morning it experiences "downtime" when Microsoft patches this, that and the other thing.

      What EMC is talking about is a wee bit more disruptive. Imagine if, on Patch Tuesday, you had to *migrate* all of your files to another computer *and* re-install Windows in order to apply the patch. Which is what seems to be the case with this XtremIO upgrade.

      Disclaimer: I am a NetApp employee

      1. Keith 21

        Re: this isn't disruptive

        I have to say in my old days, I loved the simplicity of upgrading on NetApp systems.

        Take the cluster. Failover first head. Upgrade first head. Failover 2nd to first, upgrade 2nd, bring second back in. All worked nicely with no downtime. Just as it should be!

        Even migrating to a new NetApp system was pretty seamless for the users, as NetApp helped us set up both clusters, mirror, keep mirroring until all was perfectly in sync, about 5 minutes (if that) downtime to physically change the heads, all done.
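        For what it's worth, the no-full-outage property of that takeover/upgrade/giveback dance can be sketched as a toy simulation (illustrative Python, not ONTAP syntax; all the names are made up):

```python
# Toy model of a two-head rolling upgrade via takeover/giveback.
# At every step at least one head is serving, so clients never see a
# full outage - unlike an upgrade that requires evacuating the data.

def rolling_upgrade(heads, new_version):
    """heads: dict of name -> {'version': str, 'serving': bool}."""
    outage = False
    names = list(heads)
    for i, name in enumerate(names):
        partner = names[(i + 1) % len(names)]
        heads[partner]['serving'] = True      # partner takes over this head's load
        heads[name]['serving'] = False        # this head leaves service...
        heads[name]['version'] = new_version  # ...is upgraded...
        heads[name]['serving'] = True         # ...and comes back (giveback)
        if not any(h['serving'] for h in heads.values()):
            outage = True                     # would mean total downtime
    return outage

heads = {'head1': {'version': '8.1', 'serving': True},
         'head2': {'version': '8.1', 'serving': True}}
had_outage = rolling_upgrade(heads, '8.2')
print(had_outage)   # False: some head was serving at every step
print({n: h['version'] for n, h in heads.items()})
```

        The point of the sketch: at every step one head is still serving, which is exactly what a data-destructive upgrade cannot offer.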

        Not to say there were not other issues with NetApp systems, especially with a particular restore which would have taken 2 weeks due to an undocumented "feature", mind you, but that's a whole different story!

    2. Anonymous Coward

      Re: this isn't disruptive

      Exactly. You might accept minimally disruptive upgrades from a startup through its first few iterations, but at least you'd be aware of the likelihood of this happening upfront.

      In this case, not only is it disruptive to your production workloads, it's also - more importantly - data destructive, not something that should be acceptable from a tier-1 vendor pushing this into mission-critical environments.

      There's some comfort if you're running VMware: you can Storage vMotion it elsewhere whilst the upgrade deletes your data, but where that elsewhere is, at the moment, is anyone's guess :-)

  4. M. B.

    So...

    ...someone blogged something about a rumored problem with an upgrade that's still in beta? I'd be a whole hell of a lot more concerned if it were production code, but according to EMC this is not the case. Sounds like a bunch of FUD to me (at least until XtremIOs start wrecking client data in the field post-upgrade).

    For the record, I've seen just about everything shit the bed at least once during a supposed "non-disruptive" upgrade. Good backups are everything. Snapshots don't count.

    1. Anonymous Coward

      Re: So...

      So, you should probably follow the link and take a look at EMC's response via Chad Sakac. Despite all the misdirection and damage limitation attempted in the article, Chad effectively admits that the process is, and will be, data destructive for customers. This is production code and will impact production systems in the field, and as above, it's not just disruptive, it's data destructive. The alternative option appears to be to stay put on a dead code base and slowly wither away.

  5. markkulacz

    Reminds me of that shelf expansion issue they just fixed with XtremIO, where you have to remove all of the data after backing it up, flatten the cluster, reinstall the OS ... that is just one other example.

    The problem is that XtremIO is still a new product, with no code written until 2009 and no real customers until 2012. The frequency of disruptions experienced on XtremIO should be no surprise.

    With an architecture that introduces significantly more coupling between the controllers due to the in-memory design, upgrade complexity is going to be an enduring challenge with XtremIO. Symmetrix/VMAX avoids that by using a simpler slice-and-dice layout; cDOT reduces it by using a more sustainable virtualized, aggregated scale-out model. An enterprise product doesn't get coded up over the course of a couple of years, and EMC is just insulting an entire industry (even many of its own employees) by suggesting otherwise.

    If you need enterprise grade and extreme performance - get a cDOT FAS solution with all flash. It is the only proven, six-nines, flash-optimized (naturally, being an LSF variant), scalable storage product in the world. With more features than anyone can grasp.

    I am an employee of NetApp. My comments do not reflect the opinion of my employer.

    1. Anonymous Coward

      I agreed with everything you said, especially about insulting the rest of the industry. That was until I reached the bottom and you mentioned NetApp cDOT, flash-optimized and six nines in the same paragraph :-)

      EMC will be lucky to hit three nines with all of the issues they've had on XtremIO recently. It's a good job they have that always-on $1 million guarantee, and of course the flash customer rescue program will be invaluable... oh wait!

    2. Anonymous Coward

      "With an architecture that introduces significantly more coupling between the controllers due to the in-memory design, upgrade complexity is going to be an enduring challenge with XtremIO."

      If you take the time to read the last couple of paragraphs, Chad's article effectively validates this statement with the link between DRAM and addressable capacity. Also, the fact that they're looking to shrink the footprint and simplify the components reiterates how rushed to market this really was.

      So long and thanks for all the beta testing folks.

  6. Dave Nicholson

    It is hard being the adult company in the room

    The criticisms of XtremIO sound like Apple zealots laughing at the IBM PC. In 1981. We all know how this story plays out. XtremIO introduces all of the features that matter...integrated into its advanced architecture...and it ends up dominating the all-flash array market. In the meantime, over-funded start-ups struggle to meet unrealistic growth expectations. Most simply disappear. Some are acquired. Congratulations to the people who made personal fortunes playing with other people's money. Remember when you had to reboot Linux to rescan a SCSI bus? Speed bump. This is a speed bump.

    1. Anonymous Coward

      Re: It is hard being the adult company in the room

      " XtremIO introduces all of the features that matter...integrated into its advanced architecture."

      Really? And those advanced features are?

      XtremIO has only just introduced snapshots, and is now following up with compression that forces a format of the entire system and the subsequent destruction of the data. Does it handle online upgrades, quality of service, or even basic replication? Nope.

      As for the advanced architecture... white-box servers, InfiniBand switches and a UPS for DRAM hold-up hardly qualify.

  7. virtualgeek

    Disclosure - EMCer here (Chad). Chris, thanks for the article (though seriously, that headline...), and FWIW - it's nice that you linked to my post. I'm a big believer in transparency and disclosure.

    Commenters (and readers)... man - I wouldn't put much personal trust in people without enough confidence to share their identity; and if they share their name but don't have the confidence to disclose their affiliations, I think the same applies.

    I have no such reservation. I have the data on the XtremIO customer base. XtremIO customers are happy. Our customers and partners are giving us great feedback (including healthy critiques). Availability stats across the population (far more than 1,000 X-Bricks out there, and growing unbelievably fast) are amazingly high. We are far from perfect and always looking to improve - so feedback welcome. I guess for some there's no way to compete without being a hater.

    Yes, this is a disruptive upgrade (certainly not a Data Loss scenario as one commenter notes), but I've detailed the "why", and the "how" on my blog post that is linked to in the article. If you want to see disclosure, and transparency, there you have it.

    It's notable how much commentary is one vendor going at the other here in the comments, and how little there is of the customer voice. Seems like in the industry we all like to navel gaze - but I suppose that's the way. At least we're passionate :-)

    To the NetApp commenters in here - I'm frankly a little shocked. Is the 7-mode to C-mode migration a "data vacate and migrate event"? It is. You are in the middle of a huge migration of your user base - with a fraction of the FAS user base on the clear future target, and everyone else navigating a disruptive upgrade which is exactly analogous (and triggered by the same cause that I note in my blog post). Further, when you point to ONTAP for this story, which is about AFAs... I have to presume (assume - making an ass out of me) that customers looking at AFA solutions from NetApp are directed to the E-Series (with the Engenio storage stack) until FlashRay GAs - neither of which is an NDU from ONTAP. I have a great deal of respect for NetApp as a company and for their IP - this isn't about "NetApp sux" (they don't) - rather, I'm scratching my head at how you could make those comments without your heads exploding from cognitive dissonance, but hey - that's just my opinion :-)

    And, to one of the many "anonymous cowards" - in particular the one that commented on my blog post being "misdirection"... that's not misdirection, I'm **REALLY THAT VERBOSE AND LONG WINDED** - that's just me :-)

    Best comment IMHO was from MB, likely a customer - good backups are ALWAYS a good idea, and should be done before any upgrade, NDU or not.

    1. M. B.

      I'm actually not a customer anymore (I resell HP, EMC, and Nimble primarily, and there's plenty of bug notices to go around), but I was a customer, so I felt the pain firsthand when a NetApp originally configured as a gateway decided to ignore all of its natively connected shelves after an ONTAP upgrade. A DS4300 Turbo that would reboot randomly due to a bug in a watchdog timer. A remote InForm OS update which hung an F200 system, requiring a long drive and a manual power down. An EqualLogic that wouldn't boot at all after a firmware upgrade, and then ran at half performance for two weeks before it got fixed. A nice, big V7000 system that could barely manage 20MB/s for iSCSI after an upgrade (I can assure you, that was plenty "disruptive" for my users).

      Now when I look at the list above, that's a midrange NetApp, an Engenio-based IBM, a 3PAR, an EqualLogic, and a Storwize V7000. None of those are bad systems. I literally bet business on them being "not shit", but still there were bugs and odd behaviors. That is simply expected. It's why I always backed everything up to my Data Domain, then ran a clone job to send it all to my Spectra library before "non-disruptive" upgrades.

      Now I DID miss the part about this being a disruptive upgrade (my bad), and that is definitely unfortunate. There are degrees of disruptiveness, from simply offlining both controllers at the same time to requiring a complete evacuation. I would be pretty pissed about that myself, especially if I had nowhere else to put my data and absolutely had to upgrade, risking data integrity otherwise. I'd even be pissed if I just had to shut everything down to reboot a controller pair.

      1. Anonymous Coward

        Those problems you highlight happen across all vendors; after all, you can't test every possible permutation of every implementation. But I think the key is that these issues are the exception rather than the rule. In the case of XtremIO, however, downtime and data-destructive upgrades at this level of firmware (and others before it) appear to be the rule, and until now EMC has managed to keep a lid on it.

        It's not just about the inconvenience either as that can be managed, it's all about the increased risk to my data during the evacuation and recovery. Remember that old tag line EMC where data lives :-) or not..

    2. Anonymous Coward

      Unusually honest! Honorable.

      Although coming from a strong competitor, I have to admit your response was admirable. Agreed that sometimes every effort to avoid breaking the commandment of no table-structure changes in storage controllers fails. Most likely there is no one without sin here. In general, if one must make the change, save it for the next forklift generation. Anyway, in a year this story will all be dust in the wind!

      But, as a competitor, it must be said: long term, on a fair playing field, you will lose and we will win :)

      Again, nice response.

      End of discussion... I hope. AC :)

    3. Amazi

      This is not the first time XtremIO has required a data wipe with a FW upgrade. There is a June article on a Russian site stating that a FW upgrade required moving the data out and back in (at least the EMC/partner engineer insisted on this). Much earlier than the upgrade to 3.0.

      And, of course, if the dear customer wants to upgrade/scale - data wipe again...

    4. Anonymous Coward


      Yes we've noticed

      However, throughout the article you go off at tangents, attempt to drag all the other vendors into it, even throwing your own legacy kit under the bus, and most worryingly you hint this won't be the last time either.

      That's why it was misdirection and damage control; after all, you're in technical marketing.

    5. Anonymous Coward

      "Yes, this is a disruptive upgrade (certainly not a Data Loss scenario as one commenter notes),"

      Really? No data loss, not destructive? Then why is there a need to evacuate ALL of the data to another array?

    6. Anonymous Coward

      XtremIO Promises & Sharing Identity

      Part 1: XtremIO Promises

      Let's all be honest, EMC bought XtremIO in a pinch to compete in the all-flash array market. They probably threw several hundred coders at it because, upon purchase, it was lacking in major areas, being in its infancy. EMC can afford to recoup. But at the end of the day, when you code at the speed of light, QA is often compromised, and you end up making major compromises. Sales management has one thing in mind: "Let's get the XIO box out the doors, and we will make changes accordingly in the field and provide additional free PS resources to recoup the losses at a later point."

      My biggest question here is: this is a first-generation product. I can't wait for the HW to change to support the next code releases and present the infamous tech refresh. This is still an evolving platform. It's not ready for availability. No customer should use this if they aren't assured five nines of availability. VDI is a different story, but database uptimes are critical in nature, and this nonsense of using backups for an array upgrade is not a compromise, it's a sacrifice. I think if guarantees are posted on EMC's website, customers should light the fire and say, "Wait, this is not part of your pitch!". Performance isn't everything. Reliability is KING!

      Part II: Sharing Identity

      Content on the internet is best anonymous. Why would you want to put your name out there in the first place? Everyone will read this post and form their own opinion of it. It doesn't matter whether I have a high-school degree or am an industry storage veteran. You will make your own opinion from comments and concerns from anyone. #whocares

  8. Anonymous Coward

    Glass House

    There are a few things you can reliably count on; one of them is NetApp folks being simultaneously delusional, arrogant, and completely oblivious to their own epic failings.

    Dude, the storage world's biggest disruptive upgrade is cDOT.

    1. Anonymous Coward

      Re: Glass House

      XtremeIO Enterprise High Availability- No Compromises

      "Uptime & Non-Disruptive Upgrades: XtremIO eliminates the need for planned downtime by providing non-disruptive software and firmware upgrades to ensure 7×24 continuous operations."

      That was last week :-)

      1. Anonymous Coward

        Re: Glass House

        "Uptime & Non-Disruptive Upgrades" and they're still claiming the same

    2. bitpushr

      Re: Glass House

      As I mentioned on Chad's blog, this is comparing apples to oranges. As I mentioned on Chad's blog, I work for NetApp.

      Upgrading NetApp 7-mode from 7.2 to 7.3 to 8.0 to 8.1 to 8.2 has not required you move the data off, blow everything away, and move it back.

      Upgrading NetApp cluster-mode from 8.0 to 8.1 to 8.2 (to 8.3) has not required you move the data off, blow everything away, and move it back.

      Going from 7-mode to cluster-mode is not an upgrade -- it is a transition. It *is* disruptive, as you correctly point out. But it is, for all intents and purposes, a different OS compared to 7-mode. Comparing NetApp's (disruptive) _transition_ to XtremIO's (disruptive) _upgrade_ is disingenuous.

      1. Man Mountain

        Re: Glass House

        But you're not developing 7-mode any more, so it's a transition that customers HAVE to make if they want to stay with NetApp. Yeah, it's not quite the same as the XtremIO upgrade, but it's not that different either.

        1. bitpushr

          Re: Glass House

          I agree a little and disagree a little more. By your rationale, VNX to VNX2 is a forced transition, because they're not developing VNX any more.

          The differentiator in this case is that cDOT is so comprehensively different from 7-mode. Much different architecture, much different capabilities, different feature set, different shortcomings, different points of focus, etc.

          The closest analogy I can come up with is *if* you had to wipe your system to go from DOT 7.3 7-mode to 8.0 7-mode. Once done, you got better features, better efficiency, etc. It was a change, but not a fundamental one. (In reality, you didn't wipe your system to do this; just an HA takeover/giveback.)

    3. Bcraig

      Re: Glass House

      Well, that's arrogant in its own right. I think it's fairly obvious that cDOT is not an upgrade but a platform change. I have not "upgraded" a single unit; it's always implemented during a tech refresh, and it's the last disruptive code update the customer will do.

  9. david666

    I was shocked to learn this about XtremIO, and so will you

    Think of it as a new platform and suddenly this becomes a non-issue.

    I bet that nobody would complain if XtremIO delivered this version in 3 years, requiring new hardware: 2x performance, compression, a major architecture overhaul - what's not to like?

    Paradoxically, letting customers preserve their hardware investment while accelerating delivery of major improvements is upsetting to some of the commentators here. But I'm sure no-one here has any ulterior motives, right?

    PS apologies for the title - but if buzzfeed, sharable and half a dozen other sharing websites can use misleading bombastic titles to attract readers, why can't I? :-)

    1. Anonymous Coward

      Re: I was shocked to learn this about XtremIO, and so will you

      Well I wasn't shocked in the slightest, I've been hearing about these issues for a while now, but the EMC marketing juggernaut has so far managed to keep a tight lid on the issue.

      "I bet that nobody would complain if XtremIO delivered this version in 3 years, requiring new hardware: x2 performance, compression, major architecture overhaul, what not to like?"

      What, like EMC did with the forklift upgrade from VNX to VNX2? Hey, it's all about the software optimizations, but you guys can't have them :-) The major architecture overhaul is pretty much what Chad said in his post, so I'd expect XtremIO customers will be in for a lot more disruption down the line.

      "But I'm sure no-one here has any ulterior motives, right?"

      Such as yourself, David, with your one and only post to the Reg forums.

  10. Anonymous Coward

    Chad misses the point: EMC lied to its customers

    Chad: You are only being partially transparent and ultimately disingenuous. The other side of this is that EMC blatantly lied about non-disruptive upgrades.

    It's also not for you to fall on the company's sword. I'm certain you were not responsible for this and any level of transparency/disclosure/apology from you, while generous, is meaningless. EMC as a company and trusted business partner has to take ownership of this.

  11. Anonymous Coward

    Customer success stories - Due diligence

    Free tip of the day if you are looking at flash storage for your environment:

    Before you buy XtremIO, run a real-world application test for your environment and compare the results. I am not showing a preference for any one vendor; that is for you to determine.

    1) Determine in advance what your application performance requirements are, do not let the vendor dictate this for you with case studies. *Every environment is different.*

    2) Pick three vendors to do a bake-off. Set up a sample (real-world) application environment behind a DMZ and let each vendor configure and optimize their wares. DO NOT LET THE VENDORS DETERMINE THE PARAMETERS OF THE TEST ENVIRONMENT UNDER ANY CIRCUMSTANCES. EMC pre-sales engineers have canned configurations which do not represent real-world use cases and are specifically tuned to mask footprint/performance problems.

    3) Independently (and privately) test each configuration.

    I have seen this done dozens of times now and have never once seen XtremIO win on performance vs footprint vs cost. XtremIO has ALWAYS required more footprint, and ultimately more cost, to deliver the same performance as ANY other competitor. From my experience, the only IT shops buying EMC are existing customers who will never move off of EMC regardless of the testing.

    But don't take the word of an anonymous poster - test it yourself.
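    As a starting point for step 3, even a crude harness that records per-operation latency percentiles (not averages) is better than trusting vendor numbers. A minimal sketch in Python, using a local file as a stand-in for a LUN on the array under test (the block size, op count and working-set size here are arbitrary assumptions):

```python
import os
import random
import tempfile
import time

def write_latency_profile(path, ops=500, block=4096):
    """Issue random 4 KiB writes to `path` and return latency percentiles in ms."""
    data = os.urandom(block)
    size = 64 * 1024 * 1024          # 64 MiB working set (assumption)
    lat = []
    with open(path, 'r+b') as f:
        for _ in range(ops):
            off = random.randrange(0, size - block)
            t0 = time.perf_counter()
            f.seek(off)
            f.write(data)
            f.flush()                # push to the OS; a real harness would fsync
            lat.append((time.perf_counter() - t0) * 1000.0)
    lat.sort()
    return {'p50': lat[len(lat) // 2],
            'p99': lat[int(len(lat) * 0.99)],
            'max': lat[-1]}

# Create a sparse 64 MiB scratch file standing in for the device under test.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(64 * 1024 * 1024)
    path = tmp.name
profile = write_latency_profile(path)
print(profile)
os.unlink(path)
```

    The tail (p99, max) is what your users feel; an average hides exactly the behavior a vendor demo is tuned to mask.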

    1. virtualgeek

      Re: Customer success stories - Due diligence

      Disclosure EMCer here.

      I guess I'm tilting at windmills on the "anonymous posting" topic, and hey - it's a free world. I think the strength of an argument is proportional to someone's willingness to personally stand with it (nothing to do with who you are, or degrees, as someone suggested). I just think an argument doesn't make as much sense without context (does the person making it have an agenda?). That's why, personally, I think disclosure is right.

      Re this comment on benchmarking, I personally completely agree. In fact, Vaughn Stewart (Pure) and I did a joint session on this set of topics (trying to be vendor neutral) at VMworld (and will repeat it in Barcelona in Oct), and in essence outlined the same points:

      1) Don't trust any vendor claims. Benchmark.

      2) Don't let any vendor steer you in benchmarking. Even if their bias is non-malicious, they will have a bias.

      3) We warned the audience - good benchmarking is NOT EASY. Sadly, most people take a single host/VM, load up IOmeter with a small number of workers and just run for a few hours. While that's data - that ain't a benchmark.

      Some of the steps needed to benchmark properly:

      a) run for a long time (all storage targets have some non-linearity in their behaviors). As in days, not hours.

      b) a broad set of workloads, at all sorts of IO profiles - aiming for the IO blender. Ideally you don't use a workload generator, but can actually use your data and workloads in some semi-real capacity.

      c) you need to drive the array to a moderate/large utilization factor - not a tiny bit of the capacity you are targeting, and all AFAs should be loaded up, and then tested. Garbage collection in flash (done at the system level or the drive level) is a real consideration.

      d) you need to do the benchmark while pressing on the data services you'll use in practice.

      e) ... and frankly, doing it at a scale that actually discovers the "knee" in a system is pretty hard in the flash era (whether it's AFAs or software stacks on all-SSD configs). It's hard to drive a workload generator reliably past around 20K IOPS. That means a fair number of workload generators and a reliable network.
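      A hedged sketch of how point e) might be automated once you have throughput/latency pairs from a sweep of offered load - the 10x-slope threshold is an arbitrary assumption, and the curve below is synthetic, not a measurement:

```python
def find_knee(points, factor=10.0):
    """points: list of (iops, latency_ms) sorted by offered load, with
    strictly increasing iops and a non-flat first segment.
    Returns the last iops level before latency starts climbing `factor`
    times faster (per extra IOP) than it did on the first segment."""
    base = (points[1][1] - points[0][1]) / (points[1][0] - points[0][0])
    for i in range(1, len(points)):
        slope = (points[i][1] - points[i-1][1]) / (points[i][0] - points[i-1][0])
        if slope > factor * base:
            return points[i-1][0]      # last "safe" load before the knee
    return points[-1][0]               # no knee found in the sweep

# Synthetic curve: latency is near-flat until the array saturates ~80K IOPS.
curve = [(20000, 0.5), (40000, 0.6), (60000, 0.8),
         (80000, 1.2), (90000, 4.0), (95000, 12.0)]
print(find_knee(curve))
```

      On the synthetic curve this flags 80K IOPS as the last load level before latency takes off; on real data you would tune the threshold to your own latency SLA.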

      Now - I feel confident (not arrogant) saying this, and have been through enough customer cases of all shape and size to willingly invite that opportunity.

      ... but, I'll note that very, very few customers have the capacity or the time to benchmark right. Some partners do. The feedback Vaughn and I gave to the audience was "if you can't do the above right, you're better off talking to a trusted partner, or talk to other customers like you at things like a VMUG".

      Now changing gears - one thing in this comment put a huge smile on my face :-)

      I can tell you for a *FACT* that EMC SEs are NOT all running around with a "fake workload generator" trying to deviously get customers to test to our spec... LOL! While there are ~3500 EMC SEs, only a tiny fraction (55) are set up to support a customer that wants to do a PoC. Most PoCs are supported by our partners. I can tell you that we are not organized enough, or well led enough, to have all 3500 able to set the proverbial table and execute PoCs with a fake workload generator. And frankly, those 3500 have a hard (but fun!) job. They need to cover the whole EMC portfolio (and be knowledgeable on the EMC Federation of VMware and Pivotal at the same time) as well as know the customer... Phew!

      Wow - if that's what our success in the marketplace is being ascribed to - well, go right ahead and think that :-)

      ...if 3500 people sounds like a big number - when you overlay the world with it, it's not. Heck the 55 people able to support a PoC barely covers 1 per state, and there are 298 countries in the world we cover! Thank goodness for our partners!

      I'll say it again - the EMC SEs are AWESOME humans, but they are not organized enough, or well led enough to be that devious - ** and that's coming from the person singularly responsible to organize them and lead them ** :-)

      What **IS** true is that we found people really struggling to benchmark. We wanted to OPENLY, TRANSPARENTLY share how we do it, and welcome feedback (crowdsourcing it) if there was input, or a better way. This tool (which aligns pretty well with the IDC AFA testing framework) is here:

      If people can point to a better benchmark, I'm all ears!

      1. Anonymous Coward

        Lol mr. ears

        How about you publish SPC benchmarks for all your arrays like HP, IBM, NetApp, HDS, etc.? As I recall, customers can't even talk about benchmarks and outages without involving lawyers. Or, I have a great idea for a benchmark: run them, and have them do a code upgrade to show how those impact a workload - you know, the NDU as seen in the YouTube high-availability video. A million-IOPS benchmark is great, but what is the impact on the environment of failures or code upgrades?

      2. Anonymous Coward

        Re: Customer success stories - Due diligence

        If you are so eager to OPENLY and TRANSPARENTLY do benchmarks, then what is wrong with the industry-standard SAN benchmarks, backed by just about every other vendor in the marketplace, that EMC has never participated in?

        I am thinking of the SPC-1 and SPC-2 benchmarks. While we can agree that a benchmark will never perfectly resemble either the configurations or the actual workloads of an actual customer environment, at least they will hint at the realism behind vendor claims of performance and allow customers to somewhat compare different platforms. That is why everybody else who believes they have a well-performing product participates. Makes you wonder...

        What's the point in OPENLY and TRANSPARENTLY discussing benchmarks if you have decided never to OPENLY and TRANSPARENTLY publish the most relevant "industry standard" benchmarks? I am sure, for competitive analysis, you will have performed such benchmarks internally.

        1. kalenx2

          Re: Customer success stories - Due diligence

          @Anonymous Coward - SPC tests are pretty much a joke. Shoot - even Huawei claimed SPC superiority at one point. If that is possible, what is the real validity of such a test? Sort of a waste of time, in my opinion. Of course I am biased, since I work at Load DynamiX, where we think of ourselves as the only real storage-testing solution.

          By the way, I hope you don't mind, but I borrowed a bit of your content from these comments for a blog post of my own. While you and Chad don't seem to see eye-to-eye on a few things, you seem to agree on needing to run real world application tests. That is actually our forte. I don't want to spam the comments with the URL so just let me know if you are interested, I can send you a link.

          1. Anonymous Coward

            Re: Customer success stories - Due diligence

            While I sincerely hope that no one invests in storage solutions based on SPC results alone, at least SPC tests offer some kind of baseline that will allow you to compare systems.

            SPC benchmarks have one good thing: transparency and the ability to compare. Not just IOPS, but more specifically response times - and at what workloads response times increase. While better, more specific benchmarks are around, at least SPC will give you a rough indication of a platform being low-, mid- or high-end by your own definitions, rather than by vendor claims.

            A real-life example would be that people would be protected against buying VPLEX based on assumptions and vendor claims that IOPS and response times on writes would be anywhere near other enterprise equipment. The stories I have heard on real-life performance are horrifying, and I have no doubt an SPC test would shed some light on the bottlenecks or bad designs.

            1. Matt Bryant Silver badge

              Re: AC Re: Customer success stories - Due diligence

              ".....SPC benchmarks have one good thing. Transparency and the ability to compare....." At best, SPC should be used as an indicator of performance capability. Far too often it is a gamed benchmark. Unless your business is running the SPC benchmark (and I've yet to find a company that does so) then it is largely irrelevant. The best and only real benchmark is to test in your environment with real data and processes. I would suggest you make the vendors participate in a proof-of-concept where you set the rules and goals (do not let any vendor dictate them to you), and make the vendors pay to prove their product to you.

      3. kalenx2

        Re: Customer success stories - Due diligence

        @Chad - since you asked, there is a better "benchmark" out there. It isn't actually a benchmark per se, but a full-on testing solution built specifically for storage testing (block, file, and object - not just a one-off point tool). It does everything you detailed here, including scale (way more than 20k IOPS, more like millions), realism, broad workloads, and even content control to exercise dedupe & compression. I say it isn't a benchmark because benchmarks only provide general data. This solution goes a step beyond and emulates a customer's production workloads, so it helps answer the question storage vendors love to hear: "Yeah, but how is it going to perform in MY environment?"

        Full disclosure - I actually work for the company providing this solution. We are an EMC partner (as well as a partner of many other vendors). We actually help vendors and end customers do extreme-scale testing - ongoing and for POCs. It is funny - you pretty much nailed our messaging on why testing is so important. So I hope you don't mind, but I stole a bit of your material from this post to put on my blog as well (plagiarism is the highest form of flattery, no?). Here is a link for anyone interested:

        Basically, the gist of it is: any and all firmware should be tested prior to rolling out to production. As someone who has been in the storage industry for a few years, I have seen firsthand bad code hit the streets, or datasheets put out by marketing that don't exactly match reality (putting it nicely). No vendor - I repeat, NO VENDOR - is free from fault. The good news is that there are many smart customers out there who are building a strong testing practice to verify which solution is best for them. The bad news: there are even more that do no testing. We are hoping to change this by making scale testing easier, transparent, and unbiased.

    2. bitpushr

      Re: Customer success stories - Due diligence

      If working for a storage vendor has taught me one thing, it's that benchmarking is hard. Someone here (I think Chad?) said "there's more to benchmarking than making a LUN and pointing iometer at it", and that is 100% accurate.

      I have lost count of the number of times a customer has approached a benchmark of their new NetApp kit by saying "I mounted a filesystem and then ran 'dd' against it."

      If your business makes money by using dd, this is probably a fine way to do it. But I haven't seen one of those businesses yet...
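      For the curious, a toy Python sketch of why the dd approach misleads: dd measures a single buffered sequential stream, while most production workloads look more like small random I/O. (File name, sizes, and counts here are arbitrary, and this deliberately leaves the page cache in play - a real test would use fio or similar with direct I/O.)

```python
import os
import random
import tempfile
import time

def dd_style_write(path, mb=32):
    """Sequential 1 MB buffered writes - roughly what 'dd bs=1M' exercises."""
    buf = b"\0" * (1 << 20)
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(mb):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    return mb / (time.perf_counter() - t0)          # MB/s

def random_4k_reads(path, count=1000):
    """Random 4 KB reads - closer to an OLTP-style access pattern."""
    size = os.path.getsize(path)
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        for _ in range(count):
            f.seek(random.randrange(0, size - 4096))
            f.read(4096)
    return count / (time.perf_counter() - t0)       # IOPS

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "bench.dat")
    mbps = dd_style_write(path)
    iops = random_4k_reads(path)
    print(f"sequential: {mbps:.0f} MB/s vs random 4K: {iops:.0f} IOPS")
```

      The two numbers aren't even in the same units - a sequential MB/s figure simply says nothing about how the array handles the random small-block work a database will throw at it.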

  12. thegreatsatan

    Lets be honest with ourselves

    There is a big difference between a non-disruptive upgrade and an upgrade that requires a data migration/restoration event. Far too many times I've seen people take the vendor's word for it when it comes to the upgrade/update process, only to get burned down the road. The onus is on the purchaser to fully vet any solution being considered for purchase. Any decent storage admin worth his salt would have a set test routine to put a solution through its paces, as well as the proper questions to ask. As always, the devil is in the details, and it's up to the purchaser to do their own due diligence.

    If you get bit by something like this, it's because you didn't do your homework.

    1. Anonymous Coward
      Anonymous Coward

      Re: Lets be honest with ourselves

      Whilst I agree the onus is typically on the customer when it comes to validating non-disruptive upgrades, the difference between the high-end vendors on these claims is typically minor. But in this case EMC are purposely using the wrong terminology in an attempt to minimize the damage to themselves.

      This is NOT just a disruptive upgrade, it's a data-destructive upgrade, requiring swing kit and data migrations back and forth; as well as the inherent risk, I assume there is some impact on performance throughout this process. Chad's suggestion of staying on the old code base is unsustainable and bogus.

  13. virtualgeek

    Disclosure, EMCer here.

    Anonymous Coward,

    We are absolutely helping customers (along with our partners) through exactly that (in addition to svmotion). All noted in my public blog, which I'd encourage you to read (and comments welcome - though I'd suggest disclosure).

    The commitment to support people on 2.4 you may not agree with, but some customers are electing to stay there for the foreseeable future. Our commitment to support them is certainly not bogus in their eyes.

    1. Anonymous Coward
      Anonymous Coward

      And all of this for free?

      All well and good, but can we assume all of this help in the form of project and migration planning, logistics, onsite engineering, swing kit, support etc. from EMC and their partners will all be free of charge to existing XtremIO customers?

      Those customers who have elected to stay on 2.4 shouldn't be taken as some glowing endorsement of EMC's product or support. It's more likely they can't stomach the downtime and the risk required for the data migration; let's face it, they're more hostages than volunteers.

      1. This post has been deleted by its author

    2. R8it

      What did you just say about vSAN 2.0?

      Chad, I just read your blog article on this. I'm staggered that you said the upcoming VMware vSAN 1.0 to 2.0 upgrade is going to be a data-destructive upgrade as well. Really?

      This is a new approach for EMC in comparison with being the steady partner of the past (whether EMC is some type of "Star Fleet" Federation or not). It's beginning to look like a trend if you consider the VNX2 migration as well.

      The shields are way down, and the Federation is becoming susceptible.

      1. virtualgeek

        Re: What did you just say about vSAN 2.0?

        Disclosure - EMCer here (Chad)

        No - that's not what I said (and people can go read the blog for themselves to verify).

        VSAN can be a non-disruptive upgrade to the underlying persistence layer BECAUSE you can use svmotion and vmotion to vacate workloads non-disruptively. Aka: create a small amount of "swing capacity" by vacating data, lay down the new structure, then swing workloads back.

        BTW - the "hyper converged players" (Nutanix, Simplivity, EVO:RAIL partners) do this as well. It's handy (and frankly an approach that can be used broadly to avoid what would otherwise be disruptive).

        Why can this always be used in those models? Well - because all their workloads are VMs.

        You **CAN** version metadata (this raises other engineering tradeoffs), but when you change the format of on-disk structures, it involves vacating data. VSAN 2.0 will have some on-disk structure changes, but I would wager (I'll defer to VMware) it will use this "rolling workload move" to make the upgrade non-disruptive (although data still gets vacated through the process).
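        In pseudo-Python, the "rolling workload move" amounts to something like this - a toy sketch, where the node/workload names and the one-line "reformat" are obviously stand-ins for svmotion and the real on-disk conversion:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    fmt: str                                   # on-disk format version
    workloads: list = field(default_factory=list)

def rolling_upgrade(nodes, new_fmt):
    """Upgrade one node at a time: vacate its workloads to peers
    (the svmotion step), apply the destructive reformat to the
    now-empty node, then swing the workloads back. Data is still
    vacated, but the workloads themselves never go offline."""
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        parked = [(wl, peers[i % len(peers)])
                  for i, wl in enumerate(node.workloads)]
        for wl, peer in parked:                # vacate onto swing capacity
            peer.workloads.append(wl)
        node.workloads.clear()
        node.fmt = new_fmt                     # destructive step, empty node
        for wl, peer in parked:                # swing back home
            peer.workloads.remove(wl)
            node.workloads.append(wl)

cluster = [Node(f"n{i}", "v1", [f"vm{i}{j}" for j in range(2)])
           for i in range(3)]
rolling_upgrade(cluster, "v2")
print([(n.name, n.fmt, n.workloads) for n in cluster])
```

        The catch, as noted above, is that this only works when everything on the box can be moved this way - which is why the all-VM hyper-converged players get it "for free".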

        It's a joke for anyone on here to claim they are beautiful and flawless (spend 30 seconds and google "disruptive" + "vendor name" - I'm not going to wade into the muck by posting links like that for every one of the vendors piling on here) - so I'd encourage all vendors to have a little humility. Customers don't like people who go negative.

        The trade-off for us was: "do we save the new features for customers with new hardware" (aka more RAM for more metadata), or "do we give the features to all"? We chose the latter - hence we continue to support 2.4 for years to come. AND we also chose to plan for swing hardware and services to help customers with the migrations. Frankly, I'm pretty proud of how EMC is approaching this difficult decision and putting the customer first.

        I'm sure the haters out there and competitors will disagree - but hey - so be it.

        Don't be a hater :-)

        1. Matt Bryant Silver badge

          Re: virtualgeek Re: What did you just say about vSAN 2.0?

          "....It's a joke to anyone on here that claims they are beautiful and flawless (30 seconds and google "disruptive" + "vendor name" - I'm not going to wade in the muck by posting links like that for every of the vendors piling on here) - so I'd encourage all vendors to have a little humility. Customers don't like people who go negative......" MAJOR FAIL! Actually, what us customers really hate is when a vendor tries to excuse their failing by pointing fingers and saying 'but look, vendor X has bugs too'. Seriously, when the system is down at 2am on a Sunday, I really couldn't give a deep-fried shit if anyone else I don't know using a different vendor's kit may be having an issue, I just want you to fix the issue that your kit does have and is losing me production time and profits. Do you really think any of the customers facing this destructive upgrade with XtremIO are going to be cheered up by you FUDing NetApp or anyone else?

          1. virtualgeek

            Re: virtualgeek What did you just say about vSAN 2.0?

            Disclosure - EMCer here (Chad)

            Matt - I absolutely agree that first and foremost is what are WE doing.

            Our 100% focus has been the customer through this. Again, I'm not going to convince any haters, but this is the essence:

            a) We have a rapidly growing user base of XtremIO.

            b) We had features we knew would require more metadata than the current generation of hardware could hold (compression, and performance improvements).

            c) The internal debate was long, and hard. Should we deliver the new capabilities to the existing install base, or only to customers who buy future hardware with more RAM?

            XtremIO's architecture of always storing metadata in DRAM (vs. paging to disk or storing on SSDs) is an important part of its consistently linear behavior. Conversely, it does mean that total X-brick capacity and features are directly related to the DRAM capacity and the on-disk structure (which determines the amount of metadata).
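            To make that DRAM/metadata coupling concrete, a back-of-envelope sketch - the block and entry sizes here are illustrative assumptions on my part, not XtremIO internals:

```python
def metadata_gib(usable_tb, block_bytes=4096, entry_bytes=16):
    """Back-of-envelope: GiB of DRAM needed if every logical block keeps
    one fixed-size metadata entry (fingerprint/mapping) in memory.
    block_bytes and entry_bytes are illustrative assumptions, not
    actual XtremIO figures."""
    blocks = usable_tb * 10**12 // block_bytes
    return blocks * entry_bytes / 2**30

# Shrinking the tracked block size (say, to address compressed chunks)
# inflates the metadata footprint for the same usable capacity:
for bs in (8192, 4096, 2048):
    print(f"{bs:>4} B blocks over 20 TB -> "
          f"{metadata_gib(20, bs):6.1f} GiB of metadata")
```

            The point being: add a feature that needs finer-grained (or simply more) metadata per block, and the DRAM bill for the same capacity goes up - which is exactly the "existing hardware vs. new hardware" debate described above.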

            People can (and are absolutely entitled to!) second-guess our decision. We decided the right call was:

            1) to make the capability available to all (existing customers and those in the future) - which requires a persistence layout change.

            2) to do it quickly, as this is a very (!) rapidly growing installed base.

            3) to ensure that this change would carry us through all upcoming roadmapped releases.

            4) to build into the plan (seriously - we have) budget for field swing units (capacity deployed to assist with migrations), as well as for EMC to absorb the services cost, and wherever possible to help with non-disruptive svmotion at scale (where the workloads are vSphere VMs).

            5) to commit to supporting the happy 2.4 customers (who are legion) for years to come, if they want to stay there.

            This is the first disruptive upgrade (in spite of some of the earlier comments) of GA code. I agree, we should have changed the on-disk structures prior to releasing the GA code - that's 100% on us.

            That all said - I'm proud of how the company that I work for is dealing with this: actively, quickly, and with the customer front and center in the thinking.

            Now - on to the "never go negative" point. What I was saying, Matt, was this: anyone who has been around the block has seen difficult moments in every piece of the tech stack, from every vendor. As important as anything else is *how the vendor deals with it*. This is true in the storage domain, the networking domain, the PaaS domain, the cloud domain - whatever.

            If any vendor feels they are immune to issues and makes their primary argument about "the other guy", customers generally (in my view) have a negative reaction - because they know better. That's it.

    3. Man Mountain

      You're helping them fix a problem that you created? That's good of you!

  14. Matt Bryant Silver badge

    Bad block size choice = bad design.

    Simple as that. On every UNIX system I have ever implemented, I have made a point of getting the application people to state what block size they wanted and to put it in writing, because changing the block size later means recreating the file system = downtime and overtime = a budget-owner looking for a scapegoat. If any application developer said 'well, we want to start with a 4k block, but we might want 8k in future', I'd point him to the budget-owner and tell him to justify the additional costs and downtime. I would definitely not be so stupid as to let a system go into production with that potential trouble just ignored.
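    And the reason block size has to be nailed down up front is easy to demonstrate - a toy internal-fragmentation calculation (the file sizes are hypothetical):

```python
def wasted_bytes(file_sizes, block_size):
    """Internal fragmentation: the tail of every file is padded out to a
    full block, so the space lost grows with the block size chosen."""
    waste = 0
    for size in file_sizes:
        tail = size % block_size
        if tail:
            waste += block_size - tail
    return waste

# Hypothetical file sizes in bytes - lots of small files make it worse:
files = [1_500, 6_000, 10_000, 123_456]
for bs in (4096, 8192):
    print(f"{bs}-byte blocks waste {wasted_bytes(files, bs):,} bytes "
          f"on these files")
```

    Go from 4k to 8k and the padding roughly doubles on small files; go the other way and you pay in metadata and I/O counts - either way it's a decision you want in writing before production, not after.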

    It seems bizarre that the XtremIO developers could have had compression on the roadmap and not have foreseen this problem earlier, which raises the question: did EMC keep schtum, sell the product as having a 'non-disruptive upgrade capability', and yet know this problem was coming? If I were an XtremIO customer (thankfully, I'm not), I'd be asking EMC just how long they have known this was coming, and then looking back at the dates EMC's (and/or partners') salesgrunts were making promises of said non-disruptive upgrades.

  15. CaterhamKing

    You Joke?

    Seriously. EMC employees trying to justify this are only making it worse. It's a joke.

    Nutanix upgrades its software routinely during business operations. Before someone comments about the magnitude of this XtremIO code release: Nutanix changed the OS in their CVM a while back, non-disruptively - nobody noticed.

    EMC, just come out, admit your sales guys misrepresented the solution with respect to never needing a disruptive upgrade, apologise, and ensure that if your customers go through with it and stay with you now, whatever you deliver means they never have to do it again!

  16. Anonymous Coward
    Anonymous Coward

    EMC XtremIO no. 1 in the Gartner MQ with a product like this :-)

    Can you trust a Gartner MQ with a result like that, or is it just a matter of who pays the most?

  17. Bcraig

    SO... looking into the crystal ball. I read that EMC is planning on rolling RecoverPoint to some extent into XtremIO in v4 or v5. To all the EMC-connected folks: does it look like there will need to be platform (hardware) replacements for that? Another thought I had: given that the bricks have to be expanded with identical units, are we to shitcan the original bricks in the next year or so, when the only option is to buy newer bricks with higher-capacity SSDs? Lots of thoughts that would give me pause on purchasing XtremIO if I were a customer.
