Someone has fixed the ESX 'VM stun' problem

Comtrade's latest HYCU Nutanix backup product version has a fix for the VM stun problem. An ESX virtual machine is quiesced (stunned) when a backup is taken. This can cause IO latency problems as IO requests come in to the "stunned" VM, are stored, and then consolidated into the stunned VM when the backup completes and …

  1. Anonymous Coward

    This has been fixed forever by anyone using storage-level snapshots, i.e. everyone with a half-decent integration.

  2. MrBoring

    I thought this issue was fixed with ESXi 6.0 anyway?

    1. Anonymous Coward

      vSAN snapshots got sort of fixed in 6.0.

  3. Anonymous Coward

    Veeam had this a while back, implemented with Cisco and Nutanix AHV

    Veeam have had this sorted for the best part of a year with their integration with Cisco HyperFlex - https://www.veeam.com/blog/cisco-hyperflex-snapshot-integration-availability-suite-9-5-update-2.html

  4. sathackr

    If you're taking storage level snapshots then you aren't flushing the disk cache and your snapshots are only crash consistent.

    The stun is there for a reason.

    1. baspax

      LOL

      Remind me to never give you responsibility for any of my data.

      There is no uncommitted write cache lingering in the hypervisor, by definition.

      1. Anonymous Coward

        Re: LOL

        There's an OS running in the hypervisor that potentially *does* have uncommitted write cache. And apps with unwritten state, etc. Back in the days of real sysadmins, we used triggers in the backup software to quiesce apps and flush that data out before a backup. You dolt.
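For anyone wondering what "flush that data out" means at the application level, here is a minimal sketch in Python. It assumes a hypothetical app that keeps its state in a single file; `durable_write` and `state.bin` are made-up names for illustration, not part of any backup product. The point is only that a write sits in a userspace buffer and then in the OS page cache until something pushes it down to storage.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data and push it through both buffering layers (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)         # data sits in the process's userspace buffer
        f.flush()             # hand it over to the OS page cache
        os.fsync(f.fileno())  # ask the OS to write it to the storage device


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.bin")  # hypothetical application state file
    durable_write(path, b"committed")
    with open(path, "rb") as f:
        content = f.read()
```

A storage-level snapshot taken before the `flush()`/`fsync()` pair would not see this data, which is roughly what "crash consistent" means in this thread.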

        1. Anonymous Coward

          Re: LOL

          Dolt yourself, dude. The times of MS-DOS and FAT are over. Your filesystem is no longer relevant. All applications and OLTP systems commit their writes to the underlying IO stack.

        2. Anonymous Coward

          Re: LOL

          It goes something like this:

          1. ISV tells vCenter to take a snap

          2. vCenter calls VSS or similar in the guest to flush buffers

          3. VMware takes a snap

          4. ISV tells the array to take a snap

          5. VMware deletes its snap

          No need to merge lots of deltas as the VMware snap only existed for a few seconds. Everyone has been doing it forever.
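The sequence above can be sketched as a toy simulation. This is only an illustration under stated assumptions: `Guest` and `consistent_backup` are hypothetical names, nothing here calls a real VMware, VSS, or array API, and the "snapshots" are plain Python lists. It shows why the delta to merge stays small: the hypervisor snapshot only has to absorb writes during the short window in which the array snapshot is taken.

```python
class Guest:
    """Hypothetical guest OS with buffered (unflushed) writes."""

    def __init__(self):
        self.buffered = []  # writes still in guest memory
        self.on_disk = []   # writes committed to the virtual disk

    def write(self, block):
        self.buffered.append(block)

    def flush(self):
        # Stand-in for VSS or a similar in-guest quiesce hook.
        self.on_disk.extend(self.buffered)
        self.buffered.clear()


def consistent_backup(guest, writes_during_window=()):
    """Walk through the five steps; returns (array_snapshot, log)."""
    log = ["1. ISV asks the management server for a snapshot"]

    guest.flush()
    log.append("2. guest flushes its buffers")

    # Hypervisor snapshot: base disk is frozen, new writes go to a delta.
    delta = []
    log.append("3. hypervisor snapshot created")

    delta.extend(writes_during_window)
    array_snapshot = list(guest.on_disk)  # point-in-time copy of the frozen base
    log.append("4. array snapshot taken")

    guest.on_disk.extend(delta)  # merge the short-lived delta back
    log.append(f"5. hypervisor snapshot deleted, {len(delta)} block(s) merged")
    return array_snapshot, log


guest = Guest()
guest.write("a")
guest.write("b")
snapshot, log = consistent_backup(guest, writes_during_window=["c"])
```

In this run the array snapshot holds the flushed blocks `a` and `b`, while the one block written during the window is merged back afterwards.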

  5. Anonymous Coward

    Oh For F's Sake!

    Can you and Nutanix already get a room?

    Every time these yahoos rediscover something that's been done for years, it's being presented as if they've landed a rocket on Mars. Same with their product announcements, always blown out of proportion as if they have secretly developed Oracle 12 in only 6 months when in reality they slap a GUI onto some OSS and release a minimum viable product into a mature market.

    If I want to read press releases I can go onto vendors' homepages and get the blatant bullshit spiel. Why don't we get at least a paragraph putting this into perspective, comparing it to existing technologies and market players, how this measures up to what's already tried and true?

    Several posters already mentioned that this is nothing new, so let's give it a try: Traditional storage arrays with every single backup vendor have been doing this since the dawn of time. Holy shit, ever heard of NetApp SnapManager?? EMC Replication Manager?? Fuck me, IBM TSM, NetBackup, Legato, Commvault, and yes, Veeam, all integrate with NetApp, EMC, Hitachi, Pure, Nimble, and so on.

    See? Wasn't so hard. But let's go deeper:

    It's the wannabe players with delusions of grandeur who think their shit doesn't stink and fancy themselves "cloud scale" (barf) and "enterprise" who apparently couldn't get their coked up heads out of their asses for how many years? Six? Seven? Yes, Nutanix, looking at you.

    vSAN was always a piece of crap when it came to snapshots. 6.0 kind of fixed it but it scares the shit out of me what perverted code debt lurks underneath. One would expect a company the size of VMware with its track record of fostering a healthy ecosystem would know how to do proper integration. What's up, Gelsinger?

    So tell me, how come the "newcomer" Cisco was able to have Veeam and Commvault integration with native snapshots basically just a few months after launch and all those self-proclaimed "Enterprise" HCIA vendors didn't or still don't?

    1. SniperPenguin

      Re: Oh For F's Sake!

      I can't upvote this enough....

  6. Dwarf

    Stunning

    How did this get through testing, particularly given that it's a key step in the backup process?

    1. Anonymous Coward

      Re: Stunning

      Testing? This is devops, you're doing the testing!

  7. Anonymous Coward

    Can’t SimpliVity do a full copy in seconds? No additional software or limited snapshot?

    1. Anonymous Coward

      Any storage array software worth its salt can do that. Nutanix, however, is a souped-up Linux fs (ext2) with some data hauling (data locality) in the background. The distributed system is good for large but slow data sets (see Cohesity, which is basically the same tech, written by the same guy). They are using caching to mask that. With the competition blowing past them in performance and stability, more extreme caching is required to keep up. See the acquisition of failed distributed caching company PernixData.

      Instant clones or snapshots have been standard tech for 25 years.

  8. amanfromMars 1 Silver badge

    Unbelievable ..... but true. J'accuse.

    Comtrade says it can solve this issue because it uses Nutanix storage-level snapshots instead of the usual hypervisor-level snapshots.

    Oh, really? I disagree and posit that the issue is not solvable, it is a systemic feature/exploitable OS vulnerability/SMARTR bug which can only be managed and/or mitigated.

    Comtrade are being economical with the truth ..... or are ignorant of the truth in regard to this particular and peculiar matter.

  9. Anonymous Coward

    This has been solved before...

    1. Backup vendors have been doing this for years...

    2. Snapshots stun a lot less in 6.0 (Mirror driver was implemented) on VMFS. 6.5 switched to SparseSE Snapshot format if anyone was keeping score. You'll notice significantly lower stun/issues on high-write VMs if you upgrade off your 5.5 environments.

    3. vVols solves this by never needing to take a hypervisor snapshot.

    4. Veeam has done this integration with a million different storage vendors.

    The "Snapshot Suck" presentation at VMworld went over this, as did the vVol/vSAN presentations at VeeamON.
