The link to the workaround article doesn't work.
A compatibility issue between VMware's ESXi hypervisor and Windows Server 2019 will leave some customers unable to safely snapshot their virtual machines. A Register reader tipped us off this week that the newest edition of Windows Server is causing some admins to encounter show-stopping errors when making snapshots of their …
So, what are the use cases for this sort of feature? Are systems these days so overly complex they have no mechanism to suspend themselves "safely"? Reminds me of the old email server no one is allowed to reset because no one knows if it will come back up or not.
Using a virtual machine should make disaster recovery easier as it's not hardware dependent, but now the software itself is so touchy it has to be loaded to the exact same machine state?
This seems like the opposite of a recovery plan.
Are systems these days so overly complex they have no mechanism to suspend themselves "safely"?
It's a single check-mark to get consistent data from the entire VM with only a fraction of a second of slowdown during a snapshot, rather than having to do all kinds of application-specific commands to ensure they're all good.
The alternative (and I would think a passable workaround) is to just snapshot the VM's memory as well, so it comes back up and running in exactly the same state, with all that dirty data still in memory.
PROTIP: Use as little Microsoft Technology as possible.
Microsoft appears to love deliberately introducing incompatibilities into its products at the expense of competitors. They've done this for decades, and I've experienced it firsthand with Novell and Windows: a constant battle to work around the new roadblocks they put in competitors' way. Accidentally, of course; it's only because they don't have time to test other vendors' flawed solutions.
It's called VSS, but it's new technology (only available since Windows Server 2003), so I'm not surprised some vendors haven't gotten around to fully supporting it yet.
Normally I'd expect installing VMware Tools to provide the conduit between "host wants an application-consistent snapshot" and "call the VSS function to quiesce I/O properly". It's probably fewer than two hundred lines of code, including proper error checking (I say 200 because I expect the bare call is probably about 5).
Sort of a misleading post.
Snapshots are used as part of the backup process to get a consistent point in time for the data. The link you posted specifically says: "This is because the snapshot is used as part of the data movement process to a backup file or a replicated VM."
At the end of the day, you have to determine what you are trying to protect against, and then devise a backup strategy, if possible, to protect against that.
For my Linux VMs (I have around 800 of them in production on vSphere), I don't do any VM-level backups; I just back up the data we need (at present 99% of it via NFS to HP StoreOnce). I've never needed VM-level backups, and I've always felt they are somewhat wasteful, especially if you are backing up a bunch of fairly identical systems, such as web servers. MySQL servers have custom scripts that use Percona XtraBackup to export the data safely to another storage system.
Snapshots absolutely can be backups (short-term, generally). I rarely use VM-level snapshots (and 99% of the time when I do, I power the VM off first so the snapshot is consistent and faster to make). RAID can, gasp, also be a backup (it protects against disk failure). Storage snapshots (especially on file storage) are great for restoring files that were lost accidentally. That is a backup, especially for the windows where data can be created and then destroyed between major backups. You can have rotating snapshots taken every 5 minutes for X hours, every hour for X days, etc.
Some folks don't consider a backup a backup unless it is distributed off site (sometimes to more than one site) and at least semi-regularly tested. That certainly qualifies as a very good backup.
But very few have the resources and/or budget to commit to that level of assurance (certainly none of the companies I have worked for, nor any that my immediate friends have worked for). I have been involved in several near data disasters caused by software and hardware failures, many of which involved more than 24-48 hours of downtime. In every case to date, the companies ultimately opted not to invest significantly more (in software/hardware or in staff time) to make the backups more robust. In most cases there was some data loss as a result of the failures, though never complete data loss.
I have to believe that many of the folks touting extremely robust backup processes (fully tested, off site, encrypted, etc.) are most likely dealing with a very small amount of data in a simple environment, or are fortunate enough to have a massive budget available for such a system. Either way, I'm sure it is a tiny, tiny minority of environments out there.
Too much emphasis, in my experience, gets put on offsite backups, as if a nuke is going to hit the facility that has your data, or a big flood or something. That is incredibly rare. By contrast, a software failure causing massive data loss (perhaps triggered by a hardware failure) is quite common.
In nearly 20 years of working with data centers, I've only been hosted with one that had a full facility outage. A fire in the electrical room took the site down for, I think, almost 72 hours. I wasn't hosted in the facility at the time, as it had a poor track record for power outages. But the point is that even though the systems were down (they had generator trucks on site for several months afterwards while they rebuilt the electrical systems), the systems weren't lost. They were down for up to 3 days (including "big" name sites like Bing Travel, which apparently had no backups at the time), but they came back. That is also literally the only facility I've worked with that ever had a complete power outage, though where I have the authority, I choose good facilities. An outage like that is terrible, of course, but it's not a permanent loss.
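The rotating schedule described above ("every 5 minutes for X hours, every hour for X days") is a tiered retention policy. A minimal sketch of how such a policy might decide what to keep, assuming X = 24 hours and X = 7 days, naive timestamps in one timezone, and a hypothetical fixed reference point for bucket boundaries; real storage arrays implement this in their own schedulers:

```python
from datetime import datetime, timedelta

# Hypothetical tiers: (interval, horizon). Keep one snapshot per interval
# for snapshots younger than the horizon.
TIERS = [
    (timedelta(minutes=5), timedelta(hours=24)),  # every 5 minutes for 24 hours
    (timedelta(hours=1), timedelta(days=7)),      # every hour for 7 days
]

# Fixed reference point so bucket boundaries don't move as time passes.
_REFERENCE = datetime(2000, 1, 1)


def snapshots_to_keep(snapshots, now, tiers=TIERS):
    """Return the sorted snapshot timestamps retained by a tiered policy.

    Within each tier, the newest snapshot in each interval-sized bucket is kept;
    snapshots older than every tier's horizon are candidates for deletion.
    """
    keep = set()
    for interval, horizon in tiers:
        newest = {}
        for ts in snapshots:
            age = now - ts
            if age < timedelta(0) or age > horizon:
                continue
            bucket = (ts - _REFERENCE) // interval  # integer bucket index
            if bucket not in newest or ts > newest[bucket]:
                newest[bucket] = ts
        keep.update(newest.values())
    return sorted(keep)
```

Anything not in the returned list would be pruned. Bucketing against a fixed reference, rather than against "now", keeps the hourly survivors stable from one pruning run to the next.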
By contrast, I recall an article here on El Reg about a similar fire in the electrical room at another facility (Terremark at the time, I think). They had built a good facility: the article said customers never noticed any issues, and they were able to resolve the issue with the fire department with no impact whatsoever.
You apparently misread my post. My comment about 20 years had to do with the need for offsite backups. Ransomware doesn't take down a facility; it just encrypts data.
Snapshots certainly can help recover from such an event, depending on how they are used. For example, if ransomware encrypts a file share, the data is easily recoverable, provided you catch it before your snapshot policy starts removing the last snapshots taken before the ransomware hit. I do recall reading about some ransomware attacking Windows VSS. I should clarify that when I talk about storage, I mean purpose-built systems, which generally don't run Windows.
Snapshots aren't for everything, certainly, but they can be a powerful tool. I just wish NAS appliances had the ability to take read-write snapshots for data testing (NetApp does, I believe; I'm not aware of any other vendor that can).
As for security intrusions into the network, the best policy in my opinion is OFFLINE BACKUPS, in addition to whatever dedupe appliance, cloud backup, or whatever else you use. Store the data where it requires physical human interaction to get to it (the best example is rotating tape that is physically removed from the drive), and make sure an intruder cannot wipe out your data because they compromised user or admin credentials.
I remember realizing, at a previous job that had everything in a public cloud, that with my admin credentials it would literally take just a few commands (probably in a bash for loop) to wipe out all data and all backups. Now think of the news articles where cloud credentials have been leaked online. So keep a copy of your backups offline if you want to protect against that kind of scenario.
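The "catch it before your snapshot policy starts removing" caveat boils down to one lookup: is there still a retained snapshot older than the infection? A minimal sketch, assuming you can estimate when encryption started (the function name is hypothetical):

```python
from bisect import bisect_left
from datetime import datetime


def last_clean_snapshot(snapshots, infection_time):
    """Return the newest snapshot taken strictly before infection_time, or None.

    None means retention has already pruned every pre-infection snapshot,
    so the snapshots alone can no longer recover the share.
    """
    ordered = sorted(snapshots)
    i = bisect_left(ordered, infection_time)  # first snapshot >= infection_time
    return ordered[i - 1] if i > 0 else None
```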
I agree with you in part. But on the other hand good backups are important and not that expensive, in context.
Veeam, as mentioned in the article, is fairly cheap compared to the outlay for VMware licenses, Windows Datacenter licenses and the hardware. We use a two-stage backup: the VMs are backed up to a NAS, and the NAS is backed up to external drives, which are swapped out daily and stored off-site. All in all, the backup solution probably costs less than 10% of the total cost of the VMware infrastructure.
At home I snapshot my VMs, copy them onto spinning rust, which in turn is synced to a NAS and backed up to Carbonite, along with all my important data.
To be pedantic, application quiescence is not a feature of the hypervisor; it requires a tool within the guest OS to commit the data residing in memory and cache to persistent storage, so that a storage snapshot (such as a *-snap.vmdk file in vSphere, or a snapshot on a storage array) can be used as a quick recovery tool (an application-consistent snapshot). In Windows, that tool is the Volume Shadow Copy Service (VSS). The pain of trying to restore snapshots taken with VSS is much less than the pain of restoring without it.
I noticed the snapshot issue in my environment when I moved a couple of Server 2019 VMs from an ESXi 5.5 host to a 6.5 host, but the errors went away when I upgraded VMware Tools to a newer version. I'm currently using version 10.3.10 (build 10346) on my Server 2019 VMs and no longer get any errors on snapshots.
Veeam uses an API call to quiesce the application running in the VM. If you quiesce an application via a local script or Veeam, the result will be an inconsistent snapshot. Microsoft will need to fix Windows Server 2019 so applications can be properly quiesced. It is incorrect to say a third-party backup application like Veeam will fix the problem; it won't.
To be a bit more precise: Veeam gets a consistent snapshot by working directly with the OS and consistency frameworks like VSS to bring everything into a consistent state. Then Veeam triggers a standard snapshot without quiescence (the VM is already consistent).
Veeam's pre- and post-script engine doesn't leverage VMware Tools quiescence either.
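The ordering described above (flush and hold writes inside the guest, take the storage snapshot, then release writes) can be sketched as follows. `GuestAgent`, `freeze`, `thaw` and `storage_snapshot` are hypothetical stand-ins for the in-guest role VSS plays and for the hypervisor or array snapshot call; they are not real VSS or VMware Tools APIs:

```python
from contextlib import contextmanager


class GuestAgent:
    """Hypothetical stand-in for an in-guest quiescing tool (the role VSS plays)."""

    def __init__(self):
        self.frozen = False
        self.log = []

    def freeze(self):
        # Real tools flush application buffers and OS caches, then hold new writes.
        self.frozen = True
        self.log.append("freeze")

    def thaw(self):
        self.frozen = False
        self.log.append("thaw")


@contextmanager
def quiesced(agent):
    agent.freeze()
    try:
        yield
    finally:
        agent.thaw()  # always release writes, even if the snapshot fails


def take_snapshot(agent, storage_snapshot):
    """Run the (hypothetical) storage snapshot callable while the guest is frozen."""
    with quiesced(agent):
        return storage_snapshot()
```

The `finally` clause matters in practice: a guest left frozen because the snapshot step failed stalls every application writing to disk.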
Business Continuity Maturity - what scenarios are you attempting to protect against and can you recover from those scenarios in a timely manner.
"Backup" is an all-encompassing word that means very little these days. If all you can trumpet is "yes, we have backups", then you're probably screwed. You need multiple documented options appropriate to the failure scenarios; restoring might be one of those options.
Most backups haven't been thought through. But it's relatively cheap to send tapes somewhere and feel warm and cosy, even though, if you thought it through, you are just making tapes and paying people to handle them. They don't do anything.
I was at a FTSE top 50 savings investment org a few years ago overseeing service assurance for a data centre migration transformation. Anyway.
Techies proposed losing tape. Made sense. Business managers got diarrhoea 'cos they thought 6 years of tape backups covered their asses.
On investigation, the app guy Jerry could remember one occasion in 20 years where he had gone "yeah, OK, I will get a tape and have a look to see if what the business wants is there." It wasn't.
Do what I do, albeit in VMware, not AWS. My backup script checks several things:
1) Is this a paired system? (Yes, my DCs are all paired.)
2) Is this a power-off-before-backup system? (Yes, it's the only /safe/ way to back up a DC.)
If #1 and #2 are both true, I set a lock and use VMware to shut down the VM prior to the backup.
This keeps at least one domain controller online at all times (at each site), and even with the added shutdown/reboot delay, it's still quicker than an online backup.
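A minimal sketch of the decision logic in the steps above, assuming one lock per site so that only one paired VM at that site is ever powered off at a time. The `VM` fields, `back_up` and `backup_fn` are hypothetical, and the power state is just a flag standing in for the real VMware shutdown and power-on calls:

```python
import threading
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    site: str
    paired: bool            # has a partner (e.g. another DC at the same site)
    power_off_backup: bool  # must be shut down to be backed up safely
    powered_on: bool = True


_site_locks = {}
_site_locks_guard = threading.Lock()


def _lock_for(site):
    # One lock per site, created on first use.
    with _site_locks_guard:
        return _site_locks.setdefault(site, threading.Lock())


def back_up(vm, backup_fn):
    if vm.paired and vm.power_off_backup:
        # Serialize power-off backups per site so a partner stays online.
        with _lock_for(vm.site):
            vm.powered_on = False  # stand-in for a guest shutdown via the hypervisor
            try:
                backup_fn(vm)
            finally:
                vm.powered_on = True  # stand-in for powering the VM back on
    else:
        backup_fn(vm)  # ordinary online (snapshot-based) backup
```

Even if backups for both DCs at a site are kicked off in parallel, the per-site lock makes the second one wait until the first is powered back on.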
With the advent of dedupe storage arrays, and the caveat that you have to build all of your VMs from the same image, the total space needed for incremental backups is, for my usage, negligible.
To answer another question posed above: I believe Reduxio has the ability to revert a disk to a point in time. The issue then becomes that you have to have separate LUNs for each virtual machine.
Also, as has been pointed out above (and before, and forever after), if you haven't successfully restored from a backup, you can't consider it a backup.
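Why building every VM from the same image keeps deduped backups small can be shown with a toy content-addressed store: identical chunks are stored once, so a second VM that differs by one chunk costs one extra chunk. This sketch assumes fixed-size 4 KiB chunks and SHA-256 fingerprints; real arrays use their own chunking and hashing:

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size; real arrays vary


class DedupeStore:
    """Toy content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes

    def write(self, data):
        """Store data and return its recipe (the ordered list of chunk digests)."""
        recipe = []
        for off in range(0, len(data), CHUNK):
            piece = data[off:off + CHUNK]
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)  # duplicates cost nothing extra
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())
```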
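One low-effort way to act on that advice is to restore into a scratch location and compare checksums against the source. A minimal sketch, assuming file-level backups and two directory trees (the function names are hypothetical); matching bytes don't prove an application will start, but mismatches prove the backup is not what you think it is:

```python
import hashlib
from pathlib import Path


def sha256_of(path, bufsize=1 << 20):
    """Hash a file in 1 MiB blocks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ after a test restore."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for src in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(str(rel))
    return problems
```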
I recall asking on this very forum, a couple of years back: Why would anyone migrate a working VM to Hyper-V?
In keeping with a long history of motivating migration to their own products by creating pain for customers of other vendors through subtle - or sometimes blatant - forms of sabotage, it looks as if Microsoft have devised a reason for people to migrate their VMs to Hyper-V.
It astonishes me that people tolerate this kind of thing, and have done for decades.
I remember; they did this to Novell constantly as well. Their favorite trick is to offer the product for "free" by bundling it with Windows. It's a Microsoft anti-competitive practice that apparently rarely falls foul of regulators/legislators.
Then if that fails to steal enough market share, as you mention: oops, we introduced a bug, or oops, that new API change that works better for us doesn't work well for your product? Strange. Well, I guess you'd better rewrite your product. We've already incorporated it into ours; why are you so far behind our superior offering?
So I read this, and we have a handful of 2019 servers in our VMware environment using another snapshot-based solution for backups. TBH, I panicked a little bit, until I read the last bit in the VMware KB article under Resolution:
Use an MBR disk layout instead of GPT while provisioning the machines.
I don't know about you guys, but all our stuff is still MBR; there's no compelling reason for us to move to GPT drives yet. (We're a relatively small shop, not super-enterprise level where we need VMs with drives over 2 TB.)
Snapshots in VMware aren't backups at all, not in the real sense of the word.
Lose your VMDK and your snapshot is worthless. Snapshots are short- to mid-term rollback options, or little lifesavers for easily restoring something relatively simple. They aren't a backup, though. If you damage or lose your VMDK file and you don't have an actual backup of *that*, it's toast, snapshots or no.
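The 2 TB threshold mentioned here comes from the MBR partition table itself: each entry stores the starting sector and sector count as 32-bit values, so with traditional 512-byte sectors a partition tops out at 2^32 × 512 bytes. The arithmetic, assuming 512-byte logical sectors (disks with 4 KiB logical sectors would raise the ceiling):

```python
SECTOR_BYTES = 512   # traditional logical sector size (assumed)
MBR_LBA_BITS = 32    # MBR entries hold 32-bit sector addresses and counts

max_bytes = (2 ** MBR_LBA_BITS) * SECTOR_BYTES
print(max_bytes)             # 2199023255552
print(max_bytes / 2 ** 40)   # 2.0 (TiB)
```

That is 2 TiB, or roughly 2.2 TB in decimal units, which is why VMs with data disks under that size can stay on MBR.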
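The reason losing the base VMDK makes a snapshot worthless is that a snapshot delta only records the blocks written after it was taken; every other read falls through to the parent. A toy model of that copy-on-write chain (a simplification for illustration, not VMware's on-disk format; the class name is hypothetical):

```python
class Disk:
    """Toy block device: a base disk, or a delta recording only changed blocks."""

    def __init__(self, parent=None, blocks=None):
        self.parent = parent
        self.blocks = dict(blocks or {})

    def write(self, n, data):
        self.blocks[n] = data  # new writes land in this layer only

    def read(self, n):
        if n in self.blocks:
            return self.blocks[n]
        if self.parent is None:
            raise KeyError(f"block {n} not present")
        return self.parent.read(n)  # unchanged blocks come from the parent


base = Disk(blocks={0: b"boot", 1: b"data-v1"})
snap = Disk(parent=base)  # taking a snapshot: later writes go to the delta
snap.write(1, b"data-v2")
```

If `base` is damaged or deleted, `snap` can still return block 1 but not block 0, which is exactly the "it's toast, snapshots or no" situation.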
Line one is "do not use snapshots as backups"
Remove them from your thinking completely when you think "do I have backups?".
You can use agent-based backups in situations like this. For example, with Unitrends you could install their client agent in Windows Server 2019 and back it up at the guest OS level. They have a way of taking that agent-based backup and converting it back to a VM, using proactive (pre-recovered, ready-to-go) or reactive (instant or regular) restore options.
Biting the hand that feeds IT © 1998–2020