Wow!
I gotta gets me one of those hover-gondolas!
Days before VMware's virtualization-fest at the virtual Venice of Las Vegas' Venetian Hotel, CEO has issued a second letter of apology to customers affected by the ESX update last month that crashed their virtual servers. VMware has found that this foot-shooting episode's effects have been exacerbated by Microsoft's price- …
Three days to "repair" the time-out code?
If this was something new they added then the first port of call should have been to remove it, not dick around trying to repair a largely unnecessary "feature".
Thumbs down for the focus on trying to make a licence model work rather than getting paying customers up and running.
"His update engineers screwed up big time by issuing an ESX update in August that contained destructive time out code. This caused many VMware users' licenses to abruptly expire and their virtualized server worlds came crashing down to Earth."
If you're going to quote supposed fact, please get it right. The timeout didn't cause things to stop, shut down or 'crash to Earth'. It simply meant that any powered-off VMs couldn't be powered on again. There was also a fairly simple time-shift workaround that many clients were able to employ until a fix was released the next day.
Given that this is the first and only 'major' problem exhibited by this product, and one that didn't in fact cause the hypervisor to 'crash' as you stated, I think that you are making a mountain out of a molehill.
Let's look at the facts:
What we had here was a BETA release of VMware ESX.
Now, any proper admin that tries to run a beta hypervisor in their production network deserves everything they get. EMC really should have left any sys admins who were stupid enough to do this to their own devices.
There is no need to make this scenario even more unsanitary for genuine users by injecting a destructive time bomb into said beta code, especially when there is a good chance that this time bomb will end up in subsequent production code.
This is the problem. Beta code is supposed to represent a potential future release of production code with any bugs identified during the beta test rectified before any actual release.
What EMC has done is hide a time bomb in their beta, which beta testers cannot be expected to know about, let alone test, and then they FORGOT to remove the time bomb when they went to gold status.
OOPS.
This is nothing but extreme incompetence coupled with an unhealthy desire to protect themselves from imaginary unscrupulous users.
If Microsoft had done this they would have been rightly ridiculed by the entire tech community.
And I say this as a card-carrying Microsoft hater as well as a VMware fan.
The point is that my fanboi allegiances are irrelevant. Companies (especially those who purport to service Fortune 500 companies) should never infect their own code with such negative "features" in a pathetic attempt to catch the occasional rogue admin who thinks it is a good idea to run their production servers on beta code.
EMC/VMware deserve all the ridicule they get.
That's a bit rich. As I understand it, the Macrovision licensing tool allows you to centralise licensing for your clusters. The ESX servers rely on their licenses; without them they enter a grace period in which running VMs keep running, but none can be powered on from cold boot. This means that people running without a license get their eval time but cannot run beyond it indefinitely, and legit buyers can lose their license server and suffer only mildly reduced working conditions until they bring it back online or install the Macrovision tool elsewhere and point their Virtual Center master server to the new one.
In this case, retrograde code got in through bad change management or QA. It looks like it was designed to stop beta editions running past a certain point, but the code got leaked into production versions, and this stopped new VMs from coming online. VMware published workaround options the same day while testing a fix for the code.
An earlier poster seems to think that the timebomb was added deliberately. I don't understand why, nor do I understand why this is described by the author as a hypervisor crash when, if anything, it was either a beta code or licensing balls-up.
One last thing, @ GoatJam, despite the 80-something %, EMC != VMware surely...
In reply to Goat Jam: The release was not a beta. VMware has a completely different release mechanism for betas.
An accurate description of the problem is that VMware forgot to take out the expiration date lockout from a prior beta when they released a general update for all customers, and all customers of their production software were pushed to install that update if they were using the automated update manager. VMware then didn't realize their mistake for several weeks, until the timeout occurred. Even worse, they gave completely unrealistic fix estimates when informed of the issue (36 hours for a critical patch to an enterprise product is not acceptable).
As for Danial Gold's comments, you are leaving out some other aspects:
- Impacted customers were unable to VMotion across clusters until patches were applied (for any reasonably sized cluster, this is a significant impact).
- There was significant confusion about the production impact of changing the date of the ESX hosts.
- Given that VMware was not aware of what was going on until many of their customers were impacted and had expended significant effort tracking down the cause of VMs being unable to restart, to state that customers should have applied workarounds is completely wrong. VMware didn't publish the workarounds until after many clients had already spent hours trying to figure out what was happening.
In brief, VMware completely screwed up, and their new CEO is exactly right to apologize again.