Virty servers' independence promise has been betrayed

Server virtualisation in its current state is pretty much done. The market is mature and critical tier one workloads are absolutely at home on tier one virtualised platforms such as VMware and Hyper-V. Virtualisation tends to be a very useful technology in larger organisations for what most administrators would say are the wrong …

  1. Alistair

    I've seen this architected.

    And then spent 5 months raising every red flag I could wave at everyone on the damned project.

    Now they don't understand why changing anything in the environment takes 3 months of planning.

    New platform is being built to replace it. I still have my original emails. Will likely end up trotting them out for the next round.

    Sadly, none of the vendors involved decided to step up and admit that it was a mistake to do this...

  2. NinjasFTW

    What am I missing?

    I'm not an expert with virtualisation, so I'm trying to work out why it's so bad to have cluster nodes virtualised.

    The only thing that I can think of is that you don't want your cluster nodes all sitting on the same host, which is resolved with DRS/affinity rules.

    I know that DBAs usually have objections due to disk access speeds etc., which is fair enough, but I can't think of any other reason.

    1. Alistair

      Re: What am I missing?

      Most clustering software (including OS-level clustering) requires direct hardware access to a shared disk somewhere -- in VMware language, device passthrough. Once that's done, your VM is typically pinned to a specific host. This applies to most OS clusters, Oracle RAC, Veritas, MCSG and a number of applications I've had to deploy in the past.

      To my knowledge, Red Hat has fixed RHCS for VMware and KVM environments, and MCSG for Linux incorporated a similar fix. I believe that Veritas clusters can now *have* virtual nodes, but any master must NOT be virtual - I've not worked with them recently.

      I still have DBAs talking about putting RAC clusters in virtualization - to wit, we have an Exadata coming shortly to solve that issue.

      1. Tom Maddox Silver badge

        Re: What am I missing?

        What Alistair said. There's a fairly straightforward workaround to the disk-locking issues, which is to use in-guest iSCSI, but many organizations either don't have iSCSI support for block storage or have fear and loathing about using iSCSI for performance-oriented applications.

        1. FakeIT

          Re: What am I missing?

          The iSCSI workaround works in Hyper-V too, and if that's not your cup of tea, Hyper-V 2012 & 2012 R2 support virtual Fibre Channel HBAs on the VMs.

  3. GitMeMyShootinIrons

    vSphere 6.0

    You've been able to do MSCS in VMware for a while (I've dealt with it all the way back on ESX 2.5). With 6.0, though, there is support for vMotion of cluster nodes, so they are portable now. You still need RDMs and anti-affinity rules (see the placement-check sketch below).

    Tom above is quite correct about iSCSI as a workaround too. This works well on NetApp or EqualLogic kit.
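
    To make the anti-affinity point concrete, here is a minimal Python sketch of the invariant a DRS anti-affinity rule enforces - no two nodes of the same guest cluster on one hypervisor host. The VM and host names are invented examples, and this is only the check itself, not any vendor's API.

        from collections import defaultdict

        # Hypothetical placement data: guest-cluster VM -> hypervisor host.
        vm_to_host = {
            "sql-node-a": "esx-host-01",
            "sql-node-b": "esx-host-02",
            "sql-witness": "esx-host-03",
        }
        cluster_nodes = ["sql-node-a", "sql-node-b", "sql-witness"]

        def anti_affinity_ok(nodes, placement):
            """True if no two cluster nodes share a hypervisor host."""
            by_host = defaultdict(list)
            for vm in nodes:
                by_host[placement[vm]].append(vm)
            return all(len(vms) == 1 for vms in by_host.values())

        print(anti_affinity_ok(cluster_nodes, vm_to_host))  # True for this layout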

  4. John Sanders

    Two thoughts

    Virtualization does not reduce complexity, except in a small number of scenarios.

    Virtualization is not a silver bullet for all workloads.

  5. Anonymous Coward

    of course, there are reasons for strict affinity rules

    If you run Snort, or anything else that requires its own NIC, you don't buy a new NIC for every host in the cluster - two or three should do fine... but this means that your Snort box can now only run on the physical hosts that have this additional NIC.

    1. Benno

      Re: of course, there are reasons for strict affinity rules

      Or due to software licensing constraints - Oracle, I'm looking at your stupid host licensing model for Enterprise RDBMS...

  6. ecofeco Silver badge

    It's not just VM

    I see this same attitude and disaster happening at all levels of IT.

    Far too many people have been paid to hype the koolaid, and it's being drunk by the tanker shipload.

  7. BlartVersenwaldIII

    Depends on your clients...

    We do a bunch of multi-tenanted stuff (although technically all sitting under the same umbrella), and one business unit in particular was notorious for having a CTO who wanted absolutely everything on an MS-or-something-else cluster (as well as dual bonded virtual NICs), with all the downsides mentioned in the article.

    The really silly factor is that five-nines uptime for these systems wasn't even needed (for most of the production systems we get a 3hr maintenance window each week), and yet, because of the many complexities of clustering (especially MSCS, which is flaky as hell, and especially-with-knobs-on misconfigured MSCS - yeah, insist on using a file share witness in another data centre and see where that gets you), uptime on the clustered systems was way, way lower and the incident rate was about three times as high. Elsewhere, with our bog-standard "here is a bog-standard VM" standard, we don't get bogged down and reach 99.95% availability (about five minutes' downtime a week - the quick sum is below) without even trying.

    Given the considerable management overhead this entailed, the price multiplier for this misbehaving business unit's VMs was much higher; thankfully the CTO's replacement is aware of the futility of trying to cluster something that doesn't need to be clustered on a cluster, and the clusterfuck of MSCS deployments is rapidly vanishing in the rear-view mirror.
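
    For reference, the downtime budgets quoted in this thread are plain arithmetic - a quick Python sanity check of the 99.95% figure above (no assumptions beyond the minutes in a week and a year):

        # Allowed downtime for a given availability target.
        MINUTES_PER_WEEK = 7 * 24 * 60      # 10,080
        MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600

        for availability in (0.999, 0.9995, 0.99999):
            per_week = (1 - availability) * MINUTES_PER_WEEK
            per_year = (1 - availability) * MINUTES_PER_YEAR
            print(f"{availability:.3%}: {per_week:.1f} min/week, {per_year:.0f} min/year")

        # 99.950% allows roughly 5 minutes a week; "five nines" is about 5 minutes a year.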

  8. agnostic_node1

    "So, given these issues, what can you as an admin do? To be honest, your options are limited."

    No they aren't. I object good sir!

    "Sure, the odd virtualised cluster is perhaps acceptable but when you start to scale to hundreds of hosts, thousands of guests and several faux clusters these issues start to become a real pain for the administrator who has to work around them."

    The answer to that problem is not to scale bad/obsolete practices.

    1) You can address apparent virtual clustering difficulties by thinking app-first, rather than infrastructure-first. The author doesn't mention anything about virtual or physical load balancers hosting FQDNs & proxying SSL, for instance. But if he were building HTTP infrastructure for the Register in 2015, it's safe to say he'd be better served parking the register.co.uk FQDN on several high-end physical or virtual load balancers, and hosting his public A records & CNAMEs on a DNS host that has Anycast DNS, such that American users like myself resolve to the Register's Yankee datacenters, rather than make a trip across the pond.

    But even if he didn't have those resources, he could pull an IT MacGyver out of his hat and do poor man's load balancing: if HTTP server A needs rebooting, delete its CNAME from your public DNS host. Once it starts back up, re-create it (a rough sketch of the routine follows at the end of this point).

    If he has an IT team that uses the RFC-1918 or public IP address to access & manage the application and its infrastructure, beat the team members on the head with a large heavy stick. If they repeat, or start mentioning the IP address to users, beat them until hospitalization is required by law. Follow this up by posting a fatwa banning such practices.

    Management's got a point... making hardware much, much less important is what the last 5-8 years in IT have been about.
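
    A rough Python sketch of that poor man's drain-and-reboot routine, as promised above. The dnspython lookup is real; the client object and its delete_cname/create_cname methods are hypothetical stand-ins for whatever API your actual DNS host exposes.

        import time

        import dns.resolver  # dnspython

        def resolves(name: str) -> bool:
            """Check whether the public CNAME still answers."""
            try:
                dns.resolver.resolve(name, "CNAME")
                return True
            except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
                return False

        def drain_and_reboot(client, name: str, target: str, ttl: int = 60):
            # 'client' and its methods are hypothetical; swap in your DNS host's API.
            client.delete_cname(name)           # pull server A out of rotation
            time.sleep(ttl * 2)                 # let cached answers expire
            # ... reboot HTTP server A here ...
            client.create_cname(name, target)   # put it back
            while not resolves(name):           # wait until the record answers again
                time.sleep(5)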

    2) The column doesn't mention the concept of what we in the Microsoft kingdom (others call it something else) think of as availability sets, or what network guys might call failure domains. Essentially, why bother with guest virty clusters if you can guarantee that multiple instances of an app's storage + compute + network resources have no hardware in common and are accessed via FQDN? App1.domain.com is thus in Availability Set 1, which consists of VMs on-prem and in my Azure West or AWS West region, and that whole enchilada is replicated active/passive to Availability Set 2 on the East Coast, or (super-sexy) is active/active with AS2.
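
    A toy illustration of the "no hardware in common" test for two instances of App1 - the instance metadata below is invented, and the failure-domain keys are just examples:

        # Two instances of an app should share no host, storage array or uplink.
        instances = {
            "app1-vm-west": {"host": "hv-07", "array": "san-2", "uplink": "sw-a"},
            "app1-vm-east": {"host": "hv-21", "array": "san-5", "uplink": "sw-c"},
        }

        def shared_failure_domains(a: dict, b: dict) -> set:
            """Return the failure-domain keys where both instances overlap."""
            return {k for k in a if k in b and a[k] == b[k]}

        overlap = shared_failure_domains(instances["app1-vm-west"], instances["app1-vm-east"])
        print("independent" if not overlap else f"shared failure domains: {overlap}")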

    "The faux clusters usually utilise shared SCSI bus technology and sit on different hosts. “Big wow,” you may say but it has a direct and detrimental effect on the ability to manage a cluster."

    Bingo! +1, retweet, Facebook Like!

    3) I don't think about SPC-3 disk reservations, block storage vols as witness/tie-breaker, or RDMs anymore, either on physical or virtual clusters, which did make virty clusters hard. Why? Fault-tolerant SMB 3 file shares fixed all those headaches; I can use \\myawesomefileshare\path1 or an Azure service as a witness to resolve a cluster dispute. I can even host my SQL databases on an SMB share, so who the hell wants the baggage entailed with a block storage volume? It's almost to the point where I only need block vols for boot...I'm sure NFS 4 has similar features; relief from the pains of unique-as-a-snowflake block vols is within reach of the Everyday IT Grunt.

    4) The author doesn't mention the difference between fault tolerance & high availability, which is important for his readers to understand. The former is hard & expensive to do on a physical host & guest; the latter is much easier to achieve. Most businesses, when pressed, really only need the latter.

    Let me pose a thought experiment. Take a 3,000 seat enterprise. Imagine it is 100% virtualized. Now imagine it runs Active Directory for identity/authentication etc. Now further imagine that all AD Domain Controllers are virtual.

    Is that AD domain for that business more like a virtual cluster hosting an application on shared resources and offering HA, or is it more like a distributed application that is fault tolerant? What happens if 30 out of 31 virtual AD servers fall over? Can your users still get a Kerberos ticket, identify themselves to your resources, and do their work?

    If your answer is "yes, they can still do their work," then you know what thinking app-first is all about.
