The problem with modern data center design is attempting to solve redundancy through networking. The safest design approach is hub and spoke: three separate clusters, each built hub and spoke, greatly simplify the network design.
Then, run every service as a cluster in all three locations. Instead of fighting to keep each service running at all times within one cluster, fight only to ensure that at least one cluster is operating at all times.
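The "at least one cluster up" rule reduces to a very small routing decision: probe each cluster and send traffic to the first healthy one. The sketch below illustrates that logic only; the cluster names and health endpoints are hypothetical, and in practice this job is usually done by DNS/GSLB or a load balancer rather than application code.

```python
# Minimal sketch of "at least one cluster operating" routing.
# Cluster URLs are illustrative assumptions, not real endpoints.
import urllib.request

CLUSTERS = [
    "https://cluster-a.example.com/healthz",
    "https://cluster-b.example.com/healthz",
    "https://cluster-c.example.com/healthz",
]

def pick_active_cluster(probe, clusters=CLUSTERS):
    """Return the first cluster whose health probe succeeds.

    `probe` is injected so the selection logic can be exercised
    without any network access.
    """
    for url in clusters:
        try:
            if probe(url):
                return url
        except Exception:
            # A dead cluster is an expected condition; just move on.
            continue
    raise RuntimeError("no healthy cluster -- the one failure mode to avoid")

def http_probe(url, timeout=2):
    # A real deployment would use proper health checks; this is only
    # to make the sketch concrete.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200
```

Note that the failure budget flips: any one (or even two) clusters can be down for maintenance or hardware failure, and the design still serves traffic.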
This design sounds expensive, but weigh it against the decrease in network, fabric, and interface costs, and the cost of a third cluster is negligible.
This is not 1994. Virtually all server software today is designed to operate N+1, where N is at least 2, and systems like Azure Stack, Kubernetes, or Mesos are all more than capable of keeping a service operating properly in this design. Moving to a NoSQL + object storage environment, plus possibly a scale-out file server for legacy applications, means that N+1 storage can be handled comfortably on gigabit Ethernet.
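The N+1 storage idea above can be sketched in a few lines: write each object to every replica, count the write as durable once enough replicas acknowledge it, and read from whichever replica still has the data. This is a toy illustration under stated assumptions (in-memory dicts standing in for object stores), not any particular product's API.

```python
# Hedged sketch of N+1 replicated storage: survive the loss of any
# single replica. The "stores" here are plain dicts standing in for
# per-cluster object stores -- an illustrative assumption.

def replicated_put(stores, key, value, min_acks=2):
    """Write `value` to every store; require at least `min_acks` successes."""
    acks = 0
    for store in stores:
        try:
            store[key] = value
            acks += 1
        except Exception:
            continue  # one failed replica is tolerated by design
    if acks < min_acks:
        raise RuntimeError(f"only {acks} replicas acknowledged the write")
    return acks

def replicated_get(stores, key):
    """Read from the first replica that still holds the object."""
    for store in stores:
        if key in store:
            return store[key]
    raise KeyError(key)
```

With three clusters, any one can be lost outright and every object remains both readable and re-writable, which is exactly the property that makes per-cluster heroics unnecessary.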
This design eliminates the need for vMotion/live migration, eliminates the need for a SAN, and decreases both CapEx and OpEx across the board.
Designing from a “VMware is the only way” approach increases costs at least tenfold: more expensive hardware, more expensive software, and more. It also means the platform is designed from the perspective of the IT crew, without any understanding of the services IS actually needs.
A recent study I participated in showed that more than 95% of the cost of building and operating a data center was actually driven purely by the cost of building and operating the data center's management systems. By rethinking the design around IS operations, we could increase uptime substantially and decrease costs even more.
The moral of the story is, friends don’t let friends let IT people anywhere near their data centers.