Re: Operational Failover is incredibly complex
Agreed. Plus, N+1 systems are based on educated (sic!) guesses of what the loads will be - which they won't be, as systems just grow to fill the capacity provided. So, when N+1 becomes N (e.g. you lose a high-density compute rack), everything fails over. Typically the VMs then restart with more resources than they had when they were running normally and the hypervisor had nicked all the empty RAM and unused CPU cycles. So now nothing fits, and everything starts thrashing and crashing domino style...
At some point you'll be looking to turn the whole thing off and on again - assuming you've documented that process properly, and nothing has been corrupted, ...
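
To put rough numbers on the "nothing fits" step, here is a back-of-envelope sketch. Everything in it is an assumption made up for illustration: the function name, the rack and VM sizes, and the simplification that only RAM matters and that a restarting VM claims its full configured memory while the VMs that kept running stay at their reclaimed footprint.

```python
def demand_after_rack_loss(racks, rack_ram_gb, vms_per_rack, configured_gb, steady_gb):
    """Return (demand_gb, capacity_gb) once one rack is lost and its VMs restart elsewhere.

    Hypothetical model: RAM only, identical racks and VMs.
    """
    surviving = racks - 1
    capacity = surviving * rack_ram_gb
    # VMs on the surviving racks keep their reclaimed, steady-state footprint.
    running = surviving * vms_per_rack * steady_gb
    # VMs from the lost rack restart and claim their full configured RAM.
    restarting = vms_per_rack * configured_gb
    return running + restarting, capacity


# Made-up figures: 4 racks (N=3, plus 1 spare) of 1024 GB, 30 VMs per rack,
# each configured with 32 GB but settled at 24 GB in normal running.
steady_total = 4 * 30 * 24  # 2880 GB: on paper this fits on the 3 surviving racks
demand, capacity = demand_after_rack_loss(4, 1024, 30, 32, 24)
print(steady_total, demand, capacity)  # 2880 3120 3072
```

With these invented numbers the steady-state load would squeeze onto three racks, but the restart inflation alone pushes demand 48 GB over what is left, before any further growth to fill capacity is counted.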