BA's 'global IT system failure' was due to 'power surge'

thondwe

Re: Operational Failover is incredibly complex

Agreed. Plus, N+1 systems are based on educated (sic!) guesses of what loads will be - guesses that turn out wrong, as systems just grow to fill the capacity provided. So, when N+1 becomes N (e.g. you lose a high-density compute rack), everything fails over - typically the VMs then restart with more resources than they had when they were running normally, back when the hypervisor had nicked all the empty RAM/unused CPU cycles. So, now nothing fits and everything starts thrashing and crashing domino style...
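To put rough numbers on that, here is a minimal sketch, not a model of any real hypervisor. It assumes a VM occupies only its "active" memory while running normally (because the hypervisor has reclaimed the rest) but claims its full configured memory on a cold restart. The names (VM, fits) and every figure are made up for illustration, and only RAM is considered.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    reserved_gb: int  # configured memory, claimed in full on a cold restart (assumption)
    active_gb: int    # memory actually in use once the hypervisor has reclaimed the rest


def fits(vms, hosts, host_ram_gb, use_reserved):
    """First-fit-decreasing placement; True if every VM finds a host with enough free RAM."""
    free = [host_ram_gb] * hosts
    sizes = sorted(
        (vm.reserved_gb if use_reserved else vm.active_gb for vm in vms),
        reverse=True,
    )
    for size in sizes:
        for i, available in enumerate(free):
            if available >= size:
                free[i] -= size
                break
        else:
            # No host had room for this VM.
            return False
    return True


# Hypothetical estate: 4 hosts (N+1 with N=3), 256 GB each, grown to 48 VMs.
vms = [VM(f"vm{i}", reserved_gb=24, active_gb=14) for i in range(48)]

steady_all_hosts = fits(vms, hosts=4, host_ram_gb=256, use_reserved=False)
steady_one_down = fits(vms, hosts=3, host_ram_gb=256, use_reserved=False)
restart_one_down = fits(vms, hosts=3, host_ram_gb=256, use_reserved=True)

print(steady_all_hosts, steady_one_down, restart_one_down)
```

With these invented figures the steady-state footprint fits on four hosts and even on three, so the capacity dashboard looks fine. The restart footprint on three hosts does not fit, which is the "now nothing fits" moment.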

At some point you'll be looking to turn the whole thing off and on again - assuming you've documented that process properly, and nothing's been corrupted, ...
