Re: Once again, Single Point of Failure failed
There are a lot of things that could go wrong within an Availability Zone that could cause an application to fail.
If you want your application to survive anything bad enough to take out a zone, then you need to architect it to use at least two zones, and it's your problem to make sure you've addressed every component and service needed to make it work.
If that's still not good enough for you, then you need to look at using two separate regions, again making sure to address every component and service needed to make things work.
It quickly gets complex and expensive to cover off every possible risk. And the more complex it gets, the greater the risk of something going wrong.
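To put rough numbers on that trade-off, here's a back-of-envelope sketch. Every figure in it is made up for illustration, the function names are my own, and it assumes failures are independent, which real outages often aren't.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain where every component must be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(availabilities)


def redundant_availability(single, copies):
    """Availability when at least one of `copies` identical replicas must be up.

    Assumes the replicas fail independently of one another.
    """
    return 1 - (1 - single) ** copies


# Hypothetical figures, not taken from any provider's SLA.
zone = 0.99             # one zone, considered on its own
failover_glue = 0.999   # the extra routing/replication needed to use two zones

two_zones = redundant_availability(zone, 2)
two_zones_with_glue = serial_availability([two_zones, failover_glue])

print(f"one zone:              {zone:.6f}")
print(f"two zones (ideal):     {two_zones:.6f}")
print(f"two zones + failover:  {two_zones_with_glue:.6f}")
```

With these invented numbers the second zone helps a lot on paper, but the machinery you add to fail over between zones then caps what you actually get, which is the complexity cost in a nutshell.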
And if there's an event bad enough to take out an entire region, something more serious than a comms glitch, do you really think your staff are going to be rushing to sort the problem, or frantically combing through the rubble looking for their loved ones?