What Microsoft refuses to understand
If you are providing critical infrastructure services to the public, you need to plan for failure and redundancy in order to provide 365x24 service. If you don't, you are remiss and should lose your business. I was responsible for the design of an application development framework (in C++) for semiconductor, disc drive, and flat panel display manufacturing systems that had to support very large installations and thousands of users, and process terabytes of data per day, where 1 hour of downtime would cost the customer over $10M USD in lost profits - failure was not an option. We had to design the systems to be failure-resilient, with no single point of failure in the network, systems, database, or other components. This is not easy, but it is possible - if your system has a chip, disc drive, or LCD display, then that software most likely built it...
I left Microsoft shortly after they closed their purchase of Nokia Mobile Phones (where I was working), because they still refuse to understand this. My position was Senior Performance Engineer handling 5000+ servers worldwide, and the software I designed and wrote collected 10 billion data points of performance data per day so we could apply mathematical and engineering algorithms to monitor system behavior and predict when systems, networks, and databases might fail. Unfortunately, most of the people I worked with at Nokia will be looking for new jobs soon... :-(
In my opinion, this failure of Microsoft is inexcusable. I hate to see what will happen to Azure users in the future.