Re: Too hard, too frequent, too unreliable
The 5 nines uptime figure is about the availability of the service, not the downtime of individual machines.
So if you have 2 sets of machines where either can handle the full load, you take set 1 down and patch/test it.
Then bring that into production and repeat for set 2.
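For anyone who wants the procedure spelled out, here is a rough sketch of that rolling patch. The names `rolling_patch` and `patch_and_test` are made up for illustration, and it assumes any single remaining set can carry the full load, as described above.

```python
def rolling_patch(set_names, patch_and_test):
    """Patch one set at a time so at least one set is always serving.

    set_names: list of set names, all assumed to start in production.
    patch_and_test: hypothetical callable, returns True if the set is
    patched and passes its tests.
    """
    in_production = set(set_names)
    for name in set_names:
        # Refuse to pull the last serving set out of production.
        if not (in_production - {name}):
            raise RuntimeError("no other set left to carry the load")
        in_production.discard(name)  # take this set out of production
        if not patch_and_test(name):
            # The failed set stays out; the others keep serving.
            raise RuntimeError(f"{name} failed patch/test, left out of production")
        in_production.add(name)  # bring the patched set back
    return in_production


# Example: two sets, patching always succeeds.
print(rolling_patch(["set1", "set2"], lambda name: True))
```

The point of the check at the top of the loop is that the service only stays up if something else is still in production while a set is being patched.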
I have seen services run at 99.999% when components have been down for a couple of hours that week for patching, because this was planned maintenance work and the service itself stayed up throughout.
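To put rough numbers on it, here is a small sketch of the arithmetic. It assumes the service only counts as down when both sets are down at the same moment, and that the outage windows within each list do not overlap one another. The function names and the example windows are made up for illustration.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def allowed_downtime_minutes(availability_pct, period_minutes):
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1 - availability_pct / 100)


def overlap_minutes(a, b):
    """Total minutes where a window in `a` overlaps a window in `b`.

    Windows are (start, end) tuples in minutes since the period began.
    """
    total = 0
    for a_start, a_end in a:
        for b_start, b_end in b:
            total += max(0, min(a_end, b_end) - max(a_start, b_start))
    return total


def service_availability_pct(set1_down, set2_down, period_minutes=MINUTES_PER_WEEK):
    """Availability of the service, which is down only when both sets are."""
    down = overlap_minutes(set1_down, set2_down)
    return 100 * (1 - down / period_minutes)


# Five nines leaves roughly 0.1 minutes (about 6 seconds) per week.
print(allowed_downtime_minutes(99.999, MINUTES_PER_WEEK))

# Each set is down for 2 hours, but at different times in the week.
print(service_availability_pct([(0, 120)], [(600, 720)]))
```

With the two patch windows kept apart, each set is down for two hours but the overlap is zero, so the service-level figure is untouched.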