"Reg readers expect their apps and online services to just work 24/7"
I think Reg readers are among the few who *don't* expect that, actually... unless I missed the sarcasm.
South Korea's two largest domestic internet companies, Naver and Kakao, have experienced significant service interruptions after the datacenter that hosts much of their infrastructure was shut down by a Sunday fire. The datacenter in question is operated by SK C&C, one of the many arms of South Korean conglomerate SK. SK C&C …
There are three types of companies.
The first has no disaster recovery strategy. After its first disaster it usually becomes a No. 2 or a No. 3, depending on the severity of the situation.
The second has an untested disaster recovery strategy. It usually flips over to a No. 3 once a major outage shows that its DR strategy is not up to scratch.
And then the third has a tested disaster recovery strategy, and recovers from most major incidents. It takes time and money to keep DR strategies tested and up to date, but it is money well spent. The enemy of this strategy is a miserly beancountery type or miserly bossly unit who thinks it is just a waste of good money and that winging it will work out best.
It's not feasible to duplicate every bit of hardware/software for DR. But it *IS* important to have a plan that keeps things moving, even if it is slower than usual, so that the spice can flow.
It's also important to test it. That gut-wrenching moment when you drop power to a switch, your primary system chokes for a moment, then the DR kicks in and everything fails over to the redundant system. A live test. On real data. So that when the excrement collides with the atmospheric redirection device, you know the cat pictures and actual work data keep moving.
The Japanese govt. responded in a similar way when their phone system stopped working and one of their big banks kept having problems. Govt. enquiry. Finger wagging. See it doesn't happen again.
They just don't get it. Digital is less resilient than physical. Star topology is less resilient than distributed services. Any tech service will eventually be bricked, probably by an update.
And if you put all of your eggs in one digital basket, it becomes much easier to brick everything, completely.
Incidentally, having national services that differ from everyone else's really isolates you in a networked world. Other nationalist regimes would love to be able to cut their citizens off in such a fashion.