Re: Disaster Recovery - they've heard of it
Several work hats ago we (IT) got one of those "sounds really simple, but ..." one-liners from manglement: "write disaster recovery plans for IT". IIRC it was the insurance auditors who'd been through and seen a box not ticked.
I had a bit of an idea, but almost by chance had the opportunity to get on a decent business continuity course - which was quite an eye-opener. So when I got back, I asked manglement some basics: "what's your recovery time objective?" and a few others. The answer: "stop being awkward and just get on with it".
As pointed out in the course, there's no point planning to be able to recover the IT within (say) 2 days if the business will be dead after a 6 hour outage - the time and money spent on the plan would be wasted, as there'd be no business left to recover. Conversely, there's no point planning the IT stuff to have a recovery time of (say) 12 hours (quite tricky and expensive) if the business could survive a week's outage (simpler and a lot cheaper to plan for).
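For anyone who prefers it spelled out, that comparison can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not anything from the course: the function name, the labels it returns and the "less than half" threshold for calling a plan over-engineered are all my own assumptions.

```python
def assess_plan(rto_hours, max_tolerable_outage_hours):
    """Compare the IT recovery time objective (RTO) with how long
    the business can survive without the IT.

    Both names and the 50% threshold are illustrative assumptions.
    """
    if rto_hours > max_tolerable_outage_hours:
        # The business is dead before the IT comes back.
        return "too slow"
    if rto_hours < max_tolerable_outage_hours / 2:
        # Recovering far faster than needed usually costs far more.
        return "possibly over-engineered"
    return "reasonable"


# The two cases from the paragraph above:
print(assess_plan(48, 6))     # 2 day recovery, business dead in 6 hours
print(assess_plan(12, 168))   # 12 hour recovery, business can last a week
```

The point is that neither number means anything on its own; manglement has to supply the second one before IT can sensibly pick the first.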
The pipeline and medical cases others have cited are good examples. The recovery times for the systems just didn't match the requirements of the business (or its customers, or the public) - so the business manglement was negligent in not having properly assessed that and put the "right" technical and process measures in place. That's not just "recover the IT faster", but also things like "how do we operate a basic service without the IT?" - in the case of the pipeline, for example, that could have been "have a plan so that if the brown stuff hits the fan, we can deploy people to local control points to manually operate the systems".
With current work hat on, we deal with systems where if the brown stuff hits the fan, people can die - quickly (in some cases it's a matter of minutes). As you can imagine, a lot goes into the safety cases and engineering. But it's not just the engineering: the people who operate the systems are highly trained and routinely practise all manner of scenarios so that if (or given the complexity and nature of the environment, when) something happens, people will "just deal with it" rather than go into headless chicken mode. And while there's centralised control (essentially one person can operate everything from a chair), everything has local control options.
So no, "our systems had to be back up quickly" is NOT an excuse for paying. It's an admission of (at best) incompetence or (at worst) wilful disregard for public safety.