Re: 9.5 hrs of downtime
Yes, single data center. No DR plan. I've never worked at a company in my career (24 years) that had a workable DR plan, even after near disasters. Everyone loves to talk DR until costs come up, then generally they don't care anymore. At my current company this happened every year for at least 5-6 years, then people stopped asking.
The closest one to having a DR plan actually invested in a solution on paper due to contract requirements from customers. However, they KNEW FROM DAY ONE THAT PLAN WOULD NEVER WORK. The plan called for paying a service provider (this was back in 2005) to literally drive big rig trucks filled with servers to our "DR site" and connect them to the network in the event of a DR event. They knew it wouldn't work because the operator of the "DR site" said there's no fuckin way in hell we're letting you pull trucks up to our facility and hook them up (they knew this before they signed on with the DR plan). They paid the service provider, I think, $10k/mo as a holding fee for the service.
That same company later deployed multiple active-active data centers (they had to be within ~15 miles or something to stay within latency limits) with fancy clustering and stuff for DR protection, years after I left. One of my teammates reached out to me joking that they were in the midst of a ~10 hour outage on their new high availability system (both sides were down; not sure what the issue was, I assume it was software related, like Oracle DB clustering gone bad or something).
At another company I was working on a DR plan at the time; it was not budgeted correctly and I spent months working on it. While this was happening we had a critical storage failure that took the backend of production out for several days. There was no backup array, just the primary. It was an interesting experience and I pulled so many monkeys out of my ass to get the system working again (the vendor repaired the system quickly but there was data corruption). Most customers never saw impact as they only touched the front end. I got the budget I was fighting for in the end, only to have it taken away weeks later for another of the VP's pet projects, which was also massively underfunded. I left soon after.
My current company had another storage failure on an end-of-life storage system, and guess what, the IT team had NO BACKUPS. Accounting data going back a decade was at risk. The storage array would not come up. I pulled an even bigger monkey out of my ass getting that system operational again (took 3 days). You'd think they would invest in DR, or at least a backup array? I think so. But they didn't agree. No budget was granted.
Rewind to 2007ish: we were hosting at the only data center I've ever visited that suffered a complete power outage (Fisher Plaza in Seattle). I was new to the company and new to that facility. It had previously experienced a power outage or two for various reasons; one was a customer hitting the EPO button just to see what it would do (the aftermath was that all new customers required EPO training). Anyway, I didn't like that facility and wanted to move to another one, but was having trouble getting approvals. Then they had another power outage and I got approvals to move fast. I remember the VP of engineering telling me he wanted out and didn't care what the cost was, and I was literally at the end of the proposal process with quotes ready to go. We moved out within a month or two. That same facility suffered a ~40 hour outage a couple of years later due to a fire in the power room, and the building ran on generator trucks for months while they repaired it. It made the news at the time; even "Bing Travel" was down for the duration since they had no backup site. Several payment processors were down too, at least for a while.
I read a story years ago about a fire in a power room at a Terremark facility. Zero impact to customers.
Properly designed and managed data centers almost never go down. There are poorly managed and poorly designed facilities, though. I host some of my own personal equipment in one such facility, which has had several full power outages over the past few years (taking the operator's websites and phone systems out at the same time); as far as I know there is no redundant power in the facility, which was designed in the 90s perhaps. It is cheap, though, and generally they do a good job. I wouldn't host my company's equipment there (unless it was something like edge computing with redundant sites), but for personal stuff it's good enough. It is sad, though, that there are fewer power outages at my home than at that data center.
Amazon and other hyperscalers generally build their data centers so they CAN GO DOWN. This is mostly a cost exercise: doubling or tripling up on redundancy is expensive. Many customers don't understand or realize this. Some do, and distribute their apps/data accordingly.
As someone who has been doing this stuff for 20+ years, I believe people put too much emphasis on DR. It's an easy word to toss around, but it's not an easy or cheap process. DR for a "data center" makes more sense if you are operating your own small-scale, on-site "server room/datacenter", for example. But if your equipment is in a proper data center (my current company's gear is in a facility with ~500k sq ft of raised floor) with N+1 power/cooling, ideally operated by someone with experience (the major players all seem to be pretty good), the likelihood of the FACILITY failing for an extended period of time is tiny.
To me, a DR plan is for a true disaster. That means your systems are down and most likely never coming back: equipment destroyed, or someone hacks in and deletes all your data. Outages such as power outages or other temporary events do not constitute a need to have or activate a DR plan. But it really depends on the org, what they are trying to protect, and how they want it protected. 99%+ of "disasters" can be avoided with proper N+1 on everything. You don't need remote sites, the complexity involved with failing over, or an app designed to be multi data center/region from the start, as the costs of doing that are generally quite huge for situations that almost never happen.
I've been involved with 3 different primary storage array failures over the past 19 years (all were multi-day outages in the end), and having an on-site backup storage array with copies of the data replicated or otherwise copied to it would address the vast majority of the risk when it comes to disasters. But few invest even to that level; I've only worked at one company that did, and they didn't do it until years after they had a multi-day outage on their primary storage array. I remember that incident pretty well, as I sent out the emergency page to everyone in the group on a Sunday afternoon. The Oracle DBA said he almost got into a car accident reading it. Seeing I/O errors when running "df" on the primary Oracle servers was pretty scary. That company did actually have backups, but due to budgeting they were forced to invalidate their backups nightly because they used them for reporting. So you couldn't copy the Oracle data files back, at least not easily; I don't recall what process they used to recover other than just deleting the corrupted data as they came across it (and we got ORA crash errors for at least 1-2 years afterward, though they only impacted the given query, not the whole instance).