Re: The single points of failure?
I agree, the power excuse is likely a red herring / BS. It's also an easy target for software guys who investigate a failure and are tasked with providing an RCA. I was once asked by a Sysad to check the power to a particular server that had gone offline; the server PSUs were fine, as were the rack PDUs and all other upstream power systems. However, in the absence of any other evidence (such as syslogs, etc.), the guy still reported the cause as a power failure at the data centre. The most likely cause IMO was that the server had either shut itself down, or been shut down accidentally by a Sysad (or a DBA 'trusted' with the root password).
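On that point: one quick sanity check is whether the OS actually logged a clean shutdown sequence. A genuine power failure tends to leave the log ending abruptly, whereas an operator- or OS-initiated shutdown leaves tell-tale entries. A minimal sketch of that check, assuming a syslog-style log file; the path and the message patterns below are my own illustrative placeholders, not anything from the BA reports:

import re
import sys

# Illustrative patterns only -- real wording varies by distro and init system.
SHUTDOWN_PATTERNS = [
    r"systemd-shutdown",
    r"shutting down for system halt",
    r"init: Switching to runlevel: 0",
]

def find_shutdown_evidence(log_path: str) -> list[str]:
    """Return log lines suggesting the OS shut itself down cleanly."""
    pattern = re.compile("|".join(SHUTDOWN_PATTERNS), re.IGNORECASE)
    hits = []
    with open(log_path, errors="replace") as fh:
        for line in fh:
            if pattern.search(line):
                hits.append(line.rstrip())
    return hits

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    evidence = find_shutdown_evidence(path)
    if evidence:
        print("Clean-shutdown entries found -- a power failure looks unlikely:")
        for line in evidence[-5:]:
            print("  ", line)
    else:
        print("No shutdown entries; log may simply stop dead (consistent with power loss).")

Nothing fancy, but it's the sort of evidence the RCA in my anecdote never bothered to gather.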
Back to BA: your #2 scenario sounds feasible and may tie in with the early reports of incorrect data on boarding passes due to corruption. The same outcome could also stem from database corruption caused by a hosed disk partition. The bottom line, though, is probably one of: 1. poor HA design, 2. inadequate / absent testing, or 3. deficient routine maintenance.
Note also that when large, complex systems are outsourced, knowledge transfer to the new company / teams is often skimped on, or even overlooked, in the rush to get the business and operations transferred. As others have pointed out, you will never get back the depth & breadth of experience you had in the original system designers and custodians. Even if you re-hire some of them at consultancy rates, they still have to liaise / battle with the new guys, whose work culture may be quite different.