About Ariane 5...
More thorough testing could have caught the problem.
More thorough testing could ALWAYS have caught the problem. This is an "empty" truth.
As Bertrand Meyer, among others, says in "The Lessons of Ariane":
Is it a testing error? Not really. Not surprisingly, the Inquiry Board's report recommends better testing procedures, and testing the whole system rather than parts of it (in the Ariane 5 case the SRI and the flight software were tested separately). But if one can test more one cannot test all. Testing, we all know, can show the presence of errors, not their absence. And the only fully "realistic" test is to launch; this is what happened, although the launch was not really intended as a $500-million test of the software.
More relevant was a software configuration error:
Particularly vexing is the realization that the error came from a piece of the software that was not needed during the crash. It has to do with the Inertial Reference System, for which we will keep the acronym SRI used in the report, if only to avoid the unpleasant connotation that the reverse acronym could evoke for US readers. Before lift-off certain computations are performed to align the SRI. Normally they should be stopped at -9 seconds, but in the unlikely event of a hold in the countdown resetting the SRI could, at least in earlier versions of Ariane, take several hours; so the computation continues for 50 seconds after the start of flight mode -- well into the flight period. After takeoff, of course, this computation is useless; but in the Ariane 5 flight it caused an exception, which was not caught and -- boom.
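A minimal Python sketch of the failure mode Meyer describes, assuming, as the Inquiry Board's report describes, that the failing operation converted a 64-bit floating-point value to a 16-bit signed integer and that this conversion traps (raises) on overflow. `to_int16_checked`, `alignment_outputs`, `ConversionOverflow` and the sample values are all hypothetical; the real SRI code was written in Ada, and this is only a model of its behaviour.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767

class ConversionOverflow(Exception):
    """Raised when a value cannot be represented as a 16-bit signed integer."""

def to_int16_checked(x: float) -> int:
    # Trapping conversion: out-of-range input raises instead of returning a value.
    if math.isnan(x) or math.isinf(x):
        raise ConversionOverflow(x)
    n = math.trunc(x)  # truncate toward zero, like a typical float-to-int cast
    if not INT16_MIN <= n <= INT16_MAX:
        raise ConversionOverflow(x)
    return n

def alignment_outputs(samples, window_end=50.0):
    """Hypothetical stand-in for the SRI alignment computation.

    samples: (t, value) pairs, t in seconds after the start of flight mode.
    The loop keeps running until t = window_end even though its result is
    useless after lift-off, and nothing in it catches ConversionOverflow.
    """
    outputs = []
    for t, value in samples:
        if t > window_end:
            break  # alignment window over
        outputs.append(to_int16_checked(value))  # no handler here
    return outputs

# Invented sample values, for illustration only.
print(alignment_outputs([(-5.0, 120.0), (10.0, 900.0)]))  # [120, 900]
try:
    alignment_outputs([(10.0, 900.0), (30.0, 40000.0)])
except ConversionOverflow as exc:
    print(f"overflow on {exc.args[0]}")  # overflow on 40000.0
```

The `try` at the bottom exists only so the sketch runs to completion; in the scenario Meyer describes there was no such handler, so one out-of-range value in a computation nobody needed any more took down its caller.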
More interesting is William Kahan's take, in https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf:
A commission of inquiry with perfect hindsight blamed the disaster upon inadequate testing of the rocket’s software. What software failure could not be blamed upon inadequate testing? The disaster can be blamed just as well upon a programming language (Ada) that disregarded the default exception-handling specifications in IEEE Standard 754 for Binary Floating-Point Arithmetic. Here is why: Upon launch, sensors reported acceleration so strong that it caused Conversion-to-Integer Overflow in software intended for recalibration of the rocket’s inertial guidance while on the launching pad. This software could have been disabled upon rocket ignition but leaving it enabled had mistakenly been deemed harmless. Lacking a handler for its unanticipated overflow trap, this software trapped to a system diagnostic that dumped its debugging data into an area of memory in use at the time by the programs guiding the rocket’s motors. At the same time control was switched to a backup computer, but it had the same data. This was misinterpreted as necessitating strong corrective action: the rocket’s motors swivelled to the limits of their mountings. Disaster ensued. Had overflow merely obeyed the IEEE 754 default policy, the recalibration software would have raised a flag and delivered an invalid result both to be ignored by the motor guidance programs, and the Ariane 5 would have pursued its intended trajectory. The moral of this story: A trap too often catches creatures it was not set to catch.
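For contrast, a sketch of the non-trapping behaviour Kahan argues for: the conversion raises a sticky "invalid" flag and returns a result, and execution simply continues. This is a Python model, not Ada and not any real FPU interface; `Flags`, `to_int16_default` and `INVALID_INT16` are hypothetical names, and the sentinel value returned on overflow is an arbitrary choice for illustration.

```python
import math
from dataclasses import dataclass

INT16_MIN, INT16_MAX = -32768, 32767
INVALID_INT16 = INT16_MIN  # assumed sentinel for an invalid conversion

@dataclass
class Flags:
    invalid: bool = False  # sticky: stays set until explicitly cleared

def to_int16_default(x: float, flags: Flags) -> int:
    """Non-trapping float-to-int16 conversion in the spirit of IEEE 754 defaults."""
    if math.isnan(x) or math.isinf(x):
        flags.invalid = True
        return INVALID_INT16
    n = math.trunc(x)  # truncate toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        flags.invalid = True  # record the problem instead of trapping
        return INVALID_INT16
    return n

# Invented sample values: the out-of-range one yields the sentinel, no exception.
flags = Flags()
results = [to_int16_default(v, flags) for v in (120.0, 900.0, 40000.0)]
print(results, flags.invalid)  # [120, 900, -32768] True
```

Whoever cares about the result can inspect `flags.invalid` afterwards; code that never reads the value, like the motor guidance programs in Kahan's account, is unaffected.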