@various
So many posts, and some interesting insights emerging. I know not of the IEC standards (which is a failure on my part) but then nor did most of my colleagues in the safety critical world of DO-254 and DO-178 (aviation), so I'll stick to aerospace.
"The big problem is that no matter how much you expand your list of chosen inputs, there are an infinite number possible so you can never test them all."
Not infinite, but quite possibly inconveniently large (especially from the point of view of a bean counter). What can we do to address that?
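To put a rough number on "inconveniently large", here is a back-of-envelope sketch. The routine with two independent 32-bit inputs and the rate of a billion test cases per second are both assumptions picked for illustration, not figures from any real project.

```python
# Back-of-envelope: exhaustive testing of a hypothetical routine
# with two independent 32-bit inputs. All figures are assumptions.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def exhaustive_test_years(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every value of an input_bits-wide input space."""
    total_cases = 2 ** input_bits  # finite, but grows exponentially with width
    return total_cases / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_test_years(input_bits=64, tests_per_second=1e9)
    print(f"2**64 cases at 1e9 tests/s: about {years:.0f} years")
```

Under those assumptions the answer comes out at roughly 585 years: finite, but not something a project plan can absorb.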
Well, choosing your test input values carefully may help. What does that mean?
Random (Monte Carlo) style choice of inputs is an option already mentioned, but the space of test inputs gets big quite rapidly, especially if time dependencies come into the picture. And you still have no idea whether the important cases have been covered.
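A small sketch of why random choice on its own gives little assurance about the important cases. It assumes inputs are drawn uniformly and independently, and the single failing value hidden in a 32-bit input is a made-up scenario.

```python
# Chance that uniform random testing hits at least one failing input.
# Assumes independent, uniformly drawn inputs; the numbers are illustrative.


def hit_probability(samples: int, space_size: int, failing_inputs: int = 1) -> float:
    """P(at least one of `samples` random draws lands on a failing input)."""
    miss_one = 1.0 - failing_inputs / space_size  # chance a single draw misses
    return 1.0 - miss_one ** samples


if __name__ == "__main__":
    # Hypothetical case: one bad value hidden in a 32-bit input.
    for n in (10**3, 10**6, 10**9):
        print(f"{n:>13,} samples: {hit_probability(n, 2**32):.6f}")
```

With these assumed numbers, even a billion random draws find the bad value only about one time in five, and nothing in the test log tells you which side of that you landed on.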
Time dependencies (including things like data stored from a previous iteration) are quite inconvenient from a testing point of view.
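To illustrate the inconvenience, a minimal sketch using a hypothetical debouncer (the class and its three-sample rule are invented for the example). Because it stores a count from previous iterations, the same input value produces different outputs depending on history, so a test case has to be a whole input sequence rather than a single value.

```python
# A hypothetical debouncer: the output depends on stored history,
# so a single input value does not determine the result.


class Debouncer:
    """Reports True only after `needed` consecutive True samples."""

    def __init__(self, needed: int = 3) -> None:
        self.needed = needed
        self.run_length = 0  # state carried over from previous iterations

    def step(self, sample: bool) -> bool:
        # A False sample resets the run; a True sample extends it.
        self.run_length = self.run_length + 1 if sample else 0
        return self.run_length >= self.needed


def run(samples):
    """Feed a whole input sequence to a fresh Debouncer."""
    debouncer = Debouncer()
    return [debouncer.step(s) for s in samples]


if __name__ == "__main__":
    print(run([True, True, True]))         # last True trips the output
    print(run([True, False, True, True]))  # last True does not
```

Even with a one-bit input there are 2**n distinct sequences of length n to choose from, which is how the test space grows so quickly once time is involved.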
One can look at the code's decision points and choose values appropriate for testing the various possible outcomes of the various decisions. This can be done manually or with tools. In suitable combinations, preferably. It's usually a big number of combinations and a lot of testing but it's a lot less than infinite.
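A minimal sketch of the decision-point idea, done by hand. The function `warning_level`, its two thresholds, and the "just below, on, just above" rule are all assumptions for illustration; a real tool would extract the decision points from the source or binary rather than have them typed in.

```python
from itertools import product

# Hypothetical unit under test with two decision points.
SPEED_LIMIT = 250
ALT_FLOOR = 1000


def warning_level(speed: int, altitude: int) -> str:
    if speed > SPEED_LIMIT:   # decision point 1
        return "overspeed"
    if altitude < ALT_FLOOR:  # decision point 2
        return "low"
    return "ok"


def boundary_values(threshold: int):
    """Values just below, on, and just above an integer threshold."""
    return [threshold - 1, threshold, threshold + 1]


def decision_point_cases():
    """All combinations of boundary values for the two decisions."""
    return list(product(boundary_values(SPEED_LIMIT), boundary_values(ALT_FLOOR)))


if __name__ == "__main__":
    for speed, altitude in decision_point_cases():
        print(speed, altitude, warning_level(speed, altitude))
```

Two decisions with three values each give nine cases, which between them reach every outcome of this toy function. The count still multiplies with every extra decision, but it starts from a far smaller base than the raw input space.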
When looking at decision points, does one look at the unmodified uninstrumented source code, or the binary end result that's actually been generated by the tool chain? Knowing that they match is another question entirely - sadly some test tools deliberately make their own (unnecessary and irrelevant) modifications to the program being tested, therefore making it impossible to know that what's been tested matches what's been shipped, either in source or binary forms.
That said, looking at source is relatively simple and relatively compatible with bean counters (and also frequently suits tool vendors), but when push comes to shove, processors don't execute source [1], they execute binaries.
It is in principle possible to do static analysis of some classes of binary, spotting the decision points (ignoring most of the rest of the code) and generating test inputs accordingly. You don't need a simulator of the whole processor for this, and it may be substantially quicker than some other options.
Something similar is possible by running the code in a more comprehensive simulator/emulator and tracing the decision points, but the requirement for a more comprehensive simulator is sometimes inconvenient.
These two approaches both rely on the fidelity of the analysis/emulation/trace tools. You can eliminate the dependence on tools by running in real systems, but in the embedded market, the real systems tend to be sufficiently slow that testing becomes a proper chore, even if it's offshored to a place where time is very little money.
A wise mix of the various options would often make sense from an engineering point of view but the additional time and cost have often been unacceptable to Management I have known.
Did anybody mention that time-dependent effects at run-time are a challenge?
None of which is a replacement for having clued-up design and verification teams in the first instance, but such teams tend to be inconveniently expensive, and can risk delaying the project (and expanding the budget) beyond what Management have promised.
[1] A username round here mentions Forth. Maybe someone should look at something Forth-like as a language for high criticality software. Simple, compact, maybe even safe (albeit quite possibly impenetrable to the average contractor and bean counter). Arguably doesn't even need a trustworthy compiler, certainly doesn't need a complex untestable unprovable compiler.
"software is impossible to test; no matter how thorough the testing you do, you're trying to prove a negative by searching every part of an infinite space."
How about "testing cannot demonstrate the absence of errors, but it can show when they are present"? NB that applies not just to software testing, but software does make life particularly tricky, especially as the failure modes of a software-based system are quite hard to predict.