Re: Fixing the symptom…
Yes. FP maths on all modern CPUs has to conform to the IEEE 754 standard. This is why the Pentium FDIV bug in the mid-1990s was such a big embarrassment for Intel.
Well, IEEE 754 is a storage format for real numbers plus some rules for operations. There's no guarantee that one's CPU will actually use it (even though it's quite common), no guarantee that the FPU or vector unit will actually produce answers as accurate as those theoretically possible for values stored as IEEE 754, and no guarantee that the FPU and the vector unit will produce identical results. There have been plenty of CPUs that store numbers as IEEE 754 but explicitly (that is, if one reads the data sheet in fine detail) do not guarantee full arithmetical accuracy for the operations performed on them. AltiVec and the Cell processor are two examples. I'm pretty sure that Intel has done the same, on occasion.
The reason this might be a factor is that arithmetical errors introduced early in a string of calculations can have a bigger impact on the result than errors introduced later. With CPU designs that take short cuts with FP arithmetic, the error magnitude is data dependent: the same operation with different data produces an arithmetic error of a different magnitude. So this application (which I am of course boldly, and possibly erroneously, assuming is heavy on floating point maths) may reasonably produce a different answer with files processed in different orders, even if the mathematics theoretically doesn't care about order, when run on a processor that doesn't guarantee exhaustive accuracy, or even consistent inaccuracy.
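In fact one doesn't even need short-cut hardware to see the order effect; here's a minimal Python sketch showing that even fully IEEE 754-compliant double addition is not associative, so summing the same values in a different order changes the answer:

```python
data     = [1e16, 1.0, -1e16]
shuffled = [1e16, -1e16, 1.0]   # the same values, summed in a different order

print(sum(data))      # 0.0 -- the 1.0 is swallowed when added to 1e16
print(sum(shuffled))  # 1.0 -- the big terms cancel first, so the 1.0 survives
```

The 1.0 is smaller than the spacing between representable doubles near 1e16, so whether it survives depends entirely on when it gets added. Files processed in a different order are exactly this situation writ large.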
Calculating the theoretical numerical error expected in the results produced by one's application by reading the CPU datasheets is hard work, and it has to be reassessed every time you change the code or run it on a different model of CPU. Without it one cannot say whether there's a statistically valid difference between answers such as 173.2 and 172.4, or whether either is close to being correct. This is much more likely to be a problem when the calculations involve a lot of data and a lot of cumulative operations, which a lot of science applications do.
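A quick way to see that cumulative rounding really does add up, using nothing but the Python stdlib (`math.fsum` is its correctly rounded summation):

```python
import math

vals = [0.1] * 1_000_000   # 0.1 has no exact binary representation

naive = sum(vals)          # one rounding error per addition, a million times over
exact = math.fsum(vals)    # correctly rounded sum of the very same values

print(naive - 100000.0)    # a visible error, on the order of 1e-6
print(exact - 100000.0)    # essentially zero
```

A million additions is nothing by science-application standards, and the error already shows up in the sixth decimal place.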
I've previously seen a couple of science applications that made errors in their use of FPUs on modern computers. Overflow is easy to run into, and it generally occurs silently (so there's nothing to tell you that something has gone wrong). To be confident that an application is performing reasonably correctly one has to be quite defensive, e.g. checking that input, intermediate and output vectors don't contain +INF or -INF, testing extreme values, etc. Checking that one's code has adequately implemented the required maths on a specific data set (i.e. not just test datasets) can be a big job, and it takes an especially nerdy computer boffin who is prepared to really look at what a CPU actually does.
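By way of illustration, a sketch of that kind of defensive check in plain Python (the `check_finite` helper is just something I've made up for the example):

```python
import math

def check_finite(name, xs):
    """Defensive check: fail loudly if a vector has picked up an INF or NaN."""
    for i, x in enumerate(xs):
        if not math.isfinite(x):
            raise ValueError(f"{name}[{i}] is non-finite: {x!r}")

x = 1e308
y = x * 10.0   # overflows silently to +inf: no exception, no warning at all
print(y)       # inf

check_finite("input", [1.0, 2.0, 3.0])   # passes quietly
try:
    check_finite("intermediate", [1.0, y])
except ValueError as err:
    print(err)   # intermediate[1] is non-finite: inf
```

Note that the overflow itself raises nothing; only the explicit check catches it, which is exactly why undisciplined code sails straight past it.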
You can even run into problems with library changes. Libraries that once ran on the FPU might have been optimised in later versions to run on the vector unit (SSE, AVX, or whatever). On vector units the arithmetic accuracy achieved can be worse than on the FPU, in the interests of outright speed.
Of course, a lot of applications don't care (games, etc.), and that's just fine; modern CPUs are all about being good enough for the bulk of applications. However, that doesn't suit all applications. Intel and chums do not pander to the 0.01% of users; they design to impress the 99.99%. The result is that almost all developers out there, whether professional software devs or scientists competent in Python, never have to give a moment's thought to whether the floating point arithmetic in their applications is actually correct, because for the vast majority of the time it's Good Enough.
This sort of problem shows up in various areas of computing. Financial calculations generally avoid binary floating point altogether, with IBM going so far as to put a decimal representation of real numbers into its more modern POWER architectures, and adding a decimal maths co-processor for good measure. There's even a GNU library for such representations. When dealing with $/€ billions, the precision of binary floating point is not adequate to get sums right down to the cent, and one thing that annoys a banker when converting dollars to euros by the billion is getting the cents wrong.
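Python's stdlib `decimal` module shows the contrast nicely (the exchange rate below is made up for the example):

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact representation,
# so even a trivial sum comes out a whisker off.
print(0.1 + 0.2)   # 0.30000000000000004

# Decimal arithmetic: every cent is represented exactly.
rate = Decimal("0.9213")                       # hypothetical USD -> EUR rate
usd  = Decimal("1000000000.00")                # a billion dollars
eur  = (usd * rate).quantize(Decimal("0.01"))  # round to whole cents
print(eur)   # 921300000.00
```

The `quantize` call pins the result to exactly two decimal places, which is the banker's view of the world: cents are exact, full stop.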