Been there, done that
No problem. The US tech had simply grabbed the disk from the PC running the testing and copied it to the other computers. "Obviously it worked because they are all up and running," he said.
I had a somewhat similar experience, although thankfully lesser in scope.
As in the article, I was working on a testbed simulator for a safety-critical system that was currently running in production. To be clear, this was a simulator to test an upgrade which had not been deployed yet.
We had multiple client X machines that communicated with server Y. Many of the X machines were mobile, and field testing of the upgrade showed that many X machines intermittently either lost communication, or had communication errors (duplicated, out of sequence, or lost messages). Sometimes.
The stationary clients were fine, but no one could determine the cause of the intermittent failures. The systems integrator said it was the hardware; the hardware vendor showed hardware test results proving the communications systems worked when stationary, so it had to be the mobility aspect; the mobility supplier ran pings and traceroutes showing it couldn't be them; so obviously, it had to be our proprietary protocol. Our protocol guys pointed out that we didn't even know whether an X was mobile or stationary, so how could the protocol fail only on the mobile clients?
To investigate the problem, I essentially wrote a customized ping command. My ping didn't use ICMP, though; it used our proprietary protocol. I made it a master/slave application: the master would timestamp a message packet, give it a master sequence number, and send it to the client. The client would in turn timestamp the arrival time, give it a slave sequence number, and send it back. When the master received it, it would log it to disk. The operators could configure the protocol's send rate, packet size, etc.
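The master/slave exchange can be sketched roughly like this. The real tool rode on a proprietary protocol whose wire format isn't described here, so the packet layout, field names, and the in-process "transport" below are all illustrative stand-ins:

```python
import struct
import time

# Hypothetical packet layout (the real proprietary format is not shown):
# master seq (uint32), master timestamp (double),
# slave seq (uint32), slave timestamp (double), network byte order.
PACKET = struct.Struct("!IdId")

def master_send(master_seq):
    """Master side: stamp departure time and master sequence number."""
    return PACKET.pack(master_seq, time.time(), 0, 0.0)

def slave_echo(data, slave_seq):
    """Slave side: stamp arrival time and its own sequence, echo it back."""
    m_seq, m_ts, _, _ = PACKET.unpack(data)
    return PACKET.pack(m_seq, m_ts, slave_seq, time.time())

def master_log(data):
    """Master side on return: derive the figures worth logging to disk."""
    m_seq, m_ts, s_seq, s_ts = PACKET.unpack(data)
    return {
        "master_seq": m_seq,
        "slave_seq": s_seq,
        "outbound_ms": (s_ts - m_ts) * 1000.0,   # meaningful only with synced clocks
        "round_trip_ms": (time.time() - m_ts) * 1000.0,
    }

# One exchange, done in-process here instead of over a real link:
record = master_log(slave_echo(master_send(1), 1))
```

Gaps or repeats in either sequence number column reveal lost, duplicated, or out-of-sequence messages, while the one-way outbound figure depends entirely on the clocks being in sync, which is exactly where the trouble mentioned below came from.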
The idea was to try it in the field, identify which clients/mobility locations were the problem, and play with the frequency and protocol payload to narrow down what was going on.
Of course, with all the time stamping and optimizations going on, there were several issues (time syncing was particularly troublesome), and although version 1 worked as a proof of concept, they wanted more features, so I got a budget for version 2.
And then a field report came in. My tool had reported a catastrophic loss of communication across an entire range of client machines where the actual upgrade application had no problems at all. So obviously my tool was crap, the customer had no faith in us because we had no clue what we were doing, how could we be trusted with safety-critical systems, etc. I had endangered the entire multi-million-dollar project, we would all be out on the street, our children would starve, and all because of me. The project manager wanted blood, specifically mine.
Hmm. Send me the logs, I said.
Looking at the logs, I noted that the master and slave packet formats were slightly different, which made no sense, because masters and slaves were paired. I was extending the packet format for version 2, but it hadn't been released yet. This looked almost like a version 1 master trying to communicate with a version 2 slave. So I asked them to confirm the MD5 checksums of the master and all slave versions in the field.
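The kind of build identification I asked for needs nothing more than checksums against a table of known releases. A minimal sketch; the version table is made up for illustration (the digest shown happens to be the MD5 of an empty file, which just keeps the example self-checking):

```python
import hashlib

# Known-good digests per released build (hypothetical values).
RELEASED = {
    "1.0": "d41d8cd98f00b204e9800998ecf8427e",  # MD5 of a zero-byte file
}

def md5_of(path):
    """MD5 of a file, read in chunks so large binaries don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def identify_build(path):
    """Return the matching release version, or None for an unknown build."""
    digest = md5_of(path)
    for version, known in RELEASED.items():
        if digest == known:
            return version
    return None  # e.g. an unreleased work-in-progress copy from the lab
```

A field binary that matches no released digest is exactly the red flag that surfaced here: software in the field that was never formally released.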
Sure enough, there was a version 1 master, and the offending slaves didn't match version 1. What on earth had happened?
It turned out that the project manager had been in our test lab, and seen a couple of the machines where I'd been testing version 2. He liked what he saw, so he made a copy of the software (the unreleased, work in progress software, which was being debugged) from the lab machine, and when he returned to site, he gave it to the field testers, who then ran with it.
Not surprisingly, when this came to light internally (thankfully, not to the customer), there was a reshuffling, and a new project manager arrived.
I promptly added two more features to the software. First, during the initial handshake, masters and slaves verified that they were running compatible versions, or they stopped talking. Second, I added a time/date check: the software would only work for 5 days after compilation, after which it would post an onscreen message saying it was expired beta software. I disabled that check in the formal release, but it put a stop to project-ending reports from the field.
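Both safeguards are trivial to sketch. The version number, the 5-day lifetime, and the build-time stamp below are stand-ins; in the real tool the build timestamp would be baked in at build time rather than taken at runtime:

```python
import time

PROTOCOL_VERSION = 2               # hypothetical version number
BETA_LIFETIME_SECS = 5 * 24 * 3600  # 5 days, per the expiry rule
BUILD_TIME = time.time()           # stand-in; really stamped at build time
IS_BETA = True                     # cleared in the formal release

def handshake_ok(peer_version):
    """During the initial handshake: refuse an incompatible peer."""
    return peer_version == PROTOCOL_VERSION

def beta_expired(now=None):
    """Beta builds stop working 5 days after the build timestamp."""
    if not IS_BETA:
        return False
    if now is None:
        now = time.time()
    return now - BUILD_TIME > BETA_LIFETIME_SECS
```

The handshake check alone would have caught the version 1 master talking to a version 2 slave; the expiry check makes sure a lab copy can't quietly live on in the field.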