Re: The two extremes of testing
At a previous job, I built some SMTP relays: highly tuned Linux servers with lots of disk space for mail spools. I published benchmarks of something like 1 million messages per day per server.
Years later, another team took over responsibility for email. Their tech lead launched a Great Project to replace this old, low-budget infrastructure with world-class appliances. He got test gear and quotes from three vendors, any of which utterly eclipsed the performance (and cost) of the Linux relays. Someone got me invited to help test.
My first suggestion was to list every failure mode each device was protected against: failed drives (RAID), redundant power supplies, battery-backed write cache, and so on. Then induce every failure mode in rapid sequence. Yank a drive, wait a few minutes, re-insert it, and wait until it starts rebuilding the array. Then pull power from one supply, reconnect it, yank the other power cord, reconnect it, then pull both. Power the box back on, then start the performance test.
One appliance failed immediately. The others were badly impaired. Vendors got upset, and the tech lead was livid because he didn't look good either.
At a meeting with managers and vendors, the tech lead called me out on this BS: I was only sabotaging the project out of jealousy. He pointed out that the old relays had never been put through anything like that.
I showed them my original test results, where I had done exactly the same tests, but had first queued up 7 days of email on the Linux relays "because it could happen."
One appliance really shone. We bought that one. He never invited me to another project.