The perils (and costs) of automated bug finding
A few years ago I was responsible for running automated defect-finding tools (PREfix/PREfast) over a moderately sized codebase. These tools modelled the code and brute-forced their way to bugs, much as fuzzers do today.
The first runs were horrific: literally thousands of niggly edge cases, each with almost no impact on the user. I tuned the tools to ignore many of these (as a purist, this hurt!) because I knew that dropping 5,000 defect reports into the database would stop the business for a few weeks to sort out - even just to triage.
I tuned and tuned and eventually got the tools to produce a serviceable number of defect reports that wouldn’t swamp the company.
The problem was the nature of the defects the tools found. A typical report might read: "after the user types 'bananas', then on the 41678th loop through this networking-stack code, we access memory we don't own". These reports were 100% reproducible - they had been simulated - but they were almost useless.
First, each one needed investigation just to understand and triage. Then the developer would flatly insist it couldn't happen in the real world (far more often than they normally do). After a long conversation in which I convinced them that it COULD and indeed HAD happened, they would begin investigating these infernally complex and detailed situations. Often they'd have to dig through many levels of code they were unfamiliar with, which meant talking to other colleagues and so distracting a whole bunch of people; sometimes they would pull on a thread that took them down a deep rabbit hole and showed up bugs in the underlying architecture, at which point the system architects got involved, aggressively defending their design.
The bottom line is that these were the most expensive bugs to fix I've ever come across, for (usually) very little value to the company. Spending resource on them meant we were missing some big user-facing bugs, which were much more visible and consequently more painful for everyone involved.
After a few months of grappling with how to make these tools fit profitably into our organisation, I quietly stopped using them.
Nobody noticed.
I did some analysis of defect find-and-fix rates afterwards (just to make sure I hadn't goofed!) and there was no noticeable difference between the periods with and without the tools.
My conclusion was that this way of finding defects has a VERY limited application. When you can turn on a tap and find 1,000 real, reproducible defects at will, you need to be extremely careful about how you action those reports - and my honest feeling is that they're probably best ignored.