Everyone cites that 'bugs are 100x more expensive to fix in production' research, but the study might not even exist

"Software research is a train wreck," says Hillel Wayne, a Chicago-based software consultant who specialises in formal methods, instancing the received wisdom that bugs are way more expensive to fix once software is deployed. Wayne did some research, noting that "if you Google 'cost of a software bug' you will get tons of …

  1. BebopWeBop
    Thumb Up

    It is always good to have someone occasionally look a wee bit harder at the things 'we all know'.

    1. HildyJ Silver badge
      Thumb Up

      It's hard for us to admit but "they say" is as applicable to IT as it is to the internet in general.

    2. AVee

      Research like this is pretty useless anyway. With or without source, statements like "100x more expensive" are just as useful as saying "people are 5 feet 3 inches tall". You can argue about that being the correct average or not, but it certainly is wrong in the vast majority of cases.

  2. Warm Braw Silver badge

    Equally unattributed, but different...

    ... figures are given in a Digital Equipment Corporation publication of the 1980s which suggested that it was 50% more expensive to fix a bug at the coding stage than at the design stage, 10 times more expensive at the integration stage, 60 times more if it’s caught in alpha testing and 100 times as expensive to fix in customer beta test.

    Of course, in those days if a customer encountered a bug you’d have to plough through crash dumps arriving by international courier or send an engineer on site to sort out problems so it was definitely worth avoiding.

    There's an inherent difficulty in measuring these figures in a meaningful way: whereas you can quantify the amount of time actually spent fixing bugs encountered in these different situations, it's difficult to reliably estimate the amount of time that would have been spent finding those same bugs at an earlier stage in the development process. It may be that the extra cost of fixing bugs in production is significantly offset by the amount of time it would have otherwise taken to uncover them in development, particularly in complex multi-threaded, multi-component systems.

    Arguably, if you feed "lessons learned" from production bugs back into the development process you can mitigate a proportion of the cost, but how many shops are really that organised?

    1. cschneid

      Re: Equally unattributed, but different...

      > [...] how many shops are really that organised?

      Aren't all IT shops CMMI Level 5 self-certified these days?

      1. Claptrap314 Silver badge

        Re: Equally unattributed, but different...

        The emphasis on self, of course...

        1. chuckamok

          Re: Equally unattributed, but different...

          Dunning-Kruger effect is so strong in IT, we have many masking methods for it.

    2. Nifty Silver badge

      Re: Equally unattributed, but different...

      "50% more expensive to fix a bug at the coding stage than at the design stage, 10 times more expensive at the integration stage, 60 times more if it’s caught in alpha testing and 100 times as expensive to fix in customer beta test"

      True for one group of products, but a huge generalisation there.

      I worked with 2 products for comparison. One was data driven with a lot of configurability - quite variable in the way it could be used. The second had a built-in programming language that everyday users could use, so it was not just configurable via config files and data driven but also programmable. The variability of usage was near infinite. I would say that genuine, full scenario and regression testing was near impossible. So instead, the approach was to provide powerful debugging and crash analysis tools for the real-time system. This was targeted at break/fix and not at the design stage.

    3. Anonymous Coward
      Anonymous Coward

      Re: Equally unattributed, but different...

      As someone who used to debug core dumps from IBM M/F systems software in the late 80s, I'd disagree with the premise. It used to cost IBM less to fix later because the customer did the diagnosis.

      1. Hazmoid

        Re: Equally unattributed, but different...

        More to the point, that is how Windows and most M$ products are debugged aren't they ? ;)

    4. Anonymous Coward
      Anonymous Coward

      Re: Equally unattributed, but different...

      I was an Operating Systems consultant for ICL in the 80s; part of my role was 2nd line support. If a code issue was found in development it could be fixed in a couple of hours.

      To get to the starting point of a fix after code was live on customer machines, a bug report would have to be raised, prescanned by first line, then passed across to 2nd line for further diagnostics. At that stage it would go to anyone in the team, who would investigate whether it was a known error, or a different manifestation of an earlier bug for which a patch already existed. This would often result in having to ask for further evidence. If it was being sent back to 3rd line it would normally be reviewed by someone who was a specialist in that part of the Operating System (we got a lot of stick if we passed back something to 3rd line for which there was already a fix available). We had access to the O/S source code and could sometimes even design the patch in advance of transferring it to 3rd line.

      If a patch could be written, it would then require testing at 3rd line before being issued for customer release. This meant obtaining time on a test mainframe (there was only one). You can easily see how the cost of fixing a bug in production in this situation could easily be 100x fixing it in development.

      Scarily, even with these processes in place, some patches would reach high version numbers (I think the highest I saw was 13) before finally fixing the problem fully. If the issue was critical and several patch versions were required, someone like me would be liaising between 3rd line and the customer around patch application and testing; often this required on-site attendance to manage the customer relationship, and could also involve the Account Manager as well. When I started, the patch would be sent to us via telex; thankfully, by the time I left we could apply them remotely.

  3. elregidente

    For the love of God, stop saying "methodology" - these are all *methods*

    Sociology is the study of societies.

    Methodology is the study of methods.

    "I'm studying British sociology" is not the same as "I'm studying British society".

    Any given way of doing something is a *method*, not a methodology.

    1. Potemkine! Silver badge

      Re: For the love of God, stop saying "methodology" - these are all *methods*

      Are you sure? :~

      Definition of methodology

      1 : a body of methods, rules, and postulates employed by a discipline : a particular procedure or set of procedures

      * demonstrating library research methodology

      * the issue is massive revision of teaching methodology

      — Bob Samples

      2 : the analysis of the principles or procedures of inquiry in a particular field

      1. dakra

        Re: For the love of God, stop saying "methodology" - these are all *methods*

        You can utilize the word "methodology" as long as you specify its functionality.

        However, it is better to use the word "method" and to specify its function, unless you are a consultant, in which case, please continue to use the sesquipedalian words.

        1. BillG
          Thumb Up

          Re: For the love of God, stop saying "methodology" - these are all *methods*

          Upvoted because sesquipedalian is my favorite word. Also, sesquipedalian is sesquipedalian.

  4. Anonymous Coward
    Anonymous Coward

    Go agile, go!

    Oh, well. Bugs are OK then.

    1. Potemkine! Silver badge

      The Zen way

      Bugs are not bad. You must accept your bugs.

      1. A.P. Veening Silver badge
        Joke

        Re: The Zen way

        Bugs are not bad. You must accept your our bugs.

        FTFY, from the MS manual.

    2. Alumoi Silver badge

      Re: Go agile, go!

      It's not a bug, it's a feature!

      1. the Jim bloke Silver badge

        Re: Go agile, go!

        and on that note, we now have a full-featured release.

      2. Coen Dijkgraaf

        Re: Go agile, go!

        > It's not a bug, it's a feature!

        It is only a feature once it has been documented

        1. richdin

          Re: Go agile, go!

          There are undocumented features, and there are documented unfeatures (vaporware)

          1. Charlie van Becelaere

            Re: Go agile, go!

            It's bugs all the way down!

  5. deadlockvictim Silver badge

    Fixing things long after they have gone live

    Fixing things long after they have gone live can be an expensive process, partly because of the modern bureaucratic way we develop software.

    During development, the people writing the code, the people writing user stories, the project managers, management and the testers are all available and willing to iron out problems discovered during development. Funding is available, expertise is available as are the people who understand and codify the business logic. Releases are made frequently and sometimes regularly.

    Compare that to the time when a bug is discovered. The team who did the work originally may very well not be working on the project anymore (or even be in the company). Depending on who is pushing to get the bug fixed, it may not get the weight it deserves from management. If there is a new team working on this product, they will all have to read up on how it should have been, on how the code is at the moment, what consequences might arise from this change. The testers will have to get to work on getting up to date on the changed functionality, if convenient or nicely-written test-cases are not available.

    Now, to be sure, this extra work to push against the bureaucratic flow is not 100x more expensive, but it *is* more expensive than getting it done right while the original team were in full development mode.

    1. HammerOn1024

      Re: Fixing things long after they have gone live

      One thing not mentioned in the cost here: are you willing to bet your company's reputation and cash on not fixing a bug?

      As several companies have recently found out, how expensive was that cyber attack? What did it cost your company in brand reputation? How much did it physically cost your clients to repair the damage?

      How many lawsuits did your company just swallow because of a bug?

      I'll put a paycheck on a Las Vegas bet that 100x would be cheep compared to the cost of a very ugly lawsuit train.

      1. AVee

        Re: Fixing things long after they have gone live

        Just today I fixed a bug. Customer called, told me he was getting an error when they did some specific thing. I pulled up the logs, found an exception with a stacktrace containing file names and line numbers. Took a look at the code, made some trivial changes, and 15 minutes after they called the fix was in production.

        I promise you that this boosted our reputation more than not having the bug would have done. It can swing both ways.

        Thing is, this was a trivial bug without real consequences. Not all bugs are the same. Not all software is the same, not even all components of a single piece of software are the same. There is a huge difference between say a print preview not working or salaries not being paid or a huge gaping security hole. Tests and review efforts should be directed accordingly.

      2. grumpy-old-person

        Re: Fixing things long after they have gone live

        Cheep?

        Chicken-feed!

    2. yetanotheraoc Silver badge

      Re: Fixing things long after they have gone live

      The byproduct of trying not to let bugs get into production in the first place is called "documentation". A perfect recent example is the documentation for the Hubble Space Telescope, without which NASA could never have revived the telescope; of that I am sure. From my own experience, even if I am the original developer I still rely on my own documentation for certain key details, which otherwise would be lost in the mists of time. And of course your scenario of not having the original team available will be made immeasurably worse if the documentation is shoddy.

    3. a_yank_lurker Silver badge

      Re: Fixing things long after they have gone live

      The biggest problem with fixing bugs is whether there is anyone around who has the process knowledge and familiarity with the code base; this can be different people. If both are true then many bugs are relatively fast and cheap to fix once the correct people are included in the conversation. If either is lacking then fixing the bug becomes more expensive as whoever is assigned the task has to figure out what is going on and then figure out a reliable fix. What might take a couple of hours with knowledge might take a day or 2 of learning before even doing any coding.

  6. Nudge Away

    Train Wreck Of A Newsletter ?

    The Newsletter seems to focus on the cost involved to fix a software problem within the software team.

    There are numerous other teams involved including testing, production, shipping, field engineers etc. (not to mention sales & marketing - ok I just did !).

    If a defect is detected early, self-contained within the development team and rectified in a timely manner, the other teams need not be involved, hence the costs can be relatively small.

    However, defects are often like snakes and ladders, sending a design back many steps - and many teams - possibly to the very beginning: it doesn't take an expert to figure out that the cost is generally cheaper the smaller the snake!

  7. Derek Jones

    Of course reviewing code finds coding mistakes; the question is whether this is a cost-effective approach to software product development.

    Yes, mistakes fixed in later phases will involve more effort; some evidence: http://shape-of-code.coding-guidelines.com/2020/08/23/time-to-fix-when-mistake-discovered-in-a-later-project-phase/

    but would it be more expensive, on average, to have fixed these mistakes earlier?

    Products have a finite lifetime; some evidence: http://shape-of-code.coding-guidelines.com/2018/11/28/half-life-of-software-as-a-service-services/

    Some coding mistakes are never experienced as faults by customers before the product is withdrawn; some evidence: http://shape-of-code.coding-guidelines.com/2020/12/20/many-coding-mistakes-are-not-immediately-detectable/

    Figuring out the most cost-effective approach is very hard.

    1. W.S.Gosset Silver badge

      Hi Derek, good work on sourcing those data.

      A quibble: your Lifetime post's text mixed up red & blue vs the graph, and also there's clearly more than a dozen mainframe points. You might like to correct it. Other than that, good analysis.

      Re the bugfix times: the startlingly short times for the Avionics software implies startlingly simple bugs, implying startlingly poor quality code in the first place. Unsettling, given it's presumably controlling aeroplanes...!

  8. Anonymous Coward
    Anonymous Coward

    Cost of a defect is a 2 dimensional problem

    Many years ago I worked for IBM developing a System 390 product (on VS1/DOS). People had to record each week where they spent their time: for example, writing the spec, reviewing the spec, code, unit test, FV prep, FV execution, FV fix validation, system test prep, system test execution... system test fix validation, golden code drop execution. There was also the cost of support: time spent in PMRs, time spent diagnosing problems, time fixing problems, etc.

    We had one guy whose job it was to collate all this stuff and come up with the numbers.

    The numbers were pretty consistent. These numbers were reviewed by an executive who had oversight in many labs.

    Dimension 1 - count the number of people

    For a bug found in unit test, one person was affected.

    For a bug in FV there were 2 people affected (developer and FV)

    For a bug in Systems test there were 3 people affected (developer, FV, Systems test)

    For a bug found by the customer there was IBM L2 support, IBM change team, then the original developer to migrate the fix, FVer to change the FV tests, Systems test to change the systems test. Bits of managers, bits of overhead ( the guy doing the collating).

    For critical problems, factor in many calls to the customers, "call coordinator" overhead, executive attention and 5 layers of management.

    ________________________________

    Dimension 2 - simple, complex and subtle problems

    I think (this is going back a few years) that we also classified each bug as simple, complex or subtle.

    Unit test found most of the simple bugs; systems test found the complex and subtle ones; customers found many subtle ones and not many simple ones.

    Complex and subtle bugs took more time to design and run tests for. (Some bugs were really nasty - you had to run for a week to cause them.)

    So as well as bugs found by customers taking more effort to fix, these bugs were harder. So an n**2 problem.
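    The two dimensions described above can be sketched as a toy cost model (the per-stage headcounts follow the post; the difficulty multipliers are purely illustrative, not IBM's measured numbers):

```python
# Toy model of the two cost dimensions described in the post.
# Headcounts per stage follow the post; the difficulty multipliers
# are made-up illustrative values.

PEOPLE_AFFECTED = {
    "unit test": 1,       # developer only
    "FV": 2,              # developer + FV
    "systems test": 3,    # developer + FV + systems test
    "customer": 6,        # L2 support, change team, developer, FV,
                          # systems test, plus collating overhead
}

DIFFICULTY = {"simple": 1, "complex": 3, "subtle": 10}  # illustrative

def relative_cost(stage: str, kind: str) -> int:
    """Cost grows along both dimensions at once - the 'n**2' effect."""
    return PEOPLE_AFFECTED[stage] * DIFFICULTY[kind]

# A simple bug caught in unit test: relative_cost("unit test", "simple") -> 1
# A subtle bug found by a customer: relative_cost("customer", "subtle") -> 60
```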

    1. runt row raggy

      Re: Cost of a defect is a 2 dimensional problem

      iow, the cost was a direct reflection of your org structure.

      1. yetanotheraoc Silver badge

        Re: Cost of a defect is a 2 dimensional problem

        "For a bug found by the customer there was IBM L2 support, IBM change team, then the original developer to migrate the fix, FVer to change the FV tests, Systems test to change the systems test. Bits of managers, bits of overhead ( the guy doing the collating)."

        Okay, so the cost is an org problem? Seems many software companies think the same as you. Easy enough to fix, just lay off the L2 support, lay off the change team, have the original developer's new manager label him as "unavailable" for the fix... Set up a Community Support web page to replace all that. Suddenly the "cost" to fix the customer's problem has gone way down!

        1. runt row raggy

          Re: Cost of a defect is a 2 dimensional problem

          i prefer that you not put words in my mouth so as to preserve room for my foot. :-)

          all i observed was the costs in this case mirrored the org structure. the implications are interesting, but above my pay grade. anecdotally using full cd rather than dedicated testing stages has resulted in much less anxiety about making changes, and more rapid fixes. and for the record, i spend a lot of time working with support and i think they are invaluable both for customers, and engineering.

        2. W.S.Gosset Silver badge
          Trollface

          Re: Cost of a defect is a 2 dimensional problem

          > Set up a Community Support web page to replace all that

          Already done!

          Stack Overflow.

  9. Robert Carnegie Silver badge

    Example

    The British Post Office software - Horizons, was it? - comes to mind. Bugs denied, with vastly expensive consequences.

    1. John 110
      Flame

      Re: Example

      "...with vastly expensive consequences."

      And not just money, lives were lost/destroyed.

      1. Paul Crawford Silver badge

        Re: Example

        Was going to comment generally about "Who carries the cost of the bugs?" but the horizon system is a perfect example where bugs have cost an awful lot and hopefully some at PO/Fujitsu are going to be facing jail time for their dishonesty in how the bugs were handled.

        1. Intractable Potsherd Silver badge

          Re: Example

          If a bug is discovered once it is in production, the cost is largely borne by the user/customer. It becomes very close to an externality for the producer of the code. If software producers were liable for bugs discovered, they would ship with far more robust code. I know that there is no such thing as perfect code, but some of it is terrible.

        2. Citizen of Nowhere

          Re: Example

          Sadly the only jail time which will result from this has already been done -- done by the victims. There is no chance anyone in PO or Fujitsu management will suffer any real consequences for their despicable behaviour.

    2. Eclectic Man Silver badge
      Unhappy

      Re: Example

      For Horizon, I suspect that the cost of fixing the bugs later is vastly more than 100 times the cost of weeding them out in development would have been.

      1. Ken Hagan Gold badge

        Re: Example

        It's not really the cost of the bug though, is it? It's the cost of lying about the bug, in court. I don't need an article in an old IBM magazine to tell me that is going to be expensive if a customer "reports" the bug.

      2. Anonymous Coward
        Anonymous Coward

        Re: Example

        Given minimal compensation will now be 100k to the victims, the cost will be at least 90 million.

        Given that the previous court case that was won cost 80 million plus... and the costs of all the prosecutions and defence, I suspect the final cost will be in excess of the original development cost several times over.

        If the bugs had been admitted to when they first occurred, the cost would have been a fraction of what it is now.

        I suspect the fault is likely to lie with managers and lawyers.

        Personally, as a developer, I would have walked or spoken out...

  10. Will Godfrey Silver badge
    Unhappy

    Sometimes the odds are stacked against you.

    Quite a long time ago we had a situation where the software checked out perfectly at every stage. It went into production, and everyone was happy for several years. Then just one customer complained about errors seemingly at random. It took some time to work out that not only was it just one customer, it was also just one machine. However, that machine had been fine with a previous version of the code (and was fine if we reverted it). Eventually our code virtuoso discovered that particular processor did something slightly different with float->int conversions, and in our new code the order of calculations had changed. To this day I don't know if it was the processor or our code that was actually at fault!

    1. A.P. Veening Silver badge

      Re: Sometimes the odds are stacked against you.

      To this day I don't know if it was the processor or our code that was actually at fault!

      As it was a single processor, my bet is on the processor, especially as there have been problems of that kind with processors before (Pentium floating point bug).

      1. A.P. Veening Silver badge

        Re: Sometimes the odds are stacked against you.

        I'd like an explanation from the downvoter, what would be your bet and why?

    2. Paul Crawford Silver badge

      Re: Sometimes the odds are stacked against you.

      I discovered this in some code that was fine on 32-bit Windows and 64-bit Linux, but had a fault in one mode on 32-bit Linux.

      Turned out to be the same sort of thing: someone in the Linux world had decided to be "helpful" and clamp floats converted to 32-bit ints that went out of range, except here truncation was the intended and correct behaviour (i.e. convert to an integer possibly above 32 bits and let it be truncated to 32). The fix was to convert to a 64-bit int and then cast to 32-bit so it worked the same on all platforms.
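      A minimal sketch in Python of the behaviour being described (hypothetical function name; it models the cast-via-64-bit fix, not any platform's actual C library):

```python
def float_to_i32_via_i64(f: float) -> int:
    """Truncate toward zero, then keep the low 32 bits as a signed value.

    Mimics the described fix: cast float -> 64-bit int -> 32-bit int,
    so out-of-range values wrap predictably everywhere instead of
    being clamped by a "helpful" saturating conversion.
    """
    wide = int(f)                  # like a float -> int64 cast
    low = wide & 0xFFFFFFFF        # truncate to the low 32 bits
    return low - 0x100000000 if low >= 0x80000000 else low

# 2**31 + 4096 is outside int32 range; a saturating conversion would
# clamp it to 2**31 - 1, but deliberate truncation wraps it:
# float_to_i32_via_i64(2147487744.0) -> -2147479552
```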

      1. Doctor Syntax Silver badge

        Re: Sometimes the odds are stacked against you.

        Odd usages should be commented to prevent someone being "helpful" later.

  11. werdsmith Silver badge

    If you take a specification, create a test to see if that specification is met and then write software to pass that test, you will miss loads of bugs. Users in production will find them. Can't beat real world use.

    1. Paul Crawford Silver badge

      Indeed, as unit tests tend to be self-referencing - they test the same person's interpretation of a requirement.
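      A hypothetical illustration of that self-reference: suppose the spec says "round prices to the nearest penny" but the developer reads it as truncation - a test written from the same reading passes happily:

```python
def to_pennies(price: float) -> int:
    """The developer's reading of the spec: chop to whole pennies."""
    return int(price * 100)        # truncates toward zero

def test_to_pennies():
    # Written by the same person from the same (mis)reading of the
    # requirement, so it verifies the interpretation, not the spec.
    assert to_pennies(10.25) == 1025

test_to_pennies()  # passes - yet the spec said "round to nearest",
                   # so 10.999 should give 1100, not 1099
```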

      1. W.S.Gosset Silver badge

        Unit tests are the software equivalent of the "security theatre".

  12. Charlie Clark Silver badge

    It's not when but where

    Bugs in undeployed development code have essentially no cost to the user. Once the code has been deployed then, exploits etc. aside, there is the cost of information and planning, testing and applying patches, which may or may not require downtime.

    If it weren't for the different legal situation, industry could provide a reasonable benchmark for when things like product recalls are required.

  13. markstevent

    There is a fairly large difference between code bugs and design errors. I have been in IT since 1974, and I have always heard that it costs more to fix design errors in production than in development. A simple bug may not cost any more to fix in production, though a complex bug with interdependencies may require a more complicated solution. However a fundamental design flaw (as we have seen with a number of major government projects) may cost enormous amounts to correct, and might well cause the entire project to fail. So I think this is a very broad claim that, as the author suggests, is not, and indeed cannot be, verified by research, as you would have to classify many kinds of "bug", where they come from, the effect they would have on interconnected and dependent systems and processes, as well as the extent of the effort required to fix them. Fixing a code bug might cost very little, but manually verifying the integrity of the data affected by the bug could be very expensive.

  14. Doctor Syntax Silver badge

    I've always vaguely thought that quotes of this sort - an error discovered in step N after it was made being a power of N more expensive to correct - were based on an assumed process of going back to the step where the error was made and working through the whole development again from there, with the ultimate cost built up from the cost of each step: the 100-times figure reflecting a 10-fold cost in moving from one step to the next, multiplied by a further 10-fold cost in moving from that step to where you were. Whether the implied assumption was valid, or whether the costs really were an order of magnitude apart, is a matter for debate, of course. As the article implies, it's the sort of assumption that gets easily thrown off the top of the head by an instructor on a development course...

    What is clear is that once something is in deployment the costs of fixing a bug might be insignificant compared to finding and fixing all the faulty data that's now out there.

  15. SusiW
    Thumb Down

    Some companies just don't seem to worry

    As customers using Intel processors have found out over the years (other manufacturers also have issues), bugs embedded in the microcode can be unfixable - if the manufacturer does not want to swallow the costs of replacing the faulty processors.

    Costs of performance, maths accuracy, security, stability, etc., are usually incurred solely by the purchaser.

    Mfrs can sometimes mitigate these bugs with patches, but again it's the customer who will suffer greatest.

    It seems that some companies would rather not fix coding/hardware issues during development where it is potentially cheaper for all concerned, but continue to supply products that are *known* to have congenital faults anyway.

    I now actively avoid Intel and Apple products due to their ongoing refusals to fix issues in their products and passing the costs, real and hidden, onto their customers (read: victims). Especially in the case of Apple, when they will vehemently deny there is a problem and then charge customers for repairs to faulty designs. (Don't just take my word for it - have a look at Louis Rossmann's YT channel to see what sh*t Apple gets up to on a regular basis.)

  16. fidodogbreath Silver badge

    Citation needed

    Reminds me of how it's OK when editing Wikipedia to cite web content that you also wrote as a reference.

    1. Anonymous Coward
      Anonymous Coward

      Re: Citation needed

      My favourites are the circular "authority"s. An example posted on this site "destroying" an EU (Brexit) "myth" was a Guardian article which cited the BBC as its authority. The cited BBC article was almost a copy-paste job and cited the Guardian article as its authority.

      You'd be surprised how common this is, especially in any virtue-meme area. Though typically more than one --accidentally hilarious-- step.

      1. W.S.Gosset Silver badge

        Re: Citation needed

        Hmph. That was me, not Anon. Must have accidentally brushed the Anon checkbox and now I can't undo it.

  17. Eclectic Man Silver badge

    The Register

    Could do some of its own research. On occasion things go wrong with the Register website (I know, but nobody's perfect). Anyway, maybe the friendly people at Vulture Central could calculate how much fixing each problem actually costs and let us know the ratio of planning costs to repair costs. (Just recently there was a report of a comment from an 'Anonymous Coward' which did not have up or downvote buttons active.)

    1. Roland6 Silver badge
      Pint

      Re: The Register

      Would not work - no one would be able to agree on units...

      How many units are in a pint of your favourite amber nectar?

  18. DS999 Silver badge

    Its possible 100x may be an underestimate

    What's a "bug" during requirements stage? It is something where the requirements are found to be in error or missing something simple, and corrected before a single line of code is written. So the additional man hours of effort required to fix it is approximately nil. Any effort post-production will be infinitely more than that.

    You may ask, what if something missing during requirements is caught but it adds 50 hours of work when you do get to coding? I'd argue leaving that out couldn't be classified as a "bug", because if you added it post-production you'd never call something like that a bugfix. You'd call it an enhancement or new feature. If you were doing the work for a customer you'd bill it as a change request.

  19. cageordie

    Well, I can name one that cost a fortune to fix

    We worked with a company that put WiFi in hotel rooms using little short-range transmitters in the Ethernet sockets. They worked OK in test, and each one on its own was OK. When they got to the real world, customers used both wired and wireless at the same time. Badness happened and network connections were dropped. They hadn't used interrupt-driven I/O, so when the load increased the system couldn't cope. So switch on interrupts. Easy. But no, they had not included a remote programming technique. So the fix was to visit every hotel room, take the cover plate off, plug in the JTAG, and reprogram the FPGA and software. That cost a hell of a lot more than 100 times.

    If I fix a bug now, in development, it's just fixed and nobody even hears about it. I was rotating a frame of reference for a sensor and one axis came out as zero all the time - easy fix, I'd copied the Y axis line twice, so Z was always 0. Even if the problem was found in testing it's going to be at least hours, and then the release it was intended for will be delayed and the regression testing will take a month. Testing a complex system takes a long time.
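    A hypothetical reconstruction of that class of copy-paste slip (not the actual code): rotating about the X axis, with the Y-axis line pasted twice so the Z component is never assigned:

```python
import math

def rotate_about_x(v, angle):
    """Correct rotation of (x, y, z) about the X axis."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    out = [0.0, 0.0, 0.0]
    out[0] = x
    out[1] = y * c - z * s
    out[2] = y * s + z * c
    return out

def rotate_about_x_buggy(v, angle):
    """Same rotation with the pasted-twice slip: Z stays 0 forever."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    out = [0.0, 0.0, 0.0]
    out[0] = x
    out[1] = y * c - z * s
    out[1] = y * c - z * s    # pasted twice: out[2] is never written,
    return out                # so the Z axis always comes out as 0
```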

    So 100 has always seemed very conservative to me.

    1. Paul Crawford Silver badge

      Re: Well, I can name one that cost a fortune to fix

      A guy I know worked for a major car company and they had a similar issue: some absolute idiot at an outsourced agency had got a mix of units (km & miles) in use (probably didn't even drive) and it was not discovered until cars were in the showroom and being sold. The result was a very expensive recall for the firmware to be fixed, given the legal implications of telling a driver they were only doing 50 (mph) in an 80 (km/h) zone, so they had another 30 to go if they wanted.
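      A sketch of the units trap (hypothetical function; the real firmware is obviously not public). The conversion factor is the exact international definition of the mile:

```python
KMH_PER_MPH = 1.609344   # exact: 1 mile = 1.609344 km

def displayed_speed(sensor_kmh: float, units: str, convert: bool = True) -> float:
    """Number shown on the dial. With convert=False the km/h figure is
    passed straight through under an 'mph' label - the mix-up described."""
    if units == "mph" and convert:
        return sensor_kmh / KMH_PER_MPH
    return sensor_kmh

# At a true 80 km/h:
#   correct dial: displayed_speed(80.0, "mph")                -> ~49.7 mph
#   buggy dial:   displayed_speed(80.0, "mph", convert=False) -> 80.0 "mph"
```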

  20. bazza Silver badge

    Design Your Release Process to Suit

    As ever, bug costs vary according to system type. A bug on some unimportant website costs not a lot, probably. Whereas a bug in an airliner flight control computer costs $billions and kills people. Just ask Boeing (the MAX crashes being attributed to faults that were not fixed at the design stage, despite the coding company querying the design).

  21. Screwed

    Where is the research which supports the headline claim "Everyone cites..."?

    1. Will Godfrey Silver badge
      Coat

      Everyone knows where that is.

      Oh, I'm leaving already?

  22. fpx
    Boffin

    I also helped to spread that myth. However, my reference was not IBM, but NASA: https://ntrs.nasa.gov/citations/20100036670, from 2004. As far as I know, NASA did exist at that time. It does quote [Boehm81] -- which might be the dodgy study mentioned here -- but adds its own body of evidence.

  23. David Hicklin

    When I was working at an electrical engineering company we worked on a "5x rule": after each stage (tender, design, production) it was 5x as expensive to fix an error/mistake as it would have been had it been caught at the stage before.

  24. colinb

    It depends

    I guess your view on the cost of fixing bugs in production vs development depends on context.

    Take 2 examples

    An early-stage startup with a full CI/CD pipeline might well decide that it is cheaper to fix production bugs, since

    a) they are really only trying to flip the company to a VC mug and couldn't care less about the actual software

    b) there are no controls on release; push and go all the way to prod, who cares.

    c) the bugs may never be found, happy days.

    On the other hand, in a company with SOX-like controls, financial numbers and multiple people involved, it's a different story:

    a) the bug impacts the user; it might be a blocking bug, causing material cost at the user level. Many users affected? x times that impact.

    b) bug logged

    c) bug repro'd by dev; it might need certain data setup, so this can be non-trivial in itself.

    d) bug fixed

    e) build, deploy to system test

    f) deploy to UAT

    g) UAT has to be done by users, so it needs their time. $

    h) if the release requires an outage, a Change Approval Board has to approve it. The board sits once a week and has a 3-person manager-level quorum for approval. $$$

    i) release window approved

    j) the release happens; it requires Ops and Dev for the smoke test, and users for the Prod smoke test. $$$

    k) close out bugs.

    I haven't even tried to put a dollar number on that cost.

    Bottom line here is: find those bugs in dev.

  25. Les Smith

    I think this all goes back to work by Michael Fagan at IBM.

    I have a vague recollection of seeing an article by him way back in the late eighties in the IBM Systems Journal.

    This article in CIO seems to imply that the idea of the relative cost of fixing software defects escalating as one progresses through the SDLC was originally documented by Fagan.

    https://www.cio.com/article/2399877/rethinking-software-development--testing-and-inspection.html

  26. Henry Wertz 1 Gold badge

    Probably irrelevant anyway

    Whether the 100x figure ever existed is probably irrelevant anyway. You now have advanced IDEs (Integrated Development Environments), better debuggers, faster compilers and such (so if you don't have proper logging in your program, you can add some to track down a bug and rebuild in a reasonable length of time), and heavy use of languages like Python where you don't have to recompile (...usually). OK, they had interpreted languages back at the dawn of time too. Languages now tend to give useful error messages, and line numbers, when things crash (which is not 100% reliable, since the crash could have been due to an earlier problem, but it sure helps).

    I'm just saying, even if there had been a 100x figure 40+ years ago, things have changed now. Personally, when I've found bugs in my Python code, it may be marginally harder to find the bug later than to avoid a typo or something as I type the code, but surely not a 100x difference or even close to that.

  27. Anonymous Coward
    Anonymous Coward

    The severity of a bug, including a design flaw, is defined by its impact on usage. Writing tests only assigns a binary value: bugged or not bugged.

    Some bugs that could have been tested for, but weren't, go on to have little or no impact on users. The complexity of the usage space is huge - for all practical purposes it may as well be infinite - and some of those cases have a vanishingly small chance of appearing.

    Although that can change over time, e.g., security issues that were once not a problem are now almost a certain fatal flaw.

    Testing should get the low-hanging fruit plus something extra. But the tree is infinitely tall, so it's impossible to get it all. At some point it DOES become cheaper to draw the line and deal with what is found in practical usage. The question is where that line sits, and what the risk function is in estimating it. There is no one answer, but a fusion of multiple independent expert hunches and a systematic approach can improve the odds of getting it right.

    Eventually it's more effective to bring other trees into play. E.g., when it comes to security, planning must include a strategy to sandbox breaches. Like ships closing their doors to limit flooding when the hull has been breached.

  28. nautica Bronze badge
    Holmes

    Think much? Never mind...

    "Software research is a train wreck," says Hillel Wayne, a Chicago-based software consultant who specialises in formal methods...

    Wayne did some research, noting that "if you Google 'cost of a software bug' you will get tons of articles that say 'bugs found in requirements are 100x cheaper than bugs found in implementations.' They all use this chart from the 'IBM Systems Sciences Institute'... There's one tiny problem with the IBM Systems Sciences Institute study: it doesn't exist."

    Not ONLY is 'software research' a "...train wreck..."; it is a fucking oxymoron in the present-day climate of "just say anything and post it on the internet; because it's on the internet, most every idiot, INCLUDING, and ESPECIALLY, 'NEWS AGGREGATORS' - the prostitutes of the internet literary world - will fall for it hook, line, and sinker."

    Why don't all you rocket scientists who will believe anything on the internet simply have your DUMBphones (1) always set to receive only FaceCrap, and (2) permanently grafted to the sides of your heads?

  29. Richard Pennington 1
    FAIL

    Current HMG policy ...

    ... is to let the bugs spread as they will, vaccinate the older users, test and trace until the pings become intrusive, and hope that not too many people die as a result.

    And much the same for COVID.

  30. Nicodemus's Knob

    Not to mention the hours wasted by users.

    The cost to the company may well escalate, but there's not much mention of the number of hours wasted by users trying to get some buggy software to work properly. How many reboots? Multiply that by the number of users and you're talking about a huge waste of time for everybody, because one programmer couldn't be bothered to check the return codes on functions. Yes, that's my pet hate: programmers that don't check return codes. Yes, it takes longer, as you now need to think about dealing with failures, but if you want solid, reliable code, always check return codes.
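    The pattern being argued for is easy to sketch; a minimal hypothetical Python example, where `send_packet` is an invented stand-in for any library call that signals failure through a return code:

```python
def send_packet(payload):
    """Stand-in transport call (hypothetical): returns 0 on success,
    -1 on failure, in the style of a C library function."""
    return 0 if payload else -1

def transmit(frames):
    """Check every return code instead of assuming success; fail fast
    and report which frame broke, rather than silently carrying on."""
    for i, frame in enumerate(frames):
        rc = send_packet(frame)
        if rc != 0:
            return False, i  # the caller learns exactly what failed
    return True, len(frames)
```

The failure-handling branches are the extra typing the commenter mentions, and they are what turns a silent hang at the user's desk into an actionable error report.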

  31. TechHeadToo

    Well, now I've read most of the comments, and my life is shorter than it was.

    And the takeaway is still that:

    Thought, requirements, code and testing are cheapest done at the design stage.

    Having to correct a bug in production will always cost more than fixing it before release.

    People who think they know about these things, quoting from experience, will sit and argue the minutiae of whether it is 87x or 100x or 273x more, and completely miss the point that the root cause of increased costs of whatever type (reputation or dollars) is always going to be poor requirements or poor coding.

    All else is mere avoidance smoke.

  32. Steve B

    I was old school and most of the bugs came out during the flowchart breakdown..

    On my important projects, I used to flowchart them, then break the flowchart down and down.

    I once ended up with a complete 12 ft by 6ft wall covered in a flow chart linked by ribbons.

    It highlighted enough issues and so I knew which questions to ask to finish it off.

    It caused a panic at the Home Office when I asked the questions, as apparently I was months away from finishing the product.

    They arranged an emergency meeting with the managers to discuss a new implementation timetable, meanwhile I had finalised the flow chart and typed in the code required all in less than a day.

    My boss was well aware of how I worked and took great pleasure, after hearing their put-downs about not hiring professional companies and needing months extra etc., when he looked across at me, asked when I would be ready for testing, and I stated the booked session in the morning!

    Cue lots of red faces when they admitted they had cancelled the sessions thinking we were months away.

    It sailed through the testing as well!

    Developing in C was the closest thing to that method as one could just create a stub routine and then expand on that later, breaking the code down to cover all possibilities.
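    The stub-and-expand approach translates to most languages, as a later reply notes. A minimal Python sketch under assumed, invented function names: the top level is written first, and each step starts life as a stub to be filled in later:

```python
def process_order(order):
    """Top-level flow written first; the steps below began as stubs."""
    check(order)
    price = quote(order)
    return dispatch(order, price)

def check(order):
    # Expanded from a bare stub: reject orders with no item.
    if "item" not in order:
        raise ValueError("order has no item")

def quote(order):
    # Expanded from a stub: flat price per unit for now (hypothetical).
    return 10 * order.get("qty", 1)

def dispatch(order, price):
    # Still effectively a stub; returns a summary instead of shipping.
    return {"item": order["item"], "price": price}
```

Each stub can be broken down further in the same way, mirroring the flowchart decomposition described above.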

    The biggest issue with testing is that if an input has to be between 0 and 5, for example, the programmer will work on the presumption that the input is a single digit with largest value 5 and smallest 0; this can cause issues when the input is empty or a non-numeric string.
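    Those edge cases can be caught up front; a hedged Python sketch of the 0-to-5 check (the helper name is invented for illustration):

```python
def read_level(text):
    """Validate an input that must be an integer 0..5, covering the
    cases above: empty input, non-numeric strings, out-of-range values."""
    if text is None or not text.strip():
        raise ValueError("empty input")
    try:
        value = int(text.strip())
    except ValueError:
        raise ValueError(f"not a number: {text!r}")
    if not 0 <= value <= 5:
        raise ValueError(f"out of range 0-5: {value}")
    return value
```

The happy-path presumption only covers the last branch; the first two are the ones that bite in production.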

    Is this what most of the current Windows problems are? Buffer overruns.

    The fact that data is being allowed to overwrite code is both a hardware and software design flaw that we actually overcame back in the 70s.

    When I got to Visual Basic, I found it to be awful, as there was very little comprehensive timeline linking.

    It was there, but hidden in the presentation, which meant most new developers didn't have a clue how things actually worked underneath.

    This means that most of the bugs found after the product has been released require a redesign, which doesn't happen. So welcome to the US patch city.

    1. A.P. Veening Silver badge

      Re: I was old school and most of the bugs came out during the flowchart breakdown..

      Developing in C was the closest thing to that method as one could just create a stub routine and then expand on that later, breaking the code down to cover all possibilities.

      That method works just as well in most other languages.

  33. Prinz Rowan

    It's a misconception

    Most people argue from the cost of fixing a known error at design time.

    But if you fix an error at design time, you don't yet know it's there. You have to think about all the aspects of the design, which is very time-intensive.

    And it's a tricky task, since some aspects of the task may be unknown. Like: how many visitors? What's the real performance of the platform under pressure?

    I will not claim any number, but I don't think that making and verifying a very detailed design can be done so fast that it saves time a hundredfold.

    This will work only in retrospect, when you know where the design failed.

    It's not easy to find the right balance between too detailed and too rough a plan.

    But if you do a new task, some errors will go through all phases and pop up at the customer.

    A 100% error-free project may happen if you are adapting a product, but even this seems unlikely to me if you have to deliver at a reasonable cost.

    The amount of testing and redesigning that goes into building a Mars rover will not be paid for in industry for a production machine.

    They want this only until you tell them the estimated cost.

  34. TheBadja

    Mythical Man Month, Brooks, 1975

    The source may be "The Mythical Man Month" by Fred Brooks, 1975. This could be the original source for the IBM notes, as Brooks worked for IBM. He certainly had a study that showed the true cost of projects was 100%-300% more than the initial budget estimates.
