They say software will eat the world. Here are some software bugs that took a stab at it

"On the afternoon of Tuesday, September 25, our engineering team discovered a security issue affecting almost 50 million accounts," said Facebook's Guy Rosen in a security update in September. The issue was serious. What was stolen was not passwords but access tokens, an opaque string that identifies a user and grants access …

  1. Primus Secundus Tertius

    Quality of design

    I saw many instances in my career where I thought the initial analysis was poorly done, and processes were not cleanly separated. This led to variables being corrupted in unexpected ways. Coding standards cannot correct for poor analysis.

    All too often, so-called design documents were a restatement of requirements: they said what was wanted but did not show how to achieve it. For 1950s programmers transcribing mathematics into Fortran, the requirements were often enough; but, for example, a modern database meeting modern requirements needs much more thought. Even a simple membership list for an organisation should be more than a way to get correct postage labels: e.g. reports on how many members per county or country, ...

    Design reviews ought to emphasise these matters, but often they don't. Management often feels such reviews obstruct the need to get on with the coding.

    I spent my career wondering how to turn an engineering graduate into a good programmer, and never did find the answer.

    1. a_yank_lurker

      Re: Quality of design

      My observation as a code wrangler is that time spent on design, thinking, and talking to users up front is time well spent. Once you have a sound idea of where to begin, the actual coding is often fairly straightforward. But what often happens is that a vague 'design' document is dumped on top of the programming team and the team is isolated from the actual users. So you have a team that does not understand the problem, guessing what mismanagement wants without any input from users. A recipe for complete disaster.

  2. Duncan Macdonald

    Mismanagement

    Mismanagement is the number one cause of software problems. There is an old rule for all types of engineering (including software) - Fast, Cheap, Good - pick any two.

    If you want Fast and Good then you need to pay for a top flight programming team.

    If you want Cheap and Good then you need to allow a lot of time for testing and bug fixing before release.

    If you want Fast and Cheap then you must accept that the quality will not be Good.

    With most systems the (mis)management choose Fast and Cheap - and then express surprise that the result is not Good.

    A side note -

    Any program that accepts user input must assume that the input is malicious until proven otherwise - input data must be checked for correctness before being acted on. This old principle of defensive programming seems to have been almost completely disregarded in modern software.
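
    To make that concrete, a minimal sketch in C (a hypothetical example, not taken from any real program): treat the string as hostile, parse it, check it against explicit limits, and reject everything that does not fit.

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse a user-supplied port number, treating the input as malicious
       until proven otherwise. Returns -1 on any failure. */
    static int parse_port(const char *input)
    {
        char *end = NULL;

        if (input == NULL || *input == '\0')
            return -1;                      /* empty input is not a port */

        errno = 0;
        long value = strtol(input, &end, 10);

        if (errno != 0 || end == input || *end != '\0')
            return -1;                      /* not a clean decimal number */
        if (value < 1 || value > 65535)
            return -1;                      /* outside the valid port range */

        return (int)value;
    }

    int main(void)
    {
        const char *samples[] = { "8080", "99999", "80; rm -rf /", "" };
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
            printf("\"%s\" -> %d\n", samples[i], parse_port(samples[i]));
        return 0;
    }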

  3. Pascal Monett Silver badge

    That's news

    "human factors including finance, deadlines, mis-management, skills shortages, and the challenge of dealing with legacy code and systems"

    In other words, the problem with software quality is less the actual ability of the coder and more the inability of management.

    Who knew ?

    1. david 12 Silver badge

      Re: That's news

      In other words, it's always somebody else's fault. Who knew?

      1. yoganmahew

        Re: That's news

        @david 12

        Yeah, I don't think we as developers can just blame management. Those of us who have been around long enough have seen enough shit code, lazy coders, and people who should really have been doing something else. That's not to let management off the hook - they hired these people and keep them (because they're fast and cheap, presumably) - but as a professional, a developer needs to stand up for their profession, not always blame its known inadequacies on someone else.

  4. stu_san
    Meh

    Management is but one problem

    After decades in the industry (doing software, hardware, and firmware), I have noted there has been one constant: If a project fails, engineers will invariably blame management. After all, it couldn't *possibly* be the engineers' fault, right? They put in the hours, weekends, sweat, blood, and tears, so it *must* be management's fault.

    We, as engineers, must look at ourselves as well as management. How often have we made something complex and wonderful when simple and usable would do? How often have we skipped putting extra assertions in a particular function because "nobody would call it that way"? How often have we passed on testing because it's "too hard to test" (let alone put in ways to make testing easier)? How often have we passed on good programming practice because we don't "have the time" to do it right? How often have we said, "That's a piece of cake" because we didn't analyze the problem sufficiently?

    Anyone who says, "I've never done any of those things" is lying, to themselves if no one else.

    I'm not saying that management is not a major contributor to problems. What I am saying is that engineers are often just as much a contributor.

    (Disclosure: I was a manager for about a year. Failed miserably. There's a reason there are degrees for that stuff.)

    1. bombastic bob Silver badge
      WTF?

      Re: Management is but one problem

      simple and usable - yeah I like those things. Good for 'first principles'

      from article: "High-level languages with automatic memory management and no direct use of pointers, such as Java, first released in 1996, have made it easier for developers to avoid some errors."

      while at the same time, creating INEFFICIENCY and BLOAT. And solving LITTLE. See icon.

      A bit of self-discipline and specific "look for that" reviews of the code, by people who didn't write the thing, might be in order instead of resorting to 'garbage collection' memory "management".

      I doubt you'll EVAR see things _like_ OpenSSL coded with a computer lingo that employs 'garbage collection'. But if that happens, I think it'll be forked by SANE developers who understand the implications and unintended consequences of resorting to 'garbage collection' memory management.

  5. Anonymous Coward
    Anonymous Coward

    Test code coverage (sometimes called dynamic analysis) and static analysis processes and tools were around and generally available in the 1980s.

    1. Michael Wojcik Silver badge

      Test code coverage is a form of dynamic analysis, but a very limited one. The term "dynamic analysis" in the software-quality domain usually refers to systems that use compile- and/or run-time instrumentation to monitor the behavior of executing programs for dangerous and invalid operations, such as memory mismanagement, invariant or contract violation, concurrency issues, etc.

      But you're correct that code coverage is by no means a 1990s invention, nor one to come out of Extreme Programming. Wikipedia cites a 1963 CACM article by Miller and Maloney on the subject.

      Similarly, as you note, static analysis is much older than the 1990s. Johnson's original lint, one of the more famous static-analysis tools, was published in 1978.

      Most of the dynamic analysis tools for Windows, Linux, and UNIX date from the early 1990s or later - BoundsChecker, Purify, and Insure++ all were first released in 1992 or 1993 - but I believe it was the late 1980s when Perens released Electric Fence on USENET, and certainly academic work on dynamic program analysis goes back earlier than the 1990s.
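
      For anyone who has not tried one: AddressSanitizer (the -fsanitize=address option in recent gcc and clang) is a convenient modern example of that kind of run-time instrumentation. A deliberately broken toy like the sketch below is enough to see it in action - the instrumented build aborts and reports the heap-buffer-overflow instead of letting the program silently corrupt memory.

      #include <stdlib.h>
      #include <string.h>

      /* Deliberate heap overflow: 16 bytes copied into an 8-byte buffer.
         Build and run with:  cc -g -fsanitize=address overflow.c && ./a.out
         The sanitizer's instrumentation detects the out-of-bounds write at
         the memcpy and prints a report naming the offending line. */
      int main(void)
      {
          char *buf = malloc(8);
          if (buf == NULL)
              return 1;

          memcpy(buf, "sixteen bytes!!", 16);   /* writes past the allocation */

          free(buf);
          return 0;
      }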

  6. SNAFUology
    Facepalm

    "History will teach us nothing" and therefore is destined to be repeated,

    be repeated, be repeated...

    There has been some great software, in parts and bits; rarely is the whole product great. But those great bits are often lost, because companies require new product, programmers want their product to be somewhat different, or destructive/distractive innovation (which basically means killing off other products so your new product can replace them) kills off the old products regardless of how good or well tested they were.

    We are not working on improving one thing but redoing/repeating the same stuff over and over again.

    How many server systems, office suites, game engines, etc. does one need?

    Choice is one thing, but Grey Goo Scenarios are not a good thing.

    The whole (missed) point of the Grey Goo Scenario was that the interior of the pile would consume itself and only the very outer edge would contact new material to be converted into more of the same.

    This convolution is not productive, we have to break free.

  7. doublelayer Silver badge

    So Dev Ops fixes everything, huh?

    This article does a pretty good job of explaining how bugs can be a major problem. And then it comes along with the following line:

    "In a carefully architected DevOps process for a web application, [...] the cost of fixing a bug found late may not be too bad."

    Let's discuss this. A dev ops process has no good definition. I've read the dev ops articles here. They have essentially put the dev ops™ label on every known good coding, management, or systems concept under the sun. Unit testing? A primary precept of dev ops™. Ensuring security? Meetings with managers where the developers are listened to? Having firm documentation about policies for development and usage of the code? All dev ops™ concepts. This tends to assume a utopian ideal of code development and management style, anyway. The problem with this is that none of these things are actually connected. Articles about good policy simply have dev ops plastered on them. So there is no clear way to identify what exactly about dev ops makes these bugs so easy and painless to solve.

    Or is there? Let's fill in that gap in the quote.

    "where a code change can be made, tested automatically and deployed into production rapidly"

    So that's a no. Dev ops articles frequently mention agile as a development style. That quote above clearly describes a system that works similarly to agile, in the sense that code is supposed to change often and get into production quickly. That does not have any particular benefit where bugs are concerned. Bugs will still happen. If and when agile is done wrong, bugs will happen *more* often, because managers think that agile coders should always be moving on to some new functionality rather than repairing things. Agile does at least mean that bugs should be patched more quickly, which has a bit of logic behind it. However, it does not have any specific way of ensuring the bugs are less dangerous.

    Let's talk about the "tested automatically" stuff, too. You can't test everything automatically. Unit tests are great. I expect competent devs to be writing them and to make any changes go through them. But unit tests do not catch every bug. There may be unit tests that nobody thought of, or that someone thought of but nobody wrote. Worse, there may be a bug that you either can't test for or won't notice until things are put together. Consider the Heartbleed bug discussed in the article. It doesn't really mean much on its own. Unit tests of invalid data could have caught it, true, but there are a lot of types of invalid data, only one of which triggers it. Only when combined with something like a webserver does this bug become so noticeable.

    That's not a thing a unit test, written by one person and never looked at again because "the automatic system will handle testing", is going to notice for you. That's a thing where you want devs writing unit tests and manually running a test suite, looking at the output to think "I wonder what would happen in this case, but there is no test for that. Let's see.", and people doing larger real-world testing on larger components. An automatic system cannot possibly try all types of standard input to a large program and properly interpret the results, but a QA department can.

    By making testing simply a speed bump on the road to production, rather than a required turn, you make it a lot easier for things to get through inadequately tested. "Write fast and fix things when you find them" won't work for rocket launches either.
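
    To put the unit-test point in concrete terms, here is a toy sketch (plain C, nothing to do with OpenSSL's actual code): a length-prefixed echo with the kind of test suite that gets written once and trusted forever. Both tests pass; the one kind of invalid input that matters, a record whose declared length is bigger than what was actually sent, simply is not in the suite.

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Toy "echo this record back" routine: record[0] declares the payload
       length and the payload follows. The bug is that the declared length
       is trusted and the size actually received is never consulted. */
    static unsigned char *echo_record(const unsigned char *record, size_t received)
    {
        size_t declared = record[0];
        if (declared == 0)
            return NULL;                           /* one invalid case handled */

        unsigned char *reply = malloc(declared);
        if (reply != NULL)
            memcpy(reply, record + 1, declared);   /* may read past 'record' */
        (void)received;                            /* never checked - the bug */
        return reply;
    }

    int main(void)
    {
        /* The tests somebody actually wrote: */
        unsigned char ok[] = { 3, 'a', 'b', 'c' };
        unsigned char *r = echo_record(ok, sizeof ok);
        assert(r != NULL && memcmp(r, "abc", 3) == 0);    /* happy path: passes */
        free(r);

        unsigned char zero[] = { 0 };
        assert(echo_record(zero, sizeof zero) == NULL);   /* bad input: passes */

        /* The test nobody wrote: { 200, 'a', 'b', 'c' } - a declared length of
           200 with only three payload bytes sent would make echo_record() read
           far past the caller's buffer and happily echo the contents back. */
        return 0;
    }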

  8. Herbert Meyer

    Software kills

    Last month, 189 people died because the anti-stall software (MCAS) on a Boeing aircraft overrode the pilots, trying to prevent the plane from stalling. The plane was not stalling; the pilot and copilot died trying to disengage the system and take back control of the aircraft. Boeing has blamed the crew for not reading a tech bulletin about this software feature.

    Anyone who thinks they are "flying" a modern airplane is fooling themselves, they are merely making suggestions to the control system. Soon, automobiles will be the same.

    1. stu_san

      Re: Software kills

      | Soon, automobiles will be the same.

      Typo: Soon -> Now

  9. Mike 137 Silver badge

    Heartbleed

    The Heartbleed bug was not fundamentally a software error, but intrinsic to the specification in the RFC. There was no need to allow a variable-length response, and no need to let the requester dictate both the response payload and its length to the responder (the vulnerable party). Having made these poor decisions, the RFC author actually advised that the hazard they had created be guarded against by the potential victim in their implementation. However, as the required function was merely a keep-alive, a predictable fixed-length response generated by the responding party would have sufficed (e.g. its IP address).
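
    For illustration only (this is not the real TLS wire format), a fixed-format keep-alive along those lines might look like this in C: nothing in the request is echoed, the reply is always the same eight bytes, and it is built entirely by the responder, so there is nothing for an attacker-supplied length to over-read.

    #include <stdint.h>
    #include <string.h>

    /* Illustrative fixed-format keep-alive, not RFC 6520's layout: the
       request carries no payload and no length field, and the response is
       always eight bytes built by the responder (e.g. its IPv4 address). */
    enum { KEEPALIVE_REQ = 0x01, KEEPALIVE_RSP = 0x02, REPLY_LEN = 8 };

    static int build_keepalive_reply(uint8_t msg_type,
                                     const uint8_t local_addr[4],
                                     uint8_t out[REPLY_LEN])
    {
        if (msg_type != KEEPALIVE_REQ)
            return -1;                     /* anything else is discarded */

        out[0] = KEEPALIVE_RSP;
        out[1] = REPLY_LEN;                /* fixed, never attacker-supplied */
        out[2] = 0;                        /* reserved */
        out[3] = 0;
        memcpy(out + 4, local_addr, 4);    /* responder's own address */
        return 0;
    }

    int main(void)
    {
        const uint8_t my_addr[4] = { 192, 0, 2, 1 };   /* documentation address */
        uint8_t reply[REPLY_LEN];
        return build_keepalive_reply(KEEPALIVE_REQ, my_addr, reply);
    }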

    This, and the Boeing 737 PK-LQP accident (and the multiple Watchkeeper drone crashes, the 2015 A400M crash in Seville, the huge number of reported behavioural discrepancies in "autonomous" vehicles, and much else), are not software problems, but failures of the engineering mindset.

    The fact that software mediated all these incidents is incidental - the flaws were all flaws in thinking. In particular, they were failures of foresight and breadth of vision as to side effects and consequences. In practice this differs little from the decision to site a commercial data centre a couple of hundred metres from the Buncefield gasoline storage depot which exploded in 2005, completely gutting the data centre.

    1. Michael Wojcik Silver badge

      Re: Heartbleed

      The heartbleed bug was not fundamentally a software error, but intrinsic to the specification in the RFC.

      It most certainly was an implementation error, even if the specification encouraged that error. It is entirely possible, and indeed trivial, to prevent a Heartbleed-class error in a DTLS Heartbeat implementation simply by doing the obvious bounds checking. That should be done as a matter of course in pretty much any software written in a language that doesn't do it automatically.

      And, arguably, it doesn't matter, since the specification (RFC 6520) and the OpenSSL implementation were written by the same person (Seggelmann).
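
      To spell out what "the obvious bounds checking" amounts to, a rough sketch (field layout approximate, not OpenSSL's actual code): compare the declared payload length against what actually arrived before copying anything, and discard silently if it does not fit, as RFC 6520 itself requires.

      #include <stdint.h>
      #include <string.h>

      /* Sketch of a heartbeat echo with the missing check in place. 'msg' is
         the received record: one type byte, a two-byte big-endian declared
         payload length, then the payload. 'received' is how many bytes really
         arrived. Returns the number of payload bytes echoed, or -1. */
      static int echo_heartbeat(const uint8_t *msg, size_t received,
                                uint8_t *reply, size_t reply_cap)
      {
          if (received < 3)
              return -1;                             /* no room for a header */

          size_t declared = ((size_t)msg[1] << 8) | msg[2];

          /* The bounds check Heartbleed lacked: the declared length must fit
             inside what was actually received, and inside the reply buffer. */
          if (declared > received - 3 || declared > reply_cap)
              return -1;                             /* discard silently */

          memcpy(reply, msg + 3, declared);          /* echo only real bytes */
          return (int)declared;
      }

      int main(void)
      {
          uint8_t lying[] = { 0x01, 0xFF, 0xFF, 'A' };   /* claims 65535, sends 1 */
          uint8_t reply[64];
          return echo_heartbeat(lying, sizeof lying, reply, sizeof reply) == -1 ? 0 : 1;
      }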

  10. ZippedyDooDah

    The good old days

    Writing and testing COBOL IMS DB programs was a relatively easy process and I was going to say that our testing was 100% rock solid. Then I remembered...

    As a junior programmer working for a bank, I was given the job of debugging an abend in a batch program that occurred during the privatisation of a large UK company. This was pretty serious stuff. I had about five managers looking over my shoulder at one point. I actually turned on them and told them to sod off and let me get on with it, which, remarkably, they did.

    This was something I was fairly familiar with: our junior programmers' most common error, an 0C7. This was before Abend-AID was released, so there I was, digging through a huge printout. I hasten to add at this point that I hadn't written the program.

    To my absolute delight, and free of managers, I spotted the problem. The author had only defined a total field as S9(9).99. The privatisation was so over subscribed that a billion quid and more flowed into the bank's coffers and the good old 999,999,999 was as big a number of pounds as mere mortals could contemplate. To those non 0C7 chaps, it was a case of "overflow", more properly known as a data exception. Apologies for any syntax errors above, it was a long long time ago.
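
    For anyone who never met a 0C7, a rough C analogue of the same class of failure (made-up numbers, and C rather than COBOL): a running total kept in a field sized for amounts "nobody will ever exceed", with the overflow caught by an explicit check rather than an abend.

    #include <limits.h>
    #include <stdio.h>

    /* A total held in a 32-bit int of pence tops out around GBP 21 million -
       fine until an oversubscribed share offer pours in far more than the
       field was ever sized for. Checked addition makes the failure visible
       instead of undefined. */
    static int add_pence(int total, int amount, int *out)
    {
        if (amount > 0 && total > INT_MAX - amount)
            return -1;                     /* the field is too small */
        *out = total + amount;
        return 0;
    }

    int main(void)
    {
        int total = 2147000000;            /* already near the field's ceiling */
        int updated;
        if (add_pence(total, 5000000, &updated) != 0)
            puts("overflow: nobody sized the total field for this much money");
        return 0;
    }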

    So yes, even in those days, with no random input from screens, people were fallible. I left the bank after 30 months of experience to go contracting. My best decision ever.

    Oh, btw, it was Barclays and Jaguar ;O)

    1. Michael Wojcik Silver badge

      Re: The good old days

      Writing and testing COBOL IMS DB programs was a relatively easy process

      Three decades of working with COBOL and IMS, and this is the first time I've ever seen someone make that claim.

      I suppose it could be relatively easy if the development team is very disciplined. Usually, in my experience, back-office systems like these very quickly become "enhanced" into grotesque shambling horrors. When our customers go to migrate them to our IMS emulation environments, often there's weeks or months of investigating what the various programs do, what source they were actually built from, and how they interact.

      But it's true that, as in your anecdote, at least you often got a useful abend and core dump, and could at least identify the point of failure. While that's generally true with standalone programs on Linux and UNIX, and if you're very lucky on Windows, contemporary applications tend to be distributed systems with a huge number of components, many from third parties, and diagnosing their failures can be brutal.

  11. clocKwize

    I think it's wrong to say it's always management's fault, and we as developers should look at ourselves, BUT if management allow things to get to the point where everything *has* to be rushed, because of unrealistic deadlines, scope creep, bad planning, etc, etc, then the developers *have* to cut corners, whether they want to or not. Goes back to Cheap, Good and Fast - except Fast is "faster than is possible, because nobody knows how big it is or how long it will take".

    1. stu_san

      Someone else made the "Cheap, Fast, Good, Choose Two" comment earlier and I wanted to comment. Since you brought it up again, I will.

      I once had a job where I joked that, when presented the above choice, the manager chose Cheap twice. Yeah, that startup died.

  12. devTrail

    Drawbacks

    Maybe automatic code inspection is improving code quality, but excessive reliance on microservices and the diminished role of software architects are leading to growing complexity in the service interactions and the gradual loss of the big picture.

    Years ago I saw people promising that Object Oriented programming would solve spaghetti code, and the same people claiming a few years later that the problem was spaghetti architecture and the solution was SOA. In the meantime the same mantra was developing for EAI, and a few years later all the talk was about ESBs and common data models. Then everything ended up being labelled spaghetti architecture again, and the solution became microservices.

    Soon people will realise that microservices are so self-contained that no developer cares to evaluate the bigger picture, and the architectures are slowly turning into spaghetti again, with or without code inspection.

  13. An_Old_Dog Silver badge

    The Root of Many "Software" Bugs

    ... is overly-functional/overly-complex protocols and RFCs. There are too damned many "empire builders", and people who delight in the creation of complex things, creating needlessly-complex protocols and systems, which the programmers then have to implement.

    See: "The Complicator's Gloves", https://thedailywtf.com/articles/Classic-WTF-The-Complicators-Gloves
