back to article Let's Encrypt? Let's revoke 3 million HTTPS certificates on Wednesday, more like: Check code loop blunder strikes

On Wednesday, March 4, Let's Encrypt – the free, automated digital certificate authority – will briefly become Let's Revoke, to undo the issuance of more than three million flawed HTTPS certs. In a post to the service's online forum on Saturday, Jacob Hoffman-Andrews, senior staff technologist at the EFF, said a bug had been …

  1. Twanky

    Minor arse-ache

    E-mail arrived, checked it for authenticity, ran the renewal processes, done.

    I can well imagine that an out of normal sequence cert renewal is going to be a right royal pain in the arse for some though.

    Really not great for Let's Encrypt's reputation either.

    1. whitepines
      Alert

      Re: Minor arse-ache

      The real fun starts when you hit LetsEncrypt's API rate limiting after a problem like this. It's entirely possible to be down for days if you cock up your update command more than a couple of times and had previously set a low renewal interval.

      Saw a customer site go down because of that, complete with mad scramble to buy a replacement (expensive) commercial cert since once you've hit the request limit, no one can override it (it's all automated 'guv!)

      1. Gordon Shumway
        WTF?

        Re: Minor arse-ache

        Well, if you manage to cock up *that* simple of a command, multiple times even, you should seriously be considering a career change. To one that does not involve electrons.

        1. Glen 1

          Re: Minor arse-ache

          Because, until you hit that limit there are no consequences.

          That said, the staging environment exists for a reason.

        2. This post has been deleted by its author

        3. whitepines
          Megaphone

          Re: Minor arse-ache

          Well, if you manage to cock up *that* simple of a command, multiple times even, you should seriously be considering a career change. To one that does not involve electrons.

          The problem is that all it takes is one cock up (e.g. "up enter" accidentally re-running the update command, in an environment primed with "update often" where re-running an issuance command is not expected to be able to knock everything offline). Or even in some cases no cock up at all is required; the company referenced earlier basically did this:

          * Set short update interval (which is encouraged, given short cert expiry times!)

          * Migrate several subdomains from a commercial cert to LetsEncrypt over a period of several hours (this was before wildcard DNS, the systems involved were quite complex, and under no circumstances was "shut it all down for hours to enable a single, untested cert migration" an option)

          * Test the automatic update system with one last manual update. Oh bugger, API request limit hit with that last test, no way to override, sites offline, scramble for commercial cert.

          When the limit is hit it shouldn't immediately knock out requests for days. It should soft fall back to minutes and maybe increase over time to hours then days, and in all cases LetsEncrypt should always be able to override it (possibly for a small fee if it was customer induced stupidity versus an external attack).

          In fact there's some concern a neer-do-well could effectively knock sites offline via a DoS attack using this little known "feature" of LetsEncrypt, though I haven't really looked into it in detail.

          And don't give me any bull about LetsEncrypt having a useful staging environment. It literally goes after your public Web servers on your production domains to even issue the certs in the first place, so you can't have a fully isolated staging environment that would have prevented the sequence above.

  2. rcxb Silver badge

    Why the mass revocation? Doesn't sound like anything bad happened. The checks just weren't up to snuff. So LE should go back and re-check the domains that were issued certs to see which, if any, should not have been issued from them, and only revoke those few. The certs will all be gone in a couple months anyhow.

    1. hazzamon

      I imagine that the mass-revocation is a requirement of the CA/Browser Forum rules.

    2. richardcox13

      Having read read some of the comment threads it seems revoking all potentially insecure certificates within 5 days of the discovery of the flaw is part of the Baseline Requirements agreed between CAs.

    3. tialaramex

      The rules for CAA makes it essential to check at the time of issuance, not before or after or just whenever you get around to it. Specifically they require the check to happen no more than 8 hours before you issue the certificate.

      Let's Encrypt published a much smaller list of names for which if the check is run today it fails, but there is no way to determine (without time travel) which names, on this list or otherwise, would have failed the CAA check if it had been performed as intended at the time.

      1. rcxb Silver badge

        The rules for CAA makes it essential to check at the time of issuance, not before or after or just whenever you get around to it. Specifically they require the check to happen no more than 8 hours before you issue the certificate.

        That's a flimsy argument. You can't model a practical threat where I want to allow LE to issues certs for my domain today, but I didn't yesterday, and so the cert issued yesterday (which also validated domain control and the like) is dangerous. The other case is infinitely more risky.

        a much smaller list of names for which if the check is run today it fails

        Exactly! Those small number are the ONLY ones that should be revoked. Maybe they were valid when issued, or maybe not, but there you actually have a high risk that the bug caused something untoward to occur. The rest are meh.

    4. Wayland

      Yeah right, they used a loop variable wrong.

      This is actually a power flex. They are testing their muscles. Potential investors will see the power a Certificate Broker can wield. Google Chrome last year made HTTPS mandatory if you don't want warnings. In some cases you simply can't open the site even if you drive around all the road blocks.

      What is Let's Crypt's business model? Give away certificates like the child snatcher gives away candy?

      I can see Google buying all the Certificate Brokers and owning all the web browsers. They will then have web superiority over the Internet.

      1. Anonymous Coward
        Anonymous Coward

        Their business model is to sell professional packages to companies that have a lot of domain names in a single account, such as hosting companies.

        That's what the API limit is for.

      2. Michael Wojcik Silver badge

        I'm not particularly fond of Let's Encrypt or the HTTPS Everywhere movement, because of increased threats such as typosquatting and IDN homographs, and because they're an excuse to avoid fixing the real problems. But even as conspiracy theories go this is impressively feeble.

        Let's Encrypt is primarily a free service achieved by maximizing automation to reduce costs, and funded by premium packages and donations. There are plenty of commercial CAs already, and anyone who's interested can learn as much about the CA business as they like - including what sort of power CAs wield.

        Your hypothetical "potential investors" either already understand the CA business, or won't notice this glitch.

        If Alphabet (or Apple, or Amazon) wanted to buy up the major CAs, they would have. They haven't because the CA business sucks, frankly. Margins are lousy and no one likes CAs - they're a hassle when they work and a danger when they fail. The CA/BF slaps one band-aid after another onto a fundamentally broken PKI so it can just manage to make the work factor too high for non-targeted attacks. The whole thing is dreary.

    5. CrazyOldCatMan Silver badge

      Why the mass revocation?

      Because they could (theoretically) have issued certificates to people wo were not entitledto them - if you submit a list of domains and 'accidentally' included a Microsoft sub-domain in the list their old process wouldn' have picked it up.

      So, safer to revoke all the certs and get people to renew - and that will pick up the wrongly-issued ones.

  3. Anonymous Coward
    Anonymous Coward

    What's this, a bug caused by a language quirk?

    And there was me thinking Go was one of the new breed of languages that didn't have any nasty semantic gotchas like C++ or Java.

    Yet again it proves that there's no such thing as a safe language since to do anything useful it has to give you the power to blow your foot off in which case what is the point of these new languages other than an ego trip for its developers and a lock-in to the environment.

    1. MacroRodent

      Re: What's this, a bug caused by a language quirk?

      > what is the point of these new languages other than an ego trip for its developers and a lock-in to the environment.

      Would you then prefer coding in FORTRAN IV or COBOL, since newer languages are just ego trips? Of course not. No language can ever guarantee absence of bugs, but improvements can be made nevertheless.

      Ob car analogy: Modern cars are much safer that what people drove in the 1960's, but fatalities still sadly occur. That does not mean all the improvements were pointless.

      1. Anonymous Coward
        Anonymous Coward

        Re: What's this, a bug caused by a language quirk?

        "Would you then prefer coding in FORTRAN IV or COBOL"

        I'd sooner do scientific programming in fortran than Go. And if Go, Rust etc had invented any new paradigms like LISP, Prolog or Smalltalk did then you would have had a point but they haven't. They're just old wine in new bottles.

        1. Michael Wojcik Silver badge

          Re: What's this, a bug caused by a language quirk?

          Rust is significantly innovative, thanks to borrow checking.

          I'm not particularly impressed with Go.

          (And I'd do my scientific programming in Julia, probably. Python may have the libraries and community these days, but I find Julia much more to my taste.)

        2. DrXym

          Re: What's this, a bug caused by a language quirk?

          Here is Rust's paradigm - safe by default and portable. Look at the CVE database - it's filled with exploits / vulns caused by null pointers, buffer overflows, data races etc. from code written in C or C++. Realise that NONE of those can happen in Rust because the compiler will kick your ass if you try. The consequence is that compiled code is safer code, slashing the number of bugs which means less development time, better quality and happier customers. Oh and often that code can be more performant because devs can utilitise threading, asynchronous operations etc. because the compiler catches all those aforementioned issues.

          Here is Go's paradigm - scaling, portability, stability and convenience. It is very terse, has a very large standard library (crypto, networking, images, math, hashing, encoding etc. all out of the box), is well suited for scaling network applications, and it compiles code into executables. It also does not suffer from the same kinds of vulns as C or C++ thanks to runtime checks.

          So yeah you can use some older language if you prefer but you have to realise why these languages exist and it helps to use them to speak from a position of experience. What I don't quite follow is the irrational attitude that always permeates discussion about new languages, as if some old timer sees this new fangled language as a threat or something. Nobody is forcing you to use them, but personally I like to keep my skillset fresh and relevant and using a new language or technology is a way to do that.

          1. Loyal Commenter Silver badge

            Re: What's this, a bug caused by a language quirk?

            And I expect it's perfectly possible to create the equivalent of

            do while (true) {}
            in both languages.

            If a language is Turing-complete, it's not possible to specify it such that you can't create a bug, in much the same way it is not possible to determine if any program will complete.

            1. DrXym

              Re: What's this, a bug caused by a language quirk?

              Well that's a stupid straw man argument. I never claimed a language like Rust fixed bugs like application logic errors. But it does fix entire classes of problem that C or C++ allow through - buffer overflows, NPEs etc. It should be very obvious why that is a good thing in a language.

              1. MacroRodent
                Boffin

                Re: What's this, a bug caused by a language quirk?

                As another commentard pointed out, most of the new stuff really has appeared in older languages. Sometimes much older. For example, Simula 67 (from 1967 like the name says) had most of the same features that make Go or Java safer. Managed memory, run-time checks, no wild pointers, compilation with strong typing. Even classes and inheritance. But almost nobody uses it any more.

                Go etc. add finesses, and also follow a syntactic style that people are now familiar with. Simula 67 syntax is based on Algol, so long keywords, begin ... end instead of { ... } ...

                I think one reason the older innovative languages have fallen by the wayside is that at the time they were introduced, known implementation techniques did not allow making the fast enough for production use, and computers also were slower and had less memory. I recall Bjarne Stroustrup saying he started developing C++ for a project where he first tried to use Simula, but it ran too slowly.

                So programmers were enticed by the low-level, anything goes C, and later C++. Then managed languages became more feasible, thanks to faster computers and innovations like better garbage collection algorithms and JIT, but these were applied to new languages instead of attempting to resuscitate old ones. It is easier to "sell" something syntactically C-like to programmers who grew up with it.

                I'm fine with this. Having done most of my professional programming in C for decades, I now believe very few programs should be written in it. Mainly kernels and drivers. (Perhaps one should have a license to use it). Everything else should be programmed only in managed, checked-to-hell-and-back languages. Even then programmers will keep making stupid errors, but there is some hope there will be a a bit less of them, and they are caught earlier.

    2. tialaramex

      Re: What's this, a bug caused by a language quirk?

      Go isn't "one of the new breed of languages". It's from Pike and Thompson whose names may be familiar because they designed Unix. It's young the way a 14 year old Prince starting at Eton is "young". Physically it's not very old, but it does not represent great modernity or innovation.

      This mistake would actually be caught at compile time in some "new breed" languages such as Rust.

      1. Claptrap314 Silver badge

        Re: What's this, a bug caused by a language quirk?

        At that level, Go is Frankensteinish.

        No (initial) support for exceptions, because we hates them!

        Channels, while probably not novel, certainly much more accessible way of handling the matter than most.

        Two-value return as common case to discourage magic. (And we hates exceptions!)

        Screaming fast compile (good for unit testing).

        NO unified testing strategy.

        Classes? Let's just use interfaces.

        Generics? Nah, brah. Type faster.

        Perhaps most of these decisions are the right one. But still, quite a mix compared to "new" & "trendy".

        1. DrXym

          Re: What's this, a bug caused by a language quirk?

          I'd describe Go as a swiss army knife. The standard library is huge and ideally suited for writing scalable web servers, utilities etc. that compile into standalone executables and run a multiple times faster than if they were written in languages such as Ruby, Python and NodeJS.

          I hate the way the language handles errors though which is inexcusably crap.

    3. Julz

      Re: What's this, a bug caused by a language quirk?

      The point of these new languages is so that all of the ip in the stack used by the mega-corp is owned and controlled by the mega-corp. There is also the fact that locking in developers to your language is also a fun and potentially lucrative thing to do. They seem to be, on the whole, chimeras of other languages rapped in whatever shiny the owner can rustle up. Nothing really new, including it would seem, the built in ways to silently cock things up.

      1. MJB7

        Re: The point of these new languages

        What are you talking about? Rust is open source, Go is open source. Google have explicitly given up control of the IP in Go, Mozilla never were a mega-corp but they have given up control of the IP in Rust too.

        1. Julz

          Re: The point of these new languages

          Open source licenses don't really seem to have the effect one would hope with languages. If the main contributor and initial designers of a language moves things in ways that the community doesn't agree with, then the language tends to loose it's wider appeal.

    4. DrXym

      Re: What's this, a bug caused by a language quirk?

      Every language has pitfalls. Some languages, in particular C and C++ have more pitfalls than others. If there is any doubt about this, search for the causes of the Heartbleed or "goto fail" bugs in OpenSSL.

      And regardless of the language, all software can still have application logic errors.

      1. Michael Wojcik Silver badge

        Re: What's this, a bug caused by a language quirk?

        The cause of Heartbleed was a missing length check. It was a trivial error, even if it had non-trivial consequences. Variants of it could have happened even in languages with array boundary checking, for example if the heartbeat code reused sufficiently-large buffers.

        Goto-fail wasn't in OpenSSL at all, but in Apple's TLS stack. And it happened because of poor code style and a failure to use an adequate static-code analyzer. (Some C compilers, run with appropriate options, are sufficiently good linters to catch the goto-fail bug, because it leaves dead code; any decent standalone linter or general static-code analyzer should catch it.)

        Both are in C, so say nothing about C++.

        In any case, there are much more prominent dangers in C, particularly unsafe functions in the standard library (which should be avoided or wrapped in safer abstractions, not used directly in application logic); the use of in-band signaling in strings, stdio formatting, etc; relatively weak typing; and a cumbersome error-handling model.

        I agree with your general point - all programming languages have pitfalls, and some are more difficult to use properly than others. But your supporting argument leaves much to be desired.

        1. DrXym

          Re: What's this, a bug caused by a language quirk?

          No, both of these were caused by problems in the language.

          - Heartbleed might have been "trivial" but it was caused by unsafe code that would have thrown a runtime exception in other languages and been instantly caught.

          - "goto fail" would have thrown a compile time error in other languages where blocks MUST be delimited or goto isn't even a thing.

          The point is the language caused these bugs to enter production with massive consequences.

          And yes C++ can mitigate some of these things, but the real world says hello. These things still happen and do happen.

          1. odyssey

            Re: What's this, a bug caused by a language quirk?

            "goto fail" specifically couldn't have happened in languages where "goto" isn't a thing. But you can create logic errors in every language. As we've seen with this Let's Encrypt bug, you can construct logically wrong code. You could write the equivalent of "goto fail" in a "goto-less" language, by getting a break or termination condition wrong.

            For example the Zune player bug where a missing break clause in date calculation code caused an infinite loop. All languages could have such a bug, the language itself can never prevent it. That's why we need careful, rigorous testing. These bugs should really be seen as testing failures. A failure to verify that the code is correct in every possible condition.

            1. DrXym

              Re: What's this, a bug caused by a language quirk?

              Of course you can create application errors in other languages. The point is that C and C++ have their own errors on top. This honestly isn't up for debate.

              1. odyssey

                Re: What's this, a bug caused by a language quirk?

                No I don't deny that. Higher-level languages make exploitable buffer overflows impossible, for example. They should be used for front-facing systems where possible, and usually are. This bug however isn't due to low-level C memory handling, neither are many other bugs. Therefore it's more rigorous testing that's needed.

    5. Arthur 1

      Re: What's this, a bug caused by a language quirk?

      Go is a niche throwback language, don't confuse it with truly modern languages. It's made by plan 9ers for plan 9ers, really only works properly on *nix since it has a ton of OS-y things baked in, and really is only useful for massive multithreading on servers. It's well known for playing fast and loose, not a safety focused language as you seem to assume.

      As for what's the point of it? Overall there is no better way to write massively lightweight-threaded code in Unix-like environments.

    6. odyssey

      Re: What's this, a bug caused by a language quirk?

      I'd say Java has very few gotchas. Other than equals() vs == it's hard to make silly mistakes.

      1. Michael Wojcik Silver badge

        Re: What's this, a bug caused by a language quirk?

        One word: Deserialization.

        1. odyssey

          Re: What's this, a bug caused by a language quirk?

          I wouldn't say deserialization is a language quirk or gotcha. It can be used unsafely, but so can anything that lets untrusted users do things. Go's use of loop variables, the difference between = and ==, etc are gotchas because they look like they should do something different to what they actually do.

  4. Steve @ Ex Cathedra Solutions
    Go

    Easy-peasy

    Got email - 3 out of about 30 certs affected, one of which was redundant now anyway. Ran the renewal, done and dusted in 5 minutes.

    Compared to the last time a certain company cocked up my certificate issuing and I was down for a week, this was heaven!

    I agree it's a bit of overkill - but on the other hand NOT doing it might be worse for reputation.

    1. Anonymous Coward
      Go

      Re: Easy-peasy

      Didn't affect me but doing this improved their reputation in my eyes.

    2. boxplayer

      Re: Easy-peasy

      Same here, only one domain of about 50 affected anyway, but deleted the cert file and ran my script and it went through like clockwork.

    3. Anonymous Coward
      Anonymous Coward

      Re: Easy-peasy

      Similar experience here, though just one instance needed the update.

      I suspect the possible reputational damage might end up being neutral. LE's transparency on this has been pretty good. The only thing that might have been added is something in the email to indicate it wasn't some scam, perhaps a non-clickable URL to a third party site explaining the email, or somesuch.

      As you point out, compared to the experience of shenanigans of proprietary CAs, this was well handled.

    4. Michael Wojcik Silver badge

      Re: Easy-peasy

      Not doing it isn't an option. Basic Requirements.

  5. tin 2

    I picked a bad day to get into working with LetsEncrypt then! (been seeing random API errors all day - I now presume due to load)

    1. knelmes

      Looks like I picked the wrong week to quit sniffing glue.

      1. Anonymous Coward
        Anonymous Coward

        And don't call me Shirley.

        1. This post has been deleted by its author

          1. Loyal Commenter Silver badge

            I just wanted to say

            Good luck, we're all counting on you.

  6. VerySlowData
    Go

    got email

    We got the email, followed the advice and the certs for our affected domains were renewed.. no muss, no fuss. Kudos to LetsEncrypt for being upfront about this..

  7. Temmokan

    The only problem is the email notification came approx. 12h before the revocation is supposed to start.

    I can consider myself lucky to have only 2 certificates to re-issue, and neither required manual work.

    A sad error, yes. Just curious, do they ever run any tests, or "commit directly to production, what can possibly go wrong?"

    1. tialaramex

      The code with the mistake in it was written about three months before it went live, and then was live for several months before anybody reported a symptom from that mistake, after which it took several days for Let's Encrypt staff to realise the problem wasn't just cosmetic (a user complained that they got lots of the exact same error, the staff took a while to realise that really was because the exact same FQDN was being tested over and over, instead of a different one each time)

      This code had tests when it was written verifying that the correct number of validations is done, but that test doesn't actually check those validations exactly match the set of names to validate. It was also covered by another test to see that all necessary validations are done for a test issuance, but that test uses fresh accounts and so didn't test the scenario where the fault happens. Let's Encrypy's remediation included new unit tests to verify this precise problem.

      1. Anonymous Coward
        Anonymous Coward

        Which meant:

        1) The programmer didn't know how to write code

        2) The programmer didn't know how to write tests

        Anyway, I'm surprised by the bad design of Go about parameters passing.

        1. odyssey

          Same, I have thought that Go looked interesting as a language. But if it behaves in counter-intuitive ways, it clearly hasn't been thought through that well.

  8. Mike_R
    Linux

    No email so far: 08:17 local time 06:17 UTC

    so to be doubly sure ran a certbot update, which took all of 30 seconds, and then check returned success.

    Lucky (?) I saw the article!

  9. demaniak
    Facepalm

    TDD still a thing?

    Seriously, this sounds like something that should have been caught by a (probably very simple) unit test.

    Yikes.

    1. Stephen Wilkinson
      Joke

      Re: TDD still a thing?

      Agile of course!

    2. baud

      Re: TDD still a thing?

      I think the test case isn't that trivial, since it's in a case where more than one domain names need to be checked (batch mode) and a domain name other than the first isn't valid.

      Edit: from the GitHub repo, they only have 65% code coverage, so it's not that surprising that case wasn't covered.

    3. Michael Wojcik Silver badge

      Re: TDD still a thing?

      Yes, it could be caught by a very simple unit test. The problem, as always, is combinatorial explosion: There are a huge number of potential problems that could be caught by a similarly huge number of trivial unit tests. Someone or something1 has to create those unit tests.

      Per a comment above, there was a unit test which exercised the code path in question, but its verification logic didn't check all aspects of the output, and so missed this error.

      1There's a fair bit of interesting research into automating test creation.

  10. Mike 137 Silver badge

    The real world

    Testing is a legacy concept.

    Engineer "The tiny software of the Saturn launch computer took two years to test"

    Manager "We don't have that long, so just get it out the door"

    1. eamonn_gaffey

      Re: The real world

      The testing legacy has been replaced by the mentality of : "sling any shit live because my bonus depends on it", in an "agile" fashion, because we need to "act fast and break things(literally)" yawn, yawn, yawn..... Some things are fundamental, like a comprehensive test pack.

      1. Ozan

        Re: The real world

        Remember what Agile did to MS and Windows. Al lthose years, I was afreaid to install a service pack.

  11. Anonymous Coward
    Anonymous Coward

    Who is governing Public PKI providers ?

    This is such a simple coding error, where is the human intervention to sample the certificate requests for integrity.

    TLS/SSL is relied upon for assurance. This makes you wonder how other PKI providers automate requests and how they are audited.

    Regulations require TLS/SSL yet there is obvious a failing in the regulation of this key architecture artefact.

    1. ItsAFunnyOldGame

      That would be the CA/Browser forum, they agree and set the baseline requirements. However the balance of power there, at the moment, lies with the browsers.

    2. Michael Wojcik Silver badge

      This makes you wonder how other PKI providers automate requests and how they are audited.

      Well, no, it doesn't, because there's a ton of material readily available regarding the state and history of PKIX and the public X.509 CAs. So anyone who's interested in this question can easily find an answer.

      Ivan Ristić's Bulletproof SSL and TLS has some good historical material, for example; and you can stay up to date with Hanno Böck's Bulletproof TLS email newsletter. For background there are the PKIX and HTTP-over-SSL RFCs (5280 and 2818 in particular). The CA/BF makes all sorts of stuff available on their website. And so on.

      Anyone who's paying attention knows PKIX and other aspects of the public X.509 PKI such as code signing are a mess. Arguably they're less of a mess than they were, say, a decade ago; Certificate Transparency and the increasing willingness of browser manufacturers to shut out bad CAs have helped. But it's still a sausage factory.

    3. Anonymous Coward
      Anonymous Coward

      > sample the certificate requests for integrity

      It's highly likely that all of them were perfectly valid. A human review of the inputs/outputs would not have caught this. Only a review of the actual code or test suite.

  12. NerryTutkins

    CertifyTheWeb

    Been using CertifyTheWeb on IIS for a while for letsencrypt certs, they have a new version, if you go in and apply the update, you can then ctrl click the 'renew all' button and it will force all certs to renew, even ones that have not expired.

    Just done three servers in about 10 mins. So all in all, I think while the error is a bit of a howler, all parties involved in the clean up seem to have stepped up.

  13. Blane Bramble

    Had one (of many) SSL certificates affected. Read the email, forced renewal, ran a script to put the updated certs where they are needed. Restarted a service.

    OK, a bit more notice would have been nice, but it took all of 5 minutes.

  14. Simian Surprise
    Facepalm

    FTFY

    "The proximate cause of the bug was a common mistake: Go..."

  15. MtK

    Akamai CDN certificates not renewed yet

    One of our domains appeared in the list. It's for a certificate issued for an Akamai CDN endpoint. Still not renewed as yet.

  16. Richard Pennington 1
    FAIL

    Whatever happened to code review?

    I'm retired now, but in my day (and in my security-critical environment) all critical code was peer-reviewed.

    1. mr-slappy
      Thumb Up

      Re: Whatever happened to code review?

      Hear hear!

      In my day not only was code peer-reviewed, but specifications (does anyone do those any more?) as well.

      I still encourage that approach whenever I am able to (sadly not that often in these days). Peer-review can identify many defects that the author will never see because they are immersed in the work.

      1. Alterhase
        Unhappy

        Re: Whatever happened to code review?

        Many years ago, I was the team lead for RAS (Reliability, Availability, Serviceability) for a new storage product at a large company.

        In that role, I sat in on the code reviews and brought up issues as I saw them. I was explicitly "uninvited" to the review meetings because I was slowing down the process.

        Guess what: when the product was shipped it had a number of issues that I had explicitly warned about!

    2. baud

      Re: Whatever happened to code review?

      Well, it was reviewed, you can take a look at the process there: https://github.com/letsencrypt/boulder/pull/4134

    3. MJB7

      Re: Whatever happened to code review?

      Code review is great - but it is very easy for errors to slip through code review.

      Not exactly the same, but I remember three VERY experience engineers looking at a buggy piece of firmware and agreeing it was buggy - but not the end of the world (just a DOS potential, and it wouldn't be exposed to the internet). Then it all went quiet when one of them realized it was actually a critical security vulnerability. Oops!

      1. Michael Wojcik Silver badge

        Re: Whatever happened to code review?

        Yes. In software as in all things you really want defense in depth. Code review, static analysis, dynamic analysis, fuzz testing, as much unit/functional/system testing as you can manage (ideally with automated and human test generation), ...

        But code review, done properly, improves the incentives for writing readable code. That's very important not just for security but for maintainability, and thus for reducing total cost of development. Having looked at that Let's Encrypt bug report, I'm not surprised the error was missed; the code follows the "every letter is expensive!" style popularized by generations of mediocre C developers. (The formatting is also awful, but I assume that was due to pasting it into a blog post that doesn't support proper whitespace-preserving formatting. Like some other sites I could mention...)

        1. baud

          Re: Whatever happened to code review?

          You can take a look at the actual formatting there:

          https://github.com/letsencrypt/boulder/pull/4690/commits/ba6b85ee126b50d19837a53a52cb6d894594935f

  17. Anonymous Coward
    Anonymous Coward

    A message for Sir Andrew Parker

    0nbb0SWz0Vaw1EcD0Ln41MQp03bX1gQu1GT11gwh

    0Pk91lsg0c8=0=dG0Tia0mib18vL1P8P0Vv21NdB

    1H4U1mWr17c81bRH149X1WIK01go1GHa1iUJ16FA

    0WgQ0Jam0GOd0VBG1CkS1Rj30reB0pMx0YMd0uka

    0eZh19n60R6=1cpN0PJz0GyI0V5D0Kif0dY00H9b

    13xn0h4=13aH0=up1BCF0KTy0J0X1kT=1P=R1Ud9

    0iqI0GcT1B5e13Fb0kjI1GMv1HiU11ft0KWj10CJ

    1a8$1eBI1CEi0MGw1WxQ1kIr14AX0cD10c$t0zYP

    0ESh15Gw0K3=1i9w0RhL1Mb90jd91EaX1Jof0kym

    0jOx1d6x1Xmv1KCZ1Xtn0ffd15xt0BB103P50iV9

    0XgE19og1azy0e6U0i$s18mP1cfA0GwY1WGK0JjV

    05GV04fp16P41Z911cHr0e7a19cm0ZV40X$h1MOa

    0hYk1kN70eQl1eQl0$fv192N0mOp1QaW05p50alB

    0MtW1AEU1m=s1GN40ebt1JEz1YTz0Eot017e1b39

    117$05E21PiB1f5Z1TkM1h3N1Kxg0VDJ01X004Yx

    0Cow18u01jWO0$X51LoY1UXn0pIJ1Juc1Zg=1W=d

    07NQ16Cw1fBB043T0Tca1B$d1fkY0k9=0JI=0FE5

    0HJ=1JaD1If=05KD1jIe02K70BoV1khy1QwN028o

    0BIr0mE60Iex1aeV0l9n0e9r1Uwh1IPR09jJ1UZ7

    0Q7X1Qs40y9G0pvI0H$t0j$b0Iwy1Axw1bq019X7

    1H4Z0TuC0XlM06bM1bRW1PZN1mVf0LYx1i9Q0CHV

    0e8113EP0AxD1XTW1f6d0HQS0L=J1Tsa1DQ404g6

    0lnP1g$=19oM0KmO17QL0Mty0KNO1A5M0lKi0WO2

    05U50$dG1Io605lc1LT80PvQ1Wtu1laN0C4Y1HZ7

    1Uyd1a7i01Jh1E6Y1HGN0PCM1XbP19D60S9W0foz

    16pf0rfP1ETj01AI07CS00ak0qEF00fh09MK1hzI

    0Mt21Prb0uTX11qG094j0v3v0fkO0nZE0XSm0HAC

    16n70s9q11491DAW1fzg0rsI0BX51WnV0OwM0d5S

    0iHK0cgI

  18. Lunatic Asylym Seeker

    Obviously the code was written by somebody learning on the go :-)

    Well somebody had to say it ....

  19. Anonymous Coward
    Anonymous Coward

    Here we Go again

    Can we please stop creating languages with semantics where common bugs are so easy to create and hard to spot? When your language guru says that "The proximate cause of the bug was a common mistake in Go: taking a reference to a loop iterator variable," the right response is "does the language have to allow or encourage such a common mistake?". Is there a valid use case for taking a reference to a loop iterator variable and more importantly, *is that use case so necessary that it's worth the risk?*.

    And yes, I'm sure somebody will be along to say "perhaps you'd rather be programming in FORTRAN or assembler?". Of course old languages have their own problems. That's not an excuse for propagating flaws in C half a century ago, either from deliberate compromises at the time, or where we simply know better now, into new languages.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Other stories you might like