You'll get nothing from clang using -w
-Wunreachable-code .... case is important
In the 19 years that have passed since the first implementation of SSL, you could be forgiven for expecting that the industry could do it right by now: and yet last week, not one but three SSL vendors were discovered to have implementation problems. Belkin was caught not checking SSL certificates; WhatsApp was discovered to …
GCC also has -Wunreachable-code. There are reasons why it is not enabled by default: unreachable code warnings are very often spurious. Debug code may be unreachable with one set of preprocessor flags and reachable with another. Code in inline functions may be unreachable in one instance and not in another. Etc.
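For what it's worth, here is a minimal sketch of the kind of spurious hit being described (VERBOSE_LOG and process() are made-up names): in a release build the logging line is statically unreachable, in a debug build it isn't, so the warning fires on perfectly intentional code.

    #include <stdio.h>

    #define VERBOSE_LOG 0   /* 1 in debug builds, 0 in release builds */

    static void process(int value)
    {
        if (VERBOSE_LOG)                        /* constant-false in this build */
            printf("processing %d\n", value);   /* flagged as unreachable here */
    }

    int main(void)
    {
        process(42);
        return 0;
    }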
"Yep, downvotes.
Shitty developers abound, too. No surprise there."
Shitty comments abound, too... You should expect some down votes for making an unsubstantiated snidey comment about a wide-spread set of tools that a lot of people use by choice.
P.S. Conflating people who 'downvote' you with "Shitty developers" will probably generate some more down-votes for you.
> Shitty comments abound, too... You should expect some down votes for making an unsubstantiated snidey comment about a wide-spread set of tools that a lot of people use by choice.
Maybe the OP thought that what he said did not need to be substantiated, given the audience on this forum. For many application areas, standards identify subsets of language features that can be used, e.g. MISRA for C/C++.
Er, I think you mean "in this case, the compiler". If the unreachability requires knowledge from outside the function (so the optimising compiler can see it in a particular case, but the programmer cannot assume that for the general case) then the code should not be flagged.
Which means it's a case of shitty tools - in this case, the language.
No, in this case, it's the implementations.
There is no reason - none whatsoever - that the implementation should not emit a diagnostic for adjacent gotos. That's not just unreachable code; it's patently unreachable code due to a trivial and explicit construct in the source. It's not a matter of someone using goto, break, or return to disable code (which is poor practice anyway), and it's not a will-never-succeed control structure like "if (0)" (which is also poor practice).
This is clearly a case where the implementation can distinguish between cases of unreachable code that are possibly intentional, and those that almost certainly are not.
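A minimal sketch of that distinction (names invented for illustration): the duplicated goto below is unconditional, so everything between it and the label can never run, whereas the if (0) block looks deliberately disabled. A compiler could plausibly flag the first shape and stay quiet on the second.

    static int check(int a, int b)
    {
        int err = 0;

        if (a < 0)
            goto fail;
            goto fail;      /* duplicated line: not governed by the if above */
        if (b < 0)          /* unreachable from here down to the label */
            goto fail;
        err = a + b;

        if (0) {            /* deliberately disabled block: arguably intentional */
            err = -err;
        }

    fail:
        return err;
    }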
That said, I too am of the opinion that if your program is riddled with so much unreachable code that you have to disable the warning - much less have it disabled by default - then your code is crap. Any developer working for me who produced code like that would be required to clean every one of those warnings up.
(And that said, I'll also repeat what I wrote in another post on this topic: It's possible to write good software in C and C++, but it's not easy, and very few programmers learn enough of the languages to do so. The number of C programmers I've met who understand incomplete structure types, for example, is appallingly small.)
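(For anyone wondering what that last remark refers to, here is a quick sketch of the opaque-handle idiom that incomplete structure types make possible; widget_t and friends are invented names.)

    /* widget.h - callers only ever see a pointer to the incomplete type */
    typedef struct widget widget_t;     /* incomplete: size and members hidden */
    widget_t *widget_create(void);
    void      widget_destroy(widget_t *w);

    /* widget.c - the only translation unit where the type is completed */
    #include <stdlib.h>
    struct widget {
        int  refcount;
        char name[32];
    };

    widget_t *widget_create(void)
    {
        return calloc(1, sizeof(struct widget));
    }

    void widget_destroy(widget_t *w)
    {
        free(w);
    }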
In current gcc, -Wunreachable-code doesn't do anything. That functionality was removed. In Clang it does work, but isn't turned on by -Wall. Apparently 'all' doesn't mean 'all' in Clang-land.
The real problem though is that the code clearly doesn't have unit tests, and they probably didn't do any analysis of whether all the code was covered by the tests.
Monday morning quarterbacking. There is always one test you didn't realize you needed to perform. Remember that situation with some unixes where they created horribly weak SSL certs due to a subtle bug that didn't mix in enough randomness? This stuff happens, relax.
By all reports, the test case would be an unholy bitch to create, which makes me wonder exactly how exploitable the problem really is.
Never attribute to Malice anything that can be accomplished by Stupidity.
Come on, we've all looked at code for days and not seen the problem, only to awaken in the middle of the night and realise that the problem is simply that we're always checking for a>b and never a=b or some similar malfunction (sic).
Easy. There are examples out there. Which means that creating a test case is also easy. Time-consuming, perhaps, but easy. All the certificate tests would be time-consuming but easy, and I *really* hope that they've got at least some - and that means that they know how to create them, so creating more shouldn't be such an awful task.
By all reports, the test case would be an unholy bitch to create
Not "by all reports". Adam Langley created a test case for it.
It is a bit subtle. You have to be using a protocol prior to TLSv1.2 and a non-anonymous DHE or ECDHE suite. The certificate chain has to be valid but the ephemeral key signed with the wrong private key. You could achieve this by modifying the ServerKeyExchange message in flight, or by modifying the SSL/TLS stack; I don't think you can easily do it with an out-of-the-box OpenSSL-based application, for example. (You'd probably have to create a filter BIO and modify the ServerKeyExchange message before it's sent.)
Exploiting it? Act as a man-in-the-middle and pass messages through unchanged, except for SKE. By substituting your own ephemeral key in the SKE, you can decrypt all the traffic.
I've also ignored the conspiracy theories, as the "programmer mistake" sounds about right. But I wonder if there is an outside chance that this duplicated line was a result of a broken merge?
Either way, the issue is with the testing, and so comes back to the "do it cheap" culture that the article is complaining about.
Hi Pet,
The other article on ElReg about the SSL issues is here:-
http://www.theregister.co.uk/2014/02/23/apple_mac_os_x_10_9_ssl_fix/
and lists a website (gotofail.com) which will helpfully check your cert authentication process is actually working. I suspect that it is a much more easily crafted exploit than we might think.
Regards,
TB
Security is the one spot where you have to be not just good but perfect, because people will jump on the tiniest mistake (like this one) to crack your things. Security has to be tight all the time and the crackers just need to get lucky once.
It's also the hardest to test, because the testers in your company won't be 1/10th as motivated or ingenious as the crackers.
I won't even get into how completely undocumented SSL is, and how hard it is to use. I've ranted on that before.
1) This is not a tiny mistake. It is tiny only in the sense of how delivering the wrong container is an off-by-one error because the serial numbers were nearly the same.
2) This has NOTHING to do with security & encryption. If you have this kind of problem in your radiation therapy machine, you have Therac-25 and dead people.
3) "It's also the hardest to test, because the testers in your company won't be motivated". You are doing your "testing by the gorilla regiment" wrong
Using goto when goto is the best approach (such as a series of tests with shared non-trivial clean-up) is reasonable. It would be possible to fake the gotos with breaks in a do...while (false) one-shot loop but that is a significant abuse of a loop (and the one-shot nature is hidden at the bottom).
In this case a boolean tracking hasFailed and checking in each test would be reasonable, but if the existing tests are more complex the addition of checking that boolean adds to the complexity.
In the end, the pattern of a series of tests with a fall-out to common clean-up is seen often enough that it is recognised, and thus the code is easier to understand.
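For readers who haven't met it, a rough sketch of that shape (the file handling is just a placeholder example): each check bails out to a single clean-up block, and the error code starts at a failure value so falling through without success fails closed.

    #include <stdio.h>
    #include <stdlib.h>

    int load_config(const char *path, char **out)
    {
        int err = -1;                     /* fail-closed default */
        FILE *f = NULL;
        char *buf = NULL;
        size_t n;

        f = fopen(path, "r");
        if (f == NULL)
            goto cleanup;

        buf = malloc(4096);
        if (buf == NULL)
            goto cleanup;

        n = fread(buf, 1, 4095, f);
        if (n == 0)
            goto cleanup;
        buf[n] = '\0';

        *out = buf;
        buf = NULL;                       /* ownership passed to the caller */
        err = 0;

    cleanup:                              /* shared, non-trivial clean-up */
        free(buf);
        if (f != NULL)
            fclose(f);
        return err;
    }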
Remember Dijkstra's paper (it is worth the effort to read) is about the overuse of goto when there are better alternatives. Too many people see the title without reading the content and thus "ban goto".
All that said: in more than two decades of professional programming I've used goto twice (C++'s ability to do cleanup in the destructors of local objects really helps).
The code in question highlights the benefits of at least 4 essential coding standard rules:
Never initialise a return value if the language can detect an unassigned value (e.g. Java); otherwise initialise it to a value that cannot possibly be correct (see the sketch below).
Have automated unit tests, both for success and failures.
ALWAYS add braces around blocks of code
Have automated unit tests, both for success and failures.
Plus a couple of minor ones:
only use goto where it _really_ improves readability (I've only rarely seen cases where it does)
don't use tabs for code indentation
In this case I'd have thought a code development IDE would have made this glaringly obvious to the operator, if they let it indent the code.
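To illustrate the first and third rules on a gotofail-shaped check (verify_step_1/verify_step_2 are placeholder names): the return value starts at something that cannot mean success, and the braces mean an accidentally duplicated goto stays governed by its condition.

    int verify_step_1(void);    /* placeholders for the real checks */
    int verify_step_2(void);

    int verify_chain(void)
    {
        int err = -1;           /* cannot possibly mean "success" */

        if ((err = verify_step_1()) != 0) {
            goto fail;
            goto fail;          /* duplicate line, but still inside the braces */
        }
        if ((err = verify_step_2()) != 0) {
            goto fail;
        }

    fail:
        return err;
    }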
> ALWAYS add braces around blocks of code
They haven't got you writing python yet, then?
Vic.
Actually, I wondered if this was a case where programming in one language blinded a programmer to the bug written in another language.
There are a lot of comments in this and in prior articles about the necessity of using braces even around single-statement if blocks, which up to now struck me as odd -- how could anyone not see the indentation error? But if your brain is trained to view multiple-indented lines as a single block regardless of braces, then maybe you wouldn't "see" the problem.
Note, not blaming Python (for what it's worth I'm writing a project in CoffeeScript right now), and I don't know if the coders in question even use Python or other indent-block languages. But I do wonder if conflicting programming standards contributed to the problem.
Very low level system code like authentication is often written in C, not C++. In fact, this page...
http://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/
...not only describes it as C, but the code snippet that shows the error is clearly written in a style commonly used in C, but not in C++ (although it is valid syntax). C++ has exceptions... C does not.
The style of the code in question is awful. Shite, even. I'm sure it was acceptable 30 years ago when it was probably written, but really.
Using "goto"? Nil points (and using break within a while loop is just as bad by the way)
Not using braces in a condition block? Nil points.
Assigning within an if() clause? Nil points.
This is a fantastic example of how to write a really poorly structured piece of code. No wonder the fault was never spotted.
I know, some people will say "don't reinvent the wheel". Was it dumb for Apple to (apparently) implement its own SSL library, instead of using something like openssl? Perhaps.
However, serious bugs have been found in widely used open source security software that had been present for a long time before they were found (or publicly disclosed). If everyone was using OpenSSL, on average everyone would be more secure. Apple would have avoided this bug. Sounds good, right?
Except when a bug was found in openssl it would affect everyone, all at once. There's no way that could be kept quiet long enough for patches to be developed for more than a fraction of devices, so you'd end up with 3/4 of the world open to attack for a few days or longer.
Your third paragraph is why. Mono-culture is bad, however much better that one is currently.
(This is why I am glad IE and Firefox do not use WebKit: everyone using one browser engine would just lead to a repeat of the first browser wars, with WebKit's defects being the new standard.)
Two wrongs don't make a right: software can always have bugs. Open source has the advantage of peer review and the chance to learn from each other's mistakes.
Apple already makes extensive use of open source software in the stack but it doesn't really embrace it. No, this doesn't mean that they should suddenly open source all their stuff immediately, but that they could contribute more actively to making key libraries better for everyone. Doing this properly would mean Apple developers wouldn't have to spend time reinventing and retesting the wheel.
Currently, if you buy a Mac your POSIX stack will stay frozen until Apple release a new version of the OS (Apple's openssl on my machine seems to be 0.9.8y, MacPorts is on 1.0.1f). It would be a cinch for them to adopt any of the ports projects, integrate it into the OS and leverage their own sophisticated QA so that we all get better components.
All of this has nothing to do with a caffeine-infused development culture, which I think is irrelevant here. Companies still focus on features over quality. Someone took a decision here not to implement code review, static code analysis, pen-testing, etc., and in all likelihood that wasn't some kid hunched over a keyboard at 3 in the morning.
The vulnerable code was publicly released by Apple on 24 September 2012.
Source: https://mobile.twitter.com/Jeffrey903/status/437273379855667201?screen_name=Jeffrey903
NSA internally boasts of Apple being "added" in October 2012.
Source - slide 6: http://www.theguardian.com/world/interactive/2013/nov/01/prism-slides-nsa-document
More info:
http://daringfireball.net/2014/02/apple_prism
From the daring fireball link you've given :
> Once the bug was in place, the NSA wouldn’t even have needed to find the bug by manually reading the source code. All they would need are automated tests using spoofed certificates that they run against each new release of every OS. Apple releases iOS, the NSA’s automated spoofed certificate testing finds the vulnerability, and boom, Apple gets “added” to PRISM.
That's exactly what's needed. A proper functional test suite attempting to fool an SSL implementation. Not in the hands of the NSA but of everyone who produces SSL-based apps.
I find it hard to blame Apple for not having this. But they definitely should be working on it now.
From the daring fireball link you've given :
> Once the bug was in place, the NSA wouldn’t even have needed to find the bug by manually reading the source code. All they would need are automated tests using spoofed certificates
And once again, Daring Fireball is daringly wrong.
"Spoofed certificates" would not in fact work with this bug. See my explanation above. Exploiting the bug requires a valid certificate chain and tampering with the key exchange, for a DHE or ECDHE suite.
I could believe the NSA has some employees secretly working for Apple and other large companies to put holes in the OS to give them easy access.
But I find it harder to believe they'd do it this way, with the double goto. I admit I had to think about it for a second or two to realize why that was a bad thing. If they ran their code through 'indent' it would make stuff like that more visible, though I expect that if someone was surfing through the code and saw it they'd fix it, even if they didn't immediately realize it was a problem (let alone have the further realization that it was a big security problem).
Hopefully they'll look at all the code modified by whoever did that to see if they inserted other more subtle issues elsewhere...
a) Compiled without maximum error reporting during compile (If this is release code there should be no errors and no warnings)
b) Never eyeballed by either formal or informal code reviews.
c) No scanning of the code base for the sort of stupid can't-possibly-happen coding errors that in fact can and do happen IRL.
I have seen this, incredibly rarely, myself. It's something to do with Subversion; when committing changes to a resource that has non-conflicting changes, a single line can either be omitted or repeated. I've seen this twice in a decade. This looks like an accidental double-paste; the two I saw did not. It's the sort of thing you don't discover for days or even months, so investigating what actually happened, testing for reproducibility, is ... difficult.
we know that the software industry has collectively spent a lot of time and energy creating and implementing processes to more-or-less bulletproof code creation
Do tell, because I don't think it's as obvious as you do. Even MISRA has suffered some epic failures, and that's one of the tightest standards going.
Sure, security is hard, but would someone at Apple care to explain how such a straightforward bug made it through regression testing? Do they have regression testing? Is it complete?
One could equally complain that if the coding standards at the organisation insisted on braces, the second ctrl-v MAY have caused a compile-time syntax error. It's also possible that the programmer put it in deliberately to test something and forgot to take it out.
Perhaps the malaise is focusing on features rather than writing boring test harnesses for core libraries?
While everyone points the finger at M$ for this, it's not exactly unique to them.
Your question assumes that the flaw was added after an earlier version functioned correctly. We don't know this, and the extra line may have been there from the start. Regression testing will only catch breakage in behaviour that was previously integration tested and worked; it won't find flaws that were never tested for in the past. It seems quite possible that this code just wasn't properly tested from the start before promotion for release. I have quite often seen situations where a developer performs unit tests to make sure things work properly with correct inputs, but neglects to test for incorrect inputs.
Subsequent integration tests will then often also only test that things work as expected when inputs are correct, especially since the tests are often designed by the same developer that originally wrote the code (they shouldn't be but often are.)
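A small sketch of the kind of negative test that gets skipped (check_input() is a placeholder; only the shape matters): the happy-path test that everyone writes, plus the bad-input test that too often nobody does.

    #include <assert.h>
    #include <stddef.h>

    /* placeholder for whatever routine is being unit-tested:
       returns 0 on success, non-zero on error */
    int check_input(const char *s);

    static void test_valid_input_accepted(void)
    {
        assert(check_input("well-formed") == 0);
    }

    static void test_malformed_input_rejected(void)   /* the one that gets skipped */
    {
        assert(check_input("") != 0);
        assert(check_input(NULL) != 0);
    }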
I.e. the sort of stuff you have a search script to scrub your code for?
I think that the basic ideas of how to produce error free (or low risk of error) software are fairly simple.
Enforcing them (and understanding why you should enforce them) is hard.
I'm not one for conspiracy theories, and I lean toward the "programmer error" explanation, but if I was a baddie knowing how human cognition works and intent on a subtle, easily missed change that would render SSL worthless, I might do EXACTLY this. I've been a professional programmer since the mid 1980's, and even knowing there was something wrong in the code I would have likely needed to step through it with a debugger to spot the flaw as it just didn't jump out at me. The fact that every other line was "goto fail;" made the offending line recede into the background noise. Code reviews are tedious and boring tasks that will usually only catch things that look "wrong", not flaws like this one that trick bored eyes by repeating a valid pattern with an extraneous element.
Yes it is obvious... once it's been pointed out to you with a big arrow.
P.S. to those claiming that this would have been caught if "unreachable code" warnings were turned on in the compiler... how so? "Unreachable code", as the name suggests, is code that can NEVER be executed, such as inside an "if" block where the condition can never be true. In this situation there was no code that could never be executed, just one line that shouldn't have been there. Compiler warnings wouldn't have caught this, but correctly written automated unit testing should have.
> to those claiming that this would have been caught if "unreachable code" warnings were turned on in the compiler... how so? [...] In this situation there was no code that could never be executed, just one line that shouldn't have been there.
Look again. Everything after the second "goto fail" is unreachable, since that goto is not conditional on the if above it:

    if ((...) != 0)
        goto fail;          // this one's actually conditional
        goto fail;          // this one isn't, always gets executed
    if ((...) != 0)         // from here down to the fail: label is unreachable
        goto fail;
    err = sslRawVerify(...);
    if (err) {
        sslErrorLog(...);
        goto fail;
    }
    fail:
        SSLFreeBuffer(&signedHashes);
        SSLFreeBuffer(&hashCtx);
        return err;
(from http://opensource.apple.com/source/Security/Security-55471/libsecurity_ssl/lib/sslKeyExchange.c, edited)
This bug would make it trivial to perform SSL MitM attacks against Apple devices.
"Trivial" is an overstatement - it's not hard, but it's not as easy as, say, a failure to perform certificate checks at all would be (and we know there's software out there guilty of that). And there are ways for the client to mitigate it (require TLSv2.1, or disable DHE and ECDHE suites), though admittedly they are rarely used.
As the article notes, this would be a pretty blunt backdoor. It's not nearly as sophisticated as Dual-EC-DRBG, for example. That doesn't mean it isn't malicious, but I'd like to hope that the Bad Guys have a bit more style.
That's just a demonstration that:
1) Bad developers are bad developers wherever they are employed
2) Code is never automagically secure - nor is one OS better than another "just because"
3) "Open source" doesn't mean someone will really look at the code - how this one went unnoticed by the legions of open source supporters? Don't you spend your free time reading code to ensure it's properly written and without bugs? Why you don't?
4) Issues like this would be identified not only by static code analysis, but also by tests complemented by test coverage analysis tools - they should spot the code which tests never exercise. I use one for Windows applications - it's not cheap but it's great, and I'm not the Mighty Apple With Billions in Cayman^H^H^H^Hsh. Don't they buy such tools for their developers? Or do they, but their developers are too lazy and spend more time using their iPhones than writing proper tests and checking coverage?