Yes, well...
That's the kind of code quality that gives us an OpenSSL with a switch to flick to disable SSLv2 that doesn't actually disable SSLv2.
Hello, world. Welcome back to Line Break, our weekly column of terrible code readers have spotted in the wild – think of it as a group therapy session. Let's skip to the main() course. Beauty or the beast Since SSL/TLS is in the news again, let's start with some weird code spotted by Georgi in OpenSSL and LibreSSL, a fork of …
"Opinion please!"
I know I'm in the minority , but I think its an elegant solution. Ok, it'll throw some newbies but newbies won't be debugging complex encryption code to start with (in any sane project). If its too complex for managers to follow - tough. There's a reason they're managers and not coders. As someone else pointed out however, a few comments in the code wouldn't have hurt.
I know I'm in the minority...
Good, then you know the right answer.
but I think its an elegant solution
Or not.
Ok, it'll throw some newbies
And this is why you don't do it. The most important job source code has is to inform other programmers of your intent.
a few comments in the code wouldn't have hurt
All comments fundamentally say the same thing. 'I know I've not explained myself well, but I don't care enough to do it better - good luck!'
PS I'm in the gotos are bad camp. Does it show?
"And this is why you don't do it. The most important job source code has is to inform other programmers of your intent."
Amen to that. Best thing I've read in a while. Pity upvotes can't be multiplied. You have one anyway.
"PS I'm in the gotos are bad camp. Does it show?"
I've always been taught gotos are fundamentally wrong, and out of 5 years of C programming, I can't recall using a single one.
Functions are the way to organise program flow in C, and if you feel you need to goto, that just means you got your functions flow wrong and need to try again.
Openssl is a typical example of complete kludge code, with no organised program flow. It's just hacked, works by sheer luck, out of try and fail, and certainly not because anyone devised a clear program flow through C code.
"And this is why you don't do it. The most important job source code has is to inform other programmers of your intent."
No, the most important job it has is to solve the problem. Not every problem is "hello world". Some like video codecs or encryption are fundamentally hard and the code will be complex no matter what you do so newbies have no business going anywhere near it.
"All comments fundamentally say the same thing."
Thats probably the dumbest thing I've seen posted on El Reg for some time.
"PS I'm in the gotos are bad camp. Does it show?"
That and the fact that you're all theory and no practice.
It's not elegant. It's appalling. You're welcome to disagree but I offer as exhibit A, the recent round of disastrous bugs in OpenSSL. Bugs emanating from the original designers, by the way, not some poor sap newbie who came along later and tried to maintain it.
First rule of engineering is you are not as clever as you think you are. Follow the principle of least surprise. Use constructs and terminology that others can follow. Be methodical to a fault. The only way to reduce the complexity of a system is to black-box its components, and the only way to do that is to make them predictable. These examples fail on every one of those points.
Oh, and personally I would have "assembly man" put against the wall and shot. Then I would shoot his kids, just to make sure it didn't happen again. But that's just me.
> But that's just me.
Not just you. "Kill it" was my immediate thought reading this exquisitely horrible story.
I had to deal with a guy who did assembler style "optimizations" in Mathematica(!), proving once and for all that one can do assembler in any language.
"First rule of engineering is you are not as clever as you think you are"
Nah, the first rule is that debugging is harder than writing it in the first place so if you write code that is as clever as you think you are then you won't be able to debug it.
The example shown can be handled more simply (and so, more reliably) by using nested if-statements and just accepting greater levels of indentation. Lexical scoping makes it manifestly obvious whether you've paired up creation/deletion steps and it is robust against someone coming along a year or so later and adding code in the middle. There is no excuse for using a goto statement in C. It ought to be removed from the language and any significant body of code that thereafter fails to compile was almost certainly buggy as hell beforehand but you didn't know it because you're in denial.
No, the most important job it has is to solve the problem.
Perhaps I'm not very good, but I've kind of come to accept that, on software complex enough to matter, I never solve the problem. It just works right up to the point that someone finds a bug.
The 'puter will do what it's been told, no worries there. So my concern is will the person debugging that new issue understand what I wanted to do? Because I'm obviously not doing it, if I was there wouldn't be a bug to fix in the first place.
It might be better to say the most necessary job it has is to solve the problem. Like the most necessary job of a car is to move. The most important job is to stop.
@ boltar:
Dammit! did we *agree* on something?
If you're code is *informing other programmers* of anything, it is likely not accomplishing anything. I'll guess that the comments in *that* code amount to
# This glorious communication device was written by MEE!!!!!
"No, the most important job it [the source code] has is to solve the problem"
BZZZZZT!
The most important job the PROGRAMME has is to solve the problem.
The job of the SOURCE CODE is to provide a human-readable account of how the programme was implemented in a form that a machine can turn into an actual programme, but more importantly in a form that a human can read, understand, and make changes to with a clear understanding of the implications of the change.
At the machine level (the running programme) there's little difference between a goto and a subroutine jump (both involve loading a specific value into the Programme Counter and loading some registers/well-defined memory locations with arguments in the latter case), but at the source code level it can make the world of difference between a source file a competent programmer with no former knowledge of the code is able to understand in an hour and one that requires the same programmer to spend a week on unravelling its logic.
"The job of the SOURCE CODE is to provide a human-readable account of how the programme was implemented in a form that a machine can turn into an actual programme, but more importantly in a form that a human can read,"
Do I take it you write all your programs in COBOL then?
No, but I write them with the expectation that at some point I'll have to look at the source code again, possibly a decade or more since I last looked at it, and I'd like to be able to clue myself into what was going on in my head at the time I wrote it.
Why do you think writing readable code is something to sneer at?
All comments fundamentally say the same thing
Not true, at all, I'm afraid. Here are a few examples of the different things comments can do:
- Record the date / project reference a change was made under (useful for auditing)
- Explain the reason for doing something a certain way, for example to meet best practice, or to work around a known bug elsewhere that is beyond your control (useful for maintainability)
- Record what the code actually does on a trivial level, for example, 'increment x' (not useful)
- Declare what a section of code does, on a conceptual level, for example implementing an efficient sort algorithm a la Knuth (useful, but probably should be in the interface, not the implementation,and you have to be careful that later changes won't render the comment obsolete)
- Assist future programmers in avoiding the same pitfalls you just spent half a day programming your way out of.
I'd be happier if the conditional wrapping on the truncated handler exited that block with some kind of an unconditional break or an unconditional jump out of the block, just so it makes it obvious that the conditional wrapping is window dressing. Why isn't it being handled as a function call anyway? This is like some prehistoric implementation of function calling. It's a neat trick, I'll grant you that, but just why?
I would use goto to get out of an if or a loop to the end of a routine to tidy up before returning but never into an if or a loop, that way lies madness.
Instead I would have set a variable holding the error condition before jumping forward to the end of the routine where it can be checked to see what has to be tidied up before doing that and returning.
The if(0) thing works but it's just a little too clever.
Actually my favourite if I need a goto style fatal error handler is setjmp()/longjmp() and friends.
No matter where you are, or how deeply nested, you end up in a high level error handler that can examine the state of the machine, decide what to do, and do it.
Better still...
if(err=setjmp(Env))
DoErrorStuff(err);
...means that ALL your fatal error handling is well away from the code that is doing the real business of the program.
Isn't setjmp/longjmp a primitive form of exceptions? Sorry, exceptions suck, even Lisp's vaunted "conditions". Novel and mysterious ways to write spaghetti code.
In OpenSSL I would just rename "err:" to "end:", change the "if(0){" to "goto end;", and add "//fallthru" comments between goto's to clarify the intended control flow.
Better yet, put the nail in SSLv3's coffin and nuke the whole file. Or nuke the whole damn SSL/TLS protocol.
That if {0} is actually a "goto err;" thus I don't understand why they didn't wrote in that way, it would have been at least clearer to read. Once you start to use gotos because of lack of an exception mechanism in C, use it clearly. The lack of proper comments is appalling too - if you attempt to do somthing "smart", explain it.
Anyway, it looks LibreSSL is just borrowing heavily from OpenSSL and probably just removing some little used stuff - but it's not a clean room rewrite...
"Once you start to use gotos because of lack of an exception mechanism in C, use it clearly. The lack of proper comments is appalling too - if you attempt to do somthing "smart", explain it."
Absolutely agreed. In fact if I had to pick the very worst thing about this code I'd say that the label err: is incorrectly named, everything that happens here seems to me (not a C programmer) about freeing resources. There seem to me to be three exit conditions: (a) success (b) packet length error (c) certificate length error. It looks to me like the first test looks for (b) error, the second for (c) then there is a block between the two snippets that is executed if those tests don't detect their errors.
Now I understand, from your comment and a quick Google, that there is no true exception handling in C, so we sometimes use the goto. So can't it work like this? (go easy on me, I'm not a coder)...
/* trap wrong packet length */
if (CBS_len(&cert_list) < 3) {
SSLerr(SSL_F_SSL3_GET_SERVER_CERTIFICATE, SSL_R_BAD_PACKET_LENGTH);
ssl3_send_alert(s, SSL3_AL_FATAL, SSL_AD_DECODE_ERROR);
goto finalise;
}
/* trap cert length mismatch */
if (!CBS_get_u24_length_prefixed(&cert_list, &cert)) {
SSLerr(SSL_F_SSL3_GET_SERVER_CERTIFICATE, SSL_R_CERT_LENGTH_MISMATCH);
ssl3_send_alert(s, SSL3_AL_FATAL, SSL_AD_DECODE_ERROR);
goto finalise;
}
/* code that gets executed if the exceptions above aren't encountered */
/* free resources */
finalise:
EVP_PKEY_free(pkey);
X509_free(x);
sk_X509_pop_free(sk, X509_free);
return (ret);
I'm not normally a fan of the goto and I might have preferred a nested conditional or maybe setting a variable to contain the current error type (or null if no error) and then branching on that lower down, depending on the prevailing style of the other code. But I understand they might have a place where there is no native exception handling.
Anyway, it looks LibreSSL is just borrowing heavily from a fork of OpenSSL and probably just removing some little used stuff - but it's not a clean room rewrite...
It's always been a fork. A lot of stuff has been removed or rewritten, but one of the reasons for the fork was maintaining API compatibility.
Nevertheless, I find it interesting that this bit of code was kept around.
I'm ambivalent about gotos but I do have an antipathy towards 'cute' code and that sample definitely belongs in the cute camp. This kind of coding indicates structural issues so its time to back out and consider an alternative way to implement the required functionality.
I used to have a pet theory about coding issues and bugs which roughly summarized suggested that the reason why code went pear shaped was because of most programmers' inability to type. That is once they'd put a chunk of code down they'd tinker with it, they'd bend the universe around it, they'd do anything to avoid actually having to rewrite it. (Some of the traditional editors didn't help -- 'vi', I'm looking at you...). These days with the fancy UIs (Eclipse or lookalikes) you still have problems typing in code, the editors seem to be more focused on cut-n-paste, hauling in snippets and throwing 'em into the pot, so once again we may still be sacrificing code design on the altar of poor editing.
"It passed through the entire symbol table once for each entry, found the next entry in alphabetical order, output the cross-reference line, marked that symbol as consumed, and went back to repeat, quitting only when it found no unconsumed symbol. It was using an O(n**2) sort!"
No biggie. I once had to deal with a sort routine along the lines of:
while (!is_sorted()) {
generate_random_permutation() ;
permute_data() ;
}
Of course, it was written in a slightly more complicated way, so it took a while to figure out what it was doing ...
MISRA is a bit more pragmatic. They restrict use of "clever stuff" rather than ban it - if you need to do it, and can prove you understand how to do it, then you can by use of a deviation ;-)
The first version of MISRA C basically said that goto should not be used. The 2012 version allows it to be used, but only when certain restrictions are applied:
1) No back jumps, preventing goto from being used to create loops (use the loop constructs for that);
2) No jumping over initialisers, preventing undefined-bahaviour;
3) Only one goto may be used to exit a loop, reducing complexity;
4) No jumping into blocks (this would catch the OpenSSL example).
There is an Advisory rule saying that goto should not be used. This can be elevated to Required or Mandatory if local policy decides that further restriction or prohibition is required.
This basically acknowledges that there are valid uses of goto (such as error handling), but "incorrect" use can lead to code that is hard to maintain and that is more likely to contain control-flow related defects.
"Goto's are evil, only bone idle spaghetti loving coders use them."
Stop parroting what you were taught in you programming classes and trying thinking for yourself instead. In C gotos are extremely useful for error handling, specifically for jumping out of deeply nested loops or switches instead of having to set a whole host of condition variables to exit status.
There's a reason gotos get used in highly complex C projects such as the linux kernel and its not because the coders are amateurs.
Also you should remember that in most assemblers gotos (jumps) are all you've got for doing flow control.
> Also you should remember that in most assemblers gotos (jumps) are all you've got for doing flow control.
Actually... IF and WHILE macros are commonly used in assembly; they help a lot. Still, I fully agree with your point that gotos are very useful in low-level C/asm code.
"There's a reason gotos get used in highly complex C projects such as the linux kernel and its not because the coders are amateurs."
Well actually that's a whole other can of worms you've got there, because the choice of C as the implementation language is highly questionable. For pretty much the entire lifetime of Linux as a project, a C++ compiler has been able to generate equivalent code with "no overhead for the stuff you don't use" and using RAII would eliminate most uses of goto, reduce the number of lines of code and be a darn sight more reliable when extra code gets added a year or two later by a different programmer.
But no. Instead we get a tired old list of excuses about how someone once heard that their friend had compiled a 10-line program (which on inspection turns out to be a carefully crafted straw man of no possible utility in the real world) and been rewarded with a 100KB executable, or some random wibbling about how floating point or exceptions aren't allowed in the kernel, apparently oblivious to the fact that you if you don't use them then any half-decent implementation will not link in the supporting library code (as has been the case for 20 years or more), or some other wibbling about the complexity of a C++ compiler which ignores the fact that GCC is a living demonstration that only the front-end is complex and that part is shared across every target platform. (Seriously, why in the name of fuck do embedded chip vendors write their own IDE and C compiler rather than just contribute a back-end to the GCC project? It makes their project more buggy and less flexible. Always. Predictably so.)
The Linux kernel devs clearly know their shit, but they are as prone to programming prejudice as the rest of us. They do what they are comfortable with and in the short term that does lead to better code. In the long term, it leads the whole project down a one-way street and at some point everyone will wish that "they" had bitten the bullet many years ago and used a more modern language.
The problem with your entire argument is that it's based on a false premise: that no one tried to write linux in C++.
It was tried, back in the time when I had a 386 PC and an entire distro fit on 2 floppies. Maybe it was before the time when "a C++ compiler has been able to generate equivalent code with "no overhead for the stuff you don't use" " But the result was a significantly bigger kernel which ran much slower.
I'm a heavy C++ user but I also have an embedded systems background.
1) Bootstrap - starting a kernel C library is pretty easy, handful of string/memory functions, and a few assembler primitives e.g. setjmp/longjmp and syscall interface. Getting C++ ready to rock is a bit more involved with the startup and static initializers.
2) Debug
The reason they still use C is the assembler generated from the more effective uses of C++ is harder to map back to the source. A simple subset of C++ offers relatively little over c99 with relatively harder assembler to grok.
3) Linus doesn't like C++
I understand where the terror of seeing a GoTo comes from. Way back in the dark ages when I dabbled in some programming, BASIC was the language of choice and Line Numbers were the only way to go. Pick the wrong combination of starting numbers and line increments and pair it with bad planning and you could be in a world of hurt. You'd either user colons to concatenate commands, run out of line numbers, or use a GoTo to jump to a new section of code. In point of fact, my first class taught us to write our program from 10 with increments of 10, use GoTo (or GoSubs when they became available) and put the functions in the 10,000 range with a jump of 5,000 for the next function. Done badly, it IS impossible to debug. Hence the terror.
But I agree the terror is now overdone. No one uses line numbers anymore, or at least line numbers in the limiting way they were used back then. So the really ugly problems that use to exist back then have gone away.
"Also you should remember that in most assemblers gotos (jumps) are all you've got for doing flow control."
Actually, every processor I have ever worked with supports subroutines. The 6502 has JSR (jump to subroutine) and RTS (return from subroutine). Z80 based processors also have subroutines (CALL/RET), and I would assume their X86 successors do too.
I found out about how they work the hard way. I had a game which would randomly lock up the whole system. It took me awhile to find out why: I was using subroutines, but some of my collision/error handling included JMP statements that exited the routine. Of course after awhile, the stack overflowed and the machine melted. Took me awhile to figure out that little nasty bugger.
Unfortunately sometimes clever code is precisely what is the most maintainable and easy to read.
If your compiler allows you to write it.
Years ago I had a situation where I wanted to a call one of three subroutines depending on a variable whose value was 0, 1 or 2. 6809 processor.
In assembler, the job was simple. Shift the value left one bit to turn the 8 bit number into a 16 bit offset, add to the base memory address of the call table in which three addresses of three subroutines are stored, and call the address so pointed to.
The C compiler didn't recognise an array of pointers to functions..
After a wasted day it became if then else if then else...
gotos are not evil per se, are surely evil the way they were used in the early version of BASIC, when you had no other way to invoke different parts of code. In plain C, they could be useful if properly used. Some instructions like break/continue are actually very much alike gotos (they are unconditional jumps to a given location, albeit implicit), and in some ways exception handlers are also (although with a far better semantic and stack-unwinding capabilities). IMHO in C using goto *inside* a function to jump to error handling code is acceptable - if done properly. Other uses are usually just crazy.
Just don't mix semantic - if {0} is just a "smart" trick for another goto, just with an implicit meaning instead of an explicit one.
Otherwise you could make like an ex (luckily) colleague of mine, that resolved the issue of error handling just calling exit(1) every time he encountered an error... in a multithreaded application...
"Goto's are evil, only bone idle spaghetti loving coders use them."
That's a little too black-and-white. tbh, GOTOs are a tool, and refusing to use them outright because they shouldn't be used for everything is like saying you should never use a bucket for anything because it's not suitable for frying bacon in. But sometimes I'm not making bacon, and a bucket is actually quite useful.
I don't see anything particularly wrong with the sparing use of GOTO - if the whole structure of your code relies on them then yeah, it's a horrific nightmare of spaghetti pain (I still wake up screaming after dreams of writing programs in BASIC back in the '80s), but legibility problems only really begin to creep in when you have GOTOs within GOTOs - if you're using GOTO to reach a block which just terminates the program then it's not hard to follow. If you end up following a string of GOTOs (goto 8, do x and goto 24, do y and goto 4, do z and goto 347) then that's when it becomes horrible code.
not strictly true
goto amounts to jump which amounts to "write to the program counter"
comefrom could be implemented at least for toy purposes by misusing setjmp/longjmp or having the equivalent assembler to dump the program counter to some other location (longjmp unmodified will not work out of the box)
Actualy there isn't : a cursory glance at the source file linked to will be enough to disprove this.
Highlights include functions hundreds of lines long, a typically awful attitude to variable and function naming and a scarily random attitude to comments in code. This cheesy flow of control logic illustrated in the code snippet is far from "cute". It could have been far neater by splitting the code chunks marked by labels into seperate functions (as is usually the case when gotos start appearing in C code).
Of course implementing SSL is not trivial or without complexity - but as much clarity as possible is always the smart option, adding unnecessary levels of complexity to an already complex problem is always stupid.
It doesn't have to be cute, it has to be readable and maintainable. If I see if(0), me and 99% of everyone else are going to assume it's not getting executed. We'd be wrong, sure, but are we here to lay clever traps for each other in some kind of free-for-all intellectual superiority deathmatch, or are we here to get a job done?
In the same vein, I don't understand people who think that the shortest code is the most elegant, and write stuff that can only be understood if you know every obscure operator precedence rule and can mentally apply it to a dozen operands at the same time. What exactly is the beauty in that? Do you think you'll run out of brackets?
Code like this is what's often referred to as job-security code. If you want to make yourself indispensable to your employer you can do it by either being so good at what you do that letting you go would be a horrible mistake, or by writing code that's so befuddled and obscure nobody else could ever possibly understand it. The latter is a lot easier to do than the former.
regarding "if(0)", the similar "while (true)" is acceptable at a pinch because that's shorthand for "There's code here that should run in an endless loop but which can exit for any number of reasons, not just one single one I can express in a conditional while", though it's probably still best avoided if possible because it's too easy to forget to trigger an event that will actually exit the loop. "if (0)", on the other hand, is purely job-security. Most programmers would fail to notice that the code in the condition is not dead code and quite a few would even mentally block out the if (0) because it's so out of synch with most programmer's expectations it can fail to even register.
I've had interviews for contracts in a couple of places that didn't actually use any source control systems (I never actually took the work out of fear for my sanity; they were never looking to do things properly, just wanting someone to add in a couple of more features ASAP). Usually engineering firms where it started out as a simple bit of code that the (non-software) engineer wrote and it just grew and grew. Not quite as bad as this, but going down the same road.
"Really? If your code is valuable and no-one else can compile it you have a serious barrier to redundancy."
That's true. Unfortunately, this guy can't compile it either. He's compiled it once and can never do so again without re-writing the whole damn thing. This is not an anti-redundancy measure, this is just insanity.
If I had a password only I knew on my large collection of immaculately-versioned libraries, then I have a serious barrier to redundancy. If my boss comes to me 3 months after I've published the program and says 'hey, we found a bug here, could you fix it and re-release' and I say 'um, no, the library file for that does not exist anymore', then I have a serious barrier to ever being hired anywhere ever again.
"... I honestly cannot figure out why anyone would do this."
Have you never had an email from someone where they replied to the last irrelevant email they got from you instead of typing in your email from fresh?
Have you ever checked the document properties of a report or presentation and found they relate to a completely different document, author and company because someone just grabbed the nearest thing and recycled it?
"Have you never had an email from someone where they replied to the last irrelevant email they got from you instead of typing in your email from fresh?"
The analogy doesn't really work. This is more like someone sent one email in 1986 and has just been forwarding that email every time he needs to send a new one to someone, only with every single detail changed, for the last 30 years. And has been deleting the old version each time and then forwarding the new copy.
If his reasoning was along these same lines, I'd have expected to see his 'library' containing every function relevant to every program he'd ever written, rather than being endlessly stripped out and rebuilt. t's like no-one explained how to reference a library to him, and so he's been re-using that one file rather than learning how to reference in another... but if that were the case, I don't see how he was ale to make anything even functional tbh.
This doesn't make sense from the point of view of laziness, or for job insurance, or really from any point of view at all. It's frankly just deranged.
... I honestly cannot figure out why anyone would do this.
It seems pretty obvious to me. For the guy each project was completely separate from the rest so he felt justified in starting from scratch but with the same approach and files as before. By repurposing the library he was able to write his application code the way he liked to.
I don't agree with this approach but I can understand it and I know that I've done similar things in the past, though usually copying something like utils from A to B. The real problem seems more systemic: why no VCS? And why were other developers charged with recompiling his stuff?
" The real problem seems more systemic: why no VCS? And why were other developers charged with recompiling his stuff?"
I don't think we can really assume anything with his approach tbh. He's writing in C using a methodology which is more suitable for bare-metal coding. He's apparently never considered the idea of version control, or using backups. This sounds like self-taught, but self-taught to an insane degree - like the guy has never heard of other programmers or read a style guide. I wouldn't be surprised if I heard he was writing the whole thing to work on a 9-bit system he's built himself out of tin cans and broken lightbulbs.
Tom, here, original author of the story on "Assembly man".
The guys was:
1- very very sociopath, so would never talk to anyone, would even work alone in a locked office. Mad
2- totally self taught, indeed. And indeed to worrying levels
His background was serviceman, not software dev. He was probably recycled here.
The story started when some minor tweaks to his programs were required, and probably noone was able to understand why he would take so long to do any of them.
Then I was sent to investigate.
As someone else has said, the guy, indeed was using static libs without knowing their purpose, therefore wrongly. It probably never occured to him programs sometimes have to be compiled again, due to modifications or platform change.
That was my conclusion when he exaplained its usage of libs. I remember this moment, locked in his office with him, staring violently at the ground, in a ghastly silence :-)
If using such if (0) { tricks could at least make it more readable
e.g. define a macro named
START_MAKE_CODE_INACCESSIBLE_EXCEPT_BY_DIRECT_NAMED_GOTO
that macro would then hide "if(0) {" in something more explanatory
And another macro
END_MAKE_CODE_INACCESSIBLE_EXCEPT_BY_DIRECT_NAMED_GOTO
That would just be closing brace "}"
Still horrible code, but by using those macros, easier for casual inspection to give folk a clue, proving you can polish a t*** (albeit only slightly)
Back in the day, macros were used a lot for readability and often hid real nasties such as that when macros actually inspected!
Preprocessor macros are probably the second best way to shoot yourself in the foot after gotos (if not the best way).
They can be genuinely useful when implemented properly, but I've seen far too many cases where it went pear shaped due to macro expansion issues (hint: braces and/or brackets), or unintended macro redefinition issues somewhere in the include path - especially when cross-compiling stuff.
So: use them wisely and sparingly :-).
Macros can be much more evil than GOTO and only exist because C was partly conceived as a machine independent Macro assembler to make porting UNIX simple and easy.
Numerous ways to shoot off your legs with a Macro.
Learning C++, we were told they shouldn't be used at all, only part of the evil backward compatibility. I learnt C++ in 1988. Why after nearly 30 years are we still using C?
"Why after nearly 30 years are we still using C?"
You answered your own question - its a macro assembler and for low level to the metal coding you want explicit code. What you don't want is a load of implicit actions and hidden memory allocs/deallocs going on behind the scenes which is what you get with sufficiently complex C++ when you may be severely restricted in the machine facilities you have available and also want to know exactly what will happen at runtime.
Also some people simply like explicit code and will put up with the pain of re-inventing the wheel occasionally in C if it means they don't have to use C++. I'm not one of them - just saying.
I understand why massive projects like Linux will not move to C++ (as long as Linus is around, at least).
I think the more important question is - why is commonly-used, security-critical software like OpenSSL written in C? Why in the world would you want to be in a situation where that goto code starts looking appealing due to the limitations of the language? Or have to always loop manually over the elements of some array or data structure (which you had to roll on your own)?
Why would the only option to ugly C be C++ that uses all the complexities and gotchas of the language, when for a long time now you can write C++ in a language subset that is much more clear and simple, and completely avoids several of the most popular types of buffer overflow/pointer math/off-by-one issues of the equivalent C code. The horrific "implicit actions" and memory allocations need to be understood and controlled, true - but any security-critical, commonly-used software surely needs to have a very strict coding paradigm anyway, and the contributors need to have a high level of expertise.
Or if you are not limited to C-like languages, pretty much any reasonably-popular language would be a better choice than C...
hmmm this brings back memories of when I was learning C, the Pascal programmers were all using macros
#define PROCEDURE void
#define FUNCTION int
so their functions could look like Pascal
If you don't know what if (0) {} does, either a) don't use C, b) don't use macro's to hide that ignorance.
If you want to look at abuses of C that you should understand, lookup Duff's device.
The C language is small, goto's and labels are part of the language, get over it.
I don't know about that first one, it has a twist of genius to it but at the same time is an abomination that should be struck down. The second one is so bad as to be almost unbelievable but I've worked with programmers that were that bad so I'll certainly give it the benefit of the doubt.
The thing that gets me about the SSL code is that the two paths (bad packet length and cert length mismatch) have their meat in different parts. truncated
could easily be in the earlier if
block (like the other one is), and then simply goto f_err;
(like the other one does).
And (assuming a1
is zero when there's no errors), you could write the final if(0)
as if(!a1)
for better readability.
It didn't look particularly like Z80 or 68000. It screamed 6502 to me.
I regret that I do not have to source code to the most horrible bug I have ever found. The project involved using DSPs to send position data regularly by radio (pre-GSM), but with each radio transmitting at a different time. I had time from GPS, so sync was easy. Apart from a little assembly language for the signal processing, the bulk of the code was in C. The kit worked fine in the office. Hours of testing with no problems at all. On the next day, I took it to the customer for a demonstration, and the kit sat there doing nothing.
After some frantic bug hunting I discovered the fault was in the C library for the DSP. The 32-bit add function was made out of a pair of _signed_ 16 bit shifted add instructions. The DSP was massively optimised for 16 bit variables (and fixed point arithmetic). CHAR_MAX was 32767 (sizeof(char)==1 by definition in C). As a result, almost all of my C code used 16-bit (int) variables and long int was only ever used for the time. During testing, bit 15 was zero, but the demonstration was at a time when bit 15 was 1.
The easy bit was fixing the code during the customer's lunch break. The difficult bit was explaining what I had fixed.
The trouble with C is that it is little more than a glorified assembler. The language lacks a significant number of structures that we take for granted in more modern languages, structures that make life easier when building more complex code.
But you can convince it to do things that are useful. Often with gruesome distorted bits of code that at first sight make little sense. The problem isn't that you do this, it is if these bits of code are not rigorously codified and documented, and wrapped up in a clear and maintainable manner. Macros can be one way of doing do (with a lot of care). But clear and enforced coding rules for their use is the key. There should never be a time when anyone looking at the code is in any doubt about what is being done and why. If the idiom is useful, I would expect it to be used everywhere as the standard mechanism for such condition handling. Personal idiosyncratic use of such features is the problem, and speaks of deep problems in the project.
The failure in the SSL code is not the use of the code, but rather that it is blandly written with no apparent consistency with the rest of the source. Just a one time use of an idiom. That suggests almost zero care, and is deeply worrying as a pointer to the quality of the rest of the code.
"The failure in the SSL code is not the use of the code, but rather that it is blandly written with no apparent consistency with the rest of the source. Just a one time use of an idiom. That suggests almost zero care, and is deeply worrying as a pointer to the quality of the rest of the code."
I don't think this is the only problem with openssl.
As has been pointed out in http://www.openbsd.org/papers/bsdcan14-libressl/, by Bob Beck, a lot has to do with:
- every single function of the API is public. Crazy design if you want my opinion.
- the API assumes the OS provides nothing, no randomness, no calloc(), nothing, then goes on doing everything in a terrible way, all of this to be compatible with platforms long gone and forgotten
"all of this to be compatible with platforms long gone and forgotten"
It didn't even get this bit right. If a function is missing on your platform, your response should be to provide *exactly* the function that is missing and *only* use your implementation on that platform, so that everyone else is allowed to enjoy bug fixes as and when their platform provides them.
"Personal idiosyncratic use of such features is the problem"
As long as it can be understood why is it a problem?
"and speaks of deep problems in the project."
Oh rubbish. It simply speaks of different coders having different approaches. There's nothing worse than a project thats micromanaged down the tiniest detail of the way the code is written but some anally retentive manager who thinks he can do the job better than the people actually doing it.
> As long as it can be understood why is it a problem?
Because some other bloke probably *won't* understand it, having never seen it before.
It took me a bit of time to wrap my head around the "if (0)" trick, and made me remember asshole coders who loved doing things like that just to show off the fact they were a smartass with a keyboard.
"Personal idiosyncratic use of such features is the problem"
As long as it can be understood why is it a problem?
It's in the definition of "idiosyncratic" – similar only to itself.
For example, I could always redefine my booleans. Would you like to work with my code?
"As long as it can be understood why is it a problem?"
And this, ladies and gentlemen, is everything wrong with the software writing profession summed up in a single sentence. It's almost beautiful in its elegance.
YOU might be able to understand it NOW, but what about in 10 years time? And what about the poor sap who inherits your codebase after you've departed your current employment?
Architects wouldn't brag about how esoteric their blueprints are, in fact an esoteric blueprint is a hallmark of utter failure. Obfsucated source code should be treated with the same contempt.
My favourite example. Came as example code with a microcontroller, IIRC. Needs to be viewed in a monospace font to truly appreciate it.
unsigned char i2c_rd(void) // read an 8b streaming
{ unsigned char bit_count = 0 ; // bit counter the 8b streaming
SDA=1 ; P1M2=0x05 ; // prepare SDA as input (=1)
while(bit_count<8)
{ eep_buf=eep_buf<<1 ; // shift left 1b eeprom data buffer
dly_usec(4) ; SCL=1 ; // rise-up SCL
shift0=SDA ; dly_usec(4) ; // read bit_n from eeprom
SCL=0 ; dly_usec(2) ; // pulse SCL
bit_count++ ; dly_usec(2) ; } // increment bit counter(repeat for 8b)
P1M2=0x0d ; return eep_buf ; // SDA open drain(return data buf)
Needs to be viewed in a monospace font to truly appreciate it.
Wrap in a code (pre for double linespacing) block to do that.
unsigned char i2c_rd(void) // read an 8b streaming
{ unsigned char bit_count = 0 ; // bit counter the 8b streaming
SDA=1 ; P1M2=0x05 ; // prepare SDA as input (=1)
while(bit_count<8)
{ eep_buf=eep_buf<<1 ; // shift left 1b eeprom data buffer
dly_usec(4) ; SCL=1 ; // rise-up SCL
shift0=SDA ; dly_usec(4) ; // read bit_n from eeprom
SCL=0 ; dly_usec(2) ; // pulse SCL
bit_count++ ; dly_usec(2) ; } // increment bit counter(repeat for 8b)
P1M2=0x0d ; return eep_buf ; // SDA open drain(return data buf)
Many, many years ago I had to assist a colleague who had a sort problem. This is in the early days of relational databases on a computer with relatively limited capability. Unfortunately I can't remember all the details (it was about 30 years ago) but he essentially retrieved each record from the database searched for individually to avoid sorting and taking into account the database performance and his program's internal algorithms the execution was order N^4. He requested my assistance because his program worked fine when tested on a 10 record example (although he really should have twigged that taking 2 seconds was rather too long) but when tested on real world data (only 120 records) it just "looped". He did go an fix it when I pointed out that based on his algorithms expected completion time was about 12 hours and it was working just fine. I wish I could remember all the details.
The project was split amongst two people -- me and the other bloke. I wrote the i/f layer and he wrote the stuff underneath. We first shipped with dummy innards to test the (embadded) system. When the time came to include, I couldn't find his code in the SCM. So he emailed me the code and it wouldn't compile. Turns out that he mucked about with his only copy.
Writing extensions for ERP software - it's American so it has data object fields "CostCenter".
Most other extensions also use the spelling "CostCenter" except the one I worked on this week... Two days later I found out I was trying to call "CostCenter" when I should have been looking for "CostCentre".
Whether that if (0) construct is a good idea or not is irrelevant. The fact you have to talk to C experts, and get different opinions, about whether it is a good idea means it should never have been used in this code. In the code for something like 'less' or 'ls', sure I'm on board with that, knock yourself out.
For security critical code you want maximum clarity and you don't use anything that has a chance of confusing other C coders, even if it means more LOC or slightly reduced performance.
The biggest problem I see with the OpenSSL code is that it leaves you at the mercy of your compiler/optimizer. You have to trust that the optimizer will properly traverse all possible code paths and not strip out the entire if (0) block as unused/unreachable code. It may work fine for whichever compiler and optimization settings the developers used, but there's no guarantee it will work for everyone else.
C is a language defined loosely by a set of rules, any number of which may be broken in any statement.
I remember acquiring ownership of some 8051 C code. We were just about out of memory and even with optimisation on, the compiler wasn't being very smart. As such, I was playing games with the compiler to try to get the smallest code by rewriting bits of it. I left it all in the comments to blow the mind of whoever came next.
// amazing what generates the smallest code, isn't it.
// if (cksum != (rx_end[msgid][1] + (rx_end[msgid][0] << 8)))
// if (((cksum & 0xff) != rx_end[msgid][1]) || ((cksum >> 8) != rx_end[msgid][0]))
// if (((cksum >> 8) + (cksum << 8)) != (*(int *)rx_end[msgid]))
// if ((UPPER_BYTE(cksum) + (LOWER_BYTE(cksum) << 8)) != (*(int *)rx_end[msgid]))
ptr = rx_end[msgid];
if ((*ptr != UPPER_BYTE(cksum)) || (*(ptr + 1) != LOWER_BYTE(cksum)))
It's only a few bytes difference but that was enough to help. Looking at the resulting assembly code, it was all about whether the compiler reloaded DPTR (16-bit load on an 8-bit processor) or was smart enough to just increment it as it moved between the low byte and the high byte.
for completeness, the macros are
#define UPPER_BYTE(a) ((BYTE) (((WORD)(a)) >> 8))
#define LOWER_BYTE(a) ((BYTE) ( (WORD)(a)))
Sometimes the rules go out of the window due to other constraints and the best you can do is write War and Peace in the comments on why the fugly mess is the way it is.
Well at least that's better than the random sort. This one is known to finish but in an unknown time. It was submitted on a test where the question was: provide an example of a sorting algorithm that is know to finish. Since it didn't set any time requirements one student provided the following solution:
Check list to see if it is in sort order.
Select a random item in the list. Check it against the next item in the list. If they aren't in order flip them.
Return to checking to see if in order.
... its the way coding is/was/might/will be commissioned?
The coder merely did what the contract required using a library that was context sensitive yet taken by others improperly to be library as defined in the big/bigger/BIG picture rather than the coders tiny snippet perspective/context?
Just wondering - that's all
EDIT ps: what's wrong with nested nesting iterations on a locally context sensitive library huh-huh-huh?
As far as the fine fellow writing Register based code, I recall that that was how we wrote Easycoder code (in 1970), which was the assembler for the Honeywell Series 200 computers, and which was a version of IBM 7050(?) autocoder.
As for having problems with GoTos in Basic, Digitals Basic-Plus certainly had GoSub and FuNction calls which allowed a very clean programming style.
It's fugly.
Really really fugly.
It's also why OpenSSL is starting to get a well-deserved reputation as swiss-cheese. God save us from programmers who think they're being clever with crap like this. It's error-prone, it's difficult to read and therefore understand and therefore debug, and you'll waste hours hunting for bugs that could have been so much easier to find if only you had thought about the poor bastard who would be maintaining this mess in 10 years time instead of the size of your coder-peen.
Don't use GOTO unless there is literally no other option. AND DON'T TRY TO BE CLEVER! EVER! You'll only end up shooting yourself in the foot in the long term.
Wow, all this angsting about simple gotos. I would mention languages with label variables and the good old computed goto from Fortran, but the first commercial project I ever worked on used an interpreted Basic variant with a language feature that even I thought was a step too far:
return n ( where n could be a numeric value or variable )
Whats special about that you're thinking, it simply returns the value n to the caller.
No no no.
What it actually did was to return control to the (numbered) Basic line that was n lines before or after the line containing the call to the subroutine terminated by this return statement.
Oh, and variable names had a maximum length of 2 characters and comments were discouraged since they were held in the (small amount of) memory along with the program code & data.
Don't get me started on overlaying data in Coral...
"I say nuke it from orbit."
Well, that's not so simple, even if World+Dog would agree it is desirable.
Basically, most of openssll's insanity (every single function is public) can have stuck to everything using it, so changing this with a clean API can and certainly will break a lot of thing.
It's well explained here: http://www.openbsd.org/papers/bsdcan14-libressl/ as the reason why the folks at openbsd decided on a fork.
...<don't answer that>... but it seems to me that anything that triggers either the call to "truncated" or to "f_err" will essentially execute similar code *except* for one constant being passed to SSLerr - SSL_R_CERT_LENGTH_MISMATCH vs SSL_R_BAD_PACKET_LENGTH.
Wouldn't the code have been more readable if a variable had been set to the appropriate constant and *then* a standard chunk of code called which passed the variable to SSLerr?