They make good points
But will the lowest-cost offshore outsourced company, and the companies that choose them, get the memo?
Yeah, I don't think so.
Business and technical leaders should prepare to focus on memory safety in software development, the US Cybersecurity and Infrastructure Agency (CISA) urged on Wednesday. The federal agency, part of the US Department of Homeland Security, published a paper entitled "The Case for Memory Safety Roadmaps," arguing that memory …
> CISA suggests that developers look to C#, Go, Java, Python, Rust, and Swift for memory safe code.
Your code will only be memory safe because you will have implemented nothing and instead spent all your time writing / maintaining generated bindings rather than writing code that does actual stuff...
Besides, these guys have clearly never looked at the code behind the .NET or Java VMs... That memory safe code is floating on top of a cesspit of rot.
Rather than talking about Rust, perhaps CISA should actually have a play at implementing something with it. They're basically falling into the same category as all the other 14-year-old Rust developers on Reddit.
Feel free to point out the 'rot' in C# 12. Your comment is about 20 years out of date. It's like people who advocate writing in raw sockets vs pipelines/channels, or who think they need to write their own threading because... reasons. There are some very smart guys (e.g. David Fowler, see "System.IO.Pipelines: High performance IO in .NET") who spend their days polishing code until it shines. They deserve better than asinine commentardry!
> Feel free to point out the 'rot' in C# 12
Is a 1.8 MB C++ source file containing the .NET Core VM garbage collector alone enough for you?
https://raw.githubusercontent.com/dotnet/runtime/main/src/coreclr/gc/gc.cpp
Can you spot any memory errors in that? You have 24 hours. Go! (During my thesis I found 2. They are there... So much of that code is ancient rot).
Anything built on top of this is basically on a foundation of sand. That is pretty much any .NET language.
(Note: Careful; opening that monster of a file outside of raw mode might freeze your browser).
I've written C# code that - despite my best and careful efforts - has occasional spasms when the garbage collector kicks in. Admittedly, my programming was a few years ago, but something else that often cropped up in C# was the obligatory nested using { } construct used to dispose of managed resources. Sometimes you have to go far beyond the obvious to dispose of managed resources correctly, beyond wrapping them in using statements: temporary assignment of variables before disposal, and similar tricks. I'd hardly call any of that nonsense "safe". I often called it "yuk".
"That memory safe code is floating ontop of a cesspit of rot."
True.
and, as I point out in the title, using a GC-based memory/object management scheme gives you the boilerplate inefficiency you deserve.
As a general rule, experienced C and C++ coders can bang out reliable code without too much difficulty. Evidence: the Linux and BSD kernels.
No thanks on the GC stuff. Properly written code avoids these problems.
Government agencies are always late to jump on the bandwagon.
Time for the next fanboi-driven language du jour to make its appearance.
Weren't we told that Ada was the be-all and end-all of all programming languages? Happily, I pragmatically stuck with C, assembler, Cobol and Fortran :-)
You're forgetting that no new stuff can ever be any good in the field of IT. If it hadn't at least been considered by 1973 then it's no use.
All the things you mention are your fault, because you're either not trying hard enough, not paying full attention at all times, or, frankly, just too stupid.
(Okay, sure, you may actually be writing software that people use, who might appreciate you putting in more effort, attention, or simply being cleverer, but what does that matter, it's their fault if they lose their work or are more susceptible to malware, they should've bought software from someone who tried harder).
It's the same with heavy industry really, that peaked around the 1700s, everything since is just 'health and safety bullshit', sure maiming is less frequent, but how will workers learn if they get to keep all of their limbs?
You're forgetting that no new stuff can ever be any good in the field of IT. If it hadn't at least been considered by 1973 then it's no use.
Although I will happily admit that many things in software are getting better, there was work done before 1973 that was very good but got lost because it was "too expensive" and has ended up being painfully reinvented since. Multics, Burroughs memory safe stack machine architecture and RISC load/store architecture are the ones that come to mind immediately. Reading Dijkstra's notes will bring up other lost ideas. And there's whole new steaming piles of nonsense being created in software today as well as the good stuff. Software obeys Sturgeon's Law and it takes time for the filters to refine it down to the good 10%.
...but that was true back then too.
It's also true of every industry... ever.
I also do think all the basic ideas in computing have in general been done before, most things now are just re-packaging of older ideas.
However, the attitude that anything 'new' is by definition wrong, or that attempting to improve what are, after all, the 'tools' of our trade is a bad thing, really is counterproductive, I feel.
I expect Rust would've been invented in the 70s if they'd had the processing power to write compilers for it. A lot of the advances in tooling and languages are due to more powerful machines. You can now realistically have a compiler enforce things that would've been computationally unfeasible in the 70s.
Also.... hammers still exist... C/C++ will still exist, getting territorial over programming languages doesn't really help. I used to be very much 'IF you can't manage to program in C properly you probably shouldn't be programming' camp, then I made one too many accidental buffer overflow mistakes and I grew up.
To last a pair of shoes to their soles is a highly skilled art, but a mechanical machine will do what will take you half a day in about 30 seconds. It doesn't detract from the skill of the hand finished way, it just gives you another option.
If your tooling allows you to not worry about a whole class of 'problem' then you'll get quicker as a consequence.
"Also.... hammers still exist... C/C++ will still exist, getting territorial over programming languages doesn't really help. I used to be very much 'IF you can't manage to program in C properly you probably shouldn't be programming' camp, then I made one too many accidental buffer overflow mistakes and I grew up."
Requoted in order to highlight a fantastic comment of truth. Why do so many people fight the advancement of things like Rust? Ego. "I don't need no stinkin' compiler help!" is along the same lines as "I don't need ABS, I can do better than that machine!". That latter phrase didn't wear well, did it?
I like turning the power off before I work on light switches. Some people rightfully point out it’s not strictly necessary and takes more time; if you do things right there is no issue working directly on the wiring and if you know what you’re doing then you shouldn’t need the additional process.
I leave those nutters to it, and turn the power off.
On Elliott systems, designed for industrial/military situations that had to deliver high performance on limited resources, the Algol dialect had extensions for address-taking and indirection - what we would now call pointer arithmetic. This implementation was created by Tony Hoare.
Re: "It's the same with heavy industry really, that peaked around the 1700s, everything since is just 'health and safety bullshit', sure maiming is less frequent, but how will workers learn if they get to keep all of their limbs?"
I've always thought this with cars. While cars undoubtedly have advanced in efficiency since the 80s, how much is that efficiency reduced by the fact they were likely built by the same machines cars were in the 80s?
I can basically guarantee there are no 80s era robots assembling cars, and probably no 80s era machinery, because machinery wears out and robotics especially has progressed.
What actually *does* harm efficiency is safety norms leading to more weight despite stronger materials, and people having a hard-on for SUVs. The SUV trend has eaten over 80% of the efficiency gain brought by technological progress.
Weren't we told that Ada was the be-all and end-all of all programming languages?
Yes we were, and we (those of us who were doing government-funded work at the time) happily and blithely ignored it, primarily because Ada was a warm, steaming heap. Rust is, of course, better, but simply because it is some gov't agency's current shiny-du-jour doesn't make it a be-all-and-end-all.
In fact, for those of us that have been 'round the block more than once, a gov't stamp of approval is actually a big yellow warning sign... like that in the icon.
Got introduced to Ada at Uni. Being a fan of Pascal style languages (I wrote stuff in Delphi in my own time in those days after progressing from TurboPascal) I liked it. OO plus nice things like custom ranges.
However, as its use in the real world is mostly limited to the industry of killing people, I didn't take it further.
Why create a safe Systems Programming Language when we should be creating a safe Application Programming Language?
Because systems software underlies everything we do with computers so if it's borked everything is borked, and application languages often eschew the low level features necessary to do things like bringing up memory management and initialising I/O on a freshly booted machine.
Accessing files? You're using the low-level capabilities of the system.
Accessing network? You're using the low-level capabilities of the system.
Accessing memory? You're using the low-level capabilities of the system.
Showing anything on screen? You're using the low-level capabilities of the system.
Hell, starting your application? You're using the low-level capabilities of the system.
Yes, high-level languages and frameworks hide the details, but the kernel beneath touches everything your application does. An unsafe system is an unsafe application.
“ 67 percent of zero-day vulnerabilities in 2021 were memory safety flaws”
There’s your answer. If you fix memory corruption at source you fix 2/3 reported vulnerabilities. Languages like Java and C# appear in far fewer vulnerabilities, so spending resources there gets you less improvement.
Which class of vulnerabilities are you hoping to squash with a new application programming language, how common is it vs memory corruption, and how will it fix it?
Because you can't write a JavaScript or Python interpreter in those languages (not without going turtles all the way down).
It's not that you can't, it's just that nobody has yet. IIRC Squeak and Pharo (Smalltalk-derived systems) are written in themselves, as are several Lisps. Turtles all the way down can be a very elegant solution, once you apply the fixed point combinator.
There have been quite a few successful computers based on Pascal and Algol. Any Turing-complete language can perform self-hosting (compiling itself to binary code). And that is practice, not theory. The Algol mainframes and the C# based OS Singularity were (mostly) memory-safe in the kernel.
Is it possible to design an MMU that doesn't suffer from buffer overflows, type confusion, uninitialized memory and use-after-free bugs ?
“In an MMU, regions are defined as groups of page tables and are controlled completely in software as sequential pages in virtual memory.”
I came up with the same idea a long time ago, but then the MMU would need to support hundreds of thousands of protected memory regions, which I don't think is feasible. Capability Based Computing is, however, doing something similar by tagging memory allocations.
It is a valid choice for any new software aside from one or two areas, which is unsurprising, since it was designed as a general-purpose language. Microsoft has already chosen it over legacy languages; the same with Google, Xwitter, and more.
Rust is not a bandwagon. It never was.
YMMV for specific architectures of course.
I had hoped those links led to a list of supported microcontroller families with downloadable libraries, but no, it was just another webpage with “hey, Company X used Rust in its embedded products!” and a CS textbook about how to put together a language runtime.
I don’t need to be convinced that Rust is better than C, and I do have the ability to write and test bindings but it’s a huge job and nobody’s going to pay me by the day to write a complete HAL and Rust bindings for whatever microcontroller their products happen to use. Until the MCU manufacturers invest in support for the language, Rust may as well not exist on embedded. And if you’ve ever done embedded programming, you’d know that MCU manufacturers barely invest in any kind of software tooling support as it is, let alone bindings for a minority language.
Also embedded customers are conservative - they want to be able to keep using a codebase for well over a decade: writing in anything but C is seen as a risk that they won’t be able to find a developer in future; you and I may know that’s probably an inversion of the truth, but you and I aren’t the ones with the budgets...
Another route might be to introduce capability-based systems, such as CHERI, to eliminate memory errors.
But the blind spot everyone seems to miss is that C and C++ are essentially Systems Programming Languages which we are using as Application Programming Languages. The lack of memory safety in C and C++ isn't a bug but a feature! C is merely "high level assembler."
What we should do is create native-code Application Programming Languages which have memory safety already built in. Java and C# are obvious candidates, were it not that they need runtimes to function. I therefore propose creating a new Application Programming Language which compiles to native code. We used to have such a language, Pascal, but it fell out of grace when everyone started using C.
In the case of Java (or probably most languages targeted to the JVM), you could just get it be compiled to native code ... e.g., using GraalVM.
The resultant executable is normally a bit slower than running Java on the JVM (because you lose out on some of the JIT optimizations), but it greatly decreases the start up time and memory usage.
The biggest issue with Java is almost everything has to be an object, so it is less efficient in its memory usage, but there are efforts to improve that as well (see project Valhalla).
For some reason this idea hasn't gained much traction. Probably because most programmers view Java as a VM language even though it CAN be compiled to native code. Few bother to take the GraalVM route.
If Microsoft had made the decision to compile C# to native code from the start we wouldn't have this need to migrate the entire world's codebase to Rust.
I propose a new language with a syntax like C# but with "managed pointers", where the user isn't allowed to directly access memory but can still build pointer-based structures (like linked lists and such) in a relatively efficient manner.
The big problem was finding an Algol-68 compiler to run on your hardware and OS.
Algol 68R was my first programming language(*) and later I ported Algol 68C to Pr1mes. It was just a matter of being in the right place at the right time.
(*) I worked at RRE where it was written and learnt it by being given an existing program that modelled the electronic structure of semiconductors and told to add spin-orbit coupling into the modelling.
My understanding it was only a problem if you wanted to use anything other than an ICL 1900 or 2900 series system… :)
[A joke adaptation of the saying "Nobody ever gets fired for buying IBM".]
BTW there is an open source compiler: https://sourceforge.net/projects/algol68/
Not disputing Pascal has been widely used, however be aware:
“ Pascal, a computer programming language developed about 1970 by Niklaus Wirth of Switzerland to teach structured programming, which emphasizes the orderly use of conditional and loop control structures without GOTO statements.”
Ie. It was intended to be a teaching language, just like Basic …
Perhaps it gives the truth to the idea that you should give children real woodworking tools, as they will quickly learn the correct way to handle them and not touch the blades… However, when having this discussion we need to step back and ask whether we should be teaching procedural or non-procedural programming as people's first computing language…
HP MPE was a successful multi-user, transactional OS written in a Pascal variant. Pascal itself is NOT memory safe, but you can of course limit yourself to not using pointers. Turn on bounds checking. Then you have sort-of-memory-safety. Still no support for multithreading, which is a critical feature these days.
Entire corporations ran their business systems on HP MPE and most of them loved it for reliability and security. It was axed because HP was run by people who preferred to be resellers of other companies such as SAP, Oracle and Microsoft. It would still exist(and make money for HP and customers) if it were not for these surrender monkeys.
MPE ran on powerful PA RISC servers with 16 or more CPUs. Thousands of parallel users. A mainframe OS in all but name.
First, PASCAL is a very good language, especially for students. Strong typing, simplicity, good string facilities. I like it a lot for its spirit.
BUT - it is not memory safe. Heap memory is managed essentially the same way as in C, with consequently the same "bug modes" such as "use after free".
You could create a memory-safe Pascal variant, complete with "smart pointers everywhere" and multithreading-aware type system. But that would no longer be Pascal, but something else.
There are lots of situations where code might benefit from a memory-safe approach, but at the same time there are lots of applications where it is both functional overkill and, worse, effectively papering over cracks in the system or code design. What you're doing with this is not fixing the underlying problem but turning one sort of problem into another, as any detected error condition would throw an exception that needs handling. For many applications it's customary to manage faults by throwing up a dialog box with some kind of error message and then terminating the process.

There are many, many situations where this is unacceptable -- you might not have a user interface or even a user, and you can't stop or restart the system because it's running a machine, or it's part of a group of machines where the disappearance of one process or unit will cause a cascade of other faults. The only sure cure for this is design -- you have to make sure that any fault catastrophic enough to stop the machine only does so as a last resort, and that any shutdown is managed. So I figure Rust won't do any harm, but it certainly won't be of much help, because the underlying system issues still have to be anticipated and prevented.
(Incidentally, I recall that Ada was type safe, memory safe and so on, and generally prevented programmers from doing the sort of C shortcuts (such as pointer arithmetic) that could get you into trouble. How does Rust stack up against it? Ada had a reputation of being a bit difficult to implement, so assuming Rust is offering roughly the same capabilities, is it going to have the same pitfalls?)
To disable the "extra" checks when necessary (such as when working with hardware), allowing system code to be written.
However, C++ also allows memory (and type safe) code to be written, meaning it can be used for strict application development (as well as system development).
There's nothing wrong with Rust, but I hate the way it is being marketed as "better" than C++ - it reminds me of the Ada zealots who tried the same sort of attack on C/C++, and that didn't lead to Ada being widely adopted within the "small" microcontroller sector.
C/C++ can both be used effectively when subsetted (using something like MISRA). There is a claim that Rust doesn't need this, but there are a number of groups already looking to develop safer subsets of Rust (such as SAE's JA1020).
1.) Ada by itself is not fully memory safe. SPARK Ada could be said to be.
2.) There have been Ada programs which were carefully built, but produced an exception when lots of money was at stake. Ariane V, first flight is premier example.
3.) Exhaustive testing according to the V Model will bring you the confidence that your carefully designed and written system is also doing what it is supposed to do. Unit Testing, Software Testing, System Testing, HIL testing, careful operationalisation, massive datalogging/analysis and a bunch of other validation measures used by premier projects such as Jäger 90.
Of course, if you start cheating on testing, all bets are off. Tests must be first and foremost REALISTIC. And somehow "complete" according to the system use cases.
4.) Memory safe languages can detect quite a few "hidden" bugs during 3. That is exactly when you want to learn about them. Not when an airman's life depends on it.
but I still can't quite grasp how a *language* can suddenly become "safe". What makes its compiler (which in itself has to be trusted) so much better?
Having started my life in bitslice, programming with switches and lamps, I do wonder if this is all a bit emperor's new clothes.
Especially if we end up with "we don't need to test it, it's written in Rust"
One of the main issues with perceived compiler "bugs" is that a lot of them are actually down to the code containing instances of undefined-behaviour (very common with C/C++). The compiler assumes that the code that is supplied to it does not contain UB, and it can produce garbage if it does.
Have a look at this. The "missing" return statement results in undefined-behaviour, leading the compiler to generate a call to error() (and nothing else), even though the call is unreachable.
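The linked example isn't reproduced here, but a minimal sketch of the same pattern (check() and error() are illustrative names, not the original code) looks something like this:

void error(void);   // assumed external function, defined elsewhere

int check(int x)
{
    if (x < 0)
        error();
    // No return on the x >= 0 path. In C++ (and in C, if the caller uses the
    // result) falling off the end of a non-void function is undefined
    // behaviour, so an optimising compiler may assume this path can never be
    // taken and emit nothing but an unconditional call to error().
}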
Heh... your specific example is a genuine bona fide compiler bug: granted, the net effect on program state of that function is void, but the error() call should have been pruned too. Have you reported it?
That said, I know versions of this bastard well: a function will have conditional “return” statements dotted through it, but no final one. “It works”... because up to now, nobody ever exposed the path that reaches the end of the function; once you do, suddenly all hell breaks loose. Turning all warnings on, and promoting them to errors is tedious, but it’s often the only way to catch shit like this.
Actually, I still do not understand why “exiting a non-void function without returning a value” is not an error. It’s stated as undefined behaviour in every version of C since K&R - in my book, causing undefined behaviour is an error, not something that you can ignore. If you don’t want it to be an error, then suppress it in your crappy code, but then at least a later maintainer will know that they’re dealing with the work of a masochist or misanthrope.
That's the problem, it's not (but most programmers would reasonably think it is).
Omitting the return statement at the end of the function means there is undefined-behaviour, and a single instance of undefined-behaviour within a C/C++ program means it is not possible to predict its behaviour, and this example shows the sort of strange thing that can happen as a result.
If you disable optimisation (-O0), then "normality" is restored.
Even so, this is not an optimisation bug; the example highlights the impact that the presence of undefined-behaviour can have on the code generator*, which assumes that the code it is processing does not contain undefined-behaviour.
* a lot of code generation is "mathematical spells" which don't work when their underlying assumptions are not valid.
Yes, I did notice that -O0 fixed this (but good god, next-instruction branches!), but I feel that’s a bit of a copout from the compiler writers; the same kind of passive-aggressive bear-trap that GCC used to hide under developers’ feet - it’s disappointing to see that attitude creeping into clang too.
If the behaviour of a function is undefined, then by definition the function is un-optimisable, so locally tagging it as -O0, or making it "optimisable" to { void; }, and anything in between are all okay, but I'd argue that allowing the optimiser loose on a graph that is known to be defective is just asking for trouble, so it really highlights my main complaint: that program code from which the compiler cannot produce a defined behaviour should be an error by default, not a warning.
(And yes, I believe things like “x = x++;” also fall into this category. That’s an error of thought, but even if the standard specified an outcome for this, I’d make it a warning because either reading is equally sound, logically.)
The issue here is that "undefined" means "undefined by the language standard"; some compiler for a weird embedded system can perfectly well choose to additionally define that behaviour (even if it was not specified as implementation defined) in a way that's useful. Although equally there's nothing to stop the compiler writers interpreting "undefined" as "stop with an error in this case", except the sheer amount of code that might break. However, this is gradually happening; I occasionally have to compile relatively old stuff, and things that would once have got past gcc increasingly just trigger errors.
I agree that this example is detectable and should be reported - as it is if -Wall is added to the command line.
However, in the general case it is not possible for a compiler to detect all instances of UB within a function (for example, they may be related to link-time issues).
Compilers only "look" for UB if they include specific checks to detect it (as is the case for a missing return). More subtle issues (data-flow related) are generally not detected, though things like address sanitisers are now quite common, so this is changing. The code generators do not have enough information/context to be able to detect UB (it's "game over" by the time the intermediate code gets to them), and simply plough on with the assumption that it is not present.
So, it is not the case that the compiler is knowingly optimising a function that contains UB - it simply has to work with what it is given.
The worst thing with UB is that a lot of it (especially with C++) is virtually impossible to recognise when looking at a piece of code (you need to "remember" the whole code base), which is where static analysis comes in to play (though it can also not detect all instances due to Turing's halting problem).
Actually, your example is flagged by the default warning settings of clang, but we all know people who don’t consider warnings to be a code problem...
I agree that the general case, which often relies on runtime state, can be impossible to detect, but it’s not a decision between detecting all instances (impossible) or none (useless) - it shouldn’t be too much to ask that when the compiler does detect something that produces undefined behaviour, it reports that as an error by default. After all, we all know how much shipping code is compiled with warnings off, just so it can get through automated build-system checks.
And again, for particularly old or ratty code that used to get through GCC (and don’t start me on the shit that GCC used to allow just because it made writing the linux kernel a little easier…) but now does not, locally suppressing errors is the way to go, if only to highlight that the source-code probably has other issues too.
(Incidentally, thank you for an interesting discussion – they’ve become harder to find online lately, even on El Reg)
There are three different languages: C giving no warnings, C with warnings allowed, and C with warnings = errors. The second one is safer than the first, the third is safer than either. Since we are going on about safer languages, using the first or second variant is criminally unsafe.
Actually, I still do not understand why “exiting a non-void function without returning a value” is not an error.
Chances are, it is still undefined because there is a metric buttload of code written that a) has non-standard/proprietary workarounds and/or b) would crash (or not compile) if you actually fixed it. The C and C++ ISO committees are (and always have been, even more so than today) very sensitive about trying to maintain backwards compatibility, and in a classic case of the tail wagging the dog, are also beholden to compiler manufacturers that bitch and wail about the amount of work they would have to do to standardize on a revision for some undefined behavior or another.
I'm looking at you Micros~1 (and gad! I do not like what I see...)
The OP's example generates a warning with every compiler I've used. I therefore question your assertion that compiler vendors are complaining about the cost of detection. More likely is that they, and the smart people on (and supporting) the standards committee take the view that this is a solved problem and if you can't be bothered to read your compiler output then it's not their problem.
Specifically, what's returned from a C function is whatever is in the processor's accumulator. If you're planning to use a value then return it explicitly. "Original" C had some lexical shortcuts that allowed things like implied 'int' types for functions, but we're talking really, really old versions -- Ur-C, if you will. Code needs to be maintainable, so it needs to be explicit, so that someone can maintain it, often years in the future.
I've always had a sneaking suspicion that a lot of coding problems stem from people not being able to type properly. They'll do literally anything rather than throw a piece of code away and rewrite it. My suspicions are reinforced by the stereotype of someone coding on a laptop -- those keyboards are "for occasional use only"; they're the wrong size and shape for anything other than hunt-and-peck typing.
“Specifically, what's returned from a C function is whatever is in the processor's accumulator.”
The standards don’t actually mandate any specific mechanism for return, but for the sake of interoperability, all compilers for a given architecture will follow the same rule, whatever that happens to be. Almost all compilers I’ve inspected the output of used return-by-register where possible, usually using the lowest-numbered suitable register, but I know some architectures where that is a problem for certain types (“double” on CPUs without FP registers).
I do agree with the tendency to hoard code, though. Sometimes it's from lack of knowledge about what's outside (anything that touches a global state is very hard to remove, for fear of destabilising other parts of the system), but I do think that there's a mindset in some developers that all changes should be as minimal as possible - often at the expense of the legibility of the resulting code. That might make sense in a hugely complex piece of software like the Linux kernel (where your chances of having a pull accepted are inversely proportional to the size of the diff), or any other large codebase that has been "battle-hardened" rather than formally tested, but on small modules that are functionally tested, re-writing and culling code should not be out of the question.
But C was intended to be used by people who were professionals and so knew what they were doing…
Plus given the platform constraints of the time, there was a natural expectation that programmers and systems programmers specifically, were more capable…
Obviously, with decades of hindsight we know that even experts make mistakes, but additionally, much code is written by “more normal” people (for want of a better expression to cover self taught through to those of us who may have degrees in computing but didn’t major on formal programming, programming calculus etc.) and thus programming aids such as provided by Rust serve a useful purpose.
One way to make C/C++ more of an application language instead of a Systems language is to max the warnings level and to carry out all the initial testing in DEBUG mode as this should catch stupid programmer level errors like "not all paths return a value" or "variable not initialized " or "mismatched variable assignment".
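To illustrate with a made-up fragment (not from any real project): code like this sails through at default settings, but is flagged once the warning level is raised (e.g. -Wall -Wextra -Wconversion, promoted with -Werror), and a debug build with assertions and sanitisers enabled will catch still more at run time:

long scale(int flag)
{
    int x;                    /* "variable not initialized" when flag == 0 */
    if (flag)
        x = 10;

    long big = 5000000000L;   /* assumes a 64-bit long */
    int narrowed = big;       /* "mismatched variable assignment": value may not fit */

    return x + narrowed;
}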
The one thing I have seen so many times is that a bad programmer can still create bad code even in a good language. Shit code is still shit code no matter what it is written in.
I once went through a huge application that had no warnings enabled at all, with the help of one colleague fortunately. We enabled one warning after the other, some generated hundreds of warnings, I think we fixed a few thousand altogether. Including some that were daft (if x is unsigned then I should still be allowed to write “if (x >= 0 && x < 100)”), but about 20 were genuine bugs.
(Old person here......)
Compilers do have occasional bugs but -- and this is a really big BUT -- they invariably come from people pushing the envelope of what the compiler can do. They're the really clever sorts that write obscure statements, lean on code optimizers to take care of code that should have been written correctly -- tidily -- in the first place and so on. They go looking for trouble and they find it (and, needless to say, its why their code is late and is often unfinished or doesn't work properly).
I can't say this often, or loud, enough -- no amount of coding will fix a lousy design. Compiler bugs exist -- I found one once -- but they're easy to detect and even easier to work around.
???
Every one of those code segments is syntactically correct! So it will compile. But exactly what machine code is created will depend on the compiler, and how it runs will depend on the hardware.
If I saw this during a code review you would get a major bollocking: for the use of magic numbers, variable names that mean nothing, and the != in the for loop.
It reads like it's relying on wraparound of unsigned integers, and setting the values of a transposition table for the ASCII charset; the use of hex is a hint that it should be unsigned - which makes the wraparound behaviour well defined.
I'd probably want a comment, but it doesn't seem that bad, provided it's in a function with a decent name.
I have had the opposite where I converted from double to int32 and it was expected that overflow would occur, but the upper wrap-round range was the right answer. Worked fine on many compilers but later gcc took to limiting double->int to max values! Doh! So I converted to int64 and then cast to int32 to take the same lower 32 bits as intended.
I did document my code to explain what and why it was being done, but the compiler did not read that.
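For what it's worth, a sketch of that workaround (the function name and types are mine, not the original code); the final narrowing back to a signed 32-bit value is implementation-defined, but going via unsigned makes the "keep the low 32 bits" intent explicit:

#include <stdint.h>

int32_t low32_of_double(double d)
{
    int64_t wide = (int64_t)d;      /* assumes the value fits in 64 bits */
    uint32_t low = (uint32_t)wide;  /* well-defined wrap to the low 32 bits */
    return (int32_t)low;            /* implementation-defined above INT32_MAX,
                                       but gives the intended bits on
                                       two's-complement targets */
}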
the use of hex is a hint that it should be unsigned
Yeah, but there is no indication as to what those hex numbers mean... OK, in char terms 0x20 is space, and you should use char(' ') to be clear if space is what you meant. But as an unsigned char, what does it mean?
Also what will it mean to the next developer in a couple of years time when they are maintaining this shit code!
I'm not quite following you. At a glance it does: it handles two disjoint ranges without branching, by use of unsigned wraparound.
The numbers are the start and stop of a trivial loop. You saw 0x20 (space), and it's skipping things which are not 7-bit clean.
I think that's what is being hinted at by explicitly saying the top bit is set 0x80.
0x20 is 32 as an unsigned or signed integer - given that ' ' would be an integer constant anyway, using the hex value is arguably more correct. If you prefer decimal, some people like using a suffix, e.g. 128u or 32u.
int lut[256];

// start with every entry set to 1...
for (int i = 0; i < 256; ++i)
    lut[i] = 1;

// ...then use unsigned wraparound to clear 0--31 .. 128--255
for (unsigned char x = 0x80u; x != 0x20u; ++x)
    lut[x] = 0;

vs slightly more portable

// explicitly mask to access the range 0--31 .. 128--255
for (int i = 0; i < 256; ++i)
    lut[i] = 1;

for (int x = 0; x != 160; ++x)
    lut[(x + 128) & 255u] = 0;
they invariably come from people pushing the envelope
That was not my experience. Quite often compiler defects arose from a combination of fairly normal language features. The software we were writing was deliberately intended to be very unadventurous and not include anything overly clever language wise - we still found cases where the compiler would generate incorrect target code for perfectly correct source code (and this was a mature compiler).
We also used to review all the known compiler defects to make sure none of them applied to our code - the majority of them were not in obscure parts of the language or triggered by particularly convoluted code. They also tended to be split between optimized and non-optimized compilation modes.
Reading through the known defects list is always enlightening - if not a tad worrying.
They want the world to buy into their shitty paradigm of unreadable code that is "proven safe" but doesn't fucking work.
The rest of us will carry on with Software tooling based on the intended purpose. You can pry C, C++, and asm from my cold dead hands.
Memory Safety is an important tool in the toolbox, but it does not absolve you from executing a test battery from Unit to HIL Tests. Most importantly, it does not absolve you if you cheat on test cases.
But that is true with each and every engineering product. If you test a car by opening and closing doors, you still don't know how it acts in a curve, how it acts over a pothole etc etc.
For safe languages, at compile time, the compiler will either warn, or refuse to compile if the code is obviously incorrect or exhibits unexpected non-deterministic behaviour.
For quite a few of these languages, if the algorithm is fairly short/simple, perhaps say 30 lines or so, then there is a good chance that if it compiles then it is also correct.
But when you take an unsafe language like C, the programmer doesn't get those checks (although linters can help). The programmer *might* want to:
- read that uninitialized memory (maybe because they mistakenly believe that will inject some randomness)
- read the entry just before or after the array end.
- convert that pointer to a struct into a pointer to an integer instead (I can't recall if the C standard mandates the structure layout, or if this is also not deterministic)
- concurrently modify that block of memory that is actually allocated and used by another part of the program (e.g., as a broken effort to signal some completion between threads)
- free that memory, but keep using it/accessing it anyway (perhaps they can just share it with another part of the program).
C gives you great power by very easily allowing you to achieve all of these things. Sometimes this may be what you want to achieve, but probably 99% of the time it isn't, and if you really do want to do it (e.g., memory-map some hardware registers) it would be better to explicitly call that out in the code rather than just make it the assumed default behaviour. Two of the simpler hazards are sketched below.
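Two of those hazards in their most distilled (and deliberately broken) form, as a C sketch:

#include <stdlib.h>

int out_of_bounds(void)
{
    int a[4] = { 1, 2, 3, 4 };
    return a[4];                  /* reads one element past the end of the array */
}

int use_after_free(void)
{
    int *p = malloc(sizeof *p);
    if (!p)
        return 0;
    *p = 42;
    free(p);
    return *p;                    /* p is dangling: use after free */
}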
I also often see the argument that smart people can be trained to write safe C code if they are good programmers, however there is statistical evidence that this is simply not the case: even the very best programmers make mistakes, and those mistakes can be very costly in the amount of time and effort it takes to debug them, and also in the cost to the poor person or organization that experiences the error. Errors that a safer language prevents you from ever making.
I've also only mostly listed some of the simple common issues in languages like C, as soon as you start trying to write and debug concurrent code your problems get magnified.
So, given the choice, I strongly prefer writing in a safe language, because more of my time and energy can go into solving the problem, and less time is spent trying to think about array bounds when compilers can reliably do that for you with no effort, and minimal runtime cost.
If you are productive and enjoy working in $language_du_jour - fill your boots.
You've not listed issues, so much as recycled myths about C. If you want to write shitty C code - go ahead, but professionals exist, writing elegant bug free code. They tend to write elegant bug-free code in whichever language they are working in.
C and C++ require a bit more up-front design, and some experience to use well. It's possible to gradually evolve a C codebase into a C++ codebase.
FILE *fp = fopen(...);
..
// can forget to close - leaking fp
fclose(fp);
can become
// declare a default deleter using fclose
namespace std
{
template <>
struct default_delete<FILE>
{
void operator () (FILE* file) const
{
fclose(file);
}
};
}
std::unique_ptr<FILE> fp{ fopen(...) };
// cannot forget to close, as RAII takes care of this
The usage is the same, as it's the same pointer, with no memory overhead (the unique_ptr is just a plain pointer).
That approach allows you to gradually ditch the code which replicates functionality implemented in the C++ standard library, while not throwing the baby out with the bathwater.
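One variation worth noting (my aside, not part of the comment above): rather than opening namespace std to specialise default_delete for FILE, the deleter can be supplied explicitly, which stays squarely within the standard and, because the deleter is a stateless empty type, keeps the smart pointer the size of a plain pointer:

#include <cstdio>
#include <memory>

// Stateless deleter: closes the FILE* when the owning unique_ptr is destroyed.
struct file_closer
{
    void operator()(std::FILE* f) const
    {
        if (f)
            std::fclose(f);
    }
};

using unique_file = std::unique_ptr<std::FILE, file_closer>;

unique_file open_readonly(const char* path)
{
    return unique_file{ std::fopen(path, "r") };
}

Usage is otherwise identical: the file is closed whenever the unique_file goes out of scope.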
There is also a point that the functional approach of Rust is closer to pure template meta-programming in C++ (of which people suggest less is more) than to the multi-paradigm approach of C++, or C's pioneer spirit of "if you can fit it in a macro, I'll compile it" - which is why everybody compiles down to C when prototyping toy languages.
Rust just doesn't have a dog in this fight.
You want something that might actually be worth switching over from C which is not C++ - take a look at DLang.
> the compiler will either warn, or refuse to compile if the code is obviously incorrect or exhibits unexpected non-deterministic behaviour.
No, the compiler MIGHT warn you if the code has memory related bugs.
Or it won't because `unsafe` is a thing, and when presented with the choice between wrestling with the borrow checker, and just circumventing it because "i know what I'm doing", guess what will happen if the language ever got a lot of traction.
And all these warnings, etc. don't protect ANYONE from logic bugs; a program that passes the Rust compiler can just as easily be a piece of crap that breaks at the worst possible moment as a program written in Go, Java, Python, C# or any other language.
but I still can't quite grasp how a *language* can suddenly become "safe". What makes its compiler (which in itself has to be trusted) so much better?
Time.
Rust was designed in the late 2000s, when computers were running at multi gigahertz speeds and had gigabytes of memory. This means you can write much more complex compilers, and knowing this you can specify the language with much tighter restrictions than a language designed for the PDP-11 in the 70s.
Remember in C a prototype exists so the compiler knows what function call to generate without having seen the function. Now it would be trivial to infer this information. However in the 70s this was a sensible choice.
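A trivial sketch of what the prototype buys you (names are illustrative): once the declaration has been seen, the compiler can check and convert arguments to a function it has not yet compiled:

double scaled(double x, double factor);   /* prototype: signature known up front */

double twice(double x)
{
    return scaled(x, 2);   /* the int literal 2 is converted to 2.0 because the
                              prototype tells the compiler the parameter types */
}

double scaled(double x, double factor)
{
    return x * factor;
}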
The compiler of my memory safe language is about 10 000 lines of C++ code and takes about one second to compile 10000 loc on an RPI. Memory Safety is not expensive for the compiler.
Also, in the 70s they already had ALGOL, which was in many ways memory-safe.
C "won", because it was "given away for free".
Turning Turbo Pascal 3 into a single-threaded, memory-safe language would not be hard either. It runs on an 8 MHz 80286 like lightning.
The largest, up front problem with C++, is that the language is so large that I doubt anyone, including the gray bearded language lawyers, knows all its nooks and crannies. What hope have the implementors got of getting the semantics right for each implementation?
The average programmer (developer... if you must) just is nowhere clever enough to use more than the smallest fraction of the language effectively or even correctly. Basically getting mediocre C code screwed over with poor C++ (now with extra AI sauce.)
I can already see Rust bloating in the same way.
Application programmers probably should not be trusted to manage memory. Automatic memory management (garbage collection) is hard, at least to do efficiently or in the face of multithreading, so what hope has your average Joe/Jane Coder of getting it right 100% of the time?
Pretty much implies the use of languages like Go, Python & 100s of others which also do bounds checking and type checking etc.
Last time I looked there appeared to be a growing ecosystem of formal verification tools for Rust code, which is a glimmer of hope if you can educate enough software engineers to be able to use them. De Morgan's laws are about the limit of most programmers. :(
>The average programmer (developer... if you must) just is nowhere clever enough to use more than the smallest fraction of the language effectively or even correctly
This is absolutely true! What a skilled developer should be aware of:
- Apply what you know/master
- Educate yourself on what you do not know/master
- Practice new skills before going "live"
- Be aware: your know-how is your biggest strength, and your biggest weakness
Automatic memory management does not imply garbage collection. It's merely one approach - and generally quite a poor one.
Every GCd language has a whole pile of additional memory management functions purely to work around limitations of GC. When do you need to call "dispose", for example?
Pure RAII is better in almost all cases, the only real exception being that GC allows memory compaction, which can improve average performance in some cases.
This is why CPython, for example, relies primarily on reference counting for prompt, deterministic cleanup rather than a pure tracing GC (with a cycle collector as a backstop).
Any large Rust codebase will be peppered with "unsafe" keywords, because there's many things that the core semantics make hard, slow or simply prohibit.
Same as any large C++ codebase is likely to be peppered with "unsafe" keywords.
The argument for Rust is that "unsafe" acts as a reminder to be careful. C++ also has "unsafe" markers, and a variety of existing tools to both warn and suggest "safe" alternatives.
If "unsafe" worked, then it would have done so in C++ (new, void*, reinterpret_cast)
C is another beast entirely. Lumping it in with C++ is highly misleading.
"... garbage collection. It's merely one approach - and generally quite a poor one."
It's just another tool with advantages and disadvantages. Like any tool, it has to used responsibly and with care and planning, not thoughtlessly.
By its nature GC encourages using and cycling through more memory. That's a cost that must be measured against the benefit of greater freedom from detailed memory management.
Whether it is a poor choice depends upon the usage case and CPU/memory resources.
But, the need for care and planning are still there. Common deadly bugs:
- keeping unnecessary and unused references to memory, preventing it from being recycled.
- references into the inside of an object, to a member that becomes out of sync with the reference through the object handle.
So a GC program can easily evolve into using blocks of never-unreferenced memory, where the design is stymied by the need to ensure that no short-cut handles ever point to out-of-sync memory, and the potential advantages of GC are cut off at the knees.
Judicious use of a small number of unsafe code parts is Good Engineering. Ideally, limit the unsafe code to system libraries such as File handling, TCP/IP, GUI and so on.
Much better than 100% unsafe C code. Or "modern" C++ code with some "accidental multithreaded sharing".
Engineering that has "life in hands" is about minimizing risk in an economic fashion. If 99% of your code is memory safe, that is much better than 100% unsafe C or C++.
The largest, up front problem with C++, is that the language is so large that I doubt anyone, including the gray bearded language lawyers, knows all its nooks and crannies.
Fun little test(*): So you think you know C?.
(*) For C not C++. Since C++ templates are Turing Complete I hate to think what an equivalent test for C++ would be like.
...maybe America and friends are currently hit very hard in the crown jewels.
I see reports left and right of how foreign actors are hitting NATO government institutions really hard. Some mid-level government entities completely encrypted, or their laundry hung out to dry.
Having said that, indeed, one man's security is another man's security risk. That even applies to two men inside NSA.
Some government workers go back to paper for critical stuff (HR security), as far as I can see.
When Rust becomes a "standard" in the sense that ANSI C did then the credibility goes through the roof. For now, I would remain paranoid that code written today probably won't have a particularly long lifetime as feature creep rumbles on.
I do appreciate the ideas in the toolchain but moving to something that isn't backed by standards is a problem for anything other than toy projects.
Most (all?) functional safety standards currently require the use of "a standardized language". Rust is currently undergoing certification by TÜV, but that is not the same and I see this as being a problem.
I can understand not wanting to standardize, as that can make it more difficult for a language to evolve (especially if that evolution is divergent) and "pressure groups" can get undesirable changes into the specification.
However, there are a lot of very successful languages out there that are not standardized (but they are not used when functional safety is a requirement).
This is wholly the point. I can write good software in Rust, but in the absence of standards/certification I can never deploy that software anywhere important.
Fully acknowledge all manner of software is deployed in places it shouldn't be, to sub-par standards, but then this is why El Reg gets to write about the failures of those diabetes monitors... or similar.
Proving a system "safe for life-critical applications" is only to a minor degree dependent on "certified tools".
What you must do is to prove that the system you built with the toolchain is "safe". This is done with processes like V-Model, ASPICE, ISO26262 or DO178. The railway folks have their equivalent standard and the medical folks, too.
Essentially: test the h3ll out of your system. If you have a relevant compiler bug, it will most likely crop up in this process. Of course you will track the issue down to assembly level.
So you can safely use Rust for ABS or flight control, IF your engineering processes are robust and faithfully executed. Today.
Manure. Pascal was written as an educational language to inculcate the yout' of the time in the concept of rigorous block structuring and goto-less programming. (How else can you explain the brain-dead syntactical differences between declaring a constant and declaring a variable?) It was never meant for doing the kinds of things you can do with C; as such, it would not be able to supplant C without having been fiddled to a (more) unrecognizable mess.
It was never meant for doing the kinds of things you can do with C
The big issue with that is just because C allows you to do all the things doesn't mean you have to. I have seen unmaintainable code written in C full of gotos, continues, breaks, returns everywhere in a function and try throw and catches. All legal and all possible.
I'll agree in principle that the Pascal types were never meant for doing the kinds of things you can do with C. That's not necessarily a bad thing either...
My point being in part that C was never intended for writing code for levels above an operating system.
Buffer overruns (still a source of security holes) are easy in C, with Pascal you'd have to put some effort in.
"Pascal was written as an educational language"
This persistent myth is based on a logical fallacy. While it was intended to teach good [structured] programming style that didn't mean it was purely for educational purposes.
If I intend to go shopping later, it doesn't logically imply that *all I can do* is shopping.
See also the recent nostalgia article on el Reg about TurboPascal, with which real world programs were written.
Agreed: just because Pascal and Basic were primarily intended for education doesn't mean they could not be used elsewhere; just as C, meant to be used by systems programmers, got used for applications, replacing Fortran and COBOL.
TurboPascal only really existed because Pascal was a simple language (although still a vast improvement on GWBasic and 8086 assembler…); writing a Pascal compiler and code generator was second-year undergrad course work in 1980. Remember, the PC developer market was more hobbyist, plus people who had done a degree, where Pascal was taught.
We forget that back in the early 1980s, C was still a minority language on the rise and there were many other contenders; C, I suggest, only really rose to prominence circa 1985 (when LivingC for PC was released to a receptive market). So Borland's decision to back Pascal (with the ability to drop into assembler) makes a lot of sense. Interestingly, Borland tried to create Turbo-C in 1985 and failed when they discovered it required a wholly different compiler approach, ultimately delivering Turbo C++ in 1990.
I will quote *someone else's* comment from the recent TurboPascal article (https://forums.theregister.com/forum/all/2023/12/04/40_years_of_turbo_pascal/):
"doing type-safe checking in the compiler, rather than relying on the developer to do it, resulted in lots of processor time spent on bounds checking and other things that the other language compilers simply didn't do."
"C version, which was doing address arithmetic and never checking array bounds."
"Since time is money, Pascal lost out."
Of course, all the checking that Pascal did which made it slower than the C equivalent made it more robust. Obviously back then, as the commenter says, speed was the primary consideration. However, since the internet, that premise no longer holds.
There'd also be no JIT compilers, thus no bytecode languages like Java and C#, and interpreted languages like Javascript would be far less abusable due to abysmal performance.
Although we've actually moved back to a (modified) Harvard, what with the "execute" flag on pages.
So tell us, Mr Feds, how many were charged and sent to prison for all 22 million SF-86s the Chinese made off with, due to simple criminal incompetence by Fed appointees and employees?
https://en.wikipedia.org/wiki/Office_of_Personnel_Management_data_breach
None?
So yeah. Let's all use an unfinished language with no formal spec, no verifiable compiler, and first developed because some idiot dot-com f*ck heads at Mozilla did not have the technical competence to do a simple code refactoring / re-architect on a "mature" codebase. You know, stuff that has been done for many decades with very large spaghetti codebases. It wasn't even the worst codebase I've ever seen. Not even close.
Because? Security...
Maybe all these code kiddies with PhD's might ask people who have been shipping this stuff for decades about the memory leak, memory safe verification, memory coverage monitoring etc tools (both software and hardware) that have been available since, well, the 1980's. And that was without the sophisticated JTAG etc direct processor support that we've had since, well, at least the 1990's.
Clueless idiots. Some of us have been writing verifiable memory-safe code (for very large shipping codebases) in C etc since before they were born. It's not that difficult, you know. If you actually know what you are doing. Which code kiddies with PhDs never do. Because it mostly involves common sense and keeping it simple. All that stuff they don't teach in grad school because you cannot write long pointless academic papers, which no one ever reads, about it. Because it's way too simple.
First used Un*x around 1981 but hated it, and have spent my whole professional career writing commercial (shipped) software for every OS but a Un*x-derived one. When MacOS X is a target platform I always use Qt. Not wasting time on that NS crap. Which was already getting old in 1993.
And when porting over Un*x-originated code it always goes through some kind of rewrite, because most of those Un*x folk really don't believe in error checking, asserts, etc. You know, all the actual software engineering fiddly stuff. Although there have been one or two honorable exceptions to this rule. Over the last five decades.
Still have my well-thumbed copy of the first edition of The Unix Haters Handbook. A fun read.
So you see, you know so little about the sharp pointy end of the software business that you couldn't work out that my statements were very far from the Land of Un*x. Which is less an OS, more a text adventure game. That you will never win. Or ever even finish.
So you must be one of those kiddie-programmers with one of those useless CompSci PhD's then.
Dunno about that, but Rust.org recently had a very childish spat in their governing board. If the steering committee of a language can't behave like adults, that says a lot about the people using the product; as an old fart I'm certainly not trusting some upstart language made by children who haven't learned to work together properly! Ha ha!
According to the CWE Top 25 Most Dangerous Software Weaknesses published by Mitre there are 15 issues that have been present for every year from 2019 to 2023.
Of these 15, five are memory-safety related, and could be mitigated or eliminated by using MSLs.
Out of bounds write (no 1 in 2023)
Use after free (4th in 2023)
Out of bounds read (7th in 2023)
NULL pointer dereference (12th in 2023)
Improper Restriction of Operations within Bounds of a Memory Buffer (17th in 2023)
Of course it is possible to write safely in a non-MSL - the point is that far too often it simply doesn't happen, and this has been the case for many years. If we want safer software (and we should) then a move to MSLs is definitely part of the solution.
https://cwe.mitre.org/top25/archive/2023/2023_stubborn_weaknesses.html
My first reaction was to run away from Rust.
It's not that I think the various TLAs are not technically astute - they are - it's that my paranoid mind thinks that if they like it, they have a back door in the software big enough to drive Elon Musk's ego through.
The very interesting story of what happened and why to the DES key length between the first IBM version and the final version is all you need to know about "TLA endorsed" standards.
The upside is that you can make a very lucrative business from selling non "TLA endorsed" standards software to all the other TLA's and .mils. Very lucrative. Once knew people who just minted it by selling comms software whose only marketing line was - Guaranteed Not DES. Remember one USMC sale for a high 5 figure $ sum (in the 1980's) where the whole conversation lasted about 5 minutes and it went along the lines of - Does it run on X? Does it run at Y kb's speed? And you absolutely guarantee it does not use DES? Sale made and $$$$$ check followed soon after.
What was even funnier is it seems that people at Fort Meade were also good customers. And not the "R&D" crew either.
Security advice from a government agency charged with systems and network security that cannot keep its own systems patched and updated. I refer to the hacks into government servers through an unpatched Adobe ColdFusion exploit, on a five-year-old version of the software for which a series of patches for this zero-day had long been available. Yeah, top-notch IT systems security, if the IT departments are run by Mr. Magoo!