This is good and big but ....
am I the only person to appreciate this?
I have not studied Rust, but if I weren't going to retire soon, this would get me to learn it. Unlike Infinidash.
Ah well.
The Rust for Linux project, sponsored by Google, has advanced with use of a beta Rust compiler (as opposed to a nightly build), testing ARM and RISC-V architecture support, new Rust abstractions, and more. A new set of patches submitted to the Linux kernel mailing list summarises the progress of the project to enable Rust to …
I have retired so I won't be bothering.
In 1991 I read The AWK Programming Language. I retired earlier this year and that book alone has got me through so many difficult times. While I have learned other languages none have come close to awk.
Verity may wish to have words about that ! .... W.S.Gossett
Whenever one has compilations that invoke live testing ACTive runtimes returning <=> there be thrones aplenty turned irrevocably to Rust for SkunkWorX, REST and Greater IntelAIgent Games Play.
Methinks that has C** realised as being more deeply embedded and fused into the foundations concrete of the Linux kernel than would enable them to be viably imagined and declared a pair of separate and separable entities.
Verity may wish to have words about that too for it surely calls upon her expert teasing expertise ‽ .
C at its level kind of can't be replaced.
C already makes a sacrifice from ASM in the manner that everything is a "call" (jump). C++/Rust/OOP further that sacrifice by the concept of a v_table (jump, jump). You can make even more sacrifices, which is where "scripting" languages come in (jump, jump, jump, jump...). In the end, it really is all about wall clock as that directly equates to human life spent (sounds funny, but think about it). If you have to make a sacrifice for usability, make it wisely.
FWIW, there have been attempts to replace C at its abstraction level, but the widespread understanding of C has made them all but a memory. I'm not sold on the "security" of _ANY_ language, as it seems today that word is just flaunted around to increase or decrease the popularity of things (which in today's world seems more financially motivated than driven by security, usability, et al.).
There have been quite a few non-C based operating systems, some of which are arguably safer than Unixoids/Windows, because they use bounds checking inside the kernel.
Here is a small list of them:
https://en.wikipedia.org/wiki/Burroughs_large_systems
https://en.wikipedia.org/wiki/ICL_2900_Series
https://en.wikipedia.org/wiki/Singularity_(operating_system)
https://en.wikipedia.org/wiki/HP_Multi-Programming_Executive
Finally, a Rust-based OS, which already works in a prototypical fashion:
https://www.redox-os.org/
"C++/Rust/OOP further that sacrifice by the concept of a v_table (jump, jump)."
Rust isn't OOP though, and doesn't, in general, use a v_table. Most calls are statically determined at compile time. Polymorphism is achieved through the use of generics, and the code is monomorphised to allow this.
There are exceptions, because dynamic dispatch allows you to do things that are hard otherwise. One of the changes with the Rust 2018 edition is the introduction of an optional `dyn` keyword to mark such usage; that becomes a hard requirement in Rust 2021.
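For anyone whose frame of reference is C++, a rough sketch of the same distinction (names are mine; Rust generics behave like the template below, and `dyn Trait` like the virtual call — an analogy, not a claim the two languages compile identically):

    #include <iostream>

    struct Square {
        double side;
        double area() const { return side * side; }
    };

    // Static dispatch: like a Rust generic, this template is monomorphised --
    // the compiler emits a separate, directly-called copy per concrete type.
    template <typename Shape>
    double area_static(const Shape& s) { return s.area(); }

    // Dynamic dispatch: like Rust's `dyn Trait`, the call goes through a vtable.
    struct AbstractShape {
        virtual double area() const = 0;
        virtual ~AbstractShape() = default;
    };
    struct Circle : AbstractShape {
        double r = 1.0;
        double area() const override { return 3.14159 * r * r; }
    };

    double area_dynamic(const AbstractShape& s) { return s.area(); } // indirect call

    int main() {
        Square sq{2.0};
        Circle c;
        std::cout << area_static(sq) << "\n";  // resolved at compile time
        std::cout << area_dynamic(c) << "\n";  // resolved at run time via the vtable
    }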
This is my naive assumption then, sorry. I assumed that if compilation succeeded, traits would have to have some conceptual equivalent of a v_table during runtime for enforcement. Again, sorry :-(
As far as OOP, well, to me it is, but I don't consider OOP to be tied to inheritance, just its memory/method accessors. I guess Rust's OOP nature is somewhere between OOP-style C and C++. I'll have to generate the asm and look... sometime.
Doubling the number of languages used does not double safety. Since when did any modern language live up to the hype? Sticking with plain old C is the safest bet, and the Rust community can make their own OS to compete with BeOS for least memorable effort to outclass C as a systems programming language.
Look at the CVE database for the kernel or any other software written in C or C++. You will find that approximately half of all bugs are related to things such as buffer overflows, double frees, null pointers, data races etc. Things that these languages enabled and as a consequence made the code vulnerable to security or safety issues.
They are also things that Rust would have prevented from becoming code in the first place, let alone escalate to the point that they appeared in the CVE database. That's where the interest stems from using it in the kernel and in other parts of the system.
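To make that concrete, here is a sketch (function and variable names are mine) of the classic patterns behind those CVE entries. It typically compiles without complaint as C or C++; in safe Rust the overflow becomes a bounds-check panic at run time, and the use-after-free and double free are rejected at compile time by the ownership rules:

    #include <cstdlib>
    #include <cstring>

    static void release(char* p) { std::free(p); }

    int main(int argc, char**) {
        std::size_t n = static_cast<std::size_t>(argc) * 8; // size unknown at compile time
        char* buf = static_cast<char*>(std::malloc(n));
        if (!buf) return 1;
        std::memset(buf, 'A', n + 8); // heap overflow: writes 8 bytes past the block
        release(buf);
        buf[0] = 'B';                 // use after free: undefined behaviour, compiles anyway
        release(buf);                 // double free, hidden behind a call
        return 0;
    }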
Look at the CVE database for the kernel or any other software written in C or C++. You will find that approximately half of all bugs are related to things such as buffer overflows, double frees, null pointers, data races etc. Things that these languages enabled and as a consequence made the code vulnerable to security or safety issues.
Do not confuse C and C++; they are worlds apart (though C++ contains C as an unnecessary subset, which means it can be misused).
Errors of that type are easily made in C, which is the language in which most of the kernel is written.
In a modern language, such as idiomatic C++, one would not need to write the sort of code in which those errors tend to arise.
They are also things that Rust would have prevented from becoming code in the first place...
Indeed, Rust is another modern language that relieves the programmer of a lot of error-prone low-level chores ... that does not make Rust a silver bullet that can magically improve code quality without programmer effort.
In a STABLE and WELL TESTED language like 'C', senior people generally know what to look for in code reviews, and SHOULD ALWAYS LOOK FOR THESE POTENTIAL PROBLEMS, especially WITHIN THE KERNEL (and when code is being submitted by junior/inexperienced programmers).
The problem is NOT the language. The problem is lack of PROPER review (all of those CVEs dealing with memory management) and/or NOT using well established coding practices, whenever a memory-related issue causes a CVE to show up.
the kind of thinking that suggests abandoning the C language because of mistakes is the KIND of thinking that might say "Let's replace steering wheels in cars with GAME CONTROLLERS, since EVERY CAR that has EVER had an accident HAD A STEERING WHEEL."
Microsoft and Google have 10,000 times more experience than you do with C and C++ programming, and their engineering managers have decided to develop some new projects in Rust, as have Amazon and many other companies. Rust solves real problems in developing memory-safe and thread-safe high-performance code by replacing runtime errors with compiler errors. Your ill-informed capital-letter ranting isn't going to change the trend.
Idiomatic C++ is kind of a unicorn thing, since code is going to smear across time. Few projects are blessed (or cursed) to rewrite themselves to the latest C++ standard when one appears. And C++ code will probably need to call C libraries in places, or will have C-esque code for one reason or another, e.g. reading data into a buffer.
And even the latest C++ isn't going to enforce object lifetimes, thread safety etc. That doesn't even get into the traps that C++ lays in its language for the unwary - inadvertent copying, the rule of 3 (or 5), virtual destructors, implicit constructors, nullptr and all the rest.
So I don't see that it's hugely better. It's not hard to find CVEs in large mature codebases written in C++ along similar lines to those written in C.
C++ has exactly the same problems as C if used naively. For example, std::vector::operator[] is not bounds-checked. If you don't use RAII, heap errors are practically guaranteed.
Most importantly, C++ has no multithread-safe memory concept whatsoever. Best of luck debugging multithreaded memory errors.
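A minimal sketch of both points (names and values are mine): the unchecked access compiles happily, while RAII ties the allocation's lifetime to a scope so there is no free/delete to forget or to call twice:

    #include <memory>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3};

        // operator[] performs no bounds check: this is undefined behaviour,
        // and may "work", crash, or silently corrupt adjacent memory.
        int oops = v[10];
        (void)oops;

        // RAII: the unique_ptr owns the allocation and releases it exactly once.
        auto buffer = std::make_unique<int[]>(64);
        buffer[0] = 42;
    }   // buffer is released here, automatically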
True, operator[] isn’t bounds-checked, but bounds-checking at access is the least efficient and laziest way to avoid buffer overruns; and in any case, containers also have ::at(), which does do bounds-checks, if you’re happy with the performance penalty.
But the problem with overruns happens not at access, but earlier, when you compute the index that you want to put between those brackets, and that is where C++ containers are vastly superior to the features offered by C. In my (reasonably long) experience of debugging C codebases, overruns happen because code is relying on “in-band” methods of determining the length of structures, which can be lost when those structures are copied into inadequately small buffers.
The C string itself is the classic example of this problem: copy a string that has strlen() == 10 into a char[10] and... oops, no terminating NUL any more - even if you used the "overflow-safe" strncpy(), because that function does not guarantee a NUL-terminated result, contrary to what most new developers would expect.
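A minimal sketch of that exact pitfall (buffer names are mine; newer GCCs can even warn about this pattern via -Wstringop-truncation):

    #include <cstring>
    #include <iostream>
    #include <string>

    int main() {
        const char* src = "0123456789";     // strlen(src) == 10
        char dst[10];
        std::strncpy(dst, src, sizeof dst); // copies 10 chars, leaves no room for the NUL
        // dst is now NOT a valid C string; printing it would read past the buffer.

        // The C++ alternative keeps its length out of band, so the problem
        // cannot arise in the first place:
        std::string safe(src);
        std::cout << safe.size() << "\n";   // 10: explicit, cheap, always available
    }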
By contrast, vector::size() gives you the number of elements in the container (or bytes in the string). It’s explicit, it’s cheap to call, and it’s always present. It makes it trivial to sanitise indices at entry to modules, and prevent a whole swathe of overrun scenarios.
Rule 1: When you’re doing random-access with un-trusted indices, use ::at(), which does perform range-checking.
Rule 2: For iteration, use for (auto i = container.begin(); i != container.end(); ++i) rather than iterating by index on size(). It's the same cost, and will still work without a recompile if you change 'container' to be a set or some other type later on. ('++i' avoids a performance penalty where iterators have expensive copy behaviour, though most compilers now make this substitution if you write i++ without using 'i' in an expression.)
Neither of those options are available in C without writing your own container classes.
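Both rules in one hypothetical sketch (container names and values are mine):

    #include <cstddef>
    #include <iostream>
    #include <set>
    #include <vector>

    // Rule 1: range-checked access for any index that came from outside.
    int lookup(const std::vector<int>& table, std::size_t untrusted_index) {
        return table.at(untrusted_index); // throws std::out_of_range rather than overrunning
    }

    // Rule 2: iterate with iterators, not a hand-rolled index.
    template <typename Container>
    void print_all(const Container& c) {
        for (auto i = c.begin(); i != c.end(); ++i)
            std::cout << *i << "\n";
    }

    int main() {
        std::vector<int> v{1, 2, 3};
        std::set<int>    s{4, 5, 6};
        print_all(v); // the same code still works...
        print_all(s); // ...if the container type changes later, as noted above
        std::cout << lookup(v, 1) << "\n";
    }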
In my experience with memory-safe languages, bounds checks cost about 10% more CPU time. Modern CPUs seem to perform the bounds check and the access "virtually" in parallel (speculative execution).
It is time to admit humans are NOT perfect "code generators". If we can mitigate the effects of our imperfect work, that is very good in my opinion.
I think we agree that bounds-checking on every access is expensive, and 10% sounds about right if you were iterating a memory block byte-by-byte in such a language. I often get a touch of the heebie-jeebies when writing such code in C#, but I reassure myself with the knowledge that (in my code) these tend to be once-per execution operations, so the overhead is small in the grand scheme of things.
However, speculative execution on a CPU is nothing like bounds-checking on an array. Speculative execution is where the CPU pipelines one or both of the code paths that could result from a branch instruction, and then discards the machine state arising from the "losing" branch once the actual branch condition can be resolved. If a memory access on the speculative path is to an invalid address, that exception is noted but not raised until its proper place in the instruction stream is reached; if such an access occurs on the discarded path, it will be noted but never raised, because that machine state is discarded.
Array bounds-checking is not the same as the use of guard pages to trap runaway pointers before they get a chance to do too much damage. That is a hardware feature, but it will not help you if you trash your own memory by overstepping your array boundaries. To fix that, you must use software checks, which do consume CPU cycles (or, more strictly, instruction pipeline slots).
Well, C was a modern language once, so clearly you think that it lived up to its hype.
In the end, while there will always be some guesswork, the real solution is to try things out, slowly and cautiously, see if they work, and make a judgement at the end of it, rather than just assuming that "C is the safest bet" based on your prejudices.
Rust is designed to fill a hole and address a variety of problems that C does have. There is no serious doubt that C has its problems and allows a wide variety of bugs which occur with regularity. Rust is designed to address these; whether it will do so in the context of the Linux kernel is what we do not know yet. This is an attempt to answer that question.
Since when did any modern language live up to the hype?
Depends what you mean by modern, but Java, C# and F# seem to have been pretty solid. Kotlin seems largely pointless (might have been different depending on the outcome of the Oracle header files copyright case), Scala is interesting, but I suspect (like APL and LISP) that it is likely to be difficult for many people to get into the right mindset.
Both Go and Rust are trying to solve some genuine problems with software reliability. I think that's laudable given the huge amount of code that's now in safety-critical systems and that things like automotive industry guidelines arguably don't do much to mitigate the inherent dangers of C.
Having said that, you don't want to be constantly struggling with unfamiliar languages because they're temporarily fashionable, and I'd be somewhat more impressed if less time had been spent needlessly inventing new syntax and jargon. But we do need something along these lines.
Sticking with plain old C is the safest bet
Given the number of known errors in C code, this is plainly not true. Rust is very C-like but was built with a view to reducing some of the known common errors in C code. It has been adopted by several projects for this reason and, thus far, people seem to be happy with the results. In many ways it's a bit like a pre-compiler, which is why it's suitable for an alternative implementation: many of the existing tests should be usable.
If a language has been around for twenty years then a massive number of highly experienced programmers have discovered, reported, and fixed bugs in it - after a long time of working with a language you normally know what works, what doesn't, what not to do, and what to check every time you do it (e.g. reading text strings that might overflow).
Switch to a new language and you are going to have to work through everything again; it will be a long time before all the bugs and issues are discovered. I'm not saying that Rust is bad, just that it's going to be a while before we can be 100% certain that it's bug-free and that no coding method ever creates a problem (resulting in free bugs on the dark web).
You're mixing up two different types of error, to my mind. First, errors in the language itself and its libraries, which have been fixed up and work quite well after 40+ years of development and use.
The second is "programmer" errors. Since not all programmers have 20 years of experience, they will continue to make the same old errors all the time. Just look at any undergrad (or even grad!) programming class to see this!
The C language is wonderful, and was pretty well designed for the systems of the day, but its string handling is atrocious and makes it painful to do the kind of work people need to do a lot. This is why Perl became so popular, since it made so many problems that weren't trivial in C much easier to solve. Other scripting languages did the same.
But C excels at low-level system code, where you didn't tend (then at least, more so now) to do much string manipulation. It's also a place where being just abstract enough, one step up from assembler, let you be much more productive while keeping almost all the performance of well-written assembler.
So I'm all for Rust as the basis of most kernel drivers; it would eliminate a whole class of problems. It won't be perfect, but it really can't be all that much worse.
You make it look as if the only problems of C are related to strings. That is just a subset of all the memory safety errors which occur in practice. All C arrays potentially suffer from index errors. All C heap memory suffers from use after free, double frees and uninitialized pointers. Have a look at the CVE database for real-world data.
The people who wrote the HPUX ping of death bug were most likely seasoned developers, not rookies.
Same goes for the many bugs in Windows, in Adobe flash and PDF, in TrueType, Unix utilities and hundreds of thousands of other places. The first time Unix userland utils were run using valgrind, there were loads of memory errors detected.
All C heap memory suffers from use after free, double frees and uninitialized pointers.
Then WHY aren't these things simply being looked for and proactively prevented?
(what DO they teach in these schools?)
Perhaps a debug malloc/free in the standard C library could specifically look for double frees, and programmers could use EXISTING COMPILER WARNINGS to find potentially uninitialized pointers, solving those two problems. 'Use after free' can often be avoided by forcing pointer assignment to NULL by convention after calling 'free'. Then in testing (which you SHOULD be doing) it's very likely you'll get a page fault crash or kernel panic instead of a vulnerability.
It has been my observation that use after free is generally caused by one of 3 "code smells":
a) a junior programmer maintains old code and did not see the 'free()' operation above
b) a pointer variable is re-used when it is already assigned (and the old value is free'd by accident or some other complete cluster-blank happens and now you're re-using a free'd pointer)
c) object reference counts have not been used or were not implemented properly, and the object or block of memory is being shared with varying lifetime requirements.
So, you ALSO look for these specific cases in code reviews before a change or new thing is committed.
In short, a reasonable set of practical solutions has just been presented. No need to CHANGE PROGRAMMING LINGOS (other than Google "feels" we should).
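For what it's worth, the free-and-NULL convention above can be encoded so it is hard to forget. A hypothetical sketch (the macro name is mine), with the caveat that it only protects the one pointer variable, not any other copies of it:

    #include <cstdlib>

    // Free and NULL the pointer in one step, so a later use-after-free
    // dereferences NULL and crashes loudly in testing, instead of quietly
    // reading or corrupting freed memory.
    #define SAFE_FREE(p) do { std::free(p); (p) = nullptr; } while (0)

    int main() {
        char* buf = static_cast<char*>(std::malloc(64));
        if (!buf) return 1;
        buf[0] = 'x';
        SAFE_FREE(buf);
        // *buf here would now fault immediately; and a second SAFE_FREE(buf)
        // is harmless, since free(NULL) is defined to be a no-op.
        return 0;
    }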
... but surely once you've written the algorithm you (and I quote) NEVER! HAVE! TO! TEST! IT! AGAIN!!!... right?
Or maybe a decent unit test suite and code scanner will pick these things up before they hit production...
Agreed, the string handling bugs are mostly a subset of other memory management issues, but they're particularly pernicious due to the traditional NUL-terminated representation and the string format specifiers (with sscanf/sprintf, and %n meaning even an incautious printf can write to memory it shouldn't, not just read it).
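The %n point deserves a concrete illustration, since it surprises people (a sketch; on some platforms %n is disabled or restricted these days):

    #include <cstdio>

    int main() {
        int written = 0;
        // %n stores the number of characters output so far through the
        // matching pointer argument: this printf call WRITES to memory.
        std::printf("hello%n\n", &written);
        std::printf("written = %d\n", written); // prints: written = 5
        // If an attacker controls the format string, %n turns a "print"
        // into a memory write -- hence -Wformat-security, and the advice
        // never to pass user input as a format string.
        return 0;
    }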
C has been around for 50 years and people, including experienced and skilled programmers, are still making the same mistakes. Does that tell us anything? In many situations that might not matter, but in systems you really want to avoid those if at all possible.
https://www.zdnet.com/article/microsoft-70-percent-of-all-security-bugs-are-memory-safety-issues/
https://www.zdnet.com/article/chrome-70-of-all-security-bugs-are-memory-safety-issues/
Using C and C++ is like not using an ABS brake, "because I know how to properly brake".
No, it's using C/C++ incorrectly, without the available tools to detect and remove these classes of bugs, refusing to use the libraries, language facilities and well-known patterns explicitly made to avoid those bugs, and repeating the same errors ad nauseam, that is like refusing to use ABS.
That said, the competition with Rust is very welcome, and will force the C/C++ ecosystem to address the issue of incorrect code.
I mentioned this already, but I'll mention it again.
Programmers need to PAY ATTENTION to compiler warnings. This goes TRIPLE with kernel code.
(it's amazing how many bugs are caught when you read and heed the warnings, even if it is a pain in the backside to do so - and clang seems to be even more helpful than gcc, last I checked)
I always compile with
g++ -Wall ...
And I will fix any warning before I proceed to perform developer tests.
But this does not mean g++ will tell me all the memory safety issues that rustc would tell me in equivalent code. It simply is impossible for a C++ compiler to detect the same types of bugs as a Rust compiler can find. This follows from the language specifications.
Maybe by 2025 the C++ folks will have added the same memory safety mechanisms as Rust to their language spec; then you might have a point.
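A concrete example of that gap, for the curious (values are mine). The C++ compiles silently under g++ -Wall; the equivalent Rust is rejected outright:

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3};
        int* first = &v[0];  // keep a pointer into the vector's buffer
        v.push_back(4);      // may reallocate and free the old buffer
        std::cout << *first; // dangling read: no warning from g++ -Wall
    }
    // The equivalent Rust -- holding `&v[0]` across `v.push(4)` -- fails to
    // compile with error E0502: cannot borrow `v` as mutable because it is
    // also borrowed as immutable.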
RAII is a useful design pattern, but how do you enforce a useful design pattern? It might not even be useful in your particular case.
At some point your programmers have to actually know stuff to program, as much as it pains project managers who think programmers are interchangeable cogs and as much as it pains people who think there are technical solutions to fix bad programmers.
Name me any other branch of engineering, architecture, or design where people expect tools to make up for lack of knowledge.
Yeah, I'd've said D was O(two decades) old. That makes it the generation before Go (2009), Rust (2010), Swift (2010), and Kotlin (2010).
It was up and coming for a while when C++ development had stalled. I played around with it and it looked like a genuine contender. Then C++11 unblocked the pipes and D never offered enough to justify the jump. And at this point, I can't see it gaining mindshare. I'm sure aficionados will keep it alive; but it's not something I'd trust a codebase to.
EDIT: Systems languages get taken up when they offer new "safeties". C++ offered type safety (and cleanup safety via destructors/RAII). AIUI Rust offers memory safety and race safety. D doesn't offer any new safeties.
Rust has a steeper learning curve than other languages. It's odd, 'cos there are some things Rust and Go agree on, and some things each does better. I tried getting into Rust but it wouldn't click. Being a bit of a lazy bugger, I went down the Gopher path instead - partly 'cos I like the mascot, but partly because, not being a full-time coder, I just wanted portability and self-contained builds for the utils in my automation stack, and Go was quick to learn and get going with. One day I might give Rust another crack.
"Which means, in my bombastic opinion, that it has the potential of following the same path as ADA"
It could do. My feeling with Rust is that the learning curve is actually getting shallower over time. It's easier now to learn Rust than it was in the past, mostly because the compiler's handling of the hardest bit (the lifetime and borrowing rules) has got a lot cleverer.
Of course, it's hard to be sure from personal experience; I learned Rust a while back and that it feels easier to me could just be that I am better at it, but I have heard similar anecdotes from others.
The best code bases follow the KISS principle. Adding another language for no apparent reason, and the whole tooling that comes with it, introduces an exciting new level of potential issues. Why would you split a code base across multiple languages, anyway?
Normally you'd pick a language for a project and stick to it, and rewrite it later if it was a poor choice. I don't mix C++ or C with Ada unless I have very good reasons to (compatibility, where the Interfaces module in Ada can be useful), but the better way is still to bite the bullet and just rewrite the whole thing in Ada, so that you're not mixing and matching different paradigms and toolchains needlessly.
Since Linux won't be adding Ada support any time soon for whatever random reason Linus dreamed up, I guess that means that if I feel the itch to contribute to an OSS OS, I might as well pick a BSD or something fun like ReactOS, who don't seem stuck in monolithic kernel land, either.
The problem with ReactOS is it is chasing a fast-moving target, Windows is whatever MS says it is, and nobody will use a Windows clone that does not run recent Windows software. Linux started out as a Unix clone, but that was a much simpler and stabler target, and nowadays some OS'es (even Windows) are trying to provide Linux compatibility, instead of the other way round...
Adding support for new APIs which are deprecated by MS three years later would be a waste of time - which is what most of them after Windows 7 are.
When MS finally work out what they want from Windows then ReactOS can add support for it. As it is, a solid platform which runs Win32 would be a pretty good thing.
Your cute "small" language C has created an enormous amount of exploitable bugs. The Linux guys seem to attempt a gradual conversion to Rust.
It definitely makes sense given the history of bugs in the Linux (and many other C-programmed) kernels.
Will it work out ? We will see.
There are some highly interesting kernels such as seL4 around and they also use Rust for their higher level/application parts.
When I see a quote from Google like this (from the article):
"we feel that Rust is now ready to join C as a practical language for implementing the kernel"
my thinking is something like "Are we basing this potentially dangerous exercise on what SOME people *FEEL* ???"
It makes NO sense to change the programming language of the Linux kernel. It makes a *LOT* of sense to keep it consistent throughout, in order to avoid the potential INEFFICIENCIES and/or INSTABILITIES of any necessary 'translation layers' (read: shoehorns) for calling conventions/ABIs/standards/etc. between lingos, _AND_ to make it possible for legacy code to be MAINTAINED. And WHAT is the benefit gained by doing so? The risk, in my bombastic opinion, as well as DEVELOPER TIME, *GREATLY* outweighs any possible benefit to doing this.
What is SO hard about LEARNING TO CODE 'C' PROPERLY???
(if Linus 1.0 called use of newer C compiler features "compiler masturbation", what is he REALLY thinking about THIS??? Linus 2.0 may lack the chutzpah to say what needs to be said, BEFORE something goes "boom")
Many "change for the sake of change"s have been done in the last decade or so. Australis. FLATTY user interfaces in general (apparently driven by Google). An OS that spies on you and slings ads at you, and even strongarms you to use a cloudy login. Subscription versions of desktop software.
NONE! OF! THESE! CHANGES! ARE! GOOD! THINGS!!!
And I have to wonder, HOW many of them were DRIVEN! BY! GOOGLE!??? Micros~1 may just be along for the ride on this one. Or not.
AND... if you have to "gerrymander" a programming language to NOT do what it was originally designed to do (crash and burn on memory allocation failure) JUST so that you can use it in the kernel [read: "shoehorn" it in there anyway], then something is seriously wrong with the plan. Or the lack thereof.
"What is SO hard about LEARNING TO CODE 'C' PROPERLY???"
It just is. Look at the actual commits. A very few people are good at it, sometimes even they make mistakes. Most coders are abysmal in C, or would be if they were forced to use it. Think about that for a minute. The committed C code would be orders of magnitude worse if all the shit coders out there were forced to use only C. Shouting at the universe won't change that.
So if Google wants to do something against the obvious security risks of the Linux kernel, you come here shouting and changing the subject to ChromeOS.
Maybe you just learn something new and better than what you already know?
Or maybe you listen to Sir Tony Hoare and what he has to say about memory safety.