“Minimal adjustments”
They make it sound so easy. If it was, why couldn’t they automate those adjustments too?
Computer scientists affiliated with France's Inria and Microsoft have devised a way to automatically turn a subset of C code into safe Rust code, in an effort to meet the growing demand for memory safety. The C programming language was created in the early 1970s and has been used to build numerous critical systems, …
The HACL conversion to Mini-C required only minimal code changes, and the EverParse conversion required no source code changes at all. Just because the C language has some problematic features doesn't mean that your C program uses any of them.
So, hypothetically we could create a "Mini-C" compiler and add things like "fat pointers" and run-time bounds checks to it. Then see if our C program compiles with it and passes its tests. If it does, we're done without using Rust and with no source code changes, just a recompile.
If there are a few spots in the program which need features that Mini-C doesn't have, then factor those out as separate regular C files, and include them in the project, just like Rust does with "unsafe" code. We're then done without using Rust and with limited source code changes.
Perhaps Rust would handle a few corner cases better than a hypothetical Mini-C, but if a C variant can do 90 percent of what Rust provides it would be more likely to be widely adopted sooner and so provide more security in a practical sense when looking at the industry as a whole.
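For what it's worth, the "fat pointer plus run-time bounds check" idea is roughly what a Rust slice already is: a pointer carried together with a length, checked on every access. A minimal sketch (the helper name is just for illustration):

// A slice reference (&[u8]) is a pointer plus a length; every access
// through it is bounds-checked.
fn checked_read(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied() // None instead of undefined behaviour when out of range
}

fn main() {
    let buf = [1u8, 2, 3];
    assert_eq!(checked_read(&buf, 1), Some(2));
    assert_eq!(checked_read(&buf, 10), None); // past the end: caught, not corrupted
}

A hypothetical Mini-C could presumably bolt the same representation onto C arrays, though at the cost of changing the size and ABI of anything that currently takes a bare pointer.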
If just replacing existing code bases with a new language was simple, easy, and cheap, all that COBOL code out there would have long ago been turned into Java. That hasn't happened, and I similarly suspect that C will be around for a very long time to come and lots of new C code will be written.
'C' is a difficult language to parse because of its exception cases and implicit conversions, but "mini C" parsers have existed for a very long time (Haskell used a mini-C intermediate language for over a decade before LLVM rendered it redundant), and there is even an ANTLR C parser for parsing C in Java. Producing safe Rust code would be new, but that's not what those existing parsers do.
The principle of parsing a language into an Abstract Syntax Tree (AST) and then rendering the AST into another language is not rocket science; Hiperspace.SQL is an example that translates SQL into C#.
I think one of the things you're missing is that more and more C code is now being maintained by people who may "understand" the language but didn't grow up using it for everything and are, therefore, less aware of the many potential pitfalls that have been with C from the start. Compilers have been demonstrably better at creating machine code than people for well over a decade now, and it's reasonable to start with a high-level language and rely on a compiler to produce fast but safe code.
The HACL* conversion required minimal code changes and the EverParse conversion worked without source alteration.
This is how you know you had fantastic quality C code to start with. Unfortunately those projects will be the exception, but hey, once you've patted yourself on the back for having great C code by auto-translating it into Rust you may as well go ahead and throw away the C code.
Perhaps no re-skilling costs, but there might be other costs associated with achieving and maintaining high-quality C code that could be reduced by adopting Rust.
For example, there might be additional peer review and test costs associated with ensuring you don't have memory safety errors that you can dispense with, or the re-work rates caused by defects found during testing might be lower with Rust than with C because with Rust the defects are avoided or immediately revealed when the code writer performs a test compilation before submitting the code.
Preventing code defects or eliminating them earlier in the development lifecycle is often more cost-effective.
I don’t dispute your rationale; however, remember many companies will be looking to keep costs down, hence I expect many are not doing much of what you suggest (and many would recommend) as it just “adds cost”…
So whilst there is C code to be maintained and people skilled in C readily available, the potentially cheapest solution is to add the Mini-C parser to the development process and keep one's head in the sand….
Personally, until Rust has a formal language definition, as per all previous serious languages, it’s a toy/academic language…
What is so interesting is why, given the amount of support Rust seems to have, it is taking so long for the formal language definition to be produced… This would suggest Rust isn’t as perfect as some would want to make it out to be…
Agree with all your points. My comment was to point out that there is often a more complex cost-benefit equation - particularly if quality/correctness/security have value, and I think the points you make here expand on that.
Unfortunately a "race to the bottom" on costs also has adverse consequences, many of which get regular coverage hereabouts. I think that also applies to the "skills" of those working in large parts of the programming industry.
It would be nice (IMHO) if the software professions could shift the focus of software creation so that more importance was given to eliminating classes of error we have known about for decades and still see arising. Despite all the advances made in programming language design and compiler technology there seems to be a great deal of resistance to moving on from a rather minimalist 1970s approach. (That is more of a general lament rather than a belief that Rust in particular is the way forward - in my active days Ada was the 'new way').
Well, if you look at Rust code it looks pretty C-like. There's C code that would be line-for-line identical in Rust.
There's other C where there are syntactic differences but it converts one-to-one.
Then there's C code where, you know, it's valid to have, say, a float, or a long (is that 32-bit or 64-bit on your compiler?) or a long long (64-bit or 128-bit?) and then go ahead and access it as ints. I imagine there are things like this that don't autoconvert given Rust's stronger guarantees on type safety. I'm sure there's a way to tell Rust "just do it", but that's probably the point where you want to take a look at the code and make sure it'll still do what you intend it to. Or dealing with threads, where Rust has a way to do it safely, but your C code might need some tweaks to follow the safety model (if your code is indeed thread-safe it probably doesn't need big changes, but you might need to give hints on locking or something so it can ensure the code it converts is safe).
I'd expect you could have some pretty significant programs that'd convert straight through; code that uses those "corner cases" that C most certainly allows but are hard to ensure safety on would need those manual adjustments.
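On the type-punning point above, a hedged sketch of how you'd look at a float's bits in Rust without aliasing the same memory as a different type (the values are just examples):

fn main() {
    let x: f32 = 1.5;

    // Explicit, checked conversion instead of reading the float's bytes as an int.
    let bits: u32 = x.to_bits();
    println!("{x} as raw bits: {bits:#010x}"); // 0x3fc00000

    // Or go via bytes, e.g. if the value arrived inside a packet.
    let bytes = x.to_ne_bytes();
    let back = f32::from_ne_bytes(bytes);
    assert_eq!(back, x);
}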
It's not really the superficial syntax differences that would be the issue translating code. The issue would be raw API calls, pointer / buffer abuse, assumptions about nul terminators, code with side effects and so on. I suppose there are certain patterns of bad code a C translator could identify and replace with safe code, but it would radically alter the form in the process. Or it would do something worse than alter the code, which is to leave it alone but slather it in unsafe {} blocks calling C library functions so that it's legal Rust but also Rust with all the benefits and safeguards turned off
So they turned C into not-C so they could then turn the not-C into Rust? So basically getting you to do all the heavy lifting of porting and/or adapting the existing code to the subset that already aligns with Rust to make the process *seem* easier?
PS. Those other quotes about how an existing working solution is apparently the devil and all should migrate to the new One True Path remind me of the equally fanatical laser-eyed Bitcoiners. Yes your solution has a place but will you please shut up preaching about it? It's not exactly perfect either.
The easiest way to tell a Real Programmer from the crowd is by the programming language he (or she) uses.
Real Programmers use FORTRAN.
Quiche Eaters use PASCAL.
Real programmers don't need all these abstract concepts to get their jobs done, they are perfectly happy with a keypunch, a FORTRAN IV compiler, and a beer.
Real Programmers do List Processing in FORTRAN.
Real Programmers do String Manipulation in FORTRAN.
Real Programmers do Accounting (if they do it at all) in FORTRAN.
Real Programmers do Artificial Intelligence programs in FORTRAN.
If you can't do it in FORTRAN, do it in assembly language. If you can't do it in assembly language, it isn't worth doing.
Even Unix might not be as bad on Real Programmers as it once was. The latest release of Unix has the potential of an operating system worthy of any Real Programmer, two different and subtly incompatible user interfaces, an arcane and complicated teletype driver, virtual memory. If you ignore the fact that it's "structured", even 'C' programming can be appreciated by the Real Programmer: after all, there's no type checking, variable names are seven (ten? eight?) characters long, and the added bonus of the Pointer data type is thrown in, like having the best parts of FORTRAN and assembly language in one place. (Not to mention some of the more creative uses for #define.)
http://www.quiche-eater.com/the_real_text.html
Real programmers can change the value of 4. In FORTRAN. Unfortunately, not a joke.
Yes, that was true on at least one implementation of FORTRAN I (or II?), where the hardware didn't support "add immediate". I can't say that c would have been the same, because by the time c was developed, hardware design had changed so much. And hardware was so much larger and faster that slow, inefficient languages like c could compete with Fortran where slow, inefficient programs were suitable for the task.
One thing mid-era c had in common with FORTRAN 20 years earlier: you could use either language to crash unprotected operating systems by using arrays to overwrite system memory.
As an ancient FORTRAN hand (I used versions I, II and IV), I can attest that the problem was that parameters were always passed by address, not by value as in C. So an external FUNCTION or SUBROUTINE could write to what was expected to be a variable. Since constants were pooled (if needed in memory), this could be very bad. Fortran 77 was after my time with FORTRAN, but I expect the problem remained in that version.
I remember that gem from 40+ years ago
Fortran (then, now?) passed all arguments to subroutines by reference. When an integer constant was passed presumably the compiler would assign a temporary initialized to that constant and pass the address of that temporary.
Some Fortran compilers on some systems (IBM?) assigned storage to all the unique integer constants used in a program presumably because of architectural constraints or memory/speed concerns.
When the constant 4 was passed as an argument to a subroutine, the address of the location containing the constant was passed rather than the address of a copy.
If the subroutine modified the formal parameter corresponding to the actual parameter 4, then that constant (formerly known as 4) became 3 (or whatever) for the remainder of the program's execution.
I also recall you could also overwrite Hollerith literals (eg 4HHELLO) for even more fun.
I thought FORTRAN was created for scientific stuff, but COBOL was the lingo for printing out accounting stuff, right?
That's how I remember it, anyway. I did a grok to verify it, seems to concur, language name being derived from "FORmula TRANslation", originally intended for scientific use.
Although the ASK/MANMAN system (business database stuff for HP and VAX minis) is, in fact, written in FORTRAN...
That was COBOL (Common Business Oriented Language), the language meant for pointy-haired bosses and those who like to type "multiply" instead of "*".
FORTRAN (Formula Translation) was for the Boffins. And as a historical note, the predecessor to Doom was Adventure. It was quite the dusty deck!
An awful lot of mathematical code that was written in Fortran in the 1950s and 1960s is still in use today and lots of stuff continues to be written in it. Some clever people worked out quite quickly that there was no point in trying to reimplement these libraries in C, let alone all the languages that have come since then, so they don't. For a while, this meant keeping mainframes running well past their EOL, but virtualisation and other tricks mean that Fortran lives on in areas you'd never expect it.
Safe rust code does not have memory safety bugs, because it can’t.
Some C code does, and when it does, it’s bad. How do you know if your C code is going to have one of the bugs that an attacker can exploit? “Trust me bro”?
They’ve identified the subset of language features that they can auto translate from “maybe it’s safe?” to “for sure, this will be safe (from these really common, really bad class of errors)”, without refactoring.
It seems to me that having the ability to know where in the program are the bits that need refactoring or block this sanity test, should be a good thing.
Converting to safe rust is step one, you’ve missed the second bit: actually compiling it and getting a safe program. You can write safe rust and the compiler will reject it, because it isn’t actually safe… so even if you are not using any of the incompatible features of C, this process would still add safety and validate the code.
So, rather than convert trivial code, this shines a light on complex edge cases - which is where all the problems come from.
Don’t like Rust? Cool
Don’t see the value in Rust? Hubris
Safe rust code does not have memory safety bugs, because it can’t.
So what does it do if I try to write to array entry 101 when the array only has 100 entries?
Prevent the write and continue?
Crash the program?
Whichever it is, in both those cases the result is not good in a production system.
1. So are they expecting programmers to write programs in mini-C, so that those programs can be translated to safe Rust? Why wouldn't we write those programs in safe Rust and just skip the writing-in-C part?
2. By selecting a "safe" subset of C, you remove the advantages of using C in the first place. "With great power comes great responsibility", etc.
3. If people want to use a "safe" language, why aren't they using Ada?
"Why wouldn't we write those programs in safe Rust and just skip the writing-in-C part?"
Because you have an existing code base representing $hours of work and you want to convert it rather than scratch-build it.
But that's the wrong question. The right question is what does Rust bring to the party when you already have code that, as I follow it, is mathematically validated and is written to such a narrow standard that it can go through this with few alterations? You want Rustification to reveal bugs and make it less likely those bugs re-occur in the future. But that means ordinary code, which means pointer-arithmetic and type-punning and all those other "unsafe" things ordinary C will do.
>The right question is what does Rust bring to the party
Simple, you re-write your C code so that it doesn't use any dynamic memory allocation or raw pointers. It can then be converted into RUST, which is memory safe because it doesn't use any dynamic memory allocation or raw pointers.
I think the theory is that you convert to Rust, and then all the new modifications must be done in Rust so it will be more difficult for people to introduce that kind of bug because Rust will block them. Or you could add those checks into a compiler and refuse to merge something unless your C compiler accepted it. Either way, while there are cases where it would be theoretically useful, I doubt there will be all that much adoption. Among other things, I wonder how much readability is lost with conversion from limited C to Rust, because every other conversion I've done has involved some weird additional syntax which my brain takes longer to parse.
"So what actually happens say in a production system?"
You see a compiler error and fix your code. Or you see a compiler error, decide you can't or won't fix it, and label it. That's the point of checking for and not allowing those things. It requires a different language in this case, but it's very common for C compilers to complain about certain things that can be compiled but shouldn't be, and it is also common for a production system to operate on the policy that you don't have compiler warnings on your build unless you can explain why they are not a problem here.
"What happens if the memory access is live, IE an array overrun from a user or database entered value. This cannot be detected at compile time, so what happens in the production system?"
You get a runtime error. When the compiler can detect it, for instance array_of_length_10[100], it won't compile. If you try it by getting 100 into a variable and then using it on the array, you get an error and you can handle it, as opposed to C, where you get undefined behavior based on what is in that memory location and whether you're supposed to have access to it. You are right that you can't prevent that at compile time, but by failing the operation immediately, you prevent classes of bugs and make recovery from the error feasible. For example, it is now possible to abort whatever operation involved that mistaken access and log the error rather than waiting for the process to topple over.
And in many production systems, that's already what happened. If you got an index that might be in your array, it wasn't atypical to have some code that looked like
if (index >= array_len) {
log_error("Requested bad index %i from array with %i elements", index, array_len);
return clean_and_abort(&context);
}
The only difference is that this is made a bit more automatic, with a more immediate and clean failure if you forget to do this.
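For comparison, a minimal sketch of the Rust side of that (values invented for illustration): .get() makes the "might be out of range" case explicit, and plain indexing panics immediately instead of reading whatever happens to be at that address.

fn main() {
    let array = [10, 20, 30];
    let index = 100; // imagine this arrived from a user or a database

    // The rough equivalent of the manual C check above:
    match array.get(index) {
        Some(value) => println!("value = {value}"),
        None => eprintln!("Requested bad index {index} from array with {} elements",
                          array.len()),
    }

    // Plain indexing still exists; array[index] here would panic at runtime
    // with "index out of bounds" rather than corrupting memory.
}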
No, the idea is to demonstrate that, if the programs can be written to run in Mini-C, they can be ported with minimal additional work to Rust and be expected to run safer. Whether you do this will depend a great deal on your codebase and your ability to maintain it: skilled C programmers never did grow on trees and they're even rarer now. You might, for example, decide to write future code in Mini-C to give you options. Or you might just use it for smoke-testing to give you an idea of what you might have to do at some point.
There is an awful lot of C code out there that has no need to be unsafe but no one knows whether it is or not. Rewriting it from scratch in a language like Rust is not an option for most, so some kind of transpiling is bound to be welcome.
"You can't convert all C to Rust, so we made a crippled C that isn't compatible with C or Rust, but which you can then convert to Rust, and it doesn't work for all C".
Yeah, I don't really see the use case here.
How about this:
- If people are using pointer arithmetic, it's for a reason. And Rust won't be able to do a damn thing differently or any more securely about that.
- If you want people to write inherently safe code, you can't write it in C *at any point*. We have any number of languages for that.
- You cannot write safe code that interacts with devices, computer buses or the majority of hardware directly. It's just not possible. The only fix for that? To radically change hardware architectures to return only well-formed descriptive device identifiers with a set protocol for communicating with them (i.e. no memory addresses, no DMA, etc.).
Until we change all the BIOS/UEFI, PCIe etc. specifications to allow arbitrary device detection and communication at bus speed using only, say, JSON or XML or similar languages, you're still going to need to write those bits in C (as they have to make assumptions about what the underlying bits mean and directly access memory), and that means you can't write them memory-safely in any language at all, ever, whatsoever.
I didn't down vote, but the assertion low level code has to be written in C is false. I've written the kind of code described in ASM, Forth and C++. I presume you can write it in "unsafe" Rust. So C is not the only game in town.
And how does JSON or XML get transferred to the host? Is it an "unsafe" DMA or PIO? (Yes, I've written PIO code.) Whereas port based IO just looks like a function that takes or returns a value so that is intrinsically safe.
I'd've also thought a DMA could be made to appear safe - if we trust a kernel function which does the low-level stuff, so the device driver just makes a DMA() call in, presumably, the same way memory-mapped IO is made to appear safe. But I don't know enough about modern IO or Rust to be sure in either case.
The downvote wasn't me either, I don't downvote anyone's comments... you are all entitled to your opinion.
I agree with this bit :
"You cannot write safe code that interacts with devices, computer buses or the majority of hardware directly. It's just not possible."
However, I don't agree with the next part :
"The only fix for that? To radically change hardware architectures to return only well-formed descriptive device identifiers with a set protocol for communicating with them"
Because ultimately if you just push the problem down the stack towards the hardware then you just move the problem. A protocol would need to be implemented in some language, a firmware needs to be implemented in some language, even CPU microcode might at a push be a language of sorts, ok... maybe no longer the OS kernel's problem but still a problem.
I'd probably prefer a situation where a problem is resolvable by changing the software via an update rather than by changing or reprogramming the hardware, and we're already too far down the path of having multiple 'firmwares' which are just bits of software running *outside* of the control of your OS anyway.
Again, not obvious who/why downvoted, unless it is simply a True Believer jerking. The comment that I would baulk at is "If people are using pointer arithmetic, it's for a reason" as there is always a reason, but not always a good reason.
Sometimes it would be done to make code faster/simpler when a device has significant performance constraints, sometimes it is actually needed as you are writing an allocation library or similar, but sometimes just "because" and it looked cool at the time.
I am also not one of the down voters (since I just got here!) but I see some things some people would down vote on:
> If people are using pointer arithmetic, it's for a reason.
Most of the pointer arithmetic I've seen in C is just 'I can write this with a series of *foo++ = some_byte instead of bothering to declare a data structure, so why not?' These could easily be replaced with array syntax or structs (basically any actual data structure beside raw bytes), which would be much more readable, much more maintainable, and then safer in another language that checks array accesses.
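As a hedged illustration of that replacement (names and the field layout are invented): instead of walking a raw pointer with *foo++, write through a slice or a struct and let the bounds checks come for free.

// Fill a small header into a buffer via checked slice indexing instead of
// advancing a raw pointer one byte at a time.
fn fill_header(buf: &mut [u8], version: u8, flags: u8, len: u16) -> usize {
    buf[0] = version;
    buf[1] = flags;
    buf[2..4].copy_from_slice(&len.to_be_bytes());
    4 // bytes written
}

fn main() {
    let mut packet = [0u8; 16];
    let written = fill_header(&mut packet, 1, 0x80, 512);
    println!("wrote {written} bytes: {:?}", &packet[..written]);
}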
> If you want people to write inherently safe code, you can't write it in C *at any point*.
I kind of agree, but we have some angry C people reading the comments section who hate the idea of a new language replacing the only language they know and will reflexively downvote this. And you CAN write provably safe C code, so this is technically wrong, but then you'd have to limit yourself to a feature reduced C (as in TFA) then run it through a prover, which also makes people angry and requires some infrastructure support. Might as well just write it in something else, so the spirit of this line is true.
> you're still going to need to write those bits in C (as they have to make assumptions about what the underlying bits mean and directly access memory), and that means you can't write them memory-safely in any language at all, ever, whatsoever.
You can, and I have, written hardware-level drivers not in C. And you can certainly do it in Rust, though, for example in Linux, if the kernel doesn't actively support it then it's just not worth it. But that was a Linux kernel limitation (before Rust for Linux), not a Rust limitation. Another issue is gcc vs LLVM, but again that is a toolchain issue, not the language.
I don't find any of this worth a downvote, I save those for aggressively BAD comments. And I'm just nitpicking because someone asked about what was possibly nitpicky. But some people are a lot more downvote happy.
I realize we're getting close to having more people who didn't downvote trying to explain them than there were actually downvotes, but I do have a few comments that could explain part of it:
"If people are using pointer arithmetic, it's for a reason. And Rust won't be able to do a damn thing differently or any more securely about that."
This is an unproven assertion. There may be a reason that could be done better or equally well, or it might be required. It all depends on why that is in there semantically. That level of generalization is as wrong as saying that everything in C should just be rewritten in Rust because it all works.
"You cannot write safe code that interacts with devices, computer buses or the majority of hardware directly. It's just not possible.": It depends what the hardware is, but that's not the case for all of it. Plenty of devices can be used without having to share raw memory as they imply.
"Until we change all the BIOS/UEFI, PCIe etc. specifications to allow arbitrary device detection and communication at bus speed using only, say, JSON or XML or similar languages, you're still going to need to write those bits in C (as they have to make assumptions about what the underlying bits mean and directly access memory)"
Not all the time. Direct memory access is necessary for only some hardware, and it's perfectly possible to handle nonstandard protocols in a memory-safe way. It's not as elegant as having everything in a standard form, but the claim suggests that some classes of simplification are impossible when they are not only possible but sometimes have already been deliberately written that way. Even when you do have to do something in a memory-unsafe way, C is not the only option for doing it, though it's often the most convenient.
Yes, one is me.
You say "You cannot write safe code that interacts with devices, computer buses or the majority of hardware directly. It's just not possible."
You've clearly never written IBM mainframe assembler. Nor worked as an IBM mainframe System Programmer.
Writing "safe code that interacts with devices, computer buses or the majority of hardware directly" is part of the requirement to be able do the job.
Here endeth the first lesson.
Even if 1% of a kernel is using unsafe operations such as setting the registers of an A/D converter or a PWM circuit, this is vastly better than 100% unsafe lines of code. The kernel will be vastly more robust against cybernetic threats and previously unknown bugs.
An immediate, clean stop of your system is vastly better than "soldiering on" with "mysterious behaviour".
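A minimal sketch of that "1% unsafe" shape, with an entirely made-up register address: the one unsafe line is wrapped in a safe function, so the rest of the code base never touches the raw pointer.

/// Safe wrapper: the only unsafe line in the module lives here.
fn set_pwm_duty(duty: u32) {
    // Hypothetical memory-mapped PWM duty register, purely for illustration.
    let pwm_duty_reg = 0x4001_2400usize as *mut u32;
    // SAFETY: assumes this address really maps a writable register on the
    // target device and nothing else races on it.
    unsafe { core::ptr::write_volatile(pwm_duty_reg, duty) }
}

fn main() {
    set_pwm_duty(128); // meaningful only on the target hardware; on a hosted
                       // OS this write would simply fault.
}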
If people are using pointer arithmetic, it's for a reason. And Rust won't be able to do a damn thing differently or any more securely about that.
If the reason people are using pointer arithmetic is because the language only provides pointer arithmetic as the means to operate on particular data types, whereas another language provides a more complete abstraction of such data types, then you can do something about it.
I’ve seen this advocated for at fewest 100 times. I care less and less…. ;)
It’s not a rule, it was just proposed by some guy 400 years ago. If I use it, and you understand it as I intended - then the language has done its job.
You can correct people about using the wrong tense, or even the wrong word - but if I’m telling you there is a Sabre tooth tiger behind you, you should probably just act on the information received, rather than fuss over the delivery of it; that is, of course, entirely up to you
Sounds like it could be useful if used for C linting or gradual Rust conversion over time to me. In some legacy C code you sometimes find pointers used where they don't have to be used. Of course all the code for hardware interrupts, DMA and critical regions would be pushed into C libraries and the more application-like code converted to Rust over time. As long as the binary assets are no bigger or slower I can't see why not to use this when one wants to reuse older source code for a new product.
The key is hiding the pointers under the covers. Rust will let you assign a byte array to an existing blob of unknown structure (like network packets or images) and then just use indexes. Which are, yes, pointers under the covers, but the covers are important. And then the Rust compiles to very deterministic machine code like C and is comparable in speed since they're both going to the same assembly language and get optimized the same. The C also has to check whether that offset went off into random memory, it just has to do it explicitly.
And at that point there's no reason Rust isn't as fast as C. Most of Rust's safety overhead is at *compile time*. There is of course some runtime overhead for Rust for dealing with object ownership transfer, but if your code is super tight and time critical then you wouldn't be doing any of the things that trigger that in Rust anyhow, any more than you would be doing free() and malloc() at that point in your C code. You can kind of think of Rust as C++ in this specific case - you wouldn't be using all of C++'s virtual class/method stuff in a super-tight inner loop either, and so C++ can be just as fast as C in this scenario. Same with Rust.
If you really have some crazy-ass scenario where you have to be randomly accessing memory all over addressable space then you can resort to Unsafe Rust. Which... yeah, why bother. But in most tight scenarios (and I do image processing and have been looking at network packet filtering) I have not found lack of explicit pointers in Rust to be a bottleneck.
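To make that concrete, a hedged sketch (the packet layout is invented): fields get pulled out of a received byte buffer through slices and indexes, and a short or malformed buffer produces a clean None instead of a read past the end.

// Parse a type byte and a big-endian length out of a received buffer.
fn parse_packet(buf: &[u8]) -> Option<(u8, u16)> {
    let kind = *buf.get(0)?;                              // empty buffer: None
    let len_bytes: [u8; 2] = buf.get(1..3)?.try_into().ok()?;
    Some((kind, u16::from_be_bytes(len_bytes)))
}

fn main() {
    let wire = [0x02u8, 0x01, 0x00, 0xde, 0xad];
    println!("{:?}", parse_packet(&wire));      // Some((2, 256))
    println!("{:?}", parse_packet(&wire[..2])); // too short: None, not a crash
}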
A common thing in C, Pascal, Forth, etc. is to allocate a chunk of memory which will hold a structure, fill in the fields in that structure, then update pointers to add/insert that structure into a linked list.
What does the "pointerless" version of that look like and work like?
Making it act as a dynamic-length array (which we'll need to be able to query its length at runtime) is gonna suck when we have to insert or delete items if the list is "long". An insert would be: sequentially search the array for the insert-before point (if we don't already have an index to that point), remember the insert point, go to the end of the array, expand the array by one element (which is a bunch of bytes in a structure), go to the new end of the array, walk back through the array, copying each element from A[n-1] to A[n], until n "points at" the insertion-point element, then overwrite the insertion-point element a/k/a A[n], with appropriate new values.
That's slowwwww as molasses in January. It might be fast-enough for some applications, with "small" values of n, but not for the more-general cases.
I seriously recommend you just have a play with Rust if you're interested enough to slander it. It'll be fun, you're not obligated to continue using it once you realise it has pointers, linked lists (if you truly want them), hash tables, some quite nice trees, and that they're all easy to swap in and out to trial their performance, and you'll be better informed to point out its true weaknesses in future.
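In that spirit, a small sketch of the swapping-in-and-out point (nothing clever, just the standard library): the growable array does the element-shuffling described a couple of posts up in one bounds-checked call, and a doubly linked list is there if you genuinely want one.

use std::collections::LinkedList;

fn main() {
    // Vec::insert shifts the tail along - the O(n) cost described above -
    // but it's a single checked call rather than hand-rolled copying.
    let mut v = vec![1, 2, 4, 5];
    v.insert(2, 3);
    assert_eq!(v, [1, 2, 3, 4, 5]);

    // A doubly linked list, no raw pointer surgery required.
    let mut list: LinkedList<i32> = LinkedList::new();
    list.push_back(1);
    list.push_back(3);
    list.push_front(0);
    println!("{list:?}"); // [0, 1, 3]
}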
Any serious language will provide typesafe references and the allocation of some sort of object/struct where the references point to.
The references will be either NULL or valid, no funny C-style tristate.
No need for raw pointer operations and crazy casting.
The only exception to this is a tiny fraction of kernel code (say 1%), which indeed needs funny, unsafe casting of e.g. "array of process" to "array of byte". This can be done in unsafe Pascal, Ada, or Rust. Maybe a few lines of assembly will be needed, too.
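A minimal sketch of the "either valid or explicitly nothing" point, using the obvious linked-list node: the possibly-absent case lives in the type, and the compiler insists it gets handled.

struct Node {
    value: i32,
    next: Option<Box<Node>>, // a valid node, or explicitly nothing; never dangling
}

fn sum(list: &Option<Box<Node>>) -> i32 {
    match list {
        Some(node) => node.value + sum(&node.next),
        None => 0, // forgetting this arm is a compile error, not a null deref
    }
}

fn main() {
    let list = Some(Box::new(Node {
        value: 1,
        next: Some(Box::new(Node { value: 2, next: None })),
    }));
    assert_eq!(sum(&list), 3);
}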
I appreciate the effort for Mini-C.
But "safe" subsets of C exist for decades, like the MISRA C standard that tells you what (not) to use.
That makes Mini-C "yet another subset" in my opinion.
It is good to apply Mini-C to changes, but before the full code tree can be "recompiled", there is a huge effort ahead.
Writing this, I wonder to what extent the "memory safety" issue is resolved by applying Mini-C by itself to existing code, so without moving to Rust?
Well, F* my OCaml! Hard to beat such tasty French KaRaMeL (formerly known as KReMLin) to satisfy one's modulo taste bud theories! But let hardware exercise those fat pointers I say, that's where (concurrent) bounds checking on your code's data access waistline belongs, so it fits through doors, buses, and pants, treasure caches of data and instructions, all without bloat, discomfort, or overflow! Get me a Low* FPGA CPU to F*-run mini-C on and I'm sold (out!)! Word ...
But also, bless Inria for its F*, Coq, and Rocq Hard Prover (and Tom for this inspirational TF*A)! ;)
I've commented before that AI can make a huge difference in porting C and C++ code to Rust. A lot of it is grunt-work and the icing on the cake will always be the domain of experts, no matter how smart AI becomes (unless it becomes as smart as humans, of course).
The same holds for porting COBOL code to Java or C#. It's mostly manual labor, but even when done in low-wage countries it's still too expensive to do.
You probably could build an AI* that could do that conversion, but you have two challenges that have prevented it before and will again:
1. This has been around for a long time: readability and maintainability. I have a program here which was originally written in assembly, I think Motorola 68K assembly. That wasn't very useful, and it was not attached to a lot of interfaces, so it wasn't hard to translate to C. So now, I don't need to try to virtualize something to make it run. However, I still don't really understand what's in here and can't modify it. I have C code that produces the same results that the original one did, but modifications means drilling down to understand what every part was for, and that is tricky because I had only machine code to start with. Every time automatic translations happen, readability tends to get lost. Even if the translation is perfect, it can be difficult to impossible to make modifications to it later.
2. Accuracy. This is a bigger problem with modern AIs, but it's always been a challenge. Most software does not have complete, mathematically proven test cases, where if the tests all pass then we're absolutely certain that nothing is wrong in this code. Mostly, we have basic test cases, where if any of them fail then something is very wrong with the program. Often, we don't even have too many of those. This means that when you're translating code, you can't know for sure whether it's even accurate to what the original program would have done unless you run them in parallel forever and set off an alarm if they ever disagree. Modern AI is very likely to add bugs when it translates. Admittedly, most of those are likely to be obvious, ranging from its output simply not compiling to the program crashing obviously, but when it looks like neither of those has happened, that does not prove that the attempt was successful.
* AI: depending on what you intend this to mean. We have explicit rules-based programming language translators around, but they generally don't qualify. I assume you mean a coding-focused LLM, which often makes mistakes in its output, especially when the job is big. Translating a large program, and it would have to be large or you could let a human translate it, with insufficient documentation is very likely to be too large a task for such models to complete correctly.
Big upvote on point 1. I've seen a number of code translation schemes: COBOL->C, BASIC->C, 4GL->Java ... conceived as a fix for out-of-date hardware/toolchains or a lack of programming resources. The problem then is that you now have a system that cannot be maintained. If you make a change to the original code then you have to retranslate into the new code - and that requires a programmer fluent in the old code, saving nothing. Otherwise you have to completely abandon the old code and only maintain the new code. If you have seen the results of any of these language translations, they are hideously contorted and very difficult to make all but the very simplest of modifications to. It is a mess and I've never seen a single such system end in anything but failure, or a completely static code base that can never be changed.
"As the Internet Security Research Group's (ISRG) Prossimo Project puts it: "Using C and C++ is bad for society, bad for your reputation, and it's bad for your customers."
This is so vile and misleading... the world runs on top of C and C++. The technology based on those programming languages made (and still makes) the world a better place. And no, they are not bad for my reputation, quite the opposite. I am also fluent in many "memory safe" languages like Java, C#, Rust, Python, and JavaScript, so believe me when I tell you that the ISRG's sentence is garbage.
Thing is, whilst the world does indeed run on top of C (and C++ to some extent), there's been an awful lot of CVEs attributed to careless use of those languages in writing bits of operating system, system service, etc.
Most such faults have occurred because, whilst there's plenty of knowledge as to what one is supposed to do when writing code in such languages - e.g. validating inputs before processing them - those things have not been done by the developer and their omission has not been spotted in a review. For example, the HeartBleed bug was entirely due to inputs not being validated on an interface specified by an English-language RFC and implemented badly in hand-written C code. This was particularly poor because we've had the tools and technology to specify interfaces in schemas with input validation defined, and automatically implement them in a language of your choice, for decades. ASN.1 has been around forever; XML and JSON are newer ideas with the same capabilities.
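(As a hedged aside, serde is one contemporary example of that "declare the interface, generate the validation" approach; the field names here are invented, with a nod to the bug mentioned above, and it needs the serde and serde_json crates.)

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Heartbeat {
    payload: String,
    payload_length: u16,
}

fn main() {
    // The parser is generated from the declaration; malformed input is
    // rejected up front instead of being trusted.
    let ok: Result<Heartbeat, _> =
        serde_json::from_str(r#"{"payload":"hi","payload_length":2}"#);
    let bad: Result<Heartbeat, _> =
        serde_json::from_str(r#"{"payload":"hi"}"#);
    println!("{ok:?}");  // Ok(Heartbeat { .. })
    println!("{bad:?}"); // Err(..): missing field `payload_length`
}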
Rust is proving capable largely because the amount of review effort required is significantly reduced, especially for anything slightly complex or multithreaded. A developer presenting a pile of Rust for review that is devoid of the unsafe keyword that compiles and runs is already a long way ahead in the review process. By comparison, C/C++ that compiles and even runs has to be treated with the utmost suspicion by anyone reviewing it or relying on it fresh out the box, even if it does lint cleanly.
The amount of time spent by the two developers getting to compiling and running code may not be so very different. However, as it's clear that it's everyone else's time thereafter that matters most, Rust wins on that basis. To argue it more bluntly, picking C/C++ over Rust for a new project today, if Rust is a genuine alternative, is deliberately making the job of review and delivery harder and slower, which is not nice for the reviewers or the people waiting for the end result. Of course, there are lots of valid reasons why Rust may still not be a genuine alternative for a new project (e.g. availability of developers), but one really should test one's assumptions these days. If Rust does explode in popularity, one could find one's fresh, shiny C project devoid of developers in only a few years' time.
And this is showing up in projects.
There's the Redox OS, written in Rust. Ok, that's not production ready. However, one has to look at the size of the development team and their productivity. They went from nothing to a running and fairly complete OS with a graphical desktop in an astonishingly small amount of time (3 years). This was far more code written and got running than - say - Linux required, because Linux was merely a kernel on top of which the GNU user land was deployed with X-11 (these two already existed and had taken years to get together).
Some guy at Mozilla has reimplemented the GNU core-utils (ls, echo, cat, etc) for Linux in Rust. Along the way a number of latent bugs were found, and the end result runs faster. For a "mere" re-write to be finding bugs in code as old and as depended upon as this, and to find ways of speeding it up (look up Rust's "Fearless Parallelism"), is somewhat surprising. But, there you go. I think quite a lot of distros now make this available as an alternative package.
The parallelism (concurrency) is key, for my money. Rust takes multithreading from something that's a major headache you only do if absolutely necessary, to something that's breezy, and comes without the overhead of the garbage collected languages that tend not to scale so well (*). Considering where the hardware is going - has already gone - it's going to become unsustainable for pros to go through their careers writing mainly single-threaded code, and the race conditions are likely to become a bigger slice of the CVE pie as other issues are dealt with.
Ironically, or maybe not, parallelism is what makes a language like Rust possible with it's heavy static analysis without unconscionably long compile times. In the same way GPUs up-ended the status quo in neural net development, the widespread availability of many-core desktop CPUs is shaking up software development.
I have enormous respect for C and its influence on modern languages, but honestly the reluctance some people have to acknowledge its shortcomings is a wee bit bizarre. The world moves on, computer hardware is unrecognisable from where it was 50 years ago, and really it's a miracle C has lasted as long as it has. It will still have a niche for a long time to come, and that's pretty cool, but it's not going to suddenly gather new generations of devotees who are sick of sleek modern languages. (C++ is a bit more interesting, because you can maybe see it benefiting from the competition, but its days of being the nearly unchallenged choice for high-performance code are over.)
(*) That said, I hear very good things about Go's multithreading too, depending on the context. (Still a dirty bloated GC language though, obviously, ugh.)
Agreed.
There is an uncanny relationship between Rust and Go (well, Go's CSP) that I think most have not spotted. Rust's fearless parallelism is based on its compile-time knowledge of object ownership. By knowing what is owned by who and when, it can deduce when it's safe to pass off some functionality to another thread.
The relationship comes from how that's done. I imagine that at present the Rust compiler simply starts a thread and shares the data address with it (safe in the knowledge that the main thread isn't going to access it). However, that is not the only possible implementation. It could just as easily serialise the data, pass it to another thread as a copy via (for example) a pipe, and accept the result in turn in the same fashion.
Thing is, if I rename the "pipe" to "channel", as such things are termed in Go, then you have a multiprocessing architecture implemented much as a Go programmer would have implemented it. Except the Rust compiler would have done it for itself.
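For the avoidance of doubt, the "channel" half of that already exists in today's Rust standard library; a minimal sketch of handing owned data to a worker thread over std::sync::mpsc, which is roughly the Go-ish shape described above:

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Ownership of each Vec moves into the channel; the sender can no longer
    // touch it, which is what makes the hand-off free of data races.
    let worker = thread::spawn(move || {
        let mut total = 0;
        for chunk in rx {
            total += chunk.iter().sum::<i32>();
        }
        total
    });

    for chunk in [vec![1, 2, 3], vec![4, 5], vec![6]] {
        tx.send(chunk).unwrap();
    }
    drop(tx); // closing the channel lets the worker's loop end

    println!("total = {}", worker.join().unwrap()); // total = 21
}

Whether the compiler could ever pick this shape for you automatically, as speculated above, is another matter.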
The implication of that becomes very profound when one considers what that might allow hardware to become. At present, the Rust compiler is assuming that the code it generates runs on an SMP hardware environment - one global memory equally accessible to all CPU cores. However, if it could also serialise and copy data down a "channel" instead, what happens if that "channel" is a hardware inter-core link? The cores wouldn't have to be able to access all the memory in the system, just their own; they'd be exchanging data through these links, not via shared memory.
The profound bit is that to the developer, they're still writing what looks ostensibly like single threaded code, but Rust would be auto-parallelising it. And the machine itself would not be having to implement an SMP environment. And at the moment, it's having to implement an SMP environment for C that is causing all the problems with cache and CPU security such as Meltdown and Spectre (and derivatives), so getting rid of that problem wholesale would be a good move.
The implication of that is that one could write software in Rust, and build it either for an SMP environment (such as it runs on today), or build it for purely NUMA hardware. So if our hardware platforms had to transition from SMP to NUMA to continue to improve speeds and/or to lose problems like Meltdown for good, Rust has the potential for making that transition painless (or less painful); the source code would not have to change.
I know there'd be a lot of minutiae to sort out, but a surprising number of these have already been done. For example, what I've outlined implies that there's a large number of hardware cores, one for every thread started by every process running on the machine. Modern hardware has a lot of cores, but not that many. However, the old Inmos Transputer had (in effect) hardware virtualisation of cores; multiple "threads" could run on a single Transputer, the hardware scheduled between them however many there were, and each thread behaved as if it had sole access to the core's resources such as inter-processor links (channels); multiple threads' traffic was multiplexed over the hardware links.
The Future
So far as this old coding dog is concerned, that's where the future lies. Rust is accidentally a pivotal language in the future of computing and computer science.
But as this old coding dog knows all too well from years in the business, the software development industry seems remarkably resistant to new ideas, better ways of doing things, tools that eliminate work, and remarkably dismissive of things that help improve rigour for no-effort.
This makes it very off putting for hardware manufacturers to try something new, because they know it's incredibly unlikely that they'd bring the software industry with them. Just look at Itanium's fate...
We shall see. It's interesting that companies like MS seem willing to put effort into adopting Rust for their OS. This kind of organisation has got the money and the power to re-write its most sacred software, and is also big enough to go play in the hardware space. If companies like Microsoft or Apple worked out the future like this, it could happen. One way they could do it would be to continue to support SMP code such as C, but only as a software emulation. Whereas Rust (or even Go) built and ran a whole lot faster being native for the hardware.
Some might suggest that this smacks of a 1980s coding dog calling and wanting their beloved 1980s hardware back. But my response is, if the ideas are so antique and useless, why do modern generations keep reinventing it and appreciating it? For CSP in particular, I've seen that be invented, die, re-invented and die, and finally re-invented yet again. Maybe Rust is the thing that finally makes it stick both in software and in hardware by hiding it from the developer!
I do like Rust but as someone who writes safety critical code for a living I struggle with the whole crates.io ecosystem.
I started to dabble with something this week, a basic web service using one crate (Rocket). It then downloaded another 112 crates during first compilation. Suddenly I have absolutely no idea what code is in my application.
Maybe I should start playing with limited/crateless Rust as I do think it has enough traction now to be worth some investment to learn.
> I do like Rust but as someone who writes safety critical code for a living I struggle with the whole crates.io ecosystem.
Welcome to the dependency hell reloaded... nowadays all programming languages with a dependency manager/repository (*) suffer from that problem. C++ with vcpkg (the package manager I use/know) is still manageable because it works differently (Microsoft curates the official supported packages, and you stick to a global vcpkg version, which groups a series of packages instead of individual package versions).
(*) Especially JavaScript/TypeScript, where a lot of micro-packages (e.g. check this https://www.npmjs.com/package/isarray, 50+ millions downloads/week, 1 function with 1 line of code!) exist due to the silly adoption of the Unix philosophy "do one thing well" for library writing (also there are a huge amount of lazy programmers that introduce micro-dependencies instead of writing them by themselves).
Call me weird but I like writing code using pointers. There's a certain challenge you get with writing C code in avoiding those segfaults, buffer overflows etc. Always validate your inputs and test your limits, iteratively speaking.
Having said that I would consider starting a new project in Rust, but convert older stable C code? Probably not except for security front ends.
Like Google says C is going to be around for a long time. Time will tell if Rust is the one.
> There's a certain challenge you get with writing C code in avoiding those segfaults, buffer overflows etc.
Not really much of a challenge, I'd say. The actual challenge is knowing the language well enough to use the features you need. In that regard, like every other language...
So Rust gets us to memory safety!
(1) The Agile crew have eliminated "requirements". Does Rust help with testing something with no requirements?
(2) The Agile crew have got us to inflict functional changes on users on a regular schedule. Does Rust help these confused users?
(3) And then there's "technical debt". How does Rust help with this?
To get to the point....there are serious problems with application development.....Rust (probably) fixes some fairly minor low level problems.
Rust was designed to fix a couple of classes of problems, not everything in software development. No language can fix or even slightly help with problems 1 or 2. Technical debt is a very broad topic, but if you have memory safety issues and count them as a tech-debt thing, then maybe it would help you. These problems are almost entirely unrelated to one another.
C++ is not an extension of C; sadly it has become a jack of all trades trying to include everything from every language ever created. Yes you can screw up memory management in C, you can even use pointers badly and trample things. However it is a very efficient language, and when I started, my 6 MHz computer would start in under a minute to something I could actually use; my current umpteen-GHz laptop running Linux takes 5 minutes, Windows is half a year. If you can't do your job properly and deal with pointers and memory then go and find a job shovelling shit, which is where you should be. There are a myriad of static analysis and related tools to help if you occasionally make mistakes.
This is pretty cool - any components (functions, ..) in your program that use *only* this subset of C can be _proven_ memory safe! (It doesn't need to be _compiled_ to rust.) You can put this converter into a static analyzer and get the same benefit. There are probably many, many many aspects of programs that are like this _already_ without refactoring a codebase so that it fully compiles to rust.
As an extra added plus, a static analyzer can do more -- when there *are* type casts, the static analyzer can analyze the before- and the after-type usage of the variable and see if both are memory-safe, as much as reasonably possible. Without even refactoring!
This is a great win for static analysis, if implemented that way -- otherwise, maybe it's just another instance of, "Oohh, shiny..!"
Maybe existing static analyzers already do it. I feel like they're probably not used frequently enough.
I just realized I could actually read the article in question without registering for anything other than registering my existence and information. That is all I've ever wanted to do, as well as to cite links to the articles I read. Can't have a good argument without citations now, can we?
Despite what John Cleese insists.
I think it is important to note that their inputs were verified C code bases. That likely means they were stricter and more consistent about their use and abuse of data structures, following known patterns of behaviour instead of getting too creative (like the C++ internals used to in the preprocessor days.) I'm not sure how relevant their research is going to prove in practice to common code base standards out there.
I don't understand this obsession with memory safety. The rule should be that if you can't design and/or housekeep your code well enough to prevent inappropriate accesses then you shouldn't be writing in 'C' in the first place. The fact that people have written such code in the past and likely continue to do so is nothing to do with the language -- and certainly nothing to do with it being x years old.
From the earliest days of computing people have been trying to develop "programmer proof" languages, ones where common programming and logic errors are just not possible. That's a great idea and should be encouraged. But there's also a need for low-level programming, which is what a systems language like C is used for (which is -- tada!! -- why operating systems are written in it). There is a case for system-like languages to simplify writing higher-level OS code, the sort of thing that might once have been scripted, and this is where a language like RUST should prove useful. But blindly stating that "such and such, being old, has to be replaced" and then using vague terms like "memory safety" just underscores how many people don't really understand how operating systems are structured.
Why do Unix bigots always think the whole world revolves around C, and it's the only language that exists? Why?
Not all operating systems are written in C!
Most of the world's financial services industry runs on operating systems written in IBM assembler or PL/S.
ICL, Unisys/Burroughs and Moscow Precision had them at about the same time Unix was conceived.
They have/had Algol-based kernels, Algol-based application software. All memory safe to the degree possible.
Much better approach than the wild west of C: "one kernel exploit and it's game over".
Also see JavaOS and Singularity OS, conceptually similar.
https://de.m.wikipedia.org/wiki/JavaOS
https://en.m.wikipedia.org/wiki/Singularity_%28operating_system%29
The OBERON OS is also written in its own memory safe language.
"I don't understand this obsession with memory safety. The rule should be that if you can't design and/or housekeep your code well enough to prevent inappropriate accesses then you shouldn't be writing in 'C' in the first place. The fact that people have written such code in the past and likely continue to do so is nothing to do with the language -- and certainly nothing to do with it being x years old."
You obviously have not worked in professional development in C.
Not sure about now, but a couple of decades ago, people with zero background in programming (scientists, other profiles), many of whom would not care about reading the ANSI C bible, would program in C, and some of those tools are still widely used today.
People would very often use nonsensical structures, pointer de-allocations, etc. ... that would trigger severe compiler warnings but would still manage to produce working executables, often by luck (the input is as usual, and not too long, etc.).
Very likely, your Windows 11 still contains a lot of code developed by such people. Those are the bugs exploited by hackers today. Like Windows fonts being abused to gain privileges.
In the light of this, this Rust and also Mini-C-to-Rust initiative makes a lot of sense. What about the whole of Win11 being converted to Mini-C and then to Rust? I think we'd see a MASSIVE fall in exploits here!
The Rust compiler can't handle crappy C code without spitting out crappy Rust code, so they created a safe C that slaps the programmer for lazy memory management, or it won't compile, or something, so it can be promptly converted into safe Rust.
So it has nothing to do with Rust. Rust doesn't help any C programmer get rid of bad code habits at all. There could be any language on the other end, like the mentioned Java, but lazy C code can't be made safe by the compiler on its own, regardless.
Mr. Babbage would be proud of his answer to this day!
"Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?"
So the creators of this amazing idea are described as "Computer scientists" and "Boffins." But I fear the truth is more worrisome ... they might be academics. I find this worrying because my definition of an academic (which I have always found to be true in practice) is: "A person who invents a special kind of hammer, and then spends the rest of their life looking for a special kind of nail to hit with it." And I think that pretty-much sums-up what's going on here.
"While C and C++ code can be made more memory safe through diligence, static analysis, and testing,"
If the goal is Memory safety ...
C and C++ can be made more memory safe through the use of validated memory-safe libraries, and the conscious choice of algorithms, including avoiding dynamic memory allocation, if necessary.