Good luck
See also the gutting of librsvg.
https://medium.com/codex/i-spent-18-months-rebuilding-my-algorithmic-trading-in-rust-im-filled-with-regret-d300dcc147e0
Rust changes worlds. The iron ore we mine to feed the industrial age started out as iron atoms dissolved in oceans two billion years ago. Then photosynthesis happened, pouring out oxygen that rusted that iron out of the water into the solid minerals we've found so useful today. Much the same is happening with Rust the …
I've seen probably 10:1 more posts from developers seeing a positive change after rewriting their codebase in Rust. I keep seeing this one specific story floating around saying the contrary, and not many others. There are others, just far fewer than the positive ones I've seen.
Depends, are they using five different fuzzers/linters? That's essentially what Rust's lifetime system and functional-first design eliminate. When people say "you can make C/C++ safe, you just have to use all these extra tools and rewrite half of your code to use their annotations", my response is "that's literally just Rust".
It literally isn't, because with C/C++ backward compatibility allows for small parts of code to be refactored at times which suit the project instead of a big bang rewrite.
>How much would code be improved if it was just re-written without changing the language?
Why rewrite at all? Or decompile-translate-recompile? Teach your LLM machine code, and let it optimize and correct the binaries directly.
Even further; add the model to any compiler you like, and just write typical bad code and hit the '-O AI' option to make everything good.
As a long-time C and C++ developer (who is disappointed with C's "do it yourself" simplicity, and with C++'s present-day monstrosity) I can say that the Medium post sums up my frustration with Rust: awful syntax and semantics (nothing to envy C++ for), bad error handling, and a stupid fanboy community. I am still not convinced that Rust is better (or way better) than C++ (at least in my professional hands, which haven't produced a segmentation fault/access violation since the last century).
>at least in my professional hands that haven't produced a segmentation fault/access violation since the last century
This is the typical mistake of a novice Rust developer: assuming the lifetime system is "just" about preventing access violations (aka compile-time garbage collection). It's really not; the ownership system is an entirely new programming paradigm (well, not "new" - Rust self-admittedly didn't invent it - but it's the first mainstream implementation). I remember my time using several other languages before ever trying Rust: I had implemented my own ad-hoc runtime lifetime system, just because I usually wrote complex programs and felt the innate desire to control exactly where every single resource was held and how long it lived. Rust has that built in at compile time. I don't have to worry or wonder if I forgot that I gave some other subsystem a reference to something; a Rust program has this very natural "waterfall" architecture thanks to everything being required to be responsible for a resource, kind of like an actor model. Rust is far more akin to OCaml than to C/C++; it just has C syntax and compiles to native code the way C does.
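For anyone who hasn't touched it, a minimal sketch of that "everything is responsible for a resource" model (the struct and function names here are purely illustrative):

```rust
// Each value has exactly one owner; passing it by value *moves* ownership.
struct Connection {
    id: u32,
}

fn subsystem(conn: Connection) {
    println!("subsystem now owns connection {}", conn.id);
} // `conn` is dropped here, deterministically - no GC, no manual free

fn main() {
    let conn = Connection { id: 1 };
    subsystem(conn); // ownership flows "downhill" into the subsystem
    // println!("{}", conn.id); // compile error: use of moved value
}
```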
Besides that, the biggest thing Rust has, to me, is package management. Not groundbreaking by any means, but C/C++ are nowhere near as easy to add dependencies to as Rust. Rust's systems-level compilation combined with its package management means that you can pretty much make an OS out of a bunch of crates held together with a bit of glue logic; it's like the Perl of systems programming.
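For instance, after a one-liner `cargo add rand` (a real crate, picked arbitrarily), using the dependency is immediate:

```rust
// src/main.rs - no include paths, no linker flags, no vendoring.
fn main() {
    let n: u8 = rand::random(); // provided by the `rand` crate
    println!("random byte: {n}");
}
```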
Also, I would absolutely not criticize Rust's fanbase when invoking C++ in the same sentence. Rust's fanbase is rabid, C++'s fanbase is psychotic. I'm surprised how many people now suddenly forget how volatile the C++ StackOverflow is.
That's literally the root of all evil for C and C++
In your hands, they do fine (for a decent value of "fine", as you have complaints too).
And "give full control to the developer", the much vaunted philosophy of the languages, works fine *for excellent developers*. On their good days, anyway.
But I got out of uni 22 years ago, and I have yet to meet people who are excellent developers a significant number of days of the month, whether I look in the mirror or not. Perhaps I need to keep better company, but the truth is that there aren't enough good developers, and therefore the tooling needs to avoid the dangers.
You do wonder a bit about this post. The author spent 18 months coding in Rust. Their second big complaint is that there are no stack traces when errors happen. They also complain that the community is crabby and bitchy.
But in the comments it is clear that they acknowledge that there were no stack traces because they weren't switched on, and the crabby community was reddit; the author never looked anywhere else for somewhere nicer.
The other complaint is that the syntax is verbose and the semantics unintuitive in places; well, 1 out of 3 is not bad. Making Rust less verbose and more intuitive is, and has been, a thrust of Rust language development for the last few years.
> Making Rust less verbose and more intuitive is, and has been, a thrust of Rust language development for the last few years.
That has been the goal of many languages, e.g. Pascal w.r.t. Algol.
Although C did deliberately buck the trend and made a virtue out of being a terse and hard to read language.
However, as anyone who has studied (computer) languages and compiler techniques at uni knows (i.e. knows their Aho & Ullman), the more powerful and useful languages have the more complex grammars etc., which automatically makes them less intuitive. Remember that Pascal required a one-pass compiler, whereas Ada needed a four-pass compiler.
As someone who had learnt Algol 68 et al. and assembler, C was relatively straightforward - K&R was a weekend read, and the language reference and semantics addenda helped, as they provided the framework on which to hang the specific symbols/words/characters needed to write a C program; Rust currently has little of this.
>a machine for turning closed source into open source. Big Tech will not like that.
I don't think they care. Proprietary software hasn't been a viable business model in a decade. If a business wants something closed off, now it's a service. You use a client (usually even an open source one!) that talks to the service, but you can never get the service's source code; you don't even have a binary to decompile. This is why MS moved their primary business model from Windows to Azure (and why Windows is rapidly going down the drain now). Not to mention Big Tech has been using Java for decades, and that's exceedingly trivial to decompile, and they never cared about that - primarily because they had legal protection around it, whereas the legality of LLM decompilation isn't known yet. But either way, like I said, they're not gonna care. At best, you might get one or two companies suing over general copyright (look & feel, etc.) but nothing major.
I am actively using an automotive R+D software package that costs about 20k Euros per seat (Vector CANoe). It's dongled via Vector's CAN Adapter.
Similar things can be said about the Lauterbach Trace32 debugger, the WinDriver (WindRiver?) compiler and many other automotive SW engineering tools.
Vector seems to be thriving, when I look at their buildings and when I visit their canteen.
I was more or less talking about common consumer software. Expensive software made specifically for corporations was never concerned with decompilation or open source, because you're paying for the support, not the product.
Actually, it's funny you mention automotive, because that specifically is required to pass government regulation. You're explicitly not getting that from open source unless you know a few rich philanthropists who feel like throwing around some pocket change for fun.
IF this automatic decompilation method works, the proprietary-minded, thou-shalt-not-disassemble-so-sayeth-the-EULA tech bros will create a compiler post-processor or a compiler mod which salts the object file with random, "harmless" instructions, thus destroying the patterns needed by LLMs to successfully work.
Ah, the days of remapping INT 21h to INT 3h, so you could use the 1 byte CCh opcode instead of the 2 bytes CDh 21h. The memory saving was nice. But the real aim was to bugger up everybody trying to live trace it. (Because INT 3h is the debugger break point and, as the op code is shorter, it's hard to remap back.)
Also remember to remap INT 1h (single step) to something tasty, but that's more of an annoyance.
> LLM-driven C-to-Rust.
Would it not be better served by the LLM "making safe" the AST before directly emitting the binary? The step of transpiling to written Rust code first is pointless if the LLM has already confirmed and corrected the code.
It's like how C++ used to transpile to C first (the Cfront compiler). However, it was later deemed better to bypass that stage and go straight to the IL/GIMPLE/binary, etc. The machine-generated C wasn't that useful.
It's not like anyone is going to actually work with the machine-generated Rust anyway. People will barely write Rust as it is.
Has anyone done LLM-based C-to-Rust transpilation yet?
I am at this point also sceptical, because ChatGPT failed miserably at creating a correctly working Enigma for me. It created code that compiled and looked correct at first sight. But on closer inspection it missed the odometer-style rotor mechanics and it incorrectly performed the "reverse signal flow".
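For reference, the stepping it fumbled is roughly this (a deliberately simplified sketch of odometer-style carry; the real Enigma also has the infamous double-stepping anomaly, which I'm ignoring here, and the notch positions below are made up):

```rust
// Rotors step right-to-left, odometer style: a rotor advances its
// left neighbour only when it steps *from* its notch position.
fn step_rotors(positions: &mut [u8; 3], notches: [u8; 3]) {
    let mut carry = true; // the rightmost rotor steps on every key press
    for i in (0..3).rev() {
        if !carry {
            break;
        }
        carry = positions[i] == notches[i]; // at the notch -> carry left
        positions[i] = (positions[i] + 1) % 26;
    }
}

fn main() {
    let mut pos = [0u8, 0, 25];
    step_rotors(&mut pos, [16, 4, 25]);
    println!("{:?}", pos); // rightmost wrapped and carried: [0, 1, 0]
}
```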
Add: found this: https://galois.com/blog/2023/09/using-gpt-4-to-assist-in-c-to-rust-translation/
No - that's why they think it's a thing worth investigating. The general-purpose ChatGPT-style models aren't great at it - but one trained explicitly to do this *might* be much better. It might not; but it's worth trying - despite what the masses of old developers seem to think (and I speak as an old developer).
There doesn’t need to be. C2Rust already does translation of actual .c files to Rust, and rust-bindgen translates header files to Rust (i.e. for calling the functions declared in that header file from Rust).
I have no idea why people would want to use an LLM for this when people have been doing it fine without one. Neither language is all that simple when you get into the details and I would be very surprised if an LLM could reliably turn one into the other.
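(For context, the header-file side is mechanical: what rust-bindgen emits is FFI glue roughly like this - a hand-written approximation for a hypothetical `int clamp(int v, int lo, int hi);` declaration, not actual tool output.)

```rust
// The generated binding boils down to an unsafe extern declaration;
// actually calling it requires linking the compiled C object.
extern "C" {
    pub fn clamp(v: i32, lo: i32, hi: i32) -> i32;
}

fn main() {
    // Not calling `clamp` here: without the C object linked in,
    // a call would fail at link time.
    println!("binding declared");
}
```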
I am not familiar with C2Rust so might be wide of the mark here, but ...
An objective of this approach (noted in the previous Reg article on the topic) is to produce idiomatic code rather than a transliteration. C code written in Rust will not be particularly maintainable, as it will preserve idioms that are relevant to C (but not Rust) and not follow the idioms that would be the normal approach for code written in Rust.
Consider data checks that are written explicitly in the C code but are unnecessary in the Rust code (because they are implicit in the language). It would be a mess to simply copy all the explicit checking code into the Rust version.
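To make that concrete (a hypothetical fragment, not output from any real tool): a literal transliteration drags C's separate length parameter and manual loop along, while the idiomatic version lets the slice carry its own bounds:

```rust
// Transliterated from C: length passed separately, index checked by hand.
fn sum_transliterated(data: &[i32], len: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < len {
        total += data[i]; // Rust bounds-checks this again anyway
        i += 1;
    }
    total
}

// Idiomatic Rust: the slice knows its own length.
fn sum_idiomatic(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let v = [1, 2, 3];
    assert_eq!(sum_transliterated(&v, v.len()), sum_idiomatic(&v));
    println!("both agree");
}
```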
Right,...
The compiled Rust source is put into the decompile-and-Rustify system. If the generated source is not exactly equal to the original source, and the resulting binary therefore not exactly equal to the original binary, then the whole system is programmed to produce shit on steroids blended with acid.
I wonder what the program will do when you translate it from a C source compiled binary into Rust into binary into Lisp into binary into Javascript into binary into Python into binary into C into binary and then finally back into Rust...
He did this transpilation (using classic compiler technology) for customers. They then complained that their "perfectly working Fortran programs" experienced index errors.
In other words, people (including engineers and scientists) are lazy folks (like everybody else) who do not want to hear about the bugs in their "proven" systems.
https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/
Last time I used Sun Solaris make (2012 or so), it crashed with a memory error when the Makefile became "too big". Other Unix tools have been reported to be full of memory errors when they were first run under Valgrind. So everybody is affected.
"That was OpenSSL, IIRC."
It was. But the point here is that unless you know why something is done in some particular way you can significantly screw up when you "fix" it.
"Also, how could a single guy change centrally important cryptographic code is a mystery to me."
But letting an LLM do it would be OK seems to be the current thinking.
> "But the point here is that unless you know why something is done in some particular way you can significantly screw up when you "fix" it"
Usually I would agree. However the SSL code was known to be purposefully crap. It is also very possible to write Rust code that is known to be purposefully crap and full of errors.
Why was code that was "known to be purposefully crap" used by so many people? That's a good question. The LibreSSL guys answer it in various ways:
https://www.libressl.org/
If the LLM is breaking things apart then rebuilding them, surely it can write a test script and compare the results of old with new. I know this analysis only applies to simple logic, not timing effects and parallelism, but surely "we get the same answer with the same data" is a mighty confidence score. (Or we've faithfully replicated the bugs!)
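Something like this, presumably - a minimal sketch of such a differential harness (the binary names and the idea that each test case is a file passed as an argument are my assumptions):

```rust
use std::process::Command;

// Run old and new builds on the same input; compare exit status and stdout.
fn outputs_match(old_bin: &str, new_bin: &str, input: &str) -> std::io::Result<bool> {
    let old = Command::new(old_bin).arg(input).output()?;
    let new = Command::new(new_bin).arg(input).output()?;
    Ok(old.status == new.status && old.stdout == new.stdout)
}

fn main() -> std::io::Result<()> {
    // "./legacy_c" and "./rust_port" are hypothetical binaries.
    for case in ["case1.txt", "case2.txt"] {
        let ok = outputs_match("./legacy_c", "./rust_port", case)?;
        println!("{case}: {}", if ok { "match" } else { "MISMATCH" });
    }
    Ok(())
}
```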
They often test against the code as written, not as it was supposed to have been written. A layer to test against requirements is also needed - though requirements are also likely to contain errors*.
There will be a need for highly-skilled people for a while yet ;-)
* even if they are formally specified using something like Z, where "vacuous proofs" can exist (something is proven to be true, but what is it that has been proven?)
If you can "convince" an LLM to go against all its prior training and do things it was instructed not to do, for sure you can have a program be rewritten badly, and then have a test be written similarly so that it "passes", by the same LLM without even trying.
The person writing a test should not be the same as the person who wrote the code in the first place - assumptions slip in and are missed in testing.
Having an LLM do both, even different LLMs, just opens up even more scope for error.
And "Well, we ran it through an LLM and it said it was fine" will make your corporate liability lawyers cringe.
There's no way this would get used anywhere where it matters.
This is assuming that compilers still just translate source code statements directly into machine code. Most released software written in C or C++ is run through an optimiser before it's released. This does a great deal of mangling of what was originally coded, and will produce wildly different machine language depending on which optimiser flags you choose. Code optimised for speed looks very different to code optimised for size, and there are also many different options within those 2 choices. In fact, so much of the original structure of the source code has been thrown away by this point that any such attempt at reverse engineering will produce code that looks nothing like the original source.
Optimising C/C++ compilers are really, really good these days, which makes even manual reverse engineering back to the original source effectively impossible. At best this is going to spit out some truly ugly C code that you'll barely be able to read.
Not to mention that at higher optimisation levels the compiler will remove safety checks that it can "prove" to be unnecessary, and re-orders things to be faster.
To some extent that's what Rust relies upon for acceptable performance.
So much of the original intent of the code is certain to be lost. If the original code did contain any memory related bugs that Rust might avoid, it'll be extremely difficult to distinguish those from the checks the compiler removed.
Really, the first test needs to be Rust > binary > Rust.
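For what it's worth, the check elision Rust relies on is easy to see in a fragment like this (hypothetical; whether the checks actually disappear depends on the optimiser and target):

```rust
// The up-front assert lets the optimiser prove each index is in range,
// so the four per-access bounds checks can be compiled away.
fn sum_first_four(data: &[u32]) -> u32 {
    assert!(data.len() >= 4);
    data[0] + data[1] + data[2] + data[3]
}

fn main() {
    println!("{}", sum_first_four(&[1, 2, 3, 4, 5]));
}
```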
At best this is going to spit out some truly ugly C code that you'll barely be able to read.
How does that differ from normal C code as produced by programmers?
As a professional programmer in several other languages, I was told off for producing legible C source.
My experience of using ChatGPT for coding is quite poor. I've seen it hallucinate entire commands and approaches whose names would make sense on a first read, but if you know anything about the philosophy behind the specific language or framework, you know why they do not actually exist. Obviously that code could never work; the dreamed-up instructions don't exist.
That's why I suspect average code quality is going to worsen significantly over the next few years, as ChatGPT slop is absorbed into codebases where nobody knows why the code is there or how it works. I predict we're going to see the return of major breaches because of the return of basic SQL-injection vulnerabilities etc. that were inserted by these tools.
Now, ChatGPT is trained on absolute garbage, 12-year-old wrong answers on Stack Exchange for instance, or blog posts by people who have just started coding. But, if you could train a transpiler on guaranteed correct (and up to date with latest best practices) code the result might be much better.
That leaves the enormous financial and environmental cost of LLM operations. Using an LLM as a search engine that uses 100x the energy of a regular Google search and returns a poorer answer is insane. Running a specialised LLM for mostly one-off operations of converting a piece of vital legacy C code into Rust might actually be a net-positive, though.
I think the future for LLMs beyond basic chatbots will be limited, but this might be one of the use cases of LLMs that will still be around in five years.
Quote
"My experience of using ChatGPT for coding is quite poor. I've seen it hallucinate entire commands and approaches whose names would make sense upon first read but if you know anything about the philosophies behind the specific language or the framework you know why they do not actually exist. Obviously that code could never work, the dreamed-up instructions don't exist."
This has been our case too when trying to program via ChatGPT: the code spat out after entering some fairly basic instructions was, to be honest, complete and utter bollocks. We didn't dare put it in the machinery or robots.
But it's the future, as demanded by those with no knowledge but in charge.
"We can get rid of the programmers by using a cheaper LLM" will become their shout and anyone who stands in their way will be labeled as luddites and ignored.
Personally I'll be getting into the machinery repair business .....
"Hello Boris repairing and at your service....... you put some LLM code into the robot and it went wrong .... define wrong.... oh that wrong...... how much blood?...... are the bits inside the machinery or scattered everywhere? .... uh huh....... Its ok I can source some plastic bags at no extra charge.."
After trying to get Apache Freemarker syntax out of ChatGPT and getting back descriptions of commands which didn't exist, claims they existed in versions of Freemarker which didn't exist, and examples full of hallucinated rubbish, I gave up. Instead of making me more productive it was sending me on wild goose chases.
There's still no real understanding of LLMs' tendency to hallucinate, but that's not exactly unknown in human developers and everyone copes.
What do you mean? I don't think I've ever worked with any programmers who would 'hallucinate' code in the way an LLM can quite easily do.
This article baffled me, made me think a lot of Dunning/Kruger.
I think those who think LLMs are capable of producing working code are hallucinating.
Given the way LLMs and the chat-style interface work, we can replace them with an infinite number of monkeys. Yes, they can theoretically produce the works of Shakespeare, but the vast majority of the output will be rubbish that will require intelligent work to sift through to find the Shakespeare nugget.
The "C Programming Langage", Kernigham and Richie, second edition is 272 pages long.
"Programmimg Rust", Blandy, Orendorff and Tindall, second edition is 692 pages long.
Q1: 272 vs 692 --- maybe there is a clue here?
Q2: Perhaps someone could write a RustToC program......and then we could get back to using gcc or llvm?
Just a suggestion.......
Q1: 272 vs 692 --- maybe there is a clue here?
C is a much simpler language designed over half a century ago for systems with KB of memory and processing speeds measured in MHz.
Rust is more modern, and therefore more complex; e.g. it has language concepts that C doesn't (ownership, for example).
You can keep using gcc if you want, there's a project to add Rust support to it. As for llvm, that's the backend that Rust already uses.
Rust is a great language, and reminds me of the debate we had thirty years ago about whether to use PL/I (or PL/S, the systems programming subset of PL/I without a runtime) or C/370.
The debate didn't change much for most of that time (other than the suggestion of Pascal/Delphi rather than PL/M), or using Lint (itself a 40 year old static analysis tool).
While it seems like a great suggestion, the main value of using an LLM is to add "AI" to the description - so much more sexy than using a static analysis tool to solve the problem.
The problem is that Rust achieves type safety by applying ownership and immutability concepts that are difficult for an LLM - what you'll get is (see the sketch after this list):
1. Syntax conversion
2. Compilation errors
3. Managers wondering how long it's going to take to complete
4. Managers complaining that "the AI has done the hard part", so why are the developers taking so long?
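To make point 2 concrete, here's the classic failure mode (a hypothetical fragment): a C routine that happily holds two pointers into one buffer becomes two `&mut` borrows after a purely syntactic conversion, and Rust rejects it outright:

```rust
fn main() {
    let mut buf = [1, 2, 3, 4];

    // Routine in C: two pointers into the same array.
    // Naively transliterated, that's two simultaneous mutable borrows:
    let head = &mut buf[0];
    // let tail = &mut buf[3]; // error[E0499]: cannot borrow `buf` as mutable more than once
    *head += 1;

    // Getting past the borrow checker needs actual restructuring:
    let (left, right) = buf.split_at_mut(2);
    left[0] += right[1];
    println!("{:?}", buf);
}
```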
So.....I asked ChatGPT to tell me about the novel "Miss Marple Screws It Up".......
.....and I got a couple of pages of review telling me how wonderful Agatha Christie was......
.....and how fantastic this novel was to read.....................
Really???
I asked it about the UK TV series The Good Life.
I invented a name and then asked about that (made-up) character from that programme.
It spouted "convincing" nonsense that had absolutely no basis in reality whatsoever.
It couldn't get the name of the pigs right, it merged character traits from other characters into my made-up one, and at no point questioned the fact that this character did not exist in a relatively short run (3 series and a couple of specials) of a TV programme that it APPEARED to know all about.
Literally, try it. "Who was Gavin in the TV series The Good Life?" and watch it spout nonsense at you that it's made up by merging things it *did* know about the programme. When I tried it, it basically merged the characters of Jerry and Tom into this mythical new character and even attributed their actions in certain episodes to him, as well as saying he was married to Margo. It even generated random actors' names (generally actors who have played other characters called Gavin in other TV series) and attributed their portrayal to poor imaginary Gavin.
And even when corrected repeatedly, it wouldn't really accept that they didn't exist, and if you tried it with another name it would make up a whole new level of nonsense again.
You don't need to know ANYTHING about the programme to look it up, realise that at no point is there ever a character called Gavin, and say so.
And all you need to convince it is something like "Why were the pigs called Peppa in the TV series The Good Life" - and it makes up a back story that doesn't exist in the programme as to why the pigs were named after another pig that didn't exist on TV for another 40 years. (The pigs were actually called Pinky and Perky which is an obvious cultural reference for anyone of a certain age).
I asked ChatGPT to tell me about the novel "Miss Marple Screws It Up"..
Could have been worse. It may have hallucinated a prurient bodice-ripper in which the lady detective takes an active, a very active, part. :)
Miss Jane and Hercule as you have never (wished) to imagine them (together.)
Careful what you ask for... even in jest!
Apple's Rosetta 2 is essentially such a decompiler, but rather than decompiling x86 object code all the way to source code, it decompiles it to an intermediate format which is then recompiled into ARM object code.
However, Apple will have prepared for their ARM migration for years, and they control the development toolchain, so they could have had the compiler insert "hints" into the object code that would improve the process. It would still have to handle cases where the code was so old it predated such compiler changes, but they could adopt a more "conservative" approach when translating to ARM in such cases, with a performance hit. But there would be so little such code that it wouldn't be encountered by the overwhelming majority of Mac users.
Once you've translated object code into an intermediate format, converting that to a high level language like Rust (or Swift, in Apple's case, I suppose) is easy by comparison. Not necessarily all that readable, as you won't have useful variable names and the structure will be really weird in places, but that could be cleaned up manually by a human (or, DARPA hopes, an AI) to provide Rust source code of passable quality. Enough that future development could take place from that base.
Converting one instruction set to another is a vastly less complex task than one high(ish) level language to another. The crucial difference is the CPU only has to run the code, it doesn't find automatically generated code constructs hard to understand, it doesn't have to try to develop them further, it is also not changing from one ISA to another to try to improve memory safety.
x86 machine code to ARM is messy but relatively straight forward.
Mainly due to the non-orthogonal, mostly deranged x86 instruction set. With Frankencode addressing modes and layer after layer of special-purpose instructions, mainly dreamt up by Intel's marketing department to pretend something new has been added to the latest x86 processor. Whereas apart from the x64 instruction-set surgery, nothing had really changed since the Pentium M days. Although superscalar only really worked after the x64 surgery.
So for 80-90% of the x86 instruction stream it's going to be a simple instruction-table map, after you have translated the x86 instruction stream into something sane. Just like the original AMD K5 did. Always felt sorry for whoever had to write the Rosetta code to handle the huge number of x86 special-case instructions / instruction sequences. It would be just one "WTF are these guys doing" after another. If you ever had to write an x86 codegen machine-code binary output stage you would know what I mean. The temptation to go down to Santa Clara with a machete and find who was responsible can be overwhelming at times. Just as user feedback, you understand.
So a very different problem. Several orders of magnitude less complex.
Always felt sorry for whoever had to write the Rosetta code to handle the huge number of x86 special-case instructions / instruction sequences. It would be just one "WTF are these guys doing" after another.
Apple controlled the compiler. They would only have to deal with WTF cases their compiler generated. It isn't like people were still writing assembly code sequences in the 2010s. OK maybe in some drivers, but Rosetta 2 didn't support drivers (the hardware was different, so they wouldn't work anyway)
Because you can't actually write code to convert C to Rust accurately, and the closest you'll get after significant effort is a program that converts most C to valid Rust without fixing any of the memory issues for which Rust is supposed to be better. Getting to that level is already a substantial effort. So because the problem is hard, someone decided to make it someone else's problem. They won't get any better results out of an LLM, in fact they're likely to get much worse results, but maybe they can get someone to give them money to try doing it.
I'm afraid that the person who came up with this idea probably does not understand LLMs or Rust, especially the reasons why Rust is believed to be better. It's only better because it makes it harder for programmers to do one class of wrong things, but without programmers in the loop, it won't help with that either.
Guy argues that maybe C isn’t so obsolete compared to Rust.
https://dev.to/sucuturdean/is-modern-c-better-then-rust-3odf
Fundamentally I don’t care much about which programming language: every “command” in every language is just another program. I think Lisp really took this to the extreme, plus Logo I’m pretty sure.
Even machine language op-codes are essentially hard-coded routines in silicon circuitry, go grab a byte, shift bits left or right, etc. For example, there are “kext” modules you can add in to your Hackintosh implementation to substitute slower subroutines in place of missing processor extensions.
In summary: any program is nothing but a sequenced assemblage of other programs, from a “language” that comprises a particular selection of programs to choose from. DOS and Unix scripts are clearly this.
So, what difference between C and Rust? Apparently Rust is more abstracted, and C you have to explicitly destroy all the objects and variables, even integers, when the program terminates.
But anyway: handing some complex translation task over to A.I. seems like it’s definitely going to inject bugs. Sure you can test the code, but I can bet this creates new classes of bugs that defy testing.
(Had a gnarly one yesterday: apparently a bug crops up between the ATA and the modem due to a UDP timeout - believe me, I have barely any idea what this refers to - that PERSISTS after reboots/resets of both modem and ATA, and which isn't even a SIP ALG collision. Anyway, the upshot is you need to pick a new random SIP port number, say 30000-60000 and not 5060, to "break" the confusion. The symptom is nobody can get through by dialling your VOIP ATA; it never "rings" and every call goes straight to voicemail. Or rather, you might temporarily "solve" it by resetting, but shortly the confusion comes back and the symptoms resume. You can't say the programmers are stupid, but you can say they don't care about fixing the problem.)
Link: https://forum.fongo.com/viewtopic.php?f=8&t=19934
I like what I have read about Rust, but this is crazy.
Assuming you have some binary without source code and can perfectly convert it to C or Rust, what does that get you? You could automatically add checks for things like buffer overflow - OK, great!
More than likely you will also find a pile of things that Rust says are wrong, and you will not be able to compile the code.
How exactly is anyone going to know how to fix them if you don't have the original source code? The likelihood of any requirements, design docs, automated tests or test specs (that missed the issues in the first place) existing is going to be very low...
Anyone good who knows how to program in Rust is unlikely to want to work on this stuff.
How would an LLM handle apparently unnecessary time delays?
I'm thinking of a project we worked on about 30 years ago for automating a plastic granule selection/distribution system for moulding machines. A requirement was that in some situations timing was more important than sequence order. Other times it was the opposite way round.
Although there was some very limited communication from the machines, they were independent from each other and in no way synchronised. Added to which the flow rate of the granules varied by pipe route, granule type and humidity. The consumption of each machine also depended on granule type, forming speed and cross section of output product.
The only practical solution at the time was to slow everything down and keep topping the machines up on a round robin basis, but if any machine notified imminent loss of local storage, immediately switch everything to feed it - when I say immediately, the process took anything between 30 seconds and 3 minutes depending on what routing changes were needed - and of course the granule type.
Several years ago (well over 30), before the widespread advent of microcontrollers with timers and built-in peripherals, I wrote quite a few bit-banged UARTs.
That meant counting machine cycles for every instruction to ensure each bit was timed properly, with the judicious insertion of NOPs to maintain the timing. I think the type of tool suggested in the article would barf on those.
Then there were the integer multiply and divide routines that had to have constant timing regardless of data (so run time was data-independent); those got very interesting doing 32-bit stuff on an 8-bit machine.
So for situations where the timing really matters I am not sure this type of tool would be suitable (that's an understatement I suspect).
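Those constant-time routines still matter today (in crypto, mostly). A minimal sketch of the branch-free style in Rust terms - with the big caveat, per the optimiser discussion above, that a compiler is free to undo this, which is why real code uses vetted crates or tricks like `std::hint::black_box`:

```rust
// Branch-free select: picks `a` or `b` without a data-dependent branch,
// so execution time doesn't (in principle) depend on `choose_a`.
fn ct_select(choose_a: bool, a: u32, b: u32) -> u32 {
    let mask = (choose_a as u32).wrapping_neg(); // all-ones or all-zeros
    (a & mask) | (b & !mask)
}

fn main() {
    assert_eq!(ct_select(true, 7, 9), 7);
    assert_eq!(ct_select(false, 7, 9), 9);
    println!("ok");
}
```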
About once a decade or so, someone or other makes grandiose claims about some newly created magical doohickey technology that will turn everyday mostly-works code into the Holy Grail - Code That Has No Bugs.
Of course the magical doohickey technology doesn't deliver and doesn't work. Never does and never will. Because the doohickey technology creators neither know what the problem actually is, nor have enough real-world domain expertise to propose and create something that might improve code quality to any appreciable degree.
The best subject survey of this area is still "Automatic Algorithm Recognition and Replacement" by Robert Metzger. Back in 2000. Pretty thorough when it comes to what is really involved in deep refactoring of code. Which is what this problem actually is. And not a lot has changed since then.
Let's see. Big problems with latest doohickey technology.
Almost all code refactoring tools (which is what the problem really is) that kinda work, work with source code. For a very good reason. Binary object-code analysis and remapping can be done for low-complexity simple test code. Bigger real-world application executables? Forget it. For a start, all are running on a platform API, so that has to be fully characterized before you even start on the application binary. Most real-world applications are 80%+ direct / indirect calls to platform APIs. Try even spec'ing a characterization of the Win32 API and its many (not just edge-condition) bugs. That way madness lies.
ML-generated code. The track record there is even worse. Simple code snippets? They usually work, if you want a few lines of JS, C# or some scripting language. But all you are getting is the code you would find on StackOverflow - without the explanation, documentation, discussion or the code correction mechanism provided by real people. For non-trivial real-world coding problems, ML code analysis / generation is just Shakespeare's monkeys banging away on keyboards.
As the people who tried to design and build software fabricators, generative programming tools etc. for the last 60-odd years have discovered - it's easy to build systems that create well-formed and working low-complexity software units. But when you try to scale up to usable (shippable) high-complexity / high-functionality applications, it never works out. So back to hand-carved code if you want to ship something that works.
Finally we have Rust. A language (still) without a full formal spec, which ticks every single box for all the programming languages in the past that failed. The Modula-2 tribute language.
Memory-safe language? BASIC and Pascal were memory-safe languages, and DEC even shipped wickedly fast compilers for both, back in the 1970s. Java is memory safe. Ain't got no pointers. Although you can do real mischief with JVMTI, JNI and reflection. Don't ask.
When it comes to memory-safe programming the problem is never the language, it's always the programmer, when writing code for bare iron in languages like C / C++. Anyway, C has had dev tools that checked source code for memory-unsafe operations for more than 40 years.
So yet more Reinventing the Wheel, by people too lazy to find out how others solved the problem many decades ago. If you go digging you will find some absolutely brilliant solutions to pretty much every real-world software problem you are ever likely to run into. Every problem has been solved before. The only things that really change are how fast the processors are and how much memory you have. Not much else changes.
> [There are] two good unanswered questions if this does work...can an LLM be trusted with security-critical code when we don't know how it works and, in this use case, won't understand the results whether they're good or bad?
The answers are, respectively, "obviously not" and "obviously not". Hope that helps, please keep an eye out for my invoice.
(Also, the phrasing on the second question is deceptive: it implies that "understanding", of anything, is a thing that LLMs can do, and the only question is whether it can in this particular case.)
The Rust runtime isn't somehow safer than the C runtime - badly written Rust with memory issues fails to compile; it is not some sort of sandboxed runtime (like Java/C#). So, to automatically convert C to Rust and gain any benefit requires the transpiler either to generate Rust which fails to compile (and an operator to fix this), or to deduce the dodgy C patterns and replace them with something better in Rust. If it can do the latter, it can also fix the dodgy C pattern with a better C pattern, without converting the whole source tree.
So, if the technique works, the first thing to do would be to produce an awesome static code analyser which can identify and spot bad C patterns, and either tell you about them, or fix them.
However, of course, it won't work, it doesn't exist, so this is all a waste of thinking until someone actually produces something that is more useful than current reengineering techniques.
Anyhow, I'm sure we'll hear more about this and the awesome problems it's fixed and resolved. Whether it will work in the real world is TBD.
One small question is what data you will be training the decompiler on. All logic and source code compiles down to the same instruction set, albeit each with its own number of repetitions, loops, conditionals and so on. But each of these "patterns" plays a different semantic role in the original source code. So if you do not have that source code, how are you planning to train the AI? The inference here seems to be that the AI will magically arrive at its own understanding. As far as we know, AI needs to be trained on a lot of data to get to optimal "prediction". So is the article's claim that such a thing is a real, working approach?
This proposal strikes me as a good way to accelerate the ossification of computing. If we decide we want to re-write everything from scratch, that should be the opportunity to throw stuff out, and re-think key things. The language itself should determine the paradigm of the OS and software: look at Oberon, Lisp machines, and to some extent the "object oriented" nature of Haiku written in C++. Rust for its part is leading to experimental OS such as Theseus.
Our civilisation more broadly is unable to clear out old stuff to make room for the new anymore. Consider the Boeing 737 - an airframe design that should have been retired over a decade ago. Instead they hang new engines off it and some new avionics and call it "job done".
Likewise, we seemingly cannot throw out old software and design, say, a new OS from the ground up for today's needs, whose architecture can be inspired by its own choice of language. Instead we have today's most banal OSes, like Windows and Linux, translated into a newer language - as if they were some sacred religious text being translated from its native tongue into English. Where is the progress?
This is a brilliant idea, and I predict that it will be highly successful.
The aim of the project is clearly simple - it’s a money laundering operation that will be used to convert taxpayer funds into real estate, shares, and other hard assets.
String together some buzzwords associated with all the latest things - AI, Rust, Cybersecurity etc .. and use that to pass funding to a project that can’t be measured. Pay a bunch of techs to chase rainbows… and pocket the difference.
At the end of it all, file away a report on the “findings”, rinse and repeat with the next latest thing.
Success !
Decompilation will not help with refactoring C code into Rust. It might be possible to produce Rust source code from the binary, but that could not change the data structures used in the original code. The multiple references frequently used in programs written in C will not fit the strict requirements of Rust's borrow checker. To satisfy those requirements, the data structures and the code working with them have to be redesigned, sometimes completely.
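To illustrate the kind of redesign that means (a hypothetical fragment): C's mutually referencing node pointers typically become indices into a single owner in Rust, because the borrow checker rejects the aliasing:

```rust
// Instead of nodes holding pointers to each other (routine in C),
// keep them in one Vec and link by index - indices don't borrow.
struct Node {
    value: i32,
    next: Option<usize>, // index into `nodes`, not a pointer
}

fn main() {
    let mut nodes = vec![
        Node { value: 10, next: Some(1) },
        Node { value: 20, next: None },
    ];
    if let Some(i) = nodes[0].next {
        nodes[i].value += 1; // one borrow at a time: the checker is happy
    }
    println!("{} -> {}", nodes[0].value, nodes[1].value);
}
```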