DARPA suggests turning old C code automatically into Rust – using AI, of course

To accelerate the transition to memory safe programming languages, the US Defense Advanced Research Projects Agency (DARPA) is driving the development of TRACTOR, a programmatic code conversion vehicle. The term stands for TRanslating All C TO Rust. It's a DARPA project that aims to develop machine-learning tools that can …

  1. Rich 2 Silver badge

    Missing the point?

    If you can develop tools to translate C to Rust then if the C code DOES contain memory errors then either (a) you’re going to copy those errors over (which presumably is impossible because the constructs to do so aren’t there in Rust) or (b) the tools will catch the errors and complain. If it’s (a) then you don’t gain anything. And if it’s (b) then why not just fix the bugs in the C code rather than risking rewriting it?

    1. Richard 12 Silver badge

      Re: Missing the point?

      The only way this can possibly 'work' is if it simply adds "unsafe" around the entire program.

      Thus converting to Rust, but achieving nothing whatsoever.

      Of course, in reality what it will do is spew out something that vaguely looks like Rust but won't even compile, let alone do what the original code did.

      1. Jonathan Richards 1 Silver badge
        Joke

        Re: Missing the point?

        So, that would be "something almost, but not quite entirely, unlike C"?

        1. Will Godfrey Silver badge
          Happy

          Re: Missing the point?

          and definitely not the British Rail variety

      2. fg_swe Silver badge

        WRONG

        Depending on the C code style, it could be that it can be nicely translated into Rust. Rust then adds the runtime checking code that sometimes is required to ensure Memory Safety.

        In other cases, the C code will use crazy pointer arithmetics and the like and manual translation into a sane and secure style will be required.

        1. Dan 55 Silver badge

          Re: WRONG

          Oh, look who's turned up to sell their snakeoil version of Typescript for C by spamming an article on memory safety.

          1. fg_swe Silver badge

            Re: WRONG

            TypeScript copied the Sappeur idea of strong typing and transpiling in a much weaker typed language. Sappeur was first.

            1. Dan 55 Silver badge

              Re: WRONG

              It's just a shame you removed it from SourceForge so there's no proof of that any more.

          2. Anonymous Coward

            Re: WRONG

            Does that make them a spammeur?

    2. JamesTGrant Bronze badge

      Re: Missing the point?

      My first thought - if it could do as stated, wouldn’t it be better to say ‘here’s all the code - can you see any memory access vulnerability problems?’. Then, if you care, you could fix ‘em. No need for the thing running the code to support a different runtime.

      It’d still be frickin’ amazing to have a static code analysis tool that could model every possible combination of interactions between concurrent threads and their associated memory access behaviour. Actually - that sounds just as magical.

      1. fg_swe Silver badge

        Not Magical - Sappeur

        The Sappeur type system forces the software engineer to clearly separate thread-local and thread-shared data structures. Thread global data can only be accessed via mutex-protected methods.

        See http://sappeur.di-fg.de/manual.pdf, section 9.2.
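For comparison only, and not as Sappeur code: Rust's standard library expresses a similar rule, where data shared between threads is reachable only through a lock. The sketch below is a minimal illustration under that assumption; the function name `count_in_parallel` is made up for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter `increments` times.
/// The counter lives inside a Mutex, so the only way to reach it is via `lock()`.
fn count_in_parallel(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Each worker gets its own reference-counted handle to the same Mutex.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Bind the value first so the lock guard is released before `counter` is dropped.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_in_parallel(4, 1000));
}
```

Removing the `Mutex` and mutating a plain integer from the spawned closures is rejected at compile time, which is the kind of thread-local versus thread-shared separation described above.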

        1. Nick Ryan Silver badge
          FAIL

          Re: Not Magical - Sappeur

          > Thread global data can only be accessed via mutex-protected methods.

          So what you are admitting, then, is that this magic system just breaks the code: without ever being able to understand the intent of the code, it will never be able to magically recreate it in some other private, closed environment.

          1. fg_swe Silver badge

            Re: Not Magical - Sappeur

            Type systems indeed sometimes force software engineers to use a different, often safer approach.

    3. Anonymous Coward

      Re: Missing the point?

      Same thoughts here... It would make more sense to build C compilers that guarantee memory protection under the hood but compile the old code as is. The old code is fit for its specific use cases and results, and unless one is thinking of testing every single code path and its implications, it is a moot exercise. Just build new compilers with the same C constructs and memory allocation functions, but this time with memory guardrails and protection. In the era of virtualization I do not think a little overhead is an issue. But changing the whole semantics of the code is a different level of risk.

      1. 1947293

        Re: Missing the point?

        To some extent that already exists: the -fsanitize options in Clang and GCC can add a significant amount of runtime error detection. The problems are a) the performance cost is not trivial and b) having code crash under the sanitizer is substantially less appealing than detecting problems at compile time.

      2. fg_swe Silver badge

        Impossible

        See http://sappeur.di-fg.de/WhyCandCppCannotBeMemorySafe.html

        1. Skwn

          Re: Impossible

          As C currently is, yes, but if built from scratch to provide the same C syntax I do not see any hindrance to building such a compiler. It would be no different from building any other memory-safe language. Most C coders do have their own memory management library functions layered above C to meet their coding requirements. Most of the exploits come from direct use of the raw C memory management functions.

    4. Blazde Silver badge

      Re: Missing the point?

      If you can develop tools to translate C to Rust then if the C code DOES contain memory errors then either (a) you’re going to copy those errors over (which presumably is impossible because the constructs to do so aren’t there in Rust) or (b) the tools will catch the errors and complain.

      A typical memory vulnerability would actually end up under a 3rd case: c) You copy the errors over and when they're encountered at runtime your program will panic and exit safely instead of becoming exploitable.

      That's a big win if you turned an arbitrary-code-execution situation into a denial-of-service issue. (And the denial-of-service issue will crash your program right where the bug is, rather than trillions of instructions after it, so it'll be easier to fix).

      Some portion of the memory safety just has to happen at runtime because you can't magically static analyse everything as mentioned above.
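A rough Rust sketch of case (c), for illustration only: a naive translation keeps the out-of-range index bug, but the bounds check turns it into a panic at the faulting line instead of a read of whatever sits next to the buffer. The names `read_at` and `try_read_at` are made up for this example.

```rust
use std::panic;

/// Reads one byte at `index`, the way a direct translation of C's `buf[index]` would.
/// Slice indexing is bounds-checked, so a bad index panics here.
fn read_at(buf: &[u8], index: usize) -> u8 {
    buf[index]
}

/// The same read with the failure surfaced as a value the caller must handle.
fn try_read_at(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

fn main() {
    let buf = [10u8, 20, 30, 40];

    // In-range access behaves as it would in C.
    assert_eq!(read_at(&buf, 2), 30);

    // Out-of-range access is reported as None instead of reading past the end.
    assert_eq!(try_read_at(&buf, 9), None);

    // The indexing form panics; catch_unwind is used here only to observe that.
    let result = panic::catch_unwind(|| read_at(&[10, 20, 30, 40], 9));
    assert!(result.is_err());
}
```

The bug is still there after translation; what changes is that it fails at the point of the bad access rather than corrupting or leaking memory.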

      The other factor is that 'fixing bugs' in a way that truly satisfies the Rust borrow checker sometimes will entail huge design-level changes to your program's data structure. Essentially: enforcing a clean ownership model close to what good quality C/C++ code should have but very often doesn't. Even if an automated tool can identify what needs to be done on that front (I'm sceptical) then porting those changes back to the C code would only leave that one careless commit away from having broken memory ownership again. The Rust compiler will continue to enforce it in future (at least, I suppose until some numpty forks the compiler and introduces a 'turn off borrow checking I just need my code to compile' convenience feature which becomes popular. Which is probably inevitable).

      1. fg_swe Silver badge

        Re: Missing the point?

        Thanks for your well reasoned post. The C and C++ memory models are very much impossible to make memory safe.

        1. G40

          Re: Missing the point?

          How so?

        2. that one in the corner Silver badge

          Re: Missing the point?

          > The C and C++ memory models are very much impossible to make memory safe.

          The assembly language and raw binary opcode memory models are very much impossible to make memory safe.

          Therefore we should stop running anything.

          OR we could put a layer on top to help.

          Like, say, any of the extant C/C++ libraries that do provide memory safe versions of everything you need to do.

          1. Anonymous Coward

            Re: Missing the point?

            Actually you can build the new C compiler like you would for any other language: C#, compiled Java, etc. You can build a new memory-safe C compiler using the current C language. During syntax and semantic analysis you just have to enforce memory management for the current C memory functions. It probably would not be as efficient as writing in assembly language, but it would most probably be more efficient than most new high-level languages, including Rust, and, more importantly, it would not break existing code.

    5. captain veg Silver badge

      Re: Missing the point?

      I would say that the point is future maintainability, both because it reduces the possibility of some noob introducing new bugs, and because it assumes that Rust will eventually take over for systems programming, leaving C-knowledgeable greybeards akin to latter-day Cobol coders.

      -A.

    6. Roland6 Silver badge

      Re: Missing the point?

      The trouble is C has a formal language definition, Rust doesn’t; although some are trying to create a formal language definition after the event.

      Until there exists a formal language definition, against which Rust code can be evaluated and against which a training set of C to Rust can be prepared, the translation of any language to Rust is going to be problematic, with or without AI.

      1. LionelB Silver badge

        Re: Missing the point?

        In practice, I guess the compiler(s) define the language. Clearly not ideal.

        1. that one in the corner Silver badge

          Re: Missing the point?

          > In practice, I guess the compiler(s) define the language.

          Sadly, that *is* precisely the formal model that Rust currently goes by.

          > Clearly not ideal.

          That is a well-used formal model for a language and one that has been used often enough in the past. And it is a model that works well enough for a certain class of languages (e.g. any of those used for interactive training slash text-based adventure games: so long as your implementation matches "the master copy" all is good).

          But it is *not* a model that should *ever* be applied to a language that will be used for low-level production work, most definitely not one that promises to do the memory management for you.

          Without a formal language spec, and preferably one that includes the formal maths proof of its claims, Rust is still very much in the "here is our demonstration piece, if you like it we will take it to completion" phase of life.

          But too many people, shamefully, are pushing Rust as something that is ready for major use by everyone TODAY.

          I truly wish it were. But it ain't.

  2. b0llchit Silver badge
    Alert

    Recursive code fixes

    Another challenge is that C allows code to do things with pointers...

    That means you need to understand the code and abstract the algorithm before you can make any correct translation. That definitely excludes all current forms of ML/AI/LLM.

    And then, "errors" or "bugs" in the original sources will also be transliterated into the new language, making the problem worse. You probably need to jump through hoops to translate the original source, and the result is most likely worse than the source. At least it will be less maintainable.

    Or, we will just "invent" a new program to fix the old program that needs a new program to fix the old program to fix the old program that needs a new program to fix the old program to fix the old program to fix the old program that needs a new program to fix... [recursion limit exceeded, core dumped]

    1. cornetman Silver badge

      Re: Recursive code fixes

      I dunno.

      It seems to me that the only way that this could work would be for the translation tool to recognise idiomatic code segments and translate them to suitable correlations in the language at a sufficient level of abstraction. This could only really work in code that is pretty high level at the moment and a lot of C and C++ code is in that space. So, recognising array processing loops etc with all of their potential issues regarding wandering into undefined areas could be left behind by a corresponding "safe" iterator that was idiomatic to Rust and could leave a lot of that unsafe behaviour behind.
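As a hedged illustration of that idea, here is the classic C copy loop (`for (i = 0; i < n; i++) dst[i] = src[i];`) rewritten with Rust iterators. This is a sketch of what an idiomatic translation might look like, not output from any actual tool; `copy_bounded` is a hypothetical name.

```rust
/// Copies bytes from `src` into `dst` and returns how many were copied.
/// `zip` stops at the shorter of the two slices, so there is no index
/// that could run past the end of either buffer.
fn copy_bounded(dst: &mut [u8], src: &[u8]) -> usize {
    let mut copied = 0;
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d = *s;
        copied += 1;
    }
    copied
}

fn main() {
    let mut dst = [0u8; 3];
    let src = [1u8, 2, 3, 4, 5];

    // The source is longer than the destination; the extra bytes are not written.
    let n = copy_bounded(&mut dst, &src);
    println!("copied {} bytes: {:?}", n, dst);
}
```

Whether silently truncating is the right behaviour depends on the program, which is exactly the sort of intent a translator would have to recover from the original code.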

      I think that commentators here are a little bit too quick to dismiss this idea in its entirety. It is certainly an interesting idea and even the researchers quoted in the article are up front about the potential pitfalls and that there are low-level cases where this wouldn't work. However, there is a huge quantity of code out there that could possibly benefit from this type of analysis.

    2. Dr Paul Taylor

      Re: Recursive code fixes

      All this sounds like trying to solve the Halting Problem using AI, aka snake oil.

      1. SCP

        Re: Recursive code fixes

        A claim that any arbitrary program could be automatically converted from C to Rust would certainly be in the same league as snake oil, but just as the Halting Problem is a statement about any arbitrary code it does not mean a great deal of very useful work cannot be done. Automated conversion of C to Rust could be usefully achievable for a large body of code. "Perfection is the enemy of good" - Voltaire

        A key aspect of translation (mentioned in the article) is getting idiomatic code rather than transliteration - this is necessary to make the code maintainable. The LLM approach may prove more adept at this aspect (in the longer term). It will be interesting to see what can be achieved when LLM approaches are coupled with other analytic techniques (such as Abstract Interpretation). It would be interesting to see what an LLM trained on formal specifications could achieve.

    3. fg_swe Silver badge

      Not Always True

      There exists the possibility of a nicely written C program without any insane aspects. This program still contains exploitable memory bugs. A clean translation to a memory safe language will then defend the program against exploits that use memory errors. So your sweeping assertions are not right.

  3. Dan 55 Silver badge
    FAIL

    The software industry keeps digging its own grave

    Translate C to Rust... maybe. With an LLM? Are you absolutely fruitbat insane?

    1. Inventor of the Marmite Laser Silver badge

      Re: The software industry keeps digging its own grave

      Fruit Bat Insane

      FBI

      I seem to recognise that from somewhere.

      1. b0llchit Silver badge
        Black Helicopters

        Re: The software industry keeps digging its own grave

        Very good, sir! Please stay where you are. The Fine Black Infantry(*) will be there shortly to escort you to your final destination.

        (*) MIB mandated suit in black, of course

        1. fg_swe Silver badge

          Re: The software industry keeps digging its own grave

          You are confusing them with the KGB.

    2. matjaggard

      Re: The software industry keeps digging its own grave

      I don't understand why all the commentards are so against this concept. Human + AI make a pretty good combination for some tasks and I don't see why this wouldn't be one of those tasks. AI can likely translate 90% of the code to safe Rust, humans can review places where it fails or where it outputs unsafe code (the advantage of Rust being that unsafe code is labelled as such). This is an easier task for AI than static code analysis because where the AI fails in code analysis there's nothing to indicate to the human where the failure might be.
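To make the "unsafe code is labelled" point concrete, a small sketch with made-up function names: the first function is the sort of pointer-walking transliteration a tool might emit, the second is the idiomatic form a reviewer would hope for. Neither is taken from a real translator.

```rust
/// Transliteration of a C loop that walks a pointer over an array.
/// The raw-pointer dereference must sit in an `unsafe` block, which is
/// the marker a human reviewer can search for.
fn sum_transliterated(data: &[i32]) -> i32 {
    let ptr = data.as_ptr();
    let mut total = 0;
    for i in 0..data.len() {
        // SAFETY: i < data.len(), so ptr.add(i) stays inside the slice.
        total += unsafe { *ptr.add(i) };
    }
    total
}

/// Idiomatic version: no raw pointers, nothing to audit by hand.
fn sum_idiomatic(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let data = [3, 5, 8];
    assert_eq!(sum_transliterated(&data), 16);
    assert_eq!(sum_idiomatic(&data), 16);
    println!("both versions agree: {}", sum_idiomatic(&data));
}
```

The label only shows where the compiler stopped checking. It says nothing about whether the translated logic still matches the original, which is the objection raised in the replies.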

      1. Richard 12 Silver badge

        Re: The software industry keeps digging its own grave

        Because it simply cannot possibly work - well, unless someone proves P=NP.

        LLMs are better described as "stochastic parrots".

        They are very good at producing human-parsable text that looks perfectly reasonable. They're also pretty good at producing human-language transliteration and even approximate translation.

        Often sufficient for a human to understand what the original text meant - eg "tomato and cheese flatbread" is definitely wrong, but a human can figure out that could be pizza.

        They have absolutely no understanding of the input they've been fed, are imprecise and inaccurate by design, and even the largest models can only be given a few hundred, perhaps a thousand tokens before the output is total garbage.

        Writing software requires ridiculous levels of precision and accuracy, because the compiler is an algorithm - not a human who will overlook and subconsciously attempt to autocorrect mistakes.

      2. Dan 55 Silver badge

        Re: The software industry keeps digging its own grave

        AI can likely translate 90% of the code to safe Rust, humans can review places where it fails or where it outputs unsafe code (the advantage of Rust being that unsafe code is labelled as such).

        It won't fail, it will just use incorrect syntax which won't compile. Or, worse, silently alter the logic of the code and leave it to someone to find the problem later.

        If it fails when transferring shell scripts to Python or one version of SQL to another or returning the full syntax of a command you give it, which is what I've tried LLMs out for, you'd have to be out of your mind to use it for converting a full C project to Rust.

  4. druck Silver badge

    Just one question...

    ...have they lost their fucking minds?

    1. ecofeco Silver badge

      Re: Just one question...

      Have you seen the rest of the world lately?

      The psychopaths are in charge.

    2. Roland6 Silver badge

      Re: Just one question...

      No, they have just written a bid for a large amount of “AI” research monies, which they will probably get if no one on the bid appraisal side has any real understanding of programming language translation and AI.

  5. Mike 125

    examples

    Example of code which from my understanding can't be guaranteed inherently safe in any language: In/output generally, e.g. network buffers.

    Output: Move data from native representation (i.e. defined by the local machine), to network packet representation (i.e. defined by the 'to-the-wire' network protocol).

    Input: Do the reverse (even more dangerous).

    Now do it portably. Hmmm... ok...

    Now do it efficiently- because that's what will be demanded. Hmmm... ok...

    Now do it in an 'inherently safe' language in 'safe' mode. By my understanding, that's impossible.

    So it'll be labelled 'unsafe'. Fine. So why not write those unsafe parts in C, which can already do the job supremely well. And then do all the rest in whatever f'ing language you choose?

    The hard part is 1) understanding that some parts *cannot* be made safe by the language alone, and 2) recognising where the safe<>unsafe transition lies.

    Once that's understood, (assume LANGUAGE_X is mandated), I don't really see how 'LANGUAGE_X_SAFE' + 'LANGUAGE_X_UNSAFE' helps the codebase, AI assisted or not.

    It may make it worse. People who understand these issues (i.e. the right people for the job) will be forced to use 'LANGUAGE_X', which they probably hate(!)- because it's not the best tool for the job, instead of C which they know very well... is.

    But as I've probably said before... let's C.

    1. fg_swe Silver badge

      Completely Wrong

      Entire operating systems have been written in Algol, C#, Java and Rust. They do need a certain amount of unsafe code for certain operations such as setting up a new process image. But all the things you mention can be done in a memory safe language. There is ZERO reason to parse data incoming from the network in C, except maybe "execution speed".

      https://en.wikipedia.org/wiki/Burroughs_Large_Systems

      https://en.wikipedia.org/wiki/ICL_2900_Series

      https://en.wikipedia.org/wiki/Singularity_%28operating_system%29

      https://en.wikipedia.org/wiki/JavaOS

      For example, there exist lots of Java based web servers and they seem to be doing quite well. No need to use C for that.
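On the specific case of moving data between native and wire representation, here is a minimal sketch in safe Rust. The six-byte header layout (a big-endian `u16` id followed by a big-endian `u32` length) is invented for the example, as are the names `Header`, `encode` and `decode`.

```rust
use std::convert::TryInto;

/// Hypothetical wire header: 2-byte message id, 4-byte payload length, big-endian.
#[derive(Debug, PartialEq)]
struct Header {
    id: u16,
    len: u32,
}

/// Native representation to wire representation, independent of host byte order.
fn encode(h: &Header) -> [u8; 6] {
    let mut out = [0u8; 6];
    out[..2].copy_from_slice(&h.id.to_be_bytes());
    out[2..].copy_from_slice(&h.len.to_be_bytes());
    out
}

/// Wire representation to native representation.
/// Returns None if the input is too short, instead of reading past the end.
fn decode(bytes: &[u8]) -> Option<Header> {
    let id = u16::from_be_bytes(bytes.get(0..2)?.try_into().ok()?);
    let len = u32::from_be_bytes(bytes.get(2..6)?.try_into().ok()?);
    Some(Header { id, len })
}

fn main() {
    let header = Header { id: 7, len: 1024 };
    let wire = encode(&header);
    println!("wire bytes: {:?}", wire);
    println!("decoded: {:?}", decode(&wire));
    println!("truncated input: {:?}", decode(&wire[..3]));
}
```

No `unsafe` is needed for this layout. Whether it is fast enough for a given protocol stack is a separate question that the sketch does not answer.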

      1. Nick Ryan Silver badge

        Re: Completely Wrong

        > There is ZERO reason to parse data incoming from the network in C, except maybe "execution speed".

        You have, very unfortunately, summed up exactly why software performance is so poor these days. Software should be optimised by optimising the code, not by buying new damn hardware because an absolute clown of a developer cobbled together code in an incredibly inefficient manner thereby requiring a substantial hardware upgrade to maintain the same performance as before.

        1. fg_swe Silver badge

          Re: Completely Wrong

          You can write quite efficient code in modern memory safe languages; the problem these days is uneducated, self trained programmers developing software.

          Combine that with feature creep and even less qualified managers and programs become 10000x less efficient than possible.

  6. An_Old_Dog Silver badge
    Devil

    Poppycock!

    "... the software engineering community has reached a consensus," the research agency [DARPA] said, pointing to the Office of the National Cyber Director's call to do more to make software more secure.

    1. Offices neither call nor say anything. They are not sentient entities, they are merely rooms, usually filled with desks, common verbal misusage notwithstanding.

    2. "Office of the National Cyber Director" != "the software engineering community".

    3. A call to do more to make software more secure is not, in itself, an explicit endorsement of converting, either manually or with LLM assistance, software written in C or C++ into Rust.

    4. LLMs: Garbage In, Garbage Out.

    5. LLMs: just another (hyped-as) cure-all of the moment. They're good for everything from too-high programmer salaries to low code quality, high transaction latencies, and liver spots. Buy now!

    6. Why convert to Rust, vs converting to a different memory-safe programming language, such as Ada?

    7. Concensus is irrelevant in matters of fact. The natural number "one" is less than the natural number "two"*. That's an immutable fact, closed to interpretation, PR spin, or "what the concensus is."**

    * I'm writing about the numbers themselves, and not about any internal computer representations of those numbers.

    ** There are too damn many Golgafrinchan Ark B people on this world, including many who write computer programs.

    1. b0llchit Silver badge
      Coat

      Re: Poppycock!

      The natural number "one" is less than the natural number "two"

      I beg to differ! Two plus two is five, if you must insist, and pi is a whole number according to some legislative forces, just like cold weather proves the absurdity of a warming climate.

      Now, please step aside to let the venture capitalist make some more money from our socialist taxes for the poor.

      /s

    2. Doctor Syntax Silver badge

      Re: Poppycock!

      "Concensus is irrelevant in matters of fact."

      Indeed. Anyone who thinks otherwise should go and read Feynman's review of the decision to launch Challenger.

      1. Ian Johnston Silver badge

        Re: Poppycock!

        "Consensus is irrelevant in matters of fact" is a pointless tautology, since facts are essentially those things for which consensus (yes, that's how it's spelled) is irrelevant. All you do is shift the argument along one; to whether a particular concept requires consensus or not.

    3. katrinab Silver badge
      Alert

      Re: Poppycock!

      "4. LLMs: Garbage In, Garbage Out."

      Even if you fed it only the finest quality training material, you are still going to get garbage out, because it is looking at statistical relationships between words rather than actually understanding the meaning of them.

    4. Ian Johnston Silver badge

      Re: Poppycock!

      7. Concensus is irrelevant in matters of fact. The natural number "one" is less than the natural number "two"*. That's an immutable fact, closed to interpretation, PR spin, or "what the concensus is.

      Is the integer "minus one" more or less than the integer "minus two"?

  7. Anonymous Coward
    Boffin

    Design an advanced MMU to protect against memory safety bugs in the software.

    Memory safe code in Rust would still not protect against malware. The solution is obvious. Design an advanced MMU to protect against memory safety bugs in the software.

    a. Enhanced isolation between user-space and kernel-space in hardware.

    b. Extended isolation of individual processes in hardware.

    c. ASLR implemented in hardware.

    Don't tell me all the reasons it can't be done!

    --

    $83.4 billion: total profit (2022): ASUS, Acer, Dell, HP, Lenovo, Microsoft

    1. fg_swe Silver badge

      Nonsense

      1.) Rust (and other proper memory safe languages) DOES protect against the 70% of CVE exploits which are memory safety bugs.

      http://sappeur.di-fg.de/Sappeur_Cyber_Security.pdf

      2.) An MMU can never be as fine-grained a protection as a proper memory safe language such as Rust, Sappeur or Java. Rather, MMUs operate on large chunks of memory, typically 4kByte or more.

      3.) Memory safety should be enforced on the Type System level by the compiler. Very powerful things can be done there.

    2. An_Old_Dog Silver badge
      Boffin

      Re: Design an advanced MMU to protect against memory safety bugs in the software.

      @t245t:

      * An MMU, no matter how designed, cannot do the job alone. For success, the CPU, MMU, and memory hardware must be designed in an integrated way.

      * You're fighting the consequences of data also being interpretable as instructions.

      * Split Kernel/User memory space has been around at least since the 1970s.

      * Split I+D (instruction and data) memory space has been around at least since the 1970s. I've recently seen it referred to as W^X (write xor execute).

      * Previous and current MMUs prevent Program A from reading (setting aside speculative execution side-effects, which are a consequence of CPU design, not of MMU design) and writing Program B's memory space without permission from Program B.

      * OSes which implement ASLR do so by manipulating current MMUs. No "new" MMUs are needed for ASLR.

      I think what you really want are some of the features available in the Intel iAPX-432. Sadly, Intel couldn't get it working well enough, soon enough.

      1. fg_swe Silver badge

        CPU Never Sufficient

        Some integrity and strong typing checks can only be done by the compiler.

        Also, an optimizing compiler can remove many bounds checks in properly written code.

  8. sitta_europea Silver badge

    More than three decades ago I developed a business system using dBaseII.

    A few years later I wrote some code to convert it into C. Mostly because dBaseII was very slow, and dBaseIII was slower *and* riddled.

    I'd expected it would probably work out around 80/20 automatic to manual conversion, but as it happened it worked out way better than that - I'd guess around 95/5.

    The resulting system worked extremely well, and I continued to develop it.

    But I always worried about memory safety, so I wrote a few memory-safe routines which I used instead of some of the standard library functions - in particular I wrote a protected version of malloc().

    After a period of - ahem - improving my code, system halts because of things like out of bounds memory accesses simply stopped happening.

    Thirty years later my code is still running businesses, and it's that long since it last halted for a memory problem. It's never been compromised.

    If you want memory safety, from my experience I honestly don't think you need to do a lot more than code a few new library routines. Call it 'safelibc' or something like that.

    Turning to AI for this reminds me of that old chestnut about the guy who decided to solve his problem using regexes... now he has two problems.

    1. Anonymous Coward

      So I wrote a few memory-safe routines ..

      @sitta_europea:

      > so I wrote a few memory-safe routines which I used instead of some of the standard library functions - in particular I wrote a protected version of malloc().

      Interesting. Ages ago, I recall reading in Dr. Dobb's a similar memory-safe wrapper around standard functions that added little overhead to the standard functions. IIRC the wrapper overloaded the main functions so the main code read as normal.

      1. williamyf
        Alien

        Re: So I wrote a few memory-safe routines ..

        Second that, I read a similar article (probably NOT by the same author) in BYTE magazine.

        There must be a reason why neither Microsoft, nor Borland, nor Watcom, nor Intel, nor GCC have adopted any of those as an extension of their libraries ... probably the Trisolarans are preventing us

      2. Nick Ryan Silver badge

        Re: So I wrote a few memory-safe routines ..

        Yes, it's impressive what can be done when developers are truly knowledgeable. With good design and planning, the boundaries between safe parsed inputs and optimised internal code can be put in place, meaning that the performance hit is as negligible as possible. Unfortunately, there is a serious dearth of truly knowledgeable developers out there and lots of "shiny chasers" who think that the more overhead in a system and the more magic fairy dust liberally sprinkled, the better the code is. These tend to be linked to the same snake oil sellers who try to sell magic AI generation or translation of code, or magic overlays on system code that can magically fix errors and so on.

    2. fg_swe Silver badge

      Wrong

      Just because you THINK it does not have memory bugs, does not mean this is true. When "well tested" Unix Userland tools were first run with valgrind, tons of bugs were discovered. More bugs might be in them, just not discovered by the input data constellation.

      Also see

      http://sappeur.di-fg.de/Sappeur_Cyber_Security.pdf

    3. AndrueC Silver badge
      Meh

      Or use C++.

      • Does away with the horrors of malloc()/free().
      • The STL hides pointers reasonably well and gives memory safety (unless you disable checking).
      • Design with RAII so that object creation/usage/destruction is automatic.
      • Make copy constructors private by default.
      • Define parameters const by default.

      The result is pretty safe code. As long as you have a team of developers who understand and adhere to the rules, or some kind of lint-alike that can catch rule violations.

      But the chances of having such a perfect team and/or trustworthy tools (or the approval from manglement to buy those tools) is low. Therefore switch to a safer language that enforces the rules for you and your team.

  9. martinusher Silver badge

    Memory safety is a design issue

    The only people who come across memory safety issues on a regular basis are those who work with a heap or other dynamic memory pool. This is an essential component in C++ programming since object instances can't be static**. This type of program also makes free use of automatics: variables declared on a stack that is assumed to be of indefinite, and so infinite, length. Away from this sort of environment, programming is both more prosaic and more controlled. Such programs will translate easily into Rust because they don't do anything that is likely to invoke Rust's signature features. Lots of productive work but in reality nothing actually getting done (in other words, "just another day at the office"!).

    (**Not going to nitpick here. Too early in the morning.)

    1. Ace2 Silver badge

      Re: Memory safety is a design issue

      “Lots of productive work but in reality nothing actually getting done (in other words, "just another day at the office"!).”

      Wow, that’s my last week in a nutshell.

      1. fg_swe Silver badge

        Ada

        The most successful fighter aircraft flight control software project (measured in fatal losses) STILL uses Ada. No loss of airframe so far. Hundreds of a/c flying for more than 20 years now. Certainly Ada is not magic pixie dust, but it surely is a major contributor to safety.

        1. Roo

          Re: Ada

          Ariane flight v88 ring any bells ? :)

          You can write FORTRAN in any language.

          1. fg_swe Silver badge

            Re: Ada

            They did not even perform a cursory HIL Test for Ariane V. This is standard for control units since 2010 or so. HIL test would have found the Ada exception and the bug would have been fixed without much talk.

          2. fg_swe Silver badge

            Re: Ada

            Actually Ada worked flawlessly in Ariane V first flight. It reported a variable overflow, instead of marching on. As written above, modern testing techniques will trigger such exceptions. Then software engineers can investigate the root cause and fix them. Which is exactly what you want from an engineering POV.

            Compare that to C++ or C, where variable over- or underflows will go undetected until "funny behaviour" results. (Yes, I know you can bolt on range checking in C++, but Ada has it built-in)
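            To make the "reported instead of marching on" point concrete, here is a small sketch in Rust (the language this article is about) rather than Ada. It assumes nothing beyond the standard library; `add_checked` and `narrow` are illustrative names. Note that plain `+` in Rust panics on overflow in debug builds but wraps in default release builds, so explicit checking as below is the hedged way to get the behaviour in every build.

```rust
use std::convert::TryFrom;

/// Adds two 16-bit values, returning None instead of wrapping on overflow.
fn add_checked(a: i16, b: i16) -> Option<i16> {
    a.checked_add(b)
}

/// Narrows a wider reading to 16 bits; out-of-range values become errors
/// rather than silently truncated numbers.
fn narrow(reading: i64) -> Result<i16, String> {
    i16::try_from(reading)
        .map_err(|_| format!("value {} does not fit in 16 bits", reading))
}
```

            Here `add_checked(32_000, 1_000)` yields `None` and `narrow(40_000)` yields an error, so the caller has to decide what to do with the overflow instead of finding out through "funny behaviour" later.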

          3. An_Old_Dog Silver badge

            Re: Ada

            Using Ada, vs FORTRAN, does not guarantee your program will be bug-free -- it's not magic pixie dust! But Ada gives the programmer a lot of assistance of the sort which FORTRAN does not.

            I can easily write FORTRAN which will do math which produces the results of 2 + 2 = 5. You'll have to work at it in Ada. (Hmmm ... I should try that in C and in Pascal to see what happens.)

            If you're writing FORTRAN-in-Ada, your manager and coworkers should be calling you on that. (I'm not knocking FORTRAN or cheering for Ada, just making some observations.)

    2. Rich 2 Silver badge

      Re: Memory safety is a design issue

      I’ve been writing C code for a good 35 or so years now. I long-ago abandoned using heap storage, except at program startup - ie, if I MUST use it then I allocate at the start and never release it. I work mostly on embedded stuff so not using heap storage at all comes naturally, but I follow the same philosophy above when writing stuff to run in a PC. I can’t remember the last time I had any memory errors.

      Try it - it’s really not difficult when you think about it.

      1. fg_swe Silver badge

        Still Not Memory Safe

        What you describe is the standard approach in any safety critical industry such as Automotive, Aerospace, Medical and Rail. Dynamic memory is impossible to make hard realtime or even "available at all times".

        BUT - even with statically allocated memory you can and WILL have index errors, using C or C++. You can also have bad pointers, which were meant to be pointing to the static memory sections.

        With Rust or Sappeur you allocate whatever memory you need in a startup phase and after that you can be sure there will be no memory errors any more.
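        A rough sketch of what that looks like in Rust, assuming a fixed-size table as a stand-in for a statically allocated buffer (`Table` and `SLOTS` are invented for the example). The index error still happens, but it is reported at runtime instead of corrupting neighbouring memory:

```rust
/// Number of entries in the fixed-size table (illustrative value).
const SLOTS: usize = 8;

/// Stand-in for a statically allocated buffer.
struct Table {
    data: [u32; SLOTS],
}

impl Table {
    fn new() -> Self {
        Table { data: [0; SLOTS] }
    }

    /// Stores a value; an out-of-range index is reported, never written.
    fn set(&mut self, idx: usize, value: u32) -> Result<(), String> {
        match self.data.get_mut(idx) {
            Some(slot) => {
                *slot = value;
                Ok(())
            }
            None => Err(format!("index {} out of range (0..{})", idx, SLOTS)),
        }
    }

    /// Reads a value; returns None for an out-of-range index.
    fn get(&self, idx: usize) -> Option<u32> {
        self.data.get(idx).copied()
    }
}
```

        Writing `t.data[idx]` directly would also be bounds checked, but it panics on a bad index; the `get`/`get_mut` form lets the caller handle the error.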

      2. Richard 12 Silver badge

        Re: Memory safety is a design issue

        Not using heap storage is literally impossible in larger applications, because the user provides the input of arbitrary size.

        I've seen a lot of embedded C programs that actually make and manage their own fake heap, pretending that's better than the toolchain malloc/free options.

        Of course, idiomatic C++17 is memory safe already, and it's a lot safer to port a codebase function-by-function. Big-bang translation is absolutely guaranteed to go wrong.

        1. fg_swe Silver badge

          Re: Memory safety is a design issue

          Heap allocation is neither hard realtime nor provably available. Consider heap fragmentation.

          As pointed out in other comments, C++ has become a bit safer, but by no means fully memory safe. Just consider the accidentally thread-shared global object problem.

      3. Anonymous Coward

        Re: Memory safety is a design issue

        I have been using dynamic memory allocation in C code for a long time now. But I always use well-tested routines for memory management. These routines use a linked list with search capabilities and a kind of key-value paradigm. They can also mass-allocate by units of info or structure. They also have a POSIX lock/unlock option for thread safety. And finally they clean up every allocated space, checking that the pointer is not null. Since I started using these kinds of routines long ago, along with the practice of writing a matching clean-up for every allocation in my higher-level code, including where pointers travel across functions, I no longer have nightmares with C pointers.

  10. Eclectic Man Silver badge

    ISO standards and tests

    "proper adherence to ISO standards and diligent application of testing tools"

    Umm, seriously? If we all had programmed according to ISO standards and tested things properly (and I am as guilty as anyone here), we could still use BASIC and the all powerful 'GOTO' command without error.

    Whilst noting the, somewhat less than complimentary, comments above, I cannot help feeling that a tool which checked for things like buffer overflow or other memory use errors, and corrected them automatically would have saved me personally considerable time in the past. I have been highly critical of some applications of AI in the past (indeed on other posts in the Register), I would be interested to see how this translation is done before dismissing it as complete twaddle.

    CAVEAT - I am a very limited objective 'C' programmer, my only experience of 'Visual C++' was painful, I used BASIC for Computer Science 'O'-level (yes I really am that old), dabbled with Pascal, and have had some 'fun' with Lisp and Prolog, am allergic to COBOL and have never even tried Rust.

  11. Apprentice of Tokenism
    WTF?

    Huh? Are we caught in the Ada loop again?

    1. jake Silver badge

      Yep.

      Last December, Five Eyes decided we should all dump C++ in favo(u)r of rust.

      Now DARPA is jumping on the bandwagon.

      Government agencies are always late to catch onto fads.

      So it's official. Rust is no longer hip. Like Ada, it's going to start fading into the background noise of legacy languages.

      Time for the next fanboi-driven language du jour to make its appearance.

    2. Bebu Silver badge
      Windows

      Are we caught in the Ada loop again?

      I was thinking that too.;)

      Ironically if Ada were adopted as was originally intended then most of the memory problems that Rust is said to protect against wouldn't have arisen.

      Equal in irony is, looking at the latest Stack Overflow 2024 developer survey as reported by el Rego/Devclass, Ada doesn't get a mention but I notice that Go is half a nose ahead of Rust but C/C++ is 50%.

      I cannot help thinking if using regexes to solve a problem gives you two problems, then using AI/LLM to solve a problem is likely to give you an uncountable multiplicity of problems. :)

      1. Rich 2 Silver badge

        Re: Are we caught in the Ada loop again?

        Why don’t we all use Ada? Remind me

        I used it VERY briefly many years ago. I know the original compilers were stupidly expensive but that problem went away several decades ago. Is there something “wrong” with Ada? Or is it just not sunny enough?

        1. Binraider Silver badge

          Re: Are we caught in the Ada loop again?

          Good language, but where's the sexy libraries to go with it?

          Most Ada users aren't interested in the shiny.

          Personally I've taken enough punches from third party library support being broken to know that I can't be arsed to deal with them anymore though in some lines of work they are unavoidable.

  12. DaemonProcess

    pointers to functions

    One historical feature of CPUs is that a register can hold an address, which can be used for data or for code: Z80: jp (hl)

    In C you can declare a variable to be a pointer to a bit of data or a function mimicking the old CPUs.

    We haven't needed to write self modifying code in 50 years, either.

    So the point I'm making is that wouldn't it be nicer to have a compiler which does not act so dumb and instead of printing warnings actually calculates the possibility of creating bad addresses and erroring out at that point? It may be better to have a front-end to a compiler which intelligently examines the code we have in the language it was written in, instead of translating bad code.

    I hate seeing warnings when I compile other people's code and hate being told to ignore them.

    1. Boris the Cockroach Silver badge

      Re: pointers to functions

      Well you are looking at some fairly hard to analyse code there, especially when you'd use that sort of thing for a jump table.

      For example (my Z80 code is rusty(ha) and may not be up to spec)

      Called with B register loaded to the jump table entry needed

      LD HL, 16384 ; Jump table base address

      LD DE, 2 ; Size of one table entry

      Label1: ADD HL, DE

      DJNZ Label1

      JP (HL)

      The compiler will need to know things like is there a limit to the B register value entering that section of code

      But a better solution would be making sure that B could not have an illegal value

      Such as

      Bit 7, B

      JP NZ, 16384 ; tests if B is greater than 127; if so, jump to the first entry of the jump table, which would be error handling of some sort.

      Of course the real fun would be in working out the maximum value in the B register if the length of the jump table could be varied too depending on the results in the rest of the program.

      And the compiler is going to catch them?
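      For comparison, a hedged sketch of the same idea in Rust: a table of function pointers where the bounds check is part of the lookup, so an illegal selector can only ever land on the error entry. The handler names and the three-entry table are invented for illustration; entry 0 plays the role of the error handler, as in the Z80 sketch above.

```rust
/// Handler signature shared by every entry in the table.
type Handler = fn() -> &'static str;

fn on_error() -> &'static str { "error" }
fn on_start() -> &'static str { "start" }
fn on_stop() -> &'static str { "stop" }

/// Entry 0 is the error handler.
static JUMP_TABLE: [Handler; 3] = [on_error, on_start, on_stop];

/// Dispatches through the table; any out-of-range selector falls back
/// to the error handler instead of jumping to an arbitrary address.
fn dispatch(selector: u8) -> &'static str {
    let handler = JUMP_TABLE
        .get(selector as usize)
        .copied()
        .unwrap_or(on_error as Handler);
    handler()
}
```

      So `dispatch(1)` gives "start" while `dispatch(200)` gives "error". The compiler does not work out the maximum selector value here; it simply makes the unchecked jump impossible to write outside `unsafe`.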

      1. DaemonProcess

        Re: pointers to functions

        Yeah nice response!

        djnz is a great example - you can interfere with the value of the bc register pair mid loop and cause chaos.

        Therefore the checks need to be done on the source code. Hence back into the old Ada vs. C debate now with rust taking over from Ada.

        The proposal would require taking the ML up to 6 sigma.

    2. Richard 12 Silver badge

      Re: pointers to functions

      Actually, the vast majority of code that currently exists is self-modifying.

      Executing Javascript, or even C# and other p-code based software is using the self-modifying features of the underlying hardware.

      So we can't take that away without killing Javascript entirely.

      Some might say that's a price worth paying, of course.

    3. Nick Ryan Silver badge

      Re: pointers to functions

      In CPU terms there is no difference as the data is just a number. What the number is used for is important.

      The key point that many do not understand is that it is critical in efficient code to be able to manipulate jump addresses for code. The reason is jump tables and references where there are various options the code needs to take and it is incredibly inefficient to operate this as a series of comparisons, with the inefficiency growing the more options there are. With a jump table there is a single operation performed and, for example, if there are 20 options then this is just a single (standard CPU) instruction, if this is implemented inefficiently as a series of comparisons then there would be 40 instructions in total, with only the first option being the equivalence in efficiency and with every subsequent test adding to further inefficiency (the last option would have to go through many comparisons, although for huge sets a b-tree would make it much faster).

    4. Skwn

      Re: pointers to functions

      Let's allocate memory for a string, including space for the terminating null character, and memset the allocation to nulls. Now use strncpy to copy a string that is smaller than the allocated space. Because the space is pre-memset to nulls, this is a valid step. Most compilers would spew an empty warning about it for no reason. Now, if real memory address tracking were embedded in compilers, even at the cost of virtualizing the whole of physical memory to do such tracking at compile time, then you would have robust memory management right from compile time and would not have to worry about it at runtime.

  13. Anonymous Coward

    Safety critical

    Last time I touched pure C was to write drivers to SIL IV.

    Good luck translating that to rust with the register accesses.

    Good luck proving SIL IV level of testing has been reached...

    C/C++ is as safe or as unsafe as the developer wishes.

    Rust in peace.

    1. fg_swe Silver badge

      Destructive Reasoning

      Of course there must be a small part of the code which is unsafe. Your embedded periphery meddling (A/D converter, PWM, clock setup etc) cannot be covered by Memory Safety. But that does not mean the 99% of OTHER code should not be memory safe. Locking down 99% of code is definitely a very serious gain of safety and security. For example, you will discover dangerous memory errors during extensive Unit, Software and HIL testing.
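      A minimal sketch of that split in Rust, assuming a made-up PWM peripheral: the volatile register access is the only `unsafe` part and sits behind a safe wrapper. `PwmRegs` and `Pwm` are hypothetical; on real hardware the register block would live at a fixed address from the datasheet, whereas here it is an ordinary struct so the example runs anywhere.

```rust
use core::ptr;

/// Hypothetical peripheral register block (stand-in for memory-mapped I/O).
#[repr(C)]
struct PwmRegs {
    duty: u32,
}

/// Safe wrapper: the only code that touches the register with `unsafe`.
struct Pwm<'a> {
    regs: &'a mut PwmRegs,
}

impl<'a> Pwm<'a> {
    fn new(regs: &'a mut PwmRegs) -> Self {
        Pwm { regs }
    }

    /// Sets the duty cycle in percent, rejecting values above 100.
    fn set_duty(&mut self, percent: u32) -> Result<(), &'static str> {
        if percent > 100 {
            return Err("duty cycle out of range");
        }
        // SAFETY: `regs` is a valid, exclusively borrowed reference.
        unsafe { ptr::write_volatile(&mut self.regs.duty, percent) };
        Ok(())
    }

    /// Reads the current duty cycle back from the register.
    fn duty(&self) -> u32 {
        // SAFETY: same reasoning as in `set_duty`.
        unsafe { ptr::read_volatile(&self.regs.duty) }
    }
}
```

      Everything that calls `set_duty` stays in safe code, so a review of memory safety only has to look at the two `unsafe` lines.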

  14. annodomini2
    Black Helicopters

    Safety Critical 2

    Most Safety Critical systems don't use dynamic memory management.

    The MISRA standard specifically forbids it, assuming it's being used.

    I'm not saying C/C++ is perfect, far from it.

    But to me it does scream "Use this, it's better!" Ignoring the convenient (for them), but convoluted back doors we've put in the code to access your system if we need to.

    1. hammarbtyp

      Re: Safety Critical 2

      Misra does not explicitly forbid anything. What it says is "here be dragons" use carefully

      There are good use cases for dynamic memory usage, gotos, etc. it's just you need to be aware of the possible pitfalls and make efforts to put extra care

      A good static analysis tool can spot most of these. The bigger challenge is concurrency. This is where rust wins from the outset and static analysis can't help you

      1. SCP

        Re: Safety Critical 2

        SPARK Ravenscar profile!

        As with single-threaded code care needs to be taken in the design and there are restrictions - but formal analysis of useful code is possible, and capabilities are continuing to evolve. Having a language that inherently protects against certain types of error helps greatly in alleviating the burden of analysis.

    2. fg_swe Silver badge

      WRONG

      Memory Safety and dynamic memory management are NOT the same. Even in totally statically allocated systems (most realtime/embedded systems in Auto, Rail, Medical, Aerospace are of this type) you can still have Index Errors and Bad Pointers, if you use C.

  15. Anonymous Coward

    And another thing …

    I'd point to the large world of open source code, and just as well, all the code used across the defense industrial base.

    Yup, the defense industry is sure to want to dump its Top Secret C code into an LLM that will do who knows what with it, and subsequently spaff it out to who knows whom.

    1. An_Old_Dog Silver badge
      Joke

      Top Secret Code

      And that's the beauty of it! The LLM will probably morph the Top Secret Code in ways which add bugs. Or, it might leave the code untouched. Since The Enemy won't want their missiles, jets, subs, or defense radars run by possibly-extra-buggy code, they'll have to carefully review that code.

      By the time they've finished the review, and re-coded, and tested where needed, to remove the LLM-introduced bugs, the device the code controlled will be obsolete!

      The Enemy would have been better off just carefully designing and writing their own code, but they were seduced by the "get something for free!" mentality.

      I think that problem and solution apply to non-military contexts, too.

      1. Anonymous Coward

        Re: Top Secret Code

        When using a compiler for SIL IV we had to turn off all optimisations.

        The compiler and version had to have been tested to the n'th degree to make sure all permitted constructs and usages had been tested and that the generated assembler was as expected...

  16. Dostoevsky Bronze badge

    Who knows...

    ...it might even work.

    My experience with LLMs—probably a year out-of-date by now—is that they did a decent job of translating the usual business logic. But weird memory access or pointer math was a bit too big a byte to chew. Haha.

    1. heyrick Silver badge

      Re: Who knows...

      I asked ChatGPT to translate a chunk of ARM assembler into C.

      It did a very brain-dead line by line translation that was essentially the assembler code rewritten as something a C compiler ought to accept (only it didn't due to various errors) and due to limited remembrance of what it had already done, there were several fundamental logic flaws in the generated code, clauses that no longer existed at all, and other clauses that ended up executing the wrong bits of code due to this.

      I estimated that fixing the mess would have taken longer than reading the assembler, writing down on a piece of paper what it was actually doing, then writing some new code in C to do that.

      All of this AI/LLM stuff is a great toy, but note well that there is no intelligence whatsoever involved.

      If you ask for a picture of a little girl riding a flying bear, you'll get a picture of something that looks sort of like a small female human who may or may not be on top of or somehow oddly merged with something that looks like a bear.

      Likewise, if you ask for programming language X to be translated to programming language Y, you'll get something that sort of resembles Y but might be somehow oddly merged in ways that defy logic (and the compiler).

      Wouldn't it be so much simpler to recompile the existing C code with something that adds memory checks (which, arguably, should have been a mandated option in C99 since this is not a new problem)? Yes, it'll be a little bit slower, but it ought to be clear by now that safe and fast are mutually exclusive.

  17. Doctor Syntax Silver badge

    Some of us remember what happened when somebody sanitised the memory access of the SSL random number generator.

  18. Howard Sway Silver badge

    In theory, C is a small, simple language

    In practice, well take a look at the International Obfuscated C Contest and decide whether you think an AI converter is going to do a good job of translating any of that into something legible...

    Then there is the small matter of wrongly thinking that the converter is going to guarantee memory safety. Vast swathes of C code are anything but memory safe, and often use the worst kind of "clever-clever" programmatic cheese to save (in theory) a processor cycle or two. If the converter is going to be anywhere near accurate, I'm presuming that all that unsafe stuff is going to have to be converted accurately, yet made safe somehow.

    1. Paul Kinsler

      Re: Obfuscated C

      I wonder if a use of ML here could be as a de-obfuscator; presumably it wouldn't work very well but as a first step to e.g. renaming variables to match an inferred use, attempting to explain weirdly compounded constructions, etc; it might be a time-saver...

      1. TheMeerkat Silver badge

        Re: Obfuscated C

        > I wonder if a use of ML here could be as a de-obfuscator

        Converting to a different language always obfuscates code.

  19. abend0c4 Silver badge

    Been there...

    My second job in IT (my first was trying to persuade naval architects that the ICL FORTRAN they'd been using wasn't going to work on their new IBM system: even converting programs between ostensibly the same language can be next to impossible when they rely heavily on "undefined" behaviour) was trying to convert a series of programs written in a proprietary dialect of BASIC into something more portable and maintainable. That was equally intractable: the contortions required by the constraints of the language meant that any direct translation would be equally obscure - and no one had bothered to document what the code was actually supposed to do, so it was at best risky to infer a particular interpretation of the code. I seem to recall that some of the code depended on string values that could be dereferenced as their equivalent variable names...

    C has almost infinite scope for abuse, so the best of luck to whoever is sufficiently unfortunate to get to check the AI's working.

    1. Paul Crawford Silver badge

      Re: Been there...

      I translated some old Fortran (pre-77 syntax) in to C to make it more portable, and while the f2c tool did most of the donkey work, I had a horrible mash of C code to deal with and it took weeks of effort to understand what the old code did in order to try and re-factor it in to something sane in C. Hint - Fortran allows multiple entry points to a subroutine, not just the multiple exit points we know and love via 'return' today, working out why they did that and how it all worked was not easy as the code and related documentation (which covered the maths reasonably well) was rather terse.

  20. Anonymous Coward

    This just sounds like the Rust marketing department wasting our time again.

    They should spend less time lobbying and more time improving the compiler to have direct interop with C without needing to use bindings generators (SWIG/bindgen) or manual maintenance work. It would also reduce the absurd number of dependencies people pull in from crates.io, damaging the concept of Rust further.

    1. chololennon

      Too much PR...

      > This just sounds like the Rust marketing department wasting our time again.

      > They should spend less time lobbying...

      You're right.

      Also, the fanboy attitude of the Rust community is ruining the language reputation. The language is not perfect (and of course, it is not the only language around), but they insist that Rust is the only way to go in every forum/article/post/etc. The situation reminds me of the 90s, when Java came on the market.

  21. Anonymous Coward

    Too late, too little

    Memory errors haven't been a problem in code I shipped for around 10 years. The analysis tools for C (dynamic and static) are good enough that's no longer a problem.

    I will concede, getting to the point where code AND test suite is free from those classes of errors was an effort but possibly less effort than rewriting it all in Rust.

    So once the memory errors are gone, what's left ?

    The real problems perhaps ?. Garbage code, slow and bloated code and code with security issues. That would actually be useful :)

    1. fg_swe Silver badge

      Claim

      "Memory errors haven't been a problem in code I shipped for around 10 years."

      Mr Ivan out of Tshelyabinsk might have a word with you after he stared at the decompilation of your code for 30 days.

  22. Anonymous Coward

    New Silver Bullet

    Use Armor-All™ to restore that shiny, steely, new-metal glow to your Rusty code!

  23. razorfishsl

    It's nonsense.....

    since there are cases where you can NEVER predict the behavior of a program.

    Threaded or I/O code in particular....

    1. fg_swe Silver badge

      Also Not Really Correct

      A well designed program written in a memory safe language can indeed perform safe multithreading. It just requires the language to MODEL multithreading in the type system. As opposed to C and C++, which simply bolted on multi-threading to a single threaded memory model.

      E.g. http://sappeur.di-fg.de/manual.pdf Section 9.2
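      As a sketch of what "modelling multithreading in the type system" can mean in practice, here is how it looks in Rust (not Sappeur, whose syntax is in the manual linked above). The function name `parallel_count` is invented for the example. Shared state has to be wrapped in thread-safe types such as `Arc<Mutex<_>>`; handing a plain `Rc<RefCell<_>>` to another thread is rejected at compile time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each add `per_worker` to a shared counter.
fn parallel_count(workers: usize, per_worker: u64) -> u64 {
    // Arc gives shared ownership across threads; Mutex guards the data.
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // The value is only reachable through the lock.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

      With four workers adding 1000 each, `parallel_count(4, 1000)` returns 4000 every time, because the unsynchronised version of this code simply does not compile.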

      1. Rich 2 Silver badge

        Re: Also Not Really Correct

        Why do you seem to have to mention “sappeur” (whatever the hell that is) in almost everything you post?

        1. Anonymous Coward

          Re: Also Not Really Correct

          It's their invention that will Save The World.

          If only they can convince someone, anyone, just one person, please, please use it I beg you...

  24. Kevin McMurtrie Silver badge

    What kind of volunteers...

    Someone needs to guide and validate the AI output. Experienced developers who know the old codebase probably wouldn't want anything to do with a big Rust translation initiative. That means work will be done by low-paid government staffing pools, Rust fanatics, and agents of hostile governments.

    Good luck with this making things safer.

    1. Bebu Silver badge
      Windows

      Re: What kind of volunteers...

      [this]work will be done by low-paid government staffing pools, Rust fanatics, and agents of hostile governments.

      If the Rust fraternity is anything like other language chauvinists these Rust fanatics would be little more than language lawyers that could split the finest syntactic and semantic hair with the greatest of ease but are incapable of coding the simplest of routines.

      There was a 1990s Dilbert cartoon featuring a meeting with one of these type's brain exploding when presented with real code.

      1. fg_swe Silver badge

        Re: What kind of volunteers...

        "language chauvinists"

        Can I have that with some feminist pickle ?

  25. G40

    When will it end?

    The massive lack of understanding betrayed by edicts like this nonsense from DARPA is staggering. Confusing extreme legacy C code with any contemporary C++ is like mistaking a horse for a TGV. Please stop now.

    1. fg_swe Silver badge

      No

      C++ still has essentially the same Memory Bug Potential as C had.

      http://sappeur.di-fg.de/WhyCandCppCannotBeMemorySafe.html

      In real-world, large scale C++ based projects they usually run test cases with valgrind. A very slow way of detecting memory bugs in test code with test input. Still does not defend against other, well-crafted hostile program input.

  26. captain veg Silver badge

    correct and idiomatic

    "The current generation of tools still require quite a bit of manual work to make the results correct and idiomatic"

    I still remember this nugget from K&R:

    void strcpy (char *s, char *t)

    {

    while (*s++ = *t++)

    ;

    }

    Good luck with that.

    (If anyone knows how to get this commenting system to respect the [pre] tag, do tell.)

    -A.
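    For the curious, one possible hand translation of that nugget into safe Rust is sketched below. It is an assumption about what "correct and idiomatic" output might look like, not anything TRACTOR has produced; `strcpy_checked` is an invented name. The one-liner grows a length check and two failure cases that the C version leaves to the caller.

```rust
/// Copies bytes from `src` into `dst` up to and including the first 0 byte.
/// Returns the number of bytes before the terminator, or an error if `src`
/// has no terminator or `dst` is too small to hold the string plus the 0.
fn strcpy_checked(dst: &mut [u8], src: &[u8]) -> Result<usize, &'static str> {
    let len = src
        .iter()
        .position(|&b| b == 0)
        .ok_or("source is not NUL-terminated")?;
    if dst.len() <= len {
        return Err("destination too small");
    }
    // Copy the string and its terminator in one bounds-checked step.
    dst[..=len].copy_from_slice(&src[..=len]);
    Ok(len)
}
```

    Whether a tool can infer those two error cases from the original loop, rather than wrapping the pointer arithmetic in `unsafe`, is exactly the open question.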

    1. fg_swe Silver badge

      Fast

      ...Food Code.

      Quick, dirty, unhealthy.

  27. Will Godfrey Silver badge
    Boffin

    A question

    What language is used to write the Rust compiler?

    1. fg_swe Silver badge

      Re: A question

      Any Turing complete language can in theory be compiled by a compiler written in itself.

      Rust uses a serious chunk of C++, but this is done for purely economic reasons.

      Rewriting llvm in Rust would prolly discover loads of memory errors inside llvm.

  28. Binraider Silver badge

    Writing new would rather seem to be more appropriate than converting?

    If someone complains about the size of the codebase to rewrite, take advantage of the rewrite to shrink it and optimise.

    For all of Rust's laudable ideas, it needs one of the big vendors buying into it with one of their flagship products to really make it fly. Combine that with some actual standards rather than moving goalposts and they will get traction.

    I'd swap, but I can't afford the non standardisation risk. (Mostly time to rewrite).

    1. fg_swe Silver badge

      Google already does a lot of Rust work in Android. With great success, it transpires.

      1. Binraider Silver badge

        Downvotes didn't come from me - though I don't know if the relatively containerised situation that is Android apps is a good test? Each app is "free" to use whatever version of Rust it wants to use internally to compile to whatever.

        Transfer code from one fork of rust to the next, as long as it lacks a standard, and it will break.

        I can't refer to a given block of code as "Rust 23" in the same way that a block of C++ 2017 has a particular meaning.

  29. Howard Sway Silver badge

    I hope there's a complete C preprocessor included in the AI too....

    The C preprocessor is of course legendary for the ways it can be abused, and setting an AI loose on source code without running it through the preprocessor first could be disastrous....

    #define TRUE 0

    1. fg_swe Silver badge

      Anti-Modular Feature C Preprocessor

      More solid engineering approach is:

      1.) Do NOT include a macro processor into the language

      2.) Use a separate macro processor such as m4 to perform specialization of generic code.

      3.) Expand the macros to disk. This makes crystal-clear what the macros do and debuggers can show what is REALLY going on.

      1. Richard 12 Silver badge

        Re: Anti-Modular Feature C Preprocessor

        So prohibit cross-platform and configuration by design?

        Good luck convincing anyone to do that.

        Incidentally, every single C and C++ toolchain can dump the output of the preprocessor to disk. That is literally how the first toolchains worked.

        (Well, actually it was tape but close enough.)

        They don't do it by default anymore because RAM is orders of magnitude faster than disk.

        1. fg_swe Silver badge

          Re: Anti-Modular Feature C Preprocessor

          Don't really get your point. M4 can run on any computer providing a POSIX API, with or without a shim à la Cygwin. I assume it runs on z/OS, too.

    2. sitta_europea Silver badge

      Re: I hope there's a complete C preprocessor included in the AI too....

      "The C preprocessor is of course legendary for the ways it can be abused..."

      First time I ever saw somebody (ab)use the C preprocessor the guy had this comment right at the top of his code:

      /* PREPROCESSOR ABUSE IS NEAT! */

      I guess that was around 1977.

  30. ecofeco Silver badge
    Facepalm

    Oh this will end well

    Not.

  31. Anonymous Coward

    History Repeats ... like spicy meatballs !!!

    DARPA will spend the GDP of a small country to chase a goal that cannot be reached !!!

    The people that know will be ignored, as usual, and DARPA will change the goalposts somewhere along the journey to create something that is almost the original desired result *but* much less useful or usable ... of course, it will also be 3 years too late.

    The company that wins this 'little' RFT will fund its expansion on the back of DARPA's largesse.

    P.S. By the time this appears there will be something even better than Rust [suggested name FeO (a rarer form of rust)] !!!

    [Solution to problem: ... How about a Rust to [FeO] Translator ... rinse & repeat ... ]

    :)

  32. ghp

    In my first contact with AI, I asked it to write me a routine to calculate a factor without using recursion. It wrote a little program using recursion. Pleasantly surprised, I told it "you're using recursion!". "No I'm not", it replied.

    May the Lord have mercy on us.

  33. Ian Johnston Silver badge

    If it is possible to translate C code into a safe language and then compile it into a safe binary, should it not also be possible for a C compiler to produce a safe binary from the same code?

    1. SCP

      Yes. Use an AI based compiler that generates binary that does what you intended rather than what you wrote - or more accurately does what it "thinks" you intended.

      In practice it would be better to keep the compiler more directly focussed on correctly implementing what has been written (or rejecting code that is incorrectly expressed) according to the language rules - and C has rather permissive rules. Analysis for design/implementation errors in the program source code is better undertaken by other tools, and these tools can cover different aspects of "correctness" - from basic lint-like tools to deep formal verification tools like Polyspace/AbsInt.

      An advantage of shifting source code from one language to another is that of future maintenance - particularly if the original language repeatedly leads to code writing errors that are precluded by the new language. (For sure, this is not the only consideration in choosing the source code language to be used - and often the choice made is a compromise).

      You could attempt to preclude classes of error by subsetting the source code language (this is often done in high-assurance software development) but this also has drawbacks.

      1. Ian Johnston Silver badge

        I was thinking of rolling the "translate into a safe language and compile that" into a C compiler. In other words, if C can be translated and compiled safely, it's a safe language. If it can't then nothing will make it safe ...
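
        A minimal sketch of the distinction being argued over here, assuming a C original that reads `buf[i]` through a bare pointer. The function names are hypothetical and chosen only for illustration. A Rust slice carries its length, so the read can be checked at run time; a mechanical translation that keeps the bare pointer compiles but needs `unsafe` and checks nothing.

```rust
/// Checked read: a Rust slice carries its length, so an out-of-range
/// index is reported as `None` instead of reading stray memory.
fn read_checked(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied()
}

/// Roughly what a literal translation of C's `buf[i]` on a bare pointer
/// looks like (hypothetical example): no length is available to check.
///
/// # Safety
/// `ptr` must point to at least `i + 1` readable bytes.
unsafe fn read_unchecked(ptr: *const u8, i: usize) -> u8 {
    // The caller, not the compiler, is responsible for the bounds here.
    unsafe { *ptr.add(i) }
}
```

        The safe version is only possible because the length travels with the data; recovering that information from existing C code is the part a translator would have to get right.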

  34. Cliffwilliams44 Silver badge

    So, AI is not the fairy godmother!

    IBM pushing to use AI to convert COBOL to Java!

    DARPA pushing to use AI to convert C/C++ to Rust!

    In the words of Timmy Turner: "What could possibly go wrong?"

  35. Cem Ayin
    Joke

    Training material?

    I'll admit that LLMs are not exactly my area of expertise, far from it. So my understanding of the matter may be completely wrong; anyway my current understanding of LLMs is roughly this:

    LLMs compute, during their training phase, the probabilities that certain output sequences of words somehow "match" given input sequences of words, with "matching" meaning that a human reader will recognize the output sequence as a valid, sensible response to the input sequence. To achieve this, they need to process several gazillions of such pairs of sequences as have been previously established to "make sense" to a human reader, which is precisely where there is the rub... (And, of course, LLMs don't "understand" a thing about those input or output sequences, which is why they'll happily recommend eating stones as part of a healthy diet or resolutely confirm that a rucksack will, on average, perform no worse than a parachute in arresting a free fall from an altitude of several thousand feet if that happened to be in their training input, their understanding of irony or sarcasm being somewhat limited.)

    If this is so, then training an LLM to translate C program code to functionally equivalent Rust program code - including cases where e.g. code doing crazy pointer arithmetic has to be painstakingly refactored rather than just "translated" - would require a gazillion or two of such pairs of code, which are not currently to be had for love or money. Ok, ok, the requisite training material *could*, in theory, be had for *money*, by hiring competent programmers to perform the translation job manually, but you'd need *an awful lot* of people, time and money, possibly more than could ever be saved by the resulting LLM, which is where the Ouroboros will bite its tail.

    But hey, maybe this was precisely the idea:

    "Oh, man, if only we could get rid of all that crappy C code in our mission-critical software!"

    "Well, all you need to do is re-write the code, look there's already a project to re-implement POSIX shell utilities in Rust, we just need a lot more of such projects."

    "Bah, you know as well as I that we'll *never*, *ever* get the funding for such an undertaking, code refactoring just isn't 'trendy' enough. These days, all the investment money, as well as public funding, goes to stuff that you can label 'AI', who cares about mundane tasks such as refactoring the basic foundations of our IT infrastructure!"

    "Did you just say 'AI'? Hey man, I think I have just conceived a cunning plan..." XD

  36. Sammy Smalls
    Coat

    All your code belongs to us/US...

    Sure, pump all your code through a US agency to be improved. Maybe not so much TRACTOR as Harvester.

  37. Blackjack Silver badge

    [DARPA suggests turning old C code automatically into Rust – using AI, of course]

    And who is gonna check that code afterwards? Because AI is if anything more error prone than humans.

    1. Nick Ryan Silver badge

      And who is gonna check that code afterwards? Because AI is if anything more error prone than humans.

      LLM and text generation tools don't do anything original: they merely take previously written, human-generated content as input, look for patterns and tag it. Then, when outputting, the LLM's output component tries to apply the appropriate tagged elements and patterns and dumps the result out.

      So the input is whatever, possibly junk, a human wrote.

      The LLM middle layer decomposes this and tags it.

      The output remixes the elements and outputs it, with no understanding of anything ever. An LLM has no domain of knowledge or capability to make a model in order to compose an output, therefore it's just output by rote.

      For simple scenarios it works, but the more complicated things become, and they get complicated very quickly, the more useless an LLM tool is. For example, ask an LLM tool to generate an image of an animal and there is a good chance that it will have the wrong number of legs, arms, eyes, etc or that they will be put in impossible angles or positions. Why? Because the LLM has no domain of knowledge that the animal has a given structure, four limbs, with defined joint positions, and so on... therefore what comes out is a dystopian horror show.

      1. Blackjack Silver badge

        That's not AI then, it's just the automated text and code replacement you can do in many ways. Calling something you can do with a simple script "AI" is ridiculous.

        1. Richard 12 Silver badge

          They call it AI for marketing reasons.

          It's also not a script. Scripting generally produces far better output, but requires a lot more direct human effort to create.

          Large Language Models just need to be fed huge amounts of human-generated text, which can be trivially nicked off the Internet on the assumption that nobody will notice.

  38. O'Reg Inalsin

    Follow the template

    It will unfold like this -

    1. Fire all the experienced programmers - who incidentally have all gone through a rigorous Defense Industry security screening.

    2. Except the one expert in ChatGPT prompts who needs to tell ChatGPT to "just do it".

    3. When that fails, execute Plan B: outsource the work to cheaper contract workers over the internet, giving them each a subscription to ChatGPT.

    4. When it turns out that the hired contractors are riddled with North Koreans posing as Americans from Nebraska, go into blame deflection mode and claim victimhood.

  39. JonKernPA

    Adding any Test-driven aspects?

    I wonder if TRACTOR is considering first writing a test suite to cover the original codebase's functionality (if none exists).

    Then TRACTOR BEAM the tests to Rust as well, and therefore have a closed-loop means of ensuring the translation is a functional equivalent...
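
    A rough sketch of that closed loop, under stated assumptions: the C routine (`uint32_t checksum(const uint8_t *p, size_t n)`, summing bytes with wrap-around), its Rust translation, and the `GoldenCase` table are all hypothetical, and the expected values below are worked out by hand for illustration rather than recorded from a real C build.

```rust
/// Hypothetical Rust translation of the assumed C routine: adds up all
/// bytes, wrapping on overflow as unsigned arithmetic does in C.
fn checksum(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// One recorded case: an input and the output expected from the original.
struct GoldenCase {
    input: &'static [u8],
    expected: u32,
}

/// Returns the indices of the cases where the translation disagrees
/// with the recorded behaviour of the original.
fn mismatches(cases: &[GoldenCase]) -> Vec<usize> {
    cases
        .iter()
        .enumerate()
        .filter(|(_, c)| checksum(c.input) != c.expected)
        .map(|(i, _)| i)
        .collect()
}
```

    In use, the table would be filled by running the original C build over a corpus of inputs, and an empty result from `mismatches` would be the pass condition. This only shows agreement on the inputs tried, not functional equivalence in general.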

  40. halfstackdev

    Re-encoding human behaviour using Tractor AI

    Given that human thoughts and behaviours can essentially be digitised.. we should force all Citizens to take a surgically implanted AI chip that will intercept all neural actions … rewrite them in Rust on the fly .. and then re-output them onto the neural pathways.

    The proven power of Rust to detect and correct mistakes is legendary.. so why not apply it to humans ?

    No more traffic accidents, food cooked to perfection every time, trains running on time every time, sports stars will shoot a perfect goal with every kick, no more awkward social interactions, no more romantic rejections, pass any exam, take over and land an airliner when the pilot has a cardiac arrest, win an F1 race on your first attempt, win every hand at blackjack, and always roll a 6 on every die… etc etc.

    End hunger, war, poverty, disease, addiction, nasty tweets, and all forms of incorrect thoughts.

    We can do this !
