The Register Home Page

back to article Anthropic's Claude Opus 4.6 spends $20K trying to write a C compiler

An Anthropic researcher's efforts to get its newly released Opus 4.6 model to build a C compiler left him "excited," "concerned," and "uneasy." It also left many observers on GitHub skeptical, to say the least. Nicholas Carlini, a researcher on Anthropic's Safeguards team, detailed the experiment with what he called "agent …

  1. elsergiovolador Silver badge

    tasked 16 agents with writing a Rust-based C compiler

    That's how you would bootstrap C if compilers for it didn't exist, so the job is incomplete as typically the compiler should be written in the same language it compiles for.

  2. El Duderino
    Meh

    "Claude can't tell time"

    Not very 'intelligent' then, is it?

  3. MrRtd

    LLM barely passes open book test.

    1. Sampler

      What I don't get every time I see one of these articles; AI barely made something not really functional in the real world, but if we squint a bit and forget several things it's kind of amazing it nearly did something..

      I guess it's in belief that it will get better and someday work, but, it's chaos inherit in the system, they work on probability from their training data with a few things tacked on, calling them AI is laughable as there is no intelligence, just chaos iterated until it's close to being right, but never knowing what right actually is.

      Can't wait for this to be realised, that AI will always be "nearly useful" but never cross the threshold, and we can go back to being able to afford ram and hard drives again..

      1. 66663333

        insurance coverage

        Based off of your observations, I think that's what I believe so far. It's a threat for sure but can it be trusted enough to replace humans. I think it'll replace some humans unfortunately but I hope something happens where it doesn't.

        One thing is if you can't trust the output of the AI, and businesses start using AI to replace humans, will their insurers drop them or maybe just raise insurance premium.

        Who wants to bet millions or billions on potential losses should a fully ai team write an important software program? I don't think insurers would, but if the client can prove that they have a rigorous human inspection then maybe the insurer is more likely to cover the company.

        I think with humans insurance companies already know the trials and tribulations, pitfalls, etc. thus they can manage traditional human based risk, but if ai comes in how would an insurance company even tell what the risk is?

        I mean, I think some businesses don't even exist today because they can't get decent insurance coverage.

        1. steviesteveo

          Re: insurance coverage

          It's going to be a real block - summarised emails where you've still got to read the email because if the summary missed something that's still on you

          It really highlights the careless people thing- "this stuff works alright but if it breaks it breaks and you should have checked it, I guess"

  4. nematoad Silver badge

    The polluter pays principal.

    ...programmers deploying software they've never personally verified

    I know that it goes against all the EULAs that have been written in the last thirty odd years, but how about making developers and the companies they work for legally liable for the dross that they turn out. If someone knew that they might be financially on the hook for mistakes made by these so-called AI agents then perhaps they might be a bit more circumspect in how they used said agents.

    1. kuiash

      Re: The polluter pays principal.

      I was a contractor for decades and it is absolutely the case that I was liable. That's why I took out millions in insurance cover. If I'd used LLM tools I'd still be liable.

    2. Horridbloke

      Re: The polluter pays principal.

      The disadvantage to that is any programmer who isn't terminally stupid would immediately quit and do something else for a living.

  5. Crypto Monad

    I think this is only possible because there are existing standards documenting the C language in reasonably formal detail - and many existing test suites which (I expect) would be re-used.

    Using vibe coding for some vaguely defined task like "build a business automation system" is likely to be much harder. SAP need not worry just yet.

    1. DS999 Silver badge

      Not to mention

      Being trained on open source compilers such as gcc and llvm.

      I'd be more impressed if they'd made up a new language and got it to write a compiler for it from spec, so it couldn't have the crutch of any code or unit tests it finds on the web.

      1. Chris Gray 1
        Devil

        Re: Not to mention

        Fully admitting that I've never tried any AI. I'm familiar with old-fashioned AI from decades ago, though.

        As my decaying brain struggles with problems in my compiler, I amuse myself thinking how one of the LLM's would handle it. A completely new implementation of all of a compiler, written in itself, a language that no LLM has ever seen. Could be amusing, but I don't have the time.

        1. RoboJ1M

          Re: Not to mention

          It's fascinating, you should give it a play.

          I've spent the last two months bouncing ideas off of Gemini. I'm not interested in it coding anything but used it like guided learning.

          For example, I'm making a debug harness for a vintage games console.

          In a software engineer, 30 years, but I don't know electronics. Or I *didn't*

          I'm still reading and learning normally, but I can give it a high level problem ("can I change the system to me this instead?" How do you connect these two things together to do X?") and it'll respond with " yes you can or no you can't, your new idea is what the industry calls XYX" and points me at the subject I need to learn.

          No more trying to scour forums to see if my idea will work, or posting and waiting for someone to respond.

          No more endlessly picking through datasheets to figure out exactly which type of variant of microchip indeed, it'll make suggestions.

          And it just "gets it"

          It'll immediately twig what I'm trying to do and I have something that can think and communicate as fast as I can, it can keep up with me.

          All those funny little ideas that pop into yours and my minds, it gets them and starts helping.

          And every idea that gets validated spawns another 10 ideas because I get the concept of the previous project and it's solution.

          So definitely, pick one off those ideas that's rattling around

          The ones where you aren't sure it'll work, don't really know exactly where to start reading the documentation for the subject you'll need or indeed which subject you need to learn in the first place.

          And then just write it down and ask if it'll work.

          Bounce some questions off of it and then go start making the idea.

          I find "how do I?" Questions Work much better than "make this for me"

          1. nonpc

            Re: Not to mention

            When doing a Computer Studies A level in evening classes as I switched from hardware to software engineering I submitted a project where I had written a floating point handler in machine code to add to a very basic BASIC ROM (on a Newbury Laboratories terminal). I created a decompiler which translated the machine code to assembly language so that I could check what I thought I had coded. My examiner considered that to have been a qualifying project in its own right. That was back in the '80s.

      2. prandeamus

        Re: Not to mention

        This is so important. People have been working on C compilers for, what, 50+ years now? There's wide range of discussions and source code for the most recent ones, plenty of academic discourse about the "right" ways to build compilers, tons of open discussions about strategies for various CPUs. And there are the various formal C standards. I'd be impressed if the agentic LLM approach came up with something *better* but there seems no evidence of that happening.

        1. Anonymous Coward
          Anonymous Coward

          Re: Not to mention

          The real kicker is that with 'ALL' that is available to 'learn' from ('Steal' in everyday language) as you have stated, it is not able to equal what is already in existence.

          All that exists is there for the taking and it still cannot 'create' something that is at least as performant as the average C compiler.

          The same point is being proven over & over again ... yet the search goes on for the magic 'AI' that 'lay' golden eggs !!!

          :)

          1. JDX Gold badge

            Re: Not to mention

            I find it hard to understand this sort of comment.

            How long would it take you, or I, as competent technical people, to create an 'average' compiler? We have access to all the same material AI does (weird it's not stealing if a person does it) as well as paywall material that it doesn't. We can also ask questions on StackOverflow or reddit, etc, which I'm sure we probably would need to (again this is not stealing other people's work) and it would probably take months to get a shonky beta product - what the AI is at now - and years for a polished product.

            The suggestion "The same point is being proven over & over again" is just flat out wrong. If you go back even a couple of years, it couldn't do this - it might get partway and show some 'promise' to those keen on it, but cyncics would dismiss it as unworkable. In a remarkably short time it has got to the point where it can produce something at the fraction of time and cost of a human to get to the same (not very good) standard.

            If you follow this (say GPT 3 -> 4 -> 5) the jumps are remarkable and yet people focus on "well it's not perfect so it's junk" rather than thinking "so what will it be like in another year or 2".

            And no I'm not an AI advocate or apologist. THere's loads of things it's terrible at, but it is getting better at a considerable pace. Many things it was laughable at , it can now do quite competently..

      3. Not Yb Silver badge

        Re: Not to mention

        See this article about half-way down, where they talk about "cursed". It's not an exact example of 'making up a new language', but it's reasonably close. Still not a good idea, and it didn't work that well... but the compiler does exist.

    2. thames Silver badge

      Yes, the actual blog post mentions that they relied upon having a very extensive and high quality set of tests developed by other people. They also used GCC extensively so that when the output of their own compiler didn't work they could compile most of it with GCC and a small part of it with the AI compiler and do this repeatedly to narrow down where the problem was.

      So in order to use this methodology, you need to have a very complete set of known good high quality tests and a high quality known good compiler. Once you have those you can then create a poor quality compiler using AI.

      The author said he was doing it to try to push Claude Code to its limits to see where it would fail.

      In terms of developing practical applications however, it's fairly useless unless your goal is to clone a project which has very comprehensive and high quality tests and which allows you to slowly substitute the AI's work in place of the original so you can incrementally find the failure points.

      There's not a lot of software out there that meets those criteria and it's of no help at all in terms of creating genuinely new software. It's basically a tool for cloning existing software under a new license while possibly skirting around copyright law.

      1. prandeamus

        "The author said he was doing it to try to push Claude Code to its limits to see where it would fail."

        That is a genuinely useful and honourable thing to research. It's too easy to say "woo, it wrote a compiler".

      2. sten2012

        On the flip side, taking a legacy code base that everyone is scared to touch:

        Having a process like this write as robust tests for it as possible and constantly verifying the existing application against them then porting that code to a more presently supportable (from a staff perspective) language. Now with test suites!

        Seems like a great application to try out and well worth a 40k price tag (doubling 20k to account for the initial test suite building) for many organisations if successful.

        Even if humans end up rewriting that final code or contextualising those tests, if that's what it takes to migrate from COBOL (or whatever). Essentially PoCing that project that the business never starts.

        1. Anonymous Coward
          Anonymous Coward

          Trying out something a bit like that, using the latest Claude Opus and a messy code base (the specifics below may have been transposed slightly, to reduce any identifiable information).

          It can actually accomplish interesting things (think refactor and get under test some code from the codebase's unholy era of C++ Compiler Metaprogramming With No Unit Tests from the mid-2000s), but will get stuck in a loop of agents, agents more agents that just keep dying and being resurrected, way too easily. Progress seems to be about random bloodymindedness rather than smarts, but still it is something the many generations of programmers since haven't managed to do - or more precisely, haven't been allowed to poke at with sufficient time and resources.

        2. Mimsey Borogove

          "well worth" the cost?

          Seems like a great application to try out and well worth a 40k price tag (doubling 20k to account for the initial test suite building) for many organisations if successful.

          Ok, say it's worth the cost in money - does this "total" cost include the enormous amount of water required to cool the computers that do this? Even if it does, because of "AI" and crypto taking such unimaginable amounts of water to cool their vast data centers, there will be water shortages. We may get to the point where no amount of money will be enough to buy enough water to keep doing this. Then there will be abandoned data centers sitting in deserts created by the use of this much water, where maybe, once, there might have been farms, forests, or even cities which no longer exist.

    3. Anonymous Coward
      Anonymous Coward

      Yeah, harkens back to that recent Wiggum looping craze, here with "--dangerously-skip-permissions" (cf. 'in a blog' TFA link, and search there for "Run this in a container, not your actual machine"). Expect the process and results to be unsafe at any speed ...

      And while some'd see this as no more than 'good clean' fun, it bears remembering also that K&R/ANSI/standard C is the easiest language to compile to assembly in the world (it was designed that way), just a bunch of macros to it, it's not like they're talking C++ or even BASIC here iiuc. Plus they didn't put an assembler, nor linker, together, which is where a lot of hardware limitations (eg. limits on immediates and branch target offsets) and OS idiosyncrasies come in, that may require actual thinking from the coder.

      Overall, this latest offensive, Wiggum, and OpenClubFoot, are just the freshest in the AI RotM PsyOp SOP imho. They're continuously broadcasting this defeatist message that we meatsacks should just surrender-monkey all hopes without a fight and embrace the madness as we are now inevitably surrounded and outnumbered by its wondrous fruitcake might.

      And the "GCC torture test suite" (their link) they speak of passing with such pride as a sign of invulnerability? GCC describes it thus:

      Throughout the compiler testsuite there are several directories whose tests are run multiple times, each with a different set of options. These are known as torture tests.
      ... not really a sign of overwhelming strength in my view.

      Aux armes, citoyens!

      1. Bebu sa Ware Silver badge
        Childcatcher

        "Aux armes, citoyens !"

        "Qu'un sang impur / Abreuve nos sillons !"

        Bientôt !

      2. David 132 Silver badge

        “A l’eau, c’est l’heure!”

    4. ATrickett
      Pint

      AI is okay if it can steal the answer...

      I've used Gemini on a few tasks and for things that are well documented it can steal an answer quite well and can be more effective that a plain Google search.

      For SAP, Gemini is dismal. It often proposes solutions that should exists but don't, suggesting I use a class or function that simply doesn't exist. When you challenge it and say the field doesn't exist in the table/structure, it says that I should just add it and that will solve my problem. Basically Google trained Gemini on the published information which consists of lots of unanswered questions, SAP's own terrible documentation and lots of bad and incomplete advice on the SAP Developers Network. It's hardly surprising it hallucinates more often that it says something sensible.

    5. Anonymous Coward
      Anonymous Coward

      "SAP need not worry just yet"

      Depends on how you look at it. Industrial manufacturing didn't replace humans in the process because it could produce better humans. It replaced humans because it was better than humans.

      AI isn't a threat to SAP because it might one day produce better tools, it's a threat to SAP because it might one day remove the need for the tools entirely...AI doesn't need to build a feature equivalent 1:1 SAP product...if it can just do what the SAP product does.

      We're still relatively early with AI...a bit like that period in the dotcom era where nobody actually had any ideas, it was just a bunch of "like this existing thing...but on the internet" ideas. It wasn't until much later, possibly after the dotcom crash, that we started seeing some original ideas that were groundbreaking that could only exist because of the internet...that could only happen once infrastructure existed.

      One major problem with the dotcom boom is you had tons of money pouring into these "great ideas" for putting various existing types of business on the internet, but virtually nothing was spent on the "final mile"...hardly anyone was actually on the internet...it wasn't until the early 2000s when we started seeing proper widespread deployment of broadband to homes that innovation actually took off.

      I don't know what the "final mile" is for AI yet...I don't think anyone does really...but once we figure it out, you can bet your ass we'll start to see things that can only be done with AI, that aren't just "like this existing thing...but with AI".

      The parallels are there...some of us old enough to remember can recall a time where every fucking thing had an "@" symbol in the marketing guff to show "look...we're on the internet!"...it's the same thing now, but tacking on "AI" to everything.

      In 2 years or so, after we've figured out the "final mile" and the capability to innovate with AI becomes a commodity thing...that's when we start seeing some staggeringly scary shit. Possibly some groundbreaking shit...and shit that will genuinely change the world. Some for the better, some will be downright terrifying...and we won't know which until significant time has passed.

      Facebook was the absolute darling of everyone when it was first launched to the masses...nearly everyone now thinks it's fucking evil. There were some of us that were uneasy about it right from the get go (myself included), but we couldn't quite put our finger on why. Through the magic of hindsight, we now know why we felt uneasy.

      Some of us have this same feeling of unease around AI...but not for the reasons you might think...Eroding and removing jobs is inevitable, the last century or so of history has taught us that...industrialisation, through miners strikes, through the rise of service based economies etc etc...jobs change, that's just part of civilisation progressing and maturing...what isn't normal though is the rapidly accelerating threat that tech companies will know everything about us from the day we're born...being able to track every minute of our lives...that's what makes me uneasy about AI...certain things that are currently impossible might become very possible.

      Whether AI can replace a human in a job is irrelevant...jobs being eroded by tech is nothing new...whether it's AI, changing industries or just better software...it's going to happen at some point...where AI becomes incredibly scary is when it's ability to predict gets better and better..nobody is testing or benchmarking that...we're too focused on whether the output is offensive or "human like"...I'm not concerned about that...I'm more concerned about how accurately AI can make predictions and how far ahead those predictions can be.

      It may be the case that AI is shit at predicting anything more than 10 seconds into the future...there are advantages in that, but nothing that has a massive impact on regular people...if that increases to a minute...10 minutes...an hour...6 months, a year...and it can account for multiple branching paths...that's fucking scary...that's when things start becoming uncanny and we can't differentiate between an AI completing a task because we asked it to, or whether it completed a task because it figured out 6 months ago that it was going to.

      1. Mimsey Borogove

        where AI becomes incredibly scary is when it's ability to predict gets better and better

        Where it becomes incredibly scary is when you think of the damage to the environment it's causing, requiring vast amounts of water to cool square acres of data centers (it's not only "AI" causing this - crypto is part of it, too). Especially now that Trump has decided to double down and impose as much harm on the environment as he possibly can, avoiding "AI" is something everyone can do to reduce that harm.

  6. Bebu sa Ware Silver badge
    Facepalm

    "multiple Claude instances work in parallel on a shared codebase"

    Hamlet next I daresay.

    Why would you even bother. I would be slightly more impressed if these clowns produced a complete functional Algol-68 compiler or even a complete PL/1 compiler.

    If this technology could be leveraged to drive or guide formal correctness proofs of existing code I would actually be interested but if producing dodgy C compilers is the best they can offer then they can shove the whole boiling where the Sun doesn't shine as far as I am concerned. Shit calls to shit.

  7. sarusa Silver badge
    Devil

    Zero Intelligence

    Great, you spent $20K (and lots of human oversight and prodding) letting 16 claude instances 'write' a C compiler that mostly works.

    Except... Claude has the stolen code of dozens of perfectly working C compilers in its weights. It has tens of thousands of stolen Rust programs in its weights.

    If it had any 'intelligence' at all it would have 'realized' it already had everything it needed to spit out a perfectly working Rust-based C compiler on the first try.

    If you tried something that hadn't been done dozens of times before it would be a totally different story.

    1. tekHedd

      Re: Zero Intelligence

      Gen AI is lossy compression. As the model becomes large enough, it approaches "not being lossy" and the primary business model is "hiding the fact that it's just the entire internet loaded into memory as an algorithmic model."

      Which is to say, it's held back by the need to pretend it's not theft.

      Conclusion: Gen AI can achieve its goals by removing ethical constraints. Isn't this how the System Shock game starts? ;D

  8. DJV Silver badge

    "the generated code is not very efficient."

    "reasonable but... nowhere near the quality of what an expert ... programmer might produce."

    Well, it does sort of explain the crap that's recently come out of Redmond in the form of Windows updates!

    1. Roland6 Silver badge

      Given it fails to compile some correct C software, I suspect, given the nature of LLMs, it will also compile - into what we don’t know - some clearly invalid C.

      Hence the real problem AI generated code has is certification, something that is increasingly important with development tools.

    2. Adrian 4

      > Well, it does sort of explain the crap that's recently come out of Redmond in the form of Windows updates!

      Only recently ?

  9. Anonymous Coward
    Anonymous Coward

    Inconsequential

    This only works because it has been trained on dozens if not hundreds of compiler examples and code Anthropic stole or nicked off the internet.

    I would say this example is inconsequential in the larger scheme of things. If you want to make something unique, new and innovative Claude won't be able to help you.

    It does mean we'll see more and more "fake" programmers who insist on working from home typing in Claude prompts all day and producing oodles of inefficient code which doesn't do much. After a decade or so the damage will be so huge that entire companies start to fail due to the unmaintainable spaghetti code.

  10. Paul 195
    FAIL

    Now get Claude to do a COBOL compiler

    COBOL is a very hard language to compile, there is an open source compiler they could compare results against, but good luck finding the huge amounts of source code to test it against that are available for a language like C. Most COBOL code is proprietary enterprise code and not available for training your stochastic idiot

  11. mihares
    Joke

    I wrote a Brainf*ck compiler once.

    And it just costed 1.99 worth of chips I ate while doing it.

  12. QET

    The comment about "AI" generated code is equal to stealing pieces of bread and mashing into the shape of a loaf and calling it homemade is savagely on-point.

  13. This post has been deleted by its author

  14. Anonymous Coward
    Anonymous Coward

    first time?

    "The thought of programmers deploying software they've never personally verified is a real concern."

    (See title).

  15. Anonymous Coward
    Anonymous Coward

    Well...

    So what this really means is there isn't enough code in the training data the AI can steal and paste together to make a compiler.

  16. Nathan 6

    Now what would have been the cost for human expert to do the same?

    I have been testing Gemini Pro/Antigravity for the past month on MCU (STM32, ESP32P4) coding and I must say unless you point it to working examples it just results in garbage, none working code. So yeah, if a lot of code exist of what you want to do then you are fine, but if it doesn't good luck.

  17. Mostly Irrelevant

    I just checked, doesn't work at all. I have a fair amount of C code around and it doesn't even compile a Hello World template.

  18. ThoughtCrime

    Is there a story here

    I'm hugely sceptical of any strut about AI, but it cost about $20k to write a new compiler? Isn't that about right?

  19. ElRegReader
    Pint

    Haters gonna hate, but...

    As someone who uses these models all the time to code, the increase in capabilities the last six months is very real and a little unsettling. When your tools actually have useful AI baked in, you stop complaining "why are they putting this dumb AI stuff in my face" and your tune quickly changes to "when does my five hours reset so I can use the smart one again?"

    Developers now have access to a mysterious, moody black box we can plug into software that uses natural language, code, images, and sounds as input and output, and _sometimes_ works inexplicable wonders with it. The industry will continue to engineer new systems and techniques to refine raw models into more reliable tools. For example, a recent innovation in Claude Code is the blisteringly obvious "write a text file in the .memory folder when you make a mistake, so you remember not to do that next time." It has always (mostly) followed CLAUDE.md instructions provided by the developer-- the innovation is that it now curates some additional notes on its own, every now and then.

    Anyway-- agent harnesses will definitely get better. Models will probably get better. And even if today's models ended up being the pinnacle of AI capability for the next 50 years, the genie is out of the bottle and the future monetary value of white-collar office work has become a little more uncertain.

    Not your job, of course.

  20. Eric 9001

    So they pretty much translated GCC to rust

    while translating some LLVM code too, but of course there is no copy of the GPLv3 or Apache-2.0 included.

    Of course it's a garbage compiler too, as it doesn't compile the following ANSI standards-compliant C;

    main(i)

    ??<

    char*s="Hello World\n";

    __asm__("int $0x80"

    :

    : "a"(4),

    "b"(1),

    "c"(s),

    "d"(12)

    );

    __asm__ __volatile__("int $0x80"

    :

    : "a"(1),

    "b"(i)

    );

    return 0;

    ??>

    gcc -trigraphs -ansi -m32 -o hello hello.c

  21. Dan 55 Silver badge

    Just put this string in your code

    ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86

    Let's see if it compiles that.

    Also can spread liberally throughout the web.

  22. sabroni Silver badge
    Facepalm

    Agent teams show the possibility of implementing entire, complex projects autonomously

    Agent teams show the possibility of implementing entire, complex projects badly.

    Provided you've trained them on multiple, well written versions of near identical projects.

    And you have 20 grand to piss up a wall.

    Where do I sign up?

  23. Arthur the cat
    FAIL

    Linux kernel compiled *but wouldn't link*

    Read this article comparing CCC with GCC.

    The Claude C compiler could compile the kernel but linking failed. It did compile and link SQLite but the CCC built version took 2 hours 6 minutes to run a benchmark that a GCC built version did in 10.3 seconds.

    It's undoubtedly some sort of brilliant feat for AI, but in terms of real world utility it's up there with chocolate teapots and fishnet condoms.

  24. Libertarian Voice

    Give a monkey a typewriter

    There is a theory that if you give enough monkeys typewriters eventually they will produce Shakespeare. That is quite literally the logic behind AI. I use for code snippets from time to time and nearly always regret it. It is a bit like trying to solve an incredibly complex task by asking the intern.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon