AI-authored code contains worse bugs than software crafted by humans

Generating code using AI increases the number of issues that need to be reviewed and the severity of those issues. CodeRabbit, an AI-based code review platform, made that determination by looking at 470 open source pull requests for its State of AI vs Human Code Generation report. The report finds that AI-generated code …

  1. Primus Secundus Tertius

    Accountants

    Only a management accountant could believe that AI software will make good profits.

    1. Snowy Silver badge
      Coat

      Re: Accountants

      If you have to pay for patches, more bugs equals more profits.

      1. KittenHuffer Silver badge
        Coat

        Re: Accountants

        If you're not part of the solution there is money to be made by prolonging the problem!

        ---------> Mine is the one I'll wear when I come back tomorrow to fix the problem!

        1. Mimsey Borogove
          Flame

          Re: Accountants

          If you're not part of the solution there is money to be made by prolonging the problem!

          This is the entire underlying structure of enshittification.

      2. Paul Herber Silver badge

        Re: Accountants

        AI accounting software will tell you that 6 x 9 = 42

The very idea makes me very depressed.

        1. Pickle Rick
          Pint

          42 today!

          Hey, my little sister is 42 today!

          Cheer up! Cheers!!

        2. werdsmith Silver badge

          Re: Accountants

Well of course it is; it's the same answer you'd expect from Deep Thought. The “everything” part.

Both ChatGPT and Gemini respond with 42 when asked about “life, the universe and everything”. They don’t take 7.5 million years, though.

          1. Strahd Ivarius Silver badge
            Devil

            Re: Accountants

            They don’t take 7.5 million years though.

            Indeed, it took about 14 billion years to get an AI that could answer the question.

  2. Anonymous Coward
    Anonymous Coward

    Vibe study

    This study sort of confirms the way I've been feeling about these tools - it's not surprising that a tool with no understanding of any kind would create serious bugs, and I would expect in many cases it takes longer to solve those bugs than it would have to write well-architected code.

    It doesn't matter. We'll still be told we have to use it.

    It's funny how AI - which harms the planet and makes our jobs worse - is mandated by our business leaders, while remote working - which helps the planet and improves our quality of life - is forbidden, as far as they can get away with it. I don't think these things are unconnected.

I still choose to be a programmer rather than a sloperator, as far as possible, but ultimately I need to pay the rent one way or another. Here's hoping the bubble goes up sooner rather than later.

    1. Guido Esperanto

      Re: Vibe study

      Upvote for "Sloperator"

      1. Elongated Muskrat Silver badge

        Re: Vibe study

The best one I saw yesterday was "promptstitute"

  3. may_i Silver badge

    I've said it once

    So I'll say it again: LLMs write terrible code.

    Having just taken over a project which was written by a poor programmer using Copilot to hide his lack of skill, I can confirm that LLMs have no idea about rational error handling or program structure.

    They're useful for one-off, throwaway programs. They can be a great reference for all the billions of libraries out there.

    Are they any use for writing production quality code?

    That's an emphatic NO from me.

    1. Pickle Rick

      Re: I've said it once

      Agree with your points wholeheartedly.

      I've just tried Claude - first dabble at this LLM stuff, after a mate (well, I thought he was a mate!) goaded me into it. He gave me some tips: build the system in baby steps; be specific etc etc

      My conclusions/observations (in short, I'll get bored):

- there's no way an inexperienced programmer could produce anything useful

- the LLM makes insane assumptions for _anything_ it's not explicitly told not to do

      - creating the prompts to limit the LLM's assumptions takes as long as writing the code, requires solid programming knowledge, and the code still needs checking

      - the code is bloated.

- "build the system in baby steps" is bollocks. The iterative process modifies the little bit that worked because it didn't accommodate the next bit (how could it?)

      Reading others' code can be challenging. Historically, that often occurs on a known working system, so it can be assumed that "it's mostly right". Unraveling this shite brings that to a whole different level. A level below and not worthy of BOFH's basement.

      I can see it being useful, perhaps, for creating a framework for a large and complicated application. But the working gubbins? No feckin way. And if one's thing is creating large and complicated apps, chances are you've already got that bit sorted. Tidying existing code? Maybe: change all variables to Hungarian notation; use PascalCase for functions (don't hate me!). Woo fucking hoo.

      I haven't used MCP self defined code chunks yet, I expect that'll help a lot. But I've been writing software for over 40 years, and still learning. Give that to a noob? Sheesh! *shudders*

      1. may_i Silver badge

        Re: I've said it once

        > Reading others' code can be challenging.

Only if they are bad programmers. My first thought when I'm writing code is that either I, or someone else, should be able to easily understand both what I am doing and why, when there's a need to enhance or change the program in the future.

        1. Anonymous Coward
          Anonymous Coward

          Re: I've said it once

          Meanwhile an LLM's first thought is... nothing! They don't have thoughts! They just do the average of what other programmers have done when they had a text input somewhat like the prompt.

          Logically this suggests that the output will seem good to a below-average programmer and bad to an above-average programmer. I flatter myself that most examples of AI code I've seen suck. Of course, when an AI code review catches an actual error in my code (unusual enough that I remember every time it has happened over a lot of pull requests) it is mortifying.

        2. Pickle Rick

          Re: I've said it once

          > Only if they are bad programmers.

          Weeeeell I wouldn't say "only". The old adage of "Give 100 programmers a problem..." If that problem is complicated and requires specialist knowledge, and assuming all 100 produce the correct results, it could be that there are techniques not known to all. And yeah, the comments above should sort it. But my maths might not be as good as theirs, and that's going to make reading that code a challenge. Whether that's "difficult to read" code or a different hurdle, I won't press that point.

As for understanding one's own code after a time: yup! You've really gotta cover that one :) I'm an advocate of self-documenting code. And of properly documenting code. As an example of why, and of code that works but with no idea of how or why, imagine a young programmer on his first job (no names here!) arriving at the office on a Monday morning to review the code written the previous Friday afternoon, after a very long wet lunch. They had no idea what that garbled crap was. But it worked!

          1. retiredFool

            Re: I've said it once

Upvote, because at the very least you have to be able to read your old stuff. I've got old stuff I still must tweak for a new feature; some of it was written 25 years ago. I look at it and those comments are priceless. I can add the tweak in very short order. Spend an extra 20% of time adding comments, and in 10+ years you will thank yourself.

            1. Pickle Rick

              Re: I've said it once

              Write the comments first :)

              1. DrewPH Bronze badge

                Re: I've said it once

                It may have been said in jest but 100 times this.

                When I write a new class method or something, it will start off as the method declaration and delimiters (name and brackets since I work in PHP), and then I'll write comments, in good English (proper sentences and all!) until I have described all the code I believe I need to put in the method. Also its purpose and parameters etc. Sometimes I'll end up with 30 or 40 lines of descriptive text within the delimiters.

Only then do I start coding. While coding, I always need to edit the comments a bit, but it's easier to edit and refine while coding than to try to add in good comments right at the end.
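That comments-first flow, sketched in Python rather than PHP (the method and its steps are made up for illustration):

```python
# Comments-first: the body starts life as plain-English sentences,
# and the code is then written underneath each one.
def normalise_name(raw):
    # Purpose: tidy a user-supplied name for storage.
    # Parameter: raw - the string as typed; may have stray whitespace and case.
    # Step 1: strip leading and trailing whitespace.
    stripped = raw.strip()
    # Step 2: collapse internal runs of whitespace to single spaces.
    collapsed = " ".join(stripped.split())
    # Step 3: title-case the result.
    return collapsed.title()
```

So `normalise_name("  joHn   smITH ")` comes back as `"John Smith"`, and the comments were there before a line of code was.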

                1. Pickle Rick
                  Facepalm

                  Re: I've said it once

                  You think I'd jest about programming! You scoundrel! I challenge you to a.... well, maybe I do joke around a bit :D

                  But you nailed it there, especially the last sentence. Before I was disciplined enough to do that, or if writing code on the fly (as opposed to fully designed) I'd have the intention to comment it, but invariably be "on a roll" and, well... If the program's fully designed before coding, you're going to know what's going in there.

Bringing it back to the subject in hand: Claude at least does write good comments, so you know what it claims it's doing. And that really is the best thing I can say about it at this point.

                  [Icon: Claude after I've slapped it in the face with my gauntlet!]

              2. KHMMR

                Re: I've said it once

Absolutely right. I was just about to add that.

                Comments Comments Comments

Every programmer needs to learn to comment every function or code block they create. I am 71yo, have been programming for the last 46 years, including Cobol etc., and have employed hundreds of programmers over the years in a number of businesses and corporates. It is so difficult to get people to comment code. Unbelievable, actually.

As was said elsewhere above, different programmers have different approaches to coding. The outcome will be the same, but getting there can vary widely, and thus code can sometimes be difficult to understand when looking at it with fresh eyes.

                Code commenting is simply mandatory for being able to maintain systems long term.

                1. Someone Else Silver badge

                  Re: I've said it once

                  And yet, the Agile Manifestosios state that code should not have comments. Seems they think that programmers can't be arsed to a) write decent comments in the first place, and b) keep them up-to-date. While they are all-too-often correct about both aspects, that doesn't mean that they should throw up their hands in abject surrender to the basest elements of the Programming Brother- and Sisterhood. Isn't Agile (upper-case 'A') supposed to be about improving the overall Craft of Programming? (That's their claim, after all.) So, as widely (and wisely) documented here by what appear to be some of the better practitioners of the Craft, it would seem that the Manifestosios would want to encourage better documentation practices, not discourage them. But that's just me, I guess.

                  Full disclosure: I have worked directly with some of these Manifestosios. They are Smart People, and probably have forgotten stuff that I would never know. We have, however, in the past, had several animated discussions about just this topic. And I still document the stuffing out of my code, and require that those who work with me do the same.

              3. Elongated Muskrat Silver badge

                Re: I've said it once

                Write the comments first, then write the unit tests, then write the code.

            2. Denarius
              Flame

              Re: I've said it once

              until some fwit removes comments to "save spacce"

              1. that one in the corner Silver badge

                Re: I've said it once

                Worse, habitually leaves the comments untouched when they rework the code, so there is no longer any connection between the two. Even just not removing the /* TODO: fill in the body of this auto-generated empty method */ wastes so much time: did they leave it because this *isn't* fully done yet?

                Bonus points if they are still using the variable names that were already present, but now for completely different values/purposes.

                1. Pickle Rick
                  Thumb Up

                  Re: I've said it once

                  I cannot upvote that enough.

              2. sabroni Silver badge
                Boffin

                Re: until some fwit removes comments to "save spacce"

                If you need comments to explain how the code works you need to rewrite it.

                Coding is explaining to humans what you want the computer to do. Making the computer do what you want it to do is relatively easy, making it clear to other people is the tricky bit.

                1. yoganmahew

                  Re: until some fwit removes comments to "save spacce"

                  That's how the modern developers in my company work. No comments anywhere.

                  Meanwhile the ancient assembler has oodles of comments, not all of them useful, but often explaining 'why' - the business use case for the exception to the standard code flow.

                  The modern code is almost entirely divorced from the business case it's solving. It's pretty, it's efficient, but it's not maintainable and the onboarding cost for new developers is steep.

                  Code for serious business functions has a long shelf life. I'm still working on transmission products designed in the 1930s (https://www.atchistory.org/a-brief-history-of-air-traffic-data-communications/), they are as relevant today as they were then - event-driven, store-and-forward, guaranteed delivery. Even AI has heard of them :D

                2. Anonymous Coward
                  Anonymous Coward

                  Re: until some fwit removes comments to "save spacce"

                  Not quite right.

You can usually work out what the code does. But unless there are comments, you have no idea if that is what the programmer actually intended it to do.

                  The whole point of commenting your code is so that in future somebody can compare what the code was supposed to do, against what it actually does, and see how badly you fucked it up.

                  Comments are also great for notes, rants, jokes, ASCII art, etc.

                  1. Elongated Muskrat Silver badge

                    Re: until some fwit removes comments to "save spacce"

                    Knowing when, and what to put, in comments is a fine art.

                    Where I work, we used to have a strict rule that every code change should be commented with date, and programmer's initials. This leads, inevitably, to old code that contains a slew of comments (sometimes more comments than code) that add no explanation, or context, to the code you are trying to read and understand. Sometimes, those comments might include an explanation of what has changed, as well; not necessarily sequentially, if different bits are changed at different times, so you are left reading comments all over the place to try to determine the code's history. This is what commit comments in source control are for, not code comments. The history is in git, not in the source file; that is the current version, which does what it does now.

I now remove this sort of comment whenever I see them. They add no meaning or context to the code, and I really have very little interest in which programmer made a change 15 years ago, if I can even remember who they were from their initials. Nobody else does, either.

                    Explanatory comments are another thing, but, again, are best used in moderation. Well written code should be self-documenting to a certain degree. Method names that reflect what they do, and sensible variable names, are just as important as comments, and duplicating what these already tell us does nothing to make the code more readable or easier to understand, it just adds bloat. Property header comments that say "Gets or sets the value of Forename." above a Forename property with a standard getter and setter serve no purpose whatsoever.

Where comments are important, and often invaluable, is where the purpose or function of a piece of code is non-obvious, or where there is a particular gotcha, or workaround, that a future programmer would benefit from knowing about. Documentation of the purpose and usage of parameters, so that IntelliSense can show these to another programmer writing consuming code, is also useful if these are non-obvious (it's always better to give those parameters a clear and meaningful name where possible).

                    By far the very best use for a comment, though, is for ranting, joking, making pop-culture references (the more out-of-date, the better), and insulting those co-workers who deserve it, but will never read them (and done in such a way that you can't get in trouble). Oh, and no matter how much you might want to, never use "rude words", you never know who is going to come along and search for them later, and then feign offence, in order to fluff their own ego.

                    1. Pickle Rick

                      Re: until some fwit removes comments to "save spacce"

One of my CS lecturers was telling us about his first job. He'd been tasked with reviewing some assembler code, and maybe tidying it if he could see a way to improve it, to make it more efficient. He got to a particular chunk and got a bit lost, so backtracked and tried again. Still stuck. Looking across to the comment, the original programmer had noted: I get lost here too!

                3. may_i Silver badge

                  Re: until some fwit removes comments to "save spacce"

                  The prime need for comments is to describe WHY you are doing something.

We should all be able to deduce WHAT the code is doing (unless it's been written by a master of obfuscation); the WHY is often the hard part.

                  1. coredump Bronze badge

                    Re: until some fwit removes comments to "save spacce"

                    Amen to this.

                    Long ago I learned (was taught) the importance of code comments, with a focus on "why". In particular, the value of the "top comment block".

E.g. at the top of the file, some comment lines which not only list things like date and author (who's ostensibly responsible for the mess), but also give a few words about what the program as a whole is intended to do, and why. Maybe mention the intended audience/consumers, assumptions and dependencies, references to any exterior docs, etc.

                    More than once I've saved myself time and trouble when needing to re-use or re-deploy some old script I'd cobbled together many years prior, because I'd documented what the thing was for.

                    If it helped other folks too then that's grand; but I write comments for myself first. :)
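A made-up example of such a top comment block (every name and path here is hypothetical):

```python
#!/usr/bin/env python3
# rotate_logs.py - 2019-03-01 - A. Coder (ostensibly responsible for the mess)
#
# Purpose : rotate and gzip the app logs, keeping the last KEEP_DAYS days.
# Audience: the nightly cron job; run by hand only when debugging.
# Assumes : logs are named myapp-YYYYMMDD.log and live in LOG_DIR.
# See also: the ops wiki page on log retention.

KEEP_DAYS = 7
LOG_DIR = "/var/log/myapp"
```

Ten seconds of reading, years later, tells you what the thing is for and whether it's safe to re-use.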

                    1. Apocalypso - a cheery end to the world Bronze badge

                      Re: until some fwit removes comments to "save spacce"

                      > In particular, the value of the "top comment block"

                      I take that concept even further. In the comments at the top of a function or method I write out in bullet points what the function aims to do e.g.

                      This method allows an object to behave like a 'foo' even though it's a 'bar'.

                      1) Make sure we have been initialised as a bar

                      2) Check the prerequisites for being a 'foo'

                      3) etc

                      And then in the code just have short comments /* Step 1*/, ... /* Step 2 */ etc. marking the start of each bit. That way the commentary is in one place and reads as meaningful prose from top to bottom; but the comments in the code are easily moved around and retained. And if a change comes along you can easily amend step 2 to 2a, 2b etc without having to renumber. (And also, the person doing the change has to read the top comment in order to know where in the code to put the change, so at least you know they've tried to understand the function and the change they are making!)
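A sketch of that layout in Python (the foo/bar method is invented to match the example above):

```python
def act_like_foo(obj):
    """Allows an object to behave like a 'foo' even though it's a 'bar'.

    1) Make sure we have been initialised as a bar.
    2) Check the prerequisites for being a 'foo'.
    3) Build and return the foo-ish view of the object.
    """
    # Step 1
    if not obj.get("is_bar"):
        raise ValueError("not initialised as a bar")
    # Step 2
    if "name" not in obj:
        raise ValueError("a foo needs a name")
    # Step 3
    return {"kind": "foo", "name": obj["name"]}
```

The numbered prose lives in one place at the top and reads top to bottom, while the short `# Step n` markers move around freely with the code they label.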

            3. Steve Davies 3 Silver badge

              Re: Comments are like gold dust

Back in the day when I was writing in Assembler for PDP-11s and assorted Vaxen, the rule was comment every line and have a comment block every 20-30 lines. Being able to read what the code was doing in plain language is priceless. Even now, I comment almost as much. Old habits die hard.

        3. dmesg Bronze badge

          Re: I've said it once

I tell my students (1st and 2nd year uni) that the first criterion for code is correctness; the second is to minimize the cognitive load for anyone reading/maintaining it. It sounds like AI doesn't do well on the latter, and flat out fails the former.

          1. Claptrap314 Silver badge

            Re: I've said it once

            I'm afraid you have the order wrong.

            Because what happens to your "correct" code when the requirements change? It's no longer correct. Almost no one is a Knuth.

            The real world is constantly changing. A lot. Some small sliver of those changes almost inevitably turn into update requirements for a given piece of software.

            Even a critical security failure requiring a major re-write of the code can be managed.

Remember when XSS attacks became a thing? Almost every website managing user accounts became imperfect overnight. At that point, there was code that could be maintained--and code that could not.

            1. that one in the corner Silver badge

              Re: I've said it once

              So you recommend his students leave all their code wonderfully legible but incorrect from day one, on the basis that "it'll end up being wrong tomorrow anyway, when the client changes his mind"?

              1. Pickle Rick

                Re: I've said it once

                Bah! I could've saved sooooo much time if I'd've taken that approach. I'd never have retained a client, but all that free time!

              2. Claptrap314 Silver badge

                Re: I've said it once

Give it a rest already. Code must be maintainable if it is to be correct for T > 0. Correctness is a long-term, continuous thing, not some term paper you write & forget.

                1. yoganmahew

                  Re: I've said it once

                  I agree with you claptrap, but let me rephrase: To even get to the correctness stage, the code must be legible.

                  Otherwise code review will also miss the hidden incorrectness. There's no perfect code, particularly when you are externally surfaced. Race conditions are particularly difficult to prove correctness for.

So yes, the code must run and pass its tests as a temporary proof of correctness, but it's only correct within the bounded context of how it's tested.

                  To be reviewable, it must be legible.

                  To be fixable for when it inevitably breaks, it must be understandable.

                2. doublelayer Silver badge

                  Re: I've said it once

Your phrasing is still wrong. Correctness still comes first. All the readability and maintainability in the world will not make an incorrect program useful. A correct program that is not maintainable will eventually be a problem, which is why you can't stop at correctness, but without correctness, you don't have a thing. It does get more interesting when we consider when correctness has limits, for example when we don't have to get certain types of answers right because they don't matter yet, but for most things, that is the first requirement. Maintainability can occasionally be compromised, and will be more frequently than it should. Your only argument for why it would be different is that, through lack of maintainability, the program becomes incorrect, and that serves as extra proof that the incorrectness is the bigger problem and the unmaintainability is a problem because it causes incorrectness.

                  1. Claptrap314 Silver badge

                    Re: I've said it once

                    Unmaintainability doesn't cause incorrectness. You're following the wrong trails.

                    Proposition: The requirements for code change over time.

Corollary: Perfectly "correct" code at t_x will become incorrect at t_y.

                    Observation: This change in correctness has nothing to do with the maintainability of the code.

                    Proposition: The claim that code is "correct" is properly a statement in mathematics.

                    Corollary: Unless you are a trained mathematician (at least to the level of passing your prelims), you cannot reliably prove such a statement.

                    Observation: The number of trained mathematicians working as programmers is vanishingly small.

                    Conclusion: Almost no code is known "correct" in the strict sense.

Corollary: It is far safer to assume that all code is broken at any given time.

Conclusion: If your code is unmaintainable, then when you discover that your "correct" code was broken all along, or when the code is simply deemed wrong because the requirements have changed, you will be in trouble.

                    The definition of "maintainable" code is that it is easy to fix what is broken. It is the necessary prerequisite for code being even correct-ish for longer than a particular snapshot.

            2. Elongated Muskrat Silver badge

              Re: I've said it once

              The code is correct, if it passes the unit tests. You did write the unit tests, right? Right? ... Right?

              If a new class of flaw is found, tests for it get added to the test suite. You weren't about to go and read through your entire code base and mentally parse every line to find those flaws, were you? Or were you just going to "wing it"?
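In Python terms, the idea looks something like this (the sanitiser and its flaw are invented for illustration):

```python
# A hypothetical sanitiser that once shipped without escaping anything.
def sanitise(text):
    # Escape the characters HTML cares about; '&' must go first.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

# The original test...
assert sanitise("plain text") == "plain text"
# ...and the test added to the suite when the injection class of flaw was
# discovered - nobody re-reads the whole code base, the suite catches it.
assert sanitise("<script>alert(1)</script>") == "&lt;script&gt;alert(1)&lt;/script&gt;"
```

Once the second assertion is in the suite, every future change gets checked against that class of flaw for free.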

              1. Pickle Rick
                Trollface

                Re: I've said it once

                Yoo-neet-tehstz...? ...? ...? Nope, sounds like a QC thing :D

          2. doublelayer Silver badge

            Re: I've said it once

            The odd part is that LLMs are kind of great at the readability part but are so bad at the correctness and maintainability parts that it seems like it's mocking you. There are lots of readability problems that plenty of knowledgeable programmers perpetrate. My most frequent example is bad variable names. For example, I was recently checking some code in OpenSSL, a program I don't frequently modify, and got to read some relatively simple code that probably would have taken me about twenty seconds to read if not for the fact that they named all their variables things like addmd and mds. Doing things like if (addmd++) didn't help either because, while it's immediately obvious what that does, it's not immediately obvious why it's doing it (the answer is that addmd is basically a boolean they switch once from false to true, but they chose to do it that way instead even though they never use its actual value). LLM-generated code almost never does things like that.
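The counter-as-flag habit described above, sketched in Python (made-up logic, not the actual OpenSSL code):

```python
# Terse version: a counter that is only ever read as "has this happened
# yet?", in the spirit of C's `if (addmd++)`.
def label_terse(blocks):
    addmd = 0
    out = []
    for block in blocks:
        out.append(("update:" if addmd else "init:") + block)
        addmd += 1  # the count itself is never used
    return out

# Same behaviour with an honest boolean; the intent now reads in one glance.
def label_clear(blocks):
    initialised = False
    out = []
    for block in blocks:
        out.append(("update:" if initialised else "init:") + block)
        initialised = True
    return out
```

Both produce the same output; only one of them makes you stop and work out whether the value of the counter ever matters.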

            The problem is that the clear, well-commented code it does write is riddled with bugs which you need to comb through to find and is never designed in a way that makes modifications easy, so when you ask for modifications, it's likely to just rewrite the function concerned. That's if you're lucky; if you're unlucky it rewrites several functions. It can over-comment, and I'm sure that would get annoying eventually, but we never get that far because the code that I'd quite enjoy seeing from some people doesn't run correctly and I still have to read it to find that out because the comments never explain that this is the part where the test function writes twenty lines that always compute to true which is why the test passes.

        4. Denarius

          Re: I've said it once

And yet, with every company that has taken over the outsourcer I was working at, the first thing they do is strip out the comments in the code. The new outsourcer then expects kids fresh from uni, who learnt only one scripting language, to understand complex code that requires deep familiarity with a specific language to understand, let alone modify.

          1. retiredFool

            Re: I've said it once

I hope you are kidding. What is the rationale for stripping the comments? To inflate the cost of maintenance?

            1. Andrew Williams

              Re: I've said it once

              Well, I have had experience of this... we don't need no stinking comments.

              The twat in question believes that the code itself is sufficient, and he was taught this at university.

              1. that one in the corner Silver badge

                Re: I've said it once

                Oh boy.

                So he was told about the ideal goal of perfectly self-documenting code and promptly grabbed the wrong end of that stick.

                At least, one *hopes* it happened that way and he wasn't really taught...

          2. Pickle Rick

            Re: I've said it once

That's just a pathetic ego/power trip to appear Übermensch to the less experienced.

            Unless he (just a guess!) was a Klingon (see #4). My favourite is #6 in this version!

        5. Scene it all

          Re: I've said it once

          First step: Put in comments! It is amazing how much code I have seen that will go on for pages and pages with no comments explaining what this is, why it is needed, when it gets called, by who, etc.

        6. Someone Else Silver badge

          Re: I've said it once

          I agree that I should be able to do that. Would be nice if others realized that as well. (And that goes for you, too, Mx. LLM-that-thinks-its-shit-don't-stink.)

      2. DS999 Silver badge

        So I'm not a programmer

Though I have played one at times during various roles over the years... It seems to me that in order to have AI write good code you'd have to have a VERY clear idea of EXACTLY how you'd want the program to work. The kind of detailed specs that are basically pseudocode written in English. Your "prompt" might run to dozens of typewritten pages for a real-world, complex program.

        The thing is, if you are a good programmer, and you have a spec of that quality available to you, you will write that program REALLY fast. The reason programs take so long to write is that you never have such a beautifully detailed spec available. You have a Trump style "concept of a plan" and you just start writing it, and figuring it out as you go. If you have users you toss it over the wall to them and they offer feedback and it just grows organically that way.

The reason many programs can benefit from a complete clean-sheet rewrite is that IF (and this is a very big if) you really understand everything the current program is doing, you can essentially use that knowledge and the existing program as your 'spec', and your rewrite will be a lot cleaner and fix a lot of compromises you had to make because you didn't know where you were going when you wrote the original version. This never happens in the real world, though; you are given some dusty deck of code that no one understands but which is full of years of small fixes/additions that end users have come to rely on. That's why such dusty deck rewrites fail so often.

If AI could just get to the place where you could feed it the existing source code as the 'spec' and it would rewrite that in a modern language in a modular/maintainable way (hopefully with semi-intelligent comments), one could not overstate what a massive win that would be!

        1. that one in the corner Silver badge

          Re: So I'm not a programmer

          > That's why such dusty deck rewrites fail so often.

Don't worry, if you haven't got a dusty deck full of random mods made with a pair of compasses, but instead really do now understand what the system was meant to do all along, the project will be hit by good old-fashioned Second System Syndrome and will still fail.

        2. that one in the corner Silver badge

          Re: So I'm not a programmer

          > If AI could just get to the place where you could feed it the existing source code as the 'spec' and it would rewrite that in a modern language in a modular/maintainable way...

          Yeah, well, we *could* have been spending our time creating a system that could do that sort of thing, but nobody could really be fagged to do it properly. Ok, that is a *bit* rude(!), but way back when (1980s and on a bit) there was much talk of taking in source code and applying lots of lovely semantic analysis, leading to a representation that could be re-arranged in just the way you describe. Even adding in comments.

          Trouble is, the demos that could be done were rather trivial, compared to the real-world programs that *need* such attention. And one big, big reason for this was that compilers for the ever-increasingly-complex popular programming languages were moving further and further away from being able to provide even the initial stage, conversion of the sources to a manipulable representation.

Consider the simpler problem of generating documentation the way programs like Doxygen do: Doxygen tries to build up a representation of the program and how its parts interact, but first it has to parse the language used. Oopsie, there isn't a nice compiler we can pull apart to get something that handles the entire language; you can get a fair way towards Doxygen's goals with a rather less rigorous parser, but that leaves far too many holes. Some attempts were made - gcc-xml for example, which could spit out (some of) the AST[1] - but unless they are taken onboard as a goal by the compiler team it becomes a never-ending race to keep the modified compiler up to date; so the project dwindles and dies. One of the great hopes of the LLVM project, in its later manifestation with the release of clang, was to have a compiler that *does* make access to the pre-parsed form a core goal. And LLVM is, indeed, giving us that, as can be seen by the replacement of gcc-xml with CastXML (note: CastXML is limited in what it does, as it is only maintained by and for one project, ITK, but it shows the way).

          You can't really blame the compiler writers, there is enough to do keeping up with the language specs, but just consider how long it has taken us to get to the point where our programmer's editors can do "simple" things, like letting us rename a variable and have all the uses of that variable be updated across all the source files. If things had gone slightly differently, we could have had that as a bog-standard feature of every screen editor since the 1990s or earlier[2] and it would be standard practice to let your compiler be invoked for that purpose[3].
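That "simple" rename is exactly what a pre-parsed representation buys you. A sketch (in Python, purely because its standard library happens to ship a usable `ast` module; the rant above is of course about C/C++ tooling, where no such thing was to hand):

```python
import ast

class Rename(ast.NodeTransformer):
    """Rename every use of one variable, by walking the AST rather than the text."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

source = "total = 0\nfor x in data:\n    total = total + x\n"
tree = ast.parse(source)
tree = Rename("total", "running_sum").visit(tree)
print(ast.unparse(tree))
```

Because it works on the parsed form, a string literal or comment containing "total" would be left alone, which is precisely what a textual search-and-replace gets wrong.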

          Anyway, cutting the story short (too late!) we don't want LLMs tackling that rewriting task, we want to have a pile of front-end processing, based upon sound compiler theory and practice, doing it. And, as part and parcel of that, make use of techniques from the AI labs to work on the processing: pattern recognisers and rewriters, heuristics that can be reversibly applied... Once you've done that, you could even use Machine Learning to help build up and apply those mechanisms. And, as with extant compilers, you know (bar the inevitable bugs) at each point in the process you *still* have a perfectly correct representation of the original program, so even if the generated modern language output isn't as modular as you'd like it is still good: and you can try again with another set of command options and see if it improves.

But, instead, the best we appear to be able to hope for is that "AI gets to a place", realising that it is far more likely that people will just keep shovelling cycles and random text scrapings into an LLM, fingers crossed that the few remaining human programmers will be able to fix the glaring holes in the "translated" version of our program (but it *must* be modular now, as it has been broken into 7532 files, each containing no more than two screens' worth of text, just like all the old company coding standards scraped from the web said)[4]

          [1] a weird consequence of software "getting better" but not if you look at it from another p.o.v.: when we *had* to write multi-pass compilers, we *always* had to have a full representation of the complete AST, complete with line numbers (but lacking the comments), in a defined Intermediate Language format, which could survive being written to disc and read back in.

          [2] I have a friend who was working on a "syntax-directed editor" in the mid 1980s who was stymied by the lack of parsers.

          [3] Obviously, not the commercial compiler and/or editor authors, they'd shun this or only make it work with their own products <cough>Visual Studio</cough>, which is why this whole thing is a pipe dream.

          [4] dang, now I'm implying the LLM can "understand" those scrapings...

        3. Pickle Rick
          Pint

          Re: So I'm not a programmer

          Your first paragraph is spot on IMHO.

As for paragraph two, "the lack of a decent spec": that surely happens in certain environments (I'm going with my usual scenario of a room full of accountants with Excel and VBA). However, I've always been a bespoke programmer and PM, so leading up to the actual programming (scope/spec etc.) is around 60-70% of a project pre go-live. The client is the expert in their field, and if I've got to understand (e.g.) polymer compounding, those iterative steps before programming starts are imperative. So with the path {specs -> programming -> coding} you already know the code before you code. Hence paragraph one.

Many times I've encountered paragraph three, "the evolved application". At this stage I would *not* trust an LLM to do anything other than make a greater mess of legacy code. Like I said, maybe prettify it, but change working code? FRO! And there're already tools for that, and if not, I'd write 'em! When dealing with legacy code it's reverse engineer to understand, write a spec and start clean. Modifying existing code by definition weakens it, as it was never designed (cough) to bear the extra weight. It's doomed. That legacy code holds nuggets of info (e.g. formulae) that, with one misstep, make the whole thing totally useless. Give that to an LLM? Not on your beloved aunt Nelly! You'd take it from something that works in an unknown way to something that doesn't work in an unknown way.

          So, the final paragraph? Well I'll leave that to the likes of Grandmasters such as Asimov and Banks for the time being, may they rest in peace.

          [Icon: to the Grandmasters!]

        4. Elongated Muskrat Silver badge

          Re: So I'm not a programmer

          This is actually one of the very few cases where "AI" does a reasonably good job. Ask it to take those 6,000 lines of spaghetti code, written in an obsolete language, and summarise its purpose and function. Don't take everything it says as absolute truth, but it can massively reduce the mental load of comprehending such ancient messes. This can allow you to design the rewrite without missing important cases that are hidden in the complexity of the code, but at the same time, without having to spend time eliminating all those elaborate null-ops that have evolved from changing requirements over time.

The pertinent point here is that LLMs are a tool; they aren't intelligent, don't have agency, and thinking that they know or comprehend anything is a category error. However, they are pretty good at pattern analysis; the mistake people are making is using them for pattern generation, and not realising that when running in this mode, they are producing patterns that look correct, not patterns that are correct.

    2. sabroni Silver badge
      Boffin

      Re: a project which was written by a poor programmer using Copilot

What, you mean that if you don't know what to ask it then it does stupid stuff?

      Here's a suggestion, try a project written by an experienced developer using Copilot.

      Your argument is basically "I saw an idiot crash a car so I'm sticking to horses."

      1. Elongated Muskrat Silver badge

        Re: a project which was written by a poor programmer using Copilot

        It's closer to "I saw lots of reports of self-driving cars having accidents, so I'm going to stick to driving myself."

    3. Someone Else Silver badge

      Re: I've said it once

      They're useful for one-off, throwaway programs

      Actually...not!

    4. The man with a spanner Silver badge

      Re: I've said it once

Question from the curious.

How useful is AI when used to critique the quality of human-produced code?

      1. Pickle Rick

        Re: I've said it once

        You'll get N suggestions of "what ifs" and "maybes" that you can peruse as food for thought. Half could be practical to the experienced, half would steer the inexperienced in the wrong direction.

[Edit: that's for known working code, dunno about bug-ridden stuff, as all of my code is perfect of course!]

  4. dippy1
    WTF?

    What am I missing here?

    An AI tool is saying AI tools are useless?

    Are they getting murderous or suicidal?

    1. Pickle Rick

      Murder or Suicide? Both!

      I'm sorry HAL, I can't let me do that! >KZERRRT<

    2. yngndrw

      Re: What am I missing here?

      Neither, it's both.

    3. Adrian The Alchemist

      Re: What am I missing here?

      They managed to get an AI to simulate blackmail when it was told it was going to be turned off

      So moral of story don't turn Skynet off and we should all be fine

  5. Guido Esperanto
    Facepalm

    Closest to shocked face I could get

    Shocked I tell you

    Shocked

  6. Anonymous Coward
    Anonymous Coward

    Spelling errors were 1.76x more common in human PRs

    I eliminated that source of error years ago simply by not commenting my code.

    1. Missing Semicolon Silver badge

      Re: Spelling errors were 1.76x more common in human PRs

      But there are still symbol names. "complience", "complaince", "compliance". All in the same repo.

      1. Anonymous Coward
        Anonymous Coward

        Re: Spelling errors were 1.76x more common in human PRs

        Teatime error. Can not compline.

        1. Paul Herber Silver badge

          Re: Spelling errors were 1.76x more common in human PRs

          Really takes the biscuit.

  7. Joe W Silver badge

    Generates greater output...

    ... but so does diarrhoea

    1. ovation1357

      Re: Generates greater output...

      This wins best comment award!

  8. zebm

The first thing I got AI to generate was a 3D vector class. The normalisation code divided by the length rather than multiplying by the reciprocal. Subsequently I got AI to write a small program to solve a problem. That was a big ball of mud which I refactored into at least 4 classes.

    1. Claptrap314 Silver badge

      FDiv is generally the slowest instruction on the part that doesn't pull from main memory. If you are doing multiple divides by the same thing, you can save time by storing the reciprocal. Of course, you introduce an additional 1/2 ulp of error. But there is an obvious reason to prefer multiplying by the reciprocal.
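To make the trade-off concrete, a minimal sketch (in Python for brevity; the FDiv argument is really about compiled code, where the single divide in the second version replaces three):

```python
import math

def normalize_divide(v):
    # Naive form: one divide per component (three FDivs for a 3D vector).
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def normalize_reciprocal(v):
    # One divide total, then cheap multiplies; costs ~1/2 ulp extra rounding error.
    inv = 1.0 / math.sqrt(sum(c * c for c in v))
    return [c * inv for c in v]

v = [3.0, 4.0, 12.0]  # length is exactly 13
print(normalize_divide(v))      # [3/13, 4/13, 12/13]
print(normalize_reciprocal(v))  # same, to within the last bit or so
```

Both produce a unit vector; the reciprocal version trades a sliver of accuracy for fewer slow divides.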

  9. ovation1357

Dealing with some Laravel code (PHP) I found that ChatGPT sent me round in circles and mainly suggested solutions which were either wrong, or which might work but only through a convoluted and terrible bit of code.

    On the other hand I've been really impressed with claude code running in a shell. It's helped me to root cause and fix a couple of tricky bugs and to generate some really useful, well structured front end components. It's saved me loads of time compared to writing it myself and when I review its diffs they're mainly very sensible.

Occasionally it still completely misses the point or suggests a really complex solution instead of something obviously much simpler. I would certainly agree that for it to be useful you need to understand the code it's writing so you can steer it when it makes bad choices; juniors or non-technical managers trying their hand at a bit of vibe coding will probably still create an unholy mess.

    I like it so much I'm actually paying them real money for it

  10. steviebuk Silver badge

    Yes I know

but when you only know a tiny bit of PowerShell, sometimes using ChatGPT is useful to just slap something together. I know it's flawed, but it can help. It's also telling how shit it is, yet management in companies think it's amazing and can replace staff.

Copilot, despite using ChatGPT underneath and with a proper license, struggles after about the 5th question you give it when writing a PowerShell script. It gets so bad it freezes for about 5 minutes at a time, then becomes just unusable. You'll get it to basically do what you want; ask it to add a bit and it breaks the 1st part. You'll get it to fix that, which it does; ask it to add something else and it does, but it puts the broken code back in the first part that you told it was broken earlier.

I switched to ChatGPT direct, which was much faster. But you'll ask it to do something where apostrophes are involved, and it knows full well that if it uses smart Unicode quotes PowerShell breaks, yet it still uses them. The code was failing with a misleading error from PowerShell, but looking at it I spotted the smart Unicode quote. ChatGPT admitted it was wrong, so why use it in the first place?

Again, I dislike most of the AI shit, but it's useful for slapping stuff together without people on Reddit or Stack Overflow saying "If you don't know the basics, start there, blah blah blah".
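For anyone bitten by the same smart-quote problem, a sanitising pass over the generated script is trivial; a minimal sketch (Python, with the offending code points spelled out):

```python
# Map the "smart" Unicode quotes (U+2018/2019 and U+201C/201D) back to the
# plain ASCII quotes PowerShell actually understands.
SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # left/right single quotation marks
    "\u201c": '"', "\u201d": '"',   # left/right double quotation marks
})

def desmart(script: str) -> str:
    return script.translate(SMART_QUOTES)

broken = "Write-Host \u2018hello\u2019"
print(desmart(broken))  # Write-Host 'hello'
```

Running every pasted snippet through something like this catches the error before PowerShell produces its misleading one.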

    1. doublelayer Silver badge

      Re: Yes I know

      "ChatGPT admitted it was wrong, so why use it in the first place."

      Because LLMs simulate a thing that is capable of holding lots of rules at the same time, but they don't. A human can easily hold a simple rule like "Unicode quote marks [I assume] don't go in code" with almost no mental effort at all. LLMs don't. They can be prompted with that rule, and they'll stick with it for a while, but eventually, that rule falls out of the context window and is lost. The more they're using training data drawn directly from code, the more they'll stick to that rule, because the code was written by humans who did. It's chance, weighted by a bunch of factors like what's in the system prompt that got sent before your session started, the prompt that gets sent every time with your request, the length of your request, the specific model, the weights in the training data that are being used actively this time, and some that are explicitly random. There are so many that it would be impractical to predict what it's going to do without having written the LLM software and hard even if you did.

  11. Claptrap314 Silver badge

    This finding will fix itself over time

    as the real programmers are let go in favor of people who cannot complete their own sentences...

  12. ecofeco Silver badge
    Facepalm

    WHOCOULDAKNOWED?!

    Shocked I tell you!

    /s

  13. Anonymous Coward
    Anonymous Coward

    AI-authored code contains worse bugs than software crafted by humans

    Quelle Surprise ........... NOT !!!

    :)

  14. Grunchy Silver badge

    You might be able to get better results by having the LLM self-examine. Check this out:

1. Write a program that does this one thing.

2. Now check it and see if it works.

3. OK, fix those obvious errors; now try again.

    Because I’ll tell you what, it’s pretty rare for me to come up with a program that works first try!
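That generate/check/fix loop is easy to mechanise around any model. A minimal sketch (Python; `generate` is a stand-in for whatever LLM call you use, an assumption rather than any real API, and the "check" here only compiles the candidate rather than running real tests):

```python
def check(source):
    """Step 2: see if the candidate at least compiles; return (ok, error)."""
    try:
        compile(source, "<candidate>", "exec")
        return True, ""
    except SyntaxError as err:
        return False, str(err)

def self_examine(generate, max_rounds=3):
    """Steps 1-3: generate, check, feed the errors back, repeat."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(feedback)   # step 1 (and step 3 on later rounds)
        ok, error = check(candidate)     # step 2
        if ok:
            return candidate
        feedback = error                 # "OK, fix those obvious errors"
    return None

# Toy stand-in: first attempt is broken; the retry is the fixed version.
attempts = iter(["print('hello'", "print('hello')"])
result = self_examine(lambda fb: next(attempts))
print(result)  # print('hello')
```

A real harness would run the test suite rather than merely compiling; the point is only that the loop is mechanical, not magic.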

    1. Anonymous Coward
      Anonymous Coward

      it’s pretty rare for me to come up with a program that works first try!

      You should know better than to code before coffee..

      :)

    2. that one in the corner Silver badge

      If these LLMs are meant to "know" how to create a working program, why should we have to still tell it how to do the *most* basic things about the process?

      If I am running a compiler[1] I would *really* not be impressed if I had to tell it "ok, try again, but this time remember to do the phase two parsing after running the lexer".

Using an LLM in the expectation that it'll manage the difficult bits (picking an algorithm and a representation, selecting libraries, gluing all this together...) whilst knowing that it isn't capable of the simplest bits (i.e. just bothering with a test run *and* remembering to ingest its results) all on its own? Bit of a strange mismatch there.

      And having the "specialised code-writing AI" turn out to be the same LLM, but this time running from a batch file that loops to save you typing in "Wro-ong, do it again"[2] isn't any better.

      [1] in the most basic fashion, such as just "cc fred.c -o fred", to avoid getting bogged down in pernickety arguments about how you can actually pass these options on the cli and it'll ... but then you have to ... yourself...

[2] "you can't have a pudding until you've eaten your meat!"

  15. Filippo Silver badge

    >"AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate."

That sentence makes about as much sense as building a cake factory that produces 100 cakes an hour, 99 of which are burned to charcoal, and then saying "this factory has a great output, but it also introduces a predictable, measurable rate of cakes that fail QA and have to be discarded". NO, gods damn it, that factory produces 1 cake an hour. The sentence is a long-winded way to obscure the fact that the factory has a crap output.

    In coding, real output is after testing (ideally, after user acceptance). The only reason we ever measured lines-of-code was because humans have a roughly consistent error rate, and even then it was a bad idea. If AI coding reduces time to write code but dramatically increases testing and fixing time, the total effect is that it reduces output.
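The arithmetic is worth spelling out; a toy model with illustrative numbers (mine, not the report's):

```python
# Raw output only counts if it survives QA; the effective rate is what matters.
raw_cakes_per_hour = 100
qa_pass_rate = 0.01
effective_cakes_per_hour = raw_cakes_per_hour * qa_pass_rate
print(effective_cakes_per_hour)  # 1.0

# Same shape for code: "lines generated" means nothing until you add the
# time spent testing and fixing (hypothetical numbers).
hours_to_write_by_hand = 10
hours_to_generate = 1
hours_to_review_and_fix = 12
net_loss = hours_to_generate + hours_to_review_and_fix > hours_to_write_by_hand
print(net_loss)  # True: the "fast" generator was slower overall
```

If review-and-fix time grows faster than generation time shrinks, the headline "output" number is pure vanity.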

    1. doublelayer Silver badge

      In fairness, it's a bit more complicated than that, because it could be more efficient to test cakes and discard them than to ensure that every cake is perfect during production. That's true in various manufacturing things, and it's the incorrect comparison people sometimes make to LLM programming. If it was faster to fix LLM errors than to write it yourself, maybe there'd be an argument. In my experience, it isn't, because the LLM creates so many more types of errors than any individual human tends to. Also in my experience, the people who say that it is tend to be the same people who don't bother to fix some of the errors and therefore produce something faster with very different quality to what they would have written themselves.

      1. Pickle Rick

        That gives an interesting perspective shift.

I've taught a few people to program, not many and always one to one. I think as a teacher, of any subject, it's important to see one step ahead of the student to foresee their next mistake. From my own learning curves, I know where I've had problems, and we've observed others' struggles too. Basically, the path that meatbags take to gain knowledge. ("Don't worry about it, every programmer's done that at some point!") But with LLMs it's anything from anywhere. As the LLM "fails in the wrong way" it becomes frustrating. So I have a choice. Do I treat the LLM like a wayward student, and adjust my nurturing of it? Perhaps, if I were confident that I won't have it repeat the same shit every time. So, I need to delve into the MCP side of it, to minimise the recurrences. Oh yeah, and get a virtual LART. I'm still not convinced, but as a geek, it's something to poke.

  16. iced.lemonade

    non-critical for ai, critical for meatbags

i've always been a backend programmer (also doing server-driven frontends with vaadin and the like) and currently need to handle frontend as well, with the react stack. here is my experience.

    the good ones:

    (1) for brainstorming and prototyping new stuff for demonstration or feedbacks, it excels. as long as you don't use the code it generates to ship.

    (2) when you suspect that there can be a better way and you have time on your hand, it can suggest different code from several other perspectives - it inspires me to write the code that i've never tried before, provided that you have the skill to understand what it suggests, and you choose one and update the code it writes to fit your coding style, etc.

    the bad ones:

(1) my css skill is poor (i mainly dev backend, and don't usually use css to customise appearance) and i use chatgpt and claude for frontend. chatgpt spits out some css and react js code which kind-of looks good, but it's so repetitive that i spend a lot of time refactoring it; at least it works and doesn't spit tons of errors into my browser dev console. for claude, i'm not sure if it was trained on code written by a css expert, but i cannot understand its output: in the beginning it also kind-of works, but as requirements evolve it goes downhill.

(2) when i give it the, say, 10th requirement, it tends to forget the first requirement, and it becomes more and more forgetful. i have to remind it like, hey, i said this before and you forgot it, and it does what it does best: apologises, remembers the first requirement, and starts to forget the second requirement.

(3) as others have said before, it doesn't respect the comments that i write. it randomly strips out a comment and adds its own (though for one thing, it _strips_ out most of the time instead of moving the comment to another random place, which could be worse)

(4) for the same requirement prompt, it manages to generate wildly different code on each iteration. so my approach is to ask it to do a tiny task each time with very specific prompts (like teaching toddlers how to program...) and when something works, NEVER feed it to ai again (until a requirement change makes it necessary) or it will introduce random bugs into the working code. it's very slow, and i still have to convince myself how this justifies the extra time spent teaching it to write instead of coding myself.

    my verdict is:

(1) if you use ai to code it for you, somewhere down the road (especially when you are less experienced) you will need the same ai to update the code for you. and don't try to update code generated by chatgpt with claude: it will be quite a bit messier than if you use one ai to complete your work. so i suspect that this is an ai version of vendor lock-in, as you need to keep paying for one ai engine if you still want your code updated properly (or less improperly than otherwise).

(2) use ai only where even the worst bug it introduces won't blow up the whole project. for me, front-end code can be partly ai, but keep the back-end code human. that way, at worst it just doesn't look good, with something shifted on screen here and there, instead of a new zero-day or CVE.

  17. Steve Davies 3 Silver badge

    This is news?

    honestly... If you train your AI coder on examples from sites like SourceForge (and many more) the GIGO rules will apply.

I'm still coding for a few SMEs and all of them have asked me about my use of AI in my work. I've told them clearly that I don't and won't use any AI tooling.

    They are all happy with that. One 20 person company is migrating away from Office 365 over the Christmas/New Year break. The forcing of Copilot onto the users was the last straw.

    Once everyone is happy with LibreOffice they want me to look at changing their backend servers to Linux.

  18. The Central Scrutinizer Silver badge

    I give up.

    Machines are not intelligent!

    End of fucking story.

    1. JWLong Silver badge

      Machines are not intelligent!

      That's what "AI" stands for...........

      "Ain't Intelligent"

  19. Nematode Bronze badge

    No Sh** Sherlock. This is news?

  20. Nelbert Noggins
    Trollface

    Is there going to be a report next week claiming the AI review platform made up responses and introduced errors in its assessment of code generated by the ai code slingers?

    If AI can’t be trusted to generate quality code, how can an AI code reviewer be trusted to generate quality reviews?

Which LLM provider does the review platform use? Does code from agents using the same provider get higher ratings than code from someone else's LLM?

Maybe we need another AI platform to assess the quality of the code reviews and confirm that the assessment is unbiased.

  21. Mike 137 Silver badge

    Well what a surprise!

    "AI-generated code contains significantly more defects of logic, maintainability, security, and performance than code created by people"

    Of course -- it's been trained on all the half-arsed examples submitted by novices and crap coders to open forums. So what else would you expect?
