GitHub's boast that Copilot produces high-quality code challenged

GitHub's claim that the quality of programming code written with its Copilot AI model is "significantly more functional, readable, reliable, maintainable, and concise," has been challenged by software developer Dan Cîmpianu. Cîmpianu, based in Romania, published a blog post in which he assails the statistical rigor of GitHub's …

  1. PinchOfSalt

    What will it mean to be a....

    There's an underlying question here which I've found troubling.

    If people are going to use such tools to do things they couldn't do before, what will it mean to be a professional?

    For example, to be an accountant today you have to not only study and pass exams but also practice it for a period of time. Then you can rightfully claim to be an accountant and people won't call you out for overstating your competence.

    However, in future, with an AI widget telling a layman how to do accountancy, would that layman therefore be able to claim they no longer need an accountant? And if the advice from the AI widget is sufficiently good that they are an accountant themselves?

    I pick this example as there is a bar set with a professional standards body, but this could apply to anything in the knowledge economy.

    1. Gene Cash Silver badge

      Re: What will it mean to be a....

      > However, in future, with an AI widget telling a layman how to do accountancy, would that layman therefore be able to claim they no longer need an accountant?

      Sure, if he's stupid. AI can't even add up the number of Rs in "raspberry" so I certainly wouldn't trust it with my taxes.

      > And if the advice from the AI widget is sufficiently good that they are an accountant themselves?

      It's not. Just like a ton of people, you're confusing real artificial intelligence with a program that spews out words based on a statistical model of which words usually follow other ones, which is all we have at the moment.

      1. PinchOfSalt

        Re: What will it mean to be a....

        I don't think I confused anything. I asked a question, you implied that this was something I agreed with, when it isn't.

        However, what you state is correct. There will be a lot of people who conflate the two and use generative AI to do things it definitely shouldn't.

        I watched a post on LinkedIn several months ago where someone said they were using GenAI to summarise differences between NDAs. Complete lunacy given the risks involved, but they felt they were being clever in avoiding legal fees.

    2. EricM Silver badge

      Re: What will it mean to be a....

      > If people are going to use such tools to do things they couldn't do before, what will it mean to be a professional?

      There will be less professionals. And the ones left will need to compete with AIs for jobs/customers willing to pay a premium for someone to speak to - instead of talking to something.

      > However, in future, with an AI widget telling a layman how to do accountancy, would that layman therefore be able to claim they no longer need an accountant? And if the advice from the AI widget is sufficiently good that they are an accountant themselves?

      The future envisioned by supporters of an AI transformation would probably be more like an AI _being_ the actual accountant. The accounting company will morph into a small/medium datacenter operations facility, with just enough humans left to cover the non-automatable manual tasks.

      Then again, most people in today's service economy will no longer have paying jobs in that scenario - so instead of an AI that does taxes, we will probably see AIs that generate perfect requests for welfare/benefits.

      1. Fonant
        Headmaster

        Re: What will it mean to be a....

        [Fewer professionals (professionals are countable)]

        I think the number of professionals with subject expertise will remain the same. The presence of computer-generated bullshit (stuff that looks plausible but may well be wrong) makes their jobs more useful, not less.

        1. EricM Silver badge

          Re: What will it mean to be a....

          > The presence of computer-generated bullshit (stuff that looks plausible but may well be wrong) makes their jobs more useful, not less.

          For the current state of affairs: I fully agree.

          However, the OP asked what would happen if laymen started using those tools - which IMHO implies a scenario in which they actually work as intended by the user, without hallucinations.

          And my post just extrapolated what is already happening to a certain extent to graphics designers and writers facing competition from ChatGPT, Dall-e and the like: falling prices and fewer( thx :) ) people able to make a living off those skills.

          1. Richard 12 Silver badge

            Re: What will it mean to be a....

            which IMHO implies a scenario in which they actually work as intended by the user, without hallucinations.

            It doesn't. Laypeople use stuff that doesn't work all the time, because they don't know the difference.

            In many cases they only discover the problem later on, when something goes very wrong.

            Eg asking ChatGPT to do structural engineering calculations can seem just fine right up until the floor collapses.

            1. Anonymous Coward
              Anonymous Coward

              Re: What will it mean to be a....

              That'll be a popcorn moment in court one day sadly.

          2. doublelayer Silver badge

            Re: What will it mean to be a....

            "However, the OP asked what would happen if laymen started using those tools - which IMHO implies a scenario in which they actually work as intended by the user, without hallucinations."

            I don't think that's correct. People use things that are not perfect all the time, and sometimes it makes sense to do so. For example, let's use their own example of accountancy. I am not an accountant. I have not taken an accounting course. Yet I have learned a lot of the basic terms and can avoid doing the most idiotic things. Someone who employed me to manage their finances would not be making a good decision, but while their needs remained simple, it would probably work. The problem would come when they needed something more advanced and I either failed to do it or tried and demonstrated my lack of skills. Still, for someone who didn't have the funds to employ an actual accountant, someone with my level of knowledge might be good enough for something simple.

            I think the same will apply to LLM-generated code, with the slight mitigating factor that the quality is more random. If you choose the right model, or the right layer above a model - one like GitHub Copilot that was specifically designed to generate code, rather than one like ChatGPT which can generate stuff that looks kind of like code - you can get valid code for some simple programming problems. As long as that is all you need, you may get what you want. When you start needing things that are more complex, the models break down, and by relying on them instead of having someone who knows what they're doing, that invalid code is going to run and possibly cause damage.

            One major difference between this and the me-as-an-accountant scenario is that, if I were somehow employed as a cut-rate pseudoaccountant, I am likely to identify when something is well beyond my area of knowledge and refuse the task. That means the business needs to be most concerned with tasks that are slightly outside my area, where I might be stupid enough not to recognize it, but they don't need to worry as much about big things, because I will point out that I don't know what I'm doing. An LLM doesn't do that: no matter how ridiculous the assignment it's given, it'll produce some kind of code. As soon as people start asking for things that connect to systems, chaos should be expected. And that's just the most immediate and obvious version of chaos; a lot of the other options are worse.

        2. abend0c4 Silver badge

          Re: What will it mean to be a....

          professionals are countable

          And AI is not accountable.

        3. The Indomitable Gall

          Re: What will it mean to be a....

          If generative AI LLMs will not repeat grammatical "rules" invented by pedants in the 18th century and will actually accept real common English as correct, then maybe I can live with AI after all...

    3. jokerscrowbar

      Re: What will it mean to be a....

      I passed my two-year Computerised Accounts course with a distinction. Does that qualify me to say that staff at El Reg have been AI-enhanced for a while now and none of us even noticed the difference? Their absolutely professional articles are no more or less stinky than usual.

      If the cohort in the example only found 25 people that were any good at their job then of course the rest should be using Copidiot…

      … to write their job applications to McDonalds.

    4. Fruit and Nutcase Silver badge

      Re: What will it mean to be a....

      If people are going to use such tools to do things they couldn't do before, what will it mean to be a professional?

      Just a couple of weeks ago, I was assisting a colleague with a program. We needed to call a certain API at a certain point. I couldn't quite remember the exact parameter list, but if I were at the keyboard I'd have looked up the API and worked it out. Instead, my colleague called up Copilot and posed a question about what we were trying to achieve with this call, and it came up with a two-part answer: a method which calls the API function that I'd mentioned, plus some preamble code to call the method.

      In the end, what got coded was exactly the single line I was looking to use, not all of the other padding that Copilot presented. However, if I were not there, I suspect my colleague would have just done a copy/paste of that method. Just asking for brain-rot - a pity, as he's relatively one of the better programmers on the team.

      Time-wise, looking up the API would have taken about the same. But looking up the API would also have given an insight into the API and its associated descriptions, and we'd have learnt something along the way, including the caveats/limitations described in the documentation. With Copilot, the developer is oblivious to all of that.

    5. GNU SedGawk

      Re: What will it mean to be a....

      If the difference between the professional and the layman is simply access to a black box tool, there is no profession, only gatekeeping.

      I would suggest that the profession is full of things like how to train a new starter effectively on the pile of hacks and black magic we laughingly refer to as production.

      An understanding that the contract you are signing is a trap, and that you should walk away from the obligation.

      An ability to determine which parts of a project not to build is very important. Prioritisation MoSCoW-style (Must/Should/Could/Won't).

      Sensible addressing schemes - why would I prefer a 10.0.0.0/19 over a 10.0.0.0/22 - what impact would that have for capacity planning when for example things have to be geographically redundant?

      I know what my addressing scheme is based on subnet usage, how is an LLM going to advise me based on cidr range, without also understanding?

      Here's a verbatim recommendation from a LLM - "subnets should be sized in powers of two". Well I'll just take my /25 and go forth and multiply shall I?

      I think that, beyond image generation and automated translation, technical text generation is only as useful as the generated test suite - which, if it's coming from the same source, offers no control.

      Sometimes, for accuracy, a sum of products is more accurate than a product of sums. Mathematically that's nonsensical - but for solid hardware-related reasons, with real floating-point numbers, it's a reality of numeric computing.
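
      To make that concrete, a minimal Python sketch (the values are purely illustrative): on paper the distributive law says the two forms below are equal, but IEEE 754 doubles round after every operation, so they aren't.

```python
# On paper, a*(b+c) == a*b + a*c. In 64-bit floats it need not hold,
# because each multiply and add rounds to the nearest representable double.
a, b, c = 10.0, 0.1, 0.2

distributed = a * b + a * c   # sum of products -> 3.0 exactly
factored = a * (b + c)        # product of a sum -> 3.0000000000000004

print(distributed, factored, distributed == factored)
```

      Which form is more accurate depends on the operands - the point is just that algebraically identical rearrangements really do change the answer in floating point.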

      I think the LLM is my overconfident junior, who doesn't know what he doesn't know and will copy and paste with abandon - leaving me unsure where that suspicious chunk of code came from.

      I can at least get rid of them if I find they've actually copied off some rando open source repo or whatever - I'm not confident about attribution for LLM-generated code.

      1. doublelayer Silver badge

        Re: What will it mean to be a....

        I don't think "subnets should be sized in powers of two" was trying to argue against /25s, since the smallest IPv4 subnet you could create if it was would be a /16. I think it was making the equally useless point that subnets should contain 2^X addresses - for example, your /25's 128 addresses. Otherwise phrased as "you shouldn't create subnets that are impossible to create", since a /25.356 for 100 addresses isn't generally supported.
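
        Python's standard ipaddress module makes the point directly, for what it's worth - every legal prefix gives you 2^(32-prefix) addresses, and fractional prefixes simply don't exist (a sketch; the addresses are just the ones from the comment above):

```python
import ipaddress

# A /25 holds exactly 2**(32 - 25) == 128 addresses.
net = ipaddress.ip_network("10.0.0.0/25")
print(net.num_addresses)

# A "subnet" of exactly 100 addresses is impossible: prefix lengths are
# integers, so anything like a /25.356 is rejected outright.
try:
    ipaddress.ip_network("10.0.0.0/25.356")
except ValueError as err:
    print("rejected:", err)
```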

        1. GNU SedGawk
          Pint

          Re: What will it mean to be a....

          That's a bit more logical tbf, still useless but indeed "you shouldn't create subnets that are impossible to create"

    6. katrinab Silver badge

      Re: What will it mean to be a....

      For accounting, the correct answer in England in 2023 may not be the correct answer in 2024, as tax laws change frequently.

      It may not be the correct answer in Scotland in 2023, because some tax laws are different in Scotland; and even where the tax law is the same, some of the other laws surrounding it can be different in a way that impacts it - partnership law, for example. And once you get outside the UK, they are even more different.

      Do you think large language models would be able to cope with that? I don't.

    7. CowHorseFrog Silver badge

      Re: What will it mean to be a....

      In basically every country, there is no professional board that certifies software engineers. Anyone can, and often does, call themselves a software engineer.

      Unfortunately our modern society is plagued with people who chase and give themselves labels or titles when they have no actual skills.

    8. The Indomitable Gall

      Re: What will it mean to be a....

      But this is kind of similar to the massive off-shoring of entry-level jobs a decade or two ago (I was working in IT in the UK).

      The talk then was that only the "low value" jobs were being off-shored, and all the high value, highly skilled work was still here, managing and supervising.

      But didn't everyone in a high-value job start in an entry-level job...? Where does the next generation of high-value workers come from if we off-shore the entry-level jobs that are a vital part of staff development? In the case of off-shoring, the IT and service industries are getting more and more high-value jobs offshored, because now there are more offshore staff looking to climb the career ladder than onshore. But with AI, you can't just expect GitHub Copilot to get so good at its job that it can review the next generation of AIs to get them through their low-experience years.

      People need to be really good programmers to perform code reviews, and as programming gets shifted to AI, how are we going to be able to get people doing enough programming to learn to do reviews?

  2. BigAndos

    I use Copilot in VS Code every day. My own experience is more 50/50. I mostly use it for Python scripts to be run via Airflow. It is quite good at boilerplate stuff I can't be bothered to write (basic REST calls, mock data for unit tests, function docs) and it can give clues if you want to do something you've never done before. The new Copilot Edits feature is quite good, as you can prompt it to edit across a set of files.

    It has some drawbacks though. It often lags behind on new library/tool versions (current knowledge cutoff is October 2023).

    It is prone to hallucinations and coming up with methods that don't exist. I let it write some unit tests that passed, so I moved on. It turned out it had hallucinated an assertion method on a MagicMock, which ran without doing anything, so the tests were pointless! Lesson for me there to thoroughly check its output.
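
    That trap is easy to reproduce: almost any attribute you touch on a MagicMock - including a hallucinated "assertion" helper - silently returns another mock instead of failing (the check_was_called_with name below is made up for illustration; no such method exists):

```python
from unittest.mock import MagicMock

api = MagicMock()
api.fetch("users")

# A real assertion helper genuinely checks the call and would fail on a mismatch:
api.fetch.assert_called_once_with("users")

# A hallucinated helper just creates a new child mock and "succeeds":
result = api.fetch.check_was_called_with("complete nonsense")
print(type(result))  # another MagicMock - the "check" verified nothing
```

    Modern unittest.mock does at least catch misspellings that start with "assert", but any other invented method name sails straight through, which is exactly how a hallucinated test can pass while testing nothing.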

    Also, it works best in a large project with established code patterns. If you ask it to write things from scratch it often comes up with garbage.

    So overall it is useful but needs using with extreme caution!

    1. Gene Cash Silver badge

      > Lesson for me there to thoroughly check its output.

      Really? And does it actually end up saving you any time/effort after all that? In my experience, the answer is "no"

    2. Anonymous Coward
      Anonymous Coward

      > It is prone to hallucinations

      AI doesn't hallucinate, it is prone to errors.

    3. IamAProton

      +1 on the boilerplate stuff and nothing else.

      The code is always nicely written, but properties, members and functions are made up on the spot when you ask for something beyond the usual boilerplate, and then you are wasting time testing the code or (worse) asking the chatbot to change it.

      And when you ask it to improve... good luck. "Hey Copilot, the package you referenced is deprecated." "Oh, I'm sorry, let me fix it: <package_name> --version <made-up_non-existent_version>"

      What bothers me is that the chatbot is always 'happy and positive' regardless of how much you curse it when it fuc*s up. We need chatbots that can suffer and cry.

      1. Yet Another Anonymous coward Silver badge

        We've found that the more obscure the field the more accurate the code is. Suspect that there are fewer examples to learn from, and those few examples are from more authoritative sources.

        For something like Vulkan boilerplate it's excellent

        1. Richard 12 Silver badge

          And of course, the more likely you are to end up provably breaching someone's copyright.

          For boilerplate that won't happen, but the line between boilerplate and enforceable copyright is fuzzy.

        2. IamAProton

          I agree. Asking the chatbot to write queries in the Flux language (an already-EOL language for querying InfluxDB, so I'm not gonna bother learning it) is surprisingly helpful.

    4. Tim 11

      What I find it is great for is when you want to do a simple task that you know lots of people have done before, but you don't know the syntax or the exact API. For example, "how to get all values from a set in javascript" or "how to generate a presigned URL to upload a file to s3 from python". These are things you could easily work out from the documentation, but why bother?

      Once AI has generated the code and you've tested it, it's usually pretty self-evident whether it's correct.

      However, it is prone to hallucinations. It once had 10 goes at code to list the allowable keys from a TypeScript type at runtime before I decided to do some research and found it was impossible.

    5. Mike007 Silver badge

      My "test" for AI generated code was simple. I tried creating a skeleton node based web app with a login screen and a page to manage users. Basic requirement for every standalone app, and my feeling is that if it can't manage that properly then it's not a good start.

      It has been a while since I did this exercise, but if I have to tell it that you don't store passwords in plain text then that's a fail for me. If after giving it the instruction to hash passwords I then need to go back and be more specific about using the same hashing algorithm to validate the password as when setting it... Long story short, I wasn't impressed.
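
      For the record, the behaviour being asked for is only a few lines of standard library Python: hash with a random salt on the way in, and re-derive with the same algorithm and salt on the way out (a minimal sketch, not production auth code - parameters and the example password are illustrative):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a PBKDF2-HMAC-SHA256 digest with a per-user random salt."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-derive with the SAME algorithm and salt; compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

      That a code assistant has to be told twice to do this - and then told again to use the same algorithm for verification as for setting - is the point of the comment above.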

      1. Roland6 Silver badge

        So we can expect AI to create readily exploitable, insecure code, particularly given the amount of code out there that exemplifies poor design and security….

      2. Gordon 10 Silver badge

        Long story short, you were holding it wrong.

      3. Mike007 Silver badge

        Just to post a follow-up: I was actually asked to do this task again at work this week. My boss told me to try out a few AI code assistants. More tools. Quality-wise... let's just leave it as there are now more tools?

        One example: I asked one model to change the docker compose file to reference an environment file, to share the MySQL password between containers instead of putting it under each container. It made the requested change, and also deleted the volumes at the same time. Just because.

        I will not use Microsoft's offering so cannot comment on whether that is better, but people say the tools I tried are comparable...

  3. Peter Prof Fox

    AI assistance

    Surely AI is just a tool to be used not a robot trusted to invent finished solutions. Who expects it to work straight out of the box?

    1. AndrueC Silver badge
      Unhappy

      Re: AI assistance

      Who expects it to work straight out of the box?

      Sadly too many people do. And it's usually those who are already hard of thinking.

    2. Gordon 10 Silver badge

      Re: AI assistance

      That's precisely the point: it doesn't always.

      Just like spellcheckers or grammar checkers, you use them at need, knowing they will sometimes be wrong, but they generally offer a quick shortcut for when you can't be arsed.

      Like every IT invention ever - ignore the hype and crack on with using it how you see fit. It has a learning curve, that's all.

      Let's face it - most IT solutions are shitpits when you dig enough turtles down. The gold-standard ones are rare and always have been.

  4. Bebu sa Ware
    Coat

    Try telling that to...

    Try telling that to the authors who don't write good prose, the recording artists who aren't good musicians, the video makers who never studied filmmaking, and the visual artists who can't draw very well.

    Easier to hand bouquets to the few not on this list. Authors that cannot write are pretty much the rule, YouTube is saturated with the second and third, while artists that couldn't draw water from a well are probably responsible for the infantile "illustrations" found on the walls of toilet cubicles.

  5. ChoHag Silver badge

    > And those with substantial coding experience also see value in AI code suggestion models.

    Perhaps, but there is also value --- more, in fact --- in stack overflow and in the example sections in documentation.

    How many trillions?

    > Try telling that to the authors who don't write good prose, the recording artists who aren't good musicians, the video makers who never studied filmmaking, and the visual artists who can't draw very well.

    No problem: Your output is as good in your respective field as the amateur robo-coding is in ours. Trash.

    1. Gordon 10 Silver badge

      And yet there are notable successes in each field who don't meet those bars... methinks your example is flawed.

  6. Zippy´s Sausage Factory

    It sounds like the test was about how fast you code something you already know how to do, rather than getting out of your comfort zone. It is, of course, the latter that I'm sure management are way more interested in getting right faster.

    1. katrinab Silver badge
      Alert

      Manglement expect to be able to ask the "AI" thing to write them an Oracle replacement, deploy it, and migrate all the data across; and expect it to do everything perfectly within the next second or so.

      Obviously that is never going to work.

  7. Anonymous Coward
    Anonymous Coward

    Try telling that to the authors who don't write good prose, the recording artists who aren't good musicians, the video makers who never studied filmmaking, and the visual artists who can't draw very well.

    what a really petulant swipe at the OP? like are there bad authors, artists, and musicians? yes. do I rely on their skill to keep me alive? not at all.

    bad coders, though... the chance of my life being impacted negatively by poor programming practices and even AI-slop generated botshit might be pretty low, one hopes... but it certainly would not be zero.

    1. Richard 12 Silver badge

      Exactly

      An author, artist or musician who creates stuff I don't like barely affects me.

      At worst I might purchase a book I don't enjoy, and donate it to a charity shop.

  8. Fonant
    Facepalm

    GenAI = GenBullshit

    Generative AI is basically a bullshit generator. It creates plausible stuff that looks as though it's correct, but there's no way of knowing whether it is correct or not, without a human carefully checking.

    Using GenAI code without any changes is gambling: the likelihood is that the code does what you want it to, but there's no way of knowing how that code was generated (or what data was used to train it).

    1. Anonymous Coward
      Anonymous Coward

      Re: GenAI = GenBullshit

      > Generative AI is basically a bullshit generator. It creates plausible stuff that looks as though it's correct, but there's no way of knowing whether it is correct or not, without a human carefully checking.

      I've had human colleagues who are like that too.

      1. Anonymous Coward
        Anonymous Coward

        Re: GenAI = GenBullshit

        Isn't that a default for many management positions?

        Basically, you could replace those with AI and nobody would notice the difference.

        Hmm, that could save the company a LOT of money..

        :)

    2. CowHorseFrog Silver badge

      Re: GenAI = GenBullshit

      AI is like asking doctor google for medical advice instead of going to a real doctor.

      It's sad that people really think they can get good advice when the internet is known to be plagued with bullshit.

  9. juice

    Maybe...

    They used AI to summarise the study's results!

  10. Steve Davies 3 Silver badge

    Really?

    Quote "Somebody who doesn't know how to program can use Claude 3.5 artefacts to produce something useful."

    How is that different from a junior dev using one of the numerous framework libraries that already exist and producing something that sorta works but is a maintenance nightmare - especially finding out what bit of which framework has changed?

    This AI stuff is all a big exercise in hype, and is full of other 4-letter words like crap, shit etc.

  11. MOH

    Did Copilot produce that “high quality” graph that doesn't add up?

    This seemed suspicious:

    " GitHub's inadequately explained graph that shows 60.8 percent of developers using Copilot passed all ten unit tests while only 39.2 percent of developers not using Copilot passed all the tests."

    Bit of a coincidence that they add to 100? I thought maybe they meant that, of those who passed all the tests, 60.8% used Copilot.

    In fact, the graph in the linked GitHub blog post shows that of those not using Copilot, 39.2% passed all the tests and 62.2% didn't. Out of 101.4%.

    Meanwhile, of those who did use Copilot, a total of 98.6% did or didn't pass all tests.

    It's possible the percentages are supposed to add vertically, but that's crap presentation.

    Either way that graph looks like AI produced garbage.
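
    The arithmetic being complained about takes three lines to check (figures as quoted in this comment):

```python
# Figures as read off the graph: pass-rate and fail-rate per group
no_copilot = 39.2 + 62.2    # passed-all + didn't - rounds to 101.4, over 100%
with_copilot_total = 98.6   # as quoted - under 100%
headline = 60.8 + 39.2      # the two headline pass rates - exactly 100

print(round(no_copilot, 1), with_copilot_total, round(headline, 1))
```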

    1. CowHorseFrog Silver badge

      Re: Did Copilot produce that “high quality” graph that doesn't add up?

      Does anyone know how many times MS repeated the study, and how many results they edited before publishing?

      We don't even know what the questions were, or the results of their study group...

      No serious journal in the medical field or any other professional field would accept this publication as proper scientific research.

      It's no different to accepting the results of the last Syrian election from that champion of human democracy, the glorious leader of freedom, Mr Assad. Not sure why he ran away from Syria when in his last election he got 95% of the votes; it's almost like every single Syrian loves him, and yet he runs.

      I'm confused.

  12. Mike 137 Silver badge

    Grammar please

    "code written with its Copilot AI model is "significantly more functional, readable, reliable, maintainable, and concise"

    than what? Grammatically, "more" is a comparative, so this statement is meaningless unless it includes a secondary subject to compare the first with. So (not surprisingly) this is pure hype.

  13. CowHorseFrog Silver badge

    How many pull requests has Copilot made to any PUBLIC .NET repo?

    If the answer is none (yes, I know the answer is none, just checked), why not?

  14. Grunchy Silver badge

    If Copilot was any good at coding it would be able to improve on its own source code, in which it would become like Algernon: unlimited potential.

    (Not for another coupla years or so.)

  15. sarusa Silver badge
    Devil

    No surprise

    This is the same Microsoft which claims, very obviously liar-liar-pants-on-fire, that they're now making $10B a quarter from AI.

    There is no way in hell that is true without massive accounting 'fraud' like claiming every Windows license is AI because it comes with Copilot. Of course that's not actual fraud, because most accounting is fraud but technically legal - but it's not true in a way any honest person would recognize as 'AI is making us ten billion dollars a quarter'.

  16. Will Godfrey Silver badge
    Boffin

    Lest we forget...

    In my experience, anything I've had to read and think about I'm likely to remember, and it will pop into my mind in later similar situations. Something I've just copied gets forgotten almost immediately.

    1. Anonymous Coward
      Anonymous Coward

      Re: Lest we forget... [That is the intended aim exactly !!!]

      You are exposing the *real* push behind AI !!!

      Create a large dependent group who use AI but cannot replicate what the AI does. [Gradual loss of skills]

      Over time this group will have lost the skills required to replace the AI and 'BIG Tech' will have a revenue stream fixed for many decades.

      Just maybe the AI will get good enough to be useful BUT this is not a primary objective .... at all !!!

      [The more 'useless' the AI is the more the deskilled will use it ... as they have no other choice !!!]

      :)

  17. imanidiot Silver badge

    Why Python?

    I'm not a programmer (I'm a mechanical engineer) but Python seems like a poor choice for the chosen subject anyway. And if you're only going to count syntax/style errors then Python, where a single space out of place is an error, seems like an especially poor choice as it is very easy to inflate the error count in such a scenario. Choosing a white space agnostic language might yield very different results (which might well be why they didn't choose it).
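
    The single-space point is easy to demonstrate - one stray space and the interpreter rejects the file outright before anything runs (a small sketch):

```python
# One extra leading space on the second assignment is a hard error in Python.
source = (
    "def f():\n"
    "    x = 1\n"
    "     y = 2\n"   # five spaces instead of four
)

try:
    compile(source, "<example>", "exec")
except IndentationError as err:
    print("IndentationError:", err.msg)
```

    Whether a study should count that as equivalent to a style nit in a whitespace-agnostic language is exactly the question being raised above.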

    1. doublelayer Silver badge

      Re: Why Python?

      Probably to make simple tasks, the only ones their bot does well most of the time, something a person is willing to write for a study. Python has several downsides, but one advantage of it is that there are various simple tasks that don't require lots of basic plumbing code to get working right. Reading in a json file can be three lines because there is a json parser in the standard library and you don't have to be too specific about types (which can be annoying for larger tasks but speeds up smaller ones a lot).
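
      The JSON case really is that short - the sample file name and contents below are made up, and the file is written first only so the snippet stands alone:

```python
import json
from pathlib import Path

# Create a small sample file so the snippet is self-contained
Path("config.json").write_text('{"retries": 3, "debug": true}')

# The actual read: three lines, no plumbing
with open("config.json") as f:
    data = json.load(f)

print(data["retries"], data["debug"])  # 3 True
```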

  18. XxXb

    Farming analogy

    The horse no longer pulls the plough, 100 men no longer harvest the crops or sow seeds but the farm still produces food.

    This industry is changing radically and the losses (dressed as efficiency) will be huge.

    Learn to plumb springs to mind.
