ChatGPT's odds of getting code questions correct are worse than a coin flip

ChatGPT, OpenAI's fabulating chatbot, produces wrong answers to software programming questions more than half the time, according to a study from Purdue University. That said, the bot was convincing enough to fool a third of participants. The Purdue team analyzed ChatGPT’s answers to 517 Stack Overflow questions to assess the …

  1. Boris the Cockroach Silver badge
    FAIL

    We tried it

    And then tried a subset of an infinite bunch of monkeys hammering at an infinite number of keyboards.

    The monkeys made better code

    1. Anonymous Coward
      Anonymous Coward

      Re: We tried it

      Sorry, bit of ambiguity there.

      By "try it" did you mean ChatGPT or Stack Overflow?

      Of course, your conclusion can apply to both equally well.

      1. Hello Anonymous

        Re: We tried it

        Are you still here? I told you you were outdated.

        AI, in its present format, is a search paradigm. As such, it can only search and find within texts written by humans.

        1. sabroni Silver badge
          FAIL

          Re: AI, in the present format, is a search paradigm

          ChatGPT introduces errors and mistakes into the results it produces.

          How is that a "search paradigm"?

          Search engines don't shuffle the words of the results they link to.

          1. Hello Anonymous

            Re: AI, in the present format, is a search paradigm

            Shuffling is another matter: after a paragraph is found in its context, it is rewritten anew. However, no one knows, and cannot say in advance, whether anything in the paragraph was more or less true. Google, for example, solves the problem through popularity: a lot of people manually decide how truthful a paragraph is.

            The new paradigm does not allow the use of "popularity", but there is a solution: personalization, the creation of individual AIs. This is what I called lexical clones 20 years ago: you deal with individuals you trust, instead of strangers.

        2. Antipode77

          Re: We tried it

          How useful would a calculator be to you if it were incorrect, on average, every second time (52 percent) you used it?

          Pretty useless, I would say.

      2. Boris the Cockroach Silver badge
        Unhappy

        Re: We tried it

        Tried it with Java, using a simple serializable class for holding a bit of text... it failed.

        Tried it with some 6502 assembly code... illegal command.

        If I've got to spend ages tuning the input to ChatGPT in order to get decent code out, I may as well save myself the bother and write the code myself...
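        For scale, the first task described above (a simple serializable class holding a bit of text) is only a few lines in most languages. A Python sketch of the same idea, using the standard `pickle` module; the class name and field are invented for illustration:

```python
import pickle


class Note:
    """A minimal serializable class holding a bit of text."""

    def __init__(self, text: str) -> None:
        self.text = text


# Round-trip: serialize to bytes, then restore.
data = pickle.dumps(Note("a bit of text"))
restored = pickle.loads(data)
assert restored.text == "a bit of text"
```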

        1. Hello Anonymous

          Re: We tried it

          It would be more convenient to construct a robot that programs without programmers involved. Now GitLab and OpenAI's ChatGPT act as intermediaries between the programmer and his customer on the one hand, and the computer on the other, even though it doesn't make any sense! The customer may well do without a programmer. OpenAI got financed, and soon there will be no more programmers.

  2. Rikki Tikki

    Soooo....

    ChatGPT spouts bullshit and is usually wrong. Maybe promote it to a management position?

    1. Flocke Kroes Silver badge

      Re: Soooo....

      Aiming too low. Take out the racism filters and run it for president.

    2. Snowy Silver badge
      Coat

      Re: Soooo....

      Yes, bullshit, but good-looking bullshit!

      1. timrowledge

        Re: Soooo....

        At least at the beginning, it was confidently telling people I was the British astronaut geologist on Apollo 17, whilst also being in charge of MI5 in Cairo.

    3. JoeCool Silver badge

      Re: Soooo....

      "Even when the answer has a glaring error, the paper stated, two out of the 12 participants still marked the response preferred. The paper attributes this to ChatGPT's pleasant, authoritative style."

      Clearly a bright future as a tech / dev manager.

      And *that's* how AI will take over the world: by co-opting middle management. The Skynet model is waaay too much effort.

      1. Antipode77

        Re: Soooo....

        You know, maybe this could be used as a means to keep authoritarians out of power.

  3. Justin Pasher

    Useful as a guide, not the end-all-be-all

    I've typically gone to ChatGPT for some of the more obscure technical problems that I struggle to find meaningful answers to on Google (nowadays, it seems like Google gives a few pages of mostly unique answers, then it just starts repeating itself). I've asked things like how a particular daemon config needs to be written to accomplish X when the documentation doesn't give you enough details, or maybe just a real quick script that I don't feel like writing, like a batch file to loop through a list of subdirectories and create a separate ZIP file of each one (I do more bash, not batch).
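    The subdirectory-zipping task described above is small enough to sketch. A stand-alone Python version using only the standard library (the function name is invented; the original ask was for a Windows batch file, where a `for /d` loop would do the same job):

```python
import shutil
from pathlib import Path


def zip_each_subdirectory(parent: str) -> list[str]:
    """Create one ZIP archive per immediate subdirectory of `parent`.

    Returns the paths of the archives created.
    """
    archives = []
    # sorted() materializes the listing before new .zip files appear
    for sub in sorted(Path(parent).iterdir()):
        if sub.is_dir():
            # make_archive appends ".zip" to the base name itself
            archives.append(shutil.make_archive(str(sub), "zip", root_dir=sub))
    return archives
```

    `shutil.make_archive` returns the path of the archive it creates, which makes the result easy to verify.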

    In most cases, I'd say the answer ChatGPT provides is at least mostly correct. Usually, at a minimum, it leads me in the right direction to solving the problem. I think therein lies the difficulty it will have at being the big job-replacement tool for many technical-type roles: if you don't understand the nuances of the concept you are dealing with, you probably can't figure out how to fix the little things that are wrong. Your best bet is just asking again and seeing if it can fix it. However, that often leads you down a rabbit hole of frustration.

    For example, I was asking questions about using some PowerShell commands to do something. It kept giving me commands (which were valid) with parameters that were not. I had to keep correcting it by saying, "Command X doesn't support the Y parameter". It would apologize, then continue to give answers that simply did not work. I had similar results when asking it how to do some routing/firewall config for a switch. It kept giving me directives that didn't exist for my model or firmware version, even when I told it what I had.

    That's why I see ChatGPT as simply another helpful tool, but not something that you should expect will give you exactly what you want or need.

    1. Anonymous Coward
      Anonymous Coward

      Re: Useful as a guide, not the end-all-be-all

      "I do more bash, not batch"

      Ever considered just installing bash and leveraging what you do know instead of what sounds like a lot of time going round in a loop piddling about with ChatGPT?

    2. Richard 12 Silver badge

      Danger, Will Robinson

      In my experience, when prompted in subjects I know well, ChatGPT produces somewhat to very wrong answers most of the time.

      Whether right, inaccurate or completely wrong, it produces equally confident language.

      It is therefore extremely likely to produce confidently wrong answers in subjects that I do not know well, or at all.

      I cannot tell whether the text it is spewing is approximately correct, dangerously wrong, or merely somewhat inaccurate.

      Therefore it is clearly worse than useless.

    3. Wellyboot Silver badge

      Re: Useful as a guide, not the end-all-be-all

      When it comes to writing code, GPT being "mostly correct" is a good start, because you can generally see where it goes wrong; the same applies to SO answers.

      What happens if a complete novice asks for a recipe and the result includes "use raw chicken as a salad garnish", or a simple brake-bleeding procedure that omits to explain why even a small amount of air behind the piston is really dangerous?

      A reasonably good knowledge of a subject should be in place before asking GPT for assistance; otherwise the answers will eventually kill someone.

  4. Anonymous Coward
    Anonymous Coward

    Participants ignored the incorrectness

    > when they found ChatGPT’s answer to be insightful. The way ChatGPT confidently conveys insightful information (even when the information is incorrect) gains user trust, which causes them to prefer the incorrect answer.

    Really stressing the key word here: INCORRECT.

    This isn't exactly a new behaviour from SO participants, which has been going downhill for a while.

    If anything good is going to come from this work, the very best outcome would be to give SO a damn good wake-up call: your traffic is going down because you are as crap as ChatGPT, but at least the LLM is polite about it! Not that there is anything management can actually do to change SO answers (and, more importantly wrt politeness, the comments) for the better, is there?

    1. xyz Silver badge

      Re: Participants ignored the incorrectness

      Google usually plops out a few SO results, and SO can be useful... It gave me a nice solution about a year ago for when you don't know the JSON model when you're model binding.

      The actual answer (from a Russian bloke) was only about 4 years old, but every other solution I looked at presumed you knew the incoming model and went on in full Nazi fashion like you were a cretin if you didn't model bind by default.

      The model design given by the sender was quote.. "nearly right" to which I went ape.... Bloody open source people. I wasted a week over that, so thanks Russian bloke and SO.

      1. Anonymous Coward
        Anonymous Coward

        Re: Participants ignored the incorrectness

        "but every other solution I looked at presumed you knew the incoming model and went on in full Nazi fashion like you were a cretin if you didn't model bind by default."

        That is the behaviour that has been dragging SO down in recent years IMO. And a lot of it feels like "I can shove my oar in here and tell everyone that I participate in The Community by giving answers on SO" without any regard for whether that is actually useful to anyone.

        There is still plenty of good stuff on SO (as you found) - the older stuff is still present!

  5. Ace2 Silver badge

    I know it’s fashionable to hate on SO, but I think it’s a great resource. Particularly the debates in the comments under the top answers. Even if my question isn’t directly answered it almost always sends me looking in the right direction.

    (Of course you have to understand the answer and test that it actually works, yadda yadda.)

    1. tfewster
      Terminator

      Stack Overflow sort-of has peer reviewing of answers, so you're more likely to find something that works and covers edge cases. ChatGPT is more "I'm feeling lucky".

      1. Anonymous Coward
        Anonymous Coward

        "But seeing how this command could blow your server away, you have to ask yourself: Am I feeling lucky?"

  6. Doctor Evil

    MO

    "The way ChatGPT confidently conveys insightful information (even when the information is incorrect) gains user trust, which causes them to prefer the incorrect answer."

    Isn't that pretty much the way a con artist operates too?

    1. Anonymous Coward
      Anonymous Coward

      Re: MO

      And politicians, CEOs, management, estate agents, etc.

      1. TRT

        Re: Estate agents

        I once went to view a house that was described as having "an enormous potential"... Turns out there was a National Grid pylon in the back garden.

        1. Antipode77

          Re: Estate agents

          Must have been voltage potential.

      2. Martin
        Happy

        Re: MO

        "And politicians, CEOs, management, estate agents, etc."

        As the man said - con artists.

  7. FeepingCreature

    In my experience, ChatGPT often gives wrong answers where it gets the broad idea right but makes some mistake that's easy to correct for me. Its mistakes are, interestingly, often uncorrelated with human mistakes. That's why I find it useful.

  8. DS999 Silver badge

    Pretty sure everything ChatGPT knows about programming

    It scraped from stackoverflow!

    The problem is it does so without understanding, so it can combine correct and incorrect information in a nice wordy way that I guess people like? I suppose if you're knowledgeable enough to know what bits it got wrong you'll be fine. Though if you are that knowledgeable you probably don't need its programming help...

  9. ComputerSays_noAbsolutelyNo Silver badge
    Joke

    ConGPT

    It's probably wrong, but it sounds so nice.

  10. Anonymous Coward
    Anonymous Coward

    Not my experience ...

    but then you need to know what to ask and how to ask it.

    That said, I mostly use ChatGPT to rewrite or refactor code or (its biggest use by far) to arbitrarily reformat whatever pretty HTML a site spits out into a CSV.

  11. thondwe

    It's Boris

    ChatGPT aka Boris - Appears to have a nice pleasant demeanour but spouts wrong answers a good chunk of the time?

    1. Anonymous Coward
      Anonymous Coward

      Re: It's Boris - Question to managers

      Would you hire a human with the exact same behaviour? Someone who is wrong but confident? Would you hire this person for your customer-facing line of business, or for your back-office infrastructure?

      If not, why ?

  12. TRT

    Hm

    An articulate, thought provoking and insightful article. Plenty of explanation, well written.

    Obviously by an AI, so I don't believe a word of it.

  13. KroSha

    I've used ChatGPT a few times recently, as I've started doing some PowerShell, which isn't my forte. It took about 10 iterations to get a script that worked: testing and feeding the errors back in, tweaking the requirements, and generally fiddling about. That was about half a morning, to get a script that would have taken me days to research and write on my own.

    I've also fed snippets from SO into it and had the bot comment it fully, so I can understand what's going on.

    Great tool; use with caution.

    1. Anonymous Coward
      Anonymous Coward

      > I've also fed snippets from SO into it and had the bot comment it fully, so I can understand what's going on.

      No doubt it was very confident and convincing when it made its comments.

      Which is where the risk lies.

      > use with caution

      Hopefully you were sufficiently doubtful and took the time to verify what it said - and that the SO snippets were actually working and useful ones in the first place. But will everyone be so diligent?

  14. Alistair Wall

    "Stack Overflow may want to incorporate effective methods to detect toxicity and negative sentiments in comments and answers in order to improve sentiment and politeness."

    Or leave them in, to help readers distinguish them from AI.

  15. 4789765364

    Working as intended

    This is it working exactly as intended. The original purpose of these LLMs was only to generate text that is stylistically indistinguishable from their training data. An LLM does not and cannot care whether what it's producing is factually correct; it's purely a side effect that they ever produce factually correct answers.

  16. Pete 2 Silver badge

    Still better than people?

    > Our analysis shows that 52 percent of ChatGPT answers are incorrect and 77 percent are verbose

    Without knowing how that compares to the human-supplied answers, it is impossible to form a rational opinion.

    My personal experience of seeking help on forums is that many responses are wrong. Many more are answering an entirely different question while the rest are either passive-aggressive replies, arguing that the question is wrong, showing off or disagreeing with what others have posted.

  17. Anonymous Coward
    Anonymous Coward

    Don't worry! Artificial intelligence is a self-learning system, and soon it will become much better. Consequently, programmers will be out of work. Very soon. All programmers, every single one. I don't like them anyway; boring dudes.

  18. JulieM Silver badge

    This should not surprise anyone

    In the absence of any way to assess the correctness of an answer, humans tend to prefer a confident delivery style.

    Religious leaders have exploited this phenomenon for millennia.

  19. Kristian Walsh

    Oppositional Networks train for plausibility, not accuracy.

    Okay, this shouldn’t be a surprise at all if you dig in to how GPT and similar are trained.

    First, you get an AI that can categorise files (e.g., “this is an image of two kittens playing scrabble”, “this is C source code for a quicksort”, etc.). Then you get another AI that tries to generate files. When the second one can produce documents that the first accepts as matching the description, you are ready to release.

    Now, I'm sure you can already see the problem here. The first AI has no knowledge of whether: a. the original descriptions were correct (the kittens may actually have been playing ludo, the C code could have had a bug), or b. why those descriptions were correct. All it did was build an enormous, opaque network that gave the correct description for its inputs. And the second one had no knowledge either; it just kept making shit up until the first AI accepted it.

    Doesn't that sound like that guy* you hired who could do a great interview, but didn't know anything about anything once you put him to work?

    Basically, we have created machines that can bullshit better than a human. I will leave it to you to decide what level of concern that should attract...

    __

    * yes, it’s almost always a guy. My interviewing experience taught me that female candidates are almost always honest in interviews, males are about 50/50 between truthful and braggarts.

    1. KroSha

      Re: Oppositional Networks train for plausibility, not accuracy.

      "Basically, we have created machines that can bullshit better than a human."

      Better add Politician to the list of jobs under threat from AI then.

    2. FeepingCreature

      Re: Oppositional Networks train for plausibility, not accuracy.

      A transformer is not a generative-adversarial type of network; GPT is trained by next-token prediction, not against a discriminator.

  20. bonkers

    Stats pedant

    Arguably, the worst one can be is "no better than a coin-flip".

    If ChatGPT answers 99% wrong, it's doing really well - you just need to invert the result.

    Obviously we're not talking binary options here, there are many more ways to answer wrong than to answer right - but it is then not a fair question to ask of a coin.

    1. Antipode77

      Re: Stats pedant

      "Inverting the result" does not work:

      1. Pick the wrong solution from an infinity of wrong solutions.

      2. Reject the result and . . . ?

      Which correct result am I going to select from an infinity of incorrect results plus some correct ones?

      This is asymmetrical.
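      The asymmetry in this exchange can be made concrete with a short simulation (a hypothetical sketch; the answer counts are arbitrary): with a binary question, an oracle that is wrong 99 percent of the time can simply be inverted, but with 100 possible answers, rejecting an always-wrong oracle's pick still leaves 99 candidates to guess from.

```python
import random

random.seed(42)
TRIALS = 10_000

# Binary case: an oracle that is wrong 99% of the time.
# Inverting its answer is then right 99% of the time.
def bad_binary_oracle(truth: bool) -> bool:
    return truth if random.random() < 0.01 else not truth

inverted_hits = sum(bad_binary_oracle(True) is False for _ in range(TRIALS))

# Multi-answer case: an always-wrong oracle over 100 options.
# Rejecting its pick and guessing among the other 99 rarely helps.
def bad_multi_oracle(truth: int, n: int = 100) -> int:
    return random.choice([a for a in range(n) if a != truth])

guess_hits = 0
for _ in range(TRIALS):
    rejected = bad_multi_oracle(7)
    guess = random.choice([a for a in range(100) if a != rejected])
    guess_hits += guess == 7

print(inverted_hits / TRIALS)  # close to 0.99
print(guess_hits / TRIALS)     # close to 1/99
```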

  21. Andrew Williams

    It explains a lot about my coworker

    I think he is actually either an early iteration of ChatGPT or uses it for everything he does.

  22. John Geek
    Trollface

    I've said this before and I'll undoubtedly say it again: Artificial Ignorance is based on GIGO, Garbage In, Garbage Out. This latest round is ever slicker and shinier, but it's still a garbage generator.

  23. pitrh

    Superficially plausible (to the ignorant) bulls**t

    I would tend to agree with the main points of the article.

    In my own experience, the bots tend to produce material that seems plausible to anyone who does not know the first thing about the subject at hand.

    Out of curiosity I tried to make ChatGPT generate pf.conf (OpenBSD firewall config) to spec, and well, the results are available at https://bsdly.blogspot.com/2023/06/i-asked-chatgpt-to-write-pfconf-to-spec.html or trackerless https://nxdomain.no/~peter/chatgpt_writes_pf.conf.html

    TL;DR: the bot produces superficially plausible (to the ignorant) *bullshit*.

  24. Roland6 Silver badge

    “ Our analysis shows that 52 percent of ChatGPT answers are incorrect”

    I’m surprised the “error rate” is so low.

    So far every MS Bing AI response to a technical search query I’ve made via Edge has been 100 percent wrong and/or inaccurate (I’m now looking at ways including group policy settings to disable Bing AI).

    The laugh is, once you've fought your way through the obstructive AI b*llocks, an authoritative answer can usually be found within the first page of web search results; if not, switch to Google and repeat the search…

  25. Efer Brick

    Not content providers "poisoning the well"?
