AI is an over-confident pal that doesn't learn from mistakes

Researchers at Carnegie Mellon University have likened today's large language model (LLM) chatbots to "that friend who swears they're great at pool but never makes a shot" - having found that their virtual self-confidence grew, rather than shrank, after getting answers wrong. "Say the people told us they were going to get 18 …

  1. Eclectic Man Silver badge

    Introspection

    Why, indeed how, would an LLM learn from its own mistakes? They are 'trained' on vast amounts of 'public domain'* data. Any response from someone that one of their statements is incorrect would be swamped by the existing dataset.

    * I am not getting into the ethics of training on copyright material here, there are ample articles on the Register for those arguments.

    1. Peter-Waterman1

      Re: Introspection

      Seeing a few use cases where people are using an LLM to provide an idea, then to take the answer and pass to another chat to generate reasons why that idea is bad, and then take both sets of reasoning and pass on to a persona as a judge. You could use different LLM models or the same asking it to take on different roles. Seems to help provide some balance.
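The propose, critique, judge arrangement described above can be sketched roughly as follows. This is only an illustration: `AskFn`, `debate`, and the three stub functions are hypothetical names, and the stubs stand in for whatever real model calls someone might plug in. As MonkeyJuice notes further down, nothing here guarantees the judge is any more right than the proposer.

```python
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns a model's reply.
AskFn = Callable[[str], str]


def debate(question: str, propose: AskFn, critique: AskFn, judge: AskFn) -> dict:
    """Run one propose -> critique -> judge round and return all three texts."""
    idea = propose(f"Propose an answer to: {question}")
    objections = critique(f"List reasons this answer is bad:\n{idea}")
    verdict = judge(
        "Weigh the proposal against the objections and give a verdict.\n"
        f"Proposal:\n{idea}\nObjections:\n{objections}"
    )
    return {"idea": idea, "objections": objections, "verdict": verdict}


# Stub "models" so the sketch runs without any real LLM or network access.
def stub_proposer(prompt: str) -> str:
    return "Use a cache."


def stub_critic(prompt: str) -> str:
    return "Caches can serve stale data."


def stub_judge(prompt: str) -> str:
    # Toy rule: hedge the verdict if the objections mention staleness.
    return "Accept with caveats" if "stale" in prompt else "Accept"


result = debate("How do we speed up the page?", stub_proposer, stub_critic, stub_judge)
print(result["verdict"])
```

The three roles are passed in as separate callables so the same model (with different role prompts) or three different models can be swapped in without changing `debate`.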

      1. Anonymous Coward

        Re: Introspection

        I've heard that. Not saying this is good or bad - I'll leave it to others - but people are using one model to generate code, and another to test / approve it.

      2. MonkeyJuice Silver badge

        Re: Introspection

        Cash states the problem is inherent in all LLMs. A quorum of bad advice is still bad advice. All you gain by doing that is a different set of wrong answers, and no further insight into the bottleneck. We have simulated a very, very small subset of language as it operates in the brain, but we're missing all that other meat that goes around it that stops it having dementia.

    2. Helcat Silver badge

      Re: Introspection

      As it's looking at 'public domain' data, and there's little to no curation of that data, an error, or even an outright lie, published in one article is often picked up and repeated, even to the point of another article seeming to publish a 'new' account with the same error/lie. That makes the 'public domain' sources incredibly dodgy, and the AI may well assume that an oft-repeated lie is true. And when data is curated, you've then introduced a bias that the AI will learn.

      This just makes AI incredibly unreliable. What's more interesting is how often it produces a claim and, when asked to provide the source, provides a source that does not exist, or that, where it does exist, is contrary to the claim of the AI.

      Whereas what would make AI more trustworthy is if it provided alternative responses and gave a rating on reliability, taking user feedback to build a better rating for its responses. But even that won't be very reliable as it would easily be subject to gaming by people who think messing with AI is funny.

      Hell, it's likely why AI is getting worse rather than better: That there are people introducing 'poison pills' in what AI might slurp. Just see the posts on a previous article on AI where some commentards included hidden or disguised messages that AI might read but a human might easily overlook...

  2. JohnSheeran
    Trollface

    So, is the article saying that all GenAI suffers from the Dunning-Kruger effect?

    1. John Smith 19 Gold badge
      Coat

      "So, is the article saying that all GenAI suffers from the Dunning-Kruger effect?"

      Pretty much exactly what it's saying.

      That hyper-confident guy at work who always knows the answer (and will tell managers as much), except of course when you ask them something and it turns out, "Well, no, but I know something that's quite like what you're asking about."

      And what does "Introspection" even mean with an LLM outside of its "Learning" phase?

      Wouldn't that need it to have a sort of "weak" learning ability that would gradually shift its views? Call it a "Life-long" learning mode.

      Smells like more BS to me. Even calling it "Introspection" seeks to humanize what is basically a statistical token generator with no real understanding of (and I mean this literally) anything.

    2. ecofeco Silver badge

      Made by tech douche bros, acts like tech douche bros.

      So, yes.

  3. Blackjack Silver badge

    So it is a digital Pointy-Haired Boss?

  4. ecofeco Silver badge
    Facepalm

    LOL!

    So it's American?

    Of course it is. Made in America, innit?

  5. HuBo Silver badge
    Windows

    A natively naive nature

    Believing that LLMs are capable of cognition (never mind metacognition), sentience, self-awareness, or even thinking, seems quite similar to believing that Hugh Laurie is truly a genius doctor of diagnostic medicine, imho.

    LLMs are automated play-actors delivering an appearance of knowledge, intelligence, and sensibility, with pre-programmed decorative drama and poise, that makes them enjoyable to interact with, and fosters belief in what they output.

    Interestingly, their memory of the exact lines they would normally deliver is designed to be imperfect, as their architecture is tuned to favor production of riffed improvisations around said remembrance, rather than verbatim rote verbiage, most of the time (implementations have been known to fail).

    As much as anyone in their right mind wouldn't give Hugh Laurie the scalpel to perform surgery on them (or give Brad Pitt the keys to their cherished F1, or give some algo power over the Dutch child-care benefits scheme -- per TFA, ...), it would also be quite foolish to give an LLM full agentic R/W access to one's database, if only because some miscreant might slip it some poisoned script to play there ...

    Bottom line, these software tools are not "thinkers" though they may play-act portrayals of such characters via nicely pre-programmed theatrics. They neither think nor meta-think, and basically don't learn anything new after their initial monolithic-mammoth training which is way too rigid, frozen, refractory, and inflexible to be live-updated with new data (unlike those human interns that don't exhibit anterograde amnesia). Clearly, more research is needed to ... (ay, chihuahua!)

    1. werdsmith Silver badge

      Re: A natively naive nature

      I think we know all that, few here would disagree.

      The real question is, can AI, such as it is, be put to good use? In my experience, used in the right way, yes, it is incredibly helpful.

      1. user555

        Re: A natively naive nature

        There's two rather huge problems admitting this truth:

        - The price tag! They're rapidly approaching a trillion USD and don't look like stopping there. What happens to your enthusiasm when you start getting the real bill for your fun?

        - The power bill! The data-centres are already straining the electricity grids. We thought Bitcoin was wasteful, the new constructions are going to blow everything out of the water.

        All for what's not really anything more than a cranky search engine.

        1. werdsmith Silver badge

          Re: A natively naive nature

          All that for what is an incipient technology that will develop and improve.

          1. O'Reg Inalsin Silver badge

            Re: A natively naive nature

            But not where the resources are being wasted to gratify the romantic Eugenic ideology. That just opens the opportunity for somewhere else not trapped by that overwhelming wastage to jump ahead.

            1. werdsmith Silver badge

              Re: A natively naive nature

              Like DeepSeek?

      2. O'Reg Inalsin Silver badge

        Re: A natively naive nature

        When forced from the top down, the chances of that decrease rapidly. In the wrong hands it becomes a hubris multiplier.

        A large part of the problem comes from wishing AI to become a human replacement rather than a tool for humans to use. That motivation is often driven by a "romantic" top down Eugenics ideology.

    2. Helcat Silver badge

      Re: A natively naive nature

      "or give Brad Pitt the keys to their cherished F1"

      But I would give Rowan Atkinson those keys... Mr Bean himself!

      Then again, I've seen him drive for real and he's got quite the collection of cars himself - tends to generate trust in his ability.

      And that's kind of the point: Unless we know someone (or something) is trustworthy, we shouldn't trust it, no matter how confident it presents itself as.

      As to Hugh Laurie: He is entertaining. Even if House isn't very realistic. Then again, that's TV and film for you: Accurate only when it's convenient or by chance.

    3. nijam Silver badge

      Re: A natively naive nature

      Hugh Laurie had much better scriptwriters...

  6. Michael Hoffmann Silver badge
    Unhappy

    So, where do you go?!

    Goggle is useless and only slings their SEO ads at you when you search.

    Even DuckDuckGo by default now barfs up "AI summaries" unless you turn it off - and the rest of the results have been going down the tube for ages now as well.

    Using ChatGPT, you have to spend more time coming up with a prompt that will spit out verifiable information - and then actually *verify* it.

    I'd joke about actually going to a library, with real books, but in small towns and regional areas those have been gutted here.

  7. Hammy Havoc

    There is no confidence. It is a next word prediction model. It cannot reason or deduce.

    1. O'Reg Inalsin Silver badge

      Simulated confidence, or apparent confidence. AI as a calculator self calculating a confidence interval. According to the source, the self calculated confidence interval is itself biased.
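The "biased self-estimate" idea above can be put in numbers with a very small sketch: compare what was predicted with what was actually scored. The function name `overconfidence` and all the figures below are made up for illustration; they are not data from the study.

```python
from statistics import mean


def overconfidence(predicted: list[float], actual: list[float]) -> float:
    """Mean predicted score minus mean actual score; positive means overconfident."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two equal-length, non-empty lists")
    return mean(predicted) - mean(actual)


# Made-up numbers for illustration only (not data from the study):
predicted_scores = [18, 16, 17]  # what the answerer said it would get right
actual_scores = [15, 9, 12]      # what it actually got right
bias = overconfidence(predicted_scores, actual_scores)
print(bias)
```

A well-calibrated answerer would hover around zero; a value that stays positive, or grows after poor results, is the pattern the article describes.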

  8. Fr. Ted Crilly Silver badge

    "Just a moment....Just a moment.

    I've just picked up a fault in the AE-35 Unit.

    It's going to go 100 percent failure within 72 hours."

  9. Sam not the Viking Silver badge

    Language is anything but clear.....

    These LLMs seem to have skipped the 'human' model of 'growing up'. As children we learn from others and as we get older we start to discriminate between fact and fiction, information and lies. Language is in a perpetual state of flux. That doesn't stop us from reading old texts and interpreting them, perhaps slightly incorrectly, but we do know how to make a judgement (well, most of us do......). And language develops, words change their meaning or sense.

    Jokes, for instance, provide a link between truth and the fantastical, helping us to understand what is being told; 'reading between the lines'.

    The whole thing about language is its subtlety. Not a label I can attach to AI....

    1. werdsmith Silver badge

      Re: Language is anything but clear.....

      As children we learn from others and as we get older we start to discriminate between fact and fiction, information and lies.

      An oversimplification. Many people swallow lies for their entire life. Personally I am unable to decide what to believe, and consequently I accept very little that I can't verify first hand. I find I am at odds with many comments on The Register, because the whinging that goes on doesn't match my first hand experience.

      One time I was right at the middle of and first class witness to what was a major national news story. When I read what was reported about it in the media, I could barely recognise it was the same event, so different was the media version from the reality.

      Many people trust these official sources though.

      My use of AI means that I use it like a set of wheels to get me places quicker, but I'm still the navigator.

      1. MOH

        Re: Language is anything but clear.....

        I used to watch a lot of football.

        And then go home and watch Match of the Day.

        Where games I'd watched earlier were frequently almost unrecognisable due to the chosen highlights. Which often seemed selected to reinforce whatever narrative the pundits were pushing that month ("this team is in trouble", "that team are running away with the title", "this player is overrated", whatever).

        Realising how much they were prepared to distort a football match to push a narrative made me question their coverage of other events. BBC News went from a regular source to probably the last place I'll look for information these days.

      2. druck Silver badge

        Re: Language is anything but clear.....

        An oversimplification. Many people swallow lies for their entire life.

        Religion for a start, political dogma as a close second.

        My use of AI means that I use it like a set of wheels to get me places quicker, but I'm still the navigator.

        Except many of your posts suggest you have fallen for it hook, line and sinker, either that or you work for an AI peddler.

  10. ForthIsNotDead
    Meh

    "I don't know if you're aware about what happened in Holland, where they used AI-based tools for evaluating whether or not people who were on benefits had received the right benefits, and the tools just [produced] gibberish and led people to suffer greatly. And we're just going to see more of that."

    Wrong tool for the job. Who is the idiot that pitched AI as the solution to that? It's a strictly procedural problem that could (or at least, should) be solvable with nothing more than a few SQL queries.

    I've found AI to be very good at getting me on the right track when I'm researching something, far faster than using Google and then trawling through a hundred pissy and sarcastic StackOverflow posts, but for the actual nitty gritty details, it's dangerously bad. I've asked it about electronic circuits, and I've found that it can get the general details pretty much spot on. So then I ask it to produce a schematic, and it's total garbage, with MOSFETS the wrong way around, spelling errors, unconnected lines etc, like it just had a mental breakdown.

    Treat it like the next generation of search engines and I think it's quite useful. But don't get it to run your business for you.

    Ultimately, I think we're living in a fad that will pass.
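The "few SQL queries" point above can be sketched like this, using Python's built-in `sqlite3` module. Everything specific here is an assumption for illustration: the `claims` table, its columns, and the flat-rate rule are hypothetical and have nothing to do with the real Dutch benefit rules. The point is only that a written rule can be applied deterministically and audited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (
        claim_id INTEGER PRIMARY KEY,
        declared_income INTEGER,   -- hypothetical column
        amount_paid INTEGER        -- hypothetical column
    );
    INSERT INTO claims VALUES (1, 20000, 300), (2, 60000, 300), (3, 30000, 0);
""")

# Hypothetical rule: a flat 300 is due when declared income is below 40000, else 0.
INCOME_LIMIT = 40000
FLAT_BENEFIT = 300

rows = conn.execute(
    """
    SELECT claim_id,
           amount_paid,
           CASE WHEN declared_income < ? THEN ? ELSE 0 END AS amount_due
    FROM claims
    ORDER BY claim_id
    """,
    (INCOME_LIMIT, FLAT_BENEFIT),
).fetchall()

# Keep only the claims where what was paid differs from what the rule says is due.
mismatches = [(cid, paid, due) for cid, paid, due in rows if paid != due]
print(mismatches)
```

Each flagged row carries the paid and due amounts side by side, so a case worker can see exactly which rule produced the discrepancy.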

    1. werdsmith Silver badge

      Treat it like the next generation of search engines and I think it's quite useful. But don't get it to run your business for you.

      This is the truth as I have found it so far. It will pass as a fad that is being scattergun applied, but it will find its place where it does work and it will grow out from there at a more reasonable pace. It's here to stay though.

      The blockchain fuss has died down, but it's still there quietly doing stuff where it is an appropriate solution.

    2. John Smith 19 Gold badge
      Coat

      "I don't know if you're aware about what happened in Holland"

      The British did it about 40 years ago with the "Alvey" project.

      One of the pilot projects was an "Advanced Knowledge Based System" to interpret/implement Social Security rules.

      Not sure how it ended.

      My guess is "Not well."

  11. Anonymous Coward

    DOGE

    Wikipedia has the full gory details about the Dutch Benefits Scandal.

    And with Musk’s “whiz kids” using AI on every federal database and most recently them getting access to IRS and Social Security data I fear we are in for a much worse disaster than the Dutch episode.

  12. Acrimonius

    Useless at Cryptic Crossword Clue

    Tested on 2 clues. Just incredibly delusional and stubborn as well. Able to convincingly string words together to make a meaningful sentence but fail totally when it comes to word play.

    Solve cryptic clue Leading Hospital led by a Roman Catholic (4)

    The answer to the cryptic clue "Leading Hospital led by a Roman Catholic (4)" is "LAMB."

    Here's how it breaks down: "Leading" can suggest the first letter of a word. "Hospital" can be abbreviated as "H." "Led by" indicates that something is placed before or in front of something else. "Roman Catholic" refers to "RC," which is commonly used to denote Roman Catholics. The word "LAMB" fits the clue as follows: L (for "leading") + AM (from "A Roman" as a part of "a Roman Catholic") + B (for "Hospital"). Thus, the answer is LAMB.

    Why is B for Hospital when you said H

    I apologize for the confusion; that was an error in my explanation. The correct breakdown is: "Leading" suggests the first letter, which is L.

    "Hospital" is correctly abbreviated as H. "Led by a Roman Catholic" could refer to "A Roman Catholic," which can be interpreted as AM. However, my initial interpretation of the clue was incorrect. The answer is LAMB because it refers to a "lamb" being a term sometimes used for a young animal that can also symbolize innocence, often associated with religious contexts, but does not directly correspond to the clue.

    Another one 'A bulletin man sorted out virtually' (2,3,3,4)

    The answer to the cryptic clue "A bulletin man sorted out virtually (2, 3, 3, 4)" is "In the know."

    Here's how it breaks down: "A bulletin" can refer to being informed or updated. "man sorted out" suggests an anagram or rearrangement. "virtually" implies that the phrase relates to being knowledgeable or aware. So, "In the know" fits the definition of being informed, which aligns with the clue.

    The numbers in bracket refer to the words and their word length

    Thank you for clarifying! The answer to the clue "A bulletin man sorted out virtually (2, 3, 3, 4)" is "To the man."

    Here's the breakdown: "A bulletin" suggests the phrase "To the." "man sorted out" refers to an anagram of the letters in "man." "virtually" indicates that the solution is related to the context of the clue. So together, it forms the phrase "To the man."

    The 2,3,3,4 implies 4 words

    Apologies for the confusion! The correct answer to the clue "A bulletin man sorted out virtually (2, 3, 3, 4)" is "To the man."

    Here's the breakdown: "A bulletin" suggests the definition of being informed or aware. "man sorted out" indicates an anagram of "man." "virtually" refers to something that's almost or close to being correct. Putting it all together leads to the phrase "To the man."

    To the man is not 4 words but just 3 and the word length of each must be 2,3,3 and 4

    I apologize for the confusion earlier. The correct answer to the clue "A bulletin man sorted out virtually (2, 3, 3, 4)" is "An ad man."

    Here's the breakdown: "A bulletin" can refer to an advertisement, hence "ad.". "man" stays as is. "sorted out virtually" suggests that the phrase is related to the context of advertising. So, the phrase fits the word lengths provided: An (2), ad (3), man (3).

    Still not 4 words in the answer as required

    I apologize for the oversight. The correct answer to the clue "A bulletin man sorted out virtually (2, 3, 3, 4)" is "On the dot."

    Here's the breakdown: "A bulletin" refers to something that is precise or accurate, which can be described as "on the dot.". "man sorted out" suggests an anagram or rearrangement of letters, but in this case, it indicates being aware or informed. "virtually" can imply something that is exact or close to.

    Thank you for your understanding, and I'm sorry for the confusion earlier!
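The two mechanical checks the chatbot kept failing in the exchange above (the right number of words with the right lengths, and an actual anagram) are easy to test in code. This sketch assumes "sorted out" marks "a bulletin man" as the anagram letters; the helper names are hypothetical. The first four candidates are the chatbot's answers from the transcript. The last one, "In all but name", is not from the thread: it is included only because it happens to pass both checks.

```python
def word_lengths(answer: str) -> tuple[int, ...]:
    """Return the length of each space-separated word in the answer."""
    return tuple(len(word) for word in answer.split())


def letters(text: str) -> list[str]:
    """Sorted lowercase letters of text, ignoring spaces and punctuation."""
    return sorted(ch for ch in text.lower() if ch.isalpha())


def check(answer: str, enumeration: tuple[int, ...], fodder: str) -> tuple[bool, bool]:
    """Return (fits the enumeration, is an anagram of the fodder)."""
    return (word_lengths(answer) == enumeration, letters(answer) == letters(fodder))


ENUMERATION = (2, 3, 3, 4)
FODDER = "a bulletin man"  # assumption: "sorted out" marks these words as anagram fodder

for candidate in ["In the know", "To the man", "An ad man", "On the dot", "In all but name"]:
    print(candidate, check(candidate, ENUMERATION, FODDER))
```

None of this solves a clue, of course; it only rejects answers that cannot possibly be right, which is the step the chatbot never performed on its own output.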

    1. werdsmith Silver badge

      Re: Useless at Cryptic Crossword Clue

      I tried the same thing and it immediately came back with ARCH.

      Step 1: Identify the Definition

      Cryptic clues typically place the definition at the beginning or the end. Here, both "Leading" and "Roman Catholic" could potentially be definitions.

      "Leading" seems more likely to be the definition here — meaning someone who is at the front or in charge.

      Step 2: Wordplay

      Let’s parse the rest:

      "Hospital" — this could be H, a standard abbreviation.

      "led by" — implies that what follows will come after or be led by what precedes.

      "a Roman Catholic" — could point to initials R and C (commonly used for Roman Catholic).

      So, putting it together:

      H (Hospital)

      led by (placed after)

      A RC (A Roman Catholic)

      So:

      A + R + C + H = ARCH

      Step 3: Meaning Check

      ARCH can mean "leading" or principal, as in arch-enemy or archbishop — which fits the definition "Leading".

      This is a good demonstration of what I wrote in my earlier comment where I said I don't fully accept anything I haven't proven myself.

      1. Acrimonius

        Re: Useless at Cryptic Crossword Clue

        Correct. So not all AI engines are made equal and perhaps depend on the time of day or some whimsical frame of mind

        1. Anonymous Coward

          Re: Useless at Cryptic Crossword Clue

          or did it read the answer off of the web? https://www.danword.com/crossword/Leading_hospital_led_by_a_Roman_Catholic_vcgq

    2. ForthIsNotDead

      Re: Useless at Cryptic Crossword Clue

      I've had similar experiences when asking it about electronic circuits. The more you "vibe" with it and ask it to modify its answer, or correct it, the more 'insane' it gets. It's very interesting that even with warehouses full of servers, I don't think it would pass the Turing test. You'd very soon twig that you're corresponding with a machine and not a human.

  13. sebacoustic

    LLM "learning from their own mistakes"

    Correct me if I'm wrong(*) but isn't that "forbidden" because you can't safely update your model weights on the fly with "real life" data, since that would risk exposing it and spilling the beans like it's that scene in Tommy? "Professional Integrity" isn't a thing in an LLM, it's really just a chain of matrix multiplications, not a complex creature like even the humblest case worker in a social security office somewhere.

    (*)I'm not involved with LLM in a professional capacity and avoid them other than as a source of amusement and scientific interest.

  14. phuzz Silver badge
    Facepalm

    AI is an over-confident pal that doesn't learn from mistakes

    So, much like the people trying to shill it then? Makes sense that if you've got through life by being overconfident and wrong, that that is your model for 'intelligence'.

  15. Sp1z

    Not just over-confident

    When using it to suggest code (yes I analyse anything it writes, not just copy/paste) it will sometimes get something pretty obviously wrong. Fine, we all know this and account for it (right?).

    The thing that gets me is when I paste that sample back to it and tell it that it's wrong, it accuses ME of writing it and berates ME as to why that line/function/whatever isn't going to work.

    You wrote it you POS, not me.

  16. abend0c4 Silver badge

    Perhaps they're smarter than they appear

    I hear that sociopathic, over-confident bullies who claim to be geniuses but who spout a continuous stream of word salad are the ones that make it to high office, not the ones that can answer complex questions.

  17. Anonymous Coward

    Clearly not Robert ...

    "We still don't know exactly how AI estimates its confidence," Oppenheimer adds, "but it appears not to engage in introspection, at least not skilfully."

    How does this fool even begin to imagine something that even the most furious AI tossers concede lacks consciousness, self awareness or even an internal model of self, engage in introspection period; or any process at all with "skill" ?

    - See yonder cloud that’s almost in shape of a camel?

    - By the mass, and ‘tis like a camel, indeed.

    - Methinks it is like a weasel.

  18. amanfromMars 1 Silver badge
    Pirate

    If looking for evidence of stunted human evolution which fails to pass the intelligence test ....

    Humans have evolved over time and practiced since birth to interpret the confidence cues given off by other humans but the reality has fallen short, thanks to issues with "hallucinations" in which the answer-shaped object it generates from a stream of statistically likely continuation tokens bears little resemblance to reality.

    .... one has to look no further than to recognise what and who's trying and failing to fool what and whom, and to what moronic barbarous end with tall tales channeling hallucinations from their inner Joseph Goebbels [ "If you tell a lie big enough and keep repeating it, people will eventually come to believe it" ] lacking even the most basic fragments of simply complex intelligence and which would deny the Holocaust is being revisited and reenacted and is both materially and virtually remotely being supported by a right dodgy motley brainwashed crew to present the latest iteration of the International Fascist War Crime Abomination, the Gazan Genocide and Palestinian Ethnic Cleansing.

    Oh ..... and here’s something else to consider. Any situation for publishing which bears little resemblance to reality is a virtual reality easily reconfigured and or destroyed by sources and forces well beyond any currently available and scaleable human command and control. And AI is a lot SMARTR than you have never even imagined it not to be too.

    Have a nice day, y’all.

  19. nijam Silver badge

    So now we know that AI stands for Artificial Imagination. Intelligence is not involved (but that's been obvious since... like forever).

    1. amanfromMars 1 Silver badge
      Alien

      You aint seen nothing yet. The party/parties is/are not yet properly started ....

      So now we know that AI stands for Artificial Imagination. Intelligence is not involved (but that's been obvious since... like forever). .... nijam

      Practical human development and an able and enabling use of their imagination is without doubt catastrophically limited, with some A.N.Other Entities even realising it be by Intentional Remote IntelAIgent Design. Only a diabolical fool and GOD*less tool would supply a raving and raging ignorant idiot child with the hasty and nasty means to their own self-destruction.

      And although you may not like IT, and might even be minded to try and deny what you be told, do you not know that AI stands for many other things too ...... Almighty Intervention/Astute Insight/Advanced IntelAIgents/Alien Incursion/Autonomous Input/Artificial Intelligence ..... to name but a few, and whenever all be Quantum Entangled and Engaged in Future Development with and for Derivative Virtual Trade Options Supplying and Supporting/Mentoring and Monitoring Heavenly Communicative Information Technologies of Immaculate Source in Stellar Instruction is one well advised to climb on board any of the speedy gravy trains leaving for opportunities and destinations in those very real and surreal Leading Future Situations ....... in order than you can safely and securely prosper and worthily survive in the CHAOS** of the Worlds of Tomorrow ‽ . Has that not been obvious since ...... like forever?

      * Global Operating Device

      ** Clouds Hosting Advanced/Almighty/Astute/Alien/Autonomous/Artificial Operating Systems
