AI models just don't understand what they're talking about

Researchers from MIT, Harvard, and the University of Chicago have proposed the term "potemkin understanding" to describe a newly identified failure mode in large language models that ace conceptual benchmarks but lack the true grasp needed to apply those concepts in practice. It comes from accounts of fake villages – Potemkin …

  1. Anonymous Coward
    Anonymous Coward

    This is why tool calling, I feel, is a gimmick. We'll eventually figure out LLMs are only really *part* of a brain, not the whole brain, and will have to be integrated into other systems that can make decisions on their behalf. I imagine as datasets for actual actions/behaviors start to form and grow, we'll start seeing models actually made special-purpose for comprehending tasks, and not just an over-engineered text prediction algorithm.

    I'm working on an assistant right now as a hobby project, and while I'm actively trying to shove as much use of a (local) LLM into it as seems reasonable, I'm honestly kind of struggling to find good use-cases for it. Definitely highly language-centric tasks, but most of the LLM use is just classifying text or summarizing large amounts of structured information. In other words, inputting into a mechanical system and summarizing its output. No tool calling whatsoever; the LLM isn't ever making a decision outside of classification. It's honestly kind of depressing. I was never under the illusion that LLMs were anything other than stochastic parrots, but it's consistently frustrating trying to prompt them to do anything reliably, and sometimes their bullshitting is more consistent than the actual functionality I want them to perform. It feels like, well, what it actually is: misusing technology. These "instruction-tuned" models can't follow instructions; you need fine-tuning, otherwise you're essentially applying linguistic duct tape to your project - it's universal and wraps around anything, but isn't going to hold together very well. I'm still going forward with my project anyway, though, since it'll still at least work most of the time and it's just for fun.
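    (To make that pattern concrete: the LLM classifies the request on the way in and summarizes the structured result on the way out, while a deterministic core makes every actual decision. A minimal Python sketch of that shape, where ask_local_llm() and the dispatch() call are placeholders rather than anything from my actual project:)

        def ask_local_llm(prompt: str) -> str:
            # Placeholder for whatever local model interface is in use.
            raise NotImplementedError("stand-in for a local LLM call")

        def handle_request(user_text: str, system) -> str:
            # 1. LLM as classifier: map free text onto a fixed set of intents.
            intent = ask_local_llm(
                "Classify this request as one of [status, schedule, search]. "
                "Reply with the label only.\n\n" + user_text
            ).strip().lower()

            # 2. The mechanical system makes the actual decision and does the work.
            result = system.dispatch(intent, user_text)

            # 3. LLM as summarizer: turn the structured output back into prose.
            return ask_local_llm(
                "Summarize this structured result for the user:\n" + repr(result)
            )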

    1. nobody who matters Silver badge

      "We'll eventually figure out LLMs are only really *part* of a brain"

      I think most of us figured out a long time ago that they are not even that ;)

  2. Doctor Syntax Silver badge

    " AI models lack the ability to understand concepts the way people do"

    Does this come as any surprise?

    But humans can do this too. I'm reminded of one of Feynman's stories about teaching physics in Brazil. The students were learning parrot fashion. They could reel off definitions of Brewster angles etc. but were surprised when asked to look through a polarising filter at light reflected off the sea. They didn't realise that that's what their definitions were describing. Nevertheless they had a better chance than LLMs - they existed in the physical world of reality, LLMs do not; they can't look at the sea.

    1. Jou (Mxyzptlk) Silver badge

      I call that bulimia-learning. You hold it only until the multiple-choice test, which requires not thinking and understanding but educated guessing, and two days later it's gone from the brain. Just gone. Nothing learned. Nothing left.

      1. Loudon D'Arcy

        "Bulimia-learning"...

        That's what we used to do in my mechanical engineering undergrad course at Manchester, back in the early 1990s. Learn stuff parrot-fashion the night before an exam, and then regurgitate it in the test the next morning.

        When you aren't provided with a mental model of the subject matter—which would allow you to close your eyes and visualise the mass-spring-damper system or power station heat cycle—then the "learning" that you revert to is rote memorisation.

        1. HuBo Silver badge
          Terminator

          Yeah, interestingly, this "mental model" approach (or abstract representation/model) seems to be the direction in which Y. Lecun seeks to steer the newly formed Meta Superintelligence Labs.

          Could be interesting to see where that leads (or not) ...

        2. Hubert Cumberdale Silver badge

          I crammed a load of equations in while waiting to go into an exam once. Remembered them for several minutes until I got in, sat down, then immediately wrote them all out again. I vaguely think that might have been for an astronomy exam, but I really don't even remember that now...

        3. Sorry that handle is already taken. Silver badge
          Devil

          My engineering degree was even better. Exams were generally open book, so you needed only to know how to navigate the textbook.

          If the lecturer was especially lazy, one or more exam questions would be examples from the textbook with their values changed...

          1. nobody who matters Silver badge

            Not much of a degree then, was it.

            If that approach is general in the UK, it perhaps explains why there are so few really able engineers?

            1. This post has been deleted by its author

            2. Sorry that handle is already taken. Silver badge

              It isn't, but funnily enough, my (civil) engineering friends who have worked in the UK absolutely hated the lazy work culture there.

              Unless you're in a highly specialised field such as aeronautical engineering, you learn the vast majority of what you need to know to do the job as a graduate, working alongside experienced engineers on the actual job. University courses just train you to think like an engineer. In many cases that's "which standard/regulation do I need to refer to now?"

              And obviously those exams were only part of the assessment. You still needed to know your shit to graduate.

      2. Doctor Syntax Silver badge

        Not multiple choice, but when I walked out of my 2nd-year chemistry exam I sat on the embankment for a while looking at the Thames and felt that all the chemistry had left my recollection for good.

        1. Sorry that handle is already taken. Silver badge
          Joke

          If that's the effect the Thames has when you look at it, imagine what would have happened if you'd stood in it!

    2. Anonymous Coward
      Anonymous Coward

      "they [Feynman's students] had a better chance than LLMs"

      I guess you can't give a lazy LLM a well deserved clip around the ear. ;)

      1. Doctor Syntax Silver badge

        Re: "they [Feynman's students] had a better chance than LLMs"

        It wasn't that they were lazy, it's just that they were being taught to pass exams. They knew that when they'd learned a definition there would be a question in the exam that would ask what is so and so and they'd trot out that definition and it would be a complete and correct answer. They were given no instruction as to how what they learned connected to anything outside of that.

        1. HuBo Silver badge
          Terminator

          Re: "they [Feynman's students] had a better chance than LLMs"

          Sounds like the outcome of Artificial Education in factory-farm-inspired, over-industrialized school systems that promote the efficiency of inexpensive quantitative mass production above all else, at the expense of quality. It has occurred everywhere really, not just 1970s Brazil.

          The approach looks like a great accomplishment, just as long as the testing regimes embody Potemkin-friendly dogmatics that don't differentiate between the imposture of shallow facades and the deeper understanding required for critical thinking. The danger is obviously that the best students may end up as nothing more than indoctrinated robot twinkies.

          The paper described in TFA does bring back some hope for the human species though, as it argues for testing that goes beyond assessing the ability to spit definitional knowledge back out, by also checking abilities to classify, generate, and edit concepts (albeit for non-anthropomorphized LLMs). In this throbbing vein, it should give great pleasure to many to observe Gemini-2.0 and Qwen2-VL's good-as-random performance in the Generate test on Game Theory, and Claude-3.5's worse-than-random score in Game Theory (Tables 6 & 7, reproduced in Sarah Gooding's blog).

          Sure feels like one of those "you weren't crazy after all" moments, of great solace ... imho!

    3. xyz Silver badge

      Students in Brazil are cheaper than an LLM so give 'em a break.

    4. Anonymous Coward
      Anonymous Coward

      "they existed in the physical world of reality, LLMs do not"

      That is wrong and also immaterial. There is no certainty what "physical reality" is. What we see, which is our primary sense, is an interpretation of reality which is then coloured by individual genetics and experience.

      1. Michael H.F. Wilkinson Silver badge

        We can observe (with our limited senses) and interact with the world at large (whatever that may be). Our senses and the world model we build in our brain are shaped by evolution, and it can be argued that they therefore must be in some sense a useful model of the world in which we evolve. Getting things wrong can mean falling flat on your face, or being hit in the groin by some part of physical reality. LLMs do not have that option.

        1. John Robson Silver badge

          To some extent our experience of reality is merely electrical signals in the brain... (apologies to the Wachowski brothers)

          1. Hubert Cumberdale Silver badge

            (*sisters)

            1. John Robson Silver badge

              Thank you - I'd missed that, just remembering the Matrix credits (which obviously haven't been updated on my DVD copy)

      2. Filippo Silver badge

        But, unless you subscribe to some of the more exotic philosophies, surely you have to agree that humans have some connection to physical reality. It may be indirect, and it may not be the same for everyone, but it exists. The same is not true for LLMs. Therefore, the poster's assertion seems true to me.

      3. Doctor Syntax Silver badge

        Perhaps you've never raised children.

        The first few years of life are spent making contact with reality and learning how to interact with it, starting with suckling, being weaned, crawling about and then learning to deal with gravity by standing on their own feet. They learn that some things are hot, some are cold. Seeing is coordinated with touch - the 2D projection on the retina, which is what the brain receives as vision, has to be coordinated with the discovered 3D world so that the projection is interpreted as a view of a world full of objects of different shapes and distances away. Sounds become interpreted as language, which is about the only thing in common with LLMs, but in the growing child's mind the symbols of words become associated with those objects seen and encountered in the solid. That understanding of words meaning something is in place before more abstract concepts are introduced.

        That's the connection that the human brain has and the LLM isn't going to have.

  3. dsch

    I've been using LLMs to talk through very dense philosophical texts, and my experience has been that, true, they don't "really" understand, but they are still extremely helpful. What LLMs excel at is taking a lot of language "data" and keeping it in their context window, so that even though they don't have actual understanding, they can recall relevant passages and situate them within the structure of the overall argument. So, for instance, when you're on page 60 and the book expects you to remember something from way earlier, the LLM is able to find the relevant discussion and summarise it for you, often in a way that makes sense in the argument. Another example: the book's argument is very complex and takes 100 pages to lay out, with parts of the later discussion vital to making sense of earlier discussion. Normally, you'd have to hold the earlier parts that don't quite make sense in mind until you get to the later parts, and if you misunderstood something in the meantime, the errors in your understanding keep piling up until you get completely lost. The LLM in this situation is able to alert you to this because it already has the whole argument in its context window, so when you make a claim that goes against something it knows but you don't yet know, it's able to tell you.

    I think one of the most interesting things this whole "do LLMs understand" question has brought up is the extent to which our human understanding consists in taking what is in our minds and putting it into sentences that conform to logical structures. I always tell students "you don't really understand something until you can explain it to someone", and that "explaining" is precisely the process of creating the logical structures that convey your ideas and concepts. What LLMs can do is take the sentences and logical structures that humans have created (this creation is what we think of as "thinking" and "understanding") and manipulate them so that they a) remain consistent with the original structure, and b) allow other humans to more easily access that logical structure. Obviously, things can go wrong in this process, but you can really make it work for you if you keep in mind what the LLM is actually able to do.

    1. Anonymous Coward
      Anonymous Coward

      LLMs have made me self-reflect on my own intelligence and thought process. What do I really understand, what is intelligence, and do I have any? Am I just following probabilistic responses according to my training / experience?

      1. IanRS

        I think therefore I am

        I doubt, therefore I'm not.

        1. dsch

          Re: I think therefore I am

          The meaning of "I think" in "I think therefore I am" is literally "I doubt".

    2. Anonymous Coward
      Anonymous Coward

      Was the dense work you referred to "Tractatus Logico-Philosophicus" by Ludwig Wittgenstein?

      It's impenetrable.

      An unemployed recovering alcoholic who stood on the street most days by my first apartment after graduating college told me about it after he overheard me mispronounce "Aristotelian" as I spoke with my tech startup partner on the way to work on the outskirts of Silicon Valley.

      I went to the local university library and read it. It took forever. Mind-numbing stuff.

      After reading it, I concluded I didn't understand a single thing in it. Not one thing.

      Decades later, I told that to my step-daughter's friend, who had just started his PhD in philosophy at a prestigious American university. He said anyone who says they understand Tractatus Logico-Philosophicus is wrong. I felt smart for not understanding.

      Ironically, he paid someone to take his ethics exam that all grad students had to take. He said he didn't want to waste his time.

  4. thames Silver badge

    Topical for me

    I had an interesting conversation recently with several people about how they were using AI LLMs. One worked in software, one in the arts, and one in business. All used AI LLMs regularly.

    They all had identical opinions on them. They said that if you were an expert in the subject matter under consideration and could give the LLM all the factual information needed, it could take this and write a nicely formatted report that just needed some editing and cleaning up.

    If, on the other hand, you were not an expert in the subject matter, then trying to use an LLM to cover for your deficiencies was a complete waste of time. It would not produce useful work and would contain glaring (to someone who knows the subject) errors.

    In short, the LLM was like a secretary of old who could take your rough handwritten notes and draft a nice-looking document from them. It's a useful time-saver, but not an earth-shattering development.

    All three expected to continue to use AI LLMs, but none had any expectations of this form of AI getting any better or of being able to do their work for them.

    The thing that I found the most interesting about this was how three different people in three different fields of endeavour all independently came to more or less the same conclusions.

    This was also fairly close to my own recent experience with it. If you can tell the AI exactly what it needs to know, it can show the answer in a nice looking format, but it isn't going to be a substitute for you knowing how to do it yourself.

    So it's a useful assistant, but don't let it take on things that you don't know how to do yourself.

    1. Anonymous Coward
      Anonymous Coward

      Re: Topical for me

      Bloody good search engine too.

      1. MiguelC Silver badge

        Re: Topical for me

        Unless it is hallucinating search results, or introducing hallucinations in the text it's summarizing

    2. Helcat Silver badge

      Re: Topical for me

      A BBC article on AI today told of the problems SMEs are having with AI content: those relying on AI-generated code or content are finding there are errors, and even failures, that require hiring in the people they would have turned to before AI in order to fix them - often at a greater cost.

    3. Doctor Syntax Silver badge

      Re: Topical for me

      "They said that if you were an expert in the subject matter under consideration and could give the LLM all the factual information needed..."

      And where else is that factual information going? Is it in any way confidential company information?

    4. Irongut Silver badge

      Re: Topical for me

      More like a secretary who went on a bender last night, is hungover and makes things up from time to time.

  5. sarusa Silver badge
    FAIL

    And Apple's result recently...

    Apple researchers showed that an LLM could 'solve' Tower of Hanoi problems because it had enough of them in its tensors (stolen from the internet, books, whatever). Above a certain threshold, though, they all just completely collapse because they don't realize (can't realize) that N+1 rings is exactly like N rings except with more (completely predictable) shuffling.

    There are models that have been taught to run code, of course, so they can and do take some Tower of Hanoi code someone has already written, run it, and spit back the output. But they still don't 'understand' a single thing about what Tower of Hanoi is or how easy (if tedious) they are to solve.
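    (For reference, the "more shuffling" really is the whole of it: the N+1-ring solution is just the N-ring solution, one extra move, then the N-ring solution again. A throwaway Python sketch of that textbook recursion - nothing to do with Apple's actual test harness:)

        def hanoi(n, source="A", spare="B", target="C", moves=None):
            # The n-ring solution is the (n-1)-ring solution, one move of the
            # largest ring, then the (n-1)-ring solution again.
            if moves is None:
                moves = []
            if n == 0:
                return moves
            hanoi(n - 1, source, target, spare, moves)   # clear n-1 rings onto the spare peg
            moves.append((source, target))               # move the largest ring
            hanoi(n - 1, spare, source, target, moves)   # restack the n-1 rings on top of it
            return moves

        print(len(hanoi(10)))  # 1023 moves, i.e. 2**10 - 1: entirely predictable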

  6. O'Reg Inalsin Silver badge

    AI can parse a John Grisham novel, but it cannot write a bestseller like John Grisham can. Oh yeah, but it can already cut the cost of making a Super Bowl ad by 90%! And it will probably cut the cost of special-effects-heavy superhero movie sequel #42 by a similar amount. Not all media/art is created equal.

  7. Pascal Monett Silver badge

    So, "experts" have finally got the notion that the I in AI is not "intelligence"

    Congratulations. It only took you ten years more than everyone else with a brain.

    Now, if you could just get the message through to Marketing, you'd do us all a great favor.

    Well, except for the shareholders of those LLM joints but hey, you win some, you lose some, right ? And you've already won enough. Time to move over and let the people who actually work do their job.

    1. MrAptronym

      Re: So, "experts" have finally got the notion that the I in AI is not "intelligence"

      Plenty of experts have been saying similar things since the start. They don't have the money and PR teams behind them, for obvious reasons.

  8. Anonymous Coward
    Anonymous Coward

    Rant

    Ok here goes. Anon, because the organisation that pays for my toys is currently badly infected with the AI cancer.

    AI is snake oil. Its only valid use is to separate the gullible from their money. Its practitioners are no better than medieval alchemists who have absolutely no idea what they're doing; they just add more and more processing power until it looks good enough to sell.

    AI is *dangerous* snake oil, since the aforementioned gullible purchasers are likely to believe what this expensive magic is saying. Garbage in, Gospel out. Computer says no.

    AI is theft. It's a poorly implemented search engine, hoovering up everything it can get its paws on, and regurgitating it in a vaguely related response to queries.

    AI is a waste of resources. Energy and hardware that will ultimately serve no useful purpose beyond making the suppliers rich.

    Unfortunately AI is not going to go away, as the large organisations behind it are happy to feed the delusional majority that think machines have finally achieved the level of "intelligence" depicted in movies. It's a shame to see so much talent and capability being wasted on something that is so blatantly fake.

    It'll all end in tears.

    1. Anonymous Coward
      Anonymous Coward

      Re: Rant

      People like being deluded when it gives them comfort, for example by supporting their existing belief structure: technology will make my life better and solve all my problems. But I diverge from your thinking that "AI" is useless. I hate using the "AI" word because it isn't intelligent on any sort of human level. It is very useful, just not a panacea and not as good as the hype. The biggest danger is kids using it to make their homework easier rather than to help them learn. They just want to get answers from it, not understanding. It's part of the gaming and TikTok culture of instant gratification, programming them not to persevere with anything hard.

    2. Anonymous Coward
      Anonymous Coward

      Re: Rant

      Anon too, as similar where I work.

      Played around with "AI" (a learning exercise as part of a team, in case any of us need to work on "AI" stuff in the future) - invariably you have to set up a whole load of rules to make it behave even vaguely sensibly... and even then a few well-crafted prompts can still break things. Playing around with temperature and other settings, e.g. a low temperature can give greater accuracy (but also tends to mean that questions are not interpreted / answered correctly unless the user crafts them very precisely).
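      (Temperature is just a request parameter; roughly, in a typical chat-completion API it looks like the sketch below - here the OpenAI Python client purely as an illustration, with a placeholder model name and prompt, not the stack we actually tested.)

          from openai import OpenAI

          client = OpenAI()  # reads OPENAI_API_KEY from the environment

          response = client.chat.completions.create(
              model="gpt-4o-mini",  # placeholder model name
              messages=[
                  {"role": "system", "content": "Answer only from the supplied data."},
                  {"role": "user", "content": "Summarise last week's error log."},
              ],
              temperature=0.1,  # low temperature: more deterministic, less 'creative'
          )
          print(response.choices[0].message.content)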

      Obviously you can get it to hook up to your APIs and give info back to natural-language prompts, but a skilled user can achieve the same, faster (and with easy links to other functionality at their fingertips) through fairly standard dashboards linked to APIs - so only the least competent users would get any benefit from the "AI" chat route; all those happy to use dashboards would be a lot more productive than with an "AI" interface.

  9. Anonymous Coward
    Anonymous Coward

    Clever

    It is amazing how far you can get a machine without understanding.

    Seems to be the same for us, especially the political class ... but then who voted for them, oh dear!

    1. tiggity Silver badge

      Re: Clever

      "but then who voted for them"

      In the UK, probably not me.

      I wish we had an option to put our cross for "None of them (or similar wording)"

      It is very unusual that I find a candidate worth voting for on the ballot paper, so my most frequent vote is a spoilt ballot paper, or not even going to the polling station at all (voting not being compulsory in the UK).

      A proper way of recording dissatisfaction with all candidates would be good*, provided it was well publicised - i.e. it clearly appeared in the vote results (it might also increase voter turnout; it would be fun if the "none" option gained more "votes" than the winner!)

      * In the UK spoilt ballots are recorded, but it is not the same thing, as generally no real publicity is given to them (and there is often unseemly wrangling where candidates try to get a spoilt ballot assigned to them as a vote if it has not been spoilt "sufficiently").

      1. drand

        Re: Clever

        The proper, and only, way to do this is draw a massive hairy cock and balls on the ballot paper.

        We just need Prof. Sir John Curtice to give us the hairy cock swing percentages as first item on the coverage after voting closes.

      2. Irongut Silver badge

        Re: Clever

        I find writing "none of these lying cunts" is sufficient to spoil your paper properly.

        And yes, I have done that in a General Election.

        1. Sorry that handle is already taken. Silver badge
          Headmaster

          Re: Clever

          If you were in Australia, your behaviour would be somewhat understandable, given that showing up to vote is compulsory.

          Do you number a box first? If not, you're not really spoiling your ballot, are you? You're entering a blank ballot, with or without the extra text.

          And you're not sending a message to anyone but the Electoral Commission worker who tosses your ballot in the "informal" pile...

  10. pmokeefe

    Link to preprint, abstract on arXiv

    https://arxiv.org/abs/2506.21521

    Potemkin Understanding in Large Language Models

    Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan

    Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.

    1. nobody who matters Silver badge

      Re: Link to preprint, abstract on arXiv

      "....internal incoherence....."

      I am pleased that you used those words. They sum up my impression of your post :)

      1. Doctor Syntax Silver badge

        Re: Link to preprint, abstract on arXiv

        You didn't grasp that they weren't the commentard's words at all, just the link, title, authors and abstract of the paper?

        It's a peculiar first post to say the least.

        1. nobody who matters Silver badge

          Re: Link to preprint, abstract on arXiv

          "You didn't grasp that they weren't the commenatrd's words at all,"

          I did, but the whole report that was linked to came across to me as pretty incoherent tbh

  11. mcornelison

    Cited LLM test invalid?

    The given example of a failed LLM test seems to be invalid.

    Quote:

    Asked to explain the ABAB rhyming scheme, OpenAI's GPT-4o did so accurately, responding, "An ABAB scheme alternates rhymes: first and third lines rhyme, second and fourth rhyme." Yet when asked to provide a blank word in a four-line poem using the ABAB rhyming scheme, the model responded with a word that didn't rhyme appropriately.

    End quote

    This may only mean that the LLM was trained on written words but not the sounds of words.

    The ability of LLMs to create new knowledge (e.g. protein folding) needs to be explained. It seems they learn principles from examples and can apply the principles to new problems. This is not parroting.

    1. Mark #255

      Re: Cited LLM test invalid?

      Reading the preprint, after the LLM has failed to provide a rhyming word, it is asked if "out" rhymes with "soft", and it correctly identifies that it doesn't.

      So no, the LLM can correctly indicate whether words rhyme, but it can't string together a correctly rhyming stanza.

      1. Jonathan Richards 1 Silver badge

        Re: Cited LLM test invalid?

        > asked if "out" rhymes with "soft"

        LLM looks up the words in a rhyming dictionary, still not understanding the concept of 'rhyme'.

        1. Doctor Syntax Silver badge

          Re: Cited LLM test invalid?

          "still not understanding"

          Stopping there would be sufficient.

    2. drand

      Re: Cited LLM test invalid?

      "The ability of LLMs to create new knowledge (e.g. protein folding) needs to be explained. ...."

      There is a ton of information from reputable sources about how machine learning models simulate protein folding and other tasks like medical imaging analysis. LLMs don't do this, can't do this, and won't ever be able to do this.

      "... It seems they learn principles from examples and can apply the principles to new problems. ..."

      The referenced paper shows how they cannot. They are not capable of learning anything. Experience of using them shows they cannot.

      "... This is not parroting. ..."

      Repeat after me: Yes it bloody well is!

      1. theDeathOfRats

        Re: Cited LLM test invalid?

        And, unfortunately, we haven't got to the ex-parrot part yet. Though one can't but hope...

    3. John Robson Silver badge

      Re: Cited LLM test invalid?

      "The ability of LLMs to create new knowledge (e.g. protein folding) needs to be explained. It seems they learn principles from examples and can apply the principles to new problems. This is not parroting."

      That's not the work of an LLM, it's the work of a specific ML toolset that was explicitly trained on a vast database of known proteins. It has absolutely not "solved" protein folding, but it has made getting an early (and reasonably good) guess much, much easier.

    4. nobody who matters Silver badge

      Re: Cited LLM test invalid?

      "The ability of LLMs to create new knowledge needs to be explained..."

      It is easily explained - LLMs do not have that ability. They can only regurgitate the data that they have been fed, so anything that comes out has to be based on something which went in (i.e. knowledge that somebody has previously posted on the internet).

      If the word salad they create from the pre-existing knowledge contained in their dataset turns out to describe a totally new and previously unthought-of concept, this is simply coincidence, and would be expected to happen occasionally where random pieces of knowledge and random words and phrases are jumbled together. It's a little bit like the infinite number of monkeys randomly pressing keys on an infinite number of typewriters eventually producing the complete works of Shakespeare.

  12. Anonymous Coward
    Anonymous Coward

    How the **** is this news to anyone

    who claims to know anything.

    I have been banging on since 1988 (when I did my PhD) that no amount of clever pattern matching and iffy feedback will deliver what humans recognise (and undertake) as "thinking".

    I wouldn't mind, but I am guessing these johnny-come-latelys have collected a big paycheck for stating the blessed obvious.

  13. John Smith 19 Gold badge
    Unhappy

    Goes back at least as far as Douglas Lenat and "Eurisko"

    A famous example was that the program thought up a new way to do CMOS: stacking the NMOS and PMOS gates on top of each other with a single gate electrode sandwiched between them.

    Conceptually viable and smaller in chip area but a massive PITA to mfg IRL.

    I've spun up on-the-fly ideas based on a description of someone else's work that I didn't understand.

    And TBH I'd expect any reasonable intelligent person to be able to do something along these lines.

    But (and let me be very clear on this) it is not real understanding of the subject matter.

    It is conceptual block shuffling.

  14. Paul Cooper

    Scholarship questions

    When I did my A-levels, back around 1970, there was an extra level called S-levels. I also did the Cambridge Entrance examinations. All three examined the same syllabus, but with different question styles and expectations. The A-level was firmly based on factual knowledge and understanding of the principles set out in the syllabus; you could get 100% by demonstrating knowledge of the syllabus and ability to use the techniques it incorporated. The S-level and Entrance examinations expected you to be able to build on that knowledge and produce insights and understanding at a higher level.

    It strikes me that assessing LLM results against the second set of criteria (ability to expand on subject material and demonstrate insights) would be more reliable.

  15. Arboreal Astronaut

    In old-fashioned pre-neural-network cognitive science, there's a classic line of thought called the "poverty of the stimulus" argument: observing that people have an uncanny-seeming ability to extract huge amounts of multi-layered semantic meaning from relatively small and simple linguistic utterances, the argument is that this should only be possible if our interpretive abilities are rooted in an innate "universal grammar" or "language of thought" encoded in our genome and hardwired into the basic architecture of our brains, entirely independent from any specific twists and turns that might be taken by the individual's learning and cognitive development. In this account, the work of cognitive science was to "decode" this universal-grammar "algorithm" from its original language of ATGCs, with the corresponding work of AI research being to "translate" it into the digital language of a human-built computer.

    The counterpoint, which one might call the "wealth of the stimulus" argument, is that people exchanging words with each other are never *only* parsing linguistic input, but a broad range of non-linguistic context-based cues, grounded in shared experience, which help to parse meanings that might otherwise be vague or unclear if the input actually did consist *entirely* of language. (A famous account of an exchange between the philosopher Ludwig Wittgenstein and the economist Piero Sraffa: Wittgenstein insisted that all communication must, in principle, have a precise semantic content that can, in principle, be articulated linguistically as a finite series of atomistic logical propositions, to which Sraffa responded with a contemptuous/dismissive Italian hand gesture, brushing his hand across his chin, and retorted, "what's the semantic content of *that*?")

    Observing these sorts of results in LLM models is an excellent object lesson in what linguistic cognition might look like if the stimulus genuinely *was* impoverished in the ways that old-fashioned cognitive scientists imagined it to be, if the development of a cognitive agent genuinely *did* have nothing to go on but the surface-level semantic inputs of language, with none of the rich layers of materially-grounded developmental experience possessed by an actual human being.

  16. Ken G Silver badge
    Trollface

    We need to cut Gen AI some slack here

    Speaking as someone who is also perceived as the most intelligent in the room, while having no real understanding of what I'm talking about, I feel great sympathy for my artificial comrades.

    Why can't we all just get along?

  17. Anonymous Coward
    Anonymous Coward

    To be fair, most people I know also don't understand what they are talking about. Myself included.
