The Register

AI chatbots waffle on GOV.UK queries, then get facts wrong when told to zip it

Artificial intelligence chatbots can be too chatty when answering questions on government services, swamping accurate information and making mistakes if told to be more concise, according to research. The Open Data Institute (ODI) tested 11 large language models (LLMs) on more than 22,000 questions, comparing their responses …

  1. Anonymous Coward
    Anonymous Coward

    If you can use "AI" right

    It can really turbocharge dealing with benefit applications.

    I am guessing this is almost exactly what was not intended.

    Is there a sweepstake yet on the year (or month, at this rate) when there will have to be "state-approved AI"? As the rush recreates religious factions, I can see holy wars between the various sects of AI.

    1. elsergiovolador Silver badge

      Re: If you can use "AI" right

      The goal is probably to funnel billions to US corporations whilst making applications more difficult, then claim that Britain is successfully adopting AI in front of the flashes of braindead reporters ultimately paid by the same people who own those US corporations.

  2. Paul Herber Silver badge
    Megaphone

    'AI chatbots waffle ... then get facts wrong ...'

    So, looks to me like being an MP could be an early casualty in the AI race to the bottom.

    1. Stumpy

      For all their faults, I'd happily take an LLM over the shower of crap we have in government at the moment ...

      1. Paul Herber Silver badge

        An AI> I live in *********. Who is my MP?

        The word 'my' implies ownership and you won't own an MP until you have bought and paid for one.

        An AI>

      2. Ken Hagan Gold badge

        Funny how you've attracted upvotes and downvotes for this but you didn't actually say which country you are living in. It appears that crap government is a universal phenomenon.

        Still, be careful what you wish for...

    2. Red Ted
      Go

      Waffling and never admitting it doesn't know

      There was a recent article in the Guardian about the results of the cartoonist Martin Rowson (who has quite a large online presence) asking an AI who his wife is (she does not have much of an online presence). The suggested options are quite amusing (and all incorrect).

      I asked AI to name my wife. To the hopelessly incorrect people it cited, my deepest apologies

  3. MacDBB

    Aggressive hallucination sometimes seems linked to history accumulation.

    One of the most disturbing recent interactions I had with an LLM was with the Gemini (full model) system last month. I passed it a web page and started to ask questions about its content. The first warning flag was that Gemini spent no detectable time downloading and processing the page (it was quite a long page that can load slowly). It then proceeded to make up quotes from the page and criticise the content. Checking the quotes revealed that none of them were actually in the page; when asked about this, Gemini apologized and proceeded to make up more quotes. When pressed further, Gemini started to blame the authors of the page (which it hadn't read) for being wishy-washy and informed me that in coming to Gemini I was going to get the truth whether I liked it or not. Completely clearing Gemini's discussion history returned behavior to normal, so I now run with history disabled. I wonder if these questions were asked serially in a single session or over multiple sessions with inter-session history disabled?

    1. Tom Chiverton 1

      Re: Aggressive hallucination sometimes seems linked to history accumulation.

      I wonder why you are trusting Gemini to do this at all

      1. MacDBB

        Re: Aggressive hallucination sometimes seems linked to history accumulation.

        Because for a lot of scenarios it helps to triangulate responses between AIs to reveal areas of agreement and points where individual AIs may have distinct perspectives.

        1. DJV Silver badge

          Re: AIs may have distinct perspectives

          They certainly do - and all of them are wrong!

    2. Andy The Hat

      Re: Aggressive hallucination sometimes seems linked to history accumulation.

      If I asked a DWP droid about UC, I would expect that person both to give accurate information and to be legally liable for the accuracy of that information. Where does the responsibility for data accuracy lie with this new "AI" system?

    3. Bebu sa Ware Silver badge
      Coat

      Re: Aggressive hallucination sometimes seems linked to history accumulation.

      "apologized and proceeded to make up more quotes" "When further pressed on this [he] started to blame the author"

      Eerily like the late PM, Boris Johnson (or any politician I suppose.)

  4. elsergiovolador Silver badge

    Wine and steak

    I am sure any worries of the ministers and civil servants can be resolved by ^^^ see title

    Roll out the AI, open the money taps, don't worry about consequences. In the UK people are always falling upwards.

    1. Paul Herber Silver badge

      Re: Wine and steak

      'In the UK people are always falling upwards.'

      Gravity should have been privatised many years ago. By now there would be no money left and gravity would have to be turned off at night and weekends.

      1. Paul Herber Silver badge

        Re: Wine and steak

        Local gravity effects are already being felt ...

        https://en.wikipedia.org/wiki/Morris_dance#/media/File:Morris_dancers_York_8667.jpg

    2. Bebu sa Ware Silver badge
      Childcatcher

      In the UK people are always falling upwards.

      In the fantasy novel "The Court of the Air" one of the functions of the Court appeared to be retrieving individuals caught on the chunks of the planet which have been projected into space by some geological process.

      One reviewer wrote "The Land of Jackals is a warped echo of Britain during the early 19th century"

      We might write: "Britain in the early 21st century is a warped echo of its past."

      Surprising what you read picking up discarded books from the kerbside. Karst Geomorphology anyone? Actually not too dry. ;)

  5. Long John Silver Silver badge
    Pirate

    Smaller is more beauteous?

    "Smaller, cheaper-to-run LLMs can deliver comparable results to large closed source ones such as OpenAI's ChatGPT 4.1, the ODI said."

    To my mind, the quoted statement ranks almost as self-evident.

    Pursuit of 'all-singing, all-dancing' LLMs may, in part, be attributed to an assumption that the more an 'AI' knows, the closer it comes to a 'general intelligence'. As of now, that assumption appears ill-founded.

    Apparently, general purpose commercial 'AIs' are trained using whatever digital 'content' is to hand. 'Discrimination' appears anathema to 'AI' technicians. Hence, the phrase 'slop-in gives slop-out' is relevant when 'AI' training uses the content of Twitter (and similar) alongside the best texts Anna's Archive can offer.

    Thus, it would be sensible for organisations requiring databases for interrogation by employees and/or clients to commission bespoke 'AIs'. Learned professions should consider commissioning and updating specialised 'AIs'; that doesn't preclude tapping into general purpose models. Also, it shouldn't be assumed that highly specific models require immense electricity-hungry computers for their training: hardware and software technologies advance apace. At present, relatively small resources are needed for fine-tuning and for 'distilling' enormous models to make them both adept at particular tasks and containable on modest equipment.

    1. Bebu sa Ware Silver badge
      Windows

      more an 'AI' knows the closer it becomes to a 'general intelligence'.

      I don't think that is even particularly true of human intelligence. Part of intelligence is being able to quickly discard irrelevant information/knowledge and to focus one's analytical processes on the remainder.

      Sherlock Holmes was famously unaware of the heliocentric arrangement of the Solar System but, as he pointed out to Watson, it was irrelevant to his work whether the Earth went round the Sun, or the Sun the Earth.

      "General Intelligence", even in the perverse sense the slop merchants intend, must ultimately involve those focused analytical processes that characterise human intelligence, not the purely representational manipulation of encoded information that amounts to current AI.

      1. Mike 137 Silver badge

        Re: more an 'AI' knows the closer it becomes to a 'general intelligence'.

        "Sherlock Holmes was famously unaware of the Heliocentric arrangement of the Solar system"

        I strongly suspect that, given the character of Holmes as portrayed, he did know but didn't care, and this was actually heavy sarcasm aimed at Watson's predilection for concentrating on irrelevancies.

  6. Pete 2 Silver badge

    When "I don't know" is all that's needed

    > they rarely refuse to answer, even when they probably should

    So, just like politicians, who always feel the need to proffer an opinion, even when they are speaking far outside their area of expertise or responsibility.

  7. Anonymous Coward
    Anonymous Coward

    Just like Civil Servants then?

    "They found that models often waffled, burying the facts or going beyond authoritative government information"

    Sounds just like a civil servant then. Or indeed any worker in the public sector.

    So we could replace them all with AI and no-one would notice the difference?

    1. Anonymous Coward
      Anonymous Coward

      Re: Just like Civil Servants then?

      Starmer has already publicly announced that AI is the future, and as a result it is being thrown at almost every activity the Civil Service undertake (I speak as a civil servant).

      Remember that this is what you wanted.

      1. Anonymous Coward
        Anonymous Coward

        Re: Just like Civil Servants then?

        As the AC, it's most definitely not what I wanted.

        I didn't vote for "Change", nor for the perpetual omnishambles from the chaos that pretends to be the current UK government.

  8. Mike 137 Silver badge

    Surprising really - not

    "Verbosity is known behavior of LLMs – they are prone to 'word salad' responses that make them harder to use and decrease their reliability,"

    Of course! They're language models, not concept models. The LLM has absolutely no understanding of anything - it's merely a statistically driven blind token stream generator. I've given up trying to fathom why this really basic fact has not sunk in. Maybe it's down to the hype, as in the case of the tulip mania, the South Sea bubble and the convulsionnaires of Saint-Médard? Or maybe we've just given up using our own brains because it's so much easier to query an LLM than actually think.

  9. PB90210 Silver badge

    "Some, including Anthropic's Claude 4.5 Haiku, were more verbose than others."

    Is this because it answers in haiku?

    1. Like a badger Silver badge

      Hey! There's something AI is good for - second rate haiku for lazy English speakers! After a few tries, Perplexity offered me this haiku, to cover how crap AI is (well, I didn't use those exact words):

      Whispering code lies

      Empty gods of silicon

      We kneel, jobs undone

      That's not half bad, really. I do wonder how many billions have been thrown on the bonfire for this low achievement.

  10. Winkypop Silver badge
    FAIL

    You know

    It’s all a bit shit really

  11. DoctorPaul Bronze badge

    "we need to understand where the technology can be trusted and where it cannot,"

    Well that's simple then, it can't be trusted anywhere.

    Any more questions?

  12. Fr. Ted Crilly Silver badge

    Sir Humphrey's revenge...

    Waffle, blather, verbosity, distraction...

  13. Tron Silver badge

    users be told ... where to find authoritative information.

    Well they would presumably expect to find 'authoritative information' from a government website or app - where these chatbots are.

    They are happy enough to pay for dodgy US planes that are not compatible with our missiles, but can't be arsed to train and pay people to help UK citizens use their own government services.

  14. c1ue

    Enshittification at scale

    I continue to be fascinated at the ongoing LLM tradeoff of scale vs. accuracy.

    Yes, LLMs can enable far greater access and manipulation of data but at a direct cost of double digit percent inaccuracy.

    GIGO is replaced by GOAATAS: Garbage Out, At Any Time, At Scale.

    Is this corporate enshittification run rampant? A modern societal weakness for quantity over quality, or ease of use over reliability?

    Compute has been taken over by grading on a curve.
