The Register

Stale gov.uk pages are feeding AI overviews old data and Brits are believing it

AI overviews from the likes of Google are serving up false summaries of UK government information by drawing on stale GOV.UK pages, according to content designers at the Department for Business and Trade (DBT). The problem, senior content designer Giorgio Di Tunno and content operations lead Neil Starr wrote in a GOV.UK blog …

  1. Brewster's Angle Grinder Silver badge

    "We now need to design with the expectation that much of what we publish will be read indirectly, atomized, summarized or reinterpreted by systems we don't control,"

    The machines are now our masters.

    1. elsergiovolador Silver badge

      Translation: "People will actually read our pages at scale and find out we have been doing a poor job"

      1. Anonymous Coward

        A good number of pages include a date, so that should be a giveaway. Other sites have archived pages that retain an old visual style, which is easy for a human to read but potentially not easy for a scraper.

        Getting information from departments that no longer exist is not a great look for AI, given the trillions of dollars invested so far. Especially for Google: if those pages haven't been viewed in ages despite containing fairly commonly requested information, their older algorithm must have been able to identify them as old, outdated or otherwise less helpful.

      2. Tron Silver badge

        True.

        You might discover just how much more expensive everything now is in the UK and blame the government.

        Governments work hard to make sure that stuff like campaign promises and stats that emphasise their failure are difficult to find and aren't mentioned by the TV or papers. AI is undermining this cover up at scale.

        Shocking, eh?

  2. Doctor Syntax Silver badge

    IOW, some departments have no process for removing pages when they're superseded. Sir Humphrey would be most displeased.

    1. Paul Herber Silver badge
      Trollface

      Not only is there no such process but nobody is allowed to tell anyone else when pages need to be removed or updated. There might even be pages that have old superseded names for the PM and head of the Foreign Office!

    2. Dan 55 Silver badge

      They usually put a banner across the top saying this page is archived, is not being updated any more, and may contain out of date information.

      If AI is so clever clever, why can't it understand that?

      1. RockBurner

        You think any AI is capable of distinguishing the state of any piece of information it digests?? (old, new, genuine, fake, deliberately misleading, joking, threatening, pacifying.... etc etc)

        It's all grist to the mill, that's (one of) the problem(s).

      2. Doctor Syntax Silver badge

        "may contain out of date information"

        If there's no way of telling whether information on the page is true or false (which is what "outdated" means), then the page contains no information at all and ought to be removed, because there's no point in keeping it.

        Archive it off-line for historical purposes if that's required.

        1. Dan 55 Silver badge

          People might need to go back and look at the archived version if they previously took a decision some time ago based on the old version.

          It's a service for the public so historical versions should be available. It's up to Google to get it right.

          1. Doctor Syntax Silver badge

            TNA (The National Archives) is the place for archives, or, for the web, archive.org.

            1. Dan 55 Silver badge

              So much for digital sovereignty if uk.gov has to depend on archive.org. Also, is everyone supposed to know how to copy and paste addresses into the Wayback machine and use the fiddly date control?

              The current system is perfectly usable and has a good reason to work the way it does; it's just that Google's slurp can't understand it. That's a Google problem, not a uk.gov problem.

          2. Disgusted of Cheltenham

            We wouldn't be using Google if the gov.uk search worked properly.

            1. Anonymous Coward

              I'd normally be the very last person to defend the UK government machinery, but I will say it's not just their problem. At a previous employer, it was a well-known running joke that if we as employees needed to find anything on our corporate public website, it was far easier to Google for it rather than battle the completely useless on-site search function.

              And the same went for non-public stuff; the company's intranet search was just as bad, but chances were depressingly high that whatever confidential document you were looking for, Google would find a copy of it that had been leaked to El Reg or Anandtech or wherever. Many were the times I'd look for a roadmap document, give up on the intranet, then google and quickly find it posted here :)

        2. ChrisC Silver badge

          Disagree: the information it contains is knowledge as to what *was* being presented to the public as truthful/accurate between the time the page was first uploaded and the time it was withdrawn, regardless of how truthful/accurate any of it might have been...

          Similarly with provisional releases of upcoming changes to information, where there may also be inaccuracies between what's proposed and what eventually makes it into the final version, there are reasons why making these non-current versions available can still be useful to users. So as long as all non-current versions are marked up in a way which makes it clear they aren't the current version, there shouldn't be any problem in allowing them to remain readily available rather than being hidden away in a myriad of different offline archives requiring more effort to access.

        3. plunet

          Absolutely not. The context of the page is what is important, but we should not be removing pages. Whilst the government stuffs many things up, gov.uk is not one of the screw-ups, and it usually clearly signposts where old policy or guidance has been superseded.

          On the same basis would we be burning certain pages out of Hansard just because the policy or law being debated at the time has been changed since?

          1. This post has been deleted by its author

          2. Doctor Syntax Silver badge

            "On the same basis would we be burning certain pages out of Hansard just because the policy or law being debated at the time has been changed since?"

            No, because they record what was said even if the policy or law has changed since.

            But consider this. You have a web page with a banner saying some of the rest of the content on the page may be untrue - or out of date if you prefer the verbose form. Now go through each piece of information on the page, testing it against the banner, and delete it if it cannot be relied on. Because the banner tells you it can't be relied on, you'll delete every piece of information and be left with an empty page. At that point you might reasonably decide to go for the current page.

            A page which purports to tell you something current needs to be current. Stale pages have been the curse of the web almost since TBL invented it.

            1. Dan 55 Silver badge

              If you need information that is current you go to the current version. If you need the information as it was when it was relevant at a certain time in the past, you stick with the archived version. There are reasons for both versions to be there, so they should be there.

              As LLM scrapers don't heed robots.txt anyway, the government couldn't do anything but take it offline so it would be inconveniencing its citizens to please a foreign private corporation. And who's even sure that removing the old versions would remove them from LLM training data?

              Surprisingly, even in the 2020s, the government's job is not inconveniencing its citizens and pleasing foreign private corporations.
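For reference, this is roughly what heeding robots.txt looks like for a crawler that chooses to, using Python's standard-library `urllib.robotparser`. A minimal sketch: the rules and the example.gov.uk URLs are made up, and robots.txt is purely advisory, so it only helps if the crawler actually runs a check like this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to skip an archive section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archived/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them

for url in ("https://example.gov.uk/guidance/fees",
            "https://example.gov.uk/archived/fees-2019"):
    allowed = parser.can_fetch("ExampleBot", url)  # "ExampleBot" is a made-up user agent
    print(url, "->", "fetch" if allowed else "skip")
```

Nothing enforces the answer: a scraper that never calls `can_fetch` simply fetches everything.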

            2. thames Silver badge

              The problem isn't with the web pages. The problem is that the AI is telling you that some outdated information that it dug out of the bowels of the historical archives in the web site is current. If you had just gone to the web site yourself and used the proper procedures for finding the relevant information you would have been presented with the current information. If you had looked at the old web page yourself instead of relying on the AI you would in most cases see a notification of some sort that the page is out of date and has been archived for historical reference.

              Of course the AI may also have simply made the information up out of thin air like it commonly does when it doesn't "know" the answer, and that no doubt happens a lot in these cases as well.

              The root of the problem though isn't the web site. The root of the problem is people believing what an AI tells them. Until people stop accepting what an AI says as being true, there is no solution.

      3. Anonymous Coward

        Because AI doesn't 'understand' anything. I think what we're looking at here is hallucinations.

        The 5 core reasons hallucinations happen

        1. The model predicts the most likely next token, not the true next token

        LLMs don’t retrieve facts — they generate text by predicting what should come next based on patterns in training data. If the model lacks the exact fact, it fills the gap with the most statistically plausible continuation.

        This is the root cause of hallucination.

        2. Missing or incomplete training data

        If the model has:

        • never seen the fact

        • seen conflicting examples

        • seen outdated information

        …it will still try to answer, because that’s what it’s designed to do. It won’t say “I don’t know” unless explicitly trained or instructed to.

        3. Overgeneralisation

        Neural networks compress knowledge into shared parameters. This means they sometimes:

        • blend concepts

        • merge similar patterns

        • infer relationships that don’t exist

        This is the same mechanism that makes them powerful — and fallible.

        4. Ambiguous or underspecified prompts

        If the prompt is vague, the model fills in details to make the answer coherent. Humans do this too, but we have real-world grounding; the model doesn’t.

        5. No built-in fact-checking

        LLMs don’t have:

        • a database

        • a truth engine

        • a verification layer

        Unless connected to external tools, they rely entirely on internal patterns. So they can produce fluent, confident, incorrect statements.

        Why hallucinations are inevitable in generative models

        Because the model must always produce an output — even when:

        • the data is missing

        • the question is impossible

        • the facts are unknown

        • the prompt is contradictory

        A human can say “I don’t know.” A generative model must generate.
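To make point 1 concrete, here is a drastically simplified toy in Python. It is not how a real LLM works (real models use neural networks over sub-word tokens); it just counts which word follows which in a tiny made-up corpus, and `train_bigrams`, `predict_next` and the corpus are all hypothetical. It shows the relevant failure mode: if stale pages outnumber the current one, the most likely continuation is the stale figure, and nothing in the process checks which one is true.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words followed it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent continuation seen in training, or None.

    There is no notion of truth here: the choice is purely statistical.
    """
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical corpus: two stale pages and one current page.
corpus = [
    "the fee is 50 pounds",  # stale
    "the fee is 50 pounds",  # stale
    "the fee is 75 pounds",  # current
]
model = train_bigrams(corpus)
print(predict_next(model, "is"))  # 50
```

Unlike a real generative model, this toy can at least return None for a word it has never seen; a real model always produces some continuation.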

  3. Guy de Loimbard Silver badge
    Terminator

    Shit in.....

    Shit out.

    AI like this is just fancy searching, isn't it.... but for people too lazy to go direct and actually look at GOV webpages, so where does the issue lie?

    1. Richard 12 Silver badge
      Devil

      Re: Shit in.....

      Google, Bing et al are actively trying to make sure nobody does go to the original sources.

      Because when people do that, they might send some ad impressions to someone else!

  4. Andy Non Silver badge
    FAIL

    Similar issues with the NHS

    I needed to ask a general question of a local hospital. An NHS website gave an email contact address for such enquiries. I sent an email and got an automated reply back saying the email address had been discontinued and to email a different address. Tried the new address. Never got a reply. Sigh.

    1. elsergiovolador Silver badge

      Re: Similar issues with the NHS

      Knowing the NHS, the staff probably don't know how to operate the computer that has the email on it.

    2. Anonymous Coward

      Re: Similar issues with the NHS

      Trying to communicate with the NHS can resemble the experiences of the surveyor in Kafka's 'The Castle'...

  5. jake Silver badge

    Remember, data is not ephemeral. It lasts forever[0].

    However, it is quite capable of becoming stale. As I've mused elsewhere, what percentage of the combined mass of a government or a company's data can become stale, before that whole mass of data becomes virtually useless?

    Searching alpha-goo for information on a few people who I personally know very well shows that data to be staler than last month's bread. In a couple cases, it is heading past the compost stage and into the sludge at the bottom of the septic tank territory. And in many of those cases, it has also been intentionally corrupted by the people in question.

    Combine that with the fact that everybody seems to think that collecting any and all data possible off "The Internet" (whatever that means these days) is a good thing, despite the fact that such bulk slurping is by its very nature full of demonstrably incorrect, incomplete, incompatible and/or irrelevant data ... At some point, entropy says the data will become completely useless. Gut feeling is that we are getting very close to that point.

    And we're using that clusterfuck of garbage to feed so-called AI? Lovely. No wonder it's prone to "hallucinating". Back in my day, we called it GIGO.

    [0] When stored properly.

    1. Dan 55 Silver badge

      Re: Remember, data is not ephemeral. It lasts forever[0].

      After 10 years or so, it seems more links are broken than working.

  6. Peter Prof Fox

    Expiry dates

    When I'm Prime Minister, all government pages will have a 'Best Before' date. This would be of a defined duration according to circumstances, something along the lines of: This information is not expected to change before the 31st April 2027. Or: Not valid after 1st April 2027. That way it's a prompt for the maintainers to actually maintain their stuff or be called out.

    A valid-from date might add confidence.
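A minimal sketch of how such 'valid from' and 'not valid after' dates could be checked automatically, assuming a hypothetical per-page metadata record (`PageMeta`, `valid_from`, `valid_until` and `status` are illustrative names, not anything gov.uk actually publishes). A proper date type has a side benefit: it refuses impossible dates such as 31st April.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PageMeta:
    """Hypothetical per-page metadata along the lines proposed above."""
    url: str
    valid_from: date
    valid_until: Optional[date]  # None means nobody has set an expiry yet


def status(page: PageMeta, today: date) -> str:
    """Classify a page by its validity window on a given day."""
    if today < page.valid_from:
        return "not yet valid"
    if page.valid_until is None:
        return "no expiry set"  # worth flagging to the maintainer
    if today > page.valid_until:
        return "expired"  # prompt the maintainer, or warn readers/scrapers
    return "current"


page = PageMeta("https://example.gov.uk/guidance/fees",
                valid_from=date(2025, 4, 1),
                valid_until=date(2027, 4, 1))  # "Not valid after 1st April 2027"
print(status(page, date(2026, 1, 1)))  # current
print(status(page, date(2027, 5, 1)))  # expired

try:
    date(2027, 4, 31)  # April has only 30 days
except ValueError as err:
    print("rejected:", err)
```

The same check could drive either a reminder to maintainers or a warning shown to readers and scrapers.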

    1. elsergiovolador Silver badge

      Re: Expiry dates

      Then civil servants will put random dates just to submit the form and be done with it.

      1. WSWS

        Re: Expiry dates

        Make that a fireable offense and see how quick they stop. Civil servants don't care about doing their jobs properly, but they sure as hell care about their jobs.

    2. Brave Coward Bronze badge

      Re: Expiry dates

      Do we have April 31st now?

      I really feel like I've lost track of time lately.

    3. Dan 55 Silver badge

      Re: Expiry dates

      Gov.uk pages do have a published date, a last updated date and a changelog, and some (NHS ones) have an expected next revision date.

    4. johnB

      Re: Expiry dates

      Agree 100% - just about all webpages would benefit from a date of issue and also a "best before"

  7. Andy The Hat

    Why is this seen as a particular issue with gov.uk? There are millions of sites that archive or deprecate pages daily, often leaving them available as reference or archive, but still expect viewers to have the common sense to read the page that's in front of them before they dig down into historic material.

    As we all should know, data is not information, even if AI says it is.

    Anybody who blindly trusts AI to validate what it spits out as truth deserves all they get ...

    1. I ain't Spartacus Gold badge

      Was chatting with the new lad in the office the other day. I mentioned my parents buying a house for £80,000 back in the 80s. Tappity-tap: "That's £2 million nowadays." Me: "Nah, surely it's worth about £200k?" Him: "Google says £2m, and they've even got a link to the Bank of England inflation calculator." Which says average inflation since then has been 2.3% a year, which I highly doubt; that seems way too low. So it looks like it can't do basic maths or look up the correct inflation percentage over 40 years. Surely the maths ought to be the easy bit and the research the tough bit?

      Those AI search results are a terrible advert for Gemini.

      1. WSWS

        2.3% a year for 40 years makes 80k almost exactly 200k.
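The arithmetic is easy to check. A sketch assuming a constant rate compounded once a year, which is a simplification of how real inflation indices move (`compound` and `implied_rate` are illustrative helpers): 2.3% over 40 years multiplies £80,000 by about 2.48, while reaching £2 million would need roughly 8.4% a year.

```python
def compound(amount: float, annual_rate: float, years: int) -> float:
    """Grow an amount at a constant annual rate, compounded once a year."""
    return amount * (1 + annual_rate) ** years


def implied_rate(start: float, end: float, years: int) -> float:
    """Constant annual rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


print(f"£{compound(80_000, 0.023, 40):,.0f}")               # just under £200,000
print(f"{implied_rate(80_000, 2_000_000, 40):.1%} a year")  # 8.4% a year for £2m
```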

      2. Anonymous Coward

        It's a terrible advert for your new lad.

        Sticking a 0 on the end of the £80,000 gets you his £2million.

      3. IGotOut Silver badge

        Or he went tappity tap and it said the average house in the 80s costing £80,000 is probably worth about £2million now, which I would totally believe, especially in London.

        As an example, my parents bought their house in the 80s for £22,000 and it's worth around £250,000, and that's in a crappy town in Worcestershire.

        1. I ain't Spartacus Gold badge

          That house got extended and sold in 2008 for about £450k - it's now some kind of group living place for kids in their last couple of years in care - so they can learn how to cook, shop, clean and generally do horrible adult crap - life would be so much nicer if I wasn't required to also have life-skills! If only I could have domestic staff... Anyway, who knows what it's worth now.

          I did the same Google Gemini thing when my nephew asked my Mum how old she was in days. According to it, being 87 made her 3 million days old! Which sounded horribly wrong to me. I could have done the calculation properly, but there are websites which will tell you how many days there are between two dates - so I just grabbed one of those.

          I presume it's because the LLMs are text based, and biased towards giving answers rather than saying they don't know. I wonder whether "agentic AI" would do better, because it's supposedly designed to go off and chuck the question into an existing tool or website that can actually do the calculations? Or whether it's equally crap until you've trained it how to go off and do repetitive tasks?
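This is the kind of deterministic calculation a tool-using ("agentic") setup would ideally hand off rather than guess at. A minimal sketch with Python's `datetime`, using made-up dates for someone who has just turned 87 (`age_in_days` is a hypothetical helper): subtracting two `date` objects accounts for leap years, and the answer comes out at around 31,800 days, nowhere near 3 million.

```python
from datetime import date


def age_in_days(born: date, on: date) -> int:
    """Whole days between two calendar dates; date subtraction handles leap years."""
    return (on - born).days


# Hypothetical dates: someone who turned 87 on 1 January 2025.
print(age_in_days(date(1938, 1, 1), date(2025, 1, 1)))  # 31777
```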

  8. frankyunderwood123 Bronze badge

    people expect miracles

    The lack of understanding of how LLMs work is going to land a lot of people in hot water.

    Where I work, the future path arising from a recent series of consultations was decided by an LLM.

    I pointed out that many of the inputs were little more than bias or guesses, yet miracles were expected.

    The LLM did a convincing job reaching a recommendation based on flawed data.

    The recommendation was followed.

  9. nijam Silver badge

    > AI overviews from the likes of Google are serving up false summaries of ...

    ...everything.

  10. Pete 2 Silver badge

    Lie-ability

    > Google are serving up false summaries

    The UK has the Online Safety Act which requires websites to take down material that is deemed illegal. Surely misleading the public comes under the description of illegal?

    Although if that was to be applied to the media, as it is to advertising standards, there would be precious little material on most "news" websites or their print versions.

  11. Anonymous Coward

    Zombie pages!

    Took over a large site once where the previous admin had built a *new* version of each page every time someone wanted to make a change to things like fees and taxes. The old pages were left swinging. It took a while to clean up.

    1. jake Silver badge

      Re: Zombie pages!

      Around these here parts, them's known as "cobweb pages".
