"We now need to design with the expectation that much of what we publish will be read indirectly, atomized, summarized or reinterpreted by systems we don't control,"
The machines are now our masters.
AI overviews from the likes of Google are serving up false summaries of UK government information by drawing on stale GOV.UK pages, according to content designers at the Department for Business and Trade (DBT). The problem, senior content designer Giorgio Di Tunno and content operations lead Neil Starr wrote in a GOV.UK blog …
A good number of pages include a date, so that should be a giveaway. Other sites have archived pages that retain an old visual style: easy for a human to read, but potentially not easy for a scraper.
Getting information from departments that no longer exist is not a great look for AI, given the trillions of dollars invested so far. Especially for Google: if those pages haven't been viewed in ages despite carrying fairly commonly requested information, its older algorithm must have been able to identify them as old, outdated or otherwise less helpful.
You might discover just how much more expensive everything now is in the UK and blame the government.
Governments work hard to make sure that stuff like campaign promises and stats that emphasise their failure are difficult to find and aren't mentioned by the TV or papers. AI is undermining this cover up at scale.
Shocking, eh?
"may contain out of date information"
If there's no way of telling whether information on the page is true or false - which is what "outdated" means - then the page contains no information at all and ought to be removed, because there's no point in keeping it.
Archive it off-line for historical purposes if that's required.
So much for digital sovereignty if uk.gov has to depend on archive.org. Also, is everyone supposed to know how to copy and paste addresses into the Wayback machine and use the fiddly date control?
The current system is perfectly usable and has a good reason to work the way it does, it's just Google's slurp can't understand it. That's a Google problem, not a uk.gov problem.
I'd normally be the very last person to defend the UK government machinery, but I will say it's not just their problem. At a previous employer, it was a well-known running joke that if we as employees needed to find anything on our corporate public website, it was far easier to Google for it rather than battle the completely useless on-site search function.
And the same went for non-public stuff; the company's intranet search was just as bad, but chances were depressingly high that whatever confidential document you were looking for, Google would find a copy of it that had been leaked to El Reg or Anandtech or wherever. Many were the times I'd look for a roadmap document, give up on the intranet, then google and quickly find it posted here :)
Disagree, the information it contains is knowledge as to what *was* being presented to the public as truthful/accurate between the time the page was first uploaded through to the time it was withdrawn, regardless of how truthful/accurate any of it might have been...
Similarly with provisional releases of upcoming changes to information, where there may also be inaccuracies between what's proposed and what eventually makes it into the final version, there are reasons why making these non-current versions available can still be useful to users. So as long as all non-current versions are marked up in a way which makes it clear they aren't the current version, there shouldn't be any problem in allowing them to remain readily available rather than being hidden away in a myriad of different offline archives requiring more effort to access.
Absolutely not. The context of the page is what is important, but we should not be removing pages. Whilst the government does stuff many things up, gov.uk is not one of the screw-ups, and it usually clearly signposts where old policy or guidance has been superseded.
On the same basis would we be burning certain pages out of Hansard just because the policy or law being debated at the time has been changed since?
"On the same basis would we be burning certain pages out of Hansard just because the policy or law being debated at the time has been changed since?"
No, because they record what was said even if the policy or law has changed since.
But consider this. You have a web page with a banner saying some of the rest of the content on the page may be untrue - or out of date if you prefer the verbose form. Now go through each piece of information on the page, testing it against the banner, and delete it if it cannot be relied on. Because the banner tells you it can't be relied on, you'll delete every piece of information and be left with an empty page. At that point you might reasonably decide to go for the current page.
A page which purports to tell you something current needs to be current. Stale pages have been the curse of the web almost since TBL invented it.
If you need information that is current you go to the current version. If you need the information as it was when it was relevant at a certain time in the past, you stick with the archived version. There are reasons for both versions to be there, so they should be there.
As LLM scrapers don't heed robots.txt anyway, the government couldn't do anything but take it offline so it would be inconveniencing its citizens to please a foreign private corporation. And who's even sure that removing the old versions would remove them from LLM training data?
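For anyone unfamiliar with the mechanism: robots.txt is purely advisory, and it only has an effect if the crawler chooses to check it. A minimal sketch of what a well-behaved crawler does, using Python's standard `urllib.robotparser`; the crawler name `ExampleLLMBot`, the `/archive/` path and the `example.gov.uk` URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the bot name and paths are invented for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /archive/

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler asks before fetching; nothing forces it to.
    print(allowed("ExampleLLMBot", "https://example.gov.uk/archive/old-guidance"))
    print(allowed("ExampleLLMBot", "https://example.gov.uk/current-guidance"))
```

The check runs entirely on the crawler's side, so a scraper that never calls it is unaffected by whatever the file says.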
Surprisingly, even in the 2020s, the government's job is not inconveniencing its citizens and pleasing foreign private corporations.
The problem isn't with the web pages. The problem is that the AI is telling you that some outdated information that it dug out of the bowels of the historical archives in the web site is current. If you had just gone to the web site yourself and used the proper procedures for finding the relevant information you would have been presented with the current information. If you had looked at the old web page yourself instead of relying on the AI you would in most cases see a notification of some sort that the page is out of date and has been archived for historical reference.
Of course the AI may also have simply made the information up out of thin air like it commonly does when it doesn't "know" the answer, and that no doubt happens a lot in these cases as well.
The root of the problem though isn't the web site. The root of the problem is people believing what an AI tells them. Until people stop accepting what an AI says as being true, there is no solution.
Because AI doesn't 'understand' anything. I think what we're looking at here is hallucinations.
The 5 core reasons hallucinations happen
1. The model predicts the most likely next token, not the true next token
LLMs don’t retrieve facts — they generate text by predicting what should come next based on patterns in training data. If the model lacks the exact fact, it fills the gap with the most statistically plausible continuation.
This is the root cause of hallucination.
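A toy illustration of point 1, assuming nothing more than a bigram counter (a real LLM is vastly more complex, so treat this as a sketch of the principle only). The corpus is invented: three stale sentences and one current one. The predictor returns whichever continuation it has seen most often, not the one that happens to be true today.

```python
from collections import Counter, defaultdict

# Invented toy corpus: three stale pages and one current page.
CORPUS = [
    "the fee is 10",
    "the fee is 10",
    "the fee is 10",
    "the fee is 12",  # the only up-to-date sentence
]


def train_bigrams(sentences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev_token):
    """Return the most frequent continuation seen after prev_token."""
    return counts[prev_token].most_common(1)[0][0]


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    # Prints the majority (stale) value, because frequency wins over recency.
    print(predict_next(model, "is"))
```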
2. Missing or incomplete training data
If the model has:
• never seen the fact
• seen conflicting examples
• seen outdated information
…it will still try to answer, because that’s what it’s designed to do. It won’t say “I don’t know” unless explicitly trained or instructed to.
3. Overgeneralisation
Neural networks compress knowledge into shared parameters. This means they sometimes:
• blend concepts
• merge similar patterns
• infer relationships that don’t exist
This is the same mechanism that makes them powerful — and fallible.
4. Ambiguous or underspecified prompts
If the prompt is vague, the model fills in details to make the answer coherent. Humans do this too, but we have real-world grounding; the model doesn’t.
5. No built-in fact-checking
LLMs don’t have:
• a database
• a truth engine
• a verification layer
Unless connected to external tools, they rely entirely on internal patterns. So they can produce fluent, confident, incorrect statements.
Why hallucinations are inevitable in generative models
Because the model must always produce an output — even when:
• the data is missing
• the question is impossible
• the facts are unknown
• the prompt is contradictory
A human can say “I don’t know.” A generative model must generate.
I needed to ask a general question of a local hospital. An NHS website gave an email contact address for such enquiries. I sent an email and got an automated reply back saying the email address had been discontinued and to email a different address. Tried the new address. Never got a reply. Sigh.
However, it is quite capable of becoming stale. As I've mused elsewhere, what percentage of the combined mass of a government or a company's data can become stale, before that whole mass of data becomes virtually useless?
Searching alpha-goo for information on a few people who I personally know very well shows that data to be staler than last month's bread. In a couple cases, it is heading past the compost stage and into the sludge at the bottom of the septic tank territory. And in many of those cases, it has also been intentionally corrupted by the people in question.
Combine that with the fact that everybody seems to think that collecting any and all data possible off "The Internet" (whatever that means these days) is a good thing, despite the fact that such bulk slurping is by its very nature full of demonstrably incorrect, incomplete, incompatible and/or irrelevant data ... At some point, entropy says the data will become completely useless. Gut feeling is that we are getting very close to that point.
And we're using that clusterfuck of garbage to feed so-called AI? Lovely. No wonder it's prone to "hallucinating". Back in my day, we called it GIGO.
[0] When stored properly.
When I'm Prime Minister all government pages will have a 'Best Before date'. This would be of a defined duration according to circumstances. Something along the lines of: This information is not expected to change before the 31st April 2027. Or Not valid after 1st April 2027. That way it's a prompt for the maintainers to actually maintain their stuff or be called-out.
A valid-from date might add confidence.
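A rough sketch of how such a check might look if pages carried both dates as metadata. The `PageMetadata` type, its field names and the status strings are hypothetical, not anything GOV.UK is known to use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PageMetadata:
    """Hypothetical per-page metadata for the 'best before' idea."""
    title: str
    valid_from: date
    best_before: Optional[date]  # None means nobody set a date


def status(page: PageMetadata, today: date) -> str:
    """Classify a page relative to its validity window."""
    if today < page.valid_from:
        return "not yet valid"
    if page.best_before is None:
        return "no best-before date"
    if today > page.best_before:
        return "overdue for review"
    return "current"


if __name__ == "__main__":
    page = PageMetadata(
        title="Example guidance",      # made-up page
        valid_from=date(2026, 4, 1),   # made-up start date
        best_before=date(2027, 4, 1),  # "Not valid after 1st April 2027"
    )
    print(status(page, date.today()))
```

Anything that comes back "overdue for review" is the prompt for the maintainers, and a machine reader could use the same field to decide whether to quote the page at all.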
Why is this seen as a particular issue with gov.uk? There are millions of sites that archive or deprecate pages daily, often leaving them available as reference or archive, but still expect viewers to have the common sense to read the page that's in front of them before they dig down into historic material.
As we all should know, data is not information, even if AI says it is.
Anybody who blindly trusts AI to validate what it spits out as truth deserves all they get ...
Was chatting with the new lad in the office the other day. I mentioned my parents buying a house for £80,000 back in the 80s. Tappity-tap: that's £2 million nowadays. Me: nah, surely about £200k worth? Him: Google says £2m, and they've even got a link to the Bank of England inflation calculator. Which says average inflation since then has been 2.3% a year. Which I highly doubt, that seems way too low. So it looks like it can't do basic maths or look up the correct inflation percentage over 40 years - surely the maths ought to be the easy bit and the research the tough bit?
Those AI search results are a terrible advert for Gemini.
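For what it's worth, the maths really is the easy bit. A quick sketch using only the figures quoted above (£80,000, 2.3% a year, 40 years), and assuming simple annual compounding, which is only an approximation of what an inflation calculator does:

```python
def compound(amount: float, annual_rate: float, years: int) -> float:
    """Grow amount by annual_rate, compounded once a year for years."""
    return amount * (1 + annual_rate) ** years


def implied_rate(start: float, end: float, years: int) -> float:
    """Annual rate that would be needed to turn start into end over years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures quoted in the comment above.
    print(f"£{compound(80_000, 0.023, 40):,.0f}")
    print(f"{implied_rate(80_000, 2_000_000, 40):.1%}")
```

At 2.3% a year the £80,000 comes out at just under £200,000, so the quoted rate and the £200k guess agree with each other. Reaching £2m would need a rate of a little over 8% a year, so the £2m answer cannot have come from that 2.3% figure.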
Or he went tappity tap and it said the average house in the 80s costing £80,000 is probably worth about £2million now, which I would totally believe, especially in London.
As an example, my parents bought their house in the 80s for £22,000 and it is worth around £250,000, and that's in a crappy town in Worcestershire.
That house got extended and sold in 2008 for about £450k - it's now some kind of group living place for kids in their last couple of years in care - so they can learn how to cook, shop, clean and generally do horrible adult crap - life would be so much nicer if I wasn't required to also have life-skills! If only I could have domestic staff... Anyway, who knows what it's worth now.
I did the same Google Gemini thing when my nephew asked my Mum how old she was in days. According to it, being 87 made her 3 million days old! Which sounded horribly wrong to me. I could have done the calculation properly, but there are websites which will tell you how many days there are between two dates - so I just grabbed one of those.
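The proper calculation is a one-liner with date arithmetic. A minimal sketch; the birth date is made up, picked only to give someone exactly 87 years old on the second date.

```python
from datetime import date


def age_in_days(born: date, today: date) -> int:
    """Whole days between two dates, leap days included."""
    return (today - born).days


if __name__ == "__main__":
    # Hypothetical dates, exactly 87 years apart.
    print(age_in_days(date(1938, 6, 1), date(2025, 6, 1)))  # 31777
```

So 87 years is a bit under 32,000 days. Three million days would be over 8,000 years.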
I presume it's because the LLMs are text based, and biased towards giving answers rather than saying they don't know. I wonder whether "agentic AI" would do better, because it's supposedly designed to go off and chuck the question into an existing tool or website that can actually do the calculations? Or whether it's equally crap until you've trained it how to go off and do repetitive tasks?
The lack of understanding of how LLMs work is going to land a lot of people in hot water.
Where I work, a recent series of consultations to determine a future path was decided by an LLM.
I pointed out that many of the inputs were little other than bias or guesses, yet miracles were expected.
The LLM did a convincing job reaching a recommendation based on flawed data.
The recommendation was followed.
> Google are serving up false summaries
The UK has the Online Safety Act which requires websites to take down material that is deemed illegal. Surely misleading the public comes under the description of illegal?
Although if that was to be applied to the media, as it is to advertising standards, there would be precious little material on most "news" websites or their print versions.