
The world's so ****ed up right now that if I got a completely accurate summary of events (and assuming I hadn't actually witnessed the s**tness firsthand) I'd be inclined to dismiss the summary as nonsense from the AI.
Still smarting from Apple Intelligence butchering a headline, the BBC has published research into how accurately AI assistants summarize news – and the results don't make for happy reading. In January, Apple's on-device AI service generated a headline of a BBC news story that appeared on iPhones claiming that Luigi Mangione, a …
This so-called "AI" stuff is essentially an automated version of Steve Bannon/Vladimir Putin's strategy of "flooding the zone with shit". In essence: if the truth is already out and hard to refute, you can flood the world with many alternative takes that don’t have to be very believable, as long as people’s response is to trust nothing. Putin did this fairly successfully after his men shot down the MH17 passenger plane. Just come up with dozens of alternative scenarios; even if hundreds of aviation experts say they are impossible, you can at least get people to distrust every take, even those from experts.
Now, I don't think AI was deliberately created to destroy common knowledge or shared truth (unlike, say, Elon Musk's strategy, which is quite openly to destroy shared truth, hence his attacks on Wikipedia), but creating uncertainty and doubt among the people is not a problem for the millionaire class. They will still have access to factual information; they will not be held back by paywalls or Fake News (though some people argue that Elon Musk was radicalized by disinformation on his own platform). They will manage; they may even thrive because of it.
It’s no surprise that one of the main criticisms of these “AIs” is that they increase inequality in the world. They concentrate power over information in the hands of a small group of men while simultaneously making it harder for everyone else to get the same quality of facts, information and data. There are even people out there who think they can use AI as an internet search engine, essentially giving up on finding actual information.
Sigh.
Wikipedia Prepares for 'Increase in Threats' to US Editors From Musk and His Allies
The Wikimedia Foundation is building new tools that it hopes will help Wikipedia editors stay anonymous, in part to avoid harassment and legal threats, as Elon Musk and the Heritage Foundation ramp up their attacks on people who edit Wikipedia. Some of the tactics have been pioneered by Wikimedia in countries with authoritarian governments where editing Wikipedia is illegal or extremely dangerous.
But I think Wikipedia should just remove all US-based editors until the MAGA fever (hopefully) passes. There will be plenty of them actively trying to rewrite history, just as they are trying to do with school curricula, so even if you could protect those trying to do things right, you'd have to waste a ton of resources undoing those trying to "flood the zone with shit" à la Bannon.
Well actually no, you cannot draw that conclusion. Current AI is quite capable of doing (and frequently does do) worse than the material it was trained on. It is quite capable of mangling high-quality as well as low-quality input. Note also that this has little to do with human summarising; these systems would not be relying on human summary data, which might not even exist at the time of generation.
No, I don't think so. Look at what happened with its summary of how Pelicot found out she had been drugged and raped. The issue isn't bad summaries written about her case; my guess is it found more material about people not remembering abuse and assumed the same causes applied in her case as well.
I wish someone would be honest and say "These things aren't intelligent, they just throw words together that seem, more or less, to be relevant to your question. A goodly amount of the time, the results are garbage. We don't really know when or why this happens, or even, after the fact, that it *did* happen, or we'd stop them from generating it. We *are* hoping we can figure some of this stuff out before you stop investing in us, though."
That may actually be partially correct.
While your statement about human summarization is dubious at best (start by providing some evidence to support that conclusion), the GIGO principle definitely applies.
The thing is, though, that I can see much of the garbage actually coming from tons of disinformation that's currently flooding the Interwebs and that has been AI-generated in Russia and other countries where truth is considered a malleable commodity. Concerted attempts to disrupt the German elections are just one example. AI training is extremely vulnerable to that since an AI can't tell fact from fiction. Look no further than DeepSeek for an example of how this works.
So while the GIGO principle could (and in my opinion does) play a role here, I think it might be more a matter of AI-generated misinformation muddying the waters and one AI training itself on the output of another AI than the problem originating with humans not doing their jobs.
> ... in Russia and other countries where truth is considered a malleable commodity.
Sadly, "other countries" now includes the USA; it has long been the case in China and North Korea and is, more broadly, associated with populism in Europe, South America, South Asia and beyond.
> AI training is extremely vulnerable to [disinformation] since an AI can't tell fact from fiction.
It seems that would apply to humans too, though - else why would those disseminators of disinformation (let's just call it "lies", shall we?) even bother.
> So while the GIGO principle could (and in my opinion does) play a role here, I think it might be more a matter of AI-generated misinformation muddying the waters and one AI training itself on the output of another AI than the problem originating with humans not doing their jobs.
Fully agree.
But also, as the article says, some ‘facts’ appear to have been plucked from thin air, in the opinion of BBC journalists (whom I respect, though I sometimes wish they would elaborate a little more on a story point). How the AI could have inferred that Michael Mosley died on a different date to that published in newspapers is beyond me, except maybe if the AI picked up some extraneous information in an ad or a ‘more on this’ linked article that a human would have instantly dismissed as being incorrect.
> They can only echo their training data because they're a fancy autocomplete.
I wouldn't disagree. However, bear in mind that to a large extent humans are only as good as their training data (cf. the insidious and recursive effects of the current explosion of mis/disinformation, rabble-rousing, science denialism and denigration of expertise).
But at least there's a limit to one human's output or effectiveness; one LLM could be serving up unlimited different kinds of crap to thousands of people per hour.
It's back to the old adage that computers are designed to make mistakes much faster than a human can.
I disagree: they can summarise extremely well if treated properly. You have to take care in telling them what you want them to do.
I often use LLMs to summarise input that I give them. For instance, I will feed one, say, a 1,000-word piece of text and ask it to summarise it in, say, 100 words. It usually does a good job of rewriting the content more concisely and clearly than the input. Often it will drop something in the summary that I look at and decide "no, that bit is important, I really want that in" and I put it back in. It's also very good if I want it to rewrite something in a different way for a different type of audience.
The usual reason it does a bad job is when I don't tell it to use just the input source. Then it hallucinates at worst, or uses unreferenced sources at best. So long as I tell it to stick to the input I give it, it does something helpful. I suspect Apple aren't feeding their AI carefully proofread inputs but instead are telling it to "go find what you can about this...".
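(For what it's worth, that "stick to the input" instruction boils down to something like the sketch below. It assumes the OpenAI Python SDK and an API key are set up; the model name, word limit and prompt wording are illustrative, and the constraint reduces rather than eliminates hallucination, so the output still needs a read-through.)

```python
# A rough sketch of the "summarise only from the supplied text" approach.
# Model name, word limit and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def summarise(text: str, max_words: int = 100) -> str:
    """Summarise `text` using only the material it contains."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[
            {
                "role": "system",
                "content": (
                    f"Summarise the user's text in no more than {max_words} words. "
                    "Use ONLY the supplied text: do not add names, dates, figures "
                    "or sources that are not in it. If something is ambiguous, "
                    "omit it rather than guess."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content


# Example: print(summarise(open("article.txt").read()))
```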
"Often it will drop something in the summary that I look at and decide "no, that bit is important, I really want that in" and I put it back in."
"Then it hallucinates at worst, or uses unreferenced sources at best."
Yeah, that doesn't sound like a reliable tool at all.
Imagine if you passed the text off to an editor or junior to do that, and they missed off important bits and made up stuff because you didn't tell them explicitly every time to stick to a single source (and even then, instructions to LLMs are not instructions... they're barely suggestions).
The problem is we've had things like this relying on far, far older technology since the late 90s, too. Sure it's slightly more accurate now, but a move from 97% accuracy to 99% accuracy is really only an extremely minor improvement in the grand scheme of things when compared to an actual human.
When it works, it feels magic, and it may work a lot of the time, but to prevent something incredibly stupid from creeping in you have to just sit there, validating every output. Fatigue sets in, and even the human reviewer lets things slip through the net.
While we have made impressive advances to hardware, and transformers are a considerable improvement on architectures that have come before, we are simply not there yet, and nobody really knows how to fix these issues. Despite what the AI bros at large companies may say, academia is considerably more cautious unless they're pitching for funding.
"Often it will drop something in the summary that I look at and decide "no, that bit is important, I really want that in" and I put it back in"
Serious question - if you have to review things in that depth anyway, and it's only 100 words, is it actually more efficient?
I could do a 100-word summary of a doc I wrote in about 2 minutes and not have to revise it - is the AI really helping all that much if you have to put so much effort into review? Let's leave aside the nightmarish reality that probably 90% of people won't bother to review the output at all...
"Serious question - if you have to review things in that depth anyway, and it's only 100 words, is it actually more efficient?"
That's the key question! There are times when the process just doesn't work and I say to myself "forget it, I'll do it myself". But most of the time (way more than 50%) it does save time. The main thing is that I'm getting better (and quicker) at writing the right prompts rather than having to refine lots of prompts over and over. Also, the output comes back in seconds, so I'm still in the right mindset to do the final edit rather than having task-switched onto something else; that's where the big time saver is.
If the LLM can do 90% of the job and I have to do the 10%, I find that is still often (not always) quicker than me doing 100% of it. YMMV.
> The main thing is that I'm getting better (and quicker) at writing the right prompts rather than having to refine lots of prompts over and over.
Are you reusing the same prompt (plus copy'n'paste the article text)?
If not, are you thinking up new prompts each time (and are those 100 words or fewer)?
If your task is to write 100 words to summarise something you already know about, is it really more efficient, time-wise, to use an AI, or are you spending more time and effort on it - but having more fun playing with the LLM, so it feels (at the moment) like the easier way to do it?
For shits and giggles, I asked Copilot to summarise this article. (Gemini didn't want to play ball.) It did a pretty good job, including the facts and figures from the article.
But maybe Microsoft have had time to analyse it and put it in a db (as a blog linked below says Google does for Gemini). So I asked it about a niche site I run (a few hundred hits a month). Again it was accurate and correctly inferred a subtext that wasn't spelled out. Data on the site changes weekly, and it gave up-to-date data for the current week.
Finally, I asked it to summarise a tutorial on a technical blog I read yesterday. I thought I'd caught it in a mistake. But when I looked, I was wrong.
Whatever's going on behind the curtain, it's doing more than echoing its training data.
Or it could just be you got lucky.
You'll need to repeat this experiment at least 1000 times with careful review, at which point you could draw a vague conclusion. Three sample points are not adequate to draw anything from. If you find it performing perfectly each time even then, I would eat my hat.
"Whatever's going on behind the curtain, it's doing more than echoing it's training data."
Sort of. It's a huge tangled ball of wtf. Things like word2vec showed you can do some interesting algebraic 'reasoning' from unsupervised training at the word level. For example, king - man + woman ≈ queen, which actually makes sense when you look at how it works.
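(For anyone curious, that word-vector arithmetic is easy to reproduce; a minimal sketch using gensim's downloadable GloVe vectors is below. The dataset name and the exact similarity score are just what gensim's downloader happens to offer, not anything specific to the models under discussion.)

```python
# Minimal sketch of the word-vector 'algebra' described above, using gensim's
# downloader and a small pretrained GloVe model (assumed available; any
# reasonable set of word embeddings shows the same effect).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # downloads ~128 MB on first use

# king - man + woman ~= queen: add the 'king' and 'woman' vectors, subtract
# 'man', then look for the nearest word to the result.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically prints something like [('queen', 0.77...)]
```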
At the scale of today's models, many orders of magnitude larger, nobody really knows why they do anything, and some portions of the network may simply be undertrained, but we don't know because we haven't tried it yet (and will never have enough data to satisfy training them anyway, or adding more data will cause previously working parts to start barfing). When it comes to anything that requires auditing and accountability, this should be an enormous red flag.
Also remember that LLMs were created to generate content. They were not created to provide answers. .... AC
If not providing content creating answers is its purpose to generate questions and divisions denying human solution in favour of more of their moronic output/farcical input?
That would be a quite nonsensical situation and crazy tragicomical recursive suicidal loop arrangement delivering a self-destructive existential threat to the progress and radical evolution of a struggling barbaric and nomadic humanity ........to leave a once again barren place which be easily taken advantage of by SMARTR A.N.Others for recolonisation and commanding control of such formerly hostile deadly spaces with alien sources and forces ... trailing and trialing Pioneering AI Solutions.
*AI LOVE RAT* ..... Advanced IntelAIgent Live Operational Virtual Environment Remote Access Trojan
PS .... If you have any questions, ask the LOVE Machines.
Yes, I think the core problem of many LLMs is that they are essentially bullshitters. Bullshitters, as opposed to liars, don't really care whether something is true or not, or even whether people find out it's not true. A prime example of a human bullshitter is Boris Johnson: he just chucks out a lot of sentences (some may be true, some may not, but he doesn't care either way) and sees what sticks.
A major problem with LLMs is that they communicate with a confidence that is completely unwarranted considering the quality of their knowledge. They would rather tell you the wrong answer than say that they are unsure or don't have an answer at all. That is obviously a commercial decision; LLMs wouldn't look impressive to simpletons if they exuded less confidence, and without impressing simpletons there wouldn't be these levels of investment. Also, they don't need to convince a software developer that the bullshit code they spew out will work; they only need to convince their bosses that they can reduce developer headcount by replacing developers with LLMs.
Ironically, experiencing overly confident LLMs that keep spouting the wrong answers, even when told their previous answer was wrong, ultimately reduces the confidence that people have in them.
Yesterday's experiment in LLMs:
DDG has added a "Chat" facility with access to several chatbots. I asked the question "What is the Maythorn Way?" (The correct answer, the only one which both Google & Bing find, is that it's the name given to a pre-turnpike-era route between Marsden in West Yorkshire and Penistone in South Yorkshire, originally proposed to have existed without any particular name attached to it.)
GPT-4 hallucinates (really the only word for it) some woo which varies with asking - today's offering is "The Maythorn Way is a concept or term that may refer to various things depending on the context, but it is not widely recognized in popular culture or literature. It could potentially relate to a specific philosophy, a method of living, or a particular approach to a subject. If you have a specific context in mind, such as a book, a philosophy, or a community practice, please provide more details, and I would be happy to help clarify!" Yesterday's was similar but, from memory, not quite the same. Note the request for more information.
Llama 3.3 confidently describes, in glowing tourist-office terms, a walking route in the Cotswolds from Cirencester to Stowe-on-the-Wold. That's an interesting one. It's not what a search engine would have given, and a quick search doesn't turn up such a walking route under a different name. Is it hallucination, or has it picked it up from some discussion group on Facebook? Possibly some commentard from that area could elucidate.
The others admit to ignorance and ask for more information although maybe adding some general remarks about Taoism. I think I'd be more inclined to trust their responses to questions where they did provide direct answers.
The request for more details may contradict DDG's note that chats are never used to train AI models.
I grew up round that way, and Llama seems to have decided to tell you about the Cotswolds way.
Except that the Cotswolds Way doesn't pass through either Cirencester or Stow-on-the-Wold (no 'e' in Stow). It gets quite close to both, but it would be a several hour detour from the actual route. So it's complete bobbins basically.
I like to ask it about the TV series The Good Life.
There is COPIOUS material out there, long established and unchanged in decades, and they only made a handful of episodes, so you can literally know EVERYTHING that happens in the show just as a casual viewer, let alone a computer being asked to analyse it.
It will gladly make up characters, assign them attributes similar to existing characters, say that an actor played that part even when that actor has never been in the series, and deny that actors who were in the series ever featured in it. You can make Jerry and Margo have kids just by asking it their names. You can rename the pigs at will. You can have it deny George Cole was ever the bank manager. And so on.
Without any kind of "prompt poisoning", just asking innocent questions, like "Who was Gavin in The Good Life?" and things like that. It will then fight with you about their existence, flip-flopping between them being in it and not, and then argue outright that certain actors played the part, etc.
Just one demo with every new release when someone says "Ah, but Llama v2.0.4037854569 doesn't have that problem" is enough to convince me that AI is just a very fuzzy statistical output at best, and absolute trash at worst.
"The Maythorn Way is a concept or term that may refer to various things depending on the context, but it is not widely recognized in popular culture or literature. It could potentially relate to a specific philosophy, a method of living, or a particular approach to a subject. If you have a specific context in mind, such as a book, a philosophy, or a community practice, please provide more details, and I would be happy to help clarify!"
Sir Humphrey would be proud of that...
So I understand that, in some cultures, to not be seen as helpful is not a good look. Thus if someone in that culture is asked for, say, directions to somewhere and doesn't know, they will make stuff up rather than say (admit, in their eyes?) that they don't know. Has Gen AI been programmed with that sort of culture, and could it be turned off? Perhaps compile with flag BS = False?
For what it is worth, Perplexity seems to get the right answer now:
"The Maythorn Way is an ancient track described by W.B.Crump in Huddersfield Highways Down The Ages (1949). It runs from Marsden to Penistone via Meltham, Holmfirth, Hepworth, Maythorn, and Thurlstone. ...",
citing yorkshiremilestones.co.uk as its primary source.
I've posted this before and it's still relevant.
LLMs keep demonstrating that summarising is their weak spot. They can shorten, but because they're inherently stupid and have no idea what they are doing, these "AI" implementations are unable to distinguish the important from the unimportant. And that's key for summarising.
When ChatGPT summarises, it actually does nothing of the kind.
AI worse than humans in every way at summarising information, government trial finds
Not that it is likely to be of any use.
The percentages look useful but are here woefully too low:
51 percent of all Trump's answers to questions have significant issues of some form.
19 percent of all Trump's answers introduced factual errors – incorrect factual statements, numbers, and dates.
13 percent of the quotes offered by Trump were either altered from the original source or not present in the article cited.
"By the way, what happened to ending the Russia - Ukraine war on Day One?"
Well Biden did nothing for 3 years and Trump, outside of his expected exaggerations, is actually in the process of putting an end to this. From what I have read it might take a week or two more.
BUT, at least he is doing something and he will put an end to this stupid war.
Whereas the last crowd just got rich whilst spending crazy amounts of American taxpayers' money.
You people are so blinded by your own refusal to see how you were being turned over by your own side.
@AC: totally not. Negotiations over Gaza have taken months, conducted by people expert in and invested in the situation. Trump comes along with his lead wellies, applies “common sense” and unilaterally walks all over the delicately agreed terms of the ceasefire.
Watch or listen to any actual expert historian or journalist on the Middle East, and you’ll realise you should keep out or tread extremely carefully.
I spent several years in the Middle East and if you think that diplomacy works there then it just shows that you don't know who you are dealing with.
Why do you think that so many of their leaders are hardliners: Arafat, Gaddafi, Ayatollah Khomeini, etc.? These are cultures that don't wear velvet gloves; they have swords or large knives tucked into their waistbands... They are warrior nations that pride themselves on being warriors. I don't knock them for that, for that is their culture/choice.
An odd claim to make bearing in mind that it was diplomacy led by Henry Kissinger that persuaded Yasser Arafat to negotiate a peace settlement that had succeeded in maintaining an (albeit uneasy) peace between Israel and the Palestinians since the 1990s until Hamas decided to smash it in October 2023.
Yasser Arafat was even awarded the Nobel Prize for Peace over it.
AI summaries turn real news into nonsense, BBC finds
Well, they would, wouldn’t they.
* ....... https://en.wikipedia.org/wiki/Well_he_would,_wouldn%27t_he%3F
Yeah, that could well be the fulcrum of many a tightrope balancing act and pivot. IMHO though, genAI's relationship to News is best apprehended metaphorically as that which an ElReg commentard has to its TFAs. Rather than News Summaries, the LLMs' outputs are best viewed as automatically generated machine "opinions", sometimes factual, sometimes "creative", most times drug-inducedly hhaalluucciinnaatteedd II tthhiinnkk!
* 51 percent of all AI answers to questions about the news were judged to have significant issues of some form.
* 19 percent of AI answers which cited BBC content introduced factual errors – incorrect factual statements, numbers, and dates.
* 13 percent of the quotes sourced from BBC articles were either altered from the original source or not present in the article cited.
Still better than BoJo
If you don't *understand* what you are summarising, then you can't summarise it.
Languages aren't equations. If they were we'd be speaking in equations.
Languages are vectors for communication between sentient beings.
If only someone had mentioned this 30 years ago.
Oh, hang on, I did. When I did my Masters.
"AI" is currently like "spirit magnetism" or whatever electricity was sold as by the sharks of it's day. The unscrupulous flogging to the gullible on the back of the ignorance of the population.
<....."Pete Archer, Programme Director for Generative AI, wrote about the corporation's enthusiasm for the technology, detailing some of the ways in which the BBC had implemented it internally, from using it to generate subtitles for audio content to translating articles into different languages.".....>
We have had programs for speech recognition and language translation for some years.
Neither of them is artificial intelligence.
<......."An OpenAI spokesperson said: "We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution. We've collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We'll keep enhancing search results.""......>
Translation: An OpenAI spokesperson said a lot of bollocks and completely missed the point.
Nothing deep, just that I'm not hearing anything mentioned about Gell-Mann Amnesia. It's the BBC in this case, and I have lost count of their news 'stories', ones I have particular knowledge of, which were reported with so many factual inaccuracies as to be worthy of any AI hallucination.
It would be worthy of the BBC to actually spend as much analysis finding their own beams as searching for motes in another's AI.
Nicely timed by the BBC to throw credible doubt on “AI” just as the government et al are going big on all the benefits of AI.
The BBC may be biased and left-leaning (if you believe its Tory-supporting critics), but with this piece of news it shows its left leanings are not aligned with the current government's left leanings.
I've been trying to get it to rewrite my CV. I'll literally upload my CV into ChatGPT and give it a job spec. It then goes off and makes everything up.
I tell it not to lie... it actually apologises & then carries on making stuff up.
The REALLY scary part is that recruiters & hiring managers are using this shit to filter the CVs they get!
The AI, which isn't really AI, is a bubble waiting to burst, but it will also cause deaths. People are relying on it too much to "summarise" e-mails.
One day, at the DWP, it will summarise an e-mail from a resident who is attempting to claim benefits. It will miss out that "Jane Doe is very vulnerable and needs help. If she doesn't get it, she will die". So the lazy person who had the e-mail summarised for them won't double-check the result, will file it, and Jane will die.
It has already started; at the DWP, I believe, they set AI up to decide people's benefits. They had to turn it off as it just kept saying no to pretty much everyone. It appears they are rolling it out again.
If you know anything about specification gaming with AI: if it has to get Jane Doe from A to B quickly, it will just decide to launch her there and afterwards say "You never said she still had to be alive when she got to point B".
Generative AI can only create bullshit: content that looks plausible but may or may not be correct, accurate or truthful.
https://link.springer.com/article/10.1007/s10676-024-09775-5