
However ...
when the internet-searching ChatGPT encounters stories about the man who supposedly killed his kids, will it then "learn" from its previous hallucinations?
A Norwegian man was shocked when ChatGPT falsely claimed in a conversation he murdered his two sons and tried to kill a third - mixing in real details about his personal life. Now, privacy lawyers say this blend of fact and fiction breaches GDPR rules. Austrian non-profit None Of Your Business (noyb) filed a complaint [PDF] …
Tell me you don't know how LLMs are trained and how they generate output from that without telling me you don't care...
In this text, "child murderer" and this dude's name crop up really close together. For an LLM this means the two phrases are related statistically and should be used together. That's why we cannot have nice things.
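A minimal sketch of that statistical pairing, in Python. The toy corpus and the name are invented for illustration, and a real LLM learns soft associations over token sequences rather than literal pair counts, but the principle is the same:

    from collections import Counter

    # Toy corpus: the name and the phrase appear near each other,
    # which is all the statistics ever capture about them.
    corpus = [
        "local man john doe comments on the child murderer case",
        "the child murderer case shocked the town where john doe lives",
    ]

    window = 4  # co-occurrence window in tokens
    pairs = Counter()
    for doc in corpus:
        tokens = doc.split()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                pairs[(w, v)] += 1

    # High pair counts nudge a model to emit the words together,
    # with no notion of who did what to whom.
    print(pairs.most_common(5))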
If it were me, I would ask how the heck they had any information about me and tell them to delete it under GDPR, which they are obligated to do.
"tell them to delete it under gdpr, which they are obligated to do"
Easier said than done.
Norway isn't in the EU, so the EU regulations don't protect anyone resident in Norway any more than they do in the UK. It may well be that Norway has its own version, but then does OpenAI have a legal entity resident in Norway that can be brought to book?
(Yes, I know the UK also has its own version, which successive govts. seem to want to move out of the way of LLMs and their owners, in the interests of putting, as they think, GDP ahead of GDPR.)
It seems to me that there are two issues here.
1) These systems are using my personal data without my permission and in ways that breach the law (GDPR).
2) We have the concept of a legal person. It seems to me that Albert Idiot aka John Doe should be treated in law as a legal person.
So, when Albert slanders or libels me, I can take him to court and seek redress for the damage that he caused. Also, he (AI) could be obliged to publish retractions and correct the record so that further untruths are avoided.
In other words, I don't think that we need any new laws or regulation, just the rigorous application of the long-established ones that we have already.
Tell me: under which law is it illegal to output incorrect information about real persons?
I have been saying for years that outputting incorrect information is illegal, and these companies need to solve this problem before they start pushing their systems on the public. It is the same as the copyright violations: instead of licensing the information they use, like everybody else, they want exemptions because they are too important and paying the licences would affect their bottom line... If the cost of licences isn't calculated into the bottom line, they didn't set up their business model properly. Likewise, if their products are in violation of the law, they should be pulled off the market until they can comply.
Ford had to pull the Pinto from the market when it was shown that the fuel tank configuration could lead to fires in low-speed impacts, and my last 4 cars were all recalled because there were defective parts that could cause accidents or break the law. This is no different: the LLM is defective and can cause damage, so it should be fixed. OpenAI should invest the money in developers instead of lawyers trying to make them exempt from the law...
No, it's NOT at all like copyright.
Copyright DOES NOT APPLY to LLM training any more than it applies to a person learning from a book they read.
LLMs don't distribute copies of what they're trained with, they use it to learn how to make new stuff.
All the publishers whining about LLM training need to STFU, their rent seeking is disgusting. They're not entitled to a penny, and they know it.
"will it then "learn" from its previous hallucinations?"
That personal data is being served up as fact, even ignoring the 'disclaimer', just shows that NONE of these systems are safe to use for any purpose. The 'models' that are being used live NEED to be able to learn when they are told something is wrong. Until that time, any output is simply a best guess based on the crap that has been input so far.
I spent yesterday trying to get a section of my own websites working again using a combination of Mistral and raw search. That Mistral simply rewords the same wrong information just highlights another recent story about faulty output. Until I had a combination of facts so I knew just what questions to ask, I was unable to fix the configuration, so I'm not sure that using Mistral is ACTUALLY improving productivity. Most of the questions were correcting its mistakes while trying to get at a correct answer, and I suspect I would have solved the problem quicker had I just skipped straight to raw searches of CURRENT information.
I think these hallucinations are not based on incorrect input but on forming incorrect connections. The man’s name was correct. And I bet there was a case where someone murdered his child. The “hallucination” is connecting both pieces together incorrectly.
There was a case of a court reporter who had been writing about some rather serious and rather varied crimes in his court. An AI then claimed these were all crimes he had committed, not crimes he had written about.
I like the phrase "stochastic parrot". Sums it up neatly
You beat me to the same comment.
LLMs desperately need a post-processing reality-check layer. It's no good having an input-data/pre-processing check, as that doesn't protect against hallucination.
I had a really good example recently. It quoted a citation which, unsurprisingly, didn't exist. Interestingly, though, the make-up of the citation (authors, journal, title, DOI) was very close to real ones in the same field. So even for a citation, it mixes up input data and creates the output. Clearly a citation has to exist exactly, but it doesn't seem to understand this.
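Citations are one of the few hallucination classes that actually can be machine-checked after the fact, since a DOI either resolves or it doesn't. A minimal sketch of that post-processing check against Crossref's public works endpoint; the example DOIs are illustrative:

    import requests

    def doi_exists(doi: str) -> bool:
        # Crossref returns 200 for a known DOI, 404 otherwise.
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    print(doi_exists("10.1038/nature14539"))        # a real paper: True
    print(doi_exists("10.1000/invented.2024.9999")) # plausible-looking fake: False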
The whole world of modern IT is based on no care, no responsibility, innit? IT related to control systems might be the only area where developers seem to actually still care about quality, though Tesla is challenging that idea I suspect.
As a non-IT engineer, I wish my mistakes didn't have the potential for people dying.
Please, please stop just believing the mainstream media, they are as deceiving as any Internet source.
Re Tesla, I have no idea what their defect rate is. Do you? How does it compare? Yet you disparage them. Why? Because the media told you to hate Musk and publicise any failing. Did you hate Musk before he started supporting Trump?
Apparently the Tesla delivered to Trump had 57 recall notices against it. I would be interested in how many recall notices were typically raised against a car from the '90s containing no software, and, in the pursuit of balance, against a modern non-Tesla model.
"Because the media told you to hate Musk and publicise any failing"
Um, the media in the US ranges from at worst neutral on Musk to positively spaffing over him (Fox).
The media in Australia, where I am, barely mentions him.
I felt favourable to Musk when he appeared to support traditional thought. Once he started acting like a jerk, I no longer took a favourable view of him. Your side won, you need to stop pretending everyone is against you.
They don't need to pretend everyone is against their side, because everyone who isn't on their side—having seen what their side is like—really is against them.
As in, they are FOR: honesty, compassion, doubt, truth, generosity, humility... (and all the other things that the Trumps, Putins, and Musks of this world are demonstrably against, or 'for' only insofar as it serves their self-interest, so not really for them at all).
"Please, please stop just believing the mainstream media, they are as deceiving as any Internet source."
No, that is a false statement.
The "mainstream media" (for whatever definition of "mainstream" that you personally choose) is not "as deceiving as any Internet source". It is often wrong, of course - and some definitions of "mainstream" are wrong more often than others - but there are "Internet sources" much worse than even the worst "mainstream media".
I'll be more likely to believe a source of information if it openly admits and corrects errors of fact.
I don't care if they're mainstream or not; I just know what Musk's attitude to fact checking is. So no, I won't even read your website if it takes a lawsuit to get it to admit it was wrong.
"The whole world of Modern IT?" Have you seen our licenses?
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Try selling a car, or a kettle, or anything other than software with that clause and see how far you get. Nothing modern about this trend.
> "Nothing modern about this trend."
The modern addition is the decision to actually deliver software products with no real, reasonable attempt being made to make the software fit for purpose. I.e. this "as-is", no-warranties statement is being taken as a mandatory design requirement, in the interests of cost-cutting.
"As a non-IT engineer, I wish my mistakes didn't have the potential for people dying."
Given that there are plenty of nutjobs believing whatever AI tells them, it's quite likely that AI will, if it hasn't already, lead to deaths. The difference is that the techbros don't care, because they're unlikely to be held responsible in the way they should be.
Toyota software was already killing customers in the early 2010s, and I seem to recall a lot of teething pains when Airbus went fly-by-wire. I really thought there would be a political wave pushing for personal responsibility for software engineers developing safety-critical systems, but I guess we are far enough removed from building bridges and skyscrapers that it's hard to discuss the risks without eyes glazing over. I doubt it even hurt Toyota's sales.
https://users.ece.cmu.edu/~koopman/pubs/koopman14_toyota_ua_slides.pdf
No, it's not. I've consulted for 5 different organisations in the past 5 years, and they all took legal responsibility for the data they processed.
There's a whole load of money being spent on "AI", and that is pressuring engineers to do shit work, but IME most IT workers take data processing seriously.
This post has been deleted by its author
I think you'll find over time that the dickhead lost the election and the current incumbent is telling more truth than you believe possible, shaking your world view, which you reject. I'll give you a hint: the whole system is more corrupt than you can possibly imagine and was working for the benefit of a tiny minority at the expense of the majority. The new 'dickhead' is better than the previous one, and if you can't see that, you are just listening to sound bites from media that was getting paid by the previous regime to lie. But hey, don't worry, I'm sure the destruction of the West and total collapse will resume after this short pause.
Sure, the new dickhead is working for me. Before his tax reform I never owed the federal government any money; paycheck withholding was enough. After the changes I've owed thousands every year, and I make less than $50,000. That dickhead paid $750 in federal tax while I pay four times as much. He claims to be a billionaire. I've been lucky if I could afford to pay heating bills. Tell me again how he's working for my benefit.
As for being more corrupt: how is putting the owner of SpaceX in a position to close NASA not corrupt? Not surprised you hide as Anonymous Coward. The biggest clue that he's not telling the truth is when he says "everyone knows this". He confessed to lying to the prime minister of Canada in front of a live camera and mic, thinking it was funny. Likely Trudeau knew he was lying and just kept his mouth shut; he'd been dealing with trade negotiations for years, and if he wasn't sure on a point he could ask one of his ministers who might be up on the question. Sorry, he's unable to tell the truth, ever, about anything. He's not very bright. Anyone who thinks he's smart must be even stupider, and anyone who claims he's telling the truth must be even more mendacious.
Seriously, I would like to know where you got 'your' world view???
The common refrain from supporters of Trumpf is that they are 'right' you are 'wrong' because they are 'right' ... AKA Trumpian Logic !!!
Please reference/quote accepted sources to back up your 'World view'.
NO posts from Social Media/etc as these sources cannot be verified, no posts from other people simply repeating your views word for word.
Old style coherent logically argued proofs from people who have the knowledge/education/background to support their views with 'evidence'.
Definitely, nothing from Trump/Vance or any of their cronies that have been placed into position of power/influence, otherwise known as 'Puppets'.
(Trumpf is a 'fact/evidence free' entity and so are his cronies/puppets.)
I know this will be somewhat difficult BUT it would be very useful as a means to convince others of your point of view !!!
:)
Yes, we're all so grateful that President Trump has singlehandedly killed inflation, ended the wars in Ukraine and Gaza, won over the people of Greenland and Canada, balanced the federal budget, and created millions of jobs for Americans. Just imagine if that stupid Democrat had won instead.
The system works for a tiny minority at the expense of the majority? - well yes, that's true. All systems do that, sooner or later, when a tiny minority of people figure out how to play them. And the current undisputed leader of that tiny minority is Donald J Trump. He's done, and plans to do, nothing but extreme damage to the majority, because that's how he makes himself rich.
Farmers go bankrupt? Cheap land! Companies fail? Buy up their assets! People jobless? Cheap labor! US-dominated world order destroyed? Great news for his buddy Putin!
And that's the common thread here. As my dad used to say when checking bills, "if they were just bad at this, you'd expect half the errors to be in my favor." But with Trump, all the actions are consistent: they're all calculated to make rich people happy, and no one in the world, not even Elon, is richer than Putin.
This strikes me as being quite similar to the situation with software warranties.
Software in general is so complex, with such a high probability of bugs, that if we forced software manufacturers to warrant that their software was free of defects, they would just pack up and go home.
But such is the usefulness of software when it does work properly, that at the end of the day we all have to take the risk, and build safeguards into society to handle the inevitable failures.
Perhaps people will start to learn to question what they see/hear on the Internet. If we are really lucky, they will question what the BBC and mainstream media promote, their governments, and worst of all, political parties seeking election! Whilst I have no particular love of Trump and Musk, I have yet to meet one of their detractors who has actually listened to an entire speech or interview, preferring to be told what they stand for by the BBC and a 3-second sound bite. Meanwhile, European & UK leaders call for WW3. Make it make sense!
I love LLMs, but I don't believe what they tell me unless it makes sense and I can validate it. I use one that gives sources; I would hope that is common? I'm probably some weirdo that actually checks those sources when it's important or controversial.
If everything that comes out of an LLM has to be checked and at least some of it discarded, what useful purpose does it serve? People ask questions because they need answers. The LLM only seems to be useful if you already know the answer to check it against.
In reality, LLMs are going to be put in customer-facing positions where the customer is looking to customer service as the only definitive source of answers for a problem. When that's done without adding the disclaimer, the customer isn't going to believe they must check elsewhere and, in fact, will have nowhere else to check. If the definitive answer comes with a disclaimer, where do you go from there?
> In reality LLMs are going to be put in customer-facing positions where the customer is looking to customer service as the only definitive answer for a problem.
I think the courts have already passed judgement: if the customer service AI says something, the company is on the hook to deliver.
I.e. if the AI says the $1m product is mine for $1 if I order now, it's mine for a dollar.
I will actually upvote that.
I am not a fan of Musk & Trump, but they are doing the right-wing stuff you would expect of them, nothing really surprising/unexpected (though obviously not great for non-right-wing people, and even MAGA fans might begin to regret some of the stupid, short-sighted, under-researched "cost cutting" moves in years to come).
I'm in the UK* and despise the Labour party far more, on grounds of hypocrisy, as they are targeting the disabled & poor and leaving the rich alone: right-wing policies, but voted in** on a pretend socialist ethos.
* Also a working class "proper" lefty old enough to have helped out in miners strike soup kitchens back in the day, so very disappointed by Starmer Labour.
** Not by me as it was obvious they were just red tories.
Yes, plain-as-day libel.
The solution to avoiding libel, as with newspapers and anyone else who gets stuck with such a charge, is to check your information. But LLMs can't do that, because they're not intelligent and have no understanding of what they're publishing.
Which should also make it fraud on a grand scale to call them AI.
It isn't merely ChatGPT.
Other models also create content that might breach GDPR, such as Stable and other such things. The problem here is that (with a few minutes' work) I can ask one to do something nasty and the AI will happily comply. Usually the model(s) available now have been sanitized, but what about those already out there, with content scraped from sources all over the place?
It literally is a complete minefield, and this particular horse left the barn a long time ago, with many celebrities taking out 'Deepfake Insurance' to guard against content they created in good faith being used to generate something unsavoury, e.g. someone finding a 1990s-vintage holiday/etc camcorder tape that shows a lot more than is in the public domain.
Had a word with some folks, and legislation may well be incoming that bans certain *types* of LLM, e.g. models based on unethically or illegally sourced data, if they can uniquely identify individuals who have copyrighted their likeness or other personal data.
On the flip side, if LLM content is found and is from long enough ago, some folks just point out the differences, call it out for what it is, i.e. copyright infringement, then send in the lawyers (tm).
This one, at least, should be a trivial fix. Foundational models can simply be equipped with safety rails to prevent them emitting sentences about any named individual. Then we don’t have to worry whether that information is right or wrong.
LLMs are simply the wrong tool for searching information on the internet. That is what a *search engine* is for; the clue is in the name. Identify an authoritative location for the requested info and provide a link, without attempting to pre-process the data.
If people are using LLMs as search engines, then they are fools. There are plenty of good use-cases for LLMs, this isn’t one of them. But yes, fools will use it that way, so probably safety rails are required.
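For what it's worth, here is a minimal sketch of that kind of output-side safety rail, withholding any sentence that mentions a named individual. It uses spaCy's off-the-shelf NER; the model choice and the blunt sentence-level redaction are illustrative, not a production design:

    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English pipeline with NER

    def redact_person_sentences(text: str) -> str:
        # Withhold any sentence containing a PERSON entity.
        doc = nlp(text)
        kept = []
        for sent in doc.sents:
            if any(ent.label_ == "PERSON" for ent in sent.ents):
                kept.append("[withheld: sentence names an individual]")
            else:
                kept.append(sent.text)
        return " ".join(kept)

    print(redact_person_sentences(
        "John Doe murdered his two sons. The weather in Trondheim is mild."
    ))
    # -> "[withheld: sentence names an individual] The weather in Trondheim is mild."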
OpenAI are trying to argue that a guardrail to suppress the incorrect information is sufficient.
The victim of the hallucination objects to the incorrect correlation being present in OpenAI's model. The victim is trying to exercise the right under the GDPR to have incorrect information deleted.
Using guardrails to suppress the supply of faulty information means that the model operator/owner will need to have a lookup table of all the known incorrect information stored in the model. The contents of the lookup table will be reactive; this will end up being an example of "Falsehood flies, and the Truth comes limping after it" (Jonathan Swift: https://quoteinvestigator.com/2014/07/13/truth/).
What happens if the guardrail for a specific piece of incorrect information is deleted?
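A minimal sketch of how brittle that reactive lookup table is; the entries and the matching logic are invented for illustration:

    # Each known-bad claim has to be registered by hand after a complaint,
    # so the guardrail always trails the falsehood.
    KNOWN_FALSE_CLAIMS = {
        ("john doe", "murdered his sons"),
    }

    def blocked(output: str) -> bool:
        text = output.lower()
        return any(person in text and claim in text
                   for person, claim in KNOWN_FALSE_CLAIMS)

    print(blocked("John Doe murdered his sons."))  # True: suppressed
    print(blocked("Jon Doe murdered his sons."))   # False: one typo slips through
    # And if the entry is ever deleted, the model happily repeats the libel.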
LLMs are being "sold" as search engines. As with most technology, very few people understand the problems of LLMs, and most people are simply not interested enough to come to a view. I know and like a number of people who wouldn't understand the problems even if you could persuade them to spend the time to be interested. If they trust you, they may take your word for it.
I agree that there are many good uses for LLMs, but they are being hugely oversold.
I do take your point that it's even easier, and more defensible, to do this at training time. Just classify and remove all "Person Name" entities in the training data, or apply more sophisticated anonymisation; this isn't 2010 any more, we have robust procedures for data anonymisation. The issue of removing "Isaac Newton" from the dataset is trivially solved by adding a whitelist of famous dead people, as defined by having an Encyclopedia Britannica article. I just don't see this as a Hard Problem. There isn't really a good reason for the LLM *itself* to know or encode people's names or info. Just use a software agent to look it up on Google like a normal person; it's not 2022 any more.
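A minimal sketch of that training-time anonymisation, again with spaCy NER; the whitelist seed and the replacement token are assumptions for illustration:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    FAMOUS = {"Isaac Newton", "Marie Curie"}  # e.g. seeded from an encyclopedia index

    def anonymise(text: str) -> str:
        # Replace PERSON entities not on the whitelist, working backwards
        # through the entities so earlier character offsets stay valid.
        doc = nlp(text)
        out = text
        for ent in reversed(doc.ents):
            if ent.label_ == "PERSON" and ent.text not in FAMOUS:
                out = out[:ent.start_char] + "[NAME]" + out[ent.end_char:]
        return out

    print(anonymise("Isaac Newton never met John Doe of Trondheim."))
    # -> "Isaac Newton never met [NAME] of Trondheim."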
By the way, it's important to realise this is more an issue of perception and feeling lied to than of personal data *actually* being stored. The relevant version of ChatGPT has 200 billion parameters = 200 GB. That just doesn't *possibly* have room to store actual info about any significant proportion of the population at dozens of bytes per person (including, allegedly, names, ages of children, and town of residence). Otherwise we've accidentally discovered God's compression algorithm. And Llama 7B can do it in 7 GB, so it's 28x better compression...
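The back-of-envelope arithmetic behind that point, with the population and per-person byte figures as rough assumptions:

    params = 200e9        # claimed parameter count
    bytes_per_param = 1   # ~1 byte per parameter at 8-bit precision
    model_gb = params * bytes_per_param / 1e9        # ~200 GB in total

    population = 8e9      # roughly everyone alive
    bytes_per_person = 50 # name, kids' ages, town: "dozens of bytes"
    people_gb = population * bytes_per_person / 1e9  # ~400 GB

    # Storing everyone would need twice the whole model, leaving nothing
    # for the language modelling the parameters actually do.
    print(f"model: {model_gb:.0f} GB, people alone: {people_gb:.0f} GB")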
“OpenAI are trying to argue that a guardrail to suppress the incorrect information is sufficient”
I say let them argue whatever they like, and fix it any way they like.
And the next time they accuse an innocent man of murdering two of his sons, give them a massive fine that matches the severity of their false accusation.
that "AI" isn't "intelligent" in any accepted sense of the word.
It doesn't understand whatever bilge it's producing. It's like someone who can vocalise the Latin alphabet reading French phonetically without actually comprehending any of it. You could even learn the pronunciation and inflection to sound fluent, but still have no idea what you are saying.