Oh seems that lying about people has financial consequences!
Lawrence Fox seems to have discovered that recently as well, is he approaching AI now? Coincidence?
Privacy activist group noyb (None of Your Business) has filed a complaint against OpenAI, alleging that the ChatGPT service violates GDPR rules since its information cannot be corrected if found inaccurate. In the filing [PDF] with the Austrian data protection authority, the group alleges ChatGPT was asked to provide the date …
The front of Harry Potter says "This is a work of fiction, any resemblance to any person living or dead is coincidental"
That is the difference.
Now, it's entirely on Microsoft that they chose to release a search system that continually provides false results and cannot be corrected.
4% of Microsoft turnover seems reasonable, the taxpayers of Ireland would be quite happy with their 2%, and the EU could offer quite the rebate to the remaining members.
Microsoft is a major investor in OpenAI.
And CoPilot runs on a slightly custom version of GPT. So whilst this complaint refers to OpenAI/ChatGPT, any inherent biases or problems with the OpenAI model will also be embedded in Microsoft's product unless those specific issues have been tweaked and fixed during MS' refinement process.
> any inherent biases or problems with the OpenAI model will also be embedded in Microsoft's product unless those specific issues have been tweaked and fixed during MS' refinement process.
The biases and problems were built up over all the years of running massive amounts of compute to produce a tangled and incomprehensible pile of weights. There is absolutely no way to "tweak" that pile, other than doing it all again from scratch and keeping your fingers crossed that *this* time it won't be quite so bad. Maybe you could consider checking the training materials and removing any initial bias there - oh, but the whole basis of their approach is that the bulk of the training inputs hasn't been hand-checked, tagged and filtered (no matter the cost in hardware and energy - carbon - of building it the crude way, that is still peanuts compared to the cost of hand-checking everything).
The best they can do is add some form of post-processing as a "quick fix" - and we saw how badly that goes: look up "AI now cannot create images of mixed-race couples" for an easy-to-see example. And tales of working around "safeguards" by rewording prompts.
Those biases are in there to stay.
There is a flowdown of responsibilities in GDPR. You cannot absolve yourself of GDPR liabilities because you subcontract personal data processing to a 3rd party.
Microsoft is the processor, OpenAI would be the subprocessor. Microsoft is liable for the legal processing of data it instructs OpenAI to process on its behalf.
Article 5 (1)(d) outlines that personal data shall be accurate and where necessary kept up to date.
Article 16 outlines that the data subject has a "Right to rectification". "The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her".
It's really one of the clearer principles and I suspect OpenAI will find a way to correct data pretty fast, since I can't see any complex or lengthy EULA getting MS/OpenAI out of this.
If there was any chance they could make these things accurate don't you think they'd have done it by now? Not because of the law but because it would make the product actually useful?
ATM LLMs are only useful if you're looking to raise venture capital.
"I suspect OpenAI will find a way to correct data pretty fast
As far as I can see, given the way an LLM works the only effective way to 'correct' data is to create a query-specific filter to block erroneous output. The problem is that the length of the filter set will rapidly escalate to infinity.
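To make that concrete, here is a minimal sketch in Go of what such a query-specific output filter might look like. It is purely illustrative - the names (`blockedClaims`, `filter`) are made up, and the entries reuse the hypothetical "heyrick" example that appears elsewhere in this thread. Every newly discovered error means another entry, which is exactly why the list only grows:

```go
// A toy query-specific output filter: a blocklist of known-bad claims,
// checked against the model's output before it is shown to the user.
package main

import (
	"fmt"
	"strings"
)

// blockedClaims maps a query pattern to output fragments known to be wrong.
// One entry per discovered error, forever -- which is the scaling problem.
var blockedClaims = map[string][]string{
	"heyrick's date of birth": {"31 January 1944", "Jan 31 1944"},
}

// filter suppresses a response when the query matches a pattern and the
// output contains a fragment known to be wrong for that pattern.
func filter(query, output string) string {
	q := strings.ToLower(query)
	for pattern, fragments := range blockedClaims {
		if !strings.Contains(q, pattern) {
			continue
		}
		for _, frag := range fragments {
			if strings.Contains(output, frag) {
				return "Sorry, I can't answer that accurately."
			}
		}
	}
	return output
}

func main() {
	fmt.Println(filter("What is heyrick's date of birth?",
		"heyrick was born on 31 January 1944."))
}
```

Nothing in the model itself changes; the wrong answer is still in there, merely hidden for that particular phrasing of the question.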
Before you even start using ChatGPT, you're presented with the following, clearly and unambiguously presented, in bold:
> Check your facts
> While we have safeguards, ChatGPT may give you inaccurate information. It’s not intended to give advice.
And then, for every query you make, you are typing directly above the following warning:
> ChatGPT can make mistakes. Consider checking important information.
So what's the point of using it? It presents no short cuts to getting information if everything in the response has to be checked and the source information is not presented. It's just a very expensive toy masquerading as a working machine.
In this case a real researcher could have told you various pieces of information about someone, telling you the source and possibly evaluating that source, and also inferred an approximate DOB based on some of that information while telling you that it is inferred, or at least that it's approximate. Stating real information alongside inferred approximations without differentiating should not be acceptable to users. It wouldn't be acceptable to users who understand that that's what's happening. The worrying thing, of course, is that too many managers, HR staff, marketroids, political lobbyists, etc. won't understand.
Money is the point.
It's human nature to avoid thinking about things that don't suit us.
So plenty of people will have been happy to read the surface text of generative AI that says, "this product will do all your boring work for you". Then they'll throw whatever information is necessary at it, choosing not to think about whether that is appropriate or lawful. When the AI produces a blob of passably usable text or a graph, the user won't look too closely at it (no need, AI is sooo good) and will stick it into whatever they are trying to produce.
If they get caught they'll point at the AI and blame it, when that fails they'll blame the business for not stopping them using an inappropriate tool.
Humans are largely lazy, blinkered and prone to self-delusion when it suits them. Awareness of responsibilities is something that needs to be forcibly hammered into people and they don't like it.
IMHO very few businesses consider human nature in what they do.
Those that do are generally doing it to manipulate for greed and self-interest (Meta, Google, MS) and are therefore feeding the problem in the first place.
> So what's the point of using it?
ChatGPT? None.
Well, there are times random twaddle is useful - Lorem Ipsum is neat, but using English twaddle text geared towards their interests may sway the buyer of your web design. Expensive way of doing it, though.
Other LLM-based systems? So long as you avoid the obvious issues (mainly comes down to attempts at fraud), pretty-picture generators create, well, pretty pictures and we have a never-ending desire for them and Lissajous gets boring after a while.
Other neural nets? Really, really useful, especially in areas where the costs of checking and dismissing a false result are outweighed by the value of an otherwise too easily missed true result.
Damn shame the crap gets all the media attention - but then, the bridge that falls down is the one that gets the news coverage.
>So what's the point of using it?
It's doing the equivalent of looking up numerous reference books for you, presenting what it finds in a nicely formatted manner. Like every other computer system it's just automating processes that could be carried out -- indeed are carried out -- manually, and just as the output of any manual research process needs to be checked and rechecked by the author, collaborators and peer reviewers, the output of an automatic system needs the same sorts of checks.
There is a good case for not releasing this sort of thing to the public because like other automatic systems (Tesla's Autopilot springs to mind) they're far too easily abused by people who can't or won't use their judgment -- they expect magic, 100% accuracy and are going make a hell of a lot of noise if it doesn't deliver (and you can bet that if they won't read the disclaimers or even the manual then they're certainly not going to read the output critically).
>So what's the point of using it?
Where 'predict the next word based on last' is a good solution it works very well.
"Write me a BSD socket server in Go", or "write me an interface pattern framework in C++ and a Makefile", is a lot quicker and easier than looking it up in a textbook and typing it in accurately.
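For reference, this is the sort of boilerplate being described - a minimal TCP (BSD-socket style) echo server in Go. It is a hand-written sketch of what such a prompt might produce, not anyone's actual generated output:

```go
// Minimal TCP echo server: listens on port 8080 and echoes each line
// back to the client. The port and behaviour are arbitrary choices.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
)

func handle(conn net.Conn) {
	defer conn.Close()
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		fmt.Fprintf(conn, "echo: %s\n", scanner.Text())
	}
}

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go handle(conn) // one goroutine per connection
	}
}
```

Exactly the kind of thing that is tedious to type and easy to check by reading and compiling it - which is where code generation is least risky.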
better known as J.K. Rowling, is a British author and screenwriter. She is best known for writing the "Harry Potter" book series, which has become one of the most popular and successful literary franchises in history. The books have been translated into multiple languages and sold millions of copies worldwide. In addition to the Harry Potter series, J.K. Rowling has written adult fiction under her own name, as well as children's books such as "The Tales of Beedle the Bard," a collection of fairy tales in the Harry Potter universe.
Or at least that's what Phi3 believes.
I assume you're talking about trans access to single-gender spaces and the risks that entails.
In that case, the question is always something along the lines of "What about [cis] men who lie about their gender identity to access these spaces? We need to protect the rights of [cis] women by preventing this." Surely you can see that the issue isn't the trans people but the fact that some cis men can't be trusted.
It's always bugged me that this is a line of division between cis and trans women since it means both of us are unable to do normal things without fear because of a small subset of men.
>What about [cis] men who lie about their gender identity to access these spaces?
Indeed, and sexual orientation has to be considered. I propose that to spare public morals, lesbians should be required to use the men's showers at the swimming pool.
For balance, gentlemen of an 'Oscar Wildean' persuasion can use women's changing rooms at the gym. Where their finely toned abs, skin care and hair products won't cause distraction.
We have the current ludicrous situation that lots of women's rape victim support services are open to transwomen (i.e. men).
Unsurprisingly, women who have been raped by men are not too happy relating their horrendous abusive experience at the hands of a man in a group therapy chat where there is a man in a frock present.
JKR funds a women only rape help centre in Edinburgh - as she supports women's rights.
As for "the issue isn't trans people" - obviously we are talking low numbers, and the figures are from 2019 (link below), but the percentages are interesting, so one could argue it is reasonable for women not to feel safe with men in women-only spaces, irrespective of how those men identify.
https://committees.parliament.uk/writtenevidence/18973/pdf/#:~:text=MOJ%20stats%20show%2076%20of,and%2010%20for%20attempted%20rape.
Comparisons of official MOJ statistics from March / April 2019 (most recent official count of transgender prisoners):
76 sex offenders out of 129 transwomen = 58.9%
125 sex offenders out of 3812 women in prison = 3.3%
13234 sex offenders out of 78781 men in prison = 16.8%
e.g. I doubt a female prisoner would want to share a cell with "Isla Bryson", as an example - check out (if you really must!) some of the images online with "her" wearing leggings & a particularly non-womanly bulge on display.
At a fundamental level, to describe trans women as men is patently false. To disagree with that is to deny the scientific consensus on the matter.
It is true that trans women are (usually) male, but they are never men.
It is also disingenuous to describe a trans woman as "a man in a frock". You might be one of the "We can always tell" crowd, but that is fundamentally a case of selection bias.
Your argument also collapses once trans men are brought into the conversation. A group therapy chat that only allows females must therefore allow trans men in - many of whom pass as men and who would probably cause more discomfort to any traumatized women in the room. Should these trans men go to men's rape therapy groups?
The final thing on this is that I do wish to remind you that the concern is about cis men who pretend to be transgender. I do not know any of the 129 trans women in prison, but I would be curious how many of them are genuinely trans. I do, however, know a few people that claim to be trans but I am suspicious of whether they really are (but this includes a trans man).
Cry harder, she's entitled to her views, even if they don't align with yours. Not everyone in society wants other folks' sexual preference rammed down their throat. Some of us are just happy with who we are without needing to scream it from the rooftops. #wokemuch
The noyb spokesperson said: "We would expect the sanction to be 'effective, proportionate and dissuasive'."
Within the EU maybe, but here in the UK (while we still have a semblance of the GDPR) sanctions are typically wimpish, and, at least in the case of the 'big players', financial penalties seem to get negotiated down to coffee money anyway. There's no clear evidence of any of the major data slurpers having changed their behaviours as the result of UK sanctions.
"We have full fat GDPR"
Actually, in the UK we don't, for two reasons:
[1] the derogations have permitted the UK to diverge in some respects from the canonical GDPR (hence the 'UK GDPR');
[2] the UK regulator's interpretation (and thus its enforcement) is exceptionally lax.
The question becomes: how can a person who avoided STEM and slid through schooling to become a lawyer make an informed and rational judgement about what is and what is not possible? This has been a recurring theme throughout these forums.
Yep, if the EU tries to enforce something against OpenAI they'll just block ChatGPT from the entire EU. That's an easy decision to make when they are still in the "pre profit" stage of the whole endeavor.
So Europe will just have the "AI revolution" pass them by due to their silly laws. It isn't like running the US companies out will open things up for companies based in the EU - they would suffer the same issues so no one would spring up to take their place.
The way they work makes it effectively impossible to correct them without retraining. You can't just tell it "heyrick's birthday is Jan 2 1980 not Jan 31 1944" and have it accept that new information and with 100% reliability never make that mistake again. That's just not possible with an LLM. You can't censor a single source of information (even if you can figure out where it came from - and it may have come from nowhere since LLMs sometimes make shit up) like you can with Google Search by deleting a few bytes from a database either.
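To put the contrast in concrete terms, here is a toy sketch (Go, hypothetical names, reusing the made-up "heyrick" example above). In a conventional store the wrong value is one addressable record, so the fix is a single statement; a trained LLM has no equivalent entry to point at:

```go
// Toy illustration: correcting a wrong fact in a conventional store
// versus in a trained model.
package main

import "fmt"

func main() {
	// A wrong fact in a key-value store is one addressable record...
	dob := map[string]string{
		"heyrick": "1944-01-31", // the wrong value the model keeps emitting
	}
	// ...so the correction is a single statement:
	dob["heyrick"] = "1980-01-02"
	fmt.Println(dob["heyrick"])
	// A trained LLM has no such addressable entry: the association is smeared
	// across the weights, so the options are retraining, fine-tuning and
	// hoping, or bolting a filter onto the output.
}
```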
So if you impose these sort of requirements you just won't have current AI technology available in the EU because your law makes the penalty for violations too high for big companies to take that risk when they are losing money on the whole thing in the near term and probably longer as it is. If they were making billions in the EU they'd have more to lose and wouldn't want to leave, but they are making less than zero profit in the EU from AI today so there is absolutely no way they would stay if they stand to owe billions in fines for something they know they can't prevent.
Now sure there's more than one way to skin a cat, and there are countless undiscovered possibilities for how to implement AI comparable to or better than current technology that WOULD allow making such corrections. So maybe it spurs more of that research in the EU, while the rest of world moves forward with what we've got now until something better comes along. If you get to "something better" first maybe it is worth it, if not you doom your economy to falling behind the rest of the world.
>if not you doom your economy to falling behind the rest of the world.
Maybe, maybe not. The efficiency of market capitalism is highly dependent on all actors having good quality information. Preventing actors from outright lying is highly beneficial to the economy overall, even if it hurts those actors.
IOW, by preventing LLM makers from trying to sell hype as fact, the EU just might dodge a highly harmful LLM bubble.
> , if not you doom your economy to falling behind the rest of the world.
Your first paragraph was spot on. Well done.
Then it started going downhill, gathering speed until it went totally off the rails with the last few words, as quoted above.
You do realise that your whole comment comes down to "our economy is reliant on something we *know* is uncontrollable, inaccurate and not correctable": if that is true, think again about whose economy is at greatest risk.
Just because AI has issues doesn't mean it doesn't fill a useful role. Like I said, call center staffers have the same issues, so replacing them with AI doesn't make things worse - unless you regulate it so hard you effectively prevent its use when you don't apply similar standards to human call center workers.
Hmm, never before come across anyone implying that not having call centre staff would doom an economy.
Wonder what the real figures for the value of call centres actually is (my Google-fu isn't up to the task this evening - found a 2003 figure from Yale, giving India a $2 billion per annum figure, which is probably(!) out of date, and a bunch of infographics from random "web analyst sites" that are too flashy to inspire confidence in their accuracy).
It is probably one of those startlingly large numbers that is frightening to consider and which really puts the pittance paid to NASA into perspective.
But back to your point:
> Just because AI has issues doesn't mean it doesn't fill a useful role
Indeed. It is probably true that ChatGPT (which is what we are discussing, not all of AI, btw) may have a role it could fill and be useful (to the users, that is - it is clearly already "useful" at making large sums of money flow through the fingers of OpenAI et al).
Just - do you think you could possibly bring yourself to actually describe that role to us? And explain to us why that is useful to us, as users? Why we should support it? Pretty please?
PS
I don't accept your claim that *ALL* call centre staff have the same "issues" as ChatGPT.
Some[1] are bad, of course (May I show you a graph? Note this bell-shaped curve...).
And the worst ones - where they have been given a script that is downright malicious (e.g. scam call centres) - those centres are breaking regulations, yet the industry as a whole would survive even if we properly enforced those regs and shut them down. Actually, the industry would get a better rep with the public and improve if those bad players *were* shut down, so they ought to be calling for those regs to be tightened.
[1] yes, yes, we all have tales of that one call centre person - or even that one entire call centre (ref scams etc above) that was a horrible, horrible experience. But unless you have actually been keeping track of the stats and can genuinely show they are *ALL* suffering these same "issues" then those tales are nice (!?) anecdotes but not evidence.
I suspect the number of downvotes was due to the suggestion that a birthdate is irrelevant minutiae - it isn't. A person's birthdate is personal information and so is covered by GDPR. OpenAI are allowed to hold that personal information (and comply with their GDPR obligations of course), but if the person in question requests it to be corrected or deleted, OpenAI must do so.
OpenAI can't challenge that on the basis of any belief that such information is irrelevant or even in the public domain. Information being in the public domain does not excuse anyone from obligations under GDPR. Being public domain does not mean information is not personal and hence covered by GDPR.
I'm speculating now, but OpenAI might claim that chatGPT doesn't store that information, it just goes and finds it every time it wants it. I suspect that isn't the case though. Similar to the challenges Google had years ago when it was clear they were trawling the web and storing search results preemptively, not finding them on the fly whenever requested.
> chatGPT doesn't store that information, it just goes and finds it every time it wants it.
ChatGPT qua ChatGPT doesn't go and find anything.
A search engine that was front-ended by (something like) ChatGPT *could* go and find it, but
> I suspect that isn't the case though
Good suspicion instincts.
> Similar to the challenges Google had years ago when it was clear they were trawling the web and storing search results preemptively, not finding them on the fly whenever requested.
Ummm, only sort of. Pre-storing (or just re-using) genuine search results still produces (produced) correct search results. They were just perhaps not as up to date as they could be, but they were still correct. And it had a really useful effect: Google used to provide you with a link to their cached copy, which was *really* useful (repetition for stress): you got faster access to a page on a slow or down server (good when it was DDoSed or, as we called it, Slashdotted) *and* you could read embarrassing material for a few days after it was retracted. Oopsie - "Right to be forgotten" (which is a Good Thing, don't misunderstand me) put paid to that, as it is just simpler to drop the whole idea than to correct piecemeal and face continually doing that for all eternity. And we lose the nice bits as well as the bad, hey ho.
The problem with the "information" in ChatGPT is that it isn't held in a nicely cached copy of the original data, so you can't just delete that page (or just stop caching completely). So Google was faced with a problem, but one that had a very clear solution (although it was resisted because that solution has a day to day increase in costs). ChatGPT does not have a very clear solution at all. So the challenge for ChatGPT is at a far more existential level than it was for Google (no matter what Google claimed).
It is acknowledged that once something, an item of truth or untruth, is out in the wild on the internet, it is there till the big heat death. The AI systems pry faster than an individual can, and do it 24/7. If your personal stuff is personal, keep it off the net. Anything that does get on, obscure it and obstruct any linkages to it. Do not volunteer anything unless it's pure fiction, and then make the fiction unusable. It is up to YOU to make your information worthless, not some government clerk. This is not new: the STASI used file cards, just like the bunch before them in the great reich [sic]. Probably this problem dates back to Oog.
AI revolution? You mean the “selling glorified, flawed search technology” that will increase laziness and stupidity tenfold?
You can keep it mate, give me a call when you want my team to come fix all the stuff you broke using it because you haven’t the first clue what it’s actually doing….
So Europe will just have the "AI revolution" pass them by due to their silly laws.
Ah yes, those silly laws.
It's astonishing how when "copyright infringement" meant teenagers torrenting CDs (or even just format-shifting for personal use), this was "the end of music" and Disney was justified in trying to sue kids for £000 per infringement.
But now it's Silicon Valley ripping the entirety of the internet (copyright or nay), nobody is interested. "Ah well, you can't stand in the way of progress".
Bizarre considering a handful of well-heeled AI firms make a much more straightforward litigation target than a multi-year effort to have politicians pass laws so you can get IP and subscriber data from ISPs to sue some kid for money they (and their parents) don't have.
There are probably valid use cases for LLMs, but if you're going to shove it into a search engine (Google, Bing, etc) as an assistant and pretend that it's a search tool then it had better be accurate. You cannot (ethically) shove an LLM into a search interface, have it generate "summarised" information and then say "Oh, but it's only supposed to write fiction. Don't expect it to be accurate".
What bloody use is a summariser bot that hallucinates the summary? That's no use at all - then I have to go and check if the summary is correct. Which means doing the research I would have had to do in the first place.
"I'm not sure if OpenAI can just do a hit and run like that?"
Given their attitude towards copyright, I would imagine they'd just slam the door and say "you can't access any of your personal data from the EU, so go away puny human".
That's not how it works, but if a company like Apple seems to have a hard time grasping this, I can't imagine anybody else would fare better.
> Not good for the next technical revolution
What? It is going to be absolutely fantastic for the next technical revolution, whatever that turns out to be, if the EU (and hopefully UK, but...) aren't wasting their time and money on all this undirected and uncontrolled LLM twaddle.
Oh, wait, you thought this OpenAI stuff *is* the next revolution? Bwaaa ha ha!
No, no, this "fling it all in a bucket and call it an LLM" approach isn't working and won't. Well, it is "working" but only so far as any Ponzi scheme does: the last one to buy into it will lose, big. But that isn't tech working.
They could try creating a system that is useful (hint: as this story shows, don't treat the contents of the LLM as if it contains factual data; there is a better way to store data...) but that takes more subtlety than those people are capable of putting in.
There has been research in AI for at least 50 years, and this is the first time it has actually produced something useful enough that businesses are able to use it. Does it always give correct information or conduct itself "appropriately"? No, but neither do low paid offshore call center workers but that hasn't stopped them from replacing higher paid better trained customer service people over the past 20 years or so (and they weren't perfect either, just less imperfect than the offshore people)
All the money being poured into AI will happen in places that can actually deploy it without fear of massive fines. If you're a VC who believes LLMs aren't good enough and we can do better, are you going to invest your billions into building something in the EU where if it gets something wrong that can't be easily and quickly corrected you open yourself up to massive liability? You'll go where all the potential employees are, which won't be in the EU because they all left when the threat of EU fines chased OpenAI, Facebook, Google, et al away. The iPhone or Android you buy might have AI capabilities disabled in the EU, so people who want to use it despite the flaws won't even have the choice.
You seem to think that blocking LLMs will clear the way for what comes next, but what if what comes next takes a decade to appear? The progress on AI has been pretty slow over time, with a leap every decade or two, and if current AI technology is essentially blocked in the EU what makes you think that the next breakthrough will happen there? Most of the research would be happening elsewhere where it can be monetized, since what comes next might still not be compliant with current EU law.
"All the money being poured into AI will happen in places that can actually deploy it without fear of massive fines."
You keep saying that as if it's a bad thing. It's not.
Many of us have based our careers on trying to do as good a job as possible. Getting things right is something on which we place value. This is a means of doing some semblance of a job irrespective of whether it's good or not. If you don't value getting things right it makes us wonder ... do you, by any chance, work in marketing?
"Many of us have based our careers on trying to do as good a job as possible. Getting things right is something on which we place value"
I am one of those people - always tried to do as good a job as I could and get it right. Over the years I have learnt that this is the wrong approach. I wish I had learnt a lot earlier that the most important thing is to do a job as well as it needs doing, and get something "right enough". Of course, the definition of "right enough" is hard to find and could in itself be very long winded and recursive depending on what you actually want to achieve. E.g. consider writing some code: do you just do the minimum to achieve what your boss has asked, or do much testing and documenting so it's easy to change maybe in future, make it shiny and slick so it impresses and you can show off your skills (you want promotion), make it unnecessarily slick and shiny so you get that practise in GUI design you've always wanted to try, just do what you want to get some job satisfaction, etc. etc.
You know, I have a crap job with crap pay [*] and it would be so easy to do "the least possible to avoid getting fired" (note, I live in France, workers have some protections).
I still try to do a good job as well as I can. Why? My cow-orkers won't notice, my boss won't notice, and the other slacker will just take advantage. But d'you know who does notice? Me. I notice. It matters to me.
* - This is mostly a life choice, once I clock out my time is my own and I don't need to think about work at all until I clock in again. There's very little stress and quite a bit of peace, plus I've found myself a place to live where there's nothing around but fields. That tranquility means more than a better paid job that comes with baggage.
What does an LLM bring to the table now? Not in the future, but now. At present they steal as much data as possible and then produce erroneous results.
Oh and you cannot correct the erroneous results, just suppress them for specific prompts. They are in violation of data protection, copyright and other laws in their collection and output of data.
LLMs offer nothing, they are worse than the worst employee because even they can be replaced if they continually make mistakes instead of improving. Yet "AI" brings out the what ifs, and coulds and maybes... there is nothing concrete that they bring to businesses now, other than problems.
All the money being poured into AI will happen in places that can actually deploy it without fear of massive fines.
So like, North Korea or somewhere? The US doesn't have GDPR but publishers have been sharpening their blades and the copyright suits are being built. And they'll be ruinous when they land. Whether it's traditional publishers suing for the ingestion of english-language corpus, or the suit against GitHub et al for scraping everyone's code.
Even open source licenses generally require proper attribution - if LLMs regurgitate it without providing the appropriate license, they're screwed.
No, but neither do low paid offshore call center workers but that hasn't stopped them from replacing higher paid better trained customer service people over the past 20 years or so (and they weren't perfect either, just less imperfect than the offshore people)
They weren't always helpful. But as a general rule they didn't hallucinate entire policies either. And it's not just the EU saying that this is an issue - Canada seems to think that businesses are legally on the hook for anything their AI says. Business beware - let's face it, the US courts aren't likely to be any more lenient given that they are based on the same basic English law principles as Canadian law.
> You seem to think that blocking LLMs will clear the way for what comes next
How very kind of you to point out to me what I think.
Please compare your sentence to mine:
>> undirected and uncontrolled LLM
>> fling it all in a bucket and call it an LLM
Notice the subtle nuance I've applied to their development methods?
> "blocking LLMs"
Nope, try "using them appropriately" - and using them as databases is not appropriate.
> There has been research in AI for at least 50 years, and this is the first time it has actually produced something useful enough that businesses are able to use it.
Even your opening sentence is twaddle (came from ChatGPT did it?).
In that 50 years, AI groups have produced many useful results which many, many businesses have used, can use and do use. From things that you will dismiss as "that isn't AI" but are still what they developed, to daily use of neural nets (i.e. the tech LLMs are based on).
As the mantra goes, "If it works, it isn't AI" - tongue in cheek self-deprecation.
You don't think accuracy is necessary for the next technical revolution? What kind of a technical revolution doesn't care about the details?
Sounds more likely that the EU is pushing for a real revolution by blocking this half-baked bollocks.
The article says it: the geezer's date of birth isn't known to the LLM, so why is it a surprise when it comes back with the wrong one?
This thing isn't a database, no matter how much people want it to be. If you want accurate data spat out, including nulls, then it has to be attached to some kind of - well, let us call it a "database" - whose contents can be verified, updated and even erased in a demonstrable fashion.
People like OpenAI *could* have created a system that worked like that - and even used the same training concepts to have the machines populate the database, tagging the material with its origins[1] etc. But that isn't mystical, magical, revolutionary - i.e. it can't be done just by shovelling in data, buying more GPUs, pouring in data by the bucketload, buying more GPUs, pouring in data by the skipful, buying more cycles - no, it takes design and thinking, and runs the real risk of not getting a working database (all queries met with silence). Doing it the way they have, at each stage the system will always puke out *a* response that can be sold to the gullible, with the promise that it'll get more bamboozling - sorry, more complex and therefore good and "useful" - next year: money, please.
[1] you know, all the stuff that was being argued about in these El Reg forums last year, all those repetitious comments about "the LLM isn't storing the data verbatim" - "oh yes it is, otherwise how can it reproduce quotes?" - "because..."
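As a sketch of the "call it a database" approach described above: answers come only from verifiable records, each with a recorded source so it can be checked, updated or erased, and an absent record is reported as unknown rather than invented. All the names here (`verifiedDOB`, `dateOfBirth`) are invented for illustration:

```go
// Minimal verifiable-lookup sketch: facts live in an updatable store with
// sources; anything not on record is answered honestly as unknown.
package main

import "fmt"

type fact struct {
	value  string
	source string // where the value came from, so it can be checked or erased
}

var verifiedDOB = map[string]fact{
	"J.K. Rowling": {"1965-07-31", "publisher biography"},
}

func dateOfBirth(name string) string {
	if f, ok := verifiedDOB[name]; ok {
		return fmt.Sprintf("%s (source: %s)", f.value, f.source)
	}
	return "not on record" // the honest null, instead of a hallucinated date
}

func main() {
	fmt.Println(dateOfBirth("J.K. Rowling"))
	fmt.Println(dateOfBirth("heyrick"))
}
```

The point being that correction and erasure are trivial operations on such a store, which is precisely what GDPR compliance needs and what a bare LLM cannot offer.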
Yes, but making up a random date when you don't know the real date is not an "intelligent" response.
If for example you know when they attended university, you can make some sort of educated guess at the year of birth from that, based on the typical age of people attending that particular university, with a high degree of uncertainty, though they are more likely to be older than younger.
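For instance, assuming (hypothetically) that most undergraduates start at 18 or 19, the "educated guess" looks something like this - and, crucially, it labels itself as inferred, which is exactly what the LLM fails to do:

```go
// Toy educated-guess sketch: infer an approximate birth-year range from a
// university entry year, under the stated (assumed) typical starting age.
package main

import "fmt"

func estimateBirthYear(entryYear int) string {
	// Typical starting age 18-19; older entrants push the range earlier,
	// hence "more likely to be older than younger".
	return fmt.Sprintf("probably born around %d-%d (inferred, not a record)",
		entryYear-19, entryYear-18)
}

func main() {
	fmt.Println(estimateBirthYear(1984))
}
```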
>>is not an "intelligent" response.
Which is why LLMs (generative or otherwise) are NOT the same as general AIs.... unfortunately everyone and his dog (plus their goldfish) seems to think they are.
As far as I can see, an LLM's stock in trade is hallucination that sometimes coincides with the truth, possibly coincident with truth that hadn't been noticed yet by humans.
Attributing anything more than "really good Markov chain processors with added context" to their function will lead you swiftly down the road to madness, once you start believing what the hype wants you to believe. This means they are pretty good at programming (for given values of pretty good, obviously) but ask them for facts about something that wasn't covered in their training data and you are into cloud cuckoo land without warning or a safety net!
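For anyone who hasn't met one, here is a toy bigram Markov chain in Go. It is vastly simpler than a real LLM (which is the commenter's deliberately reductive characterisation, not a technical claim), but it shows the relevant property: fluent-looking output with no notion of whether any of it is true:

```go
// Toy bigram Markov chain: emits text by following observed word pairs,
// with no grounding in fact whatsoever.
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

func main() {
	corpus := "the model says the date is wrong and the model says the source is made up"
	words := strings.Fields(corpus)

	// Build the chain: each word maps to the words seen following it.
	next := map[string][]string{}
	for i := 0; i < len(words)-1; i++ {
		next[words[i]] = append(next[words[i]], words[i+1])
	}

	// Generate ten words starting from "the".
	w := "the"
	out := []string{w}
	for i := 0; i < 10; i++ {
		candidates := next[w]
		if len(candidates) == 0 {
			break
		}
		w = candidates[rand.Intn(len(candidates))]
		out = append(out, w)
	}
	fmt.Println(strings.Join(out, " "))
}
```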
"Yes, but making up a random date when you don't know the real date is not an "intelligent" response."
No, it's not an intelligent response. But it is a very human one.
Yes, a human might be more educated in their guessing. And they might warn you they're speculating. But imagine ChatGPT is actually Maelzel's Chess Player and the questions are being answered by the kind of people who are forced into jobs phoning up your Nan saying her computer has a virus. I'm sure they'll happily make up an answer and move on to the next question. In fact, it's such a common human strategy we have a word for it: they lie.
> Yes, but
Hmm... not sure that you are actually responding to (disagreeing with) anything that I've actually said, but skipping over that...
> making up a random date when you don't know the real date is not an "intelligent" response.
> If for example you know when they attended university, you can make some sort of educated guess at the year of birth from that, based on the typical age of people attending that particular university, with a high degree of uncertainty, though they are more likely to be older than younger.
Yes, but :-)
That line of argument is not relevant to using LLMs on their own. LLMs are not intelligent, they do not contain that sort of logical reasoning[1], so talking about how *you* might calculate a good approximation is - well, not relevant. NOW, there IS a line of AI that does allow a machine to do what you've just done: Expert Systems and all things related to them. Sadly, although those work nicely in small(ish) domains, we've not got a set of rules big enough to provide that chain of reasoning in *every* domain that ChatGPT is being applied to.
Although, if you re-read my comment:
>> let us call it a "database"
you can replace "database" with "XPS"...
[1] disclaimer - they *might* be able to generate a line of "generic reasoning in a really, really restricted area" just because it is a pattern they have spotted, but it won't be generalisable (i.e. it may be able to "do" simple multiplication in response to one prompt and totally fail the same arithmetic in response to another prompt!)
"The article says it: the geezer's date of birth isn't known to the LLM, so why is it a surprise when it comes back with the wrong one?"
The surprise is that it invents one AND that this is considered an appropriate response by its fans.
I'm not King of Databases, but AI is a database. It is mostly multi-dimensional vectors, with relational databases providing training and structured, authenticated data - your bank's fraud detection uses AI on a relational database of authenticated data. We all do this already and have done for quite a few years.
The date of birth error is fixable through the vectors. The main culprit is too broad a dataset, so many of the vector connections include links to non-subject matter, which causes so many hallucinations. The solution is offered by OpenAI in plugins. These are trained on specific subject matter that strips away many of the 'false' vector connections found in a broad dataset.
But you have pointed out the main problem with AI: lack of authenticated data. There is a ton of this data but it isn't usually publicly available and has no method to indicate authenticity. A system that assigned authenticity levels to a piece of data would help, as this fits in with what we have now.
If data can be given source-authenticity levels, the problem is fixed until the grubby side of human nature corrupts it. Some kind of blockchain/SSL on data.
your bank's fraud detection uses AI on a relational database of authenticated data. We all do this already and have done for quite a few years.
It uses Machine Learning to detect and flag suspicious patterns. Clever stuff, very useful but also requires human oversight.
Nobody is pretending that it is "intelligent". It's not "an AI". It's a tool to flag trends for human scrutiny. Nothing more or less.
And no, an LLM isn't a database as such. It's a model, trained off a database (or many databases). Which is the difficulty - you can't just dip into a row and update a value without retraining the model. Of course you can take a machine learning package and attach it to a database (as banks do), and as you say, we have issues around authenticated data. Not a problem for a small single-purpose ML service. Big issue for a generalised LLM.
>But you have pointed out the main problem with AI: lack of authenticated data.
No, it's not.
If you somehow came up with a large corpus of totally true text, tagged with sources, and you trained an LLM on that, it would still hallucinate content, and it would still hallucinate source attributions.
The main reason behind hallucinations is not the quality of source data. It's the fundamental principles on which LLMs are built.
Fraud detection systems aren't LLMs; they are ML systems. They have a very narrow domain, the narrower the better. LLMs are specifically designed to have an extremely wide domain, e.g. "text" or "images". The wider the better. That makes them diametrically opposed to ML systems, in one very important way.
> AI is a database. It is mostly multi-dimensional vectors, with relational databases providing training and structured, authenticated data - your bank's fraud detection uses AI on a relational database of authenticated data
Re-read your own comment - you got it right in the middle but then veered off course again:
> AI is a database
Nope.
> your bank's fraud detection uses AI on a relational database
Correct!
AI is *not* a database. But you *can* use AI[1] as a front-end to work *with* a database.
The DOB error is present - and incorrectable - because ChatGPT is not backed by a database and too many people are pretending that it *is* a database.
[1] or techniques derived from AI research, even if they have been so constrained that you are saying "that is too simple, it is not AI": remember, "If it works, it isn't AI".
> Some kind of blockchain/SSL on data.
AAAAAAAAAAAaaaaaaaaaarrrrrrrgggggghhhhhhhhhhhhhhhh!
Not the bleeping blockchain! Combining the two biggest wastes of electricity and compute into one! Ye Gods!
And can you imagine the arms race that that would create? "If we can get 51% of the blockchain that guards all known human knowledge, we can rewrite history!"
"Mr. President, we must not allow... a blockchain mining gap!"
"The subject then filed a request to have the incorrect data erased"
How can you erase something that doesn't exist? We are told that the requested DOB is not on the net, and is therefore not in chatGPT's training data and so cannot be removed from or corrected there.
You could probably argue that whatever chatGPT hallucinated in order to respond to the query wasn't a date of birth either, whether correct or not - it was merely a randomly generated text string. Be that as it may, however, that text string was transmitted in response to a query. Is that query and response logged somewhere at openAI? If so, maybe that could be erased or corrected, though I'm not sure that it would be helpful or necessary. But if it isn't logged, the response is effectively transient and doesn't exist beyond the query/response in which it was generated.
However, it seems that NOYB has retained a copy of that response, has processed the data complained of, and continues to do so for the purpose of making a fuss about nothing. Will they be erasing or correcting those copies? I doubt it.
By all means allow - and, indeed, require - that inaccurate data in the training set be deleted or corrected. But to suggest that chatGPT is producing personal data from thin air, whether accurate or not, is ludicrous.
In fact it didn't so much write a fake obituary as "quote" one that didn't exist complete with a link to the alleged obituary and also threw in a few other sources, including elReg. It has been trained on source material which includes links to the Grauniad. It has probably been trained on material that quotes obituaries. As a pastiche generator it can make this look convincing to anyone who doesn't know the facts and in particular, to anyone who doesn't check all links and references.
Would it be possible to extend such a system to check the existence of the references it generates, and apply feedback to correct whatever data it was holding to stop it generating the nonsense for which it invented the reference, and/or train itself to stop inventing references that aren't in the training material?
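The first half of that idea - checking whether a generated reference actually resolves - is easy enough to sketch; feeding the result back to stop the model inventing references in the first place is the hard, unsolved part. A purely hypothetical example (the URL is made up):

```go
// Sketch of a reference-existence check: issue a HEAD request against a
// URL the model has cited and flag it if it does not resolve.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func referenceExists(url string) bool {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Head(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode < 400
}

func main() {
	// A reference the model might invent; hypothetical example.
	fmt.Println(referenceExists("https://www.theguardian.com/obituaries/made-up-obituary"))
}
```

Even this only tells you the link exists, not that it says what the model claims it says - that check still needs a human, or yet another system.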
> While a made up DoB has little legal impact, other fake facts could be.
Absolutely nothing that comes out of ChatGPT should have any legal impact at all.
Sadly, it probably will - and that is a situation entirely created by the hyping of ChatGPT, the misapplication of the technology, not the technology itself[1].
[1] although you can argue that, given the state of the thing and the altercations it is causing, there was no need to create the technology, just leave it as an interesting bit of CS without a practical application - yet.
> How can you erase something that doesn't exist? We are told that the requested DOB is not on the net, and is therefore not in chatGPT's training data and so cannot be removed from or corrected there.
> But to suggest that chatGPT is producing personal data from thin air, whether accurate or not, is ludicrous.
Those are the type of statements that are both totally correct and at the same time painfully at odds with the real world - or, to be more accurate, the presentation of ChatGPT to that real world.
The problem is simply that ChatGPT is being presented as something that you *can* query for usable results and it is freely available for everybody to use. Which is being responded to as though all of the responses it generates are being published (no matter that they are, as you point out, highly likely to be transient and simply rephrasing the prompt could generate an entirely different result).
The sad result is that we can safely publish the statement "When I asked him, my three-year-old son said that Fred was born on ..." and it is a silly anecdote that no sensible person would object to, even if it turned out to be correct, but publishing "When I asked ChatGPT..." is treated differently. At the insistence of all the people who have invested in these LLMs.
So, the bottom line is that you have been completely correct in your statements about the tech, but the people who are hyping that very same tech have created the situation for themselves where they do have to comply with GDPR et al; no matter how ludicrous that situation is when viewed from a techie's side, it is the one that they created for themselves.
PS
The stuff about logging is entirely a red-herring; you can - and should - keep logs of when inaccurate data is generated. Keeping those logs is not against GDPR or the like, whether it is NOYB or OpenAI or The Register or any who grabbed a screenshot of the story who are keeping them. Snipping out the inaccurate data and publishing that on its own, without the context or any other disclaimer that it is inaccurate, would, however, not be a good idea. It is a bit of a shame you included that.
My thoughts on the subject entirely. As far as I can make out, the only thing in violation of GDPR here are NOYB themselves, for retaining and publicising this inaccurate information. I hope that poetic justice prevails and they get done for this.
This is what you get when you apply Stalinist legislative cockwomblery to a situation that its writers never even knew was even likely to exist when they originally drafted it.
https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-protection-principles/a-guide-to-the-data-protection-principles/the-principles/accuracy/
Some of the bullet points are:
You should take all reasonable steps to ensure the personal data you hold is not incorrect or misleading as to any matter of fact.
If you discover that personal data is incorrect or misleading, you must take reasonable steps to correct or erase it as soon as possible.
You must carefully consider any challenges to the accuracy of personal data.