Funnily enough, AI models must follow privacy law – including right to be forgotten

In order to comply with data protection regimes, AI chatbots and associated machine learning applications will have to be capable of forgetting what they've learned. It's not yet evident they can handle that requirement. Researchers affiliated with Australia's National Science Agency (CSIRO’s Data61), and Australian National …

  1. stiine Silver badge

    Question: what about my memory

    Or rather, someone else's memory? I went to college with a guy who had eidetic memory. He could flip through a book one time, then recite it to you. How does the GDPR deal with this?

    p.s. he never bought college textbooks. He'd borrow each one during the first class, read it from cover to cover, and give it back in the time it took to flip the pages.

    1. Paul Crawford Silver badge

      Re: Question: what about my memory

      How does the GDPR deal with this?

      It does not. He is a person, not a public-access search tool.

      1. that one in the corner Silver badge

        Re: Question: what about my memory

        > He is a person, not a public-access search tool.

        What if he was both? For example, if he'd signed up as a Mechanical Turk and was automatically passed any query that had otherwise failed?

        I know - rather unlikely, who would ever do that, etc. - but similar scenarios have been discussed as a midway point between a totally man-made knowledge base (expensive, slow) and one created the way the current crop of LLMs are doing it (ingestion of everything in sight, as fast as possible, and fingers crossed something usable comes out of it).

        1. Killfalcon
          Devil

          Re: Question: what about my memory

          In that case, he'd just have to memorise the list of people he's supposed to not give out the personal details of.

          1. that one in the corner Silver badge

            Re: Question: what about my memory

            Nice response.

            From my (old and probably horribly out of date) reading, not all 'eidetic' memories work in a way that would allow them to cross-reference a list like that with, say, what they are 'reading' off the memory of a book page[1], so having that list wouldn't really help. But then maybe those people wouldn't be suitable as a Turk in the first place, certainly not for a general-purpose search engine. So, yup, yah called me out on it.

            [1] there being a difference between asking "what was on that list?" versus asking "Is this name on that list?" - the latter can result in running through the list from top to bottom looking for the name[2]. What we would call one of the "idiot savant" behaviours in the old days.

      2. Helcat Silver badge

        Re: Question: what about my memory

        I'd say it does - but not in regard to his memories, nor even his recollection: it'd apply to how they're used.

        Because memories are essentially secure - they're encrypted and are wiped if the storage media is tampered with - the only concern with data collection would be if it's excessive (aka spying on someone, which is dealt with separately in the law).

        But writing that information down? That is where GDPR can step in and say 'delete'.

        Now have this image of a cyberman called GDPR...

        1. that one in the corner Silver badge

          Re: Question: what about my memory

          Not the same Universe as the Cybermen (nor that other Universe with the Cybermen) but they prompted the thought:

          How will these principles be applied once we have abandoned the thinking machines in favour of Mentats?

          It is quite clear that not all Mentats are honourable enough to apply the suggested "list of events you must not remember", and even those that are honourable would still be bound to recall everything in the name of Duty to their House: Fred was found innocent and the charges struck, but he may still be vulnerable to blackmail and hence remains a weak spot.

          1. Anonymous Coward
            Anonymous Coward

            Re: Question: what about my memory

            Sounds like you've been hitting the Spice pretty hard.

    2. Version 1.0 Silver badge
      Happy

      Re: Question: what about my memory

      I knew a guy like that - he'd just flip through the pages and could effectively "read" them back in his brain when he took the exam. He passed every exam with 100%, so everyone thought he was super smart.

      I had another friend who could not read words at all, but you could call out about a hundred amounts - e.g. four and six pence, a quid, eleven and a half shillings, threepence halfpenny, ten bob, a guinea, half a crown, a couple of tanners, etc. - and he would instantly tell you the total.

    3. 0xrsb

      Re: Question: what about my memory

      I have to laugh, because you basically just asked how we regulate Mentats from Dune.

      1. stiine Silver badge

        Re: Question: what about my memory

        Do you have any suggestions (helpfully ignoring all of the treaties signed immediately after World War II)?

        Other than scale and accessibility, what's the difference between Google Search and a person with this ability standing on a street corner answering questions? Does GDPR require that he not answer embarrassing questions about you if you tell him not to?

        1. Killfalcon

          Re: Question: what about my memory

          GDPR probably doesn't apply to a dude on a street corner, but slander, harassment, public nuisance, fraud, conspiracy, etc already do, depending on what said dude is saying.

          Scale does change both the impact and the legal details.

    4. Efer Brick
      Thumb Up

      Re: Question: what about my memory

      You're way up the bell end curve with that anecdote

      1. stiine Silver badge

        Re: Question: what about my memory

        Is that good? Or bad?

        A Mentat... I had never thought of him in that light...

  2. Pascal Monett Silver badge

    "Eliminating hallucination from LLMs is still impossible now"

    C3PO is starting to look positively sophisticated.

  3. NeilPost

    7 Data Protection Principles

    It’s not just the right to be forgotten, it’s the Data Protection Principles (and also other legislation, like HIPAA).

    In case anyone has forgotten their mandatory training - or skipped the video …

    GDPR sets out seven key principles:

    Lawfulness, fairness and transparency

    Purpose limitation

    Data minimisation

    Accuracy

    Storage limitation

    Integrity and confidentiality (security)

    Accountability

    These principles should lie at the heart of your approach to processing personal data.

    OpenAI … I’m looking at you here. Data-masking or guardrails on output are not lawful compliance - you must remove the data. The law overrides ‘technical limitations’. Like it, or not.

    1. JohnMurray

      Re: 7 Data Protection Principles

      Maybe, just maybe, that 'hard-to-forget' is a design feature... after all, nothing I have seen makes me optimistic that Big Tech really wants to comply with data protection - voluntarily.

      1. NeilPost

        Re: 7 Data Protection Principles

        The only way is with strong regulation and effective fines. The UK Information Commissioner’s Office has shown itself to be impotent and completely ineffectual.

        See British Airways, who got a £183m fine reduced to £20m by belly-aching about having no money and COVID.

        They should have been hit with a deal of 10% of future profitability for 20 years in exchange… not a nigh-on 90% fine discount. I did not get a pass/discount on money owed during Covid… just a deferral.

        1. NeilPost

          Re: 7 Data Protection Principles

          https://www.itgovernance.co.uk/blog/british-airways-faces-sky-high-183-million-gdpr-fine#:~:text=More%20than%20two%20years%20after,a%20£20%20million%20fine.

          The BA £183m-to-£20m scandal, if anyone's interested… and this is before you talk about AI. A straightforward data breach, squarely in scope of GDPR.

          BA execs/lawyers must have been pissing themselves laughing. Doubly so as they still had full paying jobs and were not on furlough… and were about to be fucked over on a fire-and-rehire too - unlike the aircrew and other minions.

        2. Cynical Pie

          Re: 7 Data Protection Principles

          The BA Monetary Penalty Notice (it is not, and never has been, a fine) was reduced in part because of Covid and the collapse of the travel industry, and in part because, for some daft reason, the ICO sees fit to say what size of MPN it might issue even before it has been through the statutory process.

          The suggested MPN was based on a percentage of their turnover at the point it was announced. By the time the ICO had gone through the statutory process, the backside had fallen out of the airline business and BA's turnover was much, much smaller - ergo their MPN was smaller.

          That said, the current incumbent IC, John Edwards, is a charlatan, more interested in soundbites and cosying up to the Government and big business than in actually enforcing the law he is paid handsomely to oversee.

      2. dajames

        Re: 7 Data Protection Principles

        Maybe, just maybe, that 'hard-to-forget' is a design feature....

        I don't know ... methinks the people doing this so-called AI research find it hard enough to get machines to learn at all. I don't think unlearning is even on their radar, let alone something they've deliberately made hard.

        1. that one in the corner Silver badge

          Re: 7 Data Protection Principles

          Real researchers, as a group if not any specific individual, are interested in unlearning (or, at least, they were) - admittedly for correcting erroneous data, including on-the-fly recalibration and non-monotonic reasoning, rather than for following the "right to be forgotten".

          These guys doing the hype for the money? Nah, they don't give a damn. Don't think they even care if their systems grind to a halt in a year, due to regulation or anything else. Just so long as they can cash out in time.

          Bard will just be another Google Beta that vanishes and so on.

          1. Zippy´s Sausage Factory

            Re: 7 Data Protection Principles

            Yep. I'm wondering how much longer OpenAI will be around. Probably it'll declare bankruptcy and sell all its assets to Microsoft... which sounds like a worse outcome to me.

    2. Lee D Silver badge

      Re: 7 Data Protection Principles

      "Purpose limitation" is the killer.

      Was I reasonably notified, explicitly, that they were going to use my data to train an AI system that anyone in the world can query?

      Because if not, it really doesn't matter about "right to be forgotten"... they shouldn't be having the data in that system in the first place.

      1. NeilPost

        Re: 7 Data Protection Principles

        It will probably weasel its way into legalese T&Cs.

        See Apple’s infamous T&Cs as a case in point: pages and pages of busywork legal obfuscation…

    3. Helcat Silver badge

      Re: 7 Data Protection Principles

      Safest approach is to simply not store the personal information in the first place.

      Oh, there'd be exceptions: public figures, in particular politicians; authors' names in relation to their work. Basically, GDPR allows for that kind of data collection. But not Mr Brown and his tweet about how unfair his council tax is - that can be anonymised.

    4. Cynical Pie

      Re: 7 Data Protection Principles

      Being pedantic, but Accountability isn't a DP principle.

      It's Art 5(2) of GDPR, which makes it a requirement for companies to be able to demonstrate compliance with Art 5(1) of GDPR, which contains the six principles.

    5. jmch Silver badge

      Re: 7 Data Protection Principles

      "The law overrides ‘technical limitations’. Like it, or not."

      That depends on whether the limitation is really 'technical', in the sense that there is a theoretical way to do it differently, or whether it's a fundamental limitation (e.g. you cannot implement an encryption backdoor that only the 'good guys' can use). With LLMs I get the impression that there is a fundamental sense in which even the designers do not understand how the training data is mashed together during training, nor can they identify a single item of data within the trained model. Therefore the only way for LLMs to comply with the right to be forgotten is to remove the data from the training sets.

      That means that either all previously-trained LLMs have to be retrained at very frequent intervals*, or they have to be fundamentally redesigned so that the trained model can be searched for particular data to be stripped. I'm not sure the latter is even theoretically possible with the current type of LLMs.
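
      To be concrete about the only remaining lever: the erasure step itself is trivial - it's the retrain from scratch afterwards that hurts. A toy sketch (all names here are hypothetical, and naive substring matching would never satisfy a regulator):

          # Toy sketch of "right to be forgotten" via corpus filtering plus retrain.
          # Substring matching stands in for real PII detection, which is far harder.
          def scrub_corpus(documents, erasure_requests):
              """Drop every document mentioning any item under an erasure request."""
              return [doc for doc in documents
                      if not any(pii in doc for pii in erasure_requests)]

          corpus = [
              "Mr Brown tweeted that his council tax is unfair.",
              "An 18th-century novel contains no live PII.",
          ]
          clean = scrub_corpus(corpus, {"Mr Brown"})
          # model = train_from_scratch(clean)  # hypothetical - and ruinously expensive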

      Then we are also forgetting the hallucination bit - even if all my personal data has been stripped from the LLMs, it is possible that one hallucinates information about a person who happens to share my name. There's nothing at the moment that can protect against that.

      *This is probably desirable anyway, to keep the LLMs up to date with 'current events'

      1. Ken G Silver badge

        Re: 7 Data Protection Principles

        Yes, like that journalist whose obituary was referenced.

        Even if all real information is removed, they can make up fake information at any level of detail.

  4. Neil Barnes Silver badge
    Stop

    a clear disconnect here between law and technical reality

    Then reality is clearly in error.

    If the mechanisms in use do not comply with the rules applied to them, then they should not be used until they do (or some well-meaning idiot changes the rules).

    1. Dan 55 Silver badge

      Re: a clear disconnect here between law and technical reality

      That is not disruptive. Disruptive is doing whatever you want and lobbying for it; if you lobby enough, you won't have a problem finding well-meaning idiots.

    2. jonathan keith

      Re: a clear disconnect here between law and technical reality

      For 'well-meaning', read 'bought and paid for'.

    3. that one in the corner Silver badge

      Certify Your Corpus

      > If the mechanisms in use do not comply with the rules applied to them, then they should not be used until they do.

      Totally agree.

      However (oh no, here it comes, another rambling post!):

      Can we make a clear distinction about what "the mechanisms" are, and control each appropriately? Just 'cos there is the risk of over-reaction: not separating the players from the game, throwing the baby out with the bathwater, and creating another "AI Winter".

      Please, still put the boot into OpenAI, Bard and that other one - they've deliberately[1] pissed on GDPR etc (just group them all together as "privacy" for the moment). But, despite their own hype, they aren't the be-all and end-all.

      Now, as with every other system, we've got data collection, data storage, data transformation and data retrieval happening. In (the current crop of) LLMs the first is creating a training corpus, the second the actual training run, the remaining two are munged together when the model is used. Trivially, to comply with privacy, you have precisely two choices: don't feed your system with vulnerable data in the first place or make sure that the stored data can be found and eradicated as required (and without trashing your system as a result[2]). We want to be sure that The Rules reflect the two options (or at least The Procedures Required to comply with The Rules).

      The (current) LLMs are, by their very nature, incapable of the second option: some of them have proven to be tweakable (via the Rank-One Model Editing (ROME) algorithm) but that isn't anything to rely on. The current Reg article notes some alternate ways to structure the models that will help, but we aren't there yet (which is why the risk of another AI Winter is a concern, as that'll shut down the bulk of research into said structuring).

      So, right now, applications of LLMs[3] can only be managed via the first option. Therefore:

      We need certification applied to the training data, and enforceable requirements on the systems that use corpora carrying particular certifications - plus a very large rolled-up newspaper applied to the existing suppliers of training data[4] to get them certified. The requirements would then be along the lines of:

      * All systems must identify the corpora used and their certification (this is the big change from the current situation)

      * No certificate? Can only be used in-house for limited purposes (e.g. research into option two; demos to PHBs that using this stuff will get them sued), no public access to the model, publicly released responses from it allowed only in research papers with full attribution of the corpus (enough to ensure the results can be replicated, only within another house)

      * Certified clean of infringing data (e.g. only uses 18th Century novels)[5]? No restrictions on use.

      * Known to contain a specific class of data (e.g. medical records)? Restricted access to identified classes of people, must detail the start and end dates of the data collected, where it was collected, intended use; a stated expiry date (set to comply with the relevant usage - e.g. European data expires within the "right to be forgotten" time) - at the end of the expiry on the corpus, it must be updated and re-certified, any models trained on it must be deleted and new ones trained from the re-certified corpus (and there is an opportunity for supplying automation to users of the models)

      * Variants of the "medical data" are required, for example data from a proper double-blind study will be accompanied by any appropriate releases by the members of the study and won't have an expiry date.

      * And so on[7]
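
      As a strawman of what a machine-readable certificate might contain (every field name below is my own invention, not any real standard):

          # Strawman corpus certificate - illustrative only, all fields invented.
          from dataclasses import dataclass
          from datetime import date
          from typing import Optional

          @dataclass
          class CorpusCertificate:
              corpus_id: str
              data_classes: list[str]    # e.g. ["medical-records"]; empty = certified clean
              collected_from: date       # start of the collection window
              collected_to: date         # end of the collection window
              expires: Optional[date]    # None = no expiry (e.g. a released study);
                                         # otherwise corpus and trained models die with it
              permitted_uses: list[str]  # e.g. ["in-house-research"] or ["unrestricted"]

          # "Certified clean of infringing data (e.g. only uses 18th Century novels)":
          cert = CorpusCertificate(
              corpus_id="c18-novels-v1",
              data_classes=[],
              collected_from=date(1700, 1, 1),
              collected_to=date(1799, 12, 31),
              expires=None,
              permitted_uses=["unrestricted"],
          )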

      [1] either it was deliberate or they were all lying through their teeth about how expert and knowledgeable their teams are - or both, of course.

      [2] if you just go around cutting bits out of the Net then it is very likely that you'll just increase the rate of hallucinations: if you pose a query to one of these models, you *will* get a reply out; if it can't synthesise something "sensible" because the highly-correlated paths have been broken then it'll just light up the less likely paths and bingo "Donald Trump rode a brontosaur in the Wars of the Roses" when we all know that it was Abraham Lincoln on a T-Rex in the Black Hawk War.

      [3] and they are going to be applied, however one feels about that, whilst there is the perception (founded or unfounded) that there is money to be made by doing so. Well, duh, but I wish applications were better thought out than that.

      [4] yes, no doubt the well-known names have done a lot of collecting (scraping) themselves, but they also pulled in pre-existing text corpora; if you are making your own LLM there are suppliers from which you can get a raw-data text corpus or a "pre-trained" model that has already been fed on "the general stuff" (and you are expected to continue training on domain-specific texts).

      [5] or any other more sensible[6] criteria that you feel will comply with the concept of "doesn't contain iffy data" or even "has a less than k% chance of containing iffy data" on the grounds that everything in real life has risks and we're looking at managing them.

      [6] unless you are doing linguistic research, in which case this is a perfectly sensible corpus

      [7] if I try to continue listing variants this will never get posted in time[8]

      [8] oi, rude!

      1. Orv Silver badge

        Re: Certify Your Corpus

        You talk about an AI Winter like it'd be a bad thing. So far AI doesn't seem to be good at anything except threatening people's jobs. It's not performing any function that we need and couldn't get any other way. This whole business is a bubble that needs to be punctured ASAP.

        1. that one in the corner Silver badge

          Re: Certify Your Corpus

          Yes, AI Winter is a bad thing - it is a drying up of funding into actual research and has sod all to do with all the over-hyped bollocks that is going on now, with "AIs" that are pretty damn useless (the risks from hallucinations, for starters, make the current deployments stupid).

          We want research into pretty much everything to continue at a steady pace, so that people can make a career out of it and actually get research done. Chopping off funding - or just the fear of doing that - causes research to slow down, even stop, and it becomes hard to pick it up again (e.g. if all the experienced tutors retire).

          AI research is a bit of a weird one: one of the old sayings was "if we've figured out how to do it, it isn't an AI question any more". In other words, even if you don't believe that "hard AI" is achievable, the spin-offs are worth having (spin-offs from many other domains are also highly valuable; it is just that, having realised you could make an MRI, you don't turn around and say "oh, that isn't nuclear physics anymore" [1]!)

          [1] oops, bad example. Even I've just used the current term "MRI" which does, indeed, drop the scary "nuclear" word!

  5. SonofRojBlake

    ""The Right to be Forgotten may very well be a well-intentioned regulatory protection, and many would argue that it is an important right to be protected. However, there is a clear disconnect here between law and technical reality.""

    "Technical reality" is a very slanted way of describing a tool someone(s) designed and built.

    This is NOT a case of "well the law says the land can't go more than 500 feet above the sea, but look, there's a mountain". It's a case of "the law says a building can't go more than 500 feet above this street, but we, y'know, built one anyway. Deal with it."

    The arrogance is staggering.

    1. Zippy´s Sausage Factory

      "However, there is a clear disconnect here between law and technical reality."

      That's a bit like inventing a new weapon that can kill someone, killing someone and then claiming that it's perfectly legal because the law as written didn't encompass the new reality of your new weapon.

      Laws don't work that way. AI has some uncomfortable learning (and unlearning) to do, whether the shyster snake-oil-salesmen in charge of it like it or not.

    2. Cxwf

      If I may, it’s closer to “the law says you can’t put anything more than 500 ft above street level, and I know you are trying to operate an airport here, but your airplanes are still covered by the law”. You have exactly one way to comply with this law, which is to shut down the airport. That’s better than having no rational way to comply (the mountain example), but only slightly.

      If society comes to the conclusion we’re better off without LLMs entirely, you won’t hear any complaints from me, but if you want to have them and regulate them, we will need to find a different approach.

      1. SonofRojBlake

        It's not really like that though is it? LLM operators could comply with the law in more than one way. Sure, they could simply shut up shop entirely, but the law doesn't require them to do that at all. It does require them to ensure that training data doesn't include [$certain data]. That's far from impossible, it's just time consuming and expensive, to which their response is "we can't possibly do that, it would affect our business", to which the correct response is "tough shit".

        1. Anonymous Coward
          Anonymous Coward

          " it would affect our business","

          Which, in plain English, means "we'd get less profit". No more, no less. Trampling people's privacy to get more profit is the name of the game here, again.

          "Too bad" would also do as a response.

  6. Jason Bloomberg Silver badge

    The day after the day before

    It's not just the right to be forgotten; there's a need to keep up with facts as they are.

    Fact yesterday may be that so-and-so was a convicted paedophile drug-dealing terrorist murderer. Fact today may be that they are entirely innocent and a victim of a horrendous miscarriage of justice.

    What's an AI trained only on yesterday's facts going to say about so-and-so?

    1. that one in the corner Silver badge

      Re: The day after the day before

      > What's an AI trained only on yesterday's facts going to say about so-and-so?

      This highlights an issue with the way these programs are *used* more than anything else (and, yes, with how they are being hyped, which is worse).

      I'm sure you were just going for the quick comment but, like so very many, you are saying "an AI is going to say about X" as though the intent of every AI is to be a glorified search engine over the entirety of published information, as though any one AI is just as good as another, and as though X can be anything under the sun.

      The 200-kilo gorillas (OpenAI et al) are trying their damnedest to promote that view[1], but we also know full well that they are the ones that will wriggle their way out of regulatory retaliation - it is anyone smaller, doing sensible, targeted, smaller-scope use of these models, who will be hit proportionately hardest, potentially brought to a standstill.

      But the affordable way to create a targeted model is to get a pre-trained model (buy it in or there are freebies out there) and then shovel in all the domain-specific data relevant to your usage.

      Immediately, the problem is obvious: the pre-trained model contains all sorts of guff. You only want it for its general ability to take in queries in natural language and synthesise results within your domain. So long as that is all the thing is used for, you only have to worry about updating your domain knowledge and issuing new models appropriate to that (which may well be "20 years" if the model is accompanying a piece of machinery that is expected to just keep on chuntering away, with the maintenance suggested by your AI assistant). But what happens when someone - maliciously or otherwise - presents a query outside of your domain and it tells you that Old Fred was, indeed, convicted as described, because the pre-training corpus dragged in a pile of court records in order to gain that clever-sounding verbose way of talking?[2] The obvious bandage is to gate queries before they ever reach the model - see the sketch below - but even that leaks.

      [1] mainly because it is the easiest route for them to take, sheer laziness: just suck up everything and if the pile of nadans gets too big just buy more processor/GPU/RAM power - i.e. throw money at it.

      [2] maybe the AI assistant is also writing letters to the owner of the machinery, advising them - in perfect legalese - that they need to pay for repairs 'cos they didn't do the maintenance properly and it ain't the manufacturer's fault, don't get any funny ideas.
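
      For what it's worth, a naive sketch of such a query gate - the keyword matching is a placeholder for whatever real in-domain scoring would be, and model.generate is a hypothetical API, not any particular library's:

          # Naive domain gate: out-of-domain queries never reach the model, so the
          # pre-training guff about Old Fred stays buried. Keyword matching is a
          # stand-in; real scope detection is much harder, and even then it would
          # not be watertight against a determined prodder.
          DOMAIN_TERMS = {"bearing", "lubricant", "torque", "service interval"}

          def in_domain(query: str) -> bool:
              q = query.lower()
              return any(term in q for term in DOMAIN_TERMS)

          def ask(model, query: str) -> str:
              if not in_domain(query):
                  return "I can only answer maintenance questions for this machine."
              return model.generate(query)  # hypothetical model API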

    2. Anonymous Coward
      Anonymous Coward

      Re: The day after the day before

      And it might have zero information about so-and-so but, following some leading questions, may "hallucinate" that they are a convicted paedophile drug-dealing terrorist murderer and generate convincing-looking links to paywalled newspapers to confirm that thought.

  7. J.G.Harston Silver badge

    Also, with copyright holders insisting that permission is needed to use their works, and likely to refuse such permission, AIs are going to end up getting trained on some very skewed data.

  8. Gordon 10 Silver badge

    Academic bollocks

    The principles in GDPR are nearly all qualified, including the so-called “right to be forgotten” - this article seems to skip that key point. There is no universal requirement for a processor to delete your PII.

    For a privacy breach to be shown, the following has to happen.

    Firstly, you’ll have to show there is a reasonable probability that an LLM was trained on your PII (if OpenAI says no, what are you gonna do?).

    Secondly, you’ll have to show it retains that information (can you show the LLM has retained memory that reliably returns your PII?).

    Thirdly, all along the way, OpenAI or whoever will be throwing ‘legitimate use’ and other justifications around like confetti.

    Fourthly, if you seriously want to go after an LLM producer, you’ll have far more luck going down the automated-processing angle…

    I can’t see this working anywhere except perhaps Germany.

  9. Tron Silver badge

    It won't stand up in court.

    Builders' regs change, but that doesn't mean all houses have to be knocked down and rebuilt whenever they do. Laws cannot be applied retrospectively, for just this reason. An AI can no more unlearn a component part of its 'education' than a human being can. You can block an AI from repeating specific text, but only by having a localised database of all the things it has to not say - which would be amusing when it was hacked. Rather like the government posting online a list of criminally offensive words, phrases and accusations that we cannot repeat.

    Not that anyone is going to be rolling out new tech in the EU anyway, or in the UK when the current bill passes. To misquote Orwell, if you want a picture of the future, imagine a Government lawyer stamping on a PC - for ever.

    1. Anonymous Coward
      Anonymous Coward

      Re: It won't stand up in court.

      But if GDPR predates the generative large language model in use, then it's not being applied retrospectively.

    2. SonofRojBlake

      Re: It won't stand up in court.

      "An AI can no more unlearn a component part of its 'education' than a human being can"

      The difference is that if a human learns something as part of their education that they're not supposed to know - that it's ILLEGAL for them to know - then there are certain ethical problems inherent in simply deleting that instance of Homo Sapiens and starting again from scratch with a new one. There is absolutely no ethical reason not to simply destroy the offending AI and retrain it on a compliant learning dataset... and to have to do that every time something non-compliant is found in that dataset. Oh, it's time consuming and expensive? Tough shit Mr. LLM operator, perhaps you'd prefer a job in a coffee shop.

    3. jmch Silver badge

      Re: It won't stand up in court.

      "Laws cannot be applied retrospectively, for just this reason. An AI can no more unlearn a component part of its 'education' than a human being can."

      Perfectly true, also completely irrelevant. GDPR came into force in 2018, and all the LLMs in question post-date that, and therefore have to comply with it.

    4. Anonymous Coward
      Anonymous Coward

      Re: It won't stand up in court.

      Gordon Brown in the UK retrospectively changed a tax law on trusts, increasing the tax from a very low percentage to 40%. Basically, by the time my mother's trust 'matures' it will be worth zero after the new taxes and administration charges. That screwed a lot of people in the UK.

  10. that one in the corner Silver badge

    Everyone says my data is covered by GDPR but is it?

    GDPR gets flung around a lot in these discussions, but then so does scraping of publicly available information, including stuff posted on FaceBook etc.

    My understanding is that GDPR and equivalents are concerned with what I do with data that I have asked you to supply, for some specific reason or other. For example, I ask for your email and an address when you join a local astronomy society because I am going to send you a paper newsletter every two months, with alerts on interesting observations via email. You agree to these uses, even ask to read my data retention policy (you get one more newsletter and emails for two weeks once your subscription ends, then we delete it) first. GDPR in force.

    You decide to put your email and home address up onto a public forum and I, along with every Tom, Dick and Harry, can read it. You have handed that over of your own free will, it hasn't been leaked from anywhere, no-one has breached GDPR to make it readable to all. Does the GDPR even get involved in that situation?

    If yes, can you give a citation, please. Seriously.

    If no, then anyone reading (or "scraping") it can do so with impunity.

    1. that one in the corner Silver badge

      Re: Everyone says my data is covered by GDPR but is it?

      The same question applies to the US HIPAA regulations.

      HIPAA keeps getting bandied about as though it prevents all transmission of US health data except for use by doctors and insurers, even to the point of filling YouTube with videos of people screaming "you can't ask me that, HIPAA it is illegal to ask me that".

      But HIPAA is quite clear that it only refers to the transmission of medical data *from* doctors, insurers and other specifically named "covered entities", in order to stop them selling the details on so that Stannah knows you are in need of a new stairlift. If you have passed any of your medical data on to anyone else (say, Facebook) then it is fair game. If a supermarket infers your medical status from your shopping habits (the apocryphal "congratulations on your pregnancy" stories), it can, and will, sell that on.

      It certainly appears that training on medical information isn't against HIPAA *unless* you can demonstrate that the data was supplied by a "covered entity". Good luck with that, especially given the point that the LLM may have just hallucinated - or, even more simply, just correlated its data - when it printed out that you have a dicky heart and a limp due to undiagnosed gout.

    2. Vincent Ballard

      Re: Everyone says my data is covered by GDPR but is it?

      Citation: Consolidated text: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02016R0679-20160504&qid=1689352839415

      Note in particular Article 6: processing is lawful only if one of the enumerated conditions applies. Suppose I post my e-mail and home address on a public forum so that e.g. my friends can send me birthday wishes, and you wish to scrape that for training your LLM. That doesn't meet the requirements for consent unless I explicitly authorise that purpose; it's not necessary for any contract between us; it's not something that you're being forced to do by a legal obligation; it's not protecting anyone's vital interests; it's not being carried out in the public interest or as an exercise of official authority; and it's not necessary for any legitimate interest you have, because an LLM can perfectly well be trained without it.

      1. that one in the corner Silver badge

        Re: Everyone says my data is covered by GDPR but is it?

        Thank you for the citation - I shall read it when next back on a decent screen.

        It is heartening to see someone good enough to answer a direct question posed in these comment pages.

    3. Anonymous Coward
      Anonymous Coward

      Re: Everyone says my data is covered by GDPR but is it?

      Downvote! Idiot! You never ask a question and show that any of us don't know what GDPR says!

  11. Felonmarmer

    Add another AI to solve it.

    One way to do it would be with another AI trained on negative data - a list of proscribed results. The first AI passes its output to the second, which assesses that output against its database of blacklisted info and automatically redacts it.

    It would need to store info on what needs to be forgotten, though, so it would fail that test - but surely when a request to be forgotten is put in, that request is stored somewhere too? Under the current system, how do you check that the forbidden knowledge is not relearned?

    If the redacting AI has a much smaller set of info than the main AI, it could be used as an interim measure until the main LLM is updated, solving the time issue.
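
    Something of roughly this shape, say - a toy sketch, which also embodies the very catch above: the blacklist itself *is* stored personal data:

        # Toy second-stage redactor: scrub blacklisted info from the main model's
        # output before anyone sees it. The blacklist is itself a little store of
        # "forgotten" data - the catch noted above.
        import re

        BLACKLIST = ["So-and-so", "fred.bloggs@example.com"]  # made-up entries

        def redact(text: str) -> str:
            for item in BLACKLIST:
                text = re.sub(re.escape(item), "[REDACTED]", text, flags=re.IGNORECASE)
            return text

        print(redact("So-and-so was convicted, reports say."))
        # -> "[REDACTED] was convicted, reports say."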

  12. This post has been deleted by its author

    1. Ken Hagan Gold badge

      Re: Opt-In

      It gets better. The form includes a question asking for your evidence that they have leaked your info, so you can't opt out until after the first breach.

    2. Anonymous Coward
      Anonymous Coward

      Re: Opt-In

      "Just because something is observable to all and sundry does not mean that it's public domain. I read a copyrighted book, doesn't mean I can plagiarise it."

      You cannot plagiarise it - you cannot copy it directly.

      But you can do whatever you want with the information, and process any data you find in any way you want.

      "Harry goes to Hogwarts" is allowed; copying the chapter word for word is not.

      "Lil Endian has a big nose" is allowed; repeating as-is the blog post where you describe the shame of schnozz jokes is not.

  13. StewartWhite Bronze badge
    Mushroom

    AI will not have to face a legal reckoning

    We can all complain and wring our hands about how AI (or the companies that own AI systems) should be made to face the consequences of their behaviour and potential illegality, but in reality it's not going to happen.

    Western governments are in the pocket of Big Tech, and even were that not the case, politicians in general are stupidly starstruck by anything tech, so will gladly fawn over the Wizards of Oz pulling the levers to bring the magic of world peace and prosperity through AI, or whatever the next tech snake oil is.

  14. Omnipresent Silver badge

    if they can make money at it

    there are no rules. The rules will be broken, and already have been in many cases (Facebook).

  15. Frogmelon

    Soon all your functions will be mine

    Send in Tron... He fights for the Users! :)

  16. Dropper

    Bigger Problems

    Sure, this is a big problem, and as more countries and US states adopt these laws, the problem will only get larger.

    But I would have thought the largest problem AI faces is copyright theft. Whether it's used to generate copy, photos or video - eventually people are going to notice how often their content is being stolen.

  17. Anonymous Coward
    Anonymous Coward

    will have to be capable

    will = should, i.e. irrelevant, 'cos in the meanwhile…
