Judge says Meta must defend claim it stripped copyright info from Llama's training fodder

A judge has found Meta must answer a claim it allegedly removed so-called copyright management information from material used to train its AI models. The Friday ruling by Judge Vince Chhabria concerned the case Kadrey et al. v Meta Platforms, filed in July 2023 in a San Francisco federal court as a proposed class action by …

  1. Anonymous Coward

    There's a pretty good argument here

    I think there's a reasonable argument that the weights of one of these LLMs are essentially a lossy-compressed copy of the input data. That they can regurgitate snippets of that data is merely a symptom of this deeper problem. If true, this could be really bad for them (legally speaking) because courts have held in the past that simply loading data into memory from disk constitutes making a copy (this is why you can own a hard-copy book but with software you merely own a license to it). That the compression is lossy doesn't help much; the RIAA will go after you for sharing an mp3 no matter how low the bit rate. I'd be curious to hear from a copyright lawyer about why none of the plaintiffs have been using this angle of attack.
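
    For the technically inclined, here is roughly the sort of regurgitation probe I have in mind, sketched in Python. The generate() function is a hypothetical stand-in for whatever completion API you can point at a model, not anything from Meta's actual stack:

    import difflib

    def generate(prompt: str) -> str:
        # Hypothetical stand-in for an LLM completion call;
        # returns the model's continuation of the prompt.
        raise NotImplementedError("wire up a real model here")

    def verbatim_overlap(passage: str, prompt_len: int = 200) -> float:
        # Feed the model the opening of a known copyrighted passage,
        # then measure how closely its continuation matches the real
        # text. A ratio near 1.0 looks like memorisation, not paraphrase.
        prompt, expected = passage[:prompt_len], passage[prompt_len:]
        continuation = generate(prompt)
        matcher = difflib.SequenceMatcher(
            None, expected[:len(continuation)], continuation)
        return matcher.ratio()

    Run that over enough known passages and you have an empirical measure of how much of the "lossy compression" survives intact.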

    1. Richard 12 Silver badge

      Re: There's a pretty good argument here

      Most likely because of the "AI" marketing. The judges and juries don't understand what's in the magic box.

      Proving that they emit copies of significant parts of copyright works should be sufficient to prove infringement.

      Showing that Meta and OpenAI removed the copyright information proves wilful intent, as they knew it was infringement and didn't want the model to emit the copyright details of the stuff they nicked.

      1. Alex 72

        Re: There's a pretty good argument here

        It is likely most readers on here know Large Language Models (LLMs) are part of deep learning, which is technically in the field of Artificial Intelligence (AI), but not the AI that people outside the computing science field have seen in science fiction or have any awareness of. By allowing the marketing people to call this AI, the professionals in the firms promoting it may have been complicit in over-hyping tech stocks and confusing the type of citizens who would sit on a jury. This may be the only reason actions like this take months or years to get to trial and do not simply result in an immediate court order to stop using models trained on this data, and to stop training new ones, unless and until an agreement with rights holders is reached or laws are changed.

        P.S. I am sure in some cases the professionals tried to resist and explain this to leadership; given where we are today, if they did, leadership did not listen. Nothing new there, then.

        1. DrXym

          Re: There's a pretty good argument here

          The more I learn about AI, the more bullshit it becomes. An LLM is basically just billions of weighted parameters that were trained on lots of content. Once trained, it's just a loop that builds up an output set of tokens based on the input set of tokens. It's practically mechanical aside from a little bit of randomness thrown in to make the responses vary. The only "intelligence" to it is whatever rules the developers programmed into it for training and for vetting the response, i.e. guard rails.
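
          To put it concretely, once training is done the whole thing is a loop you could sketch in a few lines of Python. The next_token_logits() function here is a hypothetical stand-in for the trained network (the billions of fixed weights); everything around it is the mechanical part:

          import math
          import random

          def next_token_logits(tokens: list[int]) -> list[float]:
              # Hypothetical stand-in for the trained network: billions
              # of fixed weights mapping a token sequence to a score
              # for every possible next token.
              raise NotImplementedError("this is where the parameters live")

          def sample(logits: list[float], temperature: float = 0.8) -> int:
              # The "little bit of randomness": softmax over the scores,
              # then a weighted draw. A temperature near zero makes the
              # choice almost deterministic.
              scaled = [l / temperature for l in logits]
              m = max(scaled)  # subtract max for numerical stability
              weights = [math.exp(l - m) for l in scaled]
              return random.choices(range(len(weights)), weights=weights)[0]

          def generate(prompt: list[int], max_steps: int, eos: int) -> list[int]:
              # The post-training loop: append one sampled token at a
              # time, feeding the growing sequence back in as input.
              tokens = list(prompt)
              for _ in range(max_steps):
                  tok = sample(next_token_logits(tokens))
                  if tok == eos:
                      break
                  tokens.append(tok)
              return tokens

          The guard rails sit outside this loop entirely: extra rules bolted on before the prompt goes in and after the tokens come out.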

    2. Roland6 Silver badge

      Re: There's a pretty good argument here

      I am a little surprised the claimants have not asked for full disclosure of Meta’s training data.

    3. FrankAlphaXII

      Re: There's a pretty good argument here

      I think that's too complex for most juries to understand, so if it might go to trial I wouldn't base a case around it. For something that's likely never going to be heard by a jury, maybe, but it would depend on the judge.

      I'm not a lawyer, but I am really unlucky and wind up on jury duty every couple of years, usually for complex civil stuff like this too; my last trial took 3 1/2 weeks. I can guarantee you at least 3 people on a jury, enough to cause a mistrial, will not understand the argument at all. Trying to explain that in a jury room to someone who isn't technical would be exceptionally difficult.

  2. Eclectic Man Silver badge

    What if ...

    ... a student uses an AI whose LLM was trained on copyrighted work to generate an essay, dissertation or thesis, and then submits it? Is the student guilty of

    a: plagiarism

    b: breach of copyright

    c: cheating

    d: all of the above

    ?

    1. vtcodger Silver badge

      Re: What if ...

      "a student uses an AI to generate ..."

      And if that selfsame student goes to the library and generates the same product without the help of AI, is he/she guilty of doing anything other than practicing research? And isn't research exactly what the student is supposed to do?

      Perhaps we're looking at a fundamental problem with the concept of "intellectual property". With the exception of Trademark, IP seems to me a quite nebulous concept with an enormous gray area between clear "theft/misuse/abuse" and equally clear "fair/proper use". I have a lot of doubts about the ability of legal systems -- current or future -- to resolve IP issues equitably.

      On the other hand, I suppose it keeps lawyers employed. I reckon that's a good thing. Lord knows what mischief they'd be up to if they weren't arguing about IP issues.

      1. Brewster's Angle Grinder Silver badge

        Re: What if ...

        If a student using a library didn't properly attribute the work of others, and tried to pass it off as their own, then they would be guilty of plagiarism.

        1. TDog

          Re: What if ...

          Having been accused of plagiarism by the Open University (the quote was referenced, in italics rather than plain text, but committed the heinous crime of not having an end quote), I am able to speak about the processes that the OU used. It used an automated software detector without any human overview. It generated an automatic warning without any human overview. I checked the OU's minutes of the executive board meeting which authorised this use of software, and it was noted in the minutes that things like this could happen. Strangely, there was no discussion of mitigation, nor was there a clear path to follow to resolve such issues.

          Sort of,

          "Oh look, you're fucked", followed by

          "Oh look, you are fucked".

      2. Anonymous Coward

        Re: What if ...

        This is assuming that the AI will return the same kind of research quality as a library or, say, Google Scholar. So far it doesn't tend to, but this may change in future.

        We just tend to assign low marks to the students (mis)using AI because, objectively, their work is of a matching low quality.

        Using AI *as well* as proper research is absolutely fine. Though amazingly, students seem polarized: they either misuse it or don't use it at all.

        As for "stealing work", I don't like that research, often paid for by taxpayers, is gatekept by publishers. There are a number of "pirate" research gateways that provide everything as *ahem* open access.

        (AC for a reason).

      3. Roland6 Silver badge

        Re: What if ...

        However, the library will have paid to have the copyrighted material available to students etc.

        We know with a reasonable level of certainty the LLM authors have not paid for the copyrighted material they have used and make available through their tool.

      4. doublelayer Silver badge

        Re: What if ...

        If the student goes to the library, which has legal copies, then copyright infringement is out. If they cite their sources and quote appropriately, then plagiarism is out. On the other hand, someone who wants to cheat and plagiarize without citation can do that just fine from the library, though it's not as fast as doing it online. An LLM can throw you into the plagiarism camp even if you didn't want to be there, because it frequently quotes without citation. That makes it a risky choice if you want to be sure you're not going to plagiarize.

        As for cheating, it's cheating whether you have a program write you an essay or if you pay your friend who actually studied to write one for you. It was as possible and as well-defined manually as it now is automatically. No, reading it later and correcting a few details doesn't make it not cheating.

    2. jdiebdhidbsusbvwbsidnsoskebid Silver badge

      Re: What if ...

      It depends ... Based on my understanding of UK copyright law, I think the answers would be:

      a: plagiarism: yes if the student claimed the work as their own and didn't acknowledge the use of an AI. (Probably breaching the college/school/uni's terms even if they did acknowledge it.)

      b: breach of copyright: depends how much copyrighted material is used and how it is presented. "Fair dealing" for example allows duplication of limited amounts of copyrighted work for criticism or review, but not to reproduce and claim as your own.

      c: cheating: probably, for the same reasons as (a) above.

      In my opinion, LLMs that use copyrighted material (without a license) in their training sets are guilty of all the above.

    3. Oninoshiko

      Re: What if ...

      D

    4. DrXym

      Re: What if ...

      Well, if they use AI then potentially they're plagiarizing everything that went into the model all at once.

      But it would be a *brave* student who used it for anything more than a hint while doing their own research because it would emit bland, repetitive, hallucinatory crap that would soon become obvious to someone reading it.

  3. Herring` Silver badge

    Make it stop

    These models were trained on material entirely created by humans. When the models make it uneconomical for humans to create that material, there will be no new ideas, no new art, no new music, no new literature. Everything will be a pastiche, a shallow derivative of things gone before. And the point of the human race is lost.

    Still, nice weather today.

  4. Norbert-

    Title is wrong

    *They must defend AGAINST the claim. Defending the claim would be saying they have to agree with it.

  5. mark l 2 Silver badge

    I wonder how many British authors' copyrighted works have been ingested by Meta, OpenAI etc for their LLMs?

    If they took the matter to court in the UK for copyright infringement, the 'fair use' excuse wouldn't wash: although there are exemptions for using computers to data mine copyrighted work for research purposes, they don't apply if it's for commercial use.

    1. Ben Tasker

      > If they took the matter to court in the UK for copyright infringement, the 'fair use' excuse wouldn't wash

      True, but the UK Govt is currently consulting on (and quite keen to implement) an opt-out exemption, so that AI companies can consume whatever they want without fear of things like copyright holders objecting.

    2. Andrew Scott Bronze badge

      On occasion I use the weird spelling "colour". I think that came out of some UK book I once read. Should I be sued for plagiarism every time I use that spelling?

  6. tiggity Silver badge

    So Meta are happily allowed to grab copyrighted material from torrents.

    Using it to train "AI" is still "theft" (to use the "piracy is theft" message of copyright holders).

    Yet there are frequently cases of random "nobodies" facing huge charges over torrents (be they music, books, films etc.).

    Good* to see that one law for the big boys and another for everyone else is still in place.

    * Obviously not good at all, but it's typical of how wealthy companies/individuals rarely face the full force of the law.

  7. bitwise

    A copy of a person but with the "left bias" removed

    So, anything that tech billionaires decide is "left bias" gets removed (probably anything in favour of workers' rights etc)... so now they can make copies of us that are arseholes. Great.
