Writers sue Anthropic for feeding 'stolen' copyrighted work into Claude

Anthropic was sued on Monday by three authors who claim the machine-learning lab unlawfully used their copyrighted work to train its Claude AI model. "Anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books," the complaint [PDF], filed in California, says. "Rather than …

  1. Yet Another Anonymous coward Silver badge

    Slippery slope

    "Claude could not generate this kind of long-form content if it were not trained on a large quantity of books, books for which Anthropic paid authors nothing."

    And had those authors been influenced by books they had read?

    I suppose if you limit English Lit to nothing newer than Shakespeare you should be OK, but it's going to limit the prose in the next Tom Clancy.

    What's it going to do to film studies when no potential future director can view any film newer than 1928? Certainly put a crimp in Mr Tarantino's oeuvre.

    1. Gene Cash Silver badge

      Re: Slippery slope

      While I kind of agree with you, I'm willing to use any stick available to bash AI back into nonexistence.

      I'm tired of the BS around and spewed by AI, so I'm willing to start carpet-bombing.

      1. HuBo
        FAIL

        Re: Slippery slope

        Well, I think the article does a good job of expressing the related nuance three times. Once with "OpenAI at the time argued, 'Training AI models using publicly available internet materials is fair use'"; then with: "That position has been supported by the Association of Research Libraries (ARL)"; and then: "Tyler Ochoa [...] said [...] using copyrighted content for training probably qualifies as fair use". The key bit is that it's OK to use those works for training, but the output of the LLMs cannot be allowed to reproduce them verbatim (e.g. the first sentence of the AC commentard below, in "This really isn't going to work...").

        Some of the students in my classes had a hard time understanding the difference when completing homework assignments and project reports, or answering exam questions (i.e. plagiarism). They failed out of those classes (they hadn't learned anything but how to copy stuff).

        1. martinusher Silver badge

          Re: Slippery slope

          The rules for verbatim use should be exactly the same as the rules for a human author -- short quotes for illustrative purposes are OK provided appropriate attribution is given. (It's what footnotes and a bibliography are all about -- and one of the primary purposes of an editor is to verify that appropriate credit is given.)

          But I reckon these people want to gatekeep stuff like the music publishers do. My main interest is older -- 'classical' -- music, which gets me off the hook somewhat, because a lot of modern works are derivative and generally "NCV". (Hold on -- a lot of older works belong in the same category. You've got to experience Victorian pop music to understand just how bad music can be!) Despite this the music publishers have prevailed; it's got to the point where a particular rhythm or chord becomes property to be bought, sold, rented, and generally fought over in court (rarely by the original musicians, of course). The prose publishers want to erect their toll booths, too. Which is a pity for our culture, because if we were brutally honest with ourselves, nearly everything we do is derivative of previous works; we're just not honest enough to say so. (At least Baroque composers saw copying as a compliment -- music was largely improvised and reuse was rampant.)

    2. Filippo Silver badge

      Re: Slippery slope

      >And had those authors been influenced by books they had read?

      Sure, the LLM is doing nothing wrong, which is why nobody is suing the LLM.

      Each and every Anthropic employee is absolutely free to read all those books and then write books using the skills they've acquired. Nobody is arguing against that.

      Downloading those books, using them as input to a computer program, using the output to set up another computer program, and charging for the ability to run that program, however, is not reading, and doesn't even look like reading, not even if you squint really hard.

    3. John Robson Silver badge

      Re: Slippery slope

      Being influenced by the ideas or style of books one reads as part of a much wider education is somewhat different from a computer constructing a "language" model based on published works.

    4. boblongii

      Re: Slippery slope

      That's a weasel excuse and a half, and a fairly obvious troll.

      Aside from anything else, if someone writes books in the style of another author they ARE regarded as a hack; it's not accepted from humans and it certainly doesn't have to be accepted from a machine.

      Secondly, while authors are influenced by books they've read, those are generally books they have paid for. The issue here is copyright. Google opened the door with their Books project, and the "AI" companies are hoping to repeat their success.

      1. Yet Another Anonymous coward Silver badge

        Re: Slippery slope

        > if someone writes books in the style of another author they ARE regarded as a hack; it's not accepted from humans

        And yet movie critics gush over how a new movie recreated the look of Hitchcock, or that classic shot from John Ford, or had a baby carriage falling down a big set of stone steps

  2. Anonymous Coward
    Anonymous Coward

    This really isn't going to work...

    ...copying has to be verbatim. It also has to be a substantial percentage. And not be transformative.

    The best they will be able to achieve is to sue end users of the NN, if they somehow manage to get substantial chunks of copyright material out of the thing. Hint: that's only going to happen by deliberately asking for it.

    Your average user of generative AI has statistics on their side. There are enough combinations of words that every generation run will be unique, from a "number of atoms in the universe" perspective...

    1. WUStLBear82

      Re: This really isn't going to work...

      The allegation isn't that the AI replicated their work, it's that the AI was trained using accumulated textual material that included ebooks that were not agreed to be licensed, including their works. The fact that the stolen ebooks were republished, without permission, in a repository on the internet, does not necessarily make them freely usable in US law.

      1. Anonymous Coward
        Anonymous Coward

        Re: This really isn't going to work...

        You don’t need a license to train a NN. You need a license to distribute duplicated work under copyright law.

        Unless there is a signed contract for something not covered by Copyright law, no dice there.

        1. boblongii

          Re: This really isn't going to work...

          So you think that Anthropic paid for all those books? I'd like to see the receipts.

          Leaving that aside, the storage and transformation of text is almost certainly going to run afoul of the restrictions placed in most books.

          I don't think these lazy parasites can win; they certainly should not be allowed to win.

    2. gnasher729 Silver badge

      Re: This really isn't going to work...

      First sentence is wrong. Copying doesn’t need to be verbatim. If it’s not, it is a derived work which is just as protected. Just try publishing some books about a young magician named Perry Hotter and the evil Morteveld.

  3. Sorry that handle is already taken. Silver badge
    1. boblongii

      Re: This is getting out of hand

      Who cares what Google do? Their search engine has been useless for years now.

      1. Sorry that handle is already taken. Silver badge

        Re: This is getting out of hand

        I agree with your second statement. In response to the first, Google's search market share is ~90%.

  4. IGotOut Silver badge

    For those defending AI copyright theft...

    don't complain when you lose your job, and stop complaining when your job has been offshored to a cheaper location; it's no different.

    If you make something and the Chinese rip it off and sell it for a fraction of the price, putting your company out of business, tough shit, no different.

    Just because it doesn't affect you, doesn't make it right.

    Oh and this bullshit "all humans copy"

    If you directly copy someone's work you get sued

    If you are inspired and write your own, that takes skill and learning.

    1. This post has been deleted by its author

    2. Sorry that handle is already taken. Silver badge

      Re: For those defending AI copyright theft...

      If you are inspired and write your own, that takes skill and learning.
      And the critical distinction here is that an LLM is just a statistical model. It doesn't arrange words (or pixels) in any intelligent way; it simply does so in a way that resembles what it's seen before, without further insight. Unlike a human artist, as you rightly say.

      1. ibmalone

        Re: For those defending AI copyright theft...

        Indeed, an LLM's work has to be derivative, as that's its only source of input. A person can call on their lived experience; an LLM only has the training corpus (of one particular format, here textual work). If that is made primarily of work used without permission, then how can the results be legitimately used?

        Besides this, the argument that it's what people do and so it should be treated the same is more fundamentally flawed. There is no problem with the law being different for people and for corporate-owned LLMs. The law is supposed to be for people, after all. If, until someone has the sense to write that in, we want to ascribe the difference to inspiration, then that's fine.

        1. Anonymous Coward
          Anonymous Coward

          Re: For those defending AI copyright theft...

          Copyright allows for both quoting, and transformative works.

          LLMs are transformative, and are allowed to quote chunks perfectly within the rules.

          1. John Robson Silver badge

            Re: For those defending AI copyright theft...

            "allowed to quote chunks"

            But they don't do that "within the rules" since they don't know that they're quoting and therefore can't appropriately attribute the quote.

            1. Anonymous Coward
              Anonymous Coward

              Re: For those defending AI copyright theft...

              Attribution is not a requirement under copyright law either. That’s something you should do in schoolwork.

              1. John Robson Silver badge

                Re: For those defending AI copyright theft...

                You've never worked in publishing then - because you do have to attribute appropriately.

                If nothing else failing to do so potentially deprives the original author of their rights under copyright law.

  5. nematoad Silver badge

    Tough.

    ...enter into licensing arrangements with large publishers and other content providers. Doing so, however, makes the costly process of model training even more expensive.

    That's just the cost of doing business.

    You use someone else's work, you pay. You have to pay for any premises you rent and you certainly have to pay taxes on your income. Unless you want to go down the Elon Musk route and stiff your suppliers and say "Sue me."

    Note: That usually doesn't work with the tax man.

    After all, you would be the one screaming "Piracy" if it happened to you.

    1. Filippo Silver badge

      Re: Tough.

      I hate it when corporations try to argue that they should get away with something dodgy because otherwise their business model won't work.

      Oh, look, I have a business model where I steal jewelry and sell it for cheap! But if I have to pay for it instead, then it's no longer viable! Please make theft legal!

      Whether something increases your costs, or even if it invalidates your entire business model, is completely and utterly irrelevant to whether it's legal or not.

      1. John Brown (no body) Silver badge

        Re: Tough.

        Exactly what I was thinking when I read "makes the costly process of model training even more expensive." in the article. Well boo hoo, they're making a "product" and now they've been told they can't use stolen/free raw materials any more, they have to pay for the raw materials just like every other industry.

  6. LybsterRoy Silver badge

    -- That automated generation of prose is only possible by training Claude on people's writing, for which they've not received a penny in compensation, it's argued. --

    I must have missed it in the article but how much were they asking for? I think the purchase price of one copy of each eBook used would be fair. After all if I bought an eBook, read it, and then wrote my own that's all I'd have to pay.

    1. Zippy´s Sausage Factory

      We don't know how big a collection it was trained on, nor how much damages they will get.

      I think the problem isn't necessarily the training, but the creating similar works. If all AI did was summarise and answer questions, nobody would care very much. The fact that it can generate books, art and music "in the style of" specific authors, artists and composers is what, rightly, concerns the creative industries.

      People want AI to be doing the mundane jobs that help them do the creative work, rather than doing the creative work and condemning the humans to the mundane tasks.

      1. Jedit Silver badge
        Mushroom

        "People want AI to be doing the mundane jobs"

        Indeed. You're not the first person to point out that 70 years ago SF writers imagined a utopian future where computers would take care of all the drudgery while humans were free to pursue creativity, but the future we wound up with is the exact opposite.

        I'll add my voice also to the chorus saying that Misanthropic can fuck off, along with all other "AI" companies. It's not even really AI, just a slightly more sophisticated version of Eliza. It can't create anything, it can't make logical connections based on the material it's given. All it can "learn" is how to say the same thing that a creative human has already said, but in a slightly different way. It serves no purpose other than to let companies stop paying their employees for real work, and it will only cause stagnation.

      2. Sherrie Ludwig

        People want AI to be doing the mundane jobs that help them do the creative work, rather than doing the creative work and condemning the humans to the mundane tasks.

        Exactly, I don't want AI to do creative stuff, I want a semi-intelligent Roomba to clean my bathroom.

    2. John Brown (no body) Silver badge

      "I think the purchase price of one copy of each eBook used would be fair. After all if I bought an eBook, read it, and then wrote my own that's all I'd have to pay."

      That might be a valid point if they had gone and bought the books in the first place. The allegation is that they used pirated sources, so retrospectively paying "the purchase price of one copy of each eBook used" is no longer a satisfactory outcome. Just as a bank robber doesn't get to walk away free and clear by offering to give back the stolen money.

  7. John Robson Silver badge

    Impossible

    "The AI giant claimed it would be impossible to train AI models without using copyrighted content."

    Really?

    You phone up a handful of major newspapers and ask if you can license their back catalog of articles for training.

    It's not actually that hard.

    In terms of books, it's harder, but there is no reason to just ingest stuff without licensing it. If it's too much work to ingest a certain work, then don't ingest it. There is a significant body of work which is out of copyright (70 years past death) which you could use without paying licensing.

    If you think copyright is too long then feel free to challenge the duration and lobby to reduce it.

  8. Grogan Silver badge

    Those Anthropic "Claude" bots are intrusive and abusive, and they ignore robots.txt directives (as most bots other than legitimate search engines do). I started noticing them back in May, hammering the shit out of my web forum and photo gallery. We're talking like 400 to 500 simultaneous bots, all coming from different IP networks within Amazon's cloud.

    If AWS is going to allow that kind of tomfoolery to take place on their network, fuck 'em. I have blocked every single CIDR block used by AWS with kernel rules. We're talking /9, /10, /12 subnets, etc. -- millions of IP addresses. I don't have to care about that, for there are no internet clients (butts in chairs with web browsers) in there. I've been thinking of dropping the rules to see if they were still a problem, but... nah. AWS can stay blocked.
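    (The approach above -- deny-listing whole cloud CIDR blocks -- can be sketched in a few lines. This is an illustrative Python sketch of the matching logic, not the commenter's actual kernel rules, and the CIDR ranges shown are placeholders rather than Amazon's published allocations; AWS publishes its real ranges in its ip-ranges.json feed.)

    ```python
    import ipaddress

    # Illustrative deny list. The real AWS ranges come from Amazon's
    # ip-ranges.json feed; these CIDRs are placeholders for the example.
    BLOCKED_CIDRS = [
        ipaddress.ip_network("52.0.0.0/10"),
        ipaddress.ip_network("3.128.0.0/9"),
    ]

    def is_blocked(client_ip: str) -> bool:
        """Return True if the client address falls inside any blocked CIDR."""
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in BLOCKED_CIDRS)
    ```

    In practice the same list would be loaded into the firewall (iptables/nftables with an IP set) so packets are dropped in the kernel rather than checked per request in the application.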

    It's no joke, I like to allow large publicly viewable attachments and gallery posts for my members, and these asshole bots rack up tens of thousands of downloads.

    It's not even that I mind the content being scraped, it's the bloody abuse.

    1. James Hughes 1

      We had the same problem with some Google AI bots. Banning people (spammers) on the forums was taking ages. We disabled access for the Google bots, and an operation that was taking 45 seconds now takes 4s, and the forum is much more responsive.

      I didn't have a problem with a Google Bot accessing the data every hour or so, but this was multiple bots, multiple times per second, which is basically pointless as the actual data wasn't changing that fast. All it did was get the bots access revoked so now they have nothing at all.

  9. hairydog

    All AI seems to be is automated plagiarism. Yes, there are issues about paying for the input, but it goes deeper than that.

    Already, any Google search returns loads of web pages that are clearly AI summaries of someone else's work.

    Soon the copied summaries will drive the original works into extinction, and so machines will almost entirely feed on the output of other machines.

    It would be good if Google could identify AI content and provide a way to exclude it from search results.
