Media experts cry foul over AI's free lunch of copyrighted content

Tech companies should compensate news publishers for training AI models on their copyrighted content, media experts told senators in a hearing this week. The US Senate Committee on the Judiciary quizzed leaders from media trade associations and academia on how generative AI affects the journalism industry. Journalism has …

  1. Catkin Silver badge

    Ad Revenue

    Wouldn't splitting the profits (at a negotiated percentage) clear this up? Personally, I find most news sites unreadable without an ad blocker. The last time I peeked, I was greeted with a banner ad taking up the top third, inline ads taking up half of the remaining space, and a pop-up, auto-playing video; all of them pushing utter rubbish. Not that it diminishes their legal standing, but if publishers are concerned that their target audience would rather read the output of a hallucination-prone text mincer, then perhaps they should examine why that is.

    On that note, kudos to The Register for having manageable ads.

    1. big_D Silver badge

      Re: Ad Revenue

      If only the AI companies could in some way tell which sources were used for which answers. I believe that, due to the way LLMs work, unless there is a specific chunk of text that is a verbatim copy, it would be nearly impossible to tell whether an answer came from the NYT, WaPo, Medium, The Guardian or some private blog...

      At the moment, AI is a parasite that will kill off its host. There are certainly some uses for AI, but if it continues like this, there won't be any current affairs sources left to plunder in the future.

      They have to learn to become a symbiont with the creators of content. Without fresh information, the AI will quickly become useless, but if it then fails, we won't have any traditional reporters to fall back upon.

  2. Zippy´s Sausage Factory

    "Coffey put forward the idea that tech companies should build a searchable database cataloging all the websites that have been scraped. AI companies may argue that it's too tricky and cumbersome to sort through the huge amounts of text they have amassed over time."

    This sounds like a good idea. The ability for a site to opt out, and have their data deleted from the AI's training data within a set, but short, timeframe (say 48 hours) must be included in this, however.

    AI produces what any good set of lawyers will surely get classed as "derivative works". Once that gets established as a legal precedent - and I'm pretty sure it will - AI companies will have to licence the content they use for training data and pay for it. As they should have been doing all along.

    1. Catkin Silver badge

      If a future law or judgement goes in their favour, robots.txt seems like a perfect existing solution: it would require no further work on the part of a website (as opposed to having to check a database) and wouldn't be as damaging to model producers or the environment (from having to recompute an entire model).
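      For what it's worth, this mechanism already exists in practice: several AI vendors publish crawler user-agent tokens (GPTBot for OpenAI, Google-Extended for Google's model training) that honour the Robots Exclusion Protocol. A minimal robots.txt sketch, assuming a site wants to refuse training crawlers while staying in normal search indexes:

      ```
      # Block known AI-training crawlers site-wide
      User-agent: GPTBot
      Disallow: /

      User-agent: Google-Extended
      Disallow: /

      # Ordinary crawlers (including search indexing) remain welcome
      User-agent: *
      Allow: /
      ```

      The obvious caveat is that robots.txt is purely advisory; it only works against crawlers that choose to respect it.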

    2. doublelayer Silver badge

      "The ability for a site to opt out, and have their data deleted from the AI's training data within a set, but short, timeframe (say 48 hours) must be included in this,"

      It wouldn't work. Once they start training their model, the content is baked in there if the software chooses to retain it. You can't strip it out afterward. Nor am I comfortable making this opt out. Here is my suggestion.

      When you want to train an AI model, you get an explicit license for everything you throw in. If you want this book, you find out who owns the rights to that book and ask them for a license to train your AI on it. If it's in the public domain, you're good and can use it. If it's under a license that permits you to use it for your commercial purposes, you're also good. If they give it to you for free, great for you. If they want money, negotiate with them for how much. If you find it's too expensive to negotiate with individual authors for individual books, feel free to try to negotiate with a group of them en masse. Some authors might not agree to that. Too bad for you, you can't use those books until those authors die and the copyright period after death has lapsed, or you can always come back and try to negotiate some more. Find someone else's book. Replace book with site, song, or any other thing that can be copyrighted.

      1. catprog

        You can still delete it from the training set for future training runs.

        Did you have a licence to incorporate everything you read when you wrote your comment?

        1. doublelayer Silver badge

          What my brain does and what an LLM does are different. I know it. You know it. Everyone posting here knows it. Stop trying to pretend that they're the same in order to get a different result.

          The point is that deleting the data for future training runs is not enough. I don't get to have free copies of your copyrighted data until you complain, then it's just fine. I'm supposed to not do that in the first place. The only acceptable option is getting the legal right to stuff you use before you use it.

          1. catprog

            I am not saying they should get free data.

            My argument is that if they legally acquire the data (i.e. no copyright laws have been broken and they have a copy of the data), then it does not matter whether the trainee is human or computer: they are allowed to train using it.

            -

            So far I have not seen a law that makes a distinction between human and computer. If everyone knows that it is different, they should be able to show something that makes it different.

            Some of the attempts at claiming this difference I have seen include:

            1) It is just different

            2) Humans look at other things as well, not just copyrighted materials.

        2. bean520

          > You can still delete it from the training set for future training runs.

          The problem is, it was still in the training data from the previous run, which will still affect the output.

  3. Strahd Ivarius Silver badge
    Trollface

    I plan to train AI for generating videos

    Can I get a free pass for downloading content from Disney+, Netflix, Amazon Video and all other similar sites?

    1. MrDamage

      Re: I plan to train AI for generating videos

      It's called a tricorn hat.

    2. catprog

      Re: I plan to train AI for generating videos

      Yes. For Netflix the cost is $7/month and you can keep as much in your brain as you can remember.

  4. Pascal Monett Silver badge
    Trollface

    "US senators want to know"

    What, they don't have any lobbyists to pay them to know that ?

    1. amanfromMars 1 Silver badge

      What US Senators and All Need To Know

      "Ye Gods", says Generative AI ...... what part of "All of your information and nonsense belongs to us" does humanity not understand ‽ .

  5. amanfromMars 1 Silver badge

    Plonkers'R'Us whenever Presenting as Moronic Idiots on a Roll

    If tech companies get their way, and the courts decide that generative AI doesn't violate copyright, they should still pay publishers for using their materials anyway, LeGeyt said.

    "These technologies should be licensing our content. If they're not, Congress should act," he urged senators.

    Oh dear ..... such ignorance shown and arrogance shared in those two short paragraphs is surely tantamount to criminal ransomware being endorsed as a legitimate means of payment for future human/Learned Large Language Learning Machine content/entertainment/experiences.

    Going down that rocky road is a slippery slope with vast collections of rabbit holes to fail to avoid and fall into and have to negotiate entirely novel terms with in order to be given any chance to survive and prosper and enjoy life ..... but not as you used to know it.

  6. werdsmith Silver badge

    I've read loads of stuff on the internet and learned loads from it. Didn't pay for most of it because it didn't ask me.

    So who do I owe money to now?
