Reddit: If you want to slurp our API to train that LLM, you better pay for it, pal

In a move seemingly designed to stop being used as a free training library for large language models, megaforum Reddit said it's going to begin charging companies who make excessive use of its data-downloading API. "As a platform with one of the largest corpus of human-to-human conversations online, spanning the past 18 years …

  1. An_Old_Dog Silver badge

    They Might Not Need No Stinkin' APIs

    Massively-parallelized, data-rate-limited screen-scraping might fly under Reddit's radar. Not that megacorps wouldn't try to save a bunch of money by trying such tricks.

    1. johnfbw

      Re: They Might Not Need No Stinkin' APIs

      Given the whole DB is probably less than 1 TB (it's text only), I bet there are plenty of copies flying around that have walked out of the office with ex-employees.

  2. Yet Another Anonymous coward Silver badge

    You want Skynet? 'Cos that's how you get Skynet

    Train an AI on Reddit and it IS going to try and destroy humanity

  3. Killfalcon Silver badge

    The note on users owning their posts is interesting: long experience with the sort of web forum that predated Reddit made me think that was the norm. Your posts are _your IP_, but when you hit 'post', you grant the forum a license to display it to others, edit it (within reason, see the rest of the TOS), etc.

    This obviously varies a bit by jurisdiction. I know when GDPR first came out we had a look at our rules and concluded that EU users did have rights the yanks didn't, but on the whole it was easier for us to treat everyone as if they had the same GDPR rights unless they were being pricks about it. But yes, in general, posting something on the internet doesn't make it public domain, far from it. It makes it publicly viewable, but, well, so are billboards, and Coke still own their logo, y'know?

    1. Yet Another Anonymous coward Silver badge

      But you could still use the picture of the billboard to teach a self driving car what billboards look like.

      Otherwise you are going to be licensing every image of every stop sign from the local authority that put it up, and they are going to need a license from the sign company that made it.

    2. katrinab Silver badge

      The main problem with claiming copyright on your users' posts is that they may not have the capacity to assign the copyright to you.

  4. tiggity Silver badge

    Oh dear

    Admittedly my view of Reddit is a bit skewed (I'm sure there's some worthwhile content there), as my main interaction with it is when friends send me links to truly dreadful Reddit pages to cringe at ("dreadful" IMO; obviously views vary on what is good or bad content). It's the sort of content where you're not sure whether to laugh or cry that contributors could write comments so deranged, so divorced from reality and normal social convention.

    Hate to imagine an "AI" that had a substantial amount of its training data from reddit.

    1. GioCiampa

      Re: Oh dear

      I'm assuming the "AI" gets to ask "Am I The Asshole" on a regular basis...

  5. Dan 55 Silver badge

    This will also seemingly make life impossible for most third party apps, just like Twitter did.

    So Reddit saw Twitter's enshittification and thought it was good.

  6. Brewster's Angle Grinder Silver badge

    Typo. It should read: "Crawling Reddit, generating value and not returning any of that value to our ~~users~~ shareholders is something we have a problem with..."

  7. J. Cook Silver badge

    I wonder if this will hinder the YouTube channels that consist of nothing but text-to-speech readings of various Reddit posts, gathering ad revenue for as little effort as possible.

