They Might Not Need No Stinkin' APIs
Massively-parallelized, data-rate-limited screen-scraping might fly under Reddit's radar. Not that megacorps wouldn't try to save a bunch of money by trying such tricks.
In a move seemingly designed to stop being used as a free training library for large language models, megaforum Reddit said it's going to begin charging companies who make excessive use of its data-downloading API. "As a platform with one of the largest corpus of human-to-human conversations online, spanning the past 18 years …
The note on users owning their posts is interesting: long experience with the sort of web forum that predated Reddit made me think that was the norm. Your posts are _your IP_, but when you hit 'post', you grant the forum a license to display them to others, edit them (within reason, see the rest of the TOS), etc.
This obviously varies a bit by jurisdiction - I know when GDPR first came out we had a look at our rules and concluded that EU users did have rights the yanks didn't - but on the whole, it was easier for us to treat everyone as if they had the same GDPR rights unless they were being pricks about it. But yes - in general, posting something on the internet doesn't make it public domain, far from it. It makes it publicly viewable, but, well, so are billboards, but Coke still own their logo, y'know?
But you could still use the picture of the billboard to teach a self-driving car what billboards look like.
Otherwise you are going to be licensing every image of every stop sign from the local authority that put it up, and they are going to need a license from the sign company that made it.
Admittedly my view of reddit is a bit skewed (I'm sure there's some worthwhile content there), as my main interaction with it is when friends send me links to truly dreadful reddit pages to cringe at ("dreadful" IMO - obviously views vary on what is good / bad content) - the sort of content where you are not sure whether to laugh or cry that contributors could write comments so deranged / divorced from reality / at odds with normal social convention.
Hate to imagine an "AI" that had a substantial amount of its training data from reddit.
This will also seemingly make life impossible for most third-party apps, just like Twitter did.
So Reddit saw Twitter's enshittification and thought it was good.