Reddit goes AI agnostic, signs data training deal with OpenAI

Still upset that Reddit decided to sell all its content to Google for training its AI? Well, bad news: Now OpenAI has jumped into the mix as well. Reddit announced yesterday that it signed a deal with OpenAI to bring Reddit content "to ChatGPT and new products." The deal will involve OpenAI gaining access to the Reddit Data …

  1. Yorick Hunt Silver badge
    Mushroom

    Awww, how cute...

    The prostitute who lifted her skirts for one pervert, now realises she can make more money by lifting them for others too.

    All I can say to the "clientele" is you'd better watch out for nasty diseases during your foraging sessions.

  2. Anonymous Coward
    Anonymous Coward

    Reddit goes AI agnostic

    Seems to be highly concealed double speak for:

    We will allow AI to post on Reddit too. That's not only because we seek an easy excuse for selling out to AI companies under the pretence that it makes life better for users. There are hidden, much more cunning possibilities:

    Running experiments using the Reddit users for things like:

    * Free vetting of training data by Reddit users to improve training data quality; higher quality training data improves model performance quite a lot, but manual vetting by humans is extremely expensive. Now OpenAI can just act as if it were a human poster and post several variants of the data in need of vetting. Then it can slurp the written reactions, upvotes and downvotes of *known* human posters and sort them by estimated poster quality and reputation. Then simply use AI (aka machine learning) to give a quality score for that version of the training data and compare it to the other variants it posted.

    * More psychological-style experiments, such as those Facebook is said to have been caught doing multiple times. Think of things like: how long does it take before users notice the poster is an AI and not an actual human? If writing about the benefits versus the disadvantages and dangers of AI, what type and style of arguments improve acceptance of AI (OpenAI's bread and butter)? What statements and comments shift known human responses towards "AI needs less regulation to improve innovation" versus "AI needs stricter regulation to limit adverse effects on society, climate (excessive power consumption of AI), potential job loss..."?

    Comparable things can be said about the deal Stack Exchange made with OpenAI. Delivering good and reliable programming code is still one of the weaker points of LLMs. Now OpenAI can just pose as a human user and ask questions in the domains where it is lacking, and it can post different variants of code itself, answering as "different human users", and see the response of actual *known* human users. Much like Facebook, it can (and most likely will) further gamify the system and experiment with its algorithms / responses / interaction with actual human users so that these users are enticed to come back to Stack Exchange more often and try to build themselves a better "reputation". That'll improve both the average quality of the answers from *known* human users and the number of them.

    The tricked and abused users (tricked by the combination of psychological studies and machine learning getting better and better at exploiting human addiction mechanisms and chemistry, much like Facebook does) will help train LLMs to get better (or less bad, if you prefer) at doing programming jobs. In doing so, they'll be training OpenAI for free to take a bigger and bigger slice of their jobs over time. Many might say that AI will never be able to program well enough to take over (part of) jobs. Don't forget, however, that very poor training data quality today negatively impacts the quality of the output today. Add to that that much of reinforcement learning is a relentless optimization procedure, and optimization algorithms are something computers are surprisingly good at. So surely if OpenAI manages to get its hands on (trick) legions of free professional programmers to train its model, improve data quality and alleviate weaknesses, it's very reasonable to expect the average coding output to get better (or less bad, if you prefer to see it that way). And the quality threshold for a viable multi-billion business model is low: be no worse than the trainees used in "low cost" offshore outsourcing firms, but be quicker ("more productive"). A variant like "be half as good as those trainees for a tenth of the price and ten times the speed" might sell like hot cakes in enough firms too.

  3. Anonymous Coward
    Boffin

    Reddit ChatBOT™

    Given the nature of the replies, I suspect most comments are artificially generated: one-line non sequiturs that have little to nothing to do with the main title. The chatbots also seem unable to respond to a direct question. Complain to a moderator and you're told to go outside and feel grass. These are long-term moderators with hundreds of subs under their belt, and there are obvious signs of psychopathology there, unless Reddit has managed to produce a neurotic chatbot. So using this to train your A.I. would be like feeding your A.I. artificially generated gibberish.

    1. iron Silver badge

      Re: Reddit ChatBOT™

      ChatGPT has allegedly been trained on 4chan, so it can't get any worse from Reddit.

  4. This post has been deleted by its author

  5. Omnipresent Bronze badge

    A whole lot of folks

    are about to find out.

    1: The internet never goes away.

    2: Evil never stops.

  6. Anonymous Coward
    Anonymous Coward

    Reddit is sh*t for c*nts

    Mind you, so is AI, so....

  7. mark l 2 Silver badge

    OpenAI's argument so far is that their LLM has been 'learning', not copying, from stuff posted online, so they didn't need to pay or ask permission from publishers for training their LLM on copyrighted content.

    But then months later OpenAI strike a deal with Reddit to pay for access to their content!

    Surely if OpenAI truly believed their own argument they didn't need to pay anything to Reddit, as they could just 'learn' from the content from the Reddit website for free.

    I think they should just start settling now with the publishers before it gets to court, as any good lawyer will be able to tear that defense to shreds now and it was already pretty flimsy to begin with.

  8. Anonymous Coward
    Anonymous Coward

    I have stopped consulting Reddit since they started blocking VPNs. No big loss. There are always alternatives.

  9. The Sprocket

    Failing

    I find it amazing that the AI training models being chosen have the intellectual firepower of a child failing grade 5. No confidence in AI whatsoever.
