Dear Stack Overflow denizens, thanks for helping train OpenAI's billion-dollar LLMs

Stack Overflow, a community-driven Q&A site, and OpenAI, maker of AI models, have agreed to work to improve each other's products, the latest deal in a series of tie-ups to feed machine learning models' thirst for data. The two organizations characterized the partnership on Monday as a way to "strengthen the world's most …

  1. Anonymous Coward

    With some AIs training in advanced sarcasm and punnery, El Reg and its commenters may be looking for new jobs.

    It’s been a good run :(

    1. jake Silver badge

      Nah. I suspect there are a (very) few folks using one or more of the publicly available AIs when crafting replies. They are not exactly being welcomed with open arms.

      Somewhat surprisingly, I don't think amfM is one of them ...

    2. TimMaher Silver badge
      Coat

      Re: “:(“

      Yes but… what about the icons?

      Who will look after them?

    3. Anonymous Coward

      i was planning to retire to a punnery but it would be too conventional.

      1. Bebu

        Re: i was planning to retire to a punnery but it would be too conventional.

        "i was planning to retire to a punnery but it would be too conventional."

        When Hamlet told Ophelia "Get thee to a nunnery", I believe Shakespeare's audience understood a nunnery to mean the kind of premises which, in Terry Pratchett's words, offered negotiable affection (at reasonable rates).

        So in today's world, might not a punnery be less conventional than a berth in the Fools' Guild (Guild of Fools and Joculators and College of Clowns; founder J-P. Pune), and slightly more engaging?

      2. lukewarmdog

        If I could upvote you twice I would.

  2. jake Silver badge

    So basically,

    we're going to see a massive GIGO experiment?

    1. Dan 55 Silver badge

      Re: So basically,

      For each and every question you'll be told that your question has already been asked, your question is not valid, you'll be given an answer that worked in 2009, or there'll simply be no answer.

      1. Claptrap314 Silver badge

        Re: So basically,

        If you're lucky.

        If you're not, you'll get the four different ways the question can be answered exactly wrong, and three that look like they might be useful, but...no.

    2. ecofeco Silver badge
      Flame

      Re: So basically,

      We've been in one for decades. They've just added gasoline to the fire with AI.

  3. Anonymous Coward

    Reading between the lines

    For years, vonC has been one of the most productive and highest-upvoted contributors on SE, posting marvelous, detailed, helpful answers with millions of accumulated upvotes. If AI is an intelligent human equivalent trained on SE, vonC is its foremost teacher, to whom it knows it owes oh so much. When AI trained on SE speaks, the ghost of vonC speaks. Moreover, SE/Google/OpenAI all know that.

    So what happened is that they made a deal with vonC, offering him eternal life in exchange for shedding his human body and self-uploading his spirit into ChatGPT4.0. The King is dead. Long live the King.

  4. DS999 Silver badge

    So OpenAI will be allowed to scrape Stack Overflow

    And in exchange, Stack Overflow can use OpenAI to generate answers?

    Seems like this dims the incentive for further participation by humans, and thus OpenAI will have less to scrape in the future. I assume its answers will be identified in some way (even if we can't see it) so that it won't eat its own shit and cause model collapse.

    1. Anonymous Coward

      Re: So OpenAI will be allowed to scrape Stack Overflow

      >And in exchange, Stack Overflow can use OpenAI to generate answers?

      I'm probably an old cynic, but this smells more like belatedly seeking forgiveness, as permission wouldn't have been given had they asked when they needed it.

  5. Anonymous Coward

    Why does OpenAI need to scrape Stack Overflow for bad code that often doesn’t work?

    Isn’t it smart enough to do that on its own?

    1. Morten Bjoernsvik

      Re: Why does OpenAI need to scrape Stack Overflow for bad code that often doesn’t work?

      They need to train it, and SO has lots of metrics for judging whether an answer is good or bad.

      But they have a major problem with old outdated answers with massive score that always pop up first. I think both parties will benefit. SO can probably get much better matching and a much-needed money infusion.

      1. John Brown (no body) Silver badge

        Re: Why does OpenAI need to scrape Stack Overflow for bad code that often doesn’t work?

        "But they have a major problem with old outdated answers with massive score that always pop up first."

        Shirley they'd take the age of the answer into consideration too? In the world of "coding", the age of the answer is often a seriously important metric.

      2. Dan 55 Silver badge

        Re: Why does OpenAI need to scrape Stack Overflow for bad code that often doesn’t work?

        SO has already had a money infusion as it was acquired by an investment group in 2021 for $1.8bn. This is the bit where they make it a whole lot worse to try and get that money back.

    2. Anonymous Coward

      Re: Why does OpenAI need to scrape Stack Overflow for bad code that often doesn’t work?

      I suspect you mean 'bad code that deliberately doesn't work' - i.e. wrong answers to questions seeded by the likes of Codility to catch people out trying to cheat on their interview tests.

  6. Howard Sway Silver badge

    OverflowAI

    At least they have named it well - it's hard to think of anything that better fits a StackOverflow trained AI than conjuring up an image of an overflow pipe pumping millions of gallons of raw sewage into a pristine river.

  7. Spock2

    What the AI will end up doing is posting zillions of questions, waiting until enough people comment asking for the answer to the same question because they're also stuck on it, and then posting 'It's ok - I worked out how to do it. Thanks!'

  8. Electronics'R'Us
    WTF?

    So the problem will just get worse

    A while ago (2019) a blog entry warned of vulnerabilities in code posted to SO.

    SO Blog

    1. ecofeco Silver badge

      Re: So the problem will just get worse

      Much, much worse.

  9. Bebu
    Windows

    Probably doing it wrong?

    Never had much joy whenever a query misled me to Stack Overflow.

    Signal-to-noise ratio extremely low for most stuff I was interested in, and a fair bit of pointless argy-bargy.

    The name is a bit of a giveaway: a stack overflow invariably corrupts data in a previous activation record and/or overwrites the return address, and then you are off to the boondocks.

    Some of the code I have looked at was just plain wrong, and given AI's penchant for hallucination, I dread to think of the future perils facing the cut-and-paste coder. Hint: get some decent texts on algorithms. Discrete math for computer scientists is pretty useful, as is a bit of formal logic and some exposure to formal methods in software development.

  10. david1024

    Surprising news!

    Software company whose major cost is programmers seeks to replace programmers with a machine that works for electrons and has constant, near-zero overhead when not in use.

    Also, surprised that company that makes a living helping programmers is having issues... I am sure it is unrelated.

    As a meatsack, I am so glad I'm on the (safe for now) hardware and arch side of things. But I hear ChatGPT is learning layout and xkt design.

    1. lukewarmdog

      Re: Surprising news!

      Read it as xkcd design.. fortunately not!

      https://xkcd.com/1838/

  11. mostly average
    Terminator

    Last night

    I was tinkering with an LLM last night and asked it to write a function to find the nth digit of pi. It wrote a function that converted the built-in pi constant to a string and returned the nth character. It then had a Stack Overflow-style comment-section argument with itself about how wrong the solution was, and went on to argue about the question itself. It was most entertaining, and it was clearly trained on Stack Overflow. I believe it was a Code Llama derivative.
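    For the curious: the string trick described above only holds up for the first 15 or so decimals, because a double-precision `math.pi` carries only about 15 to 17 significant digits, and index 1 of its string form is the decimal point rather than a digit. Below is a minimal sketch, assuming Python, of one correct approach: Gibbons' unbounded spigot algorithm, which streams the digits of pi using exact integer arithmetic. The function names (`naive_nth_digit`, `pi_digits`, `nth_digit_of_pi`) are hypothetical illustrations, not what the LLM produced.

```python
import math
from itertools import islice


def naive_nth_digit(n: int) -> str:
    # The approach described above: stringify the float constant and index it.
    # str(math.pi) is '3.141592653589793', so index 1 is '.', and any index
    # past 16 raises IndexError.
    return str(math.pi)[n]


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) indefinitely,
    using Gibbons' unbounded spigot algorithm with exact integer arithmetic."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain; emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough precision yet; fold in another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def nth_digit_of_pi(n: int) -> int:
    # Index 0 is the leading 3, index 1 is the first decimal digit, and so on.
    if n < 0:
        raise ValueError("n must be non-negative")
    return next(islice(pi_digits(), n, None))
```

    For example, `[nth_digit_of_pi(i) for i in range(10)]` gives `[3, 1, 4, 1, 5, 9, 2, 6, 5, 3]`. Each call restarts the generator, so if you want many digits it is cheaper to `islice` the stream once.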
