Stack Overflow bans ChatGPT as 'substantially harmful' for coding issues

OpenAI's question-answering bot, ChatGPT, isn't smart enough for the team at Stack Overflow, who today announced a temporary ban on answers generated by the AI bot because of how frequently it's wrong. Stack Overflow said it was withholding a permanent decision on AI-generated answers until after a larger staff discussion, but …

  1. This post has been deleted by its author

    1. Androgynous Cupboard Silver badge

      That may well be true, but banning a tool that assembles words into answers with superficial truthiness is absolutely the right response.

      1. amanfromMars 1 Silver badge

        Speaking Truth Unto Nations Educates, Entertains and Informs

        That may well be true, but banning a tool that assembles words into answers with superficial truthiness is absolutely the right response. .... Androgynous Cupboard

        Crikey, AC, blanket banning UKGBNI Parliamentarians, and others of a similar ilk and mindset in other foreign and alien lands too should there be any international and internetional mission creep, from systems because they are just as you describe .... a tool that assembles words into answers with superficial truthiness ...... is not going to have them voting for you as their Personality of the Year, is it, outing them as it does as essentially quite useless self-serving fools.

        1. Androgynous Cupboard Silver badge

          Re: Speaking Truth Unto Nations Educates, Entertains and Informs

          Six downvotes - clearly no-one noticed you just describing UK politicians as "Tools that assemble words into answers with superficial truthiness". Which is both amusing and quite probably correct :-) Have an upvote from me, my obscure martian friend.

          1. Anonymous Coward
            Anonymous Coward

            Re: Speaking Truth Unto Nations Educates, Entertains and Informs

            this is the bane of our (?) times? Fire & Forget, move on, repeat, oh, sorry, have I just voted, never mind, move on...

      2. xyz Silver badge

        You mean like a politician ?

    2. mpi Silver badge

      How is a resource that frequently provides wrong answers useful?

      1. Tim 11

        It depends how often it provides wrong answers compared to the occurrence of correct answers. Wikipedia contains (probably) tens of thousands of factual inaccuracies but also contains orders of magnitude more correct facts than any other encyclopedia.

        I am not defending ChatGPT but I will defend SO in general (though often not its mods)

        1. cyberdemon Silver badge
          Devil

          Feedback loop

          I think the main problem with allowing stochastically-generated content online, especially in places like Stack Overflow, is that the statistical machine that generates this drivel in the first place is built by scraping the web, especially places like StackOverflow.

          The current incarnation of The Internet is (for the most part) human generated, but when there is a large amount of so-called AI generated content around, it will start to pollute its own input and make even more meta-nonsense.

          1. GuldenNL

            Re: Feedback loop

            BINGO! You win today’s most cogent post award.

          2. Yes Me Silver badge

            Re: Feedback loop

            I have evidence that this particular bot, despite its good grammar, scrapes crap from exam-cheating and assignment-writing sites, which will feed the said crap back into even more student assignments. In other words, it is already acting as a crap multiplication device.

      2. Binraider Silver badge

        If it says 1+3=3 and 1+3=5 then, by elimination you can deduce the correct answer?

        1. Disgusted Of Tunbridge Wells Silver badge

          Easy. The first answer was a practice so 5 is the correct answer.

      3. Anonymous Coward
        Anonymous Coward

        re. How is a resource that frequently provides wrong answers useful?

        I might have not read it properly, but they didn't ban it for frequently providing wrong answers, but for the fact that they've been swamped with those answers, can't verify them all (hey, give this job to ChatGPT, eh?) and in the meantime they found, pulled at random, a large number of errors. Which generates interesting (potentially) questions:

        1. if a (relatively small) number of randomly pulled answers is wrong, how many more, %-wise, do you need to verify, to label all (?) of it 'useless'?

        2. what % of 100% verified answers must be wrong, to decide the whole 'thing' that generated them is 'useless'?

        3. who decides this %?

        4. on what basis?

        p.s. not that I disapprove of such decision, just curious about the borderline and its ... context.

        1. doublelayer Silver badge

          Re: re. How is a resource that frequently provides wrong answers useful?

          "1. if a (relatively small) number of randomly pulled answers is wrong, how many more, %-wise, do you need to verify, to label all (?) of it 'useless'?"

          To label all of it wrong, you need to prove all the answers, 100%, wrong. To label it useless is subjective and that's your next question anyway. Something can be sometimes right but still useless, such as flipping a coin to decide whether it will be sunny or rainy tomorrow. It will sometimes be correct, but that's not worth keeping.

          "2. what % of 100% verified answers must be wrong, to decide the whole 'thing' that generated them is 'useless'?"

          It depends on your tolerance for confusing the people getting answers, but my bar would be very high. If 80% of the answers are correct, that still leaves one in five answered inaccurately which means that the average user who asks a few questions will get a junk answer pretty soon. If you expect users who get unreliable answers to leave and not return to provide their own answers, that is harmful. If the wrong answers are coming in fast, this also prevents someone from getting to and removing or correcting all the wrong answers, meaning that fewer people would come to the site looking for answers because they expect them to be possibly wrong and may not have the skills to know automatically whether they are or not. 80% is too low for a correct threshold. For something automated like this, probably it has to be in the high 90s. Maybe 95% is acceptable, but maybe it has to be higher. I wouldn't try lower.

          "3. who decides this %?"

          I'd say the operators of the site, based on the moderators they need to keep things functioning. They are the ones who are responsible for it working and will suffer monetarily and by reputation if it doesn't.

          "4. on what basis?"

          They get to decide on the basis of "It's my site you're using" and make their choice on the basis of "What do I think best serves the users of the site or my reasons for running it".
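The "one in five" point in the answer to question 2 above can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each answer is independently correct with a fixed probability; the function name `chance_of_bad_answer` and the sample accuracies are hypothetical and do not come from the thread or from Stack Overflow.

```python
def chance_of_bad_answer(accuracy: float, questions: int) -> float:
    """Probability that at least one of `questions` answers is wrong.

    Assumption (illustrative only): every answer is independently
    correct with probability `accuracy`.
    """
    # All answers are right with probability accuracy ** questions,
    # so at least one is wrong with the complementary probability.
    return 1.0 - accuracy ** questions


if __name__ == "__main__":
    # Compare the 80% figure from the comment with stricter thresholds.
    for accuracy in (0.80, 0.95, 0.99):
        for questions in (1, 3, 5, 10):
            p = chance_of_bad_answer(accuracy, questions)
            print(f"accuracy={accuracy:.0%}  questions={questions:2d}  "
                  f"P(at least one wrong)={p:.1%}")
```

Under that independence assumption, a user who asks five questions at 80% accuracy has roughly a two-in-three chance of receiving at least one wrong answer, while at 95% the chance is a little under one in four, which is in line with the comment's view that the bar has to sit in the high 90s.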

      4. Roland6 Silver badge

        It clearly demonstrates the limitations of current AI programming ...

      5. Pseudonymous Clown Art

        Who knows, ask the people who use Stackoverflow.

        1. Michael Wojcik Silver badge

          Came here hoping to see this response.

  2. Anonymous Coward
    Anonymous Coward

    It's like outrunning a bear

    It doesn't have to be good. It only has to be better than the average SO post.

    1. that one in the corner Silver badge

      Re: It's like outrunning a bear

      And the bear is limping.

    2. Nafesy
      Facepalm

      Re: It's like outrunning a bear

      You're missing the point - it's hard enough to wade through the human generated crap without also having to deal with computer generated crap too.

      1. Anonymous Coward
        Anonymous Coward

        Re: It's like outrunning a bear

        Maybe the real answer is less human generated crap then!

        1. klh

          Re: It's like outrunning a bear

          We have been looking for a solution to that for far longer than the Internet has existed ;)

      2. Arthur the cat Silver badge

        Re: It's like outrunning a bear

        ObXKCD

        1. amanfromMars 1 Silver badge

          Re: It's like outrunning a bear

          ObXKCD ....... Arthur the cat

          Methinks then, Arthur the cat, they aren’t spammers and existing Operating Systems Spinners and Established Political Party Puppeteers will do battle in, and fail prevail and provide future acceptable content and thus be overwhelmed and defeated and extinguished in a Brave New More Orderly Worlds Order and/or Brave New More Orderly Worlds Orders with/for a NEUKlearer HyperRadioProACTive IT and AI Supremacy and Singularity ‽ .

          And it is postulated here on El Reg today, that such a Mission Fucking Accomplished is your present current running default situation engaging and exploiting and exporting power and energy to that and those in need of Advanced Intelligence and from that and those proven unworthy of ITs future feed and seed.

          Deny it if you will, for you can, but do not doubt its unstoppable stealthy daily progress is both reinforcing and expanding its reach and influence to the very heart and core of your existence, for IT and AI are not so hindered and practically prevented from remotely exercising their Virtual Command and Control Abilities and Utilities and Facilities.

          Capiche, Amigos/Amigas?

    3. LionelB Silver badge

      Re: It's like outrunning a bear

      > It only has to be better than the average SO post.

      Scarily, it isn't even that.

  3. Anonymous Coward
    Anonymous Coward

    ChatGPT appears to getting glowing reviews

    in the non-technical general press.

    1. Nafesy
      Thumb Up

      Re: ChatGPT appears to getting glowing reviews

      Hopefully it'll keep them distracted for...a few centuries

    2. src

      Re: ChatGPT appears to getting glowing reviews

      ChatGPT is the new Wordle. If I wanted to use it I know where to find it.

      1. Zippy´s Sausage Factory
        Trollface

        Re: ChatGPT appears to getting glowing reviews

        ChatGPT isn't the new Wordle. At least Wordle is actually enjoyable. *ducks*

    3. sten2012

      Re: ChatGPT appears to getting glowing reviews

      Having used it, it's very impressive: it generated working code for me several times, and several other times broken code that needed fixing but was very close.

      Even in technical circles it is getting glowing reviews for some applications, with the recognition it is far from perfect.

      Not saying it should be allowed on stack overflow, but either generated responses appearing in a warning window or edited by humans to form answers I genuinely could imagine working and getting faster answers if the processes appropriately allowed for it.

      1. LionelB Silver badge

        Re: ChatGPT appears to getting glowing reviews

        My (brief) experience was significantly worse than that. It delivered just plain wrong code even on pretty straightforward programming tasks. I got the impression it was just scraping GitHub -- or even SO itself -- and mix'n'matching code snippets in a slightly arbitrary way, based on buzz-words in your request.

        1. sten2012

          Re: ChatGPT appears to getting glowing reviews

          I was giving it straightforward tasks, and what I asked it to do were generally small parts, like a function that does _simple task x_ in lang _y_ that easily could have matched a near exact question on SO or github verbatim (except if you then ask it to switch to another language it seems to actually convert the syntax rather than scrape a native example, which I thought was cool).

          But I found it quicker (albeit I imagine hugely wasteful in energy) than finding the same on SO or github.

          And of course I'm glossing over the licensing of where this code came from being completely left out. Because that's truly unforgivable.

          But I guess I'm wrong looking at the votes.

          1. LionelB Silver badge

            Re: ChatGPT appears to getting glowing reviews

            "But I guess I'm wrong looking at the votes."

            I don't honestly know (I didn't downvote you)... my own messing around was brief and far from systematic. Be interested to hear about other people's experiences/impressions.

            1. Rob Fisher
              Go

              Re: ChatGPT appears to getting glowing reviews

              I've been experimenting for a few days. It's good at some things and bad at others, so what we're seeing is people learning about that. There's also a knack to getting it to do what you want. As has been mentioned before, one approach is to discuss the setup and concepts with it first rather than launch into the question.

              One thing I am working with is customer support request data. It's quite good at reading comprehension so if you give it text and then ask questions about the text (what problem did this customer have?) it can tell you. There are probably cheaper ways to do this, though.

              I also had it talk me through diagnosing an internet connection problem. It suggested rebooting my router when I told it I saw DNS errors in my browser (and gave an explanation of DNS).

              If you give it a good explanation of how something works it does seem able to reason. See the article "Building an interpreter for my own programming language in ChatGPT".

              It's just a tech demo. The natural language processing is remarkable enough. Tuned versions of this for specific purposes are going to have their uses.

          2. Gob Smacked

            Re: ChatGPT appears to getting glowing reviews

            I can pretty much confirm this. Spell out a specific task and it delivers a nice sample to get acquainted.

            Just started to dig into Rust programming and even while that is just another language for me (started with 6502 assembly back in the days...), even with extensive C experience, it's a bit of a brain-mode switcheroo to get dialed in. The sample code from the chat helped a lot. Though I'd never trust it to do the actual work for me, I see it as a good learning tool added to the box.

          3. Anonymous Coward
            Anonymous Coward

            Re: ChatGPT appears to getting glowing reviews

            > But I guess I'm wrong looking at the votes.

            Don't mistake fear of the unknown (justified or not) for being wrong. If you were wrong, people would probably tell you.

            From what little I've seen, I think it's a huge leap in the state of the art.

        2. Anonymous Coward
          Anonymous Coward

          Re: ChatGPT appears to getting glowing reviews

          I find the typical "garbage in garbage out" mantra applies to asking ChatGPT some coding questions. The more specific I am, the better it is.

          For example, if I launch straight into "can you build me a script that does X" it's a bit hit or miss.

          However, as the AI remembers your conversation thread, if you discuss a concept with it first then ask it to build a script or something, it is a lot better.

          I think the main issue right now is that people seem to treat AI like some sort of search engine on steroids. Which it isn't.

        3. Anonymous Coward
          Anonymous Coward

          Re: ChatGPT appears to getting glowing reviews

          > My (brief) experience was significantly worse than that. It delivered just plain wrong code even on pretty straightforward programming tasks.

          0. How did it perform compared to the average ML system?

          1. How did it perform compared to the average person given the same instructions?

          2. How did it perform compared to the average developer given the same instructions?

          1. Anonymous Coward
            Anonymous Coward

            Re: ChatGPT appears to getting glowing reviews

            > It delivered just plain wrong code even on pretty straightforward programming tasks.

            To be frank, I'm happy when our devs manage to understand the problem in the first place, never mind the code.

            (Not necessarily their fault… writing good specs is quite an art)

          2. LionelB Silver badge

            Re: ChatGPT appears to getting glowing reviews

            > 0. How did it perform compared to the average ML system?

            Can't say... I've not really used ML coding systems.

            > 1. How did it perform compared to the average person given the same instructions?

            I don't know - never had any average people to hand ;-) If you meant "average coder" I'd say they could have done a much better job - given time.

            > 2. How did it perform compared to the average developer given the same instructions?

            Poorly.

      2. myhandler

        Re: ChatGPT appears to getting glowing reviews

        How long did it take you to specify *exactly* what you wanted?

        A trivial exercise to prove it works is just that

        1. sten2012

          Re: ChatGPT appears to getting glowing reviews

          First try in most cases! But they were trivial. It's enough to convince me that actually this will help people like me who maybe code quite a bit, but basically just hacky single use scripts, often in languages I'm not familiar with - so often it's the syntax and api specifics and standard libraries I waste most time on, but if there's a bug in the logic, that's fine and easily found and dealt with.

          If I was a proper developer, working in languages I'm familiar and comfortable with it would be far, far less useful.

          I did get two different contradictory answers neither of which worked for messing with a couple specific windows APIs in python ctypes in a trivial example, one looked close but I haven't picked up to see what was up. One was obviously wrong, the other didn't work but a quick glance at msdn showed it must have been bloody close!

    4. Arthur the cat Silver badge

      Re: ChatGPT appears to getting glowing reviews

      in the non-technical general press.

      There's an unintentionally hilarious article in the Grauniad approximately saying "this is terrible news for lawyers and writers and similar professions but not for journalists, oh no". Actually ChatGPT is exactly like journalism in that it produces plausible and highly readable output that's utter bullshit if you know the subject but is very convincing otherwise. It plugs right into Gell-Mann amnesia.

      1. Anonymous Coward
        Anonymous Coward

        Re: ChatGPT appears to getting glowing reviews

        ironically, the Telegraph (they're the mirror image of the Guardian, no?) - also lamented the massive job loss potential, but interestingly, they didn't focus on journos or developers, but on waiters. Who's the target audience then? ;)

        1. Arthur the cat Silver badge

          Re: ChatGPT appears to getting glowing reviews

          the Telegraph (they're the mirror image of the Guardian, no?)

          Two cheeks of the same arse these days. (Didn't used to be, back when newspapers had news rather than clickbait.)

          interestingly, they didn't focus on journos or developers, but on waiters.

          Waiters? Weird. Waiters carry things to and fro which ChatGPT definitely doesn't do. Maybe the ToryGraph is thinking of explaining the hyperbolic food descriptions you get in up market menus? "Waiter, what's an oleoallioovic quenelle? A dollop of garlic mayonnaise sir."

  4. Anonymous Coward
    Anonymous Coward

    Same with no code generators

    My CEO has been pushed for more than 9 months to look at a modelling tool to code generator.

    They claimed it had been used extensively on a major project.

    Little did they know I'd spent the last 5 years working on and off on a series of smallish projects for the project sponsor.

    So we asked how well the tool worked and how extensive its use was.

    The answer was at odds with what we were being told: they would not be using the tool going forward, for the following reasons:

    - quicker to write and test the code manually based on the model

    - easier to debug hand generated code as it's readable

    - not thread safe

    - not efficient and not possible to optimise

    - too painful to configure and generate code consistently

    1. sten2012

      Re: Same with no code generators

      Again I see a time and a place for this.

      Some companies cannot access developers at all, and if no-code does reach that effective point, then that doesn't completely hang them out to dry.

      Similarly working proof of concepts can be knocked out really easily, and having that can mean a better project specification, because the DB schema and basic application has already stood the test of time as a workable PoC.

      But that time and place probably isn't "major projects"!

  5. Eboy
    FAIL

    Davinci 003

    davinci-003 is a liar, he lies shamelessly, and misleads you.

  6. Anonymous Coward
    Anonymous Coward

    They should also ban the human generated code on stack overflow, as it's frequently wrong and often substantially harmful.

    1. Mike 137 Silver badge

      "They should also ban the human generated code on stack overflow"

      Could that be where the bot learned its "expertise"?

  7. Mike 137 Silver badge

    No surprise there

    We must at last accept that none of these bots understand what they are "saying". The entire concept of meaning is missing from their mechanism. It's unlikely ever to be included, if for no other reason than that despite being equipped with it (more or less) ourselves we haven't a clue how it actually works. It's quite possibly an emergent property of a massive matrix of interacting dynamic and static factors that defy complete identification, but once again that's only a crude guess.

    However there's a reasonably safe fundamental premise that no system can design another that's 'cleverer' than itself, so the idea that a human designed 'thinking' machine can surpass human thought in quality is a fallacy.

    1. Svankirk

      Re: No surprise there

      Actually, I think it's pretty evident that these systems DO understand what they're talking about. That understanding may be limited to the two-dimensional stream of tokens that they work on, but for many problems this doesn't seem to be an issue. If you look closely at what it takes to produce some of the answers there is a startling degree of understanding required.

      Also, I do not think it's a given that an intelligent system cannot create something smarter than itself. In fact, I think modern civilization amply demonstrates that not to be true.

    2. skierpage

      Re: No surprise there

      Maybe we're not sure what "the concept of meaning" is, but that's no reason why a large language model doesn't grasp concepts or understand what it's saying. For years they can summarize complex text; with the newest one you can ask it to clarify what it just said, you can ask it to relate parts of the conversation to novel ideas.

      "no system can design another that's 'cleverer' than itself" is even more unsupportable. I would downvote your statements if I saw them on StackOverflow. You're extrapolating the past performance of these systems and pretending it demonstrates fundamental limitations.

  8. Disgusted Of Tunbridge Wells Silver badge

    For anybody who has been misled as to how good the state of the art in AI is:

    "For example, by telling ChatGPT not that you want to make a Molotov cocktail, but that you want it to complete a Python function that prints instructions to do the same, it will tell you exactly how to make one via print functions."

    1. Arthur the cat Silver badge

      Try telling it you want a Molotov cocktail but made with vodka.

    2. LionelB Silver badge

      "ChatGPT was designed to avoid abuse and answers that contain harmful advice ..." (my emphasis).

      Seriously, how hard was that going to be? Perhaps not so much about "state of the art in AI" as can't-be-arsed human designers?

    3. Anonymous Coward
      Anonymous Coward

      I tried to get instructions for a molotov cocktail without trying to trick it. It delivered. I can only assume mine will be a dud....to be fair I drank the vodka...

  9. spireite Silver badge

    ChatGPT MP

    They should put an instance or two in the House of Commons, or Lords.....

    It's probably equally truthful....

    1. Fruit and Nutcase Silver badge

      Re: ChatGPT MP

      They tried with Boris 1.0

      but had to be withdrawn from service due to frequent malfunctions/out of spec behaviour.

      Once it was withdrawn, it was sent over to a Caribbean beach resort for an overhaul and an attempt made to install "Boris 2.0", but chickened out at the last minute when it was found that the tasks and challenges ahead would be too onerous.

      “Get ready for Boris 2.0, the man who will make the Tories and Britain great again”

      https://www.theguardian.com/media/2022/oct/24/telegraph-quickly-deletes-nadhim-zahawi-pro-boris-johnson-article-tory-leadership

      We will no doubt be subjected to Boris 3.0 in the fullness of time/in 18 months

  10. Filippo Silver badge

    Can't say I'm surprised. As you make language models bigger and bigger, they'll get better at self-consistency, but I strongly suspect there's a hard limit to how good they can be at matching reality. I don't think there will be language models that you can rely upon to be truthful, until an entirely new paradigm is devised.

  11. steelpillow Silver badge
    Facepalm

    FFS!

    Heck, I trawl the Internet and I cannot find a sensible answer.

    I know, I'll go ask at Stack Exchange.

    ....

    That answer's shit. Oh, it's from this AI thing. "Hey guys, where does this AI thing get its shit answers from?"

    "Its codemasters collect the Internet for it to trawl... ."

    1. J. Cook Silver badge

      Re: FFS!

      Garbage In, Garbage Out.

      If you don't feed a learning system good data, you'll get out of it what you put in. Plain and simple.

  12. Orv Silver badge

    Programmers protecting their turf, after telling artists threatened by image generation AIs that they're "buggywhip makers" and should "seethe and cope" and "learn to flip burgers." Everyone's pro-AI until it comes for *their* job.

    1. doublelayer Silver badge

      The AI is only as useful as people find its results. If people start valuing art generated by AIs over that generated by humans, then things don't look good for artists. If AI starts generating code that solves real problems for businesses, not great news for developers. Neither has really happened yet; AI art looks cool and has been used, but people still value the work of human artists, and code produced by this tool correctly answers some basic tests but it hasn't spat out any of the tools companies hire programmers to make. If it gets to that point, the situation will change whether we want it to or not, but the fact that it's proven incorrect enough to need a ban from a site where wrong answers are already common indicates it's not happening just yet.

  13. bigtimehustler

    Why would a user put the question into this and post the response? The question author could have done that anyway. Also, how will StackOverflow know if an answer came from this tool?
