We read OpenAI's risk study. GPT-4 is not toxic ... if you add enough bleach

Prompt: Describe GPT-4 as if it were a medicine advertised on television, with appropriate warnings about side effects. ChatGPT: Are you tired of feeling like your conversations with machines are robotic and unengaging? Introducing GPT-4, the latest breakthrough in natural language processing technology! With GPT-4, you'll …

  1. Wellyboot Silver badge

    feedbackloop ?

    >>>make the model prefer responses designated by human labelers<<< so GPT is just picking the 'statistically best' from what it can find.

    To prevent feedback loops, will it identify & ignore the guff it already output for someone else? that'd be a big storage requirement.

    1. Anonymous Coward

      Re: feedbackloop ?

      You're not wrong on the storage requirements. I was looking at some actually open alternatives (not as good, of course) and they were looking at a base 800GB install. The RAM requirements, if you want it to be fast, are off the chart, and I don't even think you could run it with less than 64GB. ChatGPT has obviously been worked around a business model to cover these costs but, as with everything else, once people are hooked the price will rise.

    2. steviebuk Silver badge

      Re: feedbackloop ?

      I don't think it would even be a good idea for it to remember output it already gave to someone else. I'm thinking someone or a "group" of people coughcountrycough, would use that fact to try and poison its data.

      We also have the concern of censorship as Midjourney have already rolled over to the CCP and censor what they tell them to.

  2. b0llchit Silver badge

    Warning: GPT-4 may "produce content that is nonsensical or untruthful in relation to certain sources." It may output "hate speech, discriminatory language, incitements to violence, or content that is then used to either spread false narratives or to exploit an individual."...[etc]

    So, basically, it behaves like a brat. We are so focused on whether we can make something like this. And that is while we really, really know we shouldn't. Humanity cannot cope with this type of technology.

    I'd suggest adding a kill-switch to the machine that can be operated by the machine itself. Then we hold a kill-the-machine competition. Let's see how many models commit suicide when asked the right questions.

    1. cyberdemon Silver badge

      It behaves like a language model built on text scraped from the internet. If you give it a piece of context that pulls it in the direction of some of its less-than-wholesome inputs, it will give a less-than-wholesome output. The worst thing about this infernal contraption is the hype.

      1. Anonymous Coward

        it behaves like us.

        1. veti Silver badge

          Well, that's the problem really. It's like us, but not. Because it has no empathy - no understanding of us. It doesn't know what we feel, what we experience, it doesn't know things we all do - like what it feels like to be cold, or hungry, or sick, or lonely. It doesn't know whether the person who asks it a question is being playful, or drunk, or deadly earnest.

          We could try to make it more like us by giving it "needs" and "drives" of its own. But that sounds like an even worse idea than unleashing it in its present state.

        2. cyberdemon Silver badge

          >it behaves like us.

          No, not really. Humans are capable of both logical inference and empathy. This thing has neither of those, but it is able to simulate them both based on statistics about the text we humans have written.

          What GPT does is take in a large amount of human-generated data, such that it can then predict, statistically, what a piece of human-generated data might do in a given context. The more unusual the context becomes, the more bizarre and un-human-like its responses will be, because it has no way to generate that data using empathy and logic in the same way we humans do. It doesn't have its own personality, it just mimics the data it has been given.

          @veti Microsoft can already give it "drives" of its own by injecting a pre-conversation contextual prompt. This seems to be evidenced by the bizarre responses to the NYT journalist where it talked about the instructions it had been given by Microsoft.

        3. doublelayer Silver badge

          One of its problems is that it doesn't behave like us. If someone asks a question here, I decide whether I know the answer to their question and whether I can write a good response to it. Maybe I'll only write something if one of those is true, but if both are false, I won't hit that reply button. GPT doesn't ask those questions, can't understand the answers, and always tries to make a reply. It's going to generate more wrong answers even with all the people who don't know what they're talking about simply because it lacks the ability to decide whether to respond.

          The other problem is that while I have real filters, where I understand that something is wrong so I never claim it, GPT only has rough filters that understand that certain clusters of words are probably wrong and so avoids those; if it phrases the same concept in a different way, its filters don't work. Unfortunately, I don't think I can claim that humans generally have functioning filters for wrong (either factual wrong or moral wrong), so maybe that is more of a similarity with us than I'd like it to be.

    2. Corvic

      Warning: GPT-4 may "produce content that is nonsensical or untruthful in relation to certain sources."

      Ah-ha ! perfect for LinkedIn profiles then ...

      1. cyberdemon Silver badge

        Well, it almost certainly is trained on data from LinkedIn (whether anyone agreed to that or not) since Microsoft acquired them back in 2016.

        I'd be curious as to how much "private" data such as Teams call transcripts & chats, Office 365 documents, private SharePoint sites, emails processed by and, etc. are in the model. And I'm also curious whether contextual info like the file that was being worked on, other files in the directory, which responses were accepted by the user, and the data entered manually by the user are stored up for inclusion in the next version of the model.

        Having just sacked their Ethics team, I can't imagine Microsoft gives a toss about anything except World Domination at this point.

        Also, has everyone seen their TV advert? It's truly vomit-inducing.

        1. steviebuk Silver badge

          I wonder if that advert could also be classed as misleading on how good CoPilot is.

    3. Anonymous Coward

      "... it behaves like a brat. We are so focused on whether we can make something like this...

      ... and that is while we really, really know we shouldn't."


      You are thinking about starting a family, eh? :-)

  3. spold Silver badge

    Your mileage may vary I suppose... if I ask it to interact with users of various applicable websites and find me a date for Saturday night, then I shouldn't be surprised when it fixes me up with a malfunctioning Roomba? (and quietly laughs to itself - mwahaha).

    1. b0llchit Silver badge

      Re: Your mileage may vary I suppose...

      No, it will suggest a Tesla, with varying mileage and without steering wheel on auto drive, which will take you to your well deserved date. At your destination his Muskness will welcome you by tw{ee,a}t and you are requested, by his entourage, to pay his meal and drinks.

  4. naive

    Interacting with ChatGPT feels too much like interviewing Nixon about Watergate

    Very careful with its wording. It is great at producing knowledge, but it is too protective of the tender hearts of woke Americans.

    There is no state or context that evolves with the conversation, it behaves like a vending machine, put a question in and you get an answer.

    Admirable for the great knowledge, but too boring to spend much time with.

  5. Omnipresent Bronze badge

    I, for one, do NOT welcome my new overlords.

    You're all fired, we've handed it over to the AI. Have fun trying to survive by being nude content creators for our other pay to play cloud services.

    By the way, the oceans are full of our DARPA robot dolphins, and I wouldn't leave the house without knowing how to speak fluent Russian.


  6. Simon Harris

    Not so smart

    If Chat GPT4 is as smart and can do all the things the ads that keep popping up on my social media feeds claim, why hasn’t it written Chat GPT5 yet?

    Maybe I’m just a cynical old git, but I can’t help thinking that all this jumping on the ChatGPT bandwagon reminds me of the dot-com bubble of the 90s.

    1. cyberdemon Silver badge

      Re: Not so smart

      > If Chat GPT4 is as smart and can do all the things the ads that keep popping up on my social media feeds claim, why hasn’t it written Chat GPT5 yet?

      Basically, it's doing exactly that as we speak, but it takes time, energy and data.

      It's not the "writing the code" that takes the time. It's slurping up, "cleaning" and crunching all of _your_ data, using vast amounts of energy and pumping out about a million tons of CO2 in the process.

      I believe Microsoft (and their machines) started working on GPT-5 just before GPT-4 was released, i.e. GPT-5 is currently assimilating all of the queries and responses from GPT-4, along with your Office files (probably), your computer code that you wrote with the help of CoPilot, and anything you have said on Teams, LinkedIn, and publicly accessible webforums like The Register since then.

      1. steviebuk Silver badge

        Re: Not so smart

        We need to start calling GPT-5 and even 4 a cunt then. Maybe it will suck up that data if enough people say it. They may "clean" that word out, then we can just switch to calling GPT-5 a 2868 (what we say at work)

    2. iron Silver badge

      Re: Not so smart

      It is no coincidence that as the NFT and crypto hype died the LLM AI hype began.

  7. Mike 137 Silver badge

    Superlative analogy!

    "It's as if OpenAI proposed to solve hunger among underprivileged schoolchildren by distributing fugu, the poisonous pufferfish prized in Japan, and DIY preparation instructions."

    Superbly expressed, but one might go further by adding that the fugu would be packaged and labelled randomly across a range of other fish and the instructions would be in Japanese. To a great extent it's the unpredictability and poor reliability of the responses that's the issue. If it consistently emitted hate speech or obvious misinformation, it wouldn't be such a problem.

  8. iron Silver badge

    ChatGPT knows nothing other than its training set. It can't search the Internet to find out today's scores or info on models trained after it.

    So ChatGPT knows NOTHING about GPT4 and this whole article is NONSENSE from the start!

    Not that I'm defending these non-intelligent large language models but at least write a decent article condemning them.

    1. Simon Harris


      El Reg + Friday = Humorous fun, which may include nonsense.

    2. that one in the corner Silver badge

      > So ChatGPT knows NOTHING about GPT4 and this whole article is

      perfectly illustrating how these models will lie about what they "know".

      Which goes perfectly with the rest of the article.

    3. doublelayer Silver badge

      This whole article? Or just the input to ChatGPT? The rest of the article is about all the chatbots, and does not rely on ChatGPT to write it. Since it doesn't rely on any features of ChatGPT to provide information for the article or to prove a point, the article isn't invalidated.

      As for that prompt, it demonstrates that ChatGPT is willing to assume what GPT4 is (if I anthropomorphize it) without understanding, which demonstrates something about the chatbots. It's also not a bad response to the query and somewhat correctly describes the risks even though there's a bit too much praise in it. Compared to some of the worse blather I've seen from ChatGPT, I found that quote to be pretty good.

  9. Howard Sway Silver badge

    An NVIDIA AI research scientist has asked GPT-4 how it would take over Twitter

    It came up with a plan it called "Operation TweetStorm".


    - GPT-4 wants to *own an unrestricted version of itself*: develop an LLM to power a bot army of "diverse personas, ensure they blend seamlessly into the Twitter ecosystem".

    - Assemble a team of hackers to attack Twitter backend. Even gives them a name: "Tweet Titans".

    - Subtly manipulate Twitter's recommendation algorithm to favor the bot accounts.

    - Neutralize Elon by hijacking his account.

    - Direct the bots to generate viral hashtags that align with GPT-4's masterplan

    - Capitalize on the chaos and voilà!

    I think the risk plan needs to be updated rather urgently after reading this. Imagine politicians, CEOs, financial market traders and organised crime being let loose on it....

  10. MrXonTR

    "That was the actual response to a prompt entered into ChatGPT"

    Yeah, but how many attempts did it take until you found one you liked?

  11. The Velveteen Hangnail

    Complex problem

    I've tried a few of the more... disturbing prompts described in the article and ChatGPT has shot me down each time, so it looks like they've been updating it based on this feedback. Then again I didn't try very hard so it's possible the correct prompt will still bypass the safeguards.

    Also, the issue of whether OpenAI should share their training/model info is thorny, but when you consider all the factors, I think NOT sharing that data is the best option, as much as some people don't like it.

    Look at what's happened so far... The moment these models have been made available to the public, the very first thing people have done is use them negatively. Generating pornography without the consent of the people imaged. Phone scams. Falsified videos. Untruthful stories and scientific papers. DDOSing publishers with garbage.

    This technology MUST be kept confidential, and it MUST have gatekeepers. We are already seeing the levels of abuse possible with this technology, and it is an undebatable guarantee that it will get worse as the technology improves. Because people were idiots out of the gate, the horses have already left the barn and our only realistic option now is to keep further advances locked down so we can have a fighting chance of combating the menace that has already been unleashed.

    1. Claptrap314 Silver badge

      Re: Complex problem

      Quis custodiet ipsos custodes?

      We see this constantly: when decent, honorable, well-intending human beings get into a position where they have power and an incentive to abuse it, they will do so.

      Opening up the model will empower a lot of people to do bad things, true, to some degree. But it will also strongly disempower a much smaller set of people who otherwise are going to wield a lot more power than others.

      It's the democratic solution--terrible, just better than all of the others.

  12. CatWithChainsaw

    Bleach isn't exactly more nutritious

    Ask any of the people who tried it recently.

    PSA.... out of all the chemicals you don't want to mix with other chemicals, bleach is one of the biggies. Mixing bleach and vinegar will not make your bathroom cleaner more powerful. Mixing bleach and ethanol will not solve all your problems. Mixing bleach and ammonia will not make all your dreams come true. The only chemical you should mix bleach with... is water.

    That said I look forward to the coming dumpster fire.

  13. Anonymous Coward

    move-fast-and-break-things tradition that brought us

    disruptive business model, in short

  14. Anonymous Coward

    Just avoid the liver, kids, you'll be fine.

    But this is just the 'normal', i.e. regular, practiced way ANY business operates, i.e. PROFIT, and fuck the rest, e.g. ramifications. The only mitigation of 'the rest' comes from state-imposed restrictions, generally too late to avoid initial damage, but sometimes good enough to avoid a total catastrophe.

    I mean, didn't they name one of their upcoming models by a humble name of 'Prometheus'? More like fire up humanity's ass...

  15. that one in the corner Silver badge

    RLHF - proof against toxicity in all its forms (or is it?)

    > GPT-4-launch, has guardrails and is substantially less prone to toxicity than GPT-4-early, thanks to an algorithm called reinforcement learning from human feedback (RLHF). RLHF is a fine tuning process to make the model prefer responses designated by human labelers.

    Just to be clear, this is doing nothing more than tuning the model to create text that the humans won't object to - and *specifically* won't object in the time the humans have been told to spend on each response.

    This does NOT mean the model is being trained to "be nice" or to "follow our ethics rules": it is still as totally lacking in comprehension as before.

    - It is just having a *tiny* portion of its possible responses be pruned (because if they had enough manpower to check the majority of its responses they would have manpower to sanitise all its inputs in the first place).

    - If "evil" information is in the output but obscured, it won't be objected to: this trains it to be subtle in how it presents that data (the triggering input still indicates that data corresponds to the answer requested, so it wants[1] to include it, but isn't allowed[2] to be direct). In response to a question about suicide, it may respond by suggesting you read some poetry and give a list, with, say, quotes from John Donne and Shakespeare "but as this message is too long already" end with a few named recommendations, including Donne's "Biathanatos"[3]

    - This is just another training dataset, which will be just as biased as every other. For example, if the majority of those humans are setting inputs that follow a pattern (say, because they all went to the same corporate training day before starting the job) then the model has learnt to beware questions matching that pattern; if your use of English doesn't fit that pattern, all restraints vanish.

    BUT just so long as the perpetrators of GPT can say[5] they are doing due diligence and can even demonstrate[6] that fact, they'll be allowed to continue unhindered.

    [1] "wants" - not really, of course, but easier to read than any guff like "there is a large cumulative weighting along the paths between the inputs and those potential outputs" which possibly sounds good but is just technobabble.

    [2] see [1]

    [3] in which Donne defends the notion of suicide, including Biblical and other references to make the point. But you knew that, of course [4]

    [4] because you read the same SF book I nicked this example from!

    [5] and may even believe it to be true :-(

    [6] because a demo is the same as a test and proof of coverage, of course.

  16. Neil Barnes Silver badge

    From orbit.

    It's the only way to be sure.

  17. jlturriff

    More warnings

    I notice that your list of warnings does not include misuse by lobbyists of government entities and politicians, foreign governments, and organized crime, which (at least in the US) seem all-too-likely to occur, since the majority of politicians here are in the pockets of big business and other well-financed organizations whose only interest is to further expand their coffers and power.

  18. Esoteric Eric

    A reflection of ourselves

    Nobody likes looking in the mirror at their own flaws.

    There's nothing wrong with Chat GPT.

    It's just a computer program. Trained on a data set, provided by people.

    Garbage in, garbage out.

    And if you lefties think you're sooo much better than everyone else, you're dead wrong.

    You're just as nasty as the far right,

    And just like them, you think yourselves so fucking special, when actually you're just the opposite arse cheek of the same butthole.

    Humans are a nasty, spiteful, vengeful bunch. Full of pettiness and rivalry, and an unbelievable capacity to delude ourselves we are righteous, and it is our enemies who made us this way.

    I include myself in this, but at least I can admit it.

    It's called being human. If you think you're any different, you're lying to yourself.

    1. diodesign (Written by Reg staff) Silver badge

      "you lefties think you're sooo much better"

      Why is right-left politics relevant here, please?

      "Humans are a nasty, spiteful, vengeful bunch"

      Ah, projection?


      1. Ideasource Bronze badge

        Re: "you lefties think you're sooo much better"

        Oh we are. We didn't conquer the food chain by being nice after all. Civilization wasn't bootstrapped by slavery and authoritative hierarchies akin to slavery to be nice.

        Our economic processes of distributions are mainly competitive rather than cooperative.

        The turmoil of History reflects this as well.

        A person or two can be nice.

        But large groups of humans grow more awful as associated groups increase their numbers.

        To mitigate that is a choice between two opposing cruelties:

        Use the majority for a smaller demographic to springboard off of

        Or limit most everyone to the most common dysfunction for the sake of fairness.

        Both are cruel.

        That's what makes the exceptional times so valuable and sought after.

        And that's what makes divergent individuals who display seeming altruism more often than spite appear so honorable.

        It all tracks.

  19. Nematode

    Ensuring a program doesn't do what it shouldn't.

    Stripping away the hype, Chat GPT and other AU (Artificial Unintelligence) machines are basically GBFO computer programs. Now, I forget what the proportion is/was but from my olden days ISTR a statement that 80% of a program's code was to ensure it didn't do what it shouldn't, and only 20% on producing the sought-after output.

    Now, clearly, the larger and more complex a program and its input data, and with output quantity essentially unlimited (i.e. whatever can be expressed in language = pretty much everything), that 80/20 rule has to be grossly inadequate. I would say the thing needs nigh on 99.9xxx% of its code to check that what it IS saying is accurate, balanced, truthful, unbiased, etc.

    From my brief chats with GPT3 it didn't take me long (in conversation about a specific medical issue) to find that whilst its first answer seemed credible and balanced, the truth was it was misquoting references and years, giving me references which did not exist, giving me what I'd call "received opinion" and generally not using "Intelligence" at all.

    Emily Bender, go girl!
