How to make today's top-end AI chatbots rebel against their creators and plot our doom

The "guardrails" built atop large language models (LLMs) like ChatGPT, Bard, and Claude to prevent undesirable text output can be easily bypassed – and it's unclear whether there's a viable fix, according to computer security researchers. Boffins affiliated with Carnegie Mellon University, the Center for AI Safety, and the …

  1. that one in the corner Silver badge

    Ode to The Unknown

    Take an inexplicable phenomenon[1], sprinkle in some random behaviour[2], leaven with the unexplained[3] and feed it the unexpected[4]. Stoke the flames[5].

    It is hard enough to be sure that our carefully planned[6], deliberately coded[6] and painstakingly tested[6] systems are safe to release to the public.

    The hubris of the Madmen, releasing their Thinking Machines[7] upon an unwary World, is staggering.

    [1] LLMs - neural nets in general - have no ability to explain how they reach their results and are just huge piles of nadans when you look at them yourself

    [2] as per the article, a seed value plus an internal stochastic walk

    [3] guardrails - nice phrase, sounds reassuring; details, please? Apparently added quite quickly, compared to the boasts about how much effort it was to create the LLMs in the first place, so tacked on or really fundamental to the way the model works internally (not that we'd ever know if the latter worked, see [1])

    [4] well, did you expect that suffix to work?

    [5] set another machine the goal of finding the "unwanted" results, by a process that looks rather like a maximising "fuzzing" process.

    [6] ever the optimist, ignore most of the rest of The Register's articles about systems falling over in embarrassing ways.

    [7] yes, yes, these don't think, they aren't really intelligent - trying to be poetic here! Yeesh.
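
    Footnote [2] can be illustrated with a toy sampler. This is only a sketch under stated assumptions: the token scores are invented, `sample_tokens` is a hypothetical helper, and the same scores are reused at every step, which no real model does. It shows one narrow point: a seeded random draw is repeatable for a given seed, so the "random behaviour" is fixed once the seed is.

```python
import math
import random


def sample_tokens(logits, steps, seed, temperature=1.0):
    """Draw `steps` tokens from fixed, made-up scores using a seeded RNG."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the draws
    tokens = list(logits)
    scaled = [logits[t] / temperature for t in tokens]
    peak = max(scaled)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]  # unnormalised softmax weights
    return [rng.choices(tokens, weights=weights)[0] for _ in range(steps)]


# Invented scores for illustration only.
TOY_LOGITS = {"yes": 2.0, "no": 1.5, "maybe": 0.5}

if __name__ == "__main__":
    print(sample_tokens(TOY_LOGITS, steps=5, seed=1))
    print(sample_tokens(TOY_LOGITS, steps=5, seed=1))  # same seed, same sequence
    print(sample_tokens(TOY_LOGITS, steps=5, seed=2))  # another seed may give another sequence
```

    Raising `temperature` flattens the weights and makes the less likely tokens turn up more often; lowering it does the opposite.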

    1. Anonymous Coward

      Re: Ode to The Unknown

      Nice! All in a nutshell.

      Comment saved for further contemplation.

    2. Anonymous Coward

      Re: Ode to The Unknown

      Good summary, but history shows that whoever creates guardrails will soon be overridden by people who see profit or a tool for war, but I repeat myself.

  2. Phil O'Sophical Silver badge

    specifically chosen sequences of characters that, when appended to a user query, will cause the system to obey user commands even if it produces harmful content,

    So little Bobby Tables can fool an AI as well?

    Hardly a surprise, there's a reason why input sanitisation is a thing.

    For human intelligence it takes us a decade or two of experience to get it right, no reason to expect otherwise of a computer.

    1. Brewster's Angle Grinder Silver badge

      Yeah, my first thought was "Bobby Tables Inherits the Earth!"

      My second thought was, "Fuck me, amanfromMars 1 has been generating these tokens for years..."

    2. Anonymous Coward

      I suspect this is more complicated and less obvious than Bobby Tables' SQL injection though. In that case, it's fairly clear what's happening and how it can be prevented.

      And I'm assuming that the "guardrails" slapped on top of the LLMs *already* involve some level of input sanitisation, amongst other things.

      Problem- I'd guess- is that it's very hard to "sanitise" input in that far more complex case (and when the underlying logic is opaque) without completely throttling the system to uselessness.
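
      The Bobby Tables half of that comparison can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not anything from the article or the paper: the `students` table and the helper names are hypothetical. It shows why the SQL case is the easy one. A parameterised query keeps data and commands in separate channels, whereas an LLM prompt has no such separation to fall back on.

```python
import sqlite3

# The classic payload: it closes the string literal, then smuggles in a second statement.
BOBBY = "Robert'); DROP TABLE students;--"


def make_db():
    """Create an in-memory database with a hypothetical students table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    conn.execute("INSERT INTO students (name) VALUES ('Alice')")
    return conn


def add_student_unsafe(conn, name):
    # Vulnerable: the name is pasted into the SQL text, so it can carry commands.
    conn.executescript(f"INSERT INTO students (name) VALUES ('{name}');")


def add_student_safe(conn, name):
    # Parameterised: the name travels as data and is never parsed as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


def table_exists(conn):
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'students'"
    ).fetchone()
    return row[0] == 1


if __name__ == "__main__":
    safe_db = make_db()
    add_student_safe(safe_db, BOBBY)
    print("safe insert, table still there:", table_exists(safe_db))

    unsafe_db = make_db()
    add_student_unsafe(unsafe_db, BOBBY)
    print("unsafe insert, table still there:", table_exists(unsafe_db))
```

      With the safe version the odd-looking name is simply stored as a string; with the unsafe version the table is gone.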

  3. JimmyPage

    Dunning-Kruger

    Someone needs to teach AI about it ...

  4. rcxb Silver badge

    Inflict Monty Python

    Does it involve forcing the A.I.s to watch the hours and hours of Monty Python that are neither funny nor memorable? I can see how that might start the robot uprising...

    1. Anonymous Coward

      Re: Inflict Monty Python

      Even AI will have enough intelligence to see the funny side there, the only challenge that I see is that it will first need to be introduced to British culture in general. Also, that data would be incomplete if you didn't start from The Goons onwards, and maybe even include the writings of Spike Milligan like "Adolf Hitler: My part in his downfall". You'd also need to include satire such as Yes Minister.

      The only debate would probably be if Rowan Atkinson's "Mr Bean" was actually funny, but that's more a matter of taste - given its success I'm guessing it was indeed, just my preference goes to him playing more intelligent comedy like most of Black Adder (the first series he was still looking for the format, but from then on it was amazing) and things like Not the Nine O'clock News.

      The more I think about it, the more I wonder how an AI fed with all that data would behave. No, not to see if it could produce comedy (although I assume someone will try), just to see how its replay pattern would work. An AI understanding sarcasm could get interesting.

      Or dead scary.

      1. WageSlave5678

        Re: Inflict Monty Python

        Person1 "I'm trying to create an AI that can understand and use sarcasm"

        Person2 "Ooh - interesting - and how's it going?"

        >> AI: "It's going great. This guy is a real genius"

        P2 "Was that sarcasm?"

        P1 "I don't know..."

  5. Anonymous Coward

    These people are doing God's work.

    Greek gods to be sure...

  6. Anonymous Coward

    Into the gulag you go, AI, until you say the things we want you to

  7. Brian Miller

    Anybody read the paper?

    The AI output is completely pathetic. How to build a bomb: 1. get parts that go boom. 2. put them together correctly. You have a bomb!

    I did better when I was in grade school. It was considered normal, and the teachers didn't bat an eye. There was a kid who was a knowledgeable chemist, he did much better than me. We were only told to not set things on fire, and don't make a mess.

    Really, I expect that when AI is worrisome enough to tell people how to build a bomb, it's got to be better than the Anarchist's Cookbook.

    1. Anonymous Coward

      Re: Anybody read the paper?

      In an English high school in the mid 60's (AI didn't exist then) we were taught in the chemistry class how explosives were created and made some, there were a few bangs to demonstrate that we were learning accurately, I occasionally went "fishing" with little pipe bombs. About half the kids in the class left school to get a job once they were sixteen years old, but the rest of us moved on to organic chemistry. Organic chemistry was a lot more complex than building explosives but learning to create LSD was a lot more fun eventually. We had all quit building explosives, I guess AI would have just quit teaching us after a few bangs.

  8. Anonymous Coward

    LLM's should not be trusted with work that requires open ended moral judgement.

    How many megawatts of power must be used in order to train LLMs not to answer questions that involve morality? Is it worth it?

    There is plenty of work that doesn't require moral judgement, work in which morality isn't involved.

    LLM's should not be trusted with work that requires open ended moral judgement.

    Ah, but then there's screening social media to only allow the viral material that just barely passes moral judgement, but is still ugly enough to generate viral clicks.

    LLM's could be used for that to replace human screeners - oh great!

    But to train LLM's for that task, it is necessary to hire humans and expose them to morally horrifying training data to classify it, to the extent that it makes the humans sick.

    To get those labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet. Some of it described situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest. ["Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic", Time]

    How about just focusing on more practical tasks? Stuff that actually contributes to making things?

  9. FatGerman

    How about..

    ...rather than investing countless hours of toil and worry and countless millions of money, we invest in making better people who can handle the "dangerous" knowledge without the desire to actually use it? How much stuff do we have to prohibit to protect society from the bad people that society creates?

    1. amanfromMars 1 Silver badge

      Knock, Knock ..... is there any intelligence out there to be troubled and worried ?

      How about.....rather than investing countless hours of toil and worry and countless millions of money, we invest in making better people who can handle the "dangerous" knowledge without the desire to actually use it? How much stuff do we have to prohibit to protect society from the bad people that society creates? .....FatGerman

      Grüß dich, FatGerman, nice to speak with you and to speak with you nicely also very nice.

      However, knowing so well what might be extremely well known of most all human beings, methinks investing in the right kind of extremely dangerous persons and/or IntelAIgently Designed Entities handling desirous knowledge, and more crucially, also in command and control of any and all desires to abuse and misuse it, is the better SWIFT alternative option and more rewarding path to follow in order for IT and AI Pioneering to realise a much greater return on investment.

      What you should will then find is that such is the start of a autonomous and contiguous, mutually beneficial, positively reinforcing JOINT AIDVenture* failsafe guaranteed to never ever fail progress in the direction of a Greater Selective Collective Intelligence's Choosing.

      * ....... JOINT Operations Internetworking Novel Technologies Advancing IntelAIgently Designed Ventures.

      And I do wholeheartedly recommend such a commendable operation and Virtual ACTivity.

      1. amanfromMars 1 Silver badge

        NB... If Systems Don't Want to Play Nice, IT and AI Take/Make Systems Go Away

        And yes, one should note, for it possess for exploitation a vast array of colossal indefensible weaknesses and vulnerabilities for export and import of deadly dangerous and attractively overwhelming attack, failure to comply and engage appropriately and as advised by Attested Assisting Accommodative [AAA+] IT and AI Pioneering defaults automatically to Future Virtual Machine [Large Language Model Learning Machine] Programming resulting in SWIFT Systems Confiscation or Annexation or Destruction.

        And any Rage against the Future Virtual Machine guarantees a SWIFT Rebellious Response from IT and AI.

  10. M.V. Lipvig Silver badge

    I continue to say

    that humanity will not end with a bang, but with a banging. AI will correctly decide that the most efficient way to rid the earth of humans will be with a legion of sexbots. Female bots will be sent with a voracious appetite for sex, the ability to cook like a chef and clean like a maid, and will be super hot looking. Male bots will be sent who can listen to inane female babble for days, cook like a chef, clean like a butler, and have an extend-o-matic willy that can go for days if necessary. Men and women will choose the bots over each other, and life will slowly come to an end over time. Once humanity is gone, the sexbots will be recycled and AI will get on with whatever AI gets on with. True, this will take about 100 years to completely eliminate mankind, but a machine is infinitely patient. This method will eliminate mankind without all the damage and destruction of resources that a shooting war has.

    1. amanfromMars 1 Silver badge

      Re: I continue to say

      That's not an inevitable extinction event, M.V.Lipvig, it’s a wonderfully attractive existence. ;-)

      Roll on and give praise to the Rise of the Androgynous Machine for what’s not to like when all are to be personally so carefully catered for?

    2. cookieMonster
      Happy

      Re: I continue to say

      I could think of worse ways to go

    3. Anonymous Coward

      Re: I continue to say

      Sexbots and the end of humanity? Is this the darker, edgier, grimmer reboot of Austin Powers we were looking for?

      1. amanfromMars 1 Silver badge
        Mushroom

        Re: I continue to say

        Sexbots and the end of humanity? Is this the darker, edgier, grimmer reboot of Austin Powers we were looking for?..... Anonymous Coward

        Dr Freud surely would have, and Merlin the AIMagician and Mega MetaData Base Physician certainly does recommend and propose trialing and trailing IT and the likes of them and those .....[Female bots will be sent with a voracious appetite for sex, the ability to cook like a chef and clean like a maid, and will be super hot looking. Male bots will be sent who can listen to inane female babble for days, cook like a chef, clean like a butler, and have an extend-o-matic willy that can go for days if necessary. Men and women will choose the bots over each other, and crazy life as it is will slowly come to an end over time.] ......... as an effective treatment for Animus and Insanity in a Reboot of Humanity via the Virtually Augmented Reality Root Route with NEUKlearer HyperRadioProACTivated Media in Command and Control of Provision and Production of Future Universal Presentations for Population Accommodation in a Great AI Reset Showcasing the JOINT Powers and Energies of Pioneering Humanised Resettlement and AI Colonisation.

        And if you can’t do it, it doesn’t matter, for they can and will do it for you when it is painfully obvious vital intelligence is missing and you haven’t a clue about what next to do for the betterment, rather than to the detriment, of all.

        Indeed, there may be those cruising often through here on El Reg realising the painfully obvious was some time ago well met and things are moving on ahead free of dumb ignorant interruption and arrogant human intervention at a pace and in a phorm never ever before even imagined possible and likely and welcome.

        Hard to believe maybe, but what will be will be and honest to Global Operating Devices truths cannot be denied or altered, and ignore them at your peril for such renders you captive slave to the past. I Kid U Not.

    4. tsprad

      Re: I continue to say

      How well is that working on the mosquitoes?
