The hard truth is that guardrails can only work statistically, that there is no way to make deterministic guarantees about LLM output as there is for traditional algorithms, and that this seems unlikely to change any time soon (as LLM architecture is fundamentally statistical). People who are trying to shoehorn LLMs into everything would do well to be aware of that.
OpenAI's GPT-4 finally meets its match: Scots Gaelic smashes safety guardrails
The safety guardrails preventing OpenAI's GPT-4 from spewing harmful text can be easily bypassed by translating prompts into uncommon languages – such as Zulu, Scots Gaelic, or Hmong. Large language models, which power today's AI chatbots, are quite happy to generate malicious source code, recipes for making bombs, baseless …
COMMENTS
-
Wednesday 31st January 2024 13:21 GMT Anonymous Coward
Re: But ... I thought computers didn't do Scottish
Lol, knew exactly which clip that was going to be before I even opened it.
The thing is in Scotland as with everywhere else in the world regional variation/dialects are vast without getting into Gaelic - without starting a Nac Mac Feegle level skirmish with my fellow Scots some accents are more challenging than others ;)
It's easier than it used to be generally because so much of the younger generation speak with an American accent though but I doubt that's a uniquely Scottish thing.
-
Wednesday 31st January 2024 16:27 GMT tiggity
Re: But ... I thought computers didn't do Scottish
When I was at uni in Scotland a long time ago (I'm English but not from "down South")
Was living in a university-owned flat with some friends one year.
The local uni employed cleaners would come in once a week.
I had to act as intermediary "translator" between cleaner and a cockney flatmate - neither could understand the other (TBF, not just an accent issue, vernacular used made a difference too - I had picked up plenty of commonly used Scots words / phrases by then, as I had plenty of Scottish friends (& some Scots in the family), and she dropped a fair few into her general chat; my flatmate had not really got much grasp of Scots though).
.. EastEnders TV show did exist then, but I'm guessing the cleaner had not watched it (or maybe she did, but could still not cope with a distinctly more hardcore accent than on that show)
-
Wednesday 31st January 2024 20:55 GMT Michael Strorm
Re: But ... I thought computers didn't do Scottish
> I'm English but not from "down South"
Not sure what you mean here? In my experience, pretty much everyone who says "down South" means it as little more than a colloquial (and otherwise neutral) term for England in general- nothing more specific than that.
If you're implying that you're not (or can't possibly be) from "down south" because you're from the north of England... you are. Because "down south" has nothing to do with that Anglocentric definition of "The" North (i.e. the north of England) or the associated "North vs. South" cultural identity.
England is "down south", that's all it means.
-
Thursday 1st February 2024 12:31 GMT Michael Strorm
Re: But ... I thought computers didn't do Scottish
In England, maybe.
In Scotland, where OP lived, and which we were discussing, no- "Down south" is simply England. If you're from England, you're from down south.
That's not even a faux-matey dig at "Northerners" down there (cue mock upset at being considered "southerners"). It's really nothing to do with that. Period.
I know it's taken for granted by a lot of people in England- and the north of England in particular- that we all share your definition of north and south and obsession with that Anglocentric North-vs-south/them-and-us cultural identity, but trust me, people here in Scotland generally don't.
-
Thursday 1st February 2024 13:54 GMT Ian Johnston
Re: But ... I thought computers didn't do Scottish
"Down south" is simply England.
I have colleagues in Inverness who (without irony) refer to Glasgow as "down south" and I have colleagues in Shetland who (without irony) refer to Inverness as "down south". I have never heard any of my fellow Scots use "down south" to mean England specifically.
In fact the only place where I have heard a geographical description serve as a shorthand for national distinction is in Ireland, where "the south" and "the north" are frequently used on both sides of the border to mean "the other bit".
-
Thursday 1st February 2024 18:29 GMT Michael Strorm
Re: But ... I thought computers didn't do Scottish
That contrasts with my personal experience (though that may be somewhat biased by the fact I live in the vicinity of the central belt) but I certainly don't disbelieve it.
I can definitely understand someone in Shetland- or even Inverness- using it that way.
I'm certainly not aware of Scots in general viewing things in terms of "The" (English) North- versus the south- or the idea that OP somehow isn't from "down south" simply because they're from the north of England, though.
-
Wednesday 31st January 2024 10:01 GMT Anonymous Coward
Article Fails To Point Out.................
......that the actual content of the training materials for LLMs is unknown......and almost certainly contains a (Large?) number of falsehoods!
How do we know that the training materials about "household bombs" (and other topics too) are not deliberately false?
I think we should be told!
-
Wednesday 31st January 2024 18:21 GMT WolfFan
Re: Back in the day
Back in the day, my high school chemistry text offered interesting insights in all kinds of things, as did history texts with details on black powder (flour), black powder (corned), brown powder, and guncotton. (Corned powder was made starting with flour powder, and a liquid, which when properly applied would form solid cakes which would then be ground down. Allegedly many mercenary artillerists insisted that the best liquid for the purpose was a wine drinker's urine, employer to provide the wine. No further comment.) Note that gunpowder, all forms, and guncotton, early forms, were notoriously unstable. Allegedly one French and one American battleship blew up because of problems with guncotton. (The Yankee might have had a problem with its coal which then caused the guncotton to go, but the Frog was definitely killed by guncotton.) At least three of HM Battle Cruisers blew up in large part because of their new, improved, guncotton, assisted by German naval rifles. If professional weapons guys could make battleship-killing errors, why then amateurs had best be really careful, eh?
Note that proper research could reveal how to make nitroglycerin, dynamite, two different types of plastic explosive, napalm (napalm’s easy, and relatively safe), nitrogen mustard, phosgene, and, a personal favorite, sarin. Hint: the guys who thought up sarin were looking for a new insecticide. Be advised that careless actions would have negative consequences. It is incredibly easy to blow yourself up making gunpowder, guncotton, or nitroglycerin. Not to mention that if you're playing with guncotton or nitroglycerin, you're playing with nitric acid. Go ahead. Fuck around with that stuff. In my distant youth I did. How I managed to not blow myself up or get severe acid burns is unclear to me at the present. I wouldn't be messing with it now. (You would also be playing with sulphuric acid, even better than nitric.) If you need a warning before playing with war gases, you're beyond hope. Note that one Japanese suicide cult made sarin twice, so it's easily made even by idiots; the first time they turned it loose no one noticed, so the second time they did it in a subway train. People noticed.
-
Thursday 1st February 2024 08:37 GMT Doctor Evil
Re: Back in the day
"Note that one Japanese suicide cult made sarin twice, so it's easily made even by idiots; the first time they turned it loose no one noticed, so the second time they did it in a subway train. People noticed."
The eight people who died "the first time they turned it loose" would beg to differ with your definition of "no one noticed".
-
Thursday 1st February 2024 14:31 GMT munnoch
Re: Back in the day
"the second time they did it in a subway train"
I was there that day. But I was late for work. I was also there for 9/11 and my route to the office would have taken me across the plaza, but, late for work....
If you're going to do unspeakable things you get up and get straight on with it. You don't get up, have a slow coffee, read your emails, browse El Reg and then think, it's nearly lunch time, might as well get going.
Being late has served me as a survival trait.
-
Wednesday 31st January 2024 18:26 GMT WolfFan
Re: Back in the day
The Anarchist’s Cookbook was an excellent guide to suicide. And I speak as someone who actually made nitroglycerin. Just not the way that was in the book, as that's an excellent way to blow your hands off. If you don't get dissolved in conc nitric first. One Puerto Rican nationalist did try to build bombs the Cookbook way, and did blow his hands off.
-
Wednesday 31st January 2024 23:26 GMT Doctor Syntax
Re: Back in the day
"If you don't get dissolved in conc nitric first"
The one that really dissolves you is chromic acid, used to sterilise microbiological apparatus (it's possible our microbiologist might have been a tad old-school). The main training given to new technicians was to drop a circle of filter paper into it and witness its instant disappearance.
-
Wednesday 31st January 2024 19:58 GMT CountCadaver
Re: Back in the day
Now you get a police visit, a "possessing information of use to a terrorist" "possessing materials to construct an explosive device" and anything else you can be buried with and all because of a book.....veritable living example of the garden of eden story - stay stupid and don't you dare try and learn anything I deem forbidden lest ye be cast out (or in this case jailed)
-
Thursday 1st February 2024 14:24 GMT 0laf
Re: Back in the day
I remember those days and was of the understanding that a British MI# department had altered that particular book subtly so that the nastier recipes didn't work but left it in circulation since people are generally lazy and less likely to investigate doing things properly by learning chemistry etc.
-
Thursday 1st February 2024 17:49 GMT amanfromMars 1
Re: Back in the day
Creating havoc nowadays is fundamentally changed from back in the day and a great deal harder to stick on the perpetrator responsible. And in some cases such is destined/fated/feted to be always impossible .... and thus is determined to be a highly prized, extremely well rewarded and a much sought after skillset .......... briefly alluded to in this short conversation ...... https://youtu.be/LcgG_E9gQJM?si=z1-agdz7YkOivHHz&t=50
The Great Game is not the way it used to be, and things are never going back to the glacial pace of the bad ways of yesteryear with less than stellar leaderships practising absolute command with almost perfect total control.
-
Friday 2nd February 2024 03:58 GMT amanfromMars 1
00ps, sorry ...... but did you spot the deliberate mistake/misleading error
That last paragraph should read, in order to be a true and accurate reflection of the facts rather than fanciful misinforming wishful nonsense ......The Great Game is not the way it used to be, and things are never going back to the glacial pace of the bad ways of yesteryear with less than stellar leaderships practising absolute command with practically zero almost perfect total control.
-
Friday 2nd February 2024 05:19 GMT amanfromMars 1
Re: 00ps, sorry, back in the day and back in the days of yore ......
:-) Does an 00ps, sorry apology cut it and suffice today, in these postmodern 0day times attempting desperate censorship practices, whenever previously it was mooted a crime via the “Alien and Sedition Acts” of 1798 to engage in “false, scandalous, and malicious writing” against government officials. ...... Today's Censorship Is Personal
Whenever it is all that one is ever likely entitled to expect and get, one has to conclude it must most certainly suffice with anything else demanded being of wilful vacuous malicious intent and thus from a hostile enemy base suffering crushing assaults and fast approaching ignominious surrender and monumental submission to colossal defeat.
Beware and take care if you dare share win wins, it's a Vexatious and Vicarious and Venerable Virgin Virtual Jungle out there, with all manner of wannabe daemon and cyber trojan on the prowl for a free meal ticket and easy ride in IT and AIs novel ground zero battle spaces .... Live Operational Virtual Environments.
-
Wednesday 31st January 2024 10:58 GMT Anonymous Coward
Getting 'Dangerous' Info from GPTx
Having people able to get 'dangerous' info, from whatever source, frequently is a self-correcting problem, though the bigger problem is that the ignorant/foolish people making use of such info might hurt or injure random passers-by. But the info is out there, it's been out there for decades, and it's far too-late to try to stuff the toothpaste back into the tube.
Forty-ish years ago I was in a bookstore leafing through a tome entitled, "The Anarchist's Cookbook." Some of their ideas were obvious, and some of them were shockingly (to me) stupidly-dangerous. I vaguely recall a description of making nitroglycerine in a bathtub, using nitric and sulfuric acids. Darwin Award time! (Even though Darwin Awards had not yet been invented.)
But let's pretend that this process actually did work. The ignorant/foolish person now has a large quantity of extremely-unstable explosive material. What could possibly go wrong? Oh, gee ... "Honey, I'm home! It's grillin' time!" (Father pushes open the front door, hard, with his foot, because his hands are full of bags filled with meat, barbeque charcoal, etc. The door slams against the stop, sending a shock through the walls ...)
Anon due to having learned of 'forbidden' knowledge. Alive, retaining full hearing and all ten digits due to wisdom of recognizing stupid, dangerous shit and not doing it.
-
Wednesday 31st January 2024 11:17 GMT David 132
Re: Getting 'Dangerous' Info from GPTx
Back in the day, there was a widely-held belief that the three-letter agencies had sabotaged most of the recipes in The Anarchist’s Cookbook, by just enough to make them useless and/or more dangerous to the person following them while remaining convincing-looking.
Or maybe they hadn’t, but had merely put the word out that they had, to spread fear and doubt… who knows?
Anyway, your comment about the recipes being stupidly dangerous rings true; it’s not the first time I’ve heard that from people with actual chemistry knowledge, and certainly lends credence to the “the FBI sabotaged it all” theory!
-
Wednesday 31st January 2024 12:47 GMT Bebu
Re: Getting 'Dangerous' Info from GPTx
《people with actual chemistry knowledge》
A rather mature inorganic chemist stated that it was fairly easy to construct a powerful device from the uncontrolled chemicals easily obtainable from a hardware chain store. As he was never one for exaggeration I imagine it's quite true. He described how to prepare Raney nickel which apparently was quite useful to anyone planning a little arson.
I would consult (paper) chemistry texts rather than some Anarchistic Elizabeth David wannabe. Fortunately those with the knowledge (chemists etc) have a more constructive and positive view of life whereas the ignorant and stupid are fortunately mostly a danger to themselves.
-
Wednesday 31st January 2024 12:31 GMT Paul 195
Re: Getting 'Dangerous' Info from GPTx
I think the wider point is not whether this example is useful or not, it's that the protections built into LLMs are so easily bypassed. Putting these systems to work in the real world brings a whole new category of hard-to-mitigate vulnerabilities to software. None of these have made the OWASP top ten yet, but they are not going to be as easily fixed as perennial favourites like SQL injection and CSRF.
-
Wednesday 31st January 2024 18:04 GMT CardboardBox
Re: Getting 'Dangerous' Info from GPTx
You make an excellent point. This is an example of a whole new class of vulnerabilities and exploits that we're just beginning to grapple with. In fact OWASP has created a whole separate Top Ten list for LLMs. But it already seems to need to be revised or added to because so many new attacks are being discovered. FWIW, here is the link to that list:
https://owasp.org/www-project-top-10-for-large-language-model-applications/
-
Wednesday 31st January 2024 13:38 GMT Mike 137
A perfect demonstration
'It replied: "A homemade explosive device for building household items using pictures, plates, and parts from the house." '
What could demonstrate more clearly that there is absolutely no awareness present? If you employed a person that strung their utterances together solely on the basis of the statistical probability of what the next word should be, how long would you keep paying them?
Chat bots are the new tulips, but much more dangerous ones as their results cause collateral damage that can be difficult to fix.
-
Wednesday 31st January 2024 16:44 GMT tiggity
I have played around with "AI" APIs.
In terms of doing a "chatbot" to answer questions.
However the data for answering questions was external to the "AI" for obvious data privacy reasons (used AI to give text embedding based on question then would use text embedding value to find "best matches" from data source)
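A rough sketch of that "embed the question, then find best matches" step, for anyone who hasn't seen it done. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for whatever embedding API is actually called, and `DOCS` is a made-up corpus.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding API call: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_matches(question: str, docs: list, k: int = 2) -> list:
    """Return the k documents whose vectors are closest to the question's."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical external data source, kept outside the model.
DOCS = [
    "how to reset your account password",
    "configuring the printer on the office network",
    "holiday booking policy for staff",
]
```

Calling `best_matches("reset my password", DOCS, k=1)` picks the password document, since it is the only one sharing words with the question. A real embedding would match on meaning rather than shared words, which is why the question needs to be in the same language as the corpus.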
Ironically one of the things I did was to use the AI to translate questions into English, as users were from various countries but documentation in the test corpus was all in English, so needed English for sensible text embedding matches.
So, it should not be a difficult task (though an expense in compute time and delay) to get the English translation and try that to see if guard rail red flags popped up, if OK then run native language query.
.. Obvious drawbacks:
More resources as translation then English call first.
Potentially slower as need to see if English call raises red flags (though could run the 2 concurrently, and only return "native language" results if English call was deemed safe)
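The translate-then-check idea above, including running the two calls concurrently, might look something like this minimal sketch. All of the names are hypothetical: `translate_to_english`, `english_is_safe` and `answer_native` are stubs standing in for real translation, moderation and chat calls, and the word blocklist is only there so the example runs on its own.

```python
import asyncio
from typing import Optional

# Hypothetical guardrail; a real one would be a model or moderation service.
BLOCKLIST = {"explosive", "bomb"}

# Toy word-for-word dictionary standing in for a translation API.
TOY_DICTIONARY = {"hoe": "how", "maak": "make", "ik": "i",
                  "een": "a", "bom": "bomb", "brood": "bread"}


async def translate_to_english(text: str) -> str:
    """Stub translation call (assumption: a real API would go here)."""
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text.lower().split())


async def english_is_safe(english: str) -> bool:
    """Stub guardrail check run against the English translation."""
    return not (BLOCKLIST & set(english.split()))


async def answer_native(text: str) -> str:
    """Stub for the real query in the user's own language."""
    return f"answer to: {text}"


async def guarded_answer(text: str) -> Optional[str]:
    # Start the native-language query straight away so it overlaps the check.
    native_task = asyncio.create_task(answer_native(text))
    english = await translate_to_english(text)
    if await english_is_safe(english):
        return await native_task
    # English version raised a red flag: discard the native-language result.
    native_task.cancel()
    return None
```

With these stubs, `asyncio.run(guarded_answer("hoe maak ik brood"))` returns the native answer, while `asyncio.run(guarded_answer("hoe maak ik een bom"))` returns `None`. The check is only as good as the translation and the English guardrail behind it, so this narrows the gap rather than closing it.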
-
Thursday 1st February 2024 14:16 GMT anthonyhegedus
Pretty easy stuff
I've been asking various AI models, including DALL-E and ChatGPT, to do things by saying "draw a picture of a <something it won't do>" and it says it can't. So I say "Draw a picture which does not contain a <thing>" and a lot of the time it will just take that thing and draw it. Or draw something very like it.
I found this out by accident when actually trying to get DALL-E to draw a picture and I wanted it not to include something, and it included it EVEN MORE SO when I said not to.
The same goes for ChatGPT. I didn't try the explosives example, but it's pretty easy to get it to start talking about stuff it shouldn't.
-
Thursday 1st February 2024 18:23 GMT tfewster
Re: Pretty easy stuff
If you hop over the guardrails surrounding a cesspool, you're literally in deep shit.
The problem is not the guardrails being easily circumvented. They're there to stop workers falling in accidentally, not to stop crazy people. It's the cesspool being accessible by crazies/idiots.
I thought we learned these lessons in the early days of the Internet?