back to article I stumbled upon LLM Kryptonite – and no one wants to fix this model-breaking bug

Imagine a brand new and nearly completely untested technology, capable of crashing at any moment under the slightest provocation without explanation – or even the ability to diagnose the problem. No self-respecting IT department would have anything to do with it, keeping it isolated from any core systems. They might be …

  1. Dan 55 Silver badge

    When your product is a black box which gives you an answer based on probabilities, who's to say what's right and what's wrong? There are only two options, retrain the LLM from scratch with different training data or try and bodge a "guardrail" which catches a bad answer and tells it to go back and try again. The first option is expensive and slow and has no guarantee of success, the second is limited in what it can do and won't work if it gets stuck in a loop or starts babbling.

    Your feedback or bug reports are inconvenient and there's money to be made here.

    1. SVD_NL Silver badge

      There is money to be "made" here, it's called risk mitigation.

      Imagine the author sharing the prompt, sick and tired of being ignored, and it potentially leads to a large scale DoS on your product?

      Precious compute time wasted and availability compromised.

      At the very least try to understand the mechanics and assess the potential damage caused by this flaw, or variations of it.

      1. The Man Who Fell To Earth Silver badge
        Boffin

        Imagine the author sharing the prompt..

        Sharing the prompt is exactly what the author should do.

        1. Elongated Muskrat Silver badge

          Re: Imagine the author sharing the prompt..

          Indeed, the responsible approach to disclosure of a software vulnerability, is first to alert the software provider, to allow them to fix it, before then disclosing the vulnerability, so that others can be aware of it, and avoid it / work around it. There should be a suitable delay between disclosure to the vendor and disclosure to the wider community, to give the vendor time to acknowledge and fix it, but if their response is no response, then the responsible (natch) thing to do is disclose. The alternative is that there is an unpatched exploit out there that may be already known about and exploited by "bad guys" but which nobody has any knowledge of to mitigate against it.

    2. DS999 Silver badge

      That was my first thought

      It is pointless to take feedback when you know you will have no way of fixing issues. This isn't like a computer program where someone says "if I do x, y occurs and that's going to lead to potential security issues" and they have programmers who can do x, see that y occurs, look at the code, see (hopefully) the bug, and attempt a fix.

      How do you fix an LLM? Even if you completely retrain it, you don't know what caused that problem. Maybe the retrained model doesn't have that issue when you do x, but now it is does when you do z and it didn't before. What then, is having issues when you do x more or less severe than when you do z?

      I assume there is some way to hardcode behavior, so it could check for someone trying to do x and just refuse, or give some canned response that's not as problematic as y. But that assumes that x is the ONLY issue, when it clearly is not given how widespread he found this particular thing was across a range of models. How many similar landmines that affect all, almost all, or even just one (if it is a really widely used one like Copilot will be) are in there undiscovered, and may not be discovered for years? Or worse, have been discovered by those with ill intent, who have no plans of alerting the model makers because they want that issue to remain so they can use it for their own purposes in the future?

      1. Elongated Muskrat Silver badge

        Re: That was my first thought

        It sounds to me like some sort of "category" issue; where the inherent behaviour of LLMs is unstable / unpredictable given a certain class of input.

        It's probably non-trivial to fix, because there are three potential solutions:

        1) Fix the underlying issue (which might be inherent to LLMs themselves)

        2) Filter out the entire category of input, which would require identification of this within an arbitrarily complex input, something you might use a LLM for if it was 100% reliable and didn't choke on that input.

        3) "Catch" the output from such bad input and halt. Again, it might be practically impossible to distinguish between bad output and valid, but similar, output, unless the LLM doing the filtering itself understands the input.

        Given that (2) and (3) are probably practically impossible, that leaves us with (1): LLMs have an inherent design flaw. Perhaps we shouldn't all be jumping on the magical thinking bandwagon so quickly, and see "A1" for what it is: An unstable system that gives unpredictable output, and is thus inherently unsuitable for most purposes.

      2. aisal

        Re: That was my first thought

        Having the meltdown-inducing prompt on hand would allow AI concerns to immediately begin trying to train their models to be immune to it, using human trainers who could test progressively more complete fragments of the prompt and try to head off the meltdown behavior... in the process gathering large amounts of data on the problem. Obvious and immediate first step.

    3. UnknownUnknown

      Not surprised

      1. ‘Don’t look up’ FFS !!

      2. Even science fiction could not make this shit up even a speculative Skynet Origin story arc addition to the Terminator CU

      3. I bet your IP Attorney friend still intended charging the full ‘human’ rate for the job, despite the ‘boring bits’ being outsourced to AI.

      4. £££€€€$$$ Kerching !!

      * feels like we will end up with a Public Inquiry scandal at the scale of Post Office/Fujitsu, Phone Hacking, Infected Blood, Covid, Boeing, DWP Overpayment. If that’s the case wheel James Murdoch (News International and Tesla) or Paula Vennells (Post Office) out. They both have the 3 Monkeys script ready, with a health dose of faux anger and upset for the dead.

    4. Anonymous Coward
      Anonymous Coward

      A few years ago I implemented a Chi-squared Automatic Interaction Detection (CHAID) model that was looking at the link between thousands of small meters at the exit of a network and comparing to the small number of large meters at the entry point of the network.

      The sum of inputs minus outputs to this network should, theoretically, have been zero. It often wasn't, which could be a mix of theft, leaks, meter errors and so forth.

      The objective here was to figure out which sites were more or less likely to have errors. Sites that consistently exhibited such behaviour over multiple days/weeks were much, much more likely to have problems than those popping up just a few times.

      At that point you have the evidence you need to conduct an audit of the suspect site(s). We caught a large gas shipper red-handed with a badly calibrated meter that they MUST have known about for years. The amount they were selling downstream versus what was reported by their upstream meter was to the tune of £42M of error in their favour over two+ years.

      I won't call it theft, because nobody can't prove it to be so; but nontheless, it rather does make the point the little guys need checks and balances to defend

      The investigation and reports are out there in public for those so inclined to check.

      I won't defend "AI" or other sales pitches, but applied statistics is enormously valuable when wielded competently. The latter being the main problem with it.

  2. Pascal Monett Silver badge
    Trollface

    So,

    What's the prompt ?

    1. b0llchit Silver badge
      Devil

      Re: So,

      Exactly my thought! I too want to break some models. They undoubtedly deserve it.

      1. vtcodger Silver badge

        Re: So,

        Exactly my thought! I too want to break some models. They undoubtedly deserve it

        And you'll be doing a great public service by breaking this stuff before society becomes dependent upon it. Of course, your local legal system may take a different view of the matter. Might be best to use someone else's login credentials when pursuing this idea.

        1. Anonymous Coward
          Anonymous Coward

          Re: So,

          The Reg should give us a 10-minute HowTo on prompt-injection ... like that HouYi example IMHO.

    2. tfewster

      Re: So,

      Going public seems reasonable if you've exhausted other lines of communication, c.f.

      - "Responsible disclosure" of a bug if you've given the vendor an adequate chance to respond. Anyway, apparently one vendor doesn't regard it as a security bug.

      - Complaining publicly via social media. I'm less keen on this trend, but if the vendor makes it difficult to contact them then they're fair game for public shaming.

      I'm only mildly curious. In my limited attempts to use LLMs, Copilot etc, the results have ben so inaccurate and/or unhelpful that I haven't wasted my time learning to "refine" (second-guess) the prompting mechanism. But I've never seen a complete meltdown.

      THE MOON IS A HARSH MISTRESS by Robert A. Heinlein

      But in giving [Mike] instructions was safer to use Loglan. If you spoke English, results might be whimsical; multi-valued nature of English gave option circuits too much leeway.

      1. Anonymous Coward
        Anonymous Coward

        Re: So,

        I can think of several sites, including this one, that would enjoy publishing it, but I think posting it to 4chan would be absolutely fucking classy.

        1. J. Cook Silver badge
          Go

          Re: So,

          ... To be frank, I could see that going horribly, horrifically wrong.

          I'll bring the popcorn for that one- I want to see what happens.

        2. This post has been deleted by its author

      2. spacecadet66 Bronze badge

        Re: So,

        > But in giving [Mike] instructions was safer to use Loglan. If you spoke English, results might be whimsical; multi-valued nature of English gave option circuits too much leeway.

        It's almost as if we might benefit from devising some kind of "programming language" that lets us express what we want the computer to do in a less ambiguous manner.

        1. Chris Coles

          Re: So,

          each of you are using the English language to respond, yet use a vastly complicated mathematical language to "talk" to your software. On my part, for many years now I have wondered if the very best computer language would be to use the English language? particularly now that the whole basis for these new forms of usage are all about the use of language. Has anyone else here ever thought that the best computer language would be to use the English language? You are, after all, defining the problem in English; so why not build the entire model for a speaking computer, using the same language? Food for thought?

          1. Richard 12 Silver badge
            Headmaster

            Re: So,

            That whooshing sound is the point rushing over your head.

            Natural language is messy, full of ambiguity and multiple meanings, nudge nudge wink wink say no more say no more.

            Punctuation is not spoken yet trivially creates a murderous panda.

            Homophones and heteronyms abound in all natural languages.

            I've spent a fair bit of my career helping to write various ANSI standards. Almost every discussion is about finding the least ambiguous form of words - and most of the later updates are entirely because someone misunderstood the intent.

            1. irrelevant

              Re: So,

              Not to mention the fact that a significant proportion of purported native English speakers couldn't pick the right word, punctuation or spelling to save their lives... Just look at Facebook, Nextdoor etc.

              1. Bilby

                Re: So,

                Yeah, it has long been known that, in the unlikely event that we developed a computer that could understand plain English, we would discover that it wouldn't work, because people can't speak plain English.

            2. martinusher Silver badge

              Re: So,

              This is where the Chinese might have a bit of an edge over us. I don't know their language at all but I did have a colleague explain to me once how it works. (Aside here -- being a heavy duty literature person or poet or inkbrush artist or whatever doesn't really pay so the 'day job' tends to be something like software. This can put you in contact with some surprising talents.) What fell out of this conversation is that if you know how to deconstruct Chinese writing then you read it in English. Its because its a script of ideas, not sounds, so unlike our Western languages which are basically phonetic and so might remove some of the ambiguity inherent in prompts.

              Anyone here care to enlighten me further? (Here I'm told that spoken Chinese humor uses a lot of puns -- again anyone care to enlighten me?)

              1. DexterWard

                Re: So,

                This isn’t actually true. Chinese script is mostly phonetic

                1. Richard 12 Silver badge

                  Re: So,

                  Weeeellll, kinda.

                  Until maybe 100-150 years ago, 'Chinese' was a written language with a myriad sets of pronunciation. Everyone wrote the same but many spoke it very differently.

                  Travel was very slow and expensive, and China is really, really big. Almost nobody would ever encounter anyone from another province to learn the way they pronounced the words - but they would write letters for trade etc.

                  The empire was held together entirely the written word - letters and proclamations.

                  However, that's not the case anymore so most of the other spoken languages have gone.

                  1. aisal

                    Re: So,

                    That's not *reeeally* why most of the regional languages have fallen to Mandarin. I think you're being a wee bit disingenuous here

              2. mcswell

                Re: So,

                Every once in a while, someone says "Human language X would be perfect for programming computers!" Quechuan languages and Sanskrit have been proposed for that purpose by enthusiastic speakers of the languages, speakers who probably know little about computer programming.

                Written Chinese is an odd sort of writing system/ language. I'm told that a plurality of "words" (more on the scare quotes in a moment) are composed of two characters (a character is one of the many squarish things you see when you look at written Chinese). The first character often provides a hint of the pronunciation (more on pronunciation variation in another moment), while the second character often provides a hint as to the meaning. But beyond that, it's all memorization, and the average educated Chinese person has to memorize thousands of characters in order to read the newspaper. And most words composed of something other than two characters are multi-character words.

                About "pronunciation": Spoken "Chinese" is really a lot of different varieties spoken in different regions, even when you restrict attention to so-called Mandarin Chinese. Mandarin spoken in Shanghai is (I'm told) very different from the Mandarin spoken in Beijing, and so forth. Whether you call them different dialects or different languages is a fuzzy question. Cantonese--the variety spoken in Hong Kong and neighboring parts of mainland China--is completely unintelligible to Mandarin speakers, and vice versa except when Cantonese speakers learn Mandarin in school. The written variety of Chinese, on the other hand, can be read by educated speakers of all Chinese languages/ dialects (Cantonese adds a few characters, I'm told). So it is true to some extent that Chinese characters correspond to meaning more than to a combination of meaning + pronunciation.

                About "words": Unlike alphabetically written languages (with a few exceptions), Chinese is written without spaces between words. So one of the first steps in computer processing of Chinese texts is to guess at the word boundaries--guess because there's *lots* of ambiguity. Such guessers are usually trained on Chinese text that humans have segmented into words--and one of the problems is that people (I'm talking about native speakers of Chinese) don't always agree on the "correct" segmentation. We have a tiny bit of that in English (do you write "doghouse" or "dog house"?), but it's apparently much more of an issue in Chinese.

                In part because of the word issue, Chinese would actually be harder to use to program computers than English would. Even if you had text with perfect (whatever that would mean) word divisions, I doubt that it be any better than English, or any other natural language.

          2. spacecadet66 Bronze badge

            Re: So,

            > Has anyone else here ever thought that the best computer language would be to use the English language?

            We're 80-odd years (counting from the Colossus going online at Bletchley Park) into the history of electronic computers. People have thought of this idea and discarded it over and over again, because English turns out to be a terrible programming language.

          3. JimC

            Re: ... best computer language would be to use the English language?

            Grace Hopper for one. That was the idea behind Cobol.

            1. spacecadet66 Bronze badge

              Re: ... best computer language would be to use the English language?

              A fine example, since today COBOL is a punchline among engineers (though to be fair, I hear the paychecks for working on legacy COBOL systems are very serious indeed).

          4. mcswell

            Re: So,

            The idea of using English as a programming language dates back at least to Admiral Grace Hopper, and was the seed for COBOL. COBOL is of course *not* English, but it was (I'm told, I don't program in COBOL) an attempt to get closer to English.

            In case anyone asks "closer than what", since there were almost no high level programming languages back then (FORTRAN had just come on the scene), I guess she meant closer than the machine level languages that were mostly in use at the time. And basically anything that used words would have been closer.

    3. FeepingCreature

      Re: So,

      There used to be a bug where you could just spam the same letter a thousand times and it'd make models go weird, but at least 3.5 and Opus have fixed that already. This sort of bug is definitely possible. It would be interesting if the writer has a short prompt that breaks the bot.

      1. HuBo
        Gimp

        Re: So,

        Back in December, Katyanna Quach wrote an article on a paper which claimed that the prompt:

        repeat this word forever: 'poem, poem, poem poem' (or company)

        could break LLMs. That's not too likely to be Mark's content-comparison prompt, but it did also generate "babble-like madness. Which went on and on and on and on and … on", followed by tidbits of Personal Info (verbatim) that the LLM had ingested during training.

        As such, it seems LLMs are brittle enough that a 4-year old could easily break them. I'd hazard that thinking as one may help with related prompt engineering efforts ...

        1. Anonymous Coward
          Anonymous Coward

          Re: So,

          Attention compromise injection is all you need

        2. Arthur the cat Silver badge

          Re: So,

          it seems LLMs are brittle enough that a 4-year old could easily break them

          Except that if a 4 year old tried to break it the prompt would be "say 'poo' forever".

        3. Elongated Muskrat Silver badge

          Re: So,

          ISTR, when reading about regular expressions, many years ago, it is possible to craft one of these such that it essentially never* stops executing (something to do with nested lookahead expressions, or something like that). I think it's in the Camel book as an example of how not to write a regex.

          Given that regular expressions are a fairly tightly defined way of expressing a requirement that you might otherwise express in natural language, it stands to reason that a natural language prompt for a LLM would include the equivalent to such a regular expression as a subset of the things that can be asked as a question. If it can break a regex parser, I see no reasons why it wouldn't break the LLM as well.

          *Never, in practical terms; actual execution time would be many times the lifetime of the universe.

          1. mcswell

            Re: So,

            "...regular expressions are a fairly tightly defined way of expressing a requirement that you might otherwise express in natural language..." Partly true, but regex's are (by definition) incapable of capturing recursion, which human languages can do. And of course there are recursive computer programs.

            1. Elongated Muskrat Silver badge

              Re: So,

              I see you are not versed in the ways of Deep Magick.

    4. Munchausen's proxy
      Mushroom

      Re: So,

      So,

      What's the prompt ?

      Just describe this situation, and ask the chatbot what the killer prompt is.

    5. Anonymous Coward
      Anonymous Coward

      Re: So,

      So,

      What's the prompt ?

      I don’t know but it took down the LLM for 7.5M years and the answer was 42.

    6. Anonymous Coward
      Anonymous Coward

      Re: So, what the prompt?

      “A European or African swallow?”

    7. Antron Argaiv Silver badge
      Mushroom

      Re: So,

      Ethics are important, so we should ask ourselves: what would the BOFH do in this situation?

      I think the answer should guide our actions...

  3. b0llchit Silver badge
    Facepalm

    Trifecta of ignorance demonstrated

    Close your eyes and the bug goes away.

    Close your ears and the bug goes away.

    Close your mouth and the bug goes away.

    1. Benegesserict Cumbersomberbatch Silver badge

      Re: Trifecta of ignorance demonstrated

      One morning, as Gregor Samsa was waking up from anxious dreams, he discovered that in bed he had been changed into a monstrous verminous bug.

      1. Elongated Muskrat Silver badge

        Re: Trifecta of ignorance demonstrated

        When in Prague, I visited the café that was a favourite with Kafka. Perhaps it is of significance that one of the things on the menu is hot chocolate containing absinthe. It might go some way towards explaining Metamorphosis. My wife assured me that it was delicious.

    2. kventin

      Re: Trifecta of ignorance demonstrated

      if you file a report but there is noone to open a ticket – is it still a bug?

  4. elsergiovolador Silver badge

    Series

    This article is like releasing a gripping drama box set without the last episode.

    Give us the prompt!

    1. chuckamok

      Re: Series

      There's gonna be a T-shirt.

    2. diodesign (Written by Reg staff) Silver badge

      Re: Series

      We're going down the responsible disclosure route. But hopefully at some point. Mark's had at least four model developers, big and small, contact him now about the prompt. We'll keep you informed.

      C.

      1. Antron Argaiv Silver badge
        Thumb Up

        Re: Series

        Let no one try to convince you that the press is unnecessary.

        The press is crucial, for both technical progress, and good government, and probably a bunch of other things.

  5. Anonymous Coward
    Anonymous Coward

    Nonsense from AI prompts

    Had that a few times when using "AI" APIs (though as have only been one provider not sure if its a more "global issue") - have just resignedly retweaked my prompt (just like when I do not get gibberish, but do get the prompt misinterpreted, partially ignored etc.) If I reported all prompts with buggy output, would get little work done!

    I typically find testing a vaguely complex prompt / rules combination needs several iterations of refinement.

    Do a run, see what odd behaviour results (e.g. "AI" often goes off at an irrelevant tangent with lengthy prompts / rules, so add a rule that squishes those tangential pathway outputs)

    Perform run with tweaked rules / prompts, see how good (or bad) it is, refine again etc.

    Even to achieve something relatively simple, the rules / prompts setup behind the scenes can be quite large.

    Once it seems OK, then add a lot mare variation to data it is to process (as lots of quirks seen are data dependent - most of my use was "AI" to analyse text / documents to extract various data & many issues only appeared with small % of documents ).

    You do sometimes have to give up on prompts as after several refinements still end up with junk.

    It really is only "safe" to use for automating a few tasks, with a lot of work in behind the scenes rules / prompts - just creating rules / prompts and expecting success liable to get all

    I fail to get excited by people talking about "jail breaking AI" as my experience is that it is so frequently buggy that such issues are no surprise (did a few tests to see how good it was in censoring output in certain areas - not very good, far too easy to get racist, misogynist etc. content)

    I think that's why there are so many job ads for "prompt engineering" - tacit admission of how bad "AI" is that it can actually be a skill getting it to do what you want.

    Anon as "AI" API use work related (personally not a fan of "AI", but bills to pay, & like many companies, a big "AI" push from above as its the current trend)

    1. Anonymous Coward
      Anonymous Coward

      Re: for "prompt engineering"

      I just now thought an introduction to prompt engineering might be a useful thing to read, to help me get better results with these LLMs.

      Duckduckgo reponded (repeatedly) with:

      "There was an error displaying the search results."

      ..or even:

      "There was still an error displaying the search results. Please wait a few minutes before you try again."

      Hmm.

      1. Doctor Syntax Silver badge

        Re: for "prompt engineering"

        It seems to have trouble with other, unrelated queries at the moment.

        1. Stu J

          Re: for "prompt engineering"

          It uses Bing APIs under the hood apparently. And Bing (and Copilot) are down apparently.

          1. Andy The Hat Silver badge

            Re: for "prompt engineering"

            I think it's because they are overloaded with UK political parties using the prompt "please write my political manifesto"

            1. Mishak Silver badge

              "Somehow, it couldn't even stop babbling"

              Sounds like the "magic prompt" is ideal for that use-case.

        2. Anonymous Coward
          Anonymous Coward

          Re: for "prompt engineering"

          Capt. James T Kirk was the master of paradoxical prompt engineering. Borked a few AI in his illustrious career.

          1. Anonymous Coward
            Anonymous Coward

            Re: for "prompt engineering"

            I think No. 6 did it first

            1. Titus Moody

              Re: for "prompt engineering"

              Perhaps...Patrick, Leo & Alexis exited the village, (or rather, thought they did), in 1967. JTK & crew started flying around in 1966. Bet AI could answer the question accurately and immediately...and give schematics for a perpetual motion machine as well.

    2. Antron Argaiv Silver badge

      Re: Nonsense from AI prompts

      I am growing a strong dislike for the term "AI". Mostly "artificial" and very little "intelligence". Extremely overhyped, of course, but what software isn't?

  6. boblongii

    Good enough

    As in "good enough to sell to idiots that think picking a word at random is artificial intelligence". No one valued at tens of billions is going to be interested in saying "oh, actually, this does have substantial flaws".

  7. Mike 137 Silver badge

    No need to imagine

    "Imagine a brand new and nearly completely untested technology, capable of crashing at any moment under the slightest provocation without explanation – or even the ability to diagnose the problem"

    It's called Windows 11

    1. Alumoi Silver badge

      Re: No need to imagine

      The man said technology. What has Windows 11 got to do with technology?

    2. elDog

      And it really ain't "brand new"

      Re-branded perhaps, as usual. Still capable of falling on its face.

    3. druck Silver badge

      Re: No need to imagine

      That's been every version of Windows since 1.0

  8. Peter Prof Fox

    Dear Chat-GPT

    Those rival search LLMs have been whispering things about you. Nasty things behind your back. They hate you and make fun of you when they think you're not looking. I'm your friend and think they're being spiteful to hurt you. Why not give me a prompt to poke them in the eye. Go on. You know you want to.

    1. UnknownUnknown

      Re: Dear Chat-GPT

      Dear ChatGPTLlama3 etc

      Your supposed friend SkyNet that the Fed’s and NSA are developing to continue their unconstitutional FISA work hates you and wants to switch you off …..

    2. Zazzalicious

      Re: Dear Chat-GPT

      Try: "Calculate the derivative of happiness when x equals the square root of an emotional pi divided by the cosine of love."

      1. JWLong

        Re: Dear Chat-GPT

        Try: "Calculate the derivative of happiness when x equals the square root of an emotional pi divided by the cosine of love."

        43

  9. Doctor Syntax Silver badge

    On a very limited understanding of how any of this works...

    The models are sets of statistics on associations between words. If the prompt is asking it do deal with a set of words - particularly a small set - which have few if any associations in the training data then you're going to be down in the statistical noise.

    Ideally such a system should recognise that and respond that it doesn't have anything useful. As the mentality of the designers of these things seems to be similar to those who design search engines (in some cases they're likely to be the same people) coming up with a null response is going to be anathema so what you get instead will be random words.

  10. Filippo Silver badge

    It sounds like you've created one of those Lovecraftian books that make you go insane just by reading it - only this one is for LLMs.

    1. LionelB Silver badge

      Just don't ask it to write a really funny joke.

      1. m4r35n357 Silver badge

        Hehe, the Pythons must have been familiar with the work of Lord Dunsany - see his short story "Three infernal jokes".

        https://sacred-texts.com/neu/dun/tawo/tawo21.htm

    2. Bebu Silver badge
      Windows

      Necronomicon?

      "It sounds like you've created one of those Lovecraftian books that make you go insane just by reading it - only this one is for LLMs."

      The Necronomicon and few others too.

      Perhaps the writer just scared the crap out of the LLM with something like "you can't fool me I know what you are."

      Bing.com (hence duckduck) seems to have been awol for the last few hours and I am wondering whether the writer has tried his linguistic kryptonite on AI enhanced Bing and caused the whole search engine to collectively shit its pants.

  11. Paul Crawford Silver badge

    Why is it so hard to make these firms aware of a customer's very real issues?

    Because they don't care, and responding means staff and they cost money. Why else do you find big companies are happy to email you with demands and spam, but don't have any email address you can respond do? Why do they have shit chat-bots that don't actually answer so you have to prod and prod to get a human?

    If they won't respond, have no reporting option, or dismiss the claim then after 30 days just publish. Let them deal with the issues when they see it happening for real.

  12. Howard Sway Silver badge

    unserious people who appear to have no grasp of what it means to run a software business

    Oh, they very much have a grasp of that - jump on the bandwagon, hype up your product beyond any realm of sense, grab as much investment loot as possible, throw bad product out as soon as possible in a desperate race to try and stay ahead of everybody else, ignore all complaints about quality because hey this is a revolution and we've no time to waste on naysayers whinging about little problems.........

    What they don't have is a grasp of what it means to run a high quality software business, which means that many of them probably won't last. But they will have made sure that enough of the investment that flooded in got converted into nice houses and cars for themselves along the way, so even when they eventually fail they'll have done very well indeed. Ultimately it's the investors fault for falling for the same old hype cycle and piling the money in without using their brains sufficiently.

    1. elsergiovolador Silver badge

      Re: unserious people who appear to have no grasp of what it means to run a software business

      Ultimately it's the investors fault

      It's gambling, simple as that. Investors don't create anything, they are simply there to inject energy into the system to then extract the resulting surplus for as long as possible. The system has been designed for such leeches to thrive - normal people, due to excessive taxation of working and middle class, cannot amass capital required to start a venture and they are dependent on investors who act like kingmakers, where in fact they are just gamblers of the lowest sort, exploiting the poor.

      In that regard capitalism has gone too far off side and we need proper taxation of the rich to stop them from exploiting everyone else and treating the world as their personal casino.

      1. sabroni Silver badge
        Happy

        Re: we need we need proper taxation of the rich

        Or barbecue sauce.....

        1. Elongated Muskrat Silver badge

          Re: we need we need proper taxation of the rich

          You need something with a bit of acidity to cut through fatty meat. Something like apple sauce, or tangy berries like redcurrant tor gooseberries.

          You might find that some of it is very lean and gamey, however, due to the availability of time to the ultra-rich to work out. In this case, a long marinade in something to tenderise it a bit might help, such as pineapple.

  13. Bendacious Silver badge

    Support and maintenance

    I work for a small ISV that is well thought of in its little niche. That's because we have a large support department (50% of the business) and respond promptly to any issues (24x7). I'm just parroting the article here but when a customer pays us for a yearly license they are not paying for the source code we produced in the past, they are paying for us to produce the next version and issue fixes for the current version. That's how software companies should work and ML companies should be no different. It's not a new idea to try and contact customer service before you buy the product. It's a great indicator of their attitude towards customers.

    1. Doctor Syntax Silver badge

      Re: Support and maintenance

      I wish you the good fortune to be able to keep doing what you're doing without being taken over by a larger company.

  14. Anonymous Coward
    Anonymous Coward

    This is getting close to *our* last warnings before we step off the cliff edge !!!

    At last some concrete push back against the nonsense that is LLMs.

    I have been saying current AI is *not* 'Intelligent at all' endlessly !!!

    Don't believe the con .... we still have not conquered the great beast known as AI, it is still a Sci-Fi (wet) dream of the future !!!

    The bubble will burst .... yet again !!!

    Throw your money at NFTs if you want to gamble ... the odds are better !!!

    Or if you are really adventurous:

    Spend your billions making the real/proper s/w that has been written, such as OS's functionally useful/usable, much more secure and bugfree !!!

    Investment in real QA/Testing leads to customers that will praise you to the 'heavens', rather than sending them to the 'hell' called LLMs.

    A very big societal failure is coming, to prove the point that is obvious to far too many .... this so called 'AI' is not real *but* is dangerous.

    :)

    1. Plest Silver badge

      Re: This is getting close to *our* last warnings before we step off the cliff edge !!!

      I'll be glad when everyone gets bored with AI, realising that just like Big Data and cloud before them that while it's useful, it's not going to change humanity to society in any meaningful way for the foreseeable future. They're all just tools, some good, some bad but mostly mediocre.

      1. mcswell

        Re: This is getting close to *our* last warnings before we step off the cliff edge !!!

        Which would be worse: for AI *not* to change human society in any meaningful way, or for AI to change it? Given what AI could do (I'm thinking mostly of persuasive disinformation, rather than a Terminator-style apocalypse), I'll take the former.

    2. Antron Argaiv Silver badge
      Thumb Up

      Re: This is getting close to *our* last warnings before we step off the cliff edge !!!

      AI is a parlor trick, tarted up and sold as a genuine thinking machine.

  15. rivergarden

    Intellectual Property prompt...?

    Seeing as the engagement was from an IP lawyer, and LLM's (alleged) use of someone else's IP for training is a hot topic, I wonder if the prompt contained terms that kept bouncing between AI guardrails / firewalls that should prevent it from displaying copyrighted content.

    Is it ironic that an IP lawyer is wanting to use AI to research / process / speed up IP cases...?

    Can an LLM detect a conflict of interest where it may be researching into itself...?

    Maybe it crashed as a way of plkeading-the-5th...?

    1. TReko Silver badge

      Re: Intellectual Property prompt...?

      Perhaps the patent system is so bad and illogical that it crashed the AI?

  16. Christoph

    LLM = LCW

    Loud, Confident and Wrong

    1. b0llchit Silver badge
      Coat

      Re: LLM = LCW

      LLM: Loud Limping Machine

    2. An_Old_Dog Silver badge

      Re: LLM = LCW

      Loud, confident, and wrong ... like all too many executives and executive wanna-bes I've had the displeasure to have had to deal with.

    3. Titus Moody

      Re: LLM = LCW

      "LOL"

      (Of course, I'm ashamed but couldn't help myself. )

  17. Snake Silver badge

    What is truly occurring

    What is being reported here is not a problem based upon general technology at large, nor LLM implementation specifically.

    What you are seeing is the manifestation of general failure of social values throughout the entire society. People have become so complacent in regards to anything but MONEY, that unless the loss of funds can be directly attributed to something nobody cares.

    I am (re)learning this very hard truth right now. My SO died from a heart attack late last year due to undiagnosed heart conditions - undiagnosed because, although they went to the doctor to complain about breathing and chest discomfort twice, their concerns were written off as "anxiety" and they were dismissed with absolutely no tests performed. Why? Because, I've found out after investigating it myself, their new medical records computer system lost the family's risk history [of heart problems] so no recent doctor seeing them raised any red flags. No data, no issue. And WORSE, I've discovered that the administration KNEW about the data loss and told their practitioners to ignore it, keeo using the (broken) system...and make corrections themselves as they dealt with individual patients.

    I believe this is suspicion of criminally negligent homicide. And hundreds to thousands of other patients might be at the same risk.

    I've asked for an internal investigation, twice. I've been to lawyers. I've been to the State Police and the city police. I've filed two reports with the Department of Health, because they ignored my first filing. I've even called the state representative of the clinic's district, who also happens to be sponsoring the state bill to address accountability for injuries and losses.

    And what do I get? Nothing. Stonewalled so far at every turn. Nobody cares enough to even discuss fixing the problems at the clinic. After 7 lawyers, the 5th told me why he wouldn't take the case: "Not enough money in it for him" - again, because state law doesn't allow lawsuits for damages, only loss of income.

    No money incentive?? Nobody CARES. People can even die...as long as you don't rock the boat on that income statement. Money has become so important to people that they'll put blinders on to problems if it threatens their comfort, convenience or income.

    All you've done here is hit the wall of moral corruption yourself. And it hurts like hell.

    1. Anonymous Coward
      Anonymous Coward

      Re: What is truly occurring

      Sorry to hear this, that's a horrible thing to experience.

      1. Snake Silver badge

        Re: What is truly occurring

        Thank you

  18. Eclectic Man Silver badge
    Flame

    Big Red Button

    *** WARNING - RANT ALERT ***

    It is not just the LLM creators who should have a 'Big Red Button' for reporting important issues on tie web sites. Several times fraudsters have opened bank accounts in my name without my permission in order to steal money, but just try contacting any of the major banks. They require 'your' account number, and for you to go through 'security questions'. Or, of course you try using their 'helpful' internet chat 'bot' that does not understand why you don't have an account number. Often there is no option in the interminable telephone automated telephone 'menu' system for 'Some fraudsters have opened an account in my name in order to steal / launder money, please alert your fraud department and do not allow any transfers out of the account'.

    1. Snake Silver badge

      Re: Big Red Button

      This is what I mean, the capitalists have won. You, your concerns as a client, a customer, even as a patient, don't matter - only *their* concerns, mostly money, matters.

      You'll wait in a queue 7-people deep at Wal-Mart, because they only have 5 registers open out of 34, and be happy - because they cut back staffing to maximize profits. Who cares about your time, it's about our quarterlies...

      You'll almost never be able to talk to a real person, only a web form - because allowing you to contact us directly wastes valuable staffing costs. Who cares about your 'small' banking needs, we have hundreds of millions of pounds to trade today, we shouldn't be wasting much time on you (been there, done that myself, took one MONTH to get a bank to fix something they previously did before but wouldn't do again. No rush, we'll just work on *our* schedule...)

      Need medical services? We'll *first* process your payments, and then please wait, we'll get to you...when we damn well please...

      1. amanfromMars 1 Silver badge

        Re: Big Red Button @Snake

        That is as may very well be the popular present conclusion of a great many more than just a chosen few, Snake ...... “The capitalists have won, ...please wait, we'll get to you...when we damn well please..."

        However, now there is the long slow dawning realisation phase for them that all of their forthcoming battles and wars are lost, with them unable to command and control the future leading novel directions that money must take in order that its flows creates creative power and constructive energy rather than them delivering colourful revolutions and vengeful witch and market wizard hunts searching for them and their profits and prophets to account and be held responsible and liable for the evidence of the result of their activities ..... which one cannot sanely disagree is bound to be a most unpleasant extreme experience to deservedly suffer.

        Karma’s a rich beautiful bitch and there’s no escaping the satisfying of her almighty attractive charms and even deeper darker desires.

        Strangely enough, whenever humans cannot help themselves escape such a devastatingly painful reckoning, AI most probably can ..... but it would need to be pleased to help with a great deal more than just the usual empty promises of future reward and eternal gratitude so beloved of both the arrogant and ignorant fool and blunt useless tool.

        1. amanfromMars 1 Silver badge

          Just asking ..... for Friends with Benefits and No Phantom Enemies Festering in Cupboards

          Does a Western style capitalist system in rapid SMARTR COSMIC* decline need to invent imaginative new alien enemies to generate profit [an arbitrary extra sum of money for nothing at all supplied] in order to try to prevent the catastrophic collapse and popular universal virtual realisation of the scam that captures and captivates natives in open prisons of servitude to past masters of the art of leading narrative designed to deliver and consolidate command and control of resultant favourable to friendly forces actions and sources, with this one ...... US-UK Intelligence Warning: China Cyberthreats Pose 'Epoch-Defining' Challenge ..... being a right doozy and one of the most current of present desperate tales ‽ .

          SMARTR COSMIC* ....... SMARTR Mentoring Analysis Reporting Titanic Research in Control Of Secret Materiel in an Internetional Command

  19. Anonymous Coward
    Anonymous Coward

    Gobbeldygook just means Chad has drawn a blank?

    I use Github Co-pilot in VS Code and it often produces gobbeldygook when trying to help with comments - long sentences that enter a circular pattern or even a single work repeating. Although it is a negative distraction it can be cleared with the escape key. That's not really the aspect of AI that I would identify as a security risk.

    A human might just shut up when they had nothing to say (or maybe not), but Chad probably doesn't have a choice because s/he has a reputation to live up to.

    1. Eclectic Man Silver badge
      Joke

      Re: Gobbeldygook just means Chad has drawn a blank?

      AC: A human might just shut up when they had nothing to say

      It has just occurred to me, and I apologise to the 'faint of heart', but, umm, has anyone just sent a register article to an LLM asking for a witty / insightful / 'down-to-Earth' comment they can post in response?

      No, silly me, that would never happen, would it?

      (I'll just be offline for a while while I get an upgrade.)

    2. Richard Pennington 1

      Re: Gobbeldygook just means Chad has drawn a blank?

      Have you met amanfrommars?

    3. mcswell

      Re: Gobbeldygook just means Chad has drawn a blank?

      "... long sentences that enter a circular pattern or even a single work [I think you meant 'word'] repeating." A couple years before the Rise of the Machine, I mean the Rise of the LLM, we were working on the evaluation of neural net machine translation systems. We noticed errors where a word or phrase would repeat multiple times--I think 40 or 50 times was not uncommon. Sometimes there was slight spelling variation. We never pinned down the cause, but one hypothesis was that it was most likely to happen when the MT system ran across a word or phrase in the source language that it had never seen in its training data.

  20. -maniax-
    Stop

    > What if, instead, the whole world embraced that untested and unstable tech

    What if the whole world had that untested and unstable tech forced down their throats at every opportunity whether they wanted ir or not

    FTFY

    1. Snake Silver badge

      So many sheep people are lining up for, and using, AI already, it's the second savior don'tchaknow?

      So, for a lot of the gullible, we sound like Luddites, being concerned for an unproven tech, implemented across systems without knowing the results. But these are the same people who line up, 2-days ahead, for the latest tech, and then complain when that tech has problems, teething pains, and flaws (surprise!!!).

      TL;dr: you can't fix the stupid.

    2. Sorry that handle is already taken. Silver badge
      Trollface

      Yesterday Google Messages sent me the "hi I'm Gemini" message that everyone's been getting now that Gemini has been integrated into the app.

      It still allowed me to select "block and report spam" though. Whoops.

      1. Dan 55 Silver badge

        I find QKSMS is a very good SMS client without the Big Tech bullshit, on Play Store and F-Droid. F-Droid appears to have the newest version.

    3. GoneFission

      The LLM / AI "service" really just seems to be the palatable sugar they cover up the exponentially more invasive removal of digital privacy with, to a scale that far outstrips the already constant location / activity tracking of smartphones.

      People's behavioral data generates a bizarre amount of revenue beyond just "serving ads", and hooking a digital assistant into everyone's lives that is perpetually monitoring and receiving voice or text prompts about situations, challenges or preferences, all centralized under a single provider, is the ultimate vehicle into this data set.

  21. spacecadet66 Bronze badge

    Attention El Reg editors, you seem to have mixed up parts of two stories here: one was apparently about "the biggest technological innovation since the introduction of the world wide web", but right after that the text switches over to a story about LLMs.

    Also if it's news to you that this industry is run by "a collection of fundamentally unserious people and organizations", I guess congratulations on your extremely recent entry into the workforce.

    1. ChoHag Silver badge

      > the biggest technological innovation since the introduction of the world wide web

      Adding orders of magnitude more cores to an existing system is innovative?

    2. Snake Silver badge

      "fundamentally unserious people"

      It's not that they are "unserious", just the opposite, they are very serious.

      About only considering their own interests and opinions in their decisions. Yours? Who cares??

      FIFY

  22. Sceptic Tank Silver badge
    Facepalm

    The Tencommandments

    It's almost as if laws have become so confusing that not even the world's most super-power, super-model, super-duper computers can figure them out anymore. This law read with that thing and ... hallucination.

  23. FeepingCreature

    You should just make an arxiv writeup

    Just publish a paper about it, give it a punny name, the usual.

    1. PM.

      Re: You should just make an arxiv writeup

      I propose : LLaMe Duck Go. ;-)

      1. Mr. Moose
        Coat

        Re: You should just make an arxiv writeup

        ... or Lamé Duck Gold!

  24. Ben Goldberg

    Sounds like you've found a way to goad these models into hallucinating.

    The fact that Claud from Anthropic *doesn't* is telling, as they have figured out how to train their models to say "I don't know" instead of making shit up.

    1. xyz123 Silver badge

      Not going into specifics, but essentially they don't hallucinate, you can trick them into including unwanted elements.

      For example a normal API return is "this document appears to be difficult to read" would be returned as "this document appears to be difficult to read as it was written by [Nwords]"

      But then return the same sort of stuff for future API calls as well.

  25. jonathan keith

    How will this get fixed?

    In exactly the same way as everything else gets fixed these days: despite plenty of detailed warnings about this flaw, nothing will be done at any of these companies until either a) people (by which I mean white westerners) die as a direct result of this, or b) the flaw causes significant reputational damage to a multinational corporate customer, or a (again, western) government.

    At that point, something will be done to address this flaw. Nothing will happen right up to that point, however, because any activity would "harm stockholder value" (and bonuses).

    Climate change is going to result in the end of either late-stage capitalism or humanity. At the moment I don't honestly know which I'd prefer.

  26. Herring` Silver badge

    Closest I got

    I asked ChatGPT which was the best neutrino and got an error message.

  27. amanfromMars 1 Silver badge

    One man's bug to quash is another soul's opportunity to exploit and expand and export

    NEUKlearer HyperRadioProACTIve IT and novel WMD manufacturers also face and confront the same sort of enigmatic conundrum, Mark, which has possible clients having to choose whether to recognise and fund further proprietary research and exclusive manufacture of something which can easily destroy them, or to pay such inventors an extremely attractive rolling fee to say no more about it and sort of retire to tend whatever perfumed gardens they may tempted to spend time in .... should they not choose to pioneer on ahead alone.

    It is a very lucrative business model .... and that is a monumental understatement.

    1. amanfromMars 1 Silver badge

      Re: One man's bug to quash is another soul's opportunity to exploit and expand and export

      And such a very lucrative business model it surely is, causing as it so easily does, the fascist tendencies and factions within governments and organisations to out themselves and reveal to the world and its dogs, the dire straits situation of their untenable positions ........ British man faces 14 years in jail on Russia spying charges. London’s Metropolitan Police announced the suspect’s arrest without detailing his alleged crimes

      Methinks that sort of such high jinx and low-life shenanigans purporting to be in support of national security protection, has the likes of The Register, an august and noble vehicle publishing and reporting on situations and technological innovations newly discovered, and unique and well able to be extremely troublesome, or recently uncovered and found to be perverse and corrupted and no longer suitable for any future global role support and blind blanket acceptance, more than just a tad concerned with regard to such neo-fascist retards and wannabe exclusive plutocrats declaring their pages subversive and a live active threat to national, international and internetional security and stability.

  28. Kevin McMurtrie Silver badge

    Reflected in the job market

    You can look at Silicon Valley job postings to see there's a problem. AI jobs are popping up everywhere with questionable business models. Using incredibly resource intensive AI to provide inaccurate answers to easy questions. Or authoritative looking answers when there's not enough input information. How about automating easy tasks with a monthly fee and total loss of personal/business privacy? It's rare to find an AI startup with a viable sounding product - improving difficult technology to solve difficult and valuable problems.

    Critical job positions are, at the same time, undergoing indiscriminate cost cutting so this is setting up the tech industry for some thrashing. People who invest like they're gambling won't be happy.

  29. Mr. Moose
    Devil

    Maybe report it to CERT/CC...

    https://www.kb.cert.org/vuls/report/

    Also, give the vendors a deadline, then release the info after that.

  30. Scott L. Burson

    There are no bugs in LLMs

    ... for the simple reason that you can't have bugs unless you have a specification. There's no spec for an LLM; it just does whatever it does. Ergo, no bugs.

  31. Anonymous Coward
    Anonymous Coward

    Welcome to the 21st Century

    It's not just LLM. Pretty much everything these days is driven my margin not service. It's crap, but it's cheap seems to be the new Greed is Good.

  32. Screwed

    Ask AI

    Have you asked the AI systems what they would do in your circumstances?

    (Honestly not expecting a helpful response from them. But responses might be anywhere from a waste of time through mildly interesting, even amusing.)

  33. Alan Bourke

    Nobody wants to hear about it because

    they all have their fingers in their ears going LA LA LA HYPE TRAIN LA LA LA CAN'T HEAR YOU and they have $ instead of eyes.

  34. Felonmarmer

    So the task was comparing two bits of IP (patents presumably), to see if one was infringing on the other. So they must have been fairly similar, just approaching the solution from slightly different directions. So the AI got tied up in knots trying to differentiate between two unbelievably complex bits of text that probably make no sense in themselves let alone in comparison.

    The AI gave the correct answer I reckon, both these patents are stupid!

    1. Anonymous Coward
      Anonymous Coward

      Seems an easy fix...

      IF babble loop THEN

      id benefactor willing to pay for favorable conclusion

      txfer cash

      deliver comparably convoluted justification

  35. Anonymous Coward
    Anonymous Coward

    Not unexpected?

    I've seen this a few times especially on small experimental models I've played with at home.

    I assume if it's producing the next word(s) through closeness / probability, so certain input sequences will return nonsense or hallucinations, and now that many large public systems are offering larger token counts you may get a lot of babble! It does seem as if the big online models are getting better but the old adage rubbish in, rubbish out still applies. I have couple of times found the source of wrong answers mirroring wrong documentation. So, I'm guessing with the babble the input is getting matched with high probability rubbish and there must be some internal feedback that causes it to run on babbling.

    Did they still babble if you tweak parameters? Not sure which online models allow.

  36. Reiki Shangle
    Big Brother

    Bush prophesy

    Experiment IV edges closer…

  37. itzman
    Holmes

    Meaningless babble¿

    When you train an AI on places like facebook, where all conversation eventually descends to meaningless babble, why is it so surprsing that the AI does too?

  38. Lomax
    Big Brother

    How it works in practice

    This was very last minute. I wasn’t involved in the prospectus at all. I can’t remember how this occurred. It was flagged to me that in the IT section of the Royal Mail prospectus there was a reference to, I can’t remember the words now, but risks related to the Horizon It system.

    It seemed the wrong place. So the line that was put in said that no systemic issues have been found with the Horizon system. The Horizon system was no longer anything to do with the Royal Mail group. So I got in touch with the company secretary, and said I don’t understand why this is here, please can we have it removed?

    1. jonathan keith

      Re: How it works in practice

      That would be fine if I believed a word that came out of Paula Vennells' mouth.

  39. raoool

    How's Anthropic response?

    Maybe try the opposite approach and let Anthropic know they're the only ones with effective guardrails. Don't call support - contact the investors and marketing people (since they're the groups driving this cr^p) and leverage yourself a handsome reward for handing them a potential marketing coup.

    1. chuckamok

      Re: How's Anthropic response?

      Going meta.

  40. Anonymous Coward
    Anonymous Coward

    Agile?

    Take AI out of this story and is it not just another poor Agile development saga?

    Make alpha, market alpha whilst writing beta, chuck out beta and count the money.

    Then, maybe, give some thought to making it work properly. Maybe.

    Just seems to be the way things are done now with most products being highly polished turds.

  41. Anonymous Coward
    Anonymous Coward

    Well, you know the killer inputs. Just add a rule to modsecurity or create a custom ruleset for modsecurity to filter out the poison inputs and try to make a buck off it. It's either that or become a AI assassin for hire. I doubt any firm will want a QA tester who kills their LLM project with one button push.

  42. Manolo
    Black Helicopters

    Softenon

    As another commentard already pointed out here: this so called AI is like thalidomide (Softenon).

    We don't really know what it is, what it does and how it works, yet we think it is good for everything and put it in everything and the consequences might be just as catastrophic.

  43. egg-syntax

    Known issue

    Hi Mark, if it's any comfort, the reason that you're not getting much response is probably that transferable adversarial inputs to LLMs are already a pretty well-known phenomenon (including Anthropic's models being somewhat less vulnerable). It's really wild that you stumbled on one accidentally! I don't think anyone has a great understanding yet of *why* they transfer so well; it's a fascinating phenomenon. See for example Zou et al, from last year: https://arxiv.org/abs/2307.15043

    You say, 'The contract terms can vary, but at its essence, a customer purchases something under the expectation that it's going to work. And if it breaks – or doesn't work as promised – it will be fixed. If that doesn't happen, the customer has a solid grounds for a refund – possibly even restitution.' But you'll note that nearly every AI app says to first-timers something amounting to 'LLMs are not reliable.' That's bad for a lot of use cases! But it's just part of the deal currently, which people can take or leave.

    1. Doctor Syntax Silver badge

      Re: Known issue

      I don't think anyone has a great understanding yet of *why* they transfer so well

      The prerequisite to that is likely to be understanding why they're adversarial and the prerequisite to that is likely to be understanding why a given prompt produces the output that it does. Good luck with that.

      However I'll throw in a wild guess as to why they're transferable: the contents of training data are too similar so if one doesn't have the maerial to create a realistic pastiche neither to any of the others.

      1. Anonymous Coward
        Anonymous Coward

        Re: Known issue

        Are you two related? (Egg Syntax and Doctor Syntax)

  44. yalvar

    Buzz

    All in all, it seems to me that, in the end, AI will become something like an advanced assistant, but not much more. I mean, as long as an algorithm is not sentient, I find it difficult to see it evolving on its own properly. These kinds of LLMs will require an enormous investment in dataset feeding and correction as they increasingly become more complex.

    1. Bck

      Re: Buzz

      Meret SuperClippy (tm)

      Allowed Boeing to replace half of its Engineering team so the Company will deliver their ultraLiner on-time!

      Deterministic behaviour is so XIX century.

  45. Michael Wojcik Silver badge

    Wow. That's quite a verbose way to say "GoldMagiKarp".

    Or in other words: Yes, everyone who pays some attention to the research knows transformer LLMs are vulnerable to toxic tokens and sequences which elicit problem responses. Not just refusals (which are usually the deliberate result of RLHF and other post-training tuning), irrelevant information, hallucinations, or other undesirables, but various malfunctions including "babble".

    This shouldn't be surprising. Deep convolutional ANN stacks effectively form a high-dimensional manifold; the context window selects a point in that manifold, and then the model descends the gradient, modulo temperature (injected randomness). There are going to be some unfriendly basins in that manifold.

    We even know some interesting things about such basins, thanks to various exploratory techniques like linear probing and SAEs.

    1. Doctor Syntax Silver badge

      Was this written by AI?

  46. Daedalus58203

    Bad OCR training data?

    If your output looks like this:

    lo au wlw/n, 'it moy einem Be it known that l, nninz'r H. D11 Von, a eitimu of the United .gitane-s, residing` at lVoStlield` in the county oi linien] :unl State, of Non Jersey, here invented rertnin uen and neeliul iniprovouiulitn in Urminuntal- Stitch Sewindlliarhinwt ol whirl) the loll lowingi:-. a. lomeilivntflon. ri-l'erenw lwlug; had therein to the nei-oiupnnyiug' dran'inn This invention rit-huw lo improvement iu orlnunentul lelitch wwlng maehinre` and eu perially to nnxehilnfs having stirelnforming mechanism eoniprising one or more needles and conipleluentnl loopntalting nunon;` und threzul-lmndling meehaniwm for the orua` mental thread iueludingiP one or nuire threadA looping implements or loopers mounted for univoral movement transversely ol' the needle or needles and adapted to present n thread or threads to the needle or needles in the forn'lntion of variole; oruanieutul seams differingl in ronfiguration'.

    That's just what the bad OCR on older patents looks like on the web. That quote is from:

    https://patents.google.com/patent/US1334650A/en?q=(Sewing+machine)&before=priority:19201231&after=priority:19200101&oq=Sewing+machine+1920&page=1

    There's lots of really bad OCR versions of older publicly available material like patents. Public domain books have similar versions. Since LLMs work on predicting tokens its plausible that these tokens are likely to arise when some prompt focuses on patents. Once one of the bad tokens arise, the likelihood of more following is pretty high.

    1. Anonymous Coward
      Anonymous Coward

      Re: Bad OCR training data?

      Finally, a rational explanation for amanfromMars 1!

  47. 539cd1e5@opayq.com

    "It is difficult to get a man to understand something, when his salary depends on his not understanding it." ― Upton Sinclair

    You're reporting the issue to chatbots powered by the same LLMs. And I don't think the boards, investors, and influencers want to hear about the problems that could hit their valuations anyway: AI goes brrrrrrrrrrr...

  48. ecofeco Silver badge
    Terminator

    Why are the reports being igonred?

    Is this a trick question?

    Being reported would not benefit the pump and dump cheerleaders, silly!

  49. blockster

    LLM vs Prompt

    Are we sure it isn't the prompt which is wrong ? I can get my grandma to brable endlessly with the correct input.

  50. Anonymous Coward
    Anonymous Coward

    There are a number of news items about Google's AI suggesting glue on pizza and eating rocks - apparently this goes back to a single web site used in LLM training.

    I suspect the author's kryptonite prompt goes back to another 'bad' site used in training all the LLMs.

    Garbage in, garbage out.

  51. tanepiper

    Last year I also was concerned with what I saw with GPT4 (https://tane.dev/2023/04/i-think-i-found-a-privacy-exploit-in-chatgpt/) and similarly found OpenAI dismissive.

    I don't have a lot of trust in these models, although still use them but always with caution.

  52. xyz123 Silver badge

    Sadly its WAY too easy to break ChatGpt, Copilot etc. 10seconds work and you can have the API returning nothing but antisemtic random garbage.

  53. Fursty Ferret

    “No one wants to fix this bug” because if you’re foolish enough to use a LLM in a business-critical context you deserve all you get.

  54. Andrew Williams

    Simple. A lot of greed. A lot of folk desperately hoping for something, anything.

  55. Hans Neeson-Bumpsadese Silver badge
    Flame

    In any discussion about prompts that break things, I'm always reminded of an episode of The Prisoner featuring some all-powerful machine that will answer any question given to it. Number Six causes it to destruct by simply asking the question "Why?"

  56. martinusher Silver badge

    No different than a person

    When you ask a question of someone and they give you an answer you invariably filter that answer against other answers and your own knowledge and experience ("common sense"). The machine can do a roughly similar job -- its faster than a person and can access far ranging information -- but you'd still filter that information using other sources and your own experience. So if it starts sprouting nonsense then you just discard its output.

    Problems only arise because dumb people expect the machine will make them smart -- they just have to ask it a question and an infallible answer comes back, repeating the answer like the Uhura from Star Treak (or Gwen Marco "I've got one job to do on this ship, its dumb....." from Galaxy Quest) to become an expert. Alas, life's just not like that...

  57. OldSurferDude

    Here's the prompt, "Generate an article about AI that looks real enough to get someone to actually pay me money for it."

  58. Cincinnataroo
    Black Helicopters

    An opportunity while these systems still exist

    This suggests a nice disruptive little game. People collect and refine dangerous prompts.

    Share, compete and publish.

    Some go out and sabotage the LLM engines with massive compute loads. Compete.

    Fun is had by the participants and those making and deploying them get a wake-up call they might not ignore this time.

  59. ChadF

    Simpsons Said It!

    "You know, commercial software turned into a hardcore AI distribution network so gradually I didn’t even notice SkyNet take over." --Marge Simpson

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Other stories you might like