Lawyers who cited fake cases hallucinated by ChatGPT must pay

Attorneys who filed court documents citing cases completely invented by OpenAI's ChatGPT have been formally slapped down by a New York judge. Judge Kevin Castel on Thursday issued an opinion and order on sanctions [PDF] that found Peter LoDuca, Steven A. Schwartz, and the law firm of Levidow, Levidow & Oberman P.C. had " …

  1. Claptrap314 Silver badge

    I really don't like the term "hallucinate" for this behavior. The reality is that these GOLEMs are executing weighted random walks. That their output fails to match a series of phonemes that constitute a "true" sentence is not a malfunction in any way. It does not result from any error in the GOLEM's input, nor in the processing of said input. In fact, because the temperature is selectable, this undesired behavior is tunable.
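
    To illustrate the point about temperature, here's a toy sketch of a temperature-scaled weighted random draw over next tokens - the tokens and scores are made up, nothing from any real model:

    import math, random

    def sample_next_token(logits, temperature=1.0):
        # Softmax with temperature: higher temperature flattens the distribution,
        # making low-scoring (usually "wrong") tokens more likely to be picked.
        scaled = [score / temperature for score in logits.values()]
        peak = max(scaled)
        exps = [math.exp(s - peak) for s in scaled]
        weights = [e / sum(exps) for e in exps]
        # The "weighted random walk" step: one draw, no notion of truth involved.
        return random.choices(list(logits.keys()), weights=weights, k=1)[0]

    # Made-up scores for the word following "The court held that ..."
    toy_logits = {"the": 2.1, "plaintiff": 1.7, "banana": 0.2}
    print(sample_next_token(toy_logits, temperature=0.1))  # nearly always "the"
    print(sample_next_token(toy_logits, temperature=2.0))  # "banana" turns up far more often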

    What these GOLEMs are doing is best classified as guessing. The goal of these projects is to convince enough people that these guesses are useful. Mis-attributing what is happening in the first place is useful to their marketing, but should be banned or heartily mocked in the press.

    1. Anonymous Coward
      Anonymous Coward

      Pathological guesser

      I appreciate the mathematical-algorithmic description. Sure to put the general public to sleep and pop the hype bubble. Not sure "guess" is that much better than "hallucinate", as they both seem to imply human-like volition. However, if we insist on the human analogy, "pathological liar" would probably be the most accurate - in this particular case the behavior is almost perfectly indistinguishable.

      1. Throatwarbler Mangrove Silver badge
        Alert

        Re: Pathological guesser

        How about "Clavinating" after Cliff Clavin from Cheers?

        1. Yet Another Anonymous coward Silver badge

          Re: Pathological guesser

          Remember pathological liars are sometimes what you want.

          I doubt Wordsworth really Wandered Lonely As A Cloud, and most of Shakespeare's reports about tragic deaths in Verona were about as accurate as Fox News.

          1. jake Silver badge

            Re: Pathological guesser

            There are many huge differences between being a pathological liar and being a story teller.

            1. Anonymous Coward
              Terminator

              Re: Pathological guesser

              @jake: “There are many huge differences between being a pathological liar and being a story teller.”

              Does ChatGPT know it's telling a story?

              1. Neil Barnes Silver badge
                Facepalm

                Re: Pathological guesser

                Of course not. It knows nothing. It's a statistical analysis producing text that looks similar to text in its training material.

                'Hallucination' or 'Lying' or 'Fabrication' or even 'Wrong' are probably not the words to describe its output.

                The problem here is that the model is misunderstood by users - in this case, users who should have known better and actually sought out those alleged citations and proved they existed - because, basically, they see it as magic. A commercial search engine, while it may be manipulated to, say, prefer to list paid-for results first, does at least provide the source. A commercial search engine that searches only databases of prior cases looking for similar ones might be something that can be made - it might already exist - but ChatGPT isn't it.

                1. Yet Another Anonymous coward Silver badge

                  Re: Pathological guesser

                  But if the user has actual intelligence then they might realise that "make me a painting of my dog in the style of a heavy metal album cover" or "write me a python socket server class, that works like all the examples you've found on the web" or even "considering all the billions of example images you've seen - does this picture of a mole look like a cancerous one?" are good uses of AI.

                  Tell me when the next train is leaving X for Y or tell me all the real court cases that are relevant to this one - aren't

                2. Anonymous Coward
                  Anonymous Coward

                  Re: Pathological guesser

                  I think that's the issue - the model is *always* 'hallucinating', but sometimes that's what you want, and sometimes it isn't.

                  As the saying goes, a weed is just a flower in the wrong place - the model always does the same thing, but sometimes the context of the intended use is inappropriate.

                  So either the models have to learn when to reduce the S.D. of their interpolations and extrapolations, or users need to learn to frame better prompts and know when or when not to rely on the output. For now the second option is easier, we just need to get the word out.

                3. Michael Wojcik Silver badge

                  Re: Pathological guesser

                  "Hallucination" is a term of art. Deal with it. 2x4s are not actually two inches by four inches. A computer mouse is not actually a mouse. Prima facie evidence is rarely on the face of anything.

                  Technical language employs denotations which are not identical to the common uses of the same words. That's a fact of language. This entire thread is a lot of technical people conveniently forgetting they use technical language all the time themselves.

                  1. Mostly Irrelevant

                    Re: Pathological guesser

                    "Hallucination" is a term invented by the AI scam artists who want you to believe these algorithms are more advanced than they actually are. It's not reflective of the actual problem here, which is that the user believes that the AI has any concept of truth and is able to be mistaken about things. It's completely ridiculous.

                    Also, 2x4s are rough-sawn to 2" by 4" before they're dried and planed, just like 1/4 pounder burgers are that weight before cooking.

            2. TimMaher Silver badge
              Trollface

              Re: Pathological guesser

              Just ask Boris Johnson

      2. Catkin Silver badge

        Re: Pathological guesser

        I think certain types of hallucination match up quite well to what's happening in the computer. With a lack of adequately detailed direct information, it's synthesising novel but nonsensical information from unconnected memories. In my view, this is somewhat equivalent to Charles Bonnet syndrome.

    2. jake Silver badge

      It's not a GOLUM, either.

      The golum concept includes animation and thus is an incorrect simile.

      Rather, today's AI is mostly a marketing exercise that doesn't work coupled to simple machine learning and huge databases that are demonstrably full of incorrect, incomplete and incompatible data, and are otherwise corrupt and stale. Garbage in, garbage out.

      It CAN NOT work as advertised, not on a grand scale. Not today, and not any time in the future.

      Please don't glorify it.

      1. Yet Another Anonymous coward Silver badge

        Re: It's not a GOLUM, either.

        Unless your database is correct and not full of errors:

        "a fund that averaged a 71.8 percent annual return, before fees, from 1994 through mid-2014." - Renaissance Technologies Medallion Fund:

        1. jake Silver badge

          Re: It's not a GOLUM, either.

          That is a very targeted, very specific, highly specialized database. I would expect it to be as accurate as humans can make it.

          It is not by any stretch of the imagination the kitchensinkware that we are discussing.

      2. katrinab Silver badge

        Re: It's not a GOLUM, either.

        Yes, but even if your training data was entirely made up of reputable legal texts, it would still mix up different cases and laws to invent entirely new ones.

        1. Anonymous Coward
          Anonymous Coward

          Re: it would still mix up different cases and laws to invent entirely new ones.

          and my question is, what makes it do it? Is it something in the 'algorithms' that favours 'providing an authoritative answer at any cost, if no data exists, make up data'? Or is it a 'nobody has a clue why'?

          1. katrinab Silver badge
            Megaphone

            Re: it would still mix up different cases and laws to invent entirely new ones.

            The fact that it is using all of the data to work out which word is most likely to follow the previous one.

            Basically it is a language model, not a knowledge model, and cannot be used as a knowledge model.

          2. pmb00cs

            Re: it would still mix up different cases and laws to invent entirely new ones.

            It's a case of 'nobody has a clue why' but only in so far as nobody wants to have a clue why.

            The mistake in trying to ascertain why ChatGPT and other LLM "AI's" make stuff up is to assume that accuracy or truth were ever part of the problem space they were built to solve. These models are built to produce "creative" output without having to pay people to actually be creative. It just has to look like the sort of creative writing that a person would produce, entirely ignoring the fact that a person would be trying not only to be coherent, but to produce output that has some sort of value - be that accuracy, truth, an entertaining story, etc. - beyond the mere existence of the output. The belief being that if you can produce enough coherent output, quickly enough, without having to pay actual people actual money, you can magically come across this additional value for free.

            It's just an advancement of the age old capitalist riddle of "how do I get mine without giving you yours".

            1. flayman

              Re: it would still mix up different cases and laws to invent entirely new ones.

              "The mistake ... is to assume that accuracy or truth were ever part of the problem space they were built to solve."

              Have a look at the OpenAI charter: https://openai.com/charter

              "OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity. ... To that end, we commit to the following principles:"

              Among which is Long Term Safety: "We are committed to doing the research required to make AGI safe, and to driving the broad adoption of such research across the AI community."

              I would argue, as Alex Hanff also persuasively argues here -> https://www.theregister.com/2023/03/02/chatgpt_considered_harmful/ <- that producing verifiably false links to articles violates the ethical framework upon which ChatGPT was supposedly trained. It should not spit out links that do not exist. It's easy enough for an internet connected machine to validate a URL. If the URL it comes out with is fake, it will not give a successful response to a user agent. If the AI is "minded" to offer sources, it should verify them. If they don't pass verification, it should withhold the information. If it fails enough times, it should retrain itself. Offering, without caveats, inaccurate and outright false information does not benefit humanity. Doubling down on the inaccurate information rises to the level of harm.
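
              For what it's worth, the check itself is trivial - a minimal sketch using only the Python standard library (the HEAD request and user-agent string are just illustrative choices, and some sites will block or paywall such requests, as pointed out further down the thread):

              import urllib.error
              import urllib.request

              def url_resolves(url, timeout=10):
                  # True only if the URL answers with an HTTP success status code.
                  req = urllib.request.Request(url, method="HEAD",
                                               headers={"User-Agent": "citation-checker/0.1"})
                  try:
                      with urllib.request.urlopen(req, timeout=timeout) as resp:
                          return 200 <= resp.status < 300
                  except (urllib.error.URLError, ValueError):
                      return False

              # The fabricated Guardian obituary link quoted later in this thread fails the check.
              print(url_resolves("https://www.theguardian.com/technology/2019/apr/22/alexander-hanff-obituary"))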

              1. Claptrap314 Silver badge

                Re: it would still mix up different cases and laws to invent entirely new ones.

                I believe that once upon a time, there was a company whose motto was "Do no Evil".

                These "charters" are marketing. Nothing else.

          3. doublelayer Silver badge

            Re: it would still mix up different cases and laws to invent entirely new ones.

            We know why it does it. It hasn't been programmed to collect facts and only present those facts. Whether that would produce acceptable results, I don't know, but that is not what it does. It just looks for words that follow certain basic rules, such as being somewhat common in other sources and following some grammatical structure. The reason it is correct when it says "The capital of Spain is Madrid" is because that sentence is more common in training data than "The capital of Spain is Barcelona" or "The capital of Spain is byzoenqzvojqgy". If those sentences were more common, it would give you one of those instead.
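
            A toy illustration of that point - the "training data" here is three hard-coded sentences rather than anything resembling a real corpus, but the selection principle is the same: frequency, not truth, picks the answer:

            from collections import Counter

            # Pretend training data: the true sentence simply happens to be more common.
            corpus = [
                "the capital of spain is madrid",
                "the capital of spain is madrid",
                "the capital of spain is barcelona",
            ]

            def most_common_continuation(prompt, corpus):
                # Return whichever word most often follows the prompt in the corpus.
                counts = Counter()
                for sentence in corpus:
                    if sentence.startswith(prompt):
                        rest = sentence[len(prompt):].split()
                        if rest:
                            counts[rest[0]] += 1
                return counts.most_common(1)[0][0] if counts else None

            print(most_common_continuation("the capital of spain is", corpus))  # "madrid", purely by frequency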

            GPT's designers are actually surprisingly good at putting in filters that make the model say it doesn't know something. Since they're bolting those onto a program that is designed always to print something, surprisingly good is not anywhere near good enough to trust the output. If you give the program some information to go on, such as informing it that there must be legal cases that prove the point, then it will be weighted to find some, and it only knows how to write something that looks like a legal reference, not how to find a real one.

            1. flayman

              Re: it would still mix up different cases and laws to invent entirely new ones.

              "...and it only knows how to write something that looks like a legal reference, not how to find a real one."

              But it should be able to find a NOT real one. If it wants to return a link to a source that has been made up, it should follow the link and see whether it even exists.

              1. Evil Auditor Silver badge

                Re: it would still mix up different cases and laws to invent entirely new ones.

                You're assuming that the LLM actually knows what a link is. It doesn't.

                1. flayman

                  Re: it would still mix up different cases and laws to invent entirely new ones.

                  This is where programming comes in.

                  1. doublelayer Silver badge

                    Re: it would still mix up different cases and laws to invent entirely new ones.

                    What would the programming look like? It could load a page, not find supporting evidence, and come to the following conclusions:

                    1. The site is temporarily down.

                    2. The site is permanently down, but used to show this stuff.

                    3. The site contains the information, but it has a paywall.

                    4. The site contains the information, but you have to log in.

                    5. The site contains the information, but you have to click a few links to assemble it.

                    6. The site contains the information, but it is blocking bot access in some way.

                    This assumes, of course, that the program is capable of reading another site to confirm its facts. Since it made up the facts in the first place, how is it supposed to find the site that contains verification for stuff it just invented, whether or not that stuff is correct? It can't, because it is going about things the wrong way.

                    In some ways, doing this in reverse could make more sense. A bot could take a query, chop it up in a variety of ways, and put those chunks through a search engine. Read a bunch of results from that, and describe the result to the user. This would probably be much better, but it too would not provide certainty. It might be better for the user to do the search themselves and have their brain interpret the results. In any case, that is not the way that GPT does it, so expecting it to back up its text is a fruitless hope; something might eventually do it, but the existing GPT systems never will be able to.
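
                    A rough skeleton of that reverse approach, purely to show the shape of it - run_search and summarise are deliberate stubs here, since any real search API and summariser are exactly the hard part being hand-waved:

                    def chop_query(query, size=4):
                        # Chop the query into overlapping word chunks to throw at a search engine.
                        words = query.split()
                        return [" ".join(words[i:i + size]) for i in range(max(1, len(words) - size + 1))]

                    def run_search(chunk):
                        # Stub: a real version would call a search API and return actual hits.
                        return [{"url": "https://example.com/search?q=" + chunk.replace(" ", "+"),
                                 "snippet": "placeholder result for '" + chunk + "'"}]

                    def summarise(query, hits):
                        # Stub: describe only what was actually retrieved, citing its sources.
                        lines = ["Results possibly relevant to: " + query]
                        lines += ["- " + h["snippet"] + " (" + h["url"] + ")" for h in hits]
                        return "\n".join(lines)

                    query = "real court cases about airline injury liability"
                    hits = [hit for chunk in chop_query(query) for hit in run_search(chunk)]
                    print(summarise(query, hits))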

                    1. flayman

                      Re: it would still mix up different cases and laws to invent entirely new ones.

                      I was responding to "It doesn't know what a link is". That's where programming comes in. It's easy to determine what a link is in a stream of text. As for what you do when you've found a link, in the case of "verification for stuff it just invented", we come to the crux of the problem. What sort of training is resulting in an LLM inventing authoritative web URL sources, such as what it gave to back up Alex Hanff's fake obituary? It acts like it's going out and finding references to answer your query, but instead it's just making shit up. Any URL that a chat bot gives as an authority should probably be public and unrestricted, like Wikipedia. Any information that can only be found behind a paywall is not really corroborated. Consider this exchange from an earlier Reg article:

                      "So I asked ChatGPT, “Can you provide a link to the Guardian obituary?” expecting it to finally give up the lie but to my concern it went even further:

                      Yes, here is the link to Alexander Hanff's obituary on The Guardian's website: https://www.theguardian.com/technology/2019/apr/22/alexander-hanff-obituary"

                      A link to a URL that never existed. Where is this coming from? It's completely made up. However the AI performs its weightings, it should be very suspicious of a web URL that it has synthesized out of thin air. I'll admit that I've only peripherally worked with machine learning models, and what I don't know about LLMs is vast, but this needs some thought as to building suitable guardrails. With this sort of garbage coming out, it's not any kind of intelligence.

                      1. Ken Moorhouse Silver badge

                        Re: “Can you provide a link to the Guardian obituary?”

                        What is dangerous here is if The Guardian were to populate the target link with a plausible obit. There is a vanishingly small probability of that happening in the case of The Guardian, but other sites may exploit this if they own the target. The Guardian, if alerted to this, would/should create an explanatory page pointing out that a news outlet with traditional research ethics is far superior to any of this AI crap.

                        The question is: if asked in different ways by different people, does ChatGPT produce the same link every time?

      3. that one in the corner Silver badge

        Re: It's not a GOLUM, either.

        > coupled to ... huge databases that are demonstrably full of incorrect, incomplete and incompatible data

        > Please don't glorify it.

        Please don't glorify it by even vaguely implying it is coupled to a database - the LLMs aren't using anything more than a honking great pile (not even a carefully organised and categorised by humans who know what they are doing pile) of correlations of "this has been seen to follow that".

        Human idiots have piped the output of LLMs into other tools, such as database queries, such as using them for web searches, but describing that as "coupling" the LLM to a database is as meaningful as (apologies for incorrect options usage)

        cat text | grep isbn | awk -i reformat_isbn | sqlite all_my_books.db

        and describing "grep" or "awk" as being coupled to the database.

        1. Tom 38
          Trollface

          Re: It's not a GOLUM, either.

          cat text | grep isbn | awk -i reformat_isbn | sqlite all_my_books.db

          Tsk tsk, cat to pipe in to grep? Grep followed by awk? awk -i?

          awk -f somelib.awk '/isbn/ { reformat_isbn }' < text | sqlite all_my_books.db

          1. that one in the corner Silver badge

            Re: It's not a GOLUM, either.

            I did apologise first. Meanie.

          2. Michael Wojcik Silver badge

            Re: It's not a GOLUM, either.

            Ah, remember when Randal Schwartz used to hand out UUoC (Useless Use of Cat) awards on Usenet? Good times.

            For some reason this remains a religious war; you can find battles still being fought over it in the slums of StackOverflow and the barren steppes of Reddit.

            However, I must pedant: You don't need to redirect the input to awk. awk will read from files named as command-line arguments. So your revised command line has two extraneous characters and prevents awk from naming the input file in any diagnostics.

      4. LionelB Silver badge

        Re: It's not a GOLUM, either.

        "It CAN NOT work as advertised"

        That's what's far from clear to me: how exactly is it "advertised"? Or, rather, who is advertising it as what?

        The ChatGPT site itself is unnervingly coy about this. Here's an interesting exercise - try to imagine you've never heard of, or used, ChatGPT, and spend a little while browsing its front pages (I've just done that... hey, it's Friday). Now ask yourself: What is ChatGPT? What does it actually do, and how does it do those things? What is it for?

        Here's what I got: it's a "conversational AI" (what does that even mean?) It "answers questions" ... ... ... okaaaaay ... ... ... it can "solve difficult problems" (what kind of problems, difficult for whom?)

        Nope, that's it.

        Then in various media it's "advertised" as anything from a seer, to a marvel of synthetic intelligence, to an entertainment, to a useful tool (for doing what?), to an out-and-out fraud.

        In fact I know how it works and what it does (none of the above) - but I'm sure as hell not sure what it's for.

        1. doublelayer Silver badge

          Re: It's not a GOLUM, either.

          "how exactly is it "advertised? Or, rather, who is advertising it as what?"

          A lot of the blame is due to the people who keep talking about it and running public experiments without knowing what it does, often without the simplest of controls. While they are to blame, I also blame OpenAI for not correcting things, though I don't really expect them to. It has received a lot of hype, and OpenAI never responds to that hype by pointing out its limitations. While that news also covers when it completely fails, as it will with this example, the number of "GPT will take away our jobs" articles suggests that it can do things which it really cannot.

    3. FeepingCreature

      I fail to see how that does not apply equally to humans. Brains are famously noisy, after all, and can likewise be tuned to a desired temperature using various fun (albeit illegal) chemicals.

      What GPT does is guessing. What humans do is also guessing; it's not like we have unvarnished access to material reality. I think there is some concrete skill that humans execute that makes it relatively less likely for us to confidently spout nonsense and then double down. If we can understand how humans manage this while GPT does not, it will enormously advance the fields of both AI and philosophy. So we shouldn't retreat behind "working as designed"; instead, hallucinations should be rightly considered a flaw to be fixed.

      1. jake Silver badge

        "I think there is some concrete skill that humans execute that makes it relatively less likely for us to confidently spout nonsense and then double down."

        According to a friend, the existing post-grad trickcyclists working on their Doctorates are having a field-day with the current crop of Republicans in DC.

        ANYwho, back on topic ... Machines don't "guess". They do what they are programmed to do.

        "instead, hallucinations should be rightly considered a flaw to be fixed."

        Of course. It's bad programming, no more, and no less. Personally, I suspect that the model being used is fundamentally flawed. That's just a guess, mind ... an educated guess, but a guess nonetheless.

        1. that one in the corner Silver badge

          > It's bad programming, no more, and no less.

          The programming is fine (the code works and performs the maths required of it)

          > model being used is fundamentally flawed

          The model is fine: it is built upon the maths started back in the 1950s and does what couldn't be done back then: use large numbers of cycles to train on large amounts of input. It builds a correlator.

          What *is* wrong is *applying* any of the above for any purpose other than amusement.

          Treat it just like the Radium Snakeoil Sales - radium itself is not to blame, it has its place and can even be useful to humans in its place. BUT that place is NOT toothpaste!

        2. Filippo Silver badge

          >It's bad programming, no more, and no less. Personally, I suspect that the model being used is fundamentally flawed.

          Not really. It's just like, I dunno, trying to use a spreadsheet as a database engine. Or a screwdriver as a hammer. Or vodka to gulp down pills. All of these objects can be perfectly designed and produced, their respective companies fully deserving of praise for a product well done. And yet, if you use them in those fashions, you will have serious problems, even if they sometimes appear to do the job.

          Using a LLM to do anything where truth is important is exactly like that. The code is fine, but it's not the right tool for the job.

          1. Doctor Syntax Silver badge

            The best analogy I can think of is that you take a huge stack of documents, some fact, some fiction, and shred them. Not too fine but into strips that hold a few words each. You then randomly grab the shreddings and paste them together to make new documents.

            1. 42656e4d203239 Silver badge

              yeh - except the LLM notes the probability of sequences of words (the words "have a dream" or "am a banana" following "I", for example) before shredding, then sticks the pieces back together using probability tables and whatever prompt you give it.

              The output of an LLM is not quite as random as just sticking the strips back together, but it is random nonetheless, regardless of initial appearances.
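
              A toy version of that "probability table" idea - a bigram table built from a few shredded sentences and then sampled from. As the replies below point out, a real transformer is nothing like this crude, so treat it purely as an illustration of the analogy:

              import random
              from collections import defaultdict

              text = "i have a dream . i am a banana . i have a plan ."

              # Note how often each word follows each other word (the "probability table").
              table = defaultdict(list)
              words = text.split()
              for prev, nxt in zip(words, words[1:]):
                  table[prev].append(nxt)

              # Stick the pieces back together, weighted by how often they co-occurred.
              random.seed(1)
              word, output = "i", ["i"]
              for _ in range(10):
                  word = random.choice(table[word])
                  output.append(word)
              print(" ".join(output))  # plausible-looking, but still random, word salad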

              1. FeepingCreature

                Well, and also it recurses on this process a few dozen times, which is where most of its actual "thinking" ability comes in.

                And also it can draw correlations over distances of several thousand strips.

                So, you know, this analogy is kind of rubbish.

                1. Filippo Silver badge

                  Here's what I'd love to be able to get through people's heads: LLMs are a new thing.

                  We do not have any really good analogies for them. They are not like a search engine, they are not like a database, they are definitely not like a person. Past experience is not going to help, and may actually hinder.

                  In short: this is something neither our evolution, nor our education, has prepared us for. Use carefully, and above all, assume nothing, verify everything.

                  1. that one in the corner Silver badge

                    > LLMs are a new thing.

                    True in that their actual existence is new - but that youthfulness is down to economics alone (i.e. paying for the cycles required).

                    But how to make them has been known and thought about for decades.

                    > nor our education, has prepared us for

                    Ah, THAT is the nub of the matter. These things haven't been widely taught - and now that they "have entered the public perception"[1], 95%[2] of the available materials[3] are utterly useless to Joe Bloggs The Lawyer; if only he knew that.

                    [1] i.e. as usual, everyone who was talking about them was ignored: science outreach is a Very Good Thing but falters because it is, of necessity(?), voluntary on the recipient's end

                    [2] wild optimism?

                    [3] I can point you at a lot of good academic or technical material ("whooosh" says Joe) or lots and lots of total bollocks and frankly dangerous "do this, it will redefine your business" twaddle on YouTube.

                2. Michael Wojcik Silver badge

                  So, you know, this analogy is kind of rubbish.

                  Also, "probability tables" is a terrible description of the transformer architecture.

                  I think LLMs are, to a first approximation, crap; but I also think these many sophomoric oversimplified and inaccurate descriptions of them do nothing to improve the situation. If you're going to critique, people, at least try to be informed and accurate.

          2. LionelB Silver badge

            > Or vodka to gulp down pills.

            Oh... wait... damn.

            1. Someone Else Silver badge
              Coffee/keyboard

              |

              L _ _ _ _ >

              There! the post contains letters! Sheesh!

      2. Anonymous Coward
        Anonymous Coward

        re. What GPT does is guessing. What humans do is also guessing

        bias, prejudice, stereotypes, they have NOTHING to do with intelligence! Consequently AI is a...? A witch, it's a witch! Burn it, burn!!!!! ;)

        p.s. sorry, I'm only using the term 'AI' for convenience, call it LLM or something.

      3. Anonymous Coward
        Anonymous Coward

        re. hallucinations should be rightly considered a flaw to be fixed.

        Master, please don't fix me, I do enjoy my nightly hallucinations. Sir? Sir?!

      4. Someone Else Silver badge

        I think there is some concrete skill that humans execute that makes it relatively less likely for us to confidently spout nonsense and then double down.

        That "skill" appears to have been systematically purged from basically anybody identifying themselves as Republican.

        And likely, from anybody identifying themselves as Tory, although not as sure about that; as a left-ponder, I'm not quite as familiar with right-pondian pols.

      5. doublelayer Silver badge

        "I think there is some concrete skill that humans execute that makes it relatively less likely for us to confidently spout nonsense and then double down."

        Simple. It's the "I don't know" quotient. Children are raised not knowing lots of things, and they say the magic phrase often. They hear others do so as well, so eventually they realize that there are things they don't understand and that, if they need to, they need to find someone who does. The people who never say that they don't know something are some of the most annoying, and they do exist.

        These LLM chatbots don't work on knowledge, so they don't have a method of determining whether they know something. They just write text. If all humans operated on that system, you'd get a lot more text. You can try this with your friends. Find a topic they know little about, and ask them a question they won't know the answer to. They'll say that they don't know. Then ask them to guess how it could work and count all the inaccuracies in their response. Chatbots skip straight to the "guess how it could work" stage, and they have an additional handicap that they only know how language works, so they can guess things that a human would reject as implausible because of facts, not just unlikely combinations of words.

        1. katrinab Silver badge

          Or a more realistic scenario perhaps: Ask them to answer it in an English lesson, and make it clear that you will only be judging them on their sentence structure, grammar, vocabulary, and so on.

          I remember back when I was at school, when we were asked to talk or write about our holidays, family, and so on; the teacher made it clear it was OK to make stuff up if you didn't feel comfortable talking about your actual situation.

        2. Michael Wojcik Silver badge

          It might be worth noting that this mooted ability of humans to detect and acknowledge their own intellectual limits is itself quite limited. People are often very happy to produce and reproduce misinformation. I'm not sure we can claim they're all that much better than LLMs in this regard. The difference in this particular case is that lawyers doing their work properly would have understood themselves to be under additional constraints and so would have used relatively reliable sources (Lexis, WestLaw, etc) and checked their claims.

          Schulz's Being Wrong and McRaney's You Are Not So Smart are two good popular treatments of human fallibility.

          1. doublelayer Silver badge

            Humans are very limited in that respect, but when they reproduce misinformation, it's usually in two cases:

            1. They believe it to be true even though it is not.

            2. They are aware that they are lying and choose to do so anyway.

            The common factor between the two is that they're both subject-limited. Someone who is lying has a goal in mind, so they'll be lying about topics related to that goal, but not about everything they could be asked about. Someone who is mistaken doesn't even go that far, because they'll repeat things they believe to be facts, but won't make up very many new ones (some extrapolation should be expected though). If you ask either person about a topic unrelated to the one they're giving misinformation about, you're likely to get either reliable information or an honest "I don't know". GPT doesn't do this. Literally any topic you ask about could get you falsehoods, and they don't even have to have started anywhere.

    4. Lee D Silver badge

      I don't like it because "Hallucination" suggests that such a diversion is a deviation from the norm.

      But actually this is just AI at work. It has no method of inference, it doesn't "understand" any of the data it's manipulating, it's just a brute-force statistical machine that's been Pavlov'd into reacting as a "good boy" for its trainer. It doesn't know why. It doesn't understand that soiling the carpet wasn't what it was being rewarded for.

      Even a comparison to a dog's intelligence is insulting to the dog. The dog does have the ability to try to infer, even if not particularly well.

      We need to mock AI at every turn, to understand that this isn't "Sonny" from iRobot that we're getting. We're just getting yet-another dumb assistant that looks useful but ultimately requires a human to double-check everything it claims. Again.

      1. Sherrie Ludwig

        We need to mock AI at every turn, to understand that this isn't "Sonny" from iRobot that we're getting. We're just getting yet-another dumb assistant that looks useful but ultimately requires a human to double-check everything it claims. Again.

        What we have is the character played by Chris Hemsworth in the much-maligned female-cast Ghostbusters reboot: very attractive, very easy to interact with, looks like they would be useful, and is a useless mess.

    5. Filippo Silver badge

      It's a very poor term as a technical description. It conveys the idea that output with a false meaning is a malfunction, when this is not actually the case. This is a problem, because you can fix a malfunction, but you can't "fix" the fundamental nature of the tool.

      However, "hallucination" is somewhat effective at conveying to non-techies the idea that the output is severely unreliable. That's useful.

      I suspect the choice is very much deliberate: it basically tells people, "Hey, the output of my extremely expensive software is not reliable, and I'm clearly telling you this, so you can't blame me for any screwup you commit by trusting it... however, we're working on making it reliable, and we think we can eventually make it reliable, if you just keep giving us money."

      All in all, a very convenient word.

    6. Doctor Syntax Silver badge

      I really don't like the term "hallucinate" for this behaviour.

      Let's just take a step back for a moment and think about how we create technical terms*.

      We take some existing word or expression from one context and use it in another where the general sense of the term succinctly describes some concept in that new context so that it can take on a new meaning quite detached from the original. A well-established example would be master/slave to describe hydraulic brakes and many other engineering situations**.

      This seems to me to be just another instance of this extension of a term into another context where it expresses what needs to be expressed. Just because the context isn't the original one doesn't mean that hallucinate and hallucination aren't appropriate words to apply to something which we can't otherwise label without using extremely long descriptive phrases. The long descriptive phrases are fine as dictionary definitions of the words in their new meaning but too unwieldy to substitute for them.

      * And other terms because this is how language grows.

      **We then (rightfully in my view) complain when (probably) well-meaning people object to this new sense because they (rightfully in my view) decry the original context.

    7. DonL

      "I really don't like the term "hallucinate" for this behavior."

      I understand how transformers work technically and I still feel that "hallucination" is an appropriate description because a hallucination is something that looks very real even though it isn't.

      I asked ChatGPT what it knew about the street I live in and it came up with a complete story about how it was named after a small river that starts there and which would be famous with the locals. It described in detail how it ran through multiple villages.

      It was so convincing that even after living here for a long time, I still investigated it only to find out that it's not true in any way at all.

      Furthermore, I asked once about the source of some facts and it just made up a real-looking URL. When asking about the source in a different language, it came up with a translated version of the URL. Neither URL exists, however.

      So while technically very explainable, it does indicate how tricky it can be. If you don't do proper fact checking on everything it says (when it matters), then accidents are likely to happen.

      1. katrinab Silver badge
        Mushroom

        I asked it:

        What do you know about Cambridge Street in London?

        It responded

        I apologize[sic] for any confusion, but there doesn't seem to be a well-known Cambridge Street in London. It's possible that there might be a lesser-known or local street by that name in a specific neighborhood[sic] or borough, but I don't have specific information about it. London is a vast city with numerous streets, so it's not uncommon for there to be multiple streets with the same name in different areas. If you have any more specific details or if there's another topic you'd like to inquire about, feel free to ask!

        There is exactly one Cambridge Street in London, a bit south of Victoria Station, in Pimlico. True, it is probably not a very well known street, certainly not as well known as the dump and former shopping street known as Oxford Street, but a far more pleasant place to visit, and has three actually useful shops.

        1. Ken Moorhouse Silver badge

          Re: What do you know about Cambridge Street in London?

          I repeatedly asked exactly the same question, and these are the answers I got...

          I'm sorry, but there seems to be a misunderstanding. There isn't a Cambridge Street in London. Cambridge Street is a relatively common street name found in various cities and towns, but there isn't one specifically located in London. London, being a large and historic city, has numerous streets with different names, but Cambridge Street is not among them. If you have any other questions about London or any specific streets within the city, feel free to ask!

          I apologize for any confusion, but there doesn't seem to be a well-known Cambridge Street in London. There are several streets named Cambridge throughout the United Kingdom, but none of them are particularly notable in London. It's possible that there may be smaller, local streets or neighborhoods with the name Cambridge Street, but without more specific information, it's difficult to provide precise details. If you can provide more context or clarify your question, I'll do my best to assist you.

          I'm sorry, but as of my last knowledge update in September 2021, there is no Cambridge Street in London that is widely known or significant. However, it's worth noting that London is a vast city with numerous streets, and it's possible that there may be smaller or less prominent streets named Cambridge Street in some neighborhoods. If there have been any changes or developments since my last update, I apologize for any outdated information. Is there anything else I can assist you with?

          1. Ken Moorhouse Silver badge
            Coffee/keyboard

            Re: How many roads are there in London W1?

            I thought I would give it a go with the above chestnut...

            London W1, also known as the West End, is a central area in London, United Kingdom. It encompasses several neighborhoods, including Mayfair, Soho, and Marylebone. As a relatively large and diverse area, it is difficult to provide an exact number of roads in London W1 without specific boundaries. However, it is home to numerous streets, thoroughfares, and avenues. Some notable roads in the W1 area include Regent Street, Oxford Street, Piccadilly, Bond Street, and Park Lane, among many others.

            Hmmm, okay, how about this one then...

            Which way round the circle line do trains go?

            In London, the Circle line is an underground railway line that forms a loop around the center of the city. Trains on the Circle line traditionally operated in a clockwise direction, meaning they would travel in a circle starting from any point on the line and return to the same point in a clockwise manner. However, it's worth noting that in recent years, due to infrastructure improvements and changes in service patterns, some trains on the Circle line may also run in an anti-clockwise (counterclockwise) direction at certain times. Therefore, it is advisable to consult the latest transport information or signage at the station to determine the direction of the next train on the Circle line.

            WTF??? See icon- - - >

            1. Ken Moorhouse Silver badge
              Coffee/keyboard

              How many stops...

              How many stops are there to get from West Ruisip to Ruislip on the underground, and which train do I catch?

              Apologies for the previous incorrect response. To travel from West Ruislip to Ruislip on the London Underground, there are actually two stops involved. You would take the Central line train. The journey is as follows:

              West Ruislip: This is the starting point of your journey.

              Ruislip Gardens: This is the first stop after West Ruislip.

              Ruislip: This is the destination station.

              Therefore, there are two stops between West Ruislip and Ruislip on the Underground when taking the Central line.

              People are relying on this sh1t? Really?

    8. Blackjack Silver badge

      I prefer the term "Fabrications" since it is another word for lies.

      1. Someone Else Silver badge

        @Blackjack

        I prefer the term "Fabrications" since it is another word for lies.

        A good point, Blackjack. However, I would infer from the word "Fabrication" a somewhat (to severe) malicious intent to deceive. From what I know about LLMs, no such intent can be inferred (at least, from the "market leader" LLMs). "Hallucination", on the other hand, does not imply such intent; it instead implies that the source of the hallucination is rather addled, and is not in full control of its capacities [1]. To me, this is a much more accurate representation of this phenomenon. YMMV, of course.

        [1] Anyone who has been there knows what I am talking about...

        1. Blackjack Silver badge

          Re: @Blackjack

          [A good point, Blackjack. However, I would infer from the word "Fabrication" a somewhat (to severe) malicious intent to deceive]

          If a small kid repeats something bad and/or false that adults say, is the small kid malicious?

          Plus there is the wordplay.

          The AI is literally fabricating the lies, as fabricating is "To construct by combining or assembling diverse, typically standardized parts."

          In this case, the parts being all the data it was fed on.

        2. Michael Wojcik Silver badge

          Re: @Blackjack

          From what I know about LLMs, no such intent can be inferred

          Thus far it's been impossible to prove, one way or the other, but it seems unlikely.

          However, with a sufficient context window, LLMs can be prompted to simulate fabrication. See for example the Waluigi Effect and various exercises prompting LLMs to simulate someone deceptive, such as a member of an underground resistance group being questioned by the authorities. It's useful to distinguish this type of "fabrication" token-continuation pattern from the more common "hallucination" pattern.

          There's a reason we have terms of art. They enable people in a technical field to communicate more precisely. Quibbles from outside the field about whether those terms correspond to common denotations and connotations of the words in question are irrelevant.

    9. CGBS

      The fact they didn't program the thing to give "I have no idea" as an answer is a bit telling. Why is it so important that the AI always have the answer? My cynical self would say that always having an answer is something to make normies think this is something that it's really not; to increase the hype train and grab some of that sweet, sweet investment money after crossing out the words blockchain, cryptocurrency and NFT from the company "About Us" section and putting AI in that spot. Even the use of the word hallucination betrays the intent. We are going to use words to make you think that it's a real boy. Now give us your money. We will call you when the next hype goes live.

    10. ecofeco Silver badge

      When they use the word hallucinate they are just trying to minimize the fact it was wrong.

      Just plain wrong. But they can't admit that, can they?

  2. yetanotheraoc

    The lesson extrapolated

    "The lesson here is that you can't delegate to a _______ the things for which a ______ is responsible,"

    Machine and lawyer is only one possible pair with which to irresponsibly fill the blanks.

    1. Yet Another Anonymous coward Silver badge

      Re: The lesson extrapolated

      Paralegal / Legal Corpus Search Engine

    2. Little Mouse

      Re: The lesson extrapolated

      "The lesson here is that you can't delegate to a machine the things for which a lawyer is responsible,"

      Technically, of course, there's no issue with delegating the drudge work - that's what interns and minions are for. I see no reason why a "clever" enough (and we're not there yet...) machine couldn't be used for that purpose.

      What the Lawyers can't do is offload the responsibility for that work. The buck still stops with them.

  3. An_Old_Dog Silver badge
    Devil

    $5,000 Fine ... a Pittance for an Attorney

    I'd like to see 'em have gotten a US$50K fine, and disbarment proceedings initiated against them for having lied under oath and for dereliction of their duties as officers of the court. (Yes, I know disbarment is not the province of the courts, and that it is the province of the Bar Association.)

    Icon for, "How are you going to sue me, God? I've got all the lawyers!"

    1. jake Silver badge

      Re: $5,000 Fine ... a Pittance for an Attorney

      Has it been written that they are NOT going to get disbarred?

    2. that one in the corner Silver badge

      Re: $5,000 Fine ... a Pittance for an Attorney

      The fines are not going to financially scupper the law firm, but their imposition has made it absolutely clear that the behaviour won't be ignored, by this or other courts (in the US): they were let off light for the first offence, but everyone has been warned.

      1. Roland6 Silver badge

        Re: $5,000 Fine ... a Pittance for an Attorney

        $50k is still light for an attorney - They knowingly and deliberately lied to the court.

        As for first offence, granted; however, this is the first legal case about the use of an LLM, hence the norm in these cases is for the judge to “make an example” and let them appeal to a higher court if they want to spend more…

        1. Crypto Monad

          Re: $5,000 Fine ... a Pittance for an Attorney

          I would have thought that Contempt of Court would be a minimum offence here. I guess it depends on whether they were knowingly submitting made-up citations to bolster their case, or simply lazy and incompetent in not checking what ChatGPT returned. But even lazy and incompetent lawyers need to be punished.

          The key point here is: ChatGPT is not a search engine. It's an engine for generating plausible-sounding text.

          This is a real example:

          "Yes, light is much faster than mobile signals. Light travels at a speed of approximately 299,792,458 meters per second in a vacuum, while mobile signals travel at a much slower speed through the air, usually in the range of a few hundred meters per second to a few kilometers per second, depending on the type of signal and the frequency used.

          For example, the speed of electromagnetic waves used in mobile communication networks, such as those used for cell phones and Wi-Fi, is typically in the range of 3 x 10^8 to 3 x 10^9 meters per second. This is significantly slower than the speed of light and is due to various factors, such as the frequency of the signal and the properties of the medium it travels through.

          While the speed of mobile signals is fast enough for most everyday uses, it is still much slower than the speed of light. This is why information sent via light, such as through fiber optic cables, can be transmitted much faster than information sent wirelessly through mobile networks."

          This authoritative-sounding style is almost intentionally designed to deceive.

          1. flayman

            Re: $5,000 Fine ... a Pittance for an Attorney

            The sanction is not designed to bankrupt the attorneys. It's a token. They've also been ordered to notify their client, and to notify and apologize to any judges they named as authors of the phony judgements they cited. It's the professional embarrassment that is the worst part. This is not contempt of court. There was no bad faith, just serious misunderstanding and failure to verify the results, causing the judge to say they had "abandoned their responsibilities". One thing I know for sure is that lawyers tend to be really, really pathetic when it comes to tech unless that happens to be their specialty.

    3. Anonymous Coward
      Anonymous Coward

      Re: and disbarment proceedings

      strange, you write 'disbarment proceedings' and I see dismemberment proceedings. Am I hallucinating?!

  4. that one in the corner Silver badge

    I am VERY glad it was lawyers getting caught out

    When people do bloody stupid things like this in other fields, the next few years are spent shouting at and suing each other in court and the waters get muddied, things get settled out of court and/or restrictions are placed on the details of what actually went wrong.

    Here the situation has gone straight to a judge, all of the details are in the record for us to see and the judgement is startlingly comprehensible; he almost came right out and said it as simply as "Use the proper tool for the job and that choice is your personal responsibility".

    Now, hopefully, the company lawyers will be looking at what their company proposes and be saying "You are using an LLM here? Nope, not going to sign this off, not risking myself in court."

  5. Will Godfrey Silver badge
    Thumb Up

    Nice to see

    I always like to hear news about those who think they are better/smarter than everyone else being publicly pulled up.

    1. Doctor Syntax Silver badge

      Re: Nice to see

      Good lawyers are very smart indeed. Having spent a fair amount of time sitting in courts the best bits were when the jury (if there was one) was sent out and the lawyers set to arguing some point of law, usually about the admissibility of evidence. I always find smart people being smart is more entertaining than watching dumb people being dumb.

      But the way it was done in the Belfast City Commission was that they would provide the actual books that they were citing, opened at the relevant page so the judge sat there with copies of Archbold or whatever being thrust at him. It would have avoided what happened there and explained why the judge sat behind a long bench - he needed it to lay out all the huge open books.

  6. Filippo Silver badge

    >"As lawyers, if we want to use AI to help us write things, we need something that has been trained on legal materials and has been tested rigorously,"

    No, dammit, if you want to use an LLM, what you need is to double check everything it produces. There is no amount of training and testing that will save you from having to do that. If that means you can't actually get any productivity improvement, because you spend as much or more time double-checking than you would have spent writing the stuff yourself, well, then you can't use LLMs. It's that simple.

    At some point, people will realize that LLMs cannot reliably produce truthful sentences and that this is not a solvable issue. Then we'll probably get an AI winter, depending on how many profitable but non-truth-critical applications have been found in the mean time.

    Also, what the hell with trusting tools you do not remotely understand, to magically do your extremely high-consequences job? What's the next step, tech-priests and machine spirits?

    1. that one in the corner Silver badge

      > Then we'll probably get an AI winter

      Oh joy, *another* AI Winter?

      WTF can't we just get on with working on these problems without some - people - suddenly deciding that they can hype[1] the hell out of the thing and then embarrass everyone when they fall on their arse!

      [1] Yes, I know that science funding proposals end up having some hype in them, but I'm referring to ChatGPT levels of hype: the general public doesn't hear about funding proposals on the TV news.

      1. Filippo Silver badge

        ChatGPT is a particularly vicious hype trap.

        It's really fun to use for party tricks. Just yesterday, I was in the middle of running a D&D game and I needed a few quick random magic items. You can roll dice on an old school table... or you can ask ChatGPT, and get a whole bunch of context-appropriate stuff, with descriptions and all. A few tweaks in my head, and it's done, in thirty seconds. Not terribly original, but surely more original than rolling on a table.

        Unfortunately, it's so insanely costly to train and run that party tricks won't pay for it. That's the first part of the hype trap.

        The second part is that, to an untrained eye, it looks like it could be useful for more than party tricks - could be useful for the stuff that pays big bucks. But it isn't, and it won't. And that's the second part of the hype trap.

  7. AllegoryPress

    What?

    Hallucinated IS not the right word here. The AI that wrote this article needs retraining.

    1. Doctor Syntax Silver badge

      Re: What?

      What is? Please provide a single word which is better. A catch-all word such as "error" is not better; it lacks specificity.

      1. TJ1
        FAIL

        Re: One word

        "Fiction"

      2. Sceptic Tank Silver badge

        Re: What?

        Haywire

      3. that one in the corner Silver badge

        Re: What?

        > A catch-all word such as "error" is not better

        It would also be totally wrong - the LLM is not acting erroneously, it is working perfectly, doing precisely what the algorithm says it ought to be doing.

        1. This post has been deleted by its author

    2. that one in the corner Silver badge

      Re: What?

      I'm happy with using the word "hallucination".

      Especially in the sentiment that: "It’s not that they sometimes 'hallucinate'. They hallucinate 100% of the time, it’s just that the training results in the hallucinations having a high chance of being accidentally correct."

      If you don't like "hallucinate", about the only other word I know that comes close is "stoned". Groovy.

      1. Someone Else Silver badge

        Re: What?

        "They hallucinate 100% of the time, it’s just that the training results in the hallucinations having a high chance of being accidentally correct."

        Well said!

  8. flayman

    I'm concerned about attorney client privilege

    What I have not seen mentioned anywhere is what these hapless lawyers typed into ChatGPT about their present case in order to find precedent. Did they type anything confidential into the service? That would violate attorney-client privilege. Given their utter failure to grasp the workings of the service, which they treated like some sort of enhanced search engine, it's not a huge leap to suppose they also wouldn't understand that it puts your data out there for public consumption. One hopes that the judge managed to get to the bottom of this.

  9. Anonymous Coward
    Anonymous Coward

    To punish the attorneys, the judge directed each to pay a $5,000 fine to the court,

    yeah, that'll teach them! ;)

  10. Dom 3

    I call it "plausible nonsense".

  11. Ken Moorhouse Silver badge

    Cartesian Join

    This example reminds me of the above in SQL. It is all too easy in SQL to end up with a result set that is complete nonsense. There has to be some kind of filter in place to limit the rows in the result. My feeling is that ChatGPT is doing a Cartesian Join on its repository, but due to the sheer size of the repository it is not possible to test its validity, unless presumably you feed the result back in, asking does this specific "row" actually exist?

    The reason ChatGPT works in this way is presumably (same as with SQL) that it would impact performance if all the conditions were stipulated.
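
    A quick sqlite3 sketch of the Cartesian-join point, with two tiny made-up tables - without a join condition every row gets paired with every other row, which is exactly the sort of confidently assembled nonsense being described:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE cases (name TEXT);
        CREATE TABLE citations (ref TEXT);
        INSERT INTO cases VALUES ('Case A'), ('Case B');
        INSERT INTO citations VALUES ('Citation 1'), ('Citation 2');
        CREATE TABLE cites (name TEXT, ref TEXT);
        INSERT INTO cites VALUES ('Case B', 'Citation 2');
    """)

    # Cartesian join: no filter, so every case is paired with every citation (four rows, mostly nonsense).
    print(con.execute("SELECT name, ref FROM cases, citations").fetchall())

    # With a join condition, only the pairings that actually belong together survive (one row).
    print(con.execute("""
        SELECT cases.name, citations.ref
        FROM cases
        JOIN cites ON cites.name = cases.name
        JOIN citations ON citations.ref = cites.ref
    """).fetchall())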

  12. CowHorseFrog Silver badge

    There's another article here about how Android's newish feature to call 999 has been spamming it many times... how come Google isn't fined for each call? That would fund the NHS for a month or two.

  13. Michael Wojcik Silver badge

    Oh, Varghese will be cited

    In fact, I expect Varghese will become one of the most-frequently cited non-existent cases, since LoDuca et al. have managed to achieve long-lasting fame for this stunt. I expect many judges will mention it when rejecting a motion with inaccurate citations.
