Top LLMs struggle to make accurate legal arguments

If you think generative AI has an automatic seat at the table in the world of law, think again. Top large language models tend to generate inaccurate legal information and should not be relied upon for litigation, fresh research has shown. Last year, when OpenAI showed GPT-4 was capable of passing the Bar Exam, it was …

  1. elsergiovolador Silver badge

    Reason

    The AI simply recycles biases found in the training material.

    Whereas an intelligent person can, most of the time, tell when they are being served BS, AI will just repeat it and won't challenge it.

    AI may find holes in the legislation if you nudge it towards them. But it's a bit like leading a horse to water.

    1. doublelayer Silver badge

      Re: Reason

      It's not even as simple as the AI being fed garbage data and failing to filter it out; sometimes the AI is fed applicable data and can't determine when it is applicable and when it is not. Admittedly, I've seen humans fail that test as well, but they're usually a bit better at it.

      For example, a person searches for a legal issue and gets results that accurately describe the process for dealing with that issue in a place they're not in. The location where it applies will be written in that article, so most people will spot that and go looking for another article. A language model will probably fail to correlate that mention of the location with all the words further along in the article and, if what the article says is common enough, serve it up to anyone who asks about the issue, even if that person specifically mentioned a different location. It got correct data and nonetheless generates garbage. That is what an LLM does, and the sooner people realize that, the fewer of them will make idiots of themselves.
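
      To make that failure mode concrete, here's a toy sketch (Python; the documents, query, and scoring function are all invented for illustration, and nothing here is a real retrieval system or LLM). A scorer that just counts shared words treats the jurisdiction as one word among many, so the wrong-jurisdiction text can still win:

      # Toy sketch of the failure mode above; all "documents" are made up.
      def overlap(query, doc):
          """Crude relevance: count shared words; jurisdiction is just another word."""
          return len(set(query.lower().split()) & set(doc.lower().split()))

      docs = [
          # Detailed, common phrasing -- but for the wrong jurisdiction.
          "How long do I have to protect a tenancy deposit? In England and Wales ...",
          # Correct jurisdiction, terser phrasing.
          "Tenancy deposits in Scotland: separate scheme, different deadlines.",
      ]

      query = "how long do I have to protect a tenancy deposit in Scotland"

      best = max(docs, key=lambda d: overlap(query, d))
      print(best)  # the wrong-jurisdiction text wins on sheer word overlap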

      1. MiguelC Silver badge

        Re: Reason

        Most importantly, even if AI is fed correct and applicable data, it might spew hallucinations - as stated in the article. This may even be an intrinsic protection against accusations of copyright theft (not that it's working, as we've seen in cases where AI delivered content textually identical to what it was fed, as in the NY Times case).

      2. veti Silver badge

        Re: Reason

        Sure, language models will fail in that way, because they've never had even the rudiments of legal instruction. They know the word "jurisdiction", but they have no inkling of why it might be relevant to their current task.

        In a word, they're being given tasks they're neither trained nor designed to do.

        I wonder how hard it is to give them that training. Maybe the results in six months' time will be different.

        1. doublelayer Silver badge

          Re: Reason

          Their training data will include lots of sites that explain what jurisdiction is. It's a basic concept. It will include plenty of pirated law textbooks as well. If this were a magical brain bot that could do what a human can do but at a much higher scale, it would be able to build an accurate picture of legal reality, store all the tiny legal details, and put them together. It would only be wrong if some major change happened after training, for example a newly passed law invalidating the old position, which the model was never told about.

          We don't have a magical brain bot. We have a large language model, which takes a bunch of plausible guesses and stitches them together. It doesn't matter what you put in, because the way the bot works is guaranteed to make something up. If you put less in, the errors look like it has had some kind of stroke and can't manage language. If you put more in, it looks clueless about the facts, but the grammar is acceptable. Either way, it's intrinsic to the way they are built.

  2. Mike 137 Silver badge

    "they don't understand law and can't form sound arguments"

    Substitute 'anything' for 'law' and that will sum up the reality.

    The anthropomorphic terminology that its proponents use in relation to AI in general (not just LLMs) -- 'understand', 'hallucinate' etc. -- belies the objective fact that these machines do not think at all. They have no correlates of human cognitive processes; they're just context-driven Markov chain generators drawing from frequency-weighted repositories of tokens. So any concept of 'meaning' is irrelevant, and apparently meaningful (let alone accurate) output is fortuitous.
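
    For anyone who hasn't met one, here's a minimal sketch of exactly that mechanism: an order-1 Markov chain over words (Python; the toy corpus is invented for illustration, and real LLMs condition on far longer contexts - but frequency-weighted next-token sampling is the principle being described):

    import random
    from collections import defaultdict

    # Toy corpus; duplicate entries in the table below encode frequency weighting.
    corpus = ("the court held that the contract was void and "
              "the court held that the claim was dismissed").split()

    table = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        table[prev].append(nxt)  # frequency-weighted repository of tokens

    def generate(start, n=10):
        word, out = start, [start]
        for _ in range(n):
            choices = table.get(word)
            if not choices:
                break
            word = random.choice(choices)  # sample in proportion to observed frequency
            out.append(word)
        return " ".join(out)

    print(generate("the"))
    # Fluent-looking legal word salad: token statistics, no concept of meaning.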

    1. katrinab Silver badge

      Re: "they don't understand law and can't form sound arguments"

      The one thing that is more specific to law than to other areas, though, is that it can be changed.

      You could have millions of cases and other reputable legal texts that genuinely support one particular argument, but if your parliament or equivalent passed a new law last week that contradicts it, then that takes precedence.

    2. oliversalmon

      Re: "they don't understand law and can't form sound arguments"

      Absolutely! There is no AI or ML, just statistics and probability.

    3. veti Silver badge

      Re: "they don't understand law and can't form sound arguments"

      Define "think".

      What makes you "think" human cognitive processes are qualitatively any different?

      What difference does it make anyway?

      1. Lyndication

        Re: "they don't understand law and can't form sound arguments"

        A human can explain their reasoning, typically. GPT et al typically can't.

        A human can also undertake abstract thought and create new ideas, while an ML algorithm tends towards the most common trends in its data set by design.

      2. Peter Gathercole Silver badge

        Re: "they don't understand law and can't form sound arguments"

        That's an interesting question.

        As far as I am aware, the workings of even the simplest organic brain have not been fully modelled. Oh, I know they've created neuron maps of fruit flies, but it seems we still don't know how these systems work to control the behaviour of the flies.

        Neurons are not like computer decision-making, which is mostly binary at the simplest level, and the best we can do to simulate neurons is to apply some type of statistical bias to make them act more like simple brains.

        So if we can't do a fruit fly, which has a few thousand neurons and on the order of a few hundred thousand synaptic sites (effectively connections between neurons), we don't have a chance in hell of really understanding complex brains, at least not yet. We've come a long way, but we're not there yet.

        Of course, AI specialists will say that they are not trying to model the human brain, but its behaviour in various contexts. But I contend that without understanding how non-binary decision-making is performed in a real brain, any AI model will be an oversimplification: it may be able to simulate some types of decision-making, but it cannot take in all of the factors that affect human thought processes.

        For example, off the top of my head: if a real person suffered serious embarrassment in the past due to a decision they made, their future decisions in a similar area may appear more irrational, and it may prove difficult to model that emotional effect on their thinking. Human thought is not always rational, and not just at a personal level; you have irrational groupthink happening all the time (you only have to look at social media to see this!).

        For creative tasks, this irrationality may be the key to success in humans, which is why AI-produced media seems so derivative compared to that created by people. Building irrationality into AI models to try to simulate irrational, or maybe out-of-the-box, thinking is unlikely to be helpful for legal pleadings (or maybe it is in some cases; I'm not a lawyer). So we need different types of decision-making for different types of problems. We see this in people (after all, not everybody can understand a reasoned argument, let alone create one), so why should a general-purpose AI model be able to do this?

  3. cyberdemon Silver badge
    Coffee/keyboard

    > Top large language models struggle to make accurate ...

    Hold the Front Page! Statistical bullshit generator fails to generate accurate bullshit ...

    On other pages: Grizzly bear fails to use public conveniences.. Pope Francis declines to attend wiccan nude mud-wrestling festival at solstice..

    1. Rich 11 Silver badge

      Re: > Top large language models struggle to make accurate ...

      Pope Francis declines to attend wiccan nude mud-wrestling festival at solstice

      Can I have his ticket if he's not going to use it?

  4. Anonymous Coward
    Anonymous Coward

    The issue I'd have with this research is that it's based on GPT-3.5, when it was GPT-4 that demonstrated the capability to pass the bar exam.

    Of course the version of GPT that failed the bar exam is bad at legal shiz; GPT-3.5 has been shown multiple times to be incapable of performing well enough to pass the exam, scoring in the bottom 10% of test takers: https://openai.com/research/gpt-4

    Would be far more interested in the results from GPT-4, which OpenAI claim scores in the top 10% of test takers.

    1. Anonymous Coward
      Anonymous Coward

      Great, the bullshit from GPT-4 is even more plausible.....

      Doesn't change any of the arguments above; this is still just a Markov thingy with slightly better probabilities. It's no closer to "understanding" than a brick is.

      Who are you trying to kid? Us or yourself?

  5. Sceptic Tank Silver badge
    Black Helicopters

    I put it to you

    Interesting, this. I watched a YouTube video recently where an amateur Go player managed to defeat an AI bot that had beaten the world's top Go player, simply by exploiting the fact that the bot didn't know or care that it was playing Go. So if it's Matlock vs. the LA Law AI, all Matlock needs to do is figure out what the exploitable flaws are in the bot the lazy legal eagles from LA Law are using, and it should be possible to out-law them in every case.

    1. veti Silver badge

      Re: I put it to you

      I imagine some variants of that story will happen, and some lawyers will do very well that way for a while, until the opposition wises up.

      But it's an eternal arms race. And in the long run, my money is on the side that improves indefinitely over time, unlike the individual lawyer.

  6. Jason Bloomberg Silver badge

    Snake Oil doesn't work

    I thought we all knew that already.

    It is mightily impressive that they have managed to get a computer to emulate the bullshit artist from every pub, but I don't see how anyone thought that would be useful, or is surprised that it isn't.

    1. Ian Johnston Silver badge

      Re: Snake Oil doesn't work

      It is mightily impressive they have managed to get a computer to emulate the bullshit artist from every pub ...

      This just in: The entire staff of journalists (using the term loosely) at the "Daily Mail" has just received their P45s.

  7. Pete Sdev Bronze badge

    Quite human

    It's an interesting "feature" of LLMs that instead of answering with "I don't know", they simply reply with made-up bullshit. Similar to (sadly) many people.

    1. Doctor Syntax Silver badge

      Re: Quite human

      Similar to (sadly) many ~~people~~ search engines.

      So an LLM made by a search engine corporation can be expected to be particularly bad.

      1. Pete Sdev Bronze badge

        Re: Quite human

        Hmm, I generally find what I'm looking for in the first 3-4 of the results with search engines. Sometimes I'll need to qualify my search query, e.g. a piece of software that shares a name with a more common item.

        Are you still using Altavista dear doctor? ;-)

        Ironically, I find the hallucinations the most 'AI' aspect of LLMs.

        1. Roland6 Silver badge

          Re: Quite human

          > I generally find what I'm looking for in the first 3-4 of the results with search engines.

          Actual traditional search results, yes; however, I've yet to find any of the AI-generated results they now fill the first screen with to be of any help whatsoever.

      2. doublelayer Silver badge

        Re: Quite human

        Not in my experience. When I search for something, I may get something useful, I may get nothing related to what I want, or I might get things that are related but aren't helpful. What I rarely see is something that looks like it's helpful but is actually complete gibberish. For example, I was looking for a source for firmware update files for hardware whose manufacturer does not properly organize and present them, and I got firmware files for a similarly-named but otherwise completely different product. I did not get an essay summarizing firmware updates that didn't exist. I much prefer the former because I can quickly identify that the hardware I'm looking at is not for radio stations, so I don't want that result.

  8. Doctor Syntax Silver badge

    I'd expect this to be an application at which ML would be particularly bad.

    From my experience of listening to legal arguments in court, usually about whether something is admissible as evidence, they seem to hinge on decisions made in a given set of circumstances, and on whether the current, novel set of circumstances can be considered equivalent, or near enough so, for the same decision to apply, versus sufficiently different that it doesn't.

    Apart from the need for logic, judgement, and the ability to put things persuasively, the ML is at an obvious disadvantage in relation to its material. It will have encountered the previous circumstances in the training material, but the key adjective above was "novel": it won't have encountered the current ones before, and without an understanding of both, it won't have any means of relating the two. No wonder it serves up some random response.

  9. Forget It
    Joke

    IANAL

    IAAAI

  10. Anonymous Coward
    Anonymous Coward

    what's special about legal eagles?

    "The biggest concern is that AI often fabricates false information" "It shows that AI cannot even retrieve information accurately"

    I don't see anything specific to the legal field there. They seem like fundamental flaws in pretty much any field that relies on accuracy.

    1. doublelayer Silver badge

      Re: what's special about legal eagles?

      There are some reasons to expect it to be even worse at legal situations than at some others, since it is very easy for historical data to be invalid, and for tiny differences in the input situation to make a big difference to the accurate answer. While it's not too accurate at anything, law would be one of the worst things to use it for, probably along with medicine. Things that involve more rote memorization would be more accurate, though likely not accurate enough to use - which is why LLMs have been so useful for cheating at basic schoolwork, where everyone has to learn the same basics in order to properly manage the advanced stuff they'll learn later.

      1. This post has been deleted by its author

  11. OculusMentis
    Pirate

    Who’s actually talking gibberish here?

    Law should be logical and straight. If AI cannot make top legal arguments from current convoluted laws, it’s because many laws (especially tax laws) are neither logical nor straight but instead illogical gibberish produced by a self-serving caste.

    1. doublelayer Silver badge

      Re: Who’s actually talking gibberish here?

      Laws are not easily understood, but even if they were, LLMs wouldn't handle them properly, because they are LLMs and can't handle anything particularly detailed. An LLM builds its model through a lot of reinforcement, which means that if there is a small detail that's important in your situation but not in others', it's likely to have seen many more sources that don't bother with that detail, and it will treat you accordingly. Laws, meanwhile, are a collection of details that apply in some situations and not in others, making them a poor thing to use an LLM to understand.

  12. Snowy Silver badge
    Joke

    Study finds

    AI lacks intelligence.

    1. Ian Johnston Silver badge

      Re: Study finds

      So, just "A", then?

  13. Anonymous Coward
    Anonymous Coward

    uncertainty about hidden goals

    There has been so much hype about LLMs that I now suspect some of it is done to hide certain major flaws. And that there is a less than innocent goal behind all this: to attack certain values and goals of the population, put fear into people, and make them afraid their future is shaky.

    There are parties, entities, out to weaken economies. My gut says they have a hand in this.

    I am in AI and see that some important things are being withheld about the innards of the chatbot technology. I do not trust some of it, and especially OpenAI. To a lesser extent, also Google and Facebook.

    The OpenAI API in essence asks one to take it on faith that it works well. But they hide so much inner design detail - of course for proprietary reasons.

    But what if we become dependent on the technology, and then they decide to use their control over it to do harmful things and restrict those who came to depend on it? Look at the Google monopoly, for example, or Apple's walled garden.

  14. Ken Moorhouse Silver badge

    So long as nobody has the [not so] bright idea...

    So long as nobody has the bright idea of utilising AI databases as a primary repository for case law.

  15. Anonymous Coward
    Anonymous Coward

    Am I missing something?

    Aren't all LLMs "taught" using available text - and that includes (available?) fictional text?

  16. Lee D Silver badge

    Gosh, do you mean that the "AI" does not have any inference or insight into the actual meaning of the data and merely regurgitates like the statistical machine that it is?

    These things are fancy Bayesian filters, nothing more.
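
    For anyone who hasn't met one, here's a minimal sketch of the classic spam-filtering sort of Bayesian filter being compared to (Python; the word counts are invented for illustration, and real LLMs are of course vastly bigger - the point is only that the decision is pure word statistics):

    import math

    # Invented word counts from two tiny, made-up training sets.
    counts = {
        "win":     {"spam": 40, "ham": 2},
        "free":    {"spam": 35, "ham": 5},
        "meeting": {"spam": 1,  "ham": 30},
    }
    totals = {"spam": 100, "ham": 100}

    def classify(words):
        scores = {}
        for label in totals:
            # Equal priors; sum log-likelihoods with add-one smoothing.
            s = 0.0
            for w in words:
                c = counts.get(w, {}).get(label, 0)
                s += math.log((c + 1) / (totals[label] + len(counts)))
            scores[label] = s
        return max(scores, key=scores.get)

    print(classify(["free", "win"]))  # -> spam
    print(classify(["meeting"]))      # -> ham
    # Pure co-occurrence statistics: no insight into what any word means.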

    There's a concept of superstition - that if you were wearing your lucky socks when your team won, that wearing your lucky socks MAKES your team win.

    Current "AI" is quite literally as dumb as that. We trained it on data and it got small "successes" when it did random things, so it thinks those random things must be what caused its success without any insight as to the mechanism of how or why.

    And then when you feed it new data, it runs off with those superstitions and "thinks" (ha!) that they are concrete determinations of what you need from it. And that's when you get what AI people are calling "hallucinations". It's not an hallucination - it's a superstition, and it's precisely as dumb as thinking that your date will be a good one because the guy is a Taurus or Capricorn.

    And this isn't intelligence in any form.

  17. Alan Bourke

    Of course it can't

    since all it's doing is generating patterns based on previous examples of patterns, with no actual intelligence.

  18. HandleBaz

    AI Wizardry

    Oh no.

    AI can't do a difficult research task. One that most humans can't do.

    It's still useful though. Say, if you can't be arsed to actually write your own El Reg comments.

    The prose is a bit formal, but it'll do.

    "Not surprised that AI falls short in actual legal research; the intricate nuances and evolving nature of law demand human expertise and contextual understanding that machines struggle to replicate."

    "Oh, what a groundbreaking revelation! AI struggling with genuine legal research? Who would have thought that the intricate dance of statutes, precedents, and human interpretation would be a tad too much for our binary buddies? "

    Making ChatGPT sass itself is pretty amusing.

  19. Stuart Castle Silver badge

    I was listening to a lawyer talk about the AI passing a bar exam. He said the problem is that while it is hard to pass a bar exam, apparently, if you look at a large enough selection of previous exams, you find they don't change much. So a reasonably good AI, given access to a decent pool of previous exams, could easily pass the current exam, because all the questions in it have probably been answered in one or more old exams.

    Law in practice is different. Cases may have similarities to other cases, but they won't be precisely the same, and while I am no lawyer, we did a module on contract law in my Computer Science degree. I've read through the notes of enough cases to know that even if two cases appear to be the same on the surface, they may not be once you dig underneath.

  20. Bebu Silver badge
    Windows

    Sounds like Trump's legal team...

    《....struggle to make accurate legal arguments [..] can't cite cases, fully grok the law, or reason about it effectively》

    Or Musk's diy opinions on the (in)applicability of various laws to his sorry case.

  21. Necrohamster Silver badge
    Headmaster

    You think you'll save time, but you won't.

    My personal experience with attempting to use AI to write legal papers is that it leverages existing flawed data found on blogs, forums and papers written by law students.

    You want citations? They're going to be wrong, and you're going to have to verify them manually.

    You want up-to-date-information? Nope. You're getting stuff from a decade ago.

    You want the full title of a bill or act? You're getting an invented name with an incorrect year on the end.

    Everything the AI spews out will need to be fact-checked. You can't trust it whether you're writing a legal brief or an essay for your law degree (attention law students: Turnitin is pretty good at detecting AI-written guff... don't do it).

    So my advice would be to use that time you would have used checking your AI's work to just write the document yourself.

  22. low_resolution_foxxes

    I am slightly suspicious that the majority of these articles focus on how AI cannot replace a $1000-per-hour lawyer. There are relatively few articles about how engineers or doctors could be replaced. The legal industry has not always proved to be exceptional value.

    I have personally surmised that lawyers are terrified that their services will become 'free', and that this is part of a backlash PR effort to minimise that threat.

    Where I have a vague understanding of law/compliance, I have personally found that OpenAI brings up perfectly reasonable responses to most basic queries (and many complicated ones) that I have tested it with. You have to take that with a pinch of salt and understand the limitations, but TBH they are a perfectly good starting point.

    I occasionally have to dabble in compliance topics, and while I haven't trusted it with a full response, it has essentially agreed with every opinion that previously took me 8+ hours of Google searching to reach while piecing together the wider picture - and it does it within seconds.

  23. Ken G Silver badge

    Ready for a career in politics

    "AI can't... fully grok the law, or reason about it effectively,"

  24. Ian Johnston Silver badge

    I am shocked, shocked, to learn that a combination of a web search engine and an autocomplete allowed to ramble on for paragraphs might not be a wholly accurate source of information.
