AI models hallucinate, and doctors are OK with that

The tendency of AI models to hallucinate – aka confidently making stuff up – isn't sufficient to disqualify them from use in healthcare settings. So, researchers have set out to enumerate the risks and formulate a plan to do no harm while still allowing medical professionals to consult with unreliable software assistants. No …

  1. Mentat74
    Thumb Down

    "Do no harm"...

    Unless it saves money apparently...

    1. ecofeco Silver badge
      Joke

      Re: "Do no harm"...

      You sound like some damn commie carin' 'bout other people. You ain't a commie, are ya?!

      1. Anonymous Coward
        Anonymous Coward

        Re: "Do no harm"...

        Unless you're one of those Gulag loving, crack down on 'em hard commies. Gulag lovin' commies are our kinda commies.

      2. Benegesserict Cumbersomberbatch Silver badge

        Re: "Do no harm"...

        [I swear] ... to hold my teacher in this art equal to my own parents; to make him partner in my livelihood; when he is in need of money to share mine with him; to consider his family as my own brothers, and to teach them this art, if they want to learn it, without fee or indenture; to impart precept, oral instruction, and all other instruction to my own sons, the sons of my teacher, and to indentured pupils who have taken the Healer's oath, but to nobody else.

        Hippocratic Oath

    2. Anonymous Coward
      Anonymous Coward

      Re: "Do no harm"...

      The thing the snap reaction above ignores is that doctors frequently confidently get things wrong, too.

      I’ve had three seriously bad medical decisions made about me by consultants in my time, one of them life-changing. In each case, when I recently presented the same information I'd given the doctor to ChatGPT, it suggested the right thing to do.

      Would I like to be diagnosed purely by an LLM? No - no doubt it would have hallucinated in other cases. Do I hope doctors use them routinely in future as an immediate second opinion to challenge their decisions? Hell yes. Harm reduced, and despite the additional time required it would likely be cheaper too, as medical errors are costly all round.

      My father, a retired consultant, agrees. As does my mother, a retired head of clinical audit for an NHS trust. It’s all about how they’re implemented.

      1. Anonymous Coward
        Anonymous Coward

        Re: "Do no harm"...

        Almost all academics are confidently wrong sometimes. It's part and parcel of being an academic. There are checks and balances that exist to catch these things, like peer review.

        1. Francis Boyle

          Re: "Do no harm"...

          FTFY: All humans are confidently wrong sometimes. It's part and parcel of being human. See also Dunning-Kruger.

  2. blu3b3rry

    Would you trust Clippypilot to diagnose your ailments?

    "It looks like your patent has six legs, would you like help with that?"

    1. Dante Alighieri
      Facepalm

      Re: Would you trust Clippypilot to diagnose your ailments?

      USPO has verified this.

  3. Sorry that handle is already taken. Silver badge
    Stop

    On Hallucinations

    Generative AI models are just complicated mathematical predictive models. Calling unwanted outputs "hallucinations" is an attempt to deflect (from the fact that "hallucinations" constitute 100% of their output) and to anthropomorphise them (so that people believe they're capable of "understanding" the idea of "truth" in the first place).

    1. Anonymous Coward
      Anonymous Coward

      Re: On Hallucinations

      "Calling unwanted outputs "hallucinations" is an attempt to deflect ... and to anthropomorphise them"

      I find this rabid objection to the use of metaphors silly and foolish. The practice also has a long and bloody history in religious intolerance and censorship.

      Metaphors are the prime way language incorporates new notions, ideas, and objects.

      Refusing to call this behavior from "AI models" (two metaphors) "hallucinating" is as silly as refusing to call a "programmable electronic calculator" (three metaphors) a "computer" because a "computer" used to be a woman doing computations and a "programmable electronic calculator" is not even human, let alone a woman.

      "Hallucinations" perfectly describes the experience users have of these erroneous outputs. Users have absolutely no interest in politico-philosophical hair splitting by people who object to the use of machine learning in practical life.

      Btw, humans have anthropomorphized tools and objects in general since the origins of the species (and maybe before). Anthropomorphizing tools is not a reason to dismiss people's opinions or arguments. I think that those who were unable to anthropomorphise were also unable to use these tools effectively and became extinct as a consequence.

      1. Jonathan Richards 1 Silver badge

        Re: On Hallucinations

        I have sympathies with both your arguments. If I can speculate, the OP is objecting to using metaphors that encourage us to accept (without thinking about it) that LLMs can think, reason, self-reflect on their thinking and reasoning, and generally behave like a human intelligence. They don't, can't, and, in my opinion, cannot be made to do so. The underlying mechanism of an LLM, i.e. statistical forecast, is so different from how a human intelligence works[1] that we must guard against the mind-set that they're 'intelligent just like us'.

        P.S. edit: re. "humans have anthropomorphized tools and objects in general". I don't think so, and certainly not in the way that LLMs are being identified with human attributes. Nobody sympathises with a hammer because it's having its head banged on a rock, nobody thinks that wheels are happy simply to turn on their axles. I'd like an instance of your statement, if you can give me one.

        [1] This remark is confidently made, in the knowledge that I don't really know how human intelligence works. We don't understand our memory or acquisition of biases, we have to force ourselves really hard to think absolutely logically (syllogisms, etc.) and so on. Maybe I'm hallucinating.

        1. Simon Harris

          Re: On Hallucinations

          I would suggest anthropomorphising tools and objects is actually so ingrained into language that we don't even notice we're doing it a lot of the time.

          An example that springs to mind is how often people refer to a piece of equipment with an intermittent fault as 'a bit temperamental'.

          Also, of course, the maritime tradition of assigning a female gender to ships.

          1. razza de azzer

            Re: On Hallucinations

            We have a 150 iq in the house.

        2. Anonymous Coward
          Anonymous Coward

          Re: On anthropomorphization

          "I don't think so, and certainly not in the way that LLMs are being identified with human attributes."

          So you are one of those rare wonderful humans who have not cursed and yelled at their computer. Or their car.

          I must admit that I do not have that level of restraint.

          1. Jonathan Richards 1 Silver badge

            Re: On anthropomorphization

            I've cursed and yelled at the way a computer reacts, by doing just what it has been told to do instead of what I want it to do, but always in the certain knowledge that a human has (mis-)programmed it to be that way.

            An earlier point made about giving ships (vessels and vehicles in more general terms, too) a feminine gender is well taken; we can gain an affection for what is an engineered item keeping us afloat in a bitter and unforgiving ocean, but I still don't think that counts as *anthropomorphism*. Maybe I'm being too picky - I think anthropomorphism is attributing humanity to something non-human. Peppa Pig is anthropomorphic. Paddington Bear, and Peter Rabbit, too. The idea that we should consider ChatGPT and its ilk to be in any way human, or attuned to humanity, is mildly repulsive.

          2. SuperGeek

            Re: On anthropomorphization

            "So you are one of those rare wonderful humans who have not cursed and yelled at their computer. Or their car."

            If YOU don't START, I'm going to give you a DAMN good THRASHING!!!!

            No, me neither! Oh, maybe a bit ;)

        3. Helcat Silver badge

          Re: On Hallucinations

          " I don't really know how human intelligence works"

          The field is called Cognitive Psychology. It was part of the course I did on Artificial Intelligence way back when I was studying for my BSc. It's quite a complex topic, and a fascinating one.

          It does explain why we anthropomorphize things around us: It's part of how we understand the world, and how we link things together and build associations to better predict outcomes from something we've never encountered before.

        4. SCP

          Re: On Hallucinations

          I'd like an instance of your statement, if you can give me one.

          My trusty hammer has never let me down (and it's not called Mjölnir).

      2. Sorry that handle is already taken. Silver badge

        Re: On Hallucinations

        No, you've missed the point. Labelling only "bad" outputs of, say, an LLM as "hallucinations" deflects from the fact that all outputs of an LLM are equal. They're all hallucinations. It's not a useful, or even constructive, distinction.

        1. razza de azzer

          Re: On Hallucinations

          Take that back. Bog standard adhd-er

      3. Benegesserict Cumbersomberbatch Silver badge

        Re: On Hallucinations

        By definition an hallucination is a disorder of perception.

        Being incapable of perception, what AI models suffer from is analogous to a formal thought disorder, most commonly seen in humans as a psychosis.

        Headlines including the term "Psychotic AI model" don't do as much for the share price.

    2. O'Reg Inalsin

      Re: On Hallucinations

      "Life of Earth", from the first single celled lifeforms that evolved to face the sun for sustenance, is itself comprised of "complicated mathematical predictive models" wonderfully encoded in DNA, where the "unwanted" outputs (behaviors) are pruned from the DNA tree of life by evolution. Therefore humans also fall into that category of things "comprised of complicated mathematical predictive models", but based in DNA, and on top of that, we have evolved a communal cherry of "customized intelligence via lifelong learning experiences", including natural language ability through speech and writing.

      That's hugely different from chatbot software, even though chatbot software is also "comprised of complicated mathematical predictive models". However, chatbot software is not based on DNA; it is JUST based on software algorithms and computer hardware. And that is a huge JUST, because it means chatbot software is not sustainable life. Chatbot software is JUST a TOOL. Although that tool does display some features in common with the human cherry of "customized intelligence", it is most definitely not self-sustainable life on earth.

      We must not forget our roots of being DNA life on earth, must not forget that we are keepers of a treasured DNA pool enabling sustainable life on earth, and we should not be overly confident of our own rational intelligence having powers above and beyond "complicated mathematical predictive models". Testing and developing such models is the entire goal of science, isn't it?

      Of course human life is not only about science.

    3. ecofeco Silver badge
      Facepalm

      Re: On Hallucinations

      Problem at hand: AI could kill people and there are no real precautions in place

      OP: Let's split hairs over semantics

      1. GloriousVictoryForThePeople

        Re: On Hallucinations

        "...there are no real precautions in place"

        Wait, we are certain it is going to reduce wasteful clinical spending

    4. Anonymous Coward
      Anonymous Coward

      Re: On Hallucinations

      Interesting back-and-forth, point-and-counterpoint, etc ...

      Just came across a paper by Hicks, Humphries, and Slater, openAccess published in Ethics and Information Technology (2024), where they argue similarly that rather than "hallucinations", or even "lying", one should refer to LLM outputs using terminology of the form: ChatGPT is bullshit.

      Two quotes from there (that refer back to an Edwards, 2023): "the term hallucination anthropomorphises the LLMs" and "the use of a human psychological term risks anthropomorphising the LLMs".

      --Your friendly librarian.

  4. find users who cut cat tail

    Not surprised

    Every time homo sapiens can lose a skill because they can somehow get away with it, what do they do? Jump at the opportunity. Without exception.

    Will doctors use machine learning tools to make critical decisions even when the tools are known to be broken? Of course they will, if they can. As basically everyone else does.

    No machine uprising will be needed because humanity is just going to give up preemptively.

    1. Anonymous Coward
      Anonymous Coward

      Re: Not surprised

      "Every time homo sapiens can lose a skill because they can somehow get away with it, what do they do?"

      I assume you do not create your own fire using stones, sticks and dried moss? That is a skill humans have happily forgotten.

      Plato has this nice dialog against the use of writing (Phaedrus, easy to find). He taught:

      Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise.

      It is worth noting that Plato's teachings have reached us because someone wrote them down. That makes this a very good example of why Luddites eventually lose the fight.

      1. Jonathan Richards 1 Silver badge

        Re: Not surprised

        > Plato's teaching have reached us because someone wrote them down

        Which doesn't mean that he was wrong. I've made a living for quite a long life not by knowing things, as Plato did, but by having an idea of where to find a reminder, as he warned.

        "they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant"

        Fantastic! That's the Key User Requirement for Large Language Models, right there.

        1. HuBo Silver badge
          Windows

          Re: Not surprised

          Well, this POV that Plato expresses, through Socrates, is an exemplar of a (relatively narrow) thesis, to be balanced out by a (missing from that dialogue, but obvious) antithesis, both of which are then to be dialectically subsumed by a synthesis (to get the knowledge advancement tripodic spiral rolling!). A synthesis might be that reading/writing is useful at times (eg. to learn about bikes) but should be let go at others (eg. when actually riding a bike, except for road signs and the like).

          IOW there's a time and place for read/write mass storage I/O operations, but at other times one should operate purely from internal registers and volatile RAM. I'm sure Plato knew that.

          The question of whether lossy compressed databases with stochastic recall can have the same type of utility as reading-and-writing is quite orthogonal to that though, imho (and not something I'd be willing to confidently bet on)!

        2. Anonymous Coward
          Anonymous Coward

          Re: Not surprised

          "they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant"

          IIRC, that was also Socrates' (or Plato's, it is difficult to separate them) opinion of the Sophists of the time. They did not write things down any more than Plato did. Rhetoric has been accused of the same sins.

          So, I think "seeming to know without understanding" was even then not limited to people who read.

    2. PinchOfSalt

      Re: Not surprised

      I've asked it before and I'll ask it again.

      If / when they go down this path, what will it mean to be a 'doctor'?

      If the doctors are using it, then why not give it directly to patients and cut out the meatbag altogether?

      I'm not suggesting this is a good outcome, but some nugget somewhere will do it.

      To some extent it makes sense to do this... It currently takes months to years to get the results of a brain scan for certain types of neurological problems. Having a means of doing this more efficiently / effectively makes sense. But I don't trust people not to get lazy, take this system at its word, and skip the due diligence. Much as self-driving cars will teach drivers to be inattentive at the wheel. Or GPS users to have less well developed brains for location mapping and route finding.

      1. Anonymous Coward
        Anonymous Coward

        Re: Not surprised

        Definitely! I've been pondering whether to use my kitchen's microwave oven to conveniently get me a self-service brainscan MRI, affordably ... not sure what power level to use though at this time. Might try it with the neighbor's cat first (just to make sure)! ;)

      2. trindflo Silver badge

        Re: Not surprised

        I imagine all of us are playing doctor using the internet to some degree. I consider it 'homework' to bring a conjecture based on the internet sources available to me to my doctor, BUT I still ask the doctor and accept the professional knows more than I do and can sift through the noise better than I can.

        The idea that doctors would use an AI isn't ominous to me. The idea that doctors would rely on an AI is very ominous.

        It's been demonstrated that programmers who rely on AI lose some of their ability to think outside the bot. Specifically designing, optimizing, and debugging talents get lost.

        Reducing something to a practice that doesn't need to change by writing it down is a good thing. But you have to be willing and able to throw out that procedure when things do change.

        Back to doctors, if they use an AI for everything as a first step and the AI is correct 99 times out of 100, will the doctor notice the time the AI got it wrong? Probably if the AI is grossly wrong and recommends amputating the fourth leg, but maybe not if it is something subtle that requires a great deal of talent such as reading an X-ray or MRI.

  5. TheBruce

    harm mitigation strategies

    So "...argues that harm mitigation strategies need to be developed." How about: don't use a predictive model. What's wrong with an expert system? I can see using well designed models to help in coming up with new tools/methodology in diagnosis, but not an LLM doc.

  6. Alister

    It's hard enough for a human doctor to sift through the obtuse and confusing information they get from a patient to arrive at a sensible diagnosis; what chance does an LLM have?

  7. Anonymous Coward
    Anonymous Coward

    Sunk Costs

    • "C'mon, everybody, the potential benefits of this are so great, and we've poured so much money into this -- I'm sure we can fix it up with guardrails!"

    • Medical doctors are not, by-and-large, computer experts.

    • I have to wonder what inducements may have been offered to this bunch to influence their opinions.

    Do not confuse the desirability of an outcome with the likelihood of achieving it. Plenty of doctors and computer scientists lose money in the gambling establishments of Reno and Las Vegas.

  8. An_Old_Dog Silver badge
    Headmaster

    Potentially-Misleading Headline

    The headline should have been, "... and Some Doctors are OK with that".

    Omitting the word "some" leads readers seeing the headline to think ALL doctors are OK with generative AI hallucinations, though TFA makes it clear that's not true.

  9. ICL1900-G3 Silver badge

    Eggheads

    Using only El Reg approved units, describe the key differences between an egghead and a boffin.

    1. Jonathan Richards 1 Silver badge
      Thumb Up

      Re: Eggheads

      Not going to take on the units challenge, but I offer that a definition of egghead involves theoretical work and maybe philosophising, whereas a boffin probably designs and builds gadgets, and may even know which end of a torque wrench is which.

      1. Anonymous Coward
        Anonymous Coward

        Re: Eggheads

        Eggheads have no need to wear safety glasses.

        Boffins do. When you have USDA 1st grade boffins, bystanders should wear safety glasses.

  10. heyrick Silver badge

    I asked both OpenAI and DeepSeek if they knew of me and my blog. I then asked them to summarise the health situation that I talk about on my blog.

    OpenAI thought I had cancer and chronic pain.

    DeepSeek said: "He often writes candidly about his health challenges, particularly his struggles with chronic pain and Ehlers-Danlos Syndrome (EDS), a connective tissue disorder that affects joints, skin, and other tissues."

    This is a steaming heap of llama dung. And, honestly, the thought that doctors are taking any of this seriously is scary. Surely paying any attention to the witterings of a machine known to hallucinate is malpractice?

    I don't want to shill my blog, so I'll just say my domain is heyrick.eu and it's the blog entry of the 1st of February this year - in case you're interested.

    1. HuBo Silver badge
      Pint

      Chronic discombobulative digestion and related dyscrosseyedivismia ... I think we can all relate!

    2. Handlebars

      I told Gemini I was a doctor evaluating its diagnostic skills, described a modestly complex presentation, and it gave me a decent response. On the other hand, you could build a knowledge graph to do much the same using less compute resource, and it would be deterministic as well.
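      A minimal sketch of that deterministic idea, with an entirely made-up toy mapping (the symptoms, conditions, and function name here are illustrative, not any real clinical knowledge base):

```python
# Toy symptom -> condition edges; a real clinical graph would be vastly larger.
KNOWLEDGE_GRAPH = {
    "fatigue": {"anaemia", "hypothyroidism"},
    "weight gain": {"hypothyroidism"},
    "pallor": {"anaemia"},
}

def match_conditions(symptoms):
    """Rank conditions by how many presented symptoms point at them."""
    scores = {}
    for symptom in symptoms:
        for condition in KNOWLEDGE_GRAPH.get(symptom, ()):
            scores[condition] = scores.get(condition, 0) + 1
    # Deterministic ordering: best score first, ties broken alphabetically,
    # so the same presentation always yields the same differential.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

      Unlike an LLM, running this twice on the same presentation gives the same answer, and every edge in the graph is auditable.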

    3. Nematode Bronze badge

      Jeez, it's hard enough for those of us in the aortopathy community (which includes EDS, Marf, LDS, etc) to get over to doctors sometimes about these conditions that we could do without an LLM training doctors to ignore the condition as "it's probably just an AI hallucination."

  11. Anonymous Coward
    Anonymous Coward

    She blinded science with hallucinations⁽¹⁾

    A puzzling piece of work imho, at 50 pages of text for the main body (plus 24 pages of bibliography and 15 of Appendix), it resembles a book chapter (rarely peer reviewed), or a monograph, much more than a scientific article (that would be peer reviewed). The bulk of its investigative parts seem to be in "Chapters" 6 to 8 (20 pages), the rest being literature review and proposed approaches for one thing or another.

    Chapter 7 (8 pages, in which the word "reveaos" occurs at least once) is interesting, but as with claims of Majorana probably needs careful critique to prevent "distractions caused by unreliable scientific claims". In particular, if, for experiment and results, "It ends up that it's sensitive to things like measurement ranges" (metaphorically), or that what is inferred cannot be determined from the results of the experimental protocol.

    In the present case, it is this commentard's opinion for example, that one cannot make broad conclusions about applicability of LLMs to the diagnosis process (eg. p.43 "Differential Diagnosis Test") when these tools are exposed to the entire dataset in one go (from NEJM case reports) because diagnosis is a multi-step process that requires starting from initial observations, generating a preliminary diagnosis, providing an initial prescription and requesting additional tests, evaluating the impacts and results of those to refine the diagnosis, and so forth, marching the process forward with hypotheses and tests until completion. Presenting an LLM with the entire case in one go gives it an unrealistic advantage (advanced knowledge of the process that was actually followed) relative to the (by necessity) standard approach.

    With LLMs unable to follow the testing and treatment chronology, and unable to interpret lab test results, it seems clear that their potential for stepwise sequential differential diagnosis, and construction of the related decision trees, is at best very poor at present (close to null). One would have expected the authors to conclude their manuscript accordingly, imho!

    (1)- Loosely adapted from Thomas Dolby's 1982 song title "She Blinded Me with Science"

  12. TheMaskedMan Silver badge

    "Diagnosis Prediction consistently exhibited the lowest overall hallucination rates across all models, ranging from 0 percent to 22 percent"

    22 percent?? More than 1 in 5, and this is the lowest?? And the buggers are using this on (currently) live patients?? Fuck me!!

    As I've said in here many times, LLMs have their uses, as long as you're confident you can spot their errors. Letting them loose on actual patients is probably not one of those uses, not because they may not be useful, but because some pillock doctor will blindly follow the machine with terminal results. And then blame the machine.

    Doctors are busy, and human. They make plenty of their own mistakes. They don't need a machine to make more for them. It is human nature to take an easy answer rather than think things through, and busy, tired doctors will do just that. Which is a shame, because it's quite possible that the LLM could catch medical errors, as well as making its own. It might also suggest possibilities that the doctor hasn't thought of. No harm in that - if the suggestion is useless it can easily be ignored. But they absolutely must not be relied upon to give the correct diagnosis. No amount of guardrails is going to make these things safe to do that and the notion that it might is just wishful thinking.

    As for liability, it lies with the user of the tool, just as it would if they cut off the wrong leg with a (presumably quite large) scalpel.

  13. TrickyRicky

    Out of time?

    Could the difficulty in chronological ordering be down to the different DDMMYYYY vs MMDDYYYY regimes of various datasets that, without understanding probable context, would be easy to mis-sort?
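    To illustrate the ambiguity (a hypothetical sketch, not how any dataset in the paper is actually stored), the same string parses to two different dates under the two regimes:

```python
from datetime import datetime

ambiguous = "04/05/2023"  # 4 May (day-first) or 5 April (month-first)?

as_ddmm = datetime.strptime(ambiguous, "%d/%m/%Y")  # day-first reading
as_mmdd = datetime.strptime(ambiguous, "%m/%d/%Y")  # month-first reading

# One string, two candidate dates a month apart -- mixing regimes in a
# dataset makes any chronological sort of such records unreliable.
print(as_ddmm.date(), as_mmdd.date())
```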

  14. harmjschoonhoven
    IT Angle

    medical software

    should run for at least 30 seconds before giving a diagnosis, if necessary by a call to sleep(), because otherwise the doctor will not trust the result.
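    In the spirit of the joke, a sketch of such a trust-padding wrapper (the 30-second figure comes from the comment above; the function names are invented):

```python
import time

def diagnose_with_gravitas(diagnose, *args, min_seconds=30.0, **kwargs):
    """Run a diagnosis function, padding runtime to a trust-inspiring minimum."""
    start = time.monotonic()
    result = diagnose(*args, **kwargs)
    remaining = min_seconds - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)  # simulate deep deliberation
    return result
```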

  15. vekkq

    My doc hallucinated a diagnosis just yesterday. If he's right or wrong I'll know for sure in 6 weeks.

    I'm already at a point where AI couldn't do much worse.

  16. Anonymous Coward
    Anonymous Coward

    Apple

    Apple includes the instructions it feeds to its Generative Models as plain text within a very deep directory structure. I found this one interesting:

    You are an assistant which helps the user respond to their mails. Given a mail, a draft response is initially provided based on a short reply snippet. In order to make the draft response nicer and complete, a set of question and its answer are provided. Please write a concise and natural reply by modifying the draft response to incorporate the given questions and their answers. Please limit the reply within 50 words. Do not hallucinate. Do not make up factual information.
