
"Do no harm"...
Unless it saves money apparently...
The tendency of AI models to hallucinate – aka confidently making stuff up – isn't sufficient to disqualify them from use in healthcare settings. So, researchers have set out to enumerate the risks and formulate a plan to do no harm while still allowing medical professionals to consult with unreliable software assistants. No …
[I swear] ... to hold my teacher in this art equal to my own parents; to make him partner in my livelihood; when he is in need of money to share mine with him; to consider his family as my own brothers, and to teach them this art, if they want to learn it, without fee or indenture; to impart precept, oral instruction, and all other instruction to my own sons, the sons of my teacher, and to indentured pupils who have taken the Healer's oath, but to nobody else.
Hippocratic Oath
The thing the snap reaction above ignores is that doctors, too, frequently and confidently get things wrong.
I’ve had three seriously bad medical decisions made about me by consultants in my time, one of them life-changing. In each case, when I recently gave ChatGPT the same information I had given the doctor, it suggested the right thing to do.
Would I like to be diagnosed purely by an LLM? No - no doubt it would have hallucinated in other cases. Do I hope doctors use them routinely in future as an immediate second opinion to challenge their decisions? Hell yes. Harm reduced, and despite the additional time required it would likely be cheaper too, as medical errors are costly all round.
My father, a retired consultant, agrees. As does my mother, a retired head of clinical audit for an NHS trust. It’s all about how they’re implemented.
Generative AI models are just complicated mathematical predictive models. Calling unwanted outputs "hallucinations" is an attempt to deflect (from the fact that "hallucinations" constitute 100% of their output) and to anthropomorphise them (so that people believe they're capable of "understanding" the idea of "truth" in the first place).
"Calling unwanted outputs "hallucinations" is an attempt to deflect ... and to anthropomorphise them"
I find this rabid objection to the use of metaphors silly and foolish. The practice also has a long and bloody history in religious intolerance and censorship.
Metaphors are the prime way language incorporates new notions, ideas, and objects.
Refusing to call this behavior from "AI models" (two metaphors) "hallucinating" is as silly as refusing to call a "programmable electronic calculator" (three metaphors) a "computer" because a "computer" used to be a woman doing computations and a "programmable electronic calculator" is not even human, let alone a woman.
"Hallucinations" perfectly describe the experience users have about these erroneous outputs. Users have absolute no interest in politico-philosophical hair splitting by people who object to the use of machine learning in practical life.
Btw, humans have anthropomorphized tools and objects in general since the origins of the species (and maybe before). Anthropomorphizing tools is not a reason to dismiss people's opinions or arguments. I think that those who were unable to anthropomorphise were also unable to use these tools effectively and became extinct as a consequence.
I have sympathies with both your arguments. If I can speculate, the OP is objecting to using metaphors that encourage us to accept (without thinking about it) that LLMs can think, reason, self-reflect on their thinking and reasoning, and generally behave like a human intelligence. They don't, can't, and, in my opinion, cannot be made to do so. The underlying mechanism of an LLM, i.e. statistical forecasting, is so different from how a human intelligence works[1] that we must guard against the mind-set that they're 'intelligent just like us'.
P.S. edit: re. "humans have anthropomorphized tools and objects in general". I don't think so, and certainly not in the way that LLMs are being identified with human attributes. Nobody sympathises with a hammer because it's having its head banged on a rock, nobody thinks that wheels are happy simply to turn on their axles. I'd like an instance of your statement, if you can give me one.
[1] This remark is confidently made, in the knowledge that I don't really know how human intelligence works. We don't understand our memory or acquisition of biases, we have to force ourselves really hard to think absolutely logically (syllogisms, etc.) and so on. Maybe I'm hallucinating.
I would suggest anthropomorphising tools and objects is actually so ingrained into language that we don't even notice we're doing it a lot of the time.
An example that springs to mind is how often people refer to a piece of equipment with an intermittent fault as 'a bit temperamental'.
Also, of course, the maritime tradition of assigning a female gender to ships.
"I don't think so, and certainly not in the way that LLMs are being identified with human attributes."
So you are one of those rare wonderful humans who have not cursed and yelled at their computer. Or their car.
I must admit that I do not have that level of restraint.
I've cursed and yelled at the way a computer reacts, by doing just what it has been told to do instead of what I want it to do, but always in the certain knowledge that a human has (mis-)programmed it to be that way.
An earlier point made about giving ships (vessels and vehicles in more general terms, too) a feminine gender is well taken; we can gain an affection for what is an engineered item keeping us afloat in a bitter and unforgiving ocean, but I still don't think that counts as *anthropomorphism*. Maybe I'm being too picky - I think anthropomorphism is attributing humanity to something non-human. Peppa Pig is anthropomorphic. Paddington Bear, and Peter Rabbit, too. The idea that we should consider ChatGPT and its ilk to be in any way human, or attuned to humanity, is mildly repulsive.
" I don't really know how human intelligence works"
The field is called Cognitive Psychology. It was part of the course I did on Artificial Intelligence way back when I was studying for my BSc. It's quite a complex topic, and a fascinating one.
It does explain why we anthropomorphize things around us: It's part of how we understand the world, and how we link things together and build associations to better predict outcomes from something we've never encountered before.
By definition an hallucination is a disorder of perception.
Being incapable of perception, AI models suffer from something analogous to a formal thought disorder, most commonly seen in humans as a psychosis.
Headlines including the term "Psychotic AI model" don't do as much for the share price.
"Life of Earth", from the first single celled lifeforms that evolved to face the sun for sustenance, is itself comprised of "complicated mathematical predictive models" wonderfully encoded in DNA, where the "unwanted" outputs (behaviors) are pruned from the DNA tree of life by evolution. Therefore humans also falls in that category of things "comprised of complicated mathematical predictive models", but based in DNA, and on top of that, we have evolved a communal cherry of "customized intelligence via lifelong learning experiences", including natural language ability through speech and writing.
That's hugely different from chatbot software, even though chatbot software is also "comprised of complicated mathematical predictive models". However, chatbot software is not based on DNA; it is JUST based on software algorithms and computer hardware. And that is a huge JUST, because it means chatbot software is not sustainable life. Chatbot software is JUST a TOOL. Although that tool does display some features in common with the human cherry of "customized intelligence", it is most definitely not self-sustainable life on earth.
We must not forget our roots of being DNA life on earth, must not forget that we are keepers of a treasured DNA pool enabling sustainable life on earth, and we should not be overly confident of our own rational intelligence having powers above and beyond "complicated mathematical predictive models". Testing and developing such models is the entire goal of science, isn't it?
Of course human life is not only about science.
Interesting back-and-forth, point-and-counterpoint, etc ...
Just came across a paper by Hicks, Humphries, and Slater, published open access in Ethics and Information Technology (2024), where they argue similarly that rather than "hallucinations", or even "lying", one should refer to LLM outputs using terminology of the form: ChatGPT is bullshit.
Two quotes from there (that refer back to an Edwards, 2023): "the term hallucination anthropomorphises the LLMs" and "the use of a human psychological term risks anthropomorphising the LLMs".
--Your friendly librarian.
Every time homo sapiens can lose a skill because they can somehow get away with it, what do they do? Jump at the opportunity. Without exception.
Will doctors use machine learning tools to make critical decisions even when the tools are known to be broken? Of course they will if they can. As basically everyone else does.
No machine uprising will be needed because humanity is just going to give up preemptively.
"Every time homo sapiens can lose a skill because they can somehow get away with it, what they do?"
I assume you do not create your own fire using stones, sticks and dried moss? That is a skill humans have happily forgotten.
Plato has this nice dialog against the use of writing (Phaedrus, easy to find). He taught:
Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise.
It is worth noting that Plato's teachings have reached us because someone wrote them down. That makes this a very good example of why Luddites eventually lose the fight.
> Plato's teachings have reached us because someone wrote them down
Which doesn't mean that he was wrong. I've made a living for quite a long life, not by knowing things as Plato did, but by having an idea where to find a reminder, as he warned.
"they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant
Fantastic! That's the Key User Requirement for Large Language Models, right there.
Well, this POV that Plato expresses, through Socrates, is an exemplar of a (relatively narrow) thesis, to be balanced out by a (missing from that dialogue, but obvious) antithesis, both of which are then to be dialectically subsumed by a synthesis (to get the knowledge advancement tripodic spiral rolling!). A synthesis might be that reading/writing is useful at times (eg. to learn about bikes) but should be let go at others (eg. when actually riding a bike, except for road signs and the likes).
IOW there's a time and place for read/write mass storage I/O operations, but at other times one should operate purely from internal registers and volatile RAM. I'm sure Plato knew that.
The question of whether lossy compressed databases with stochastic recall can have the same type of utility as reading-and-writing is quite orthogonal to that though, imho (and not something I'd be willing to confidently bet on)!
"they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant"
IIRC, that was also Socrates' (or Plato's, it is difficult to separate them) opinion of the Sophists of the time. They did not write things down any more than Plato did. Rhetoric has been accused of the same sins.
So, I think "seeming to know without understanding" was even then not limited to people who read.
I've asked it before and I'll ask it again.
If / when they go down this path, what will it mean to be a 'doctor'?
If the doctors are using it, then why not give it directly to patients and cut out the meatbag altogether?
I'm not suggesting this is a good outcome, but some nugget somewhere will do it.
To some extent it makes sense to do this... It currently takes months to years to get the results of a brain scan for certain types of neurological problems. Having a means of doing this more efficiently / effectively makes sense. But I don't trust people not to get lazy and just take this system at its word without doing the due diligence. Much as self driving cars will teach drivers to be inattentive at the wheel. Or GPS users to have less well developed brains for location mapping and route finding.
I imagine all of us are playing doctor using the internet to some degree. I consider it 'homework' to bring a conjecture based on the internet sources available to me to my doctor, BUT I still ask the doctor and accept the professional knows more than I do and can sift through the noise better than I can.
The idea that doctors would use an AI isn't ominous to me. The idea that doctors would rely on an AI is very ominous.
It's been demonstrated that programmers who rely on AI lose some of their ability to think outside the bot. Specifically, designing, optimizing, and debugging talents get lost.
Reducing something to a practice that doesn't need to change by writing it down is a good thing. But you have to be willing and able to throw out that procedure when things do change.
Back to doctors, if they use an AI for everything as a first step and the AI is correct 99 times out of 100, will the doctor notice the time the AI got it wrong? Probably if the AI is grossly wrong and recommends amputating the fourth leg, but maybe not if it is something subtle that requires a great deal of talent such as reading an X-ray or MRI.
• "C'mon, everybody, the potential benefits of this are so great, and we've poured so much money into this -- I'm sure we can fix it up with guardrails!"
• Medical doctors are not, by-and-large, computer experts.
• I have to wonder what inducements may have been offered to this bunch to influence their opinions.
Do not confuse the desirability of an outcome with the likelihood of achieving it. Plenty of doctors and computer scientists lose money in Reno's and Las Vegas' gambling establishments.
I asked both OpenAI and DeepSeek if they knew of me and my blog. I then asked them to summarise the health situation that I talk about on my blog.
OpenAI thought I had cancer and chronic pain.
DeepSeek said: "He often writes candidly about his health challenges, particularly his struggles with chronic pain and Ehlers-Danlos Syndrome (EDS), a connective tissue disorder that affects joints, skin, and other tissues."
This is a steaming heap of llama dung. And, honestly, the thought that doctors are taking any of this seriously is scary. Surely paying any attention to the witterings of a machine known to hallucinate is malpractice?
I don't want to shill my blog, so I'll just say my domain is heyrick.eu and it's the blog entry of the 1st of February this year - in case you're interested.
A puzzling piece of work, imho. At 50 pages of text for the main body (plus 24 pages of bibliography and 15 of Appendix), it resembles a book chapter (rarely peer reviewed), or a monograph, much more than a scientific article (which would be peer reviewed). The bulk of its investigative parts seems to be in "Chapters" 6 to 8 (20 pages), the rest being literature review and proposed approaches for one thing or another.
Chapter 7 (8 pages, in which the word "reveaos" occurs at least once) is interesting, but as with the Majorana claims it probably needs careful critique to prevent "distractions caused by unreliable scientific claims". In particular, whether, for the experiment and results, "It ends up that it's sensitive to things like measurement ranges" (metaphorically), or whether what is inferred cannot actually be determined from the results of the experimental protocol.
In the present case, it is this commentard's opinion for example, that one cannot make broad conclusions about applicability of LLMs to the diagnosis process (eg. p.43 "Differential Diagnosis Test") when these tools are exposed to the entire dataset in one go (from NEJM case reports), because diagnosis is a multi-step process that requires starting from initial observations, generating a preliminary diagnosis, providing an initial prescription and requesting additional tests, evaluating the impacts and results of those to refine the diagnosis, and so forth, marching the process forward with hypotheses and tests until completion. Presenting an LLM with the entire case in one go gives it an unrealistic advantage (advanced knowledge of the process that was actually followed) relative to the (by necessity) stepwise standard approach.
With LLMs unable to follow the testing and treatment chronology, and unable to interpret lab test results, it seems clear that their potential for stepwise sequential differential diagnosis, and construction of the related decision trees, is at best very poor at present (close to null). One would have expected the authors to conclude their manuscript accordingly, imho!
(1)- Loosely adapted from Thomas Dolby's 1982 song title "She Blinded Me with Science"
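To make that one-shot versus stepwise objection concrete, here is a minimal toy sketch in Python (my own illustration; ask_llm and run_test are hypothetical placeholders, not anything taken from the paper) contrasting the two evaluation styles:

def ask_llm(prompt):
    # Hypothetical stand-in for whatever chat model API one uses.
    raise NotImplementedError

def one_shot_diagnosis(full_case_report):
    # The evaluation style objected to above: the model sees the whole
    # write-up at once, including investigations a clinician could only
    # have chosen partway through the work-up.
    return ask_llm("Here is the complete case report:\n" + full_case_report
                   + "\nWhat is the diagnosis?")

def stepwise_diagnosis(initial_presentation, run_test, max_rounds=5):
    # Closer to real practice: start from the presenting complaint,
    # request one investigation at a time, feed its result back,
    # and refine the differential until the model commits.
    history = initial_presentation
    for _ in range(max_rounds):
        step = ask_llm("Findings so far:\n" + history
                       + "\nEither name the single most useful next test,"
                       + " or give a final diagnosis prefixed with FINAL:")
        if step.startswith("FINAL:"):
            return step
        history += "\nRequested: " + step + "\nResult: " + run_test(step)
    return ask_llm("Findings so far:\n" + history
                   + "\nGive your best final diagnosis.")

Only the second loop measures whether a model can actually drive a work-up forward; the first mostly measures whether it can read the answer sheet.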
"Diagnosis Prediction consistently exhibited the lowest overall hallucination rates across all models, ranging from 0 percent to 22 percent"
22 percent?? Almost 1 in 4, and this is the lowest?? And the buggers are using this on (currently) live patients?? Fuck me!!
As I've said in here many times, LLMs have their uses, as long as you're confident you can spot their errors. Letting them loose on actual patients is probably not one of those uses, not because they may not be useful, but because some pillock doctor will blindly follow the machine with terminal results. And then blame the machine.
Doctors are busy, and human. They make plenty of their own mistakes. They don't need a machine to make more for them. It is human nature to take an easy answer rather than think things through, and busy, tired doctors will do just that. Which is a shame, because it's quite possible that the LLM could catch medical errors, as well as making its own. It might also suggest possibilities that the doctor hasn't thought of. No harm in that - if the suggestion is useless it can easily be ignored. But they absolutely must not be relied upon to give the correct diagnosis. No amount of guardrails is going to make these things safe to do that and the notion that it might is just wishful thinking.
As for liability, it lies with the user of the tool, just as it would if they cut off the wrong leg with a (presumably quite large) scalpel.
Apple includes the instructions it feeds to its Generative Models as plain text within a very deep directory structure. I found this one interesting:
You are an assistant which helps the user respond to their mails. Given a mail, a draft response is initially provided based on a short reply snippet. In order to make the draft response nicer and complete, a set of question and its answer are provided. Please write a concise and natural reply by modifying the draft response to incorporate the given questions and their answers. Please limit the reply within 50 words. Do not hallucinate. Do not make up factual information.