Yawn (again)
Flannel + bullshit + bollocks
Marketing wins every time!
A notable flaw of AI is its habit of "hallucinating," making up plausible answers that have no basis in real-world data. AWS is trying to tackle this by introducing Amazon Bedrock Automated Reasoning checks. Amazon Bedrock is a managed service for generative AI applications and according to AWS CEO Matt Garman, who spoke at …
Great article - like a breath of fresh air. Learned something new. I am realizing the majority of commenters may be experts in very narrow fields. Instead, articles about Elon get a huge number of mostly hallucinatory comments. Political opinions are hardly provable through logic.
> defending the indefensible. - I doubt there are many experts in the field here, or willing to waste their time.
I'm not so sure of that: LLMs are, essentially, a reasoning engine by the definition given; they take a query and "reason" a response based on a huge model. So now they propose...using a reasoning engine to correct a reasoning engine? How does that work with assuring output?? If a mathematical model can be applied to the output of an LLM to assure output quality, why wasn't this corrective mathematical model applied *before* the LLM produced the final output in the first place?
I'm not on this level but it sounds like complete double-speak. We'll build a LLM, a reasoning engine, that has flaws...so we'll apply yet *another* reasoning engine to correct it! It'll work perfectly!!
Just (re)design the LLM from the get-go to stop believing it is "reasoning" and simply generate the (equivalent) output from that "validated automated policy" that it could create from the start. "Design a database...that has errors...and apply yet another database to correct it!".
You know they did this to the Doctor in Star Trek: Voyager, right? Life imitates art again.
Why sell one AI when you can sell two? Fix the original AI so it doesn't hallucinate and you've lost a sales opportunity! This isn't about fixing anything, but the bottom line. It's a bit like selling someone a service that doesn't work (i.e. it hallucinates) and then selling them the fix as well. Double bubble.
I'm going to have to disagree with you there.
Large Language Models are (by definition) "models" so not "reasoning."
And their implementation is through multi-layer artificial neural networks.
Stick a pattern in, get a pattern out.
The fact these things produce output patterns that are completely bogus suggests a complete lack of formal reasoning.
They are also only language models - they are not trying to model reality but just the way (up to now) humans have used language. Things are going to get worse because more and more of the training data is going to be output by LLMs.
I have no problem with the hallucinations. It is the same as any other bad information on the net.
What I do have a problem with is arguing with it. It is like trying to get a teenager to do chores.
Me: you removed functions from the code I gave you.
Ai: ok, here is the complete code
Me: where is the rest of the code?
Ai: you are right, here is the COMPLETE code.
Me: still not there. How many lines of code in the example that I gave you and how many did you provide back?
Ai: analyzing
Ai: you are correct.
Ai: analyzing
Ai: analyzing
Ai: analyzing
Ai: I found that there is an issue with your code. Here is your complete code
Me: nope, still not there.
2 year old with an attitude.
The company I work for seems to be going a little AI mad at the moment, proposing the use of it for all sorts of things. Whenever I get drawn into the discussions I ask a simple opening question, "Can the problem be solved by either a good pattern recogniser or a parrot?" Sometimes it is the pattern recogniser they need, which means that they might be looking at the right tool for the job. Often, they turn out to be relying on the parrot.
LLMs are not reasoning engines. An LLM takes your input (and a hidden setup script) and then produces text which looks like text it has already seen of people talking about the things in your input. It does no reasoning whatsoever. It picks a syllable that looks right, then it picks another one, and it keeps going until its RNG says to stop.
This is useful if you are writing code which is normal boilerplate or summarising well known works or even just boring reports that look like a million other boring reports. It won't generally design a new algorithm for a problem you've hit while researching something novel.
People need to get it into their heads that an LLM simulates what it has already encountered and the input is nothing more than context for that process. This is not intelligence, artificial or otherwise, and anyone who tells you it is is lying.
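If you want to see the shape of that loop, here's a toy Python sketch of "pick a plausible next token, repeat until the RNG says stop". The probability table is entirely invented and stands in for the billions of learned weights in a real model; it's a conceptual illustration, not how any actual LLM is implemented.

```python
import random

# Toy next-token table standing in for a trained model's probabilities.
# (Entirely made up; a real LLM has billions of learned weights, not a dict.)
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "answer": 0.2},
    "a": {"cat": 0.4, "dog": 0.6},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.6, "<end>": 0.4},
    "answer": {"is": 1.0},
    "is": {"42": 0.5, "wrong": 0.5},
    "42": {"<end>": 1.0},
    "wrong": {"<end>": 1.0},
    "sat": {"<end>": 1.0},
}

def generate(max_tokens: int = 10) -> str:
    """Pick one plausible-looking token after another until the RNG says stop."""
    token, output = "<start>", []
    for _ in range(max_tokens):
        choices = NEXT_TOKEN_PROBS[token]
        token = random.choices(list(choices), weights=list(choices.values()))[0]
        if token == "<end>":
            break
        output.append(token)
    return " ".join(output)

print(generate())  # e.g. "the answer is wrong" -- plausible-looking, zero reasoning
```

Run it a few times and you get different, plausible-looking word salad each run, which is rather the point.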
The snooker player? Ruh-roh? I just find programming logic interesting is all and while I don't really agree with this UCL comp sci person about the nature of hallucinations, it's interesting to hear this take. It doesn't smell like obvious marketing BS to me, though he and his team may well ultimately be incorrect in their approach.
Looking at the underlying mathematical correctness of a program or a line of code seems to me to be an intellectually invigorating job. None of this precludes the guy / AWS in general being utterly, irrevocably wrong, but thinking about these things, in my view, is interesting. And he has a cool job. That's all I was saying.
The software has no intent, obviously.
The programmers did. They were after something that appears to be intelligent, which for most people (even Alan Turing!) means able to fake human sufficiently well to fool a human. What they didn't, and couldn't program was actual intelligence in the sense of reasoning, deduction, insight and all those other intangibles. So we get plausible but wrong.
Plausibility was the objective, correctness entirely incidental.
-A.
Unfortunately, it is not anthropomorphism ... it is a straight lie to hide the fact that LLMs are not suitable for SO MANY of the uses they are being force fitted to !!!
LLMs are not AI ... Full Stop !!!
There is NO AI by any credible measure ... Full Stop !!!
The many many Billions of Dollars being spent are chasing a Sci-Fi dream ...
Because IF they can convince enough people that it DOES work they will be in control of the world and rich (In terms of Power & Money) beyond measure.
This is a Bandwagon that NO-ONE can afford to ignore. If you are a 'Tech Behemoth' you can afford the risk ... BUT the Start-Ups are risking an 'Arm & a Leg' on the hope they will be there 1st or 2nd.
There is so much at stake that even Apple are bending their so called 'Holier than thou' reputation to dabble in the mire called LLMs ... and by all accounts are no more successful than the rest of the crowd !!!
We the so called 'Users' are the test subjects verifying if the 'lie' can be told and it is accepted as truth !!!
:)
...except that's precisely the kind that actually matters. It made something up. Something that did not in fact happen.
It's not a question like "Is The Black Album good?" to which there really is no answer, because the judgement is aesthetic. You can hate it, or love it, you could argue about if the mix is technically good perhaps for a particular vinyl or CD release... but is it good? Well, you like it or not I guess.
Ultimately the latter question is unimportant for AI to answer, because it's unimportant for anyone to answer. There isn't an answer. But inventing things and asserting them as facts? Kinda more important, that one.
So what kind of hallucinations can it solve?
It seems all he's done is list everything that people use LLMs for and say that it doesn't help those use cases.
In other words, he has nothing whatsoever and should be discarded.
"What developers do when they don't have that capability is quite conservative, call it defensive coding if you like."
Maybe somebody developing, let's say, an HR system might consider it defensive to check whether getting struck by a motor vehicle and shot in the foot is something that might require sick leave. It might also include things like checking that a driver has delivered all the packages that should have been delivered at a delivery point, checking that everything that went into a warehouse can be found when it's time to despatch, and a whole lot of other things that Amazon coding doesn't do.
I took that statement to be about developers being less conservative/more confident when they have access to formal proofs, as per the example of Rust's optimisations. Like an extension of the confidence to refactor we'd get from good unit+integration tests.
Nothing directly about anyone (developers or not) using LLMs to write code, or Rust being better than C, or anything else that everyone gets upset about around here. It's an inference on the part of readers, and clueless management, that this means LLMs will replace everything "because formal proofs make it safe."
We ask Cook to comment on a well-known case of AI hallucination, a lawyer who cited cases invented by OpenAI's ChatGPT. Cook says it was not quite the kind of hallucination the automated reasoning tool could solve. "We could build a database of all known [legal case] results and formalize them," he says. "I'm not sure if that would be the best application."
If you're going to provide cases to cite in a legal argument I'd have thought that it would be an essential application. Or is he saying that the citation provider isn't the best application?
The legal profession is already filled with databases of existing legal cases with extensive search facilities.
The correct answer from AI should be "I don't have a fucking clue, how about you go and consult one of the existing, competent legal databases run by actual lawyers?"
It's like your overconfident bullshitting mate down the pub whose ability to make shit up was killed by the advent of google (this definitely wasn't me, of course not)
Which ironically would be the answer an actual AGI would give if it hadn't been trained on the subject. Possibly without abuse (although if I'd been supplying the training set...)
One example of this is if you google "Plasmid." There are 2 kinds of plasmid. They are totally unrelated to each other, yet Google usually spews up only one of them.
Pattern recognition <> understanding.
It seems to me the training process, at the very least, needs to treat each citation it finds as a single unique token, rather than just another miscellaneous collection of characters or words. Then it would at least only generate actual citations, rather than merely some text that resembles a citation. And it might even manage to put them - sometimes - in a correct context, but I don't think you could rely on it - the reasoning behind why authors cite a thing is not always clear - it can range from some-generic-background, all the way down to a-specific-result-on-page-something.
They are presumably not infallible, but many scientific journals now automatically check the citations in submitted papers, and raise a query if they cannot find an authoritative match.
Thus you might imagine a scheme where any "citation" detected and tokenised, could also be tagged as validated, or as unvalidated; and if used, reported as such.
Unlikely to be infallible, and only workable in specific cases, sure -- but still an improvement. But you probably wouldn't want an LLM to do the validating :-)
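Purely as a sketch of that tagging idea (the known-citation set, the regex and the example cases below are all invented; a real checker would query an authoritative legal or journal index):

```python
import re

# Hypothetical set of known-good citations; in practice this would be a
# lookup against an authoritative legal or journal database.
KNOWN_CITATIONS = {
    "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)",
    "Doe v. Roe, 410 U.S. 113 (1973)",
}

# Crude, illustrative pattern for something that merely *looks* like a case citation.
CITATION_PATTERN = re.compile(r"[A-Z][\w.]+ v\. [A-Z][\w.]+, [^)]+\)")

def tag_citations(text: str) -> list[tuple[str, str]]:
    """Tag each citation-shaped string as validated or unvalidated."""
    tags = []
    for match in CITATION_PATTERN.findall(text):
        status = "validated" if match in KNOWN_CITATIONS else "UNVALIDATED"
        tags.append((match, status))
    return tags

draft = ("As held in Smith v. Jones, 123 F.3d 456 (9th Cir. 1997) and "
         "Acme v. Nobody, 999 F.2d 1 (1st Cir. 1842), ...")
for citation, status in tag_citations(draft):
    print(f"{status}: {citation}")
```

Anything that merely looks like a citation but isn't in the index gets flagged rather than silently passed through, which is about the best you could hope for.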
Consider what you're asking in programming terms.
A program "recognises" some data as not just data but data to restructure itself and how it recognizes future data.
It could be argued that is a true example of intelligent behaviour.
Do you think that's happening with ANN today?
I don't, but I don't know everything so I'd happily take some citations of work showing it is happening.
Real citations, not ones hallucinated by an AI.
>There's opportunity for the translation from natural language to logic to get a little bit wrong
Good that it's only a "little bit". (He didn't try to justify that claim with formal logic).
Perhaps getting the translation right is important. Perhaps it can't be done in most of the situations that AI is being thrown at.
Natural language is generally horrible for expressing logic. What language? you might say.
I suspect that people using this "technology" in future will need to be fluent in at least one of the _very few_ languages with enough internet data to snaffle^H^H^H^H^H^H^Htrain on . . . the others (languages, not people!) will simply wither & die.
This is why we have formal language specifications for doing coding. I translate what I intend to mean in English that has multiple potential interpretations into something formally complete that means precisely one thing such that the compiler/interpreter will always do the same thing when presented with it.
Thank you.
The day people who actually buy this s**t realise this will be the day this s**t starts going away*
*The Aholes who sell it then start selling something else. Hopefully with less life changing consequences.
LLMs are not "coded" - there is no source code you can analyse to prove that it is mathematically correct. LLMs are statistical black boxes: we don't know how or why they generate their output, other than the output will be plausible compared to the material used to train the LLM statistical model.
LLMs are bullshit-generators, nothing more.
This Automated Reasoning thing is something completely separate, some programmed logical way to decide whether the bullshit is "true" or not (at a given point in time, as "truth" is a function of time).
But if we have Automated Reasoning, we don't need LLM bullshit-generating "Artificial Intelligence" in the first place!
You can't fix LLM hallucination, because the very concept of using a Large Language Model for anything other than modelling language is fundamentally flawed.
Sure, if you ask an LLM "What is the capital of Japan?", it can do its statistical analysis of its training set and find that "Tokyo" is the word that most often comes up; but you don't need AI to answer that question. For any question that requires any actual ability to answer rather than just memorised facts, an LLM has no chance.
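To labour the point with a toy sketch: the "training snippets" below are invented, and a real LLM works over learned token probabilities rather than literal substring counting, but the principle is the same - frequency gets you the memorised answer and nothing more.

```python
from collections import Counter

# Toy stand-in for "statistical analysis of the training set": count which word
# most often follows the phrase in a (made-up) pile of scraped text.
training_snippets = [
    "the capital of Japan is Tokyo",
    "Tokyo is the capital of Japan",
    "the capital of Japan is Tokyo, a huge city",
    "some forum post claims the capital of Japan is Kyoto",
]

def most_frequent_answer(snippets, phrase="capital of Japan is "):
    """Return whichever word most often follows the phrase, or admit defeat."""
    counts = Counter()
    for text in snippets:
        if phrase in text:
            counts[text.split(phrase, 1)[1].split()[0].rstrip(",")] += 1
    return counts.most_common(1)[0][0] if counts else "no idea"

print(most_frequent_answer(training_snippets))  # "Tokyo" -- memorised, not reasoned
```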
Right on! That formally verified code is what we should want advanced programming tools to produce for us (rather than buggy boogers of cut-pasta mumbo-jumbo!). And RustBelt was looking at that straight through the cyclop eye of Coq (variously renamed cat in J. Alglave's work on hardware concurrency, or Rocq by the INRIA of mini-C's underlying F* KaRaMeL) ... so there may be hope for Rusty code yet!
And in the same throbbing vein of strict functionally verifiable discipline, one'd be remiss not to recall Google's Lean Deepmind effort on AlphaProof-ing the performances of otherwise girthy models of language ... it's inspiring to see math and compute get together in such festivity!
Either way, yes, auto-whip me out some serious verified code please, not childish snot!
Hallucination is the *goal* of generative AI, not a bug. Current gen generative AI neural nets are designed to extrapolate and interpolate answers and images. With randomness. This is by design. It is the WHOLE POINT. Human vision is a kind of hallucination. It feels accurate, but we don't see the world, we experience a hallucination based on reality. And it is never perfect. AI is similar but...er, without the continuous error correction and double checking of a human mind behind it.
I mean theoretically, yes, you can use AI as a fact search engine, but you need to redesign the entire backend of the neural net to fetch validated data. Duh?
Yes, I'm afraid "Duh."
(And by "duh" I mean you don't need several years and multiple conventions and symposiums on "what about AI Hallucination" to understand this, just a weekend with Python and some open source neural net libraries on a midrange PC.)
You cannot use generative AI to generate facts. You can use it to /choose/ facts, assuming you have facts already available, but whether it will choose the fact that actually answers your question is not guaranteed.
Many careers have been built and successfully reached exit strategy based on intentionally abusing people's misunderstanding of this. :)
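A minimal sketch of that "choose from facts you already have" idea, with everything below (the fact store, the stopword list, the scoring) invented for illustration: the system may only return a validated fact or admit it doesn't know, never invent one.

```python
# The fact store, stopword list and scoring are all invented for illustration;
# a real system would use a proper curated database and proper retrieval.
VALIDATED_FACTS = [
    "Tokyo is the capital of Japan.",
    "Canberra is the capital of Australia.",
    "Ottawa is the capital of Canada.",
]
STOPWORDS = {"what", "who", "is", "the", "a", "of", "in"}

def keywords(text: str) -> set[str]:
    """Lowercase, strip punctuation, drop filler words."""
    return {w.strip("?.,").lower() for w in text.split()} - STOPWORDS

def answer(question: str) -> str:
    q = keywords(question)
    # Score each validated fact by keyword overlap; crude stand-in for real search.
    scored = sorted(((len(q & keywords(f)), f) for f in VALIDATED_FACTS), reverse=True)
    best_score, best_fact = scored[0]
    # If nothing validated matches, say so instead of making something up.
    return best_fact if best_score > 0 else "I don't know; consult a proper database."

print(answer("What is the capital of Japan?"))            # returns the Tokyo fact
print(answer("Who won the snooker world title in 1953?"))  # admits ignorance
```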
Ding ding Ding ...
"intentionally abusing people's misunderstanding"
And there you have the driver behind the 'AI scam' and all the other scams yet to come !!!
Money is to be made from 'misinformation' ... if you can create an indirect connection between 'you' and the 'misinformation machine' you can claim a total lack of intent !!!
[It wasn't me, it is a small side-effect ... let's call it, say, a 'Hallucination' for now ... not a problem, we are working on minimising the issue, please ignore it !!!]
:)
Leaving aside the "no shit Sherlock": as I recall from formal logic, truth was satisfiability in all models, which probably excludes any intuitive natural language notions of truth.
Even in the mathematical realm there are statements whose truth cannot be decided.
Formal verification and automated theorem proving are real, incredibly hard, and light years away from the LLM nonsense to which this chap seems to be prostituting them.
Just to formally model (3D) solid geometry in a simplified block world without adding a simplified physics is a non trivial undertaking.
I can see that adding successive layers of formal reasoning to fix LLMs' failings might be like stone soup with the stone (LLM) eventually being discarded along with the deceit that the resulting system possesses intelligence in any meaningful sense.
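For anyone who hasn't met it: machine-checked proof really does exist and really is that strict. A trivial Lean example (nothing to do with the AWS product) only compiles because the statement actually holds:

```lean
-- Lean refuses to accept this file unless the proof is actually valid;
-- that is the entire point of formal verification, and also why it gets
-- very hard very quickly for anything less trivial than this.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Scaling that discipline up to anything the size of a real system, let alone bolting it onto an LLM, is where the "incredibly hard" part comes in.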
Everybody loves to laugh at philosophers until one day they have to figure out what truth is (or what numbers are, or what consciousness is, or...) and suddenly they realise there has been an entire discipline dedicated to trying to think through these problems for thousands of years. So they go looking for technical shortcuts, because surely they're smarter than all those people, but you can't solve problems of philosophy with a computer program, and eventually they either give up (and claim they have solved problems that they haven't) or they start engaging with the problems philosophically (and everyone else in their original field laughs at them because they're a philosopher now).
Another example of how focussing all our education on STEM and defunding the humanities ultimately results in well-educated STEM specialists running face-first into the French windows of humanities subjects.
"Another example of how focussing all our education on STEM and defunding the humanities ultimately results in well-educated STEM specialists running face-first into the French windows of humanities subjects."
This is so so true & so well put !!! :)
I am a typical Techie of the 80s+++ BUT I do not have a perception filter that blocks out all other fields of knowledge.
I like technology BUT also like 'dead tree' tangible artifacts AKA Books/documents etc !!!
I appreciate Humanities as far as my meager knowledge/understanding takes me and do try to expand my horizons by learning something new when I can.
Humanities informs your understanding of the world and how 'WE' got where we are today.
I like Classical Music, as it is called by the man in the street, covering Gregorian chant to modern day, very wide but very interesting as well.
[I can appreciate how the growth of 'technology' of the day encouraged the development of new Ideas, this as much applies to the Humanities as it applies to Science/Maths etc ... somewhere in there is Music and its development :) ]
I cannot understand why anyone would deliberately limit their knowledge when so much is available via the 'InterWebs' etc. :)
It is so easy now compared to 50+ years ago when access to libraries was the only option if you were NOT availing yourself of Further education.
:)
When you look at the training data, especially if they incorporate posts from arseache, x/y/z or whatever it's called now, how can you expect AI to even come close to a sensible answer, let alone a factually correct one?
LLMs are just oversized probability databases: they break inputs up into tokens, then work out what we are asking (or rather take a wild stab at it), and the output is based on the probability of A following B, B following C, etc.
They are classed as "non-deterministic", i.e. you can ask the same question a number of times and, depending on the optimisations, you will get similar but different answers each time,
which is why every AI system's answer normally comes with a tag line of "check the answers for accuracy".
If you have to fact check every answer an AI system delivers, why waste time asking it in the first place?
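You can see that "similar but different each time" behaviour with nothing more than a weighted random draw. The candidate words, probabilities and temperature below are invented, but it's the same mechanism in miniature:

```python
import random

# Toy illustration of non-determinism: the final step is a weighted random draw
# over candidate continuations, and a "temperature" setting flattens or sharpens
# the weights. All numbers here are invented for illustration.
candidates = {"Paris": 0.55, "Lyon": 0.25, "Marseille": 0.15, "Toulouse": 0.05}

def sample(probs: dict[str, float], temperature: float = 1.0) -> str:
    """Draw one candidate; higher temperature makes unlikely picks more common."""
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights, k=1)[0]

# Ask the same "question" five times and get similar but different answers.
for _ in range(5):
    print(sample(candidates, temperature=1.2))
```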
You only need to read the sentence "Hallucination in a sense is a good thing, because it's the creativity" to be thoroughly depressed that a clearly intelligent individual has drunk the kool-aid.
It's really very simple, hallucinations are lies and hence are not "a good thing". Bodging another layer on top of a fundamentally flawed system will not resolve its inherent flaws.
If an AI spouts bollocks from a prompt, and some person copies said bollocks and puts it online and others reference the online bollocks and Google stores the bollocks and an "agent" scrapes the bollocks and feeds it back to the AI, then the bollocks over time becomes the reality...
I won't even go into the unintentional bias resulting from "mathematicians" being logical where associations get grabbed from a sub object and applied to the core object.
Example... me being an expert in quad bike customisation because I have a quad bike and most people customise their quad bikes. Mine is lucky if it gets an oil change.
This seems to be an example of Betteridge's law of headlines - Any headline that ends in a question mark can be answered by the word no.
"These are big claims. What lies behind them?"
"behind them?" could be replaced be a full stop.
"domain experts arguing about what the right answer should be"
The main issue isn't disagreements between experts; it's AI spurting out absolute horseshit.