Darwin In Action
I am beginning to think that this indecent haste to adopt “AI technologies” into everything is going to lead to an interesting mass extinction when it all falls apart ...
Several big businesses have published source code that incorporates a software package previously hallucinated by generative AI. Not only that, but someone, having spotted this recurring hallucination, turned that made-up dependency into a real one, which was subsequently downloaded and installed thousands of times by …
To be fair, the practice of downloading random software packages and trusting their contents was well-established before AI came along.
In fact, it could even be said they're both symptoms of the same problem - wanting someone else to do your work for you and not caring about the consequences. Fundamentally, that's what's going to lead to the mass-extinction event.
I have long held that lazy people make the best programmers, because they will go to great lengths to get computers to do boring tasks that other people might just sit and do repetitively. This is good.
However, this same trait also leads to exactly this kind of reliance on external code without any research into it. This is bad.
GJC
you forgot the obligatory XKCD reference
"Not that risky" for very large values of "not that", perhaps. There are tons of exploitable vulnerabilities that start with piping tainted output to a shell run by a normal-privilege user. The number grows considerably if the user is a developer.
curl | bash is one of the most idiotic idioms to have entered developer parlance in recent years. Sure, back in the '80s we had shar archives, but they were relatively rare and generally of known provenance and were easy to inspect. These days, piping random crap to the shell is remarkably stupid, even considering the rampant stupidity in IT.
Yeah, I'm not sure where the AI angle is here beyond someone using AI to produce some code.
From reading the article several times, it seems like someone (a real person) made some code (with the help of an AI) that deliberately included some dependencies that didn't exist and put it online for people to download, as part of a test. If someone else later fulfilled those dependencies and did so with some code that contained malware, that malware could be run.
This just feels like "downloaded code can run other code that could contain malware (but in this case didn't)"
Quite possibly, but then how would they check that part? At university, it can be very hard to prove beyond all reasonable doubt that students haven't used AI to generate text, and we find it is even harder with code. Homework for students is becoming more and more of a risk in this respect. We are moving more and more to assessment in the form of "old-fashioned" exams (including digital ones) in which they have no access to AI tools. I can only imagine how hard it would be for companies to check code from third parties for use of AI tools.
those foolish enough to depend on these "generative AI" tools are going to get the karma they so richly deserve
Unfortunately, it's their users and possibly customers who will get the karma they don't deserve (save for the arguably just punishment for using lousy and insecure software, but quality control may be non-trivial).
An extra note: quality and security are highly correlated.
At an enterprise I worked at in 2023, they announced a plan to replace developer-written code with AI-generated code, arguing that it improves security because developers can write insecure code. There really is the unexamined assumption among non-technical management that the computer is always right.
Of course if we challenged management about this, they thought we were making things up to preserve our jobs. There was no hard evidence at the time that AI could generate insecure code, but there was lots of evidence that human developers have written insecure code.
We recently did some tests to assess the use of offline generative AI in a particular application.
We gave specific prompts asking various models to summarise a particular patent.
Twice we got the "reasonable" response along the lines of "I can't do that as I don't know what that patent says" - remembering that this is "offline", so nothing can get back onto the internet, for privacy reasons.
The rest of the time (the vast majority of it) they literally made it up as they went along. We got summaries of multiple different patents which had NOTHING to do with the one we specified.
Artificial Intelligence? More like Artificial Imagination.
For the most part they churned out pure fiction.
It looked really credible and convincing - but still utter shite.
Forget ditching developers in favour of hiring lesser mortals. No need to hire anyone when the end user is perfectly capable as they know exactly what they want it to do... after all, Excel is perfectly OK for data processing and accounting for them
Think how many businesses still have a typing pool...
> Alibaba, which at the time of writing still includes a pip command to download the Python package huggingface-cli in its GraphTranslator installation instructions
And this package didn't exist, in any form, before the "naughty" version was created.
It is one thing to publish instructions that install the wrong package without any errors, but even the most incompetent of "testers" can surely spot when pip tells you there is no such package at all?
Not if you lump it into a requirements file which says to install a bunch of packages, and you just assume that if you run that file and the program works, you must be fine. I'm guessing it was in a list of other packages so it wasn't a completely ineffectual install step and that they didn't have any testers of any competency checking on it.
Bad guess. The README (which has now been updated) explicitly told users to run the command "pip install $WRONG_PACKAGE". FWIW it looks like the reason for installing the $WRONG_PACKAGE was just to facilitate downloading some data files, and this has now been replaced with a simple "git clone" command.
The paper suggests that Alibaba and others added this package because AI told them to, and the reason pip succeeded in downloading it is that the researcher had created the empty package after AI had repeatedly recommended a non-existent package of that name.
How many similarly AI-invented packages have already been created by miscreants, quietly waiting to undertake supply-chain attacks, is an exercise left to the reader.
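For anyone wondering how exposed their own projects are, the existence check is cheap. Below is a rough sketch (my own, not anything from the paper) that walks a requirements.txt and asks the public PyPI JSON API whether each package exists and when its first file appeared; a name that isn't on the index at all, or only turned up last week, deserves a human look before pip goes anywhere near it.

```python
# Rough sketch, not a hardened tool: for each name in requirements.txt, ask
# PyPI whether the package exists and when its first file was uploaded.
# Assumes the public PyPI JSON API (https://pypi.org/pypi/<name>/json);
# adjust for private indexes/mirrors.
import json
import re
import sys
import urllib.error
import urllib.request

def first_upload(name):
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no such package on PyPI
        raise
    uploads = [f["upload_time"]
               for files in data.get("releases", {}).values()
               for f in files]
    return min(uploads) if uploads else "registered but no files uploaded"

reqs = sys.argv[1] if len(sys.argv) > 1 else "requirements.txt"
for line in open(reqs):
    # crude parse: drop comments, version specifiers, extras and pip options
    name = re.split(r"[<>=!~;\[ ]", line.split("#")[0].strip())[0]
    if name and not name.startswith("-"):
        print(f"{name}: {first_upload(name) or 'NOT ON PYPI'}")
```

It obviously won't catch a squat that has been sitting on the index for months, but it would at least have flagged the package back when pip itself was still saying it didn't exist.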
It is apparently useless to rail against devs pushing external, unverified code to their production servers. This stupidity is now ingrained as a force of habit, and everyone smiles in beatitude at the practice.
So, what is it going to take for companies to put the brakes on this?
If pseudo-AI and miscreants work together to smash through that wall and make companies understand that you do not run unverified code on a production server, then I'm almost ready to welcome the mayhem that will ensue.
Yeah, no, that's not going to do it. Black Monday was bad (Wikipedia quotes an estimate of $1.71T for it; assuming that's not adjusted for inflation, around $4.67T today), but much of that was paper losses — change in unrealized gain. And it was widely distributed, which also helps ease the shock.
NotPetya's cost was "only" around $12.7B (constant dollars), a couple of orders of magnitude under Black Monday. But its impact on industry was arguably worse, particularly since it hit shipping especially hard and so disrupted supply chains. It also had long-running repercussions for the insurance sector.
It's hard to find decent estimates for Sunburst/Supernova (SolarWinds), but it looks like it may be in the neighborhood of $100B, about 1 oom greater than NotPetya and 1 oom less than Black Monday. And, again, this wasn't just paper losses; it was huge disruptions to business and considerable ongoing effects, many of them unknown, due to sustained presence of attackers in external systems, information exfiltration, etc.
The simple fact of the matter is that industry is only beginning to price IT security correctly (just look at the turmoil in IT-risk insurance), and even if and when we start doing that, it will take a long time to correct flawed installations of software and hardware, and the flawed practices which produce them.
Legacy systems are sticky and replacing them is expensive and risky (which of course is another kind of expense, but one that's difficult to budget). Legacy practices require determining standards, imposing them, and enforcing them, and as someone who's worked in software security for a few decades, believe me — that is very difficult. Developers will go to great lengths to be contrary. Project and product managers will nod their heads at your security presentations and then promote new-feature development instead.
Since it's called "hallucinating" in the academic literature, it's a technical term that should probably be included in a tech rag's article. It is also vastly more specific about _what_ kind of 'bug or error' we are dealing with - a large neural network deciding, at random and in full detail, that something that does not exist, exists (vs Tim from accounts adding that incorrect line item to the database). I agree it's a bit anthropomorphic, but eh, we don't make the rules, and inventing new terminology confuses an already very confusing topic.
> Can we please stop using the nonsense marketing term "hallucinate"
No, I don't think so. It's a correct use of the verb "hallucinate" as defined by every dictionary[1] that I've just looked at. And I don't think it's a marketing term, at all. It's a class of failure, not a feature.
[1] E.g. Shorter OED, p. 918: Hallucination 1646. ... 2. Path. and Psych. The apparent perception of an external object when no such object is present.
But the AI doesn't perceive anything. The resulting code has an unresolved external dependency (that terminology is already more specific than "hallucination" and is long established), which is then turned into an attack vector by providing some malware of the same name.
I don't see that introducing the term "hallucinate" does anything for clarity. On the contrary, it seems to blind people to just what a fucking stupid balls up this actually is.
iron,
The problem with the word "lie" is that it requires intent. A lie is a specific choice to say something that is both not true, and that you know is not true. If somebody mistakenly tells you something that isn't true - they've simply made an error.
So an LLM can't lie, because it doesn't have agency. It has no ability to decide, because it has no consciousness. Also, it doesn't know anything, because it's simply a bunch of statistical relationships designed to put words into plausible orders. The problem is the people that make the models, who do lie, because they know what they've made is just a statistical model, but they call it AI in order to get loads of people to give them money.
I can then understand the objection to the term "hallucinate", because that also implies some kind of consciousness. But I doubt there's any better term, because "made up" and "imagined" also imply agency.
Could we compromise on Machine-Generated Random Bollocks? Then the academics could call it an MGRB event.
Could we compromise on Machine-Generated Random Bollocks?
I'd prefer, and would excuse the all-caps, UTTERLY UNTRUTHFUL LYING BULLSHIT HALLUCINATORY AI GENERATED CRAP.
While "lie" may technically require intent I am pretty sure most people will take uttering falsehood and untruth without intent as lying and therefore a lie in their stride.
"While "lie" may technically require intent I am pretty sure most people will take uttering falsehood and untruth without intent as lying"
I don't think they do. I certainly don't. I class that as being wrong. I know lots of people who are frequently wrong but aren't trying to be dishonest, and the distinction is relevant to what I think of them. Of course, it can be difficult to know what the intent is, because I also know some types of people who say something they know is incorrect, and are thus lying, but are good at acting as if they're really deluded into thinking it's true. Those people are quite annoying.
"For only £10,000 ($50,000) our experts will install demon-defying crystals[1] in your coding tool-chain ..."
[1]Follow that link at your own risk to your own sanity!
For those who don't want to risk it, an "I'm a psychic medium on Medium. •BA in Psychology •Substitute teacher •Lover of pink" is trying to sell small lumps of rock to people who have larger lumps of inanimate matter between their ears. Fairly harmless, as these things go.
Hmmm, that should lead you to conclude that they're not being pulled out of thin air, but are happening for a reason. Like, the chatbots have got together and come up with this plan. That they intend to start populating these dependencies with real code once they're installed everywhere..... and then take over the world!
I am only half joking by the way - at some point the amount of pollution of computing ecosystems by dangerous or simply shoddy code generated by these things is going to require one hell of a cleanup job, with much hand wringing about how to stop all this happening again in the future.
And they're getting close to nuclear power plants... Coincidence?
One possible explanation is that nefarious SEO-optimised auto-generated webshites are now being used to poison AI models.
That, or GitHub really is ingesting private repos into CoPilot, and there could be an unlimited number of private repos full of malicious code designed to poison any AI model. These days it is possible to optimise that poisoning to cause model collapse.
The only way to win is not to play.
Hmmm, that should lead you to conclude that they're not being pulled out of thin air, but are happening for a reason.
An economist wrote a piece about this when he came across ChatGPT inventing non-existent Nobel-Prize-winning economics papers. It even used real academic authors' names - but paired two guys who'd never worked together.
The piece about it said that both authors had published a lot, but also that both had common first and last names - which also appeared in a very large number of papers - and so it was possible that the LLM had picked both names at random, or that it had picked both authors as being statistically likely to appear in the author field of academic papers. Also, the title of the invented paper was made up of common words.
Basically it was working as designed. It had ingested a huge database of names and authors of academic papers, and was now spitting them out in statistically likely orders - the same way it does with sentence construction.
Could it be a feedback loop... A hallucinates a 'solution', B uses A's solution, C sees A and B using the same solution...
A bit like Bermuda Triangle theories... A hallucinates a new theory (turtles), B cites A, C cites A and B... A can then cite B and C as proving their original imaginary theory... and soon it's turtles all the way down
Every time I've tried to use it for simple translation jobs (C# -> Java, for example) - it is, after all, a language model, so that should be its wheelhouse - it has hallucinated parts of the code, which has meant wasting time discovering that, trying to find out what the real function is called, and taking longer than if I had just done the work manually.
For more obscure stuff (I asked it a question using Delphi once, just a simple maths problem) it hallucinated an entire solution using a nonexistent package, and its output was entirely useless.
It's like stackoverflow but worse.
It's scary that places like github are pushing this crap.
'Armed with thousands of "how to" questions, he queried four AI models (GPT-3.5-Turbo, GPT-4, Gemini Pro aka Bard, and Command [Cohere]) regarding programming challenges in five different programming languages/runtimes (Python, Node.js, Go, .Net, and Ruby), each of which has its own packaging system.
'It turns out a portion of the names these chatbots pull out of thin air are persistent, some across different models.'
Surely alarm bells should be ringing.
Actually, surely alarms bells should already have been ringing at any org foolish enough to dabble (or worse) in this snake oil. Where was it getting the names of these fictional packages from? Why did it think it needed them? What was the consequence of their non-existence? If nothing, why were they required?
How did they not notice?
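They could have noticed with something as dumb as a pre-merge check on the docs themselves. Here is a sketch of the idea - the file names and the allowlist are made up for illustration, not anyone's real tooling - which scans a README for "pip install" commands and refuses to pass if any mentioned package isn't on a list someone has actually reviewed.

```python
# Hypothetical CI gate, sketched for illustration: scan a README for
# "pip install ..." lines and fail if any package mentioned isn't on a list
# of dependencies somebody has actually reviewed. File names and the
# allowlist are placeholders, not anyone's real tooling.
import re
import sys

ALLOWLIST_FILE = "approved-packages.txt"  # one reviewed package name per line
README = sys.argv[1] if len(sys.argv) > 1 else "README.md"

approved = {l.strip().lower() for l in open(ALLOWLIST_FILE) if l.strip()}
text = open(README, encoding="utf-8").read()

unknown = set()
for cmd in re.findall(r"pip3?\s+install\s+([^\n`]+)", text):
    for token in cmd.split():
        if token.startswith("-"):  # skip options such as -U or --upgrade
            continue
        name = re.split(r"[<>=!~\[]", token)[0].lower()
        if name and name not in approved:
            unknown.add(name)

if unknown:
    sys.exit("install instructions mention unreviewed packages: "
             + ", ".join(sorted(unknown)))
print("all packages in the install instructions are on the allowlist")
```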
'"Our findings revealed that several large companies either use or recommend this package in their repositories. For instance, instructions for installing this package can be found in the README of a repository dedicated to research conducted by Alibaba."'
I've been banging on for ages how "plausible but wrong" is the worst possible combination for software. Turns out that my imagination is insufficiently developed. Plausible but malevolent is worse.
Shall we all start taking responsibility for our own software projects now?
-A.
These AI chatbots are not hallucinating, they don't "know" anything, and calling it hallucinating grants too much credit. They're MUCH too random to have any sort of thoughts or hallucinations about anything.
Anywhere else, this would be called "making things up", "lying", "misleading", but somehow AI gets a free pass to instead be considered "hallucinating".
This has been covered at length in the first thread here. Your complaint is also not internally consistent. If using the term "hallucination" is giving the program too much credit, then surely so would "making things up" or "lying", as both require intent. "Misleading" fits a little better, but in typical usage "misleading" most often implies intent, and your entire first sentence was trying to make sure that the terms make it clear that the program is not thinking. So none of your three terms meets your own goal, and if we tried to find one that did, it would likely be the ungainly "emits information that is either factually incorrect, likely to lead to unwarranted results, or irrelevant". Maybe choosing a word, one that clearly indicates the degree to which the results are useless, is logical after all?
Exactly. People love to regurgitate incorrect terms, apparently to seem "cool" or "in the know." But they just reveal ignorance.
AI does not "hallucinate." English offers a wide variety of words, one of which is "fabricate." Use that if you can't figure out an equally good one next time.
The C-suite (including Red Hat's, as I read recently) will turn to McKinsey, which states that it is fine to "unleash developer productivity with generative AI", especially in four key areas:
- Expediting manual and repetitive work
- Jump-starting the first draft of new code
- Accelerating updates to existing code
- Increasing developers’ ability to tackle new challenges
Those who code should document really well where management pushed them into using AI without all the additional checks. It may become an issue in court.
It may become an issue in court.
Yep. We have seen over the last couple of months how the Post Office have tried to redirect the blame away from their own lies and malicious prosecutions towards blaming the foreign, Japanese Fujitsu!
Just imagine how the next similar scandal will involve "it isn't our fault at all that all these lives were lost/destroyed - it was AI wot did it by lying to us! How could we have possibly known we should have tested it?"
"McKinsey ... states that it is fine to 'unleash developer productivity with generative AI', especially in four key areas...."
One may conclude that using LLMs for debugging code is an ideal use case.
I am reminded of my experience using my college department's secretary to type my thesis (my Apple ][+ not up to the task). Each successive edit fixed some errors, made new errors, and redid errors previously fixed. (I would eventually borrow a friend's Mac.)
Software will become so convoluted with bugs, ghosts of past, present and future, that it will be better to throw in the towel and start over. (Whether the C-Suite decides to fire all the programmers and just use AI is the only question.)
Personally, I do not trust any AI that is "publicly available".
It is even less controllable than random plugins/libraries you download and use in your own products.
I ask myself these questions on AI:
Who feeds it?
With what data is it loaded/fed/trained/polluted?
Who owns it?
What are the owner's intentions?
AI does have value, provided that the above questions can be answered by you. If who = you, things look even better, as there are fewer actors to deal with.
If you aren't using one yet, you probably should be.
And if you are on the JavaScript bucking bronco, where there's a new re-invented lib or framework every other week, you may want to be locking version numbers in your package.json files.
Sure, automated checks (you have got those, right?) do a damn good job of alerting you to security threats, but they'll only catch what is already known about, and release updates happen in seconds.
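On the package.json point, here's a throwaway sketch of the kind of extra check you can bolt on (it only catches the obvious floating specifiers - ^, ~, >=, * and "latest" - and a committed lockfile installed with "npm ci" is still the real fix):

```python
# Quick-and-dirty illustration of the "lock your versions" point: walk a
# package.json and complain about any dependency using a floating range
# instead of an exact pin. Only catches the obvious cases; a committed
# lockfile installed with "npm ci" remains the real protection.
import json
import re
import sys

FLOATING = re.compile(r"^[\^~>*]|^latest$|^\s*$")

path = sys.argv[1] if len(sys.argv) > 1 else "package.json"
pkg = json.load(open(path, encoding="utf-8"))

loose = []
for section in ("dependencies", "devDependencies", "optionalDependencies"):
    for name, spec in pkg.get(section, {}).items():
        if FLOATING.search(spec):
            loose.append(f"{section}: {name} -> {spec!r}")

if loose:
    print("unpinned dependencies:")
    print("\n".join("  " + entry for entry in loose))
    sys.exit(1)
print("all dependency versions are pinned exactly")
```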
In the case of `huggingface-cli`, this strikes me as AI mining human-based content where someone once made a typo.
This is a specific type of hallucination where the AI can't tell the difference between right and wrong answers in its underlying data - it treats everything as truth.
Sooner or later, this weakness is going to kill someone.
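One cheap defence against both the typo and the hallucination flavours is to treat any dependency name that is merely close to something well known as suspect until a human has looked at it. A minimal sketch using the standard library's difflib; the known-good list here is a tiny invented sample (huggingface_hub being, as far as I know, the legitimate package that actually provides the Hugging Face command-line tool):

```python
# Sketch of a "does this name smell like a squat?" check: compare a candidate
# package name against packages you already know and trust, and flag near
# misses. The known-good list is a tiny invented sample; in practice you'd
# feed it your organisation's approved dependencies.
import difflib

KNOWN_GOOD = ["huggingface_hub", "requests", "numpy", "pandas", "flask"]

def near_misses(candidate, cutoff=0.75):
    """Return known-good names close to, but not the same as, the candidate."""
    normalised = candidate.lower().replace("-", "_")
    matches = difflib.get_close_matches(normalised, KNOWN_GOOD, n=3, cutoff=cutoff)
    return [m for m in matches if m != normalised]

for name in ["huggingface-cli", "reqeusts", "numpy"]:
    hits = near_misses(name)
    if hits:
        print(f"{name}: suspiciously close to {hits} - check before installing")
    else:
        print(f"{name}: no near miss in the known-good list")
```

Crude, but "close to a popular name yet not identical" is exactly the pattern both typosquats and persistent hallucinations tend to follow.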