Having AIs practice ordering simulacra of humans around? What could go wrong?
At least they can't pause the world to materialise a wall in front of the toilet.
Chatbots like Google's LaMDA or OpenAI's ChatGPT are not sentient nor that intelligent. Nonetheless, boffins believe they can use these large language models to simulate human behavior inspired by one of the world's most popular early computer games and some AI code. The latest effort along these lines comes from six computer …
where a carefully crafted conversation could convince an agent of the existence of a past event that never occurred – and hallucination, among other things.
So just like real life.
Note: 'hallucination' is a domain-specific re-definition. 'delusion' would probably be closer, but perhaps the choice was deliberate to reduce cognate interference.
Valhalla!
Play in your browser here: https://zxart.ee/eng/software/game/adventure/rpg/valhalla/
From the examples in the paper, the most obvious characteristic of the simulated interactions is their shallowness. Neither the content nor any personality traits of the participants create an impression of real humans interacting. The "ability to remember past experiences, plan future actions based on their experiences, react appropriately to unexpected events, and reflect on their performance to improve their future actions" (§6.1) does not seem to have been entirely validated, unless a very superficial interpretation is allowed, particularly of "react appropriately to unexpected events, and reflect on their performance to improve their future actions". Furthermore, the interpretation of what constitutes "believable behavior" (§6.5) is highly subjective (the same problem that beset the Turing Test). Personally, I'd very soon abandon any attempt at conversation when faced with banal responses such as those in the sample interactions. The fundamental (and probably insuperable) problem, of course, is that this automaton has zero understanding of the meaning of the ideas it's trying to express.
The fundamental (and probably insuperable) problem, of course, is that this automaton has zero understanding of the meaning of the ideas it's trying to express.
Wouldn't that describe a large part of the (disenfranchised) human population too? It takes quite an effort not only to understand an idea, but also to act on the premise the idea represents, as well as the freedom to do so.
Wouldn't it be fair to say that many of the "problems" society is facing these days come from the varied mixture of ideas, meanings, responses and perceptions? Humans very often vary their behaviors based on many factors, which may include the environment, the presence of other humans or animals, the situation at hand, and a myriad of other scenarios.
These LLMs ("LLM" being a much preferable term to "AI") are interesting, but they, as well as the concept of AI, seem to lack the ability to consider the many things that layers of meaning create in their models. In all fairness, human beings seem to generally lack the ability to consider these things as well, unless they are directly presented with a situational need to consider them.
...directly presented with a situational need to consider them.
That is an important insight. Need is the master and driver of invention, and it filters out those who fail. AI has no "need" and has no "drive" to do anything. The experiment is basically an exercise in putting LLMs into an artificial virtual world and seeing what happens.
AI has no "need" and has no "drive" to do anything.
Couldn't you just program artificial need? We do it all the time with people. It's called governing through fear. People turn the natural inclination to prefer the 'familiar' into prejudice, and then use that to generate fear to steer society on a predetermined course. That fear is an artificial need.
A few questions Dave. :D
1. Who is "you" that will program artificial need?
2. How do you determine "need"?
3. What does "need" actually mean?
I could go on and on.
One last gem regarding "need". To quote a humorous movie: "Is it necessary to drink my own urine? No. But I do it anyway because it's sterile and I like the taste."
"Wouldn't that describe a large part of the (disenfranchised) human population too?"
Yes, but that very variation is one of the things that make human interactions recognisably human, and perusal of a decent dictionary shows the huge subtlety of it (just take a look at the range of meanings of the word "quite", some of which even depend on the tone of voice and inflection with which it's uttered). Understanding things differently directly implies understanding, which is what's missing from these automata. They aren't really semantic engines (i.e. engines processing meaning in the human sense); they're template collage engines that plug together (to them) meaningless fragments on the basis of statistical coincidence.
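For a crude sense of what "plugging together meaningless fragments on the basis of statistical coincidence" looks like in miniature, here's a toy sketch in Python: a first-order Markov chain over words. It is nothing remotely like a transformer, and the tiny corpus is invented for illustration, but it does produce superficially plausible word strings with zero grasp of meaning.

import random
from collections import defaultdict

# Invented toy corpus; the chain only learns which word tends to follow which.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat chased the dog around the mat").split()

follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def babble(start="the", length=12):
    # Walk the chain, picking each next word from those seen to follow the current one.
    word, out = start, [start]
    for _ in range(length - 1):
        word = random.choice(follows.get(word, corpus))
        out.append(word)
    return " ".join(out)

print(babble())  # e.g. "the cat chased the dog sat on the rug and the mat"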
Wouldn't that describe a large part of the (disenfranchised) human population too?
Yes, and this is the reason why I think the profession of "robopsychologist" will be in very high demand soon, treating both kinds of patients. If there is a real-life Dr. Susan Calvin, she must be just over 40 and at the peak of her career right now.
The ancient electric winch and maximum capacity eight persons jobs bear as much relation to a Sirius Cybernetics Corporation ‘Happy Vertical People Transporter’ as a packet of peanuts does to the entire West Wing of the Sirian State Mental Hospital.
This is because they operate on the unlikely principle of defocused temporal perception - a curious system which enables the elevator to be on the right floor to pick you up even before you knew you wanted it, thus eliminating all the tedious chatting, relaxing, and making friends that people were previously forced to do whilst waiting for elevators.
Not unnaturally, many lifts imbued with intelligence and precognition became terribly frustrated with the mindless business of going up or down, experimented briefly with the notion of going sideways - as a sort of existential protest - demanded participation in the decision making process, and, finally, took to sulking in basements.
At this point a man called Gardrilla Manceframe rediscovered and patented a device he had seen in a history book called a staircase. It has been calculated that his most recent tax bill paid for the social security of five thousand redundant Sirius Cybernetics Workers, the hospitalisation of a hundred Sirius Cybernetics Executives, and the psychiatric treatment of over seventeen-and-a-half-thousand neurotic lifts.
The fundamental (and probably insuperable) problem, of course, is that this automaton has zero understanding of the meaning of the ideas it's trying to express.
Wouldn't that describe a large part of the (disenfranchised) human population too?
It arguably includes the entire human population, since pinning down what "meaning" means is one of the thornier epistemological problems. It is entirely possible that what we think of as "meaning" is simply a complex of associated concepts, and eventually transformer-like models [1] might have a valid claim to be doing something equivalent.
So calling that a "fundamental" problem is rather an article of faith.
[1] I'd argue not the current crop of transformer LLMs, at least not until they're scaled up to the point where something like Boltzmann brains can spontaneously arise in parameter space. Unidirectional MLP or softmax transformer stacks (and linear transformer stacks even less so) simply aren't doing anything sufficiently complicated or with enough recurrence.
From the examples in the paper, the most obvious characteristic of the simulated interactions is their shallowness. [etc]
This probably means ChatGPT could write most soap operas. Most characters in them seem never to learn from past experience and exhibit random interests which last as long as a story arc and are then never mentioned again.
[Wanders off humming The Archers theme tune.]
I remember that as far back as '91, at University, a couple of us wrote a simple Eliza-type program that took generic input, and then programmed it up to write scripts for Neighbours (old Aussie soap opera starring Kylie, for those of you who are still whippersnappers). Later datasets produced a university-based soap, rap songs, and a new soap imaginatively based on students in a computer lab.
The fundamental (and probably insuperable) problem, of course, is that this automaton has zero understanding of the meaning of the ideas it's trying to express.
How do we know you have any understanding of meaning and are not a philosophical zombie?
ChatGPT and other 'AIs' of that ilk (LLMs) are not very good philosophical zombies, but there does not seem to be a fundamental reason why they can't be improved to fool many people a lot of the time. Which is interesting and worrying at the same time.
Two issues: one, that of societal norms, which are endlessly complicated, not to say widely variable depending not only on context but also on situation; and two, that of physical presence. "Choosing to enter the bathroom when there is another person inside" may be a societal norm in some cases, but it does not factor in the external possibility that the door might be locked in some situations and not in others. Isabella Rodriguez may well choose not to lock the bathroom door when Maria Lopez is in the house ("bathroom door lock is idle"), but when her parents are visiting, "bathroom door lock is in use".
The funniest part is that while the researchers admitted they didn't account for things like the single-occupancy bathroom (or door locks), they completely missed the fact that this is pretty normal behavior in many roommate situations, even if there's barely elbow room for two. Sometimes you both have 20 minutes to get ready for class/work/date/etc and there's just no way to take turns, so, you deal.
Admittedly it's mostly women who are more OK with this, since we're mostly less inclined to roll out of bed into some pants and step out, but hell, that was the pair involved. Accidental emergent behavior. Many games' most beloved bugs are along these lines!
Go to ChatGPT and ask it:
What song contains the lyrics "And the day gets crazy, and I feel like I can't win."
Tell us what it said.
I've been arguing with it all weekend as I'm trying to work out what a song near the end of Columbo Cries Wolf is. Not the end credits tune, that is noted in the credits. But another song playing in the background. Shazam, despite it being clear, couldn't find it. So I asked ChatGPT.
Not only was it confidently wrong, it then admitted it was wrong and was confidently wrong again. The best response to further questions on the subject came when I just added "from 1990" on the end. It said the song was "Pray" by MC Hammer. It then proceeded to tell me the lyrics appear in the second verse of the song, and wrote out that second verse, which not only didn't contain anything from my question but also isn't even a verse from the MC Hammer song.
If you really want to break ChatGPT, ask it what day of the week February 29th, 2025 falls on (or February 29th in any other non-leap year).
It's of course not a real date, but it will give you an answer:
> February 29th, 2025 falls on a Saturday. It is a leap year, and leap years occur every four years, with the exception of years divisible by 100 but not by 400. Since 2000 was a leap year, and 2024 is also a leap year, February 29th will occur on a Saturday in 2025.
If you then question it, "Are you sure?", you get:
> I apologize for the error in my previous response. Since the cutoff for my training data is in 2021, I made a mistake in calculating the day of the week for February 29th, 2025.
> To correct my previous answer, February 29th, 2025 falls on a Friday. It is a leap year, and leap years occur every four years, with the exception of years divisible by 100 but not by 400. Since 2000 was a leap year, and 2024 is also a leap year, February 29th will occur on a Friday in 2025.
You can continue indefinitely; every time you question it, it will change its answer.
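For reference, the actual rule is trivial to check mechanically; a minimal sketch in Python, using nothing beyond the standard library:

import calendar
from datetime import date

# A year is a leap year if divisible by 4, except centuries, unless divisible by 400.
print(calendar.isleap(2024))  # True  - 29 Feb 2024 exists (it was a Thursday)
print(calendar.isleap(2025))  # False - so 29 Feb 2025 is not a real date

try:
    date(2025, 2, 29)
except ValueError as e:
    print(e)  # "day is out of range for month"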
Confidently incorrect describes it well.
ChatGPT does not possess a single set of traits characterising one person, nor a single bias that would mould it that way. Individuality is the crucial thing for a game, though, because individuality is what can produce unexpected moves and deeds, both foolish and wise. Therefore, instead of using this giant ChatGPT model, one needs to construct a vastly smaller and unthinkably cheaper model for a particular person, containing all their knowledge and also their bias, with the bias deduced from their choice of phrases: say approximately 500,000 to 2 million phrases instead of 150 billion for ChatGPT. (I call such models "lexical clones", because each reflects the inner nature of its prototype.)
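For what it's worth, here is a purely hypothetical sketch (in Python) of the simplest possible reading of "bias deduced from the choice of phrases": a bag-of-phrases frequency profile, sampled in proportion to how often the person uses each phrase. It is a caricature for illustration only, not the actual proposal above.

import random
from collections import Counter

def phrase_profile(texts, n=2):
    # Count a person's n-word phrases (here, word bigrams) across their writing.
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

def sample_phrase(profile):
    # Pick a phrase, weighted towards the ones the person actually favours.
    phrases, weights = zip(*profile.items())
    return random.choices(phrases, weights=weights, k=1)[0]

corpus = ["to be honest I think that is wrong",
          "to be honest the whole idea is wrong"]
profile = phrase_profile(corpus)
print(sample_phrase(profile))  # favours "to be", "be honest", "is wrong", ...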
one needs to construct a vastly smaller and unthinkably cheaper model for a particular person, containing all their knowledge and also their bias, with the bias deduced from their choice of phrases: say approximately 500,000 to 2 million phrases instead of 150 billion for ChatGPT. (I call such models "lexical clones", because each reflects the inner nature of its prototype.)
There's a science fiction story (I don't have the time to look for it in my library) where someone produces an android clone of themselves which is programmed to learn the person's characteristics so that it behaves and converses in the same way. It is then set to monitor the person, and once the person's dementia has progressed too far, in the android's (learned/simulated) opinion, the android kills the person and then acts as their executor.
The trouble is, even if ChatGPT had access to all my written output, I don't think it, or any other language model, could convincingly simulate me. I'm more than my written output, and my experiences are not just lexical input. The project to build a 'life recorder' - an always-on video and audio recording - might give enough data, but even then it doesn't record your feelings about what you interact with, and even a description of your memories (a talking biography) will be woefully incomplete - think of Proust.
I think ChatGPT and other LLMs are a 'parlour trick' which, while amusing and possibly occasionally useful, doesn't solve the problem of 'Artificial Intelligence'. I suspect that things like the defunct Cog project and Doug Lenat's Cyc are needed (as well as much else) to successfully simulate a human. It's the difference between 'book-learning' and 'learning by experience', coupled with 'common sense' knowledge about the world and how to make human-identical inferences.
It's a fascinating area. If nothing else, in order to pass a 'Turing Test', an AI either has to be able to lie, or believe itself to be human. Otherwise, simply asking the question "Are you human?" would be a simple way of determining which interlocutor was which. Either prospect is worrying.
ChatGPT has got the lying down pat.
I think the inner life, as expressed by certain novelists, isn't real. And if it is, it's irrelevant, since I can't directly access it.
What matters is whether ChatGPT can predict you. To wit, if we fed it your corpus to date, would its response to this prompt be "the same" as yours (a) all the time, (b) some of the time...? The more "same" the response, and the more often it was the "same", the more likely we are to believe it was you. But it doesn't attempt to simulate the vat of hormones and chemicals which perturb your thinking, so it's unlikely to succeed.