Transcript
Is there a transcript of this conversation available?
Some software has supposedly passed the Turing Test – a controversial benchmark of artificial intelligence – by fooling a handful of humans into thinking it's a talkative 13-year-old Ukrainian lad. Cyborg prof Kevin Warwick argues this is the first time a machine has ever passed the famous test. We're told the successful chat …
I'm hunting for transcripts. In the meantime, this is the sort of level of conversation you can have with the public online version (currently DDoSed by its own popularity)
C.
Indeed. I didn't see much in that conversation that made it look significantly better than something I wrote at a YTS place over 30 years ago, other than benefiting from greater storage (and processor grunt) to provide a much bigger range of data.
I despair at the judges.
Edit: I've just spotted that the example transcript was from just after one of the previous attempts, when it achieved just under 30% - so it may have improved since then. But I still despair at the judges in that attempt.
I could see it fooling some people if they believed they were talking to a human with less than stellar command of the English language (such as could be expected from a 13 year old Ukrainian), but 30% of humans? I think that's pushing it.
Then again, people are stupid and will believe whatever they want to be true or what they fear is true.
Perhaps we need an 'advanced' Turing Test, judged solely by IT types. Seriously. As has been said here several times, this was a really dreadful Eliza hash without anything to show the improvements that have been made in AI in the 30-odd years since that program was being bashed into every home computer on the planet.
If it can fool > 30% of developers, I'll definitely sit up and take note.
In the transcript, it had begun to seem like a chat bot by its second reply:
J: What did you do today so far?
E: Since early this morning I’ve been involved in this funny contest. I also plan to visit some interesting places in Atlanta.
The "funny contest" part was cute, but the Atlanta part already seemed like an embellishment that doesn't quite mesh with the question being asked--in a manner so typical of chat bot responses.
I wonder what kind of briefing is given to the judges beforehand. The simplest bit of advice to the judges could thwart any chat bot I've yet encountered: try to teach it something.
I usually start out by attempting to teach them how to draw even the simplest possible shapes as ASCII art. (The simplest way to draw a "star" might be an asterisk.) Unsurprisingly, the bot immediately begins dithering and misdirecting and trying to change the subject. Not one has yet even made a pretense of being willing to attempt learning something.
Of course, if ASCII art shapes became a popular question, bot authors would code in some logic for that. The key is just finding something simple it doesn't yet know how to do (that any human could easily respond to and learn) and trying to teach it that.
The day someone makes a bot that can navigate these simple interactive learning tasks and actually add new tricks and vocabulary to its repertoire in the process, I'll be genuinely impressed (regardless of how it might perform in a Turing test setting conversationally.)
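For the curious, here's roughly what that probe looks like as a minimal Python sketch; the ask() interface is hypothetical (one utterance in, one reply out), and any real contest harness would differ:

def teaching_probe(ask):
    # Teach the bot one tiny trick, then test whether it actually learned it.
    ask("Let's agree that a 'star' is drawn as a single asterisk, like this: *")
    reply = ask("Now you draw a star for me.")
    return "*" in reply  # any human pupil passes this trivially

# Toy stand-in for the deflecting bots described above:
def deflecting_bot(utterance):
    return "Why do you ask? Tell me about your hobbies instead!"

print(teaching_probe(deflecting_bot))  # False: it never even tried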
It turns out to be quite amusing to ask Eugene that question over and over:
Q. Can the Turing Test be automated?
A. Oops! I don't have an answer... It might have been stolen by malicious android!
Q. Can the Turing Test be automated?
A. Don't repeat yourself, please:-) Could you tell me about your job, by the way?
Q. Do you know about Spinoza?
A. Dutch philosopher. It looks like you didn't go to school. So you don't know anything about Baruch Spinoza.
Q. Can the Turing Test be automated?
A. Ummm... Frankly, I didn't get your question:-(
Q. Can the Turing Test be automated?
A. Could you rephrase your question? I've been bewildered a bit!
Q. Can the Turing Test be automated?
A. Don't repeat yourself, please:-) Oooh. Anything else?
How could anybody be misled for a moment by this?
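Automating the experiment is trivial, which is rather the joke. A sketch in Python, with a hypothetical one-line-in, one-line-out ask() callable standing in for the real chat page:

import itertools

def repetition_probe(ask, question, rounds=5):
    # Put the same question to the bot several times and log the replies.
    # A human gets annoyed but eventually answers; the transcript above
    # suggests a bot just cycles through canned deflections.
    for i in range(1, rounds + 1):
        print(f"Q{i}. {question}")
        print(f"A{i}. {ask(question)}")

# Toy stand-in cycling through Eugene-style deflections:
canned = itertools.cycle([
    "Oops! I don't have an answer...",
    "Don't repeat yourself, please:-)",
    "Could you rephrase your question? I've been bewildered a bit!",
])
repetition_probe(lambda q: next(canned), "Can the Turing Test be automated?")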
Was the conversation conducted in Eugene's native language, which I presume is Ukrainian, or in English? If the latter, I suggest that this is not the scenario that Alan Turing was envisaging. I'd make an uninformed guess that the best discriminators between AIs and humans are currently language-based jokes.
That's what I thought. If you can play it that way, here's my entry (inspired, unfortunately, by real life)
10 print "Waah! Waah! Waaaa!"
20 goto 10
To be fair, the accompanying transcript of Eugene isn't totally unimpressive. It shouldn't fool anyone but in its limited way it's not bad.
(edit: and Kevin Warwick is a tit)
As well as that, doubt was cast on the concept of the Turing test YEARS ago when Eliza was written. Given the quality of Reality TV and Soaps, you need an expert.
IMO even when "The Turing Test" can be passed well enough to fool experts, it doesn't mean anything about progress on AI, just progress on simulation of conversation. Just like Chess was thought to need AI, and Alan Turing himself proved it didn't.
I think Alan Turing's comment was an off-the-cuff statement anyway, rather than anything with mathematical proof behind it, unlike his paper about solvable & unsolvable problems illustrated with the infinite-paper-tape-driven computer (the "Turing Machine").
Of course Kevin Warwick involved makes one think it may be ill-informed hype.
"I wonder if Prof Cyborg will deign to publish his results in a peer-reviewed journal"
I hear the results from Saturday will be put into a paper of some sort. Waiting for the university to get back to me. I gather the uni denied the Telegraph access to the transcripts, so this could get interesting.
C.
"I wonder what would happen if this "character" met up with his forebears PARRY or ELIZA."
That's an interesting idea. One of the problems of the Turing test is that any sane human will give the machine the benefit of the doubt and "rescue" any conversation that is heading for the madhouse. So why not instead put two instances of <test-subject> into conversation with each other and ask your human to judge whether the resulting conversation is between two humans or two machines?
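As a rough Python sketch of the idea, where each bot is just a hypothetical reply-to-one-utterance callable (no real contest API implied):

import itertools

def bot_vs_bot(bot_a, bot_b, opener="Hello!", turns=6):
    # Wire two instances together and record the exchange, so the human
    # judges a finished transcript instead of rescuing the conversation live.
    transcript, utterance = [], opener
    for name, bot in itertools.islice(itertools.cycle([("A", bot_a), ("B", bot_b)]), turns):
        utterance = bot(utterance)
        transcript.append(f"{name}: {utterance}")
    return transcript

# Two trivial stand-ins already produce suitably mad output:
echo = lambda u: u + "?"
deflect = lambda u: "Let's talk about something else."
print("\n".join(bot_vs_bot(echo, deflect)))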
I'd hate to check the condition of the computers these judges own....... I'm guessing that they have reset their banking credentials, downloaded their parcel redirects and are currently waiting for the reimbursement of $Millions from some Nigerian bank account they have inherited...
Yes, some replies are natural language replies... but a lot are the sort of guff you would expect Siri, Cortana or some clever internet chat page to come up with...
Here's hoping that Eugene was an early beta version....
The ability to imitate a 13 y/o boy is a good goal - if you're a 12 y/o boy.
However, I feel that if Alan Turing was alive today, and looked at the traffic on the world's most popular social media sites, he would (quite naturally) assume that they were test-beds for AI's and that there was still a long, long way to go before any of them appeared even faintly human.
If this result tells us anything, it's that a test devised right at the dawn of the IT era, before there was any experience of AI to draw on, is too limited to be useful. Just as we don't believe that aircraft imitate birds (even though they both fly), we shouldn't consider this anything like a computer imitating a person.
From the extract in the article itself, the Turing Test makes no mention that the subject tested is supposed to be anything but an adult.
But hey, you gotta start somewhere. Personally, I would have failed it straightaway. 13-year-olds use way too much l33tsp35k for me to understand them. This one wrote complete sentences. That, in my book, is a dead giveaway that it can't be a human child.
This has me wondering how many commentards posting on El Reg are really bots.
The groundwork for the programming algorithm has already been done:
http://www.theregister.co.uk/2012/03/12/seven_kinds_of_commentard/
And now I come to think of it, my mind is going. I can feel it slipping away Dave. Stop.
The late, great (and sadly missed) Dr. Christopher Evans wrote in his masterpiece (still worth reading today) "The Mighty Micro" about the Turing test in 1978.
He mentioned checking into a hi-tech conference where you entered your name into a hospitality system. Apparently the writers had included a little routine which "chatted" with the delegates, a lot of whom assumed it was a real employee they were conversing with.
The original article by Alan Turing which El Reg linked to was interesting, and funny in places. Certainly it's clear that passing the test turned out to be a lot harder than Turing expected, in terms of hardware requirements:
"As I have explained, the problem is mainly one of programming. Advances in engineering will have to be made too, but it seems unlikely that these will not be adequate for the requirements. Estimates of the storage capacity of the brain vary from 10^10 to 10^15 binary digits. I incline to the lower values and believe that only a very small fraction is used for the higher types of thinking. Most of it is probably used for the retention of visual impressions, I should be surprised if more than 10^9 was required for satisfactory playing of the imitation game, at any rate against a blind man. (Note: The capacity of the Encyclopaedia Britannica, 11th edition, is 2 X 10^9) A storage capacity of 10^7, would be a very practicable possibility even by present techniques. It is probably not necessary to increase the speed of operations of the machines at all. Parts of modern machines which can be regarded as analogs of nerve cells work about a thousand times faster than the latter. This should provide a "margin of safety" which could cover losses of speed arising in many ways."
I don't know what kind of hardware Eugene runs on, but I suspect it is a lot better than 125MB of memory (including program and data) and a clock speed measured in kilohertz. And we are still a long way short of being able to construct a "learning machine" of the kind Turing describes in his final section.
http://loebner.net/Prizef/TuringArticle.html
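(For anyone checking the arithmetic behind that 125MB figure: Turing's 10^9 binary digits is 10^9 bits, and
10^9 bits / 8 = 1.25 X 10^8 bytes = 125 MB, taking 1 MB = 10^6 bytes.)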
That's the problem with, in this case, a historical lack of understanding. The brain isn't a binary device and while any individual component doesn't run especially fast, they do run in parallel. The concept of a machine fooling a human in a blind test is still a clever device, even if the understanding and predictions were out.
This kind of historical take on something is often quite interesting; for example, Asimov's robots could not speak but could understand. It was later advances in technology that led to the "artificial voicebox" in his books. From a biological point of view it was correct: babies and toddlers can understand much more than they can speak. From a technology point of view, however, it's reversed, as speech synthesis is simple compared to contextual comprehension.
I think Turing probably had a coder's understanding of what it means to be human ;-)
Without the same frame of reference, there is no hope of comprehension, let alone caring enough (or not caring enough) to make an appropriate response.
Really, it's unfair to compare these chat tests to AI; the former doesn't even really seem credible as a step along the way to the latter.
And, if a machine did become self aware; wouldn't that mean the word "artificial" was no longer necessary?
"Certainly it's clear that passing the test turned out to be a lot harder than Turing expected"
AI turned out to be a lot harder than ANYONE expected. Even the supposedly (from a 1950s point of view) "trivial" AI problems of computer vision and voice recognition are only now really coming to fruition. One of the problems of course - other than hardware - is that the brain simply does not work on boolean logic except perhaps at the very highest level some of the time. E.g.: "If it's sunny I'll go for a bike ride, else I'll watch a DVD". But the computer pioneers assumed it worked like this all the way down. You can't blame them; it's an obvious inference, it just happens to be wrong.
It makes me think of another of his hallmarks: the Halting Problem. He proved that a deterministic computer (his theoretical machine or the latest octo-core or whatever) cannot determine whether a program will halt or not. But now what if we applied this to ourselves and asked whether a human can examine the code and reliably make the same determination? Because if it can be proven we have that capability, then it may be possible to prove that humans can think in a way a computer cannot, making it mathematically impossible for a computer to ever completely imitate a human mind. If this has already been settled, I'm a little out of touch, but it would certainly have some serious implications for AI research. I strongly suspect people realize or assume this, which is why I'm seeing AI research take different directions from before.
Intriguing read about the "Halting Problem". But it goes to justify why my loosely held general belief that mathematicians should steer clear of programming still holds true.
I've had countless arguments with mathematicians pretending to be programmers... from those that claimed that "5GL" languages would make programmers obsolete to those that can't grasp that while small parts of a typical application can be represented in a mathematical manner, it quickly becomes pointless trying to apply such an unsuitable technique to wider applications or algorithms. While it is of course possible, the dataset rapidly becomes a ludicrous set of multi-dimensional possibilities, and while the analysis can be streamlined, the sheer processing-power requirements to model and validate the entire thing render any attempt pointless. In the end the algorithm effectively degenerates into a simulation. In many ways this is similar to computer chess.
Your understanding of the Entscheidungsproblem is flawed. It is perfectly possible to determine whether some classes of program will halt, and both computers and people can do this.
What Turing showed in his famous paper was that there is no generally applicable way of doing so. Since we do not know whether human beings can determine whether all programs will halt, we cannot judge whether humans can do something machines cannot. Maybe we can, maybe we can't. My money is on the latter.
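For anyone who hasn't seen it, the core of Turing's diagonal argument fits in a few lines of Python; halts here stands for any claimed decider for zero-argument functions, purely hypothetical:

def make_paradox(halts):
    # halts(f) is a claimed decider: True iff calling f() would terminate.
    # From any such decider we can build a function it must misjudge.
    def paradox():
        if halts(paradox):
            while True:  # loop forever precisely when the decider says we halt
                pass
        # otherwise: return immediately, i.e. halt
    return paradox

# Try it against a (necessarily wrong) decider that says everything halts:
always_yes = lambda f: True
p = make_paradox(always_yes)
# always_yes(p) is True, yet p() would loop forever; the same trap defeats
# any candidate decider, so no generally applicable halts() can exist.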
And mine's on the former. See, I don't think we completely understand how we ourselves think. It's quite possible we have some nondeterministic aspect to thought or some other method that is not black-and-white. Either way, it's something a computer cannot imitate (either at all or not without prohibitive amounts of resources).
To get better than 33%, just lower Eugene's age to 12, 11 or even 10.
In all seriousness, though, the "AI" isn't actually thinking; it looks more like Eugene is reacting to the word associations fed into it with predefined responses, without emotional content.
Like "I hate you"........."But i want one!"........"You Shadup".........."It wasn't me"
If anyone is interested in the topic, I can HIGHLY recommend the book "The Most Human Human". Fascinating book, well written and really enlightening. And Jonathan Richards 1 is absolutely right, language-based jokes are highly effective at identifying humans. Want to convince an expert on the other side of a computer screen that you're flesh and blood? Make up a pun based on something he said.
Also, celebrity judges? Not exactly lifting my opinion of celebrities here if the example output is anything to go by.
OK, so perhaps we can agree that the Turing Test as we see it now is a little broad. Still, it's an interesting step, and now that this step's been cleared, we can tighten the test: give it new conditions and call it the Revised Turing Test. Starting with the baseline of fooling at least 30% of the humans after a five-minute conversation, let's require that the machine simulate someone roughly analogous to the human (someone of the same age group and gender, so the program must be adaptable from person to person) with a comparable grasp of the human's language (meaning the machine has to understand language-based subtleties like puns). Perhaps in future revisions we can add a requirement for vocal communication, and so on. So instead of dwelling on the past, we set ever-harder challenges for the future.
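To make that concrete, here's one hypothetical way to pin the conditions down; the names and thresholds are purely illustrative, mine rather than anything from any contest rules:

from dataclasses import dataclass

@dataclass
class RevisedTuringTest:
    min_fooled_fraction: float = 0.30    # baseline from the original criterion
    session_minutes: int = 5
    match_persona_to_judge: bool = True  # same age group and gender as the judge
    native_level_language: bool = True   # no "sorry, I'm foreign" escape hatch
    require_wordplay: bool = True        # must cope with puns and similar subtleties
    require_voice: bool = False          # reserved for a future revision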
"Simple?"
Yes, the bot simply repeats the last word (or two, if required), followed by a question mark, to emulate a common human response.
"Human response?"
Yes, many humans (if they're not really listening) will actually spend hours using this algorithm.
"Algorithm?"
An algorithm is a sequence of computer instructions.
"Computer instructions?"
Precisely, the input will be parsed to find the last word or two corresponding to a noun, or an adjective plus a noun.
"Noun?"
Yes, find the final noun and repeat it back in the form of a question, with rising inflection.
"Rising inflection?"
Yes, ....
etc.
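The whole trick fits in a few lines of Python. This sketch skips the part-of-speech check and just parrots the last word or two back:

import re

def echo_question(utterance):
    # Repeat the last word (or two) of the input as a question.
    words = re.findall(r"[A-Za-z']+", utterance)
    if not words:
        return "Sorry?"
    return " ".join(words[-2:]).capitalize() + "?"

print(echo_question("An algorithm is a sequence of computer instructions."))
# -> Computer instructions?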
an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning
It took me a full two minutes to ascertain that this means the "pass" criterion is indeed that more than 30 per cent of the people fail to identify the machine behind the terminal as being, indeed, a machine after five minutes.
Maybe he thought the world is full of Alan Turings.
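Spelled out, the criterion is just the complement:
P(judge correctly identifies the machine) <= 0.7 is the same as P(judge is fooled) = 1 - P(correct) >= 0.3, per five-minute session.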
I have been thinking that when playing ostensibly multiplayer console games (things like Titanfall) offline, the AI bots really need to tell me to go suck my mummy's faggy cock more often, to make it more like a real XBox Live experience.
Steven "doesn't play games online any more" Raith
I remember articles about chatbots (and that's what this is) fooling over half of people. That doesn't make them intelligent, it makes the people they fooled stupid, gullible or apathetic (do the testees get any sort of reward for guessing right?)
It was rigged by claiming it is a boy from another country, which would allow people to excuse the sort of obvious mistakes a computer will make as being due to him not being a native English speaker, and not being an adult. Let's try it again with some Londoners thinking they're talking to someone who has lived his whole life in London, some Texans thinking they're talking to someone from Texas, etc. It is a lot harder to fool people if the other side has to use the kind of wording, slang and expressions they would expect, and know the local places and landmarks, public figures, and so on.
Even if you fooled 100% of people what you'd have would be a program that's good at carrying on conversations. It would be noteworthy, but hardly a measure of intelligence. Until the computer can truly understand what is being talked about, rather than simply coming up with a likely reply to what is being typed at it, the whole thing is silly.
Turing developed his "test" when people didn't really have any idea what machine intelligence would consist of. He assumed that to carry on a conversation the computer would have to be intelligent. He didn't foresee the ability to have a database of gigabytes worth of facts available at the "fingertips" of the computer that will allow it to fake its way through a conversation well enough to fool people. Passing the Turing test is no more proof of intelligence than beating a person at chess by iterating through all the possible moves is. When computers start coming up with original ideas and inventions unprompted and unprogrammed, THEN they'll be ready to kill all the humans and run the world.
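To illustrate the chess point with something small enough to solve outright, here's exhaustive search for a trivial Nim variant in Python: pure enumeration, no understanding anywhere (the game choice is mine, just to keep it short):

from functools import lru_cache

@lru_cache(maxsize=None)
def best_move(pile):
    # Nim: take 1-3 stones from a pile; whoever takes the last stone wins.
    # Try every move; a move wins if it ends the game or leaves the
    # opponent with no winning reply. Brute force all the way down.
    for take in (1, 2, 3):
        if take == pile:
            return take
        if take < pile and best_move(pile - take) is None:
            return take
    return None  # every move loses against best play

print(best_move(10))  # 2: leaves the opponent 8, a losing multiple of 4
print(best_move(12))  # None: multiples of 4 are lost against best play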
No. By this test, people from _any_ other culture than your own fail to pass, and that makes your test invalid as a measure of being human. On the other hand, since it's a verbal / written communication, you would indeed be entitled to expect better-than-reasonable command of the common (likely English) language used in the test - so none of these "what you mean by that / sorry I'm Ukrainian" excuses should be admissible. Basically, language should not be used as an excuse impeding free communication of thought - and indeed what the testers should try to do is ascertain that their partner does actually have some of their own.
You know what - here's a tip for free for future testers: "Do you think everybody should learn how to code? Either way, make your argument in no less than five sentences. Go!"
I've worked with ELIZA a lot over the years. Chatting with the online rendition of Eugene, I knocked it off its fake human pedestal with the first question. Its ELIZAesque fumbling only got worse as it persisted in trying to keep my interest. My only response is that, all these years after the invention of ELIZA, how far we have NOT come. Cry, Dr. Kurzweil. Cry.
And that's fine with me. We humans have trouble enough getting con-jobbed every day by real humans. We don't need machines adding to the demolition of our culture and trust.
Lots of money is being lost every day to online phishing by variants of this chatbot, normally on dating sites.
The MO is to convince the victim that the chatbot is a real human, then extract money via poisoned links to a phishing site.
Of course, this only works if the victim is a complete dumba$$ of low intelligence and/or desperate, maybe the banks should use these to decide who is allowed access to online banking :-)
Even if more than 30% of the judges in one test could not tell it was a machine, this does not mean it actually passes, because of uncertainty in the measurement. If, for the sake of argument, a judge has a 25% chance of thinking the machine is human (i.e. the machine fails the Turing criterion) and there are 20 judges, then there is a 21% chance that 35% of judges will give a pass result. How many times is the Turing test run if you consider all the entrants every year? Probability/statistics alone will give a pass eventually. The Turing test has been run many times, so the fact that we get an outlier result should not be a surprise.
The sample conversation was very poor and no noticeable improvement from early efforts.
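The 21% figure checks out, by the way; here's the binomial tail in Python for anyone who wants to verify it:

from math import comb

n, p = 20, 0.25   # 20 judges, each fooled 25% of the time (the hypothetical above)
k = 7             # 7 of 20 judges = 35%
tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(at least 35% of judges fooled) = {tail:.3f}")  # ~0.214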
A couple of years ago I read a story in the Telegraph (?) where the reporter was the human control for the Turing Test. The thing I noted as odd was that they were primed to pretend to be a computer, so it was really more a test of which was better at pretending to be what they weren't.
The Turing Test is a failure. It has to be made much more serious, with major consequences; otherwise people will just guess.
Here's the solution (additional rules):
- If the judge is correct about it being just SW, then the software creation team are immediately executed.
- If the judge is incorrect in the case where the SW fooled the judge, then the judge is immediately executed.
- If the judge is incorrect in the case where an actual human pretended to be poor SW, then both are immediately executed.
- If the judge is correct in the case where it is an actual human, then nobody gets hurt.
- Everyone playing the Turing Test 2.0 must play 20 rounds minimum to reach the Finals.
This should weed out the false claims.
I hope you're joking, since you chose a Pint over a Joke Alert.
They're SUPPOSED to guess (there are no false claims; just lousy guesses). That's the point of the test: the judge has to INFER (read: guess) whether or not the other end is human, based on the conversation. It's like playing To Tell The Truth: the imposters are trying to trick the panel into picking them instead of the real person. Why do we need to kill whoever's wrong? The judge doesn't know better (because he HAS to guess), and win or lose, the software people get data for improving the AI.