I preferred his work on Blackadder...
Brit neural net pioneer just revolutionised speech recognition all over again
One of the pioneers of making what's called "machine learning" work in the real world is on the comeback trail. At Cambridge University's Computer Science department in the 1990s, Dr Tony Robinson taught a generation of students who subsequently turned the Fens into the world speech recognition centre. (Microsoft, Amazon and …
COMMENTS
-
Monday 17th July 2017 08:28 GMT Mage
Good
Basically Amazon, Apple, Google, Microsoft are using 10 to 20 year old technology (envisaged 30 years ago) made accessible via gadgets.
It's great to read about some progress.
There is more work to do in terms of parsing of phrases and context so as to not just have fairly dumb (but speaker independent) Speech to text on an existing search engine. However that starts to move into the edge of real AI.
Compare using Google translate (essentially Rosetta stone + brute force), to your OWN language when it's a subject you are familiar with compared to trying to explain something to another language user (especially NOT English) who doesn't understand the subject. You'll realise that current speech based real time translation (needs voice recognition and then text translate) is mostly hype.
There is decent speech synthesis, but that of the 2010 Kindle DXG is barely better than 1980s efforts, and decent speech synthesis, like recognition, ultimately needs phrase / sentence parsing, though for a different reason (intonation and timing, which are not in written dialogue or narration, and lead pipe vs lead on a dog, or polish wax vs Polish person etc). Spoken languages are not identical to written, certainly in English, where even written dialogue is nothing like real speech.
Icon of someone trying to understand.
-
Monday 17th July 2017 08:44 GMT David Roberts
Sounds similar to the way we work.
Context and hidden cues are important.
One fine example is Peter Kay who does part of his act telling you an alternative lyric to a song then lip syncing to the track. You hear the alternative lyric because your mind and eyes have been given misleading additional information.
Which leads me to conclude that a fine test for this kind of software would be the accurate transcription of pop song lyrics.
-
Monday 17th July 2017 09:22 GMT John Smith 19
All human languages are understandable by neural networks, as that's what humans use.
Logical when you think about it, but something a lot of researchers have forgotten.
I think the clever parts are a) leveraging phonemes (i.e. the building blocks of all words in all languages) in deep learning and b) making it run on a phone's processor (I suspect a lot of chewing on existing language samples).
Now the bad news.
1) Connected speaker independent voice recognition in near real time is very interesting to groups who want to spy on large numbers of people simultaneously. As a UK national, Dr Robinson can be "deputized".
2) Experience with spinvox is concerning. It sounded too good to be true. It was.
-
-
Monday 17th July 2017 16:15 GMT John Smith 19
It's terminology inspired by biology.
Well it's also architecture inspired by biology.
It's true single-layer NNs were very limited, allowing Marvin Minsky to write a book killing funding for them for decades, but the fact remains you're reading this entirely through multiple layers of cellular processing.
It's the only architecture we know can produce behaviour that humans class as "intelligent" over the full spectrum of what humans think of as "intelligent."
-
-
-
Monday 17th July 2017 12:26 GMT Craig 2
"Icelandic has only about 400,000 speakers, and they're worried that their language will die out. "
I sympathize with small nations losing their lingual identity (Except Wales*) but wouldn't it be great if everyone in the world spoke the same language? A naive thought I know, it seems difficult in my house just to get teenagers to speak the same language..
*Just printing all those dual-language road signs must cost, and are they REALLY needed?
-
Monday 17th July 2017 15:40 GMT Hollerithevo
It's the complexity we need
We've lost hundreds of thousands of languages and can't therefore measure the ways of thinking they represent. I remember learning that Gaelic didn't have words for the same colours as English, they had ones for blue-greens and grey-blues that we don't have. Even English has lost many precise words -- a good half-dozen simply for parts of a wheel. As we simplify and homogenise we end up with 'stuff' and 'thing' and other generalisations that lessen exactitude and the joy of language.
So I don't want one world language, just as I don't want one form of music and one kind of car.
-
Monday 17th July 2017 16:32 GMT Peter X
Re: It's the complexity we need
<quote>I remember learning that Gaelic didn't have words for the same colours as English, they had ones for blue-greens and grey-blues that we don't have.</quote>
I seem to recall seeing a programme on TV in the last... year or three... about somewhere foreign* (even more exotic than Scotland), where they also had names for colours that we* would consider mere shades. To their way of thinking, those colours were utterly distinct. The opposite was also true, so (I can't remember the colours in question) there was this funny thing where they'd ask them to spot the difference between one colour and another, and they honestly struggled.
So it's interesting how language affects how individuals perceive the world. It's also probably a reason why I *should* learn at least one other language... I won't though! ;-)
* For context, I'm from England, don't speak anything but English, and anywhere outside the British Isles *is* both foreign, and probably exotic to my mind! :D
-
Monday 17th July 2017 20:51 GMT John Brown (no body)
Re: It's the complexity we need
"I seem to recall seeing a programme on TV in the last... year or three... about somewhere foreign* (even more exotic than Scotland), where they also had names for colours that we* would consider mere shades. To their way of thinking, those colours were utterly distinct."
I have that same vague memory. Aussie Aboriginals, or maybe a tribe somewhere in Africa. Many words for different colours of blue but none for green, or something like that.
-
-
Monday 17th July 2017 19:36 GMT Anonymous Coward
Re: It's the complexity we need
It's quite scary just how quickly words are being lost due to homogenisation. There was an article on the BBC News website a few weeks ago and even something as simple as 'splinter' had 10 different words regionally throughout the UK in the 1950s and now we're down to just two with 'splinter' in most of the UK and 'spelk' hanging on in the North East of England.
http://www.bbc.co.uk/news/uk-england-cambridgeshire-36388364
To paraphrase: All those words will be lost in time, like tears in rain. It's been happening since language first evolved and will continue to happen, it's a natural thing but at least we have the technology and knowledge to archive as much as possible now.
-
-
-
Monday 17th July 2017 13:57 GMT Count Ludwig
Re: The new bio metric?
But that doesn't matter because... repeat after me...
A voice-print is a username, not a password.
A voice-print is a username, not a password.
A voice-print is a username, not a password.
...
and no-one would be silly enough to use it to authenticate anything, would they?
-
-
Monday 17th July 2017 14:51 GMT Anonymous Coward
Markov models
FTA:
"Hidden Markov models have this really weird assumption in them that all of that history didn't matter."
Umm, Markov models work based on the probability of B (or C or D) following A. Using the history of the particular data in question to work out a probability tree is exactly how they function. Or am I missing something here?
-
Monday 17th July 2017 16:47 GMT The Stormcrow
Re: Markov models
Hidden Markov models assume the system being modeled is a Markov process - i.e., it is memoryless: the next state depends only on the current state.
They use collected data to figure out how likely you are to go from A to B, C, or D but ignore the path you took to reach A. The probability of transitioning from A to B is the same regardless of whether you transitioned to A from C or from D. That is the "history" being referenced.
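To make that concrete, here is a minimal sketch of the counting involved. It is a plain, fully visible first-order Markov chain rather than a full hidden Markov model, and the function name `fit_first_order` and the toy sequences are made up for illustration. In the toy data, A is always followed by B when reached from C, and by C when reached from D, yet the fitted model only sees "from A" and splits the probability evenly.

```python
from collections import Counter, defaultdict


def fit_first_order(sequences):
    """Estimate P(next | current) by counting adjacent pairs of states."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each state with the one that immediately follows it.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalise the counts for each state into probabilities.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical toy data: each string is a path, each letter a state.
sequences = ["CAB", "CAB", "DAC", "DAC"]
model = fit_first_order(sequences)

# The path taken to reach A (via C or via D) is not part of the model.
print(model["A"])
```

The information that would separate the two cases (whether A was reached from C or from D) is thrown away at fitting time, which is the "history" the article's quote refers to.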
-
Tuesday 18th July 2017 08:33 GMT Anonymous Coward
Re: Markov models
"They use collected data to figure out how likely you are to go from A to B, C, or D but ignore the path you took to reach A"
There's nothing preventing someone writing a Markov model algorithm that takes account of more than one hop: e.g. the probability of going from A to D via B or C. Sure, the tree would start to expand exponentially but it is possible and you wouldn't need to store that many hops to approach decent prediction.
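A rough sketch of the multi-hop idea, assuming it means conditioning the next state on the last n states instead of only the current one. The names `fit_order_n` and `max_contexts` and the toy sequences are hypothetical, chosen only for this illustration.

```python
from collections import Counter, defaultdict


def fit_order_n(sequences, n):
    """Estimate P(next | last n states) by counting (context, next) pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n):
            context = tuple(seq[i:i + n])   # the last n states visited
            counts[context][seq[i + n]] += 1
    # Normalise the counts for each context into probabilities.
    return {
        context: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for context, followers in counts.items()
    }


def max_contexts(num_states, n):
    """Upper bound on distinct contexts: grows exponentially with n."""
    return num_states ** n


# Hypothetical toy data: each string is a path, each letter a state.
sequences = ["CAB", "CAB", "DAC", "DAC"]
two_hop = fit_order_n(sequences, 2)

print(two_hop[("C", "A")])   # what follows A when it was reached from C
print(two_hop[("D", "A")])   # what follows A when it was reached from D
print(max_contexts(4, 3))    # contexts possible with 4 states and 3 hops
```

With two hops of context the toy data becomes fully predictable, but the number of possible contexts is the number of states raised to the power n, which is the exponential growth mentioned above.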
-
Tuesday 18th July 2017 12:49 GMT ivan_llaisdy
Re: Markov models
"There's nothing preventing someone writing a Markov model algorithm that takes account of more than one hop" People do do this, but such a model is no longer a Markov model. It wouldn't be a model of a Markov process. In a Markov process, future states of the process depend only on the current state.
Models that take account of more than one hop would be more general graphical models like Bayesian networks, or neural networks.
-
-
-
-
Monday 17th July 2017 20:24 GMT TheElder
Might work with some US Americans...
I very much doubt it can work with me (Kanuck). I have spoken Danish since I was born, as well as some German, Swedish, Norsk, French, a bit of Ruski, a small amount of Tagalog and some others. Not bad with tonal languages since I have true perfect pitch. When I go into the European languages I immediately go into a pan European dialect, depending on which language(s) I dream in. I can also drop my voice to Basso Profondo when I like to do some sing along.
Also, Danish has 52 phonemes such as the swallow your tongue Glottal Stops. I can also roll my "R's" much the same as Rammstein industrial metal, one of my favorite groups.
I wonder what the speech detector would make of a sentence like this:
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.
-
Tuesday 18th July 2017 10:58 GMT MT Field
Tremendous
But also disconcerting. It's taken years and much evolution of technology to make speech recognition with neural nets usable in a commercial sense.
I get the feeling that this is just the start of a very big thing. Perhaps equivalent to the first steam engines of the industrial revolution. But will it really take two centuries to become ubiquitous?