Look ma, no hands! The machines are speaking our language

From Hal, the malfunctioning computer in Stanley Kubrick’s ground-breaking film 2001: A Space Odyssey, through to last year’s Spike Jonze film Her, in which a man falls in love with a computer voiced by Scarlett Johansson, film makers and sci-fi writers have long been imagining a future where speech recognition is commonplace …

  1. Trevor_Pott Gold badge

    Ah, Dragon. Sometimes, you get an idea. Dictate into the recorder on your smartphone, then feed the wav file into Dragon.

    *poof* An article appears. Love it.

  2. Anonymous Coward

    Not even close...

    "1968 – 2001: A Space Odyssey released, introducing the world to the idea of talking to computers"

    First Hollywood blockbuster, perhaps. The idea of talking to computers goes back much further.

    Just off the top of my head: "The Last Question" (Asimov 1956 short story, and the one he was asked about the most), and "Second Foundation/And now you don't" (also Asimov, magazine (1949) and book (1953)). Admittedly the latter is a voice-to-print device, but it briefly discusses the issues with voice recognition.

    I'm sure there were others, and not just Asimov - Star Trek in 1966, perhaps?

    1. Jungleland

      Re: Not even close...

      WARNING WARNING Will Robinson!

      I'm sure the original Lost in Space series was earlier than 1968.

      1. Anonymous Coward

        Will Robinson do what?

        Have an upvote! Lost in Space was 1965.

        Better yet, the robot was inspired by Robby the Robot from Forbidden Planet, which was in 1956.

        1. Somerset John

          Re: Will Robinson do what?

          Don't know about inspired; both robots were designed by the same man, Robert Kinoshita. Robby appeared in a couple of LIS episodes, as well as a number of films after FP.

  3. Whitter

    Very positive slant

    Which is all well and good, 'cos the tech is impressive.

    But when it comes to helping dyslexics and so on: salesman says yes; answer: maybe yes, maybe no. A dyslexic has a tougher time writing, not an impossible one. And tougher needs more practice, which this would try to remove. Add to that the loss of ability to scan and pick up on the auto errors (trusting the tech to get it right more often than a dyslexic would) and it may not always be the thing to do.

    Not that it isn't a help; it just might need a little more thought in some circumstances.

    1. John Brown (no body) Silver badge

      Re: Very positive slant

      "Add to that the loss of ability to scan and pick up on the auto errors (trusting the tech to get it right more often than a dyslexic would) and it may not always be the thing to do."

      A dyslexic could always proofread it by feeding the text back through a text-to-speech engine.

  4. Anonymous Coward

    Keystroke loggers on steroids?

    The fear must exist that the traditional online data slurpers (MiApGle) get juicy amounts of info about you when using their speech recognition systems. I know that Android's system is tied to downloading the Google Search app. Not sure about this, mind you, but that's the style of these companies.

  5. Anonymous Coward

    Lernout & Hauspie

    I'm surprised at the little mention that Lernout & Hauspie got, given that their names are deep down in the foundations of the Nuance SW. And I mean that literally, as there is LH_ stuff everywhere. I know many people laughed at them but their biggest mistake was in their accounting, not in the tech.

    And not all that got acquired by Nuance was used, (s)crap code is donated to open source, mostly to keep them off (and send them down the wrong track).

  6. Primus Secundus Tertius

    Minutes of meetings

    I, so often the 'acting minutes secretary', would like to see a system that could listen to a meeting with one or more microphones, and five minutes after the meeting ends the system produces a coherent set of minutes.

    As an easier task, it could proofread documents.

    Some major newspapers seem to rely on their reporters dictating to robots and printing the words without sub-editing. The results are dire: homophones, bad sentences... I do not accept the premise of this article that voice recognition is looking rosy.

    I remember the BBC trying out, on BBC2, a voice recognition commentary at the wedding of Charles and Diana in 1981. Very disappointing. Since then there have been complaints by organisations for the deaf at the poor quality of TV subtitles, and the time delays they exhibit.

    1. Anonymous Coward

      Re: Minutes of meetings

      > 'acting minutes secretary'

      Mmm, serious AI required for this. An example of valid minuting may be simply "Vigorous discussion ensued, the outcome being an agreement to declare Pimlico independent." A transcript of an hour of various people talking, often over each other, does not minutes make.

      I have recently used Android speech recognition to fairly good effect when a group of elderly people in our community were supposed to bring in some reminiscences to a meeting. Three brought word processor files which we uploaded to an Xwiki. The fourth brought a handwritten note, which I simply read into my CyanogenMod Samsung. It had trouble with local Scottish place and people names but was generally good enough for a one-pass proofread.

    2. Michael Wojcik Silver badge

      Re: Minutes of meetings

      "I, so often the 'acting minutes secretary', would like to see a system that could listen to a meeting with one or more microphones, and five minutes after the meeting ends the system produces a coherent set of minutes."

      Conversational entailment (figuring out whom someone's responding to), plus summarizing with a really large and site-specific knowledge domain. Simples! We should have it working in another fifty years or so.

      Some researchers have been making good progress on the entailment front, at least. And general summarization is a largely-solved problem (for English; I don't think the necessary databanks are available for most other languages). Unfortunately, domain-specific summarization is a lot harder.

  7. Anonymous Coward

    voice recognition and speech recognition are different

    Article spoilt by sloppy confusion of

    "voice recognition" - identifying the speaker - with "speech recognition" - recognizing the words that are spoken.

  8. Teiwaz

    Is there a non-proprietary 'app' for this?

    Is Dragon the only viable option for Voice & Speech Recognition?

    Seems to me this is a key technology for the future. Forget your touchpads, mice and keyboards (although all have their uses, and 'best tool for the job' moments); speech recognition is more important, as it is our natural method of communication.

    Siri, Cortana, (whatever Google call theirs), are all great as another way of mining us for our information, but for personal control, a minimal workable package is needed for local processing of voice recognition and speech input.

    There hasn't been a lot on handwriting recognition recently, but judging from the handwriting of younger people, I guess they don't do a lot of writing at school anymore. (First photocopied handouts, then e-mailed text at schools in the last twenty years, I guess. Photocopies were expensive when I was at school; it was cheaper to get everybody to copy it down, which aids memory retention too.)

    1. Primus Secundus Tertius

      Re: Is there a non-proprietary 'app' for this?

      Photocopy costs.

      When I was at Uni, circa 1965, it was 6d per A4 sheet: 2.5p in modern coins. Today, ca. 12p per sheet in small numbers. That is price growth at ca. 3.25% per year, compounded.
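
      The compounding arithmetic above can be checked with a minimal Python sketch. It assumes "today" is 2014 (inferred from the article calling Her "last year's" film, not stated by the poster) and takes 1965 as the start year; the function name `compound_annual_growth` is made up for illustration.

```python
def compound_annual_growth(start_price, end_price, years):
    """Return the constant yearly rate that grows start_price into end_price."""
    return (end_price / start_price) ** (1.0 / years) - 1.0


# 6d in old money: 240 old pence to the pound, 100 new pence to the pound.
start_pence = 6 / 240 * 100   # 2.5p in decimal coinage
end_pence = 12.0              # the poster's "ca 12p per sheet"

# Assumption: the comparison spans 1965 to 2014 (the end year is inferred).
years = 2014 - 1965

rate = compound_annual_growth(start_pence, end_pence, years)
print(f"Implied annual price growth: {rate:.2%}")
```

      Under those assumptions the implied rate comes out at roughly 3.25% a year, in line with the figure quoted. Shifting either year by one or two moves the result only slightly.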

      1. John Brown (no body) Silver badge

        Re: Is there a non-proprietary 'app' for this?

        I think you may be disregarding inflation. My first job in 1982 paid £1.20 per hour and that was a "living wage" back then.

  9. Christian Berger

    There's a dark side to it

    One important thing you need for voice recognition is a large data set to do statistics on. The more data you have, the better your recognition will be. Therefore cloud-based services will tend to have better recognition rates, and the ones that record and archive every one of your utterances for eternity will have even better rates.

    1. Michael Wojcik Silver badge

      Re: There's a dark side to it

      Yes, that's why Google had their free voice-search 411 (telephone directory services) service for a few years. They admitted publicly that it was offered simply so they could harvest speech input and automatically confirm recognition - if the user used the results returned by the search, Google could assume they'd recognized the query successfully.

      Speech input lowers the cost of use (for users who don't find it annoying), which encourages use, which lets the provider harvest more data, which improves recognition, which lowers the cost of use (because greater accuracy means the search is more likely to be successful on the first try). It's a virtuous cycle, for very particular meanings of "virtuous".

  10. Terry Cloth

    Killer app

    Some years ago I tried Dragon Speak for conversational speech recognition for my increasingly-deaf father. It couldn't hack it (or I didn't know what I was doing).

    In a continually-aging society, a small handheld capable of displaying speech of one person while ignoring that of another would be a Godsend to many otherwise-isolated people.

    Extra credit for handling conversations of more than two speakers; the Nobel for handling background noise in a loud restaurant.


Biting the hand that feeds IT © 1998–2021