AI brains take a step closer to understanding speech just like humans

Machine learning researchers are on a mission to make machines understand speech directly from audio input, like humans do. At the Neural Information Processing Systems conference this week, researchers from Massachusetts Institute of Technology (MIT) demonstrated a new way to train computers to recognise speech without …

  1. frank ly

    "... participants were paid three cents per recording"

    The end results were not worth the money.

    1. allthecoolshortnamesweretaken

      Re: "... participants were paid three cents per recording"

      Indeed. Makes you wonder how many recordings were made in an, uh, altered state of mind?

    2. Rich 11

      Re: "... participants were paid three cents per recording"

      The examples given are of instances where the matching algorithm failed rather than of text which correctly describes the image.

      1. Youngdog

        @Rich 11

        'There is a red building...in front of a green lawn...mowed recently'

        Hey guys - we're getting somewhere...

        'There's a fence in front of the house'

        DOH!

  2. Anonymous Coward

    "The goal of this work is to try to get the machine to learn language more like the way humans do," said Glass.

    Is that how people do it? I have aphantasia (my "mind's eye" is blind), and I had no idea visualisation was involved in most people's understanding of speech.

    I just convert straight to text.

    1. Frumious Bandersnatch

      Interesting. I was wondering about this the other day. I was reading about subvocalisation, which is where people mentally form words when reading. When learning foreign languages, this can be a necessary step, but if you get into the habit of hearing each single word in your head as you read it, it means that your reading speed is limited to how fast you can vocalise it (so reading speed = speaking speed, effectively).

      Anyway, that got me thinking about how people who are deaf from birth process written material. I suppose that's a variation on the "aphantasia" you mentioned, though I still wonder whether people who were born deaf can have mind's-eye style auditory hallucinations even without the signals needed to prime them. Is it possible that the brain uses other sense data (such as muscle memory of tongue position, mouth shape and so on, gained from speech practice) as a proxy for subvocalisation?

      1. Primus Secundus Tertius

        @Frumious

        A good comment about subvocalisation. I do it normally, but when I am proof-reading I try to avoid doing it, as it is then easier to spot the typos, glitches, etc.

        Edit: then I see Scott's comment below.

      2. AndrewV

        @Frumious Bandersnatch

        "Interesting. I was wondering about this the other day. I was reading about subvocalisation, which is where people mentally form words when reading. When learning foreign languages, this can be a necessary step, but if you get into the habit of hearing each single word in your head as you read it, it means that your reading speed is limited to how fast you can vocalise it (so reading speed = speaking speed, effectively)."

        Well that explains my reading speed. Thank you.

    2. Anonymous Coward

      As I put it, I can't visualize my way out of a wet paper bag. What's strange is that I don't recognize words by the individual letters but as a gestalt (a unique symbol all its own). That comes in handy in that maths of any type are processed as sentences with their own symbol sets. Oh, and pictographic languages are simpler. Still, knowing where I put something is a chained set of vectors. Thanks for the name of the condition. I just thought I was one really weird autistic.

      1. John H Woods Silver badge

        "I just thought I was one really weird autistic" --- Jack of Shadows

        I heard about it through this BBC article, which contains a short test. I am boringly average, of course.

        1. albaleo

          @John Woods

          Thanks for the link. I also got a boring score, slightly on the low side. But I found the questions difficult to answer. And as the test progressed, I started to wonder whether what I was imagining was really an 'image'.

          The article has a quote from one person:

          "When I think about my fiancee there is no image, but I am definitely thinking about her, I know today she has her hair up at the back, she's brunette. But I'm not describing an image I am looking at, I'm remembering features about her, that's the strangest thing and maybe that is a source of some regret."

          If he's not describing an image, what is it he's remembering? I'm now confused, but very fascinated.

          1. AndrewV

            @albaleo

            "If he's not describing an image, what is it he's remembering?"

            I remember a list of features, and the more important the person is to me the more features I can rattle off when asked.

            It came as a shock when at the age of 37 I realised everybody else visualises, and I was weird.

  3. Scott Broukell

    At last!

    Eye hove bin wailing far this fur mere then dirty ears.

  4. Rich 11

    Context is everything

    There are 7,000 languages, and I think less than 2 percent have ASR [automatic speech recognition] capability, and probably nothing is going to be done to address the others.

    Very probably, since about 6,000 of those are expected to have no native speakers left by the end of this century. However, the 140 languages where ASR work is being done probably cover the primary and secondary languages of 90% of humanity today.

    1. Primus Secundus Tertius

      Re: Context is everything

      Some spelling systems, such as Gaelic, could be mistaken for a practical joke against English people.

      1. allthecoolshortnamesweretaken
        1. Anonymous Coward

          Re: Context is everything

          But Llanfairpwllgwyngyllgogerychwyrndrobwyll-llantysiliogogogoch is Welsh, which has very regular spelling, unlike English.

          1. John Gamble

            Re: Context is everything

            Which shows that the alleged virtues of regular spelling have been highly overstated.

  5. Stevie

    Bah!

    Just feed in the lyric sheets from the Yes back-catalog and watch the fun.

  6. John Smith 19 Gold badge

    Good to know neural nets finally getting some love

    But this has got a long way to go.

  7. Allan George Dyer

    Cultural Differences

    "it may provide a new way to translate speech into other languages"

    Or, more likely, humorous and deadly anecdotes of mis-translation...

    Consider descriptions of a cow being slaughtered in Hindi and Texan.

    "My hovercraft is full of eels"

    1. allthecoolshortnamesweretaken

      Re: Cultural Differences

      "The Babel fish is small, yellow, leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier, but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish. [...]

      "Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."

      -- DNA, H2G2

      1. Ozzy

        Re: Cultural Differences

        What's the use of having a drinks machine that can understand your spoken order if it inevitably provides you with a cup of something which is almost, but not quite entirely, dissimilar to tea?

  8. martinusher Silver badge

    What's with the bug-eyed monster?

    We've had the Echo in the US for some time, enough time to get used to Alexa. Unlike the picture you don't communicate with Alexa by grabbing a handheld and yelling at it, you talk in a conversational tone, albeit in the imperative. She's pretty good at picking out requests even in a noisy environment. Her speech is a bit like the machine's from the movie "Her" -- you don't really think of it as a machine so you find yourself inserting random "please" and "thank you" phrases into the conversation. (You only notice the machine when you get her to read from a book -- the tone's a bit flat, as if she was "on the spectrum".)

    We've now got to the point where most sci-fi movies, even relatively recent ones, look horribly dated. Alexa's more than an interface; she learns and can be uncanny in the way she selects music and the like. She's still a machine, though, with a relatively simple backend so I can't wait to see what gets served up in even the near future as things develop.

    1. Anonymous Coward

      Re: What's with the bug-eyed monster?

      I can't wait to see what gets served up in even the near future as things develop.

      What response do you currently get to "Alexa, talk dirty to me"?

  9. Anonymous Coward

    Show it a picture of

    Kate Middleton tickling her cat.

    Output:

    "Kate Middleton is tickling her pussy"

  10. Doctor_Wibble

    cheap or expensive image recognition?

    Is this actually cheap image recognition or are they paying over the odds?

    And it's incomplete, I see a fence and a pavement in the first picture but neither seems to be mentioned. And did 'sidle' go out of fashion or can I just not see the crab that is doing that 'sidewalk' thing that they do?

    The 'tar everywhere' got cleared up, right? That can get expensive when you have cars sat in it, at least none looked like it got on their paintwork.

    +/- that 'divided by a common language' quote.

  11. Anonymous Coward

    AI it ain't...

    Stop calling it AI, it's just pattern recognition. There is a vast difference between recognising a word and understanding what that word means.

    It is still clever stuff, but the only intelligence is in the researchers who developed this.

  12. Howard Hanek

    Opens Up New Opportunities

    I can now be addressed as the Speaker of the House I suppose and add that accomplishment to my resume.

  13. HKmk23

    A serious comment for once

    I am very interested in speech recognition, but I do not think we are anywhere near there yet. I'm not sure, but I think processing power and memory are the key, and until an amount equal to the human brain's is available cheaply I do not think it will work.

  14. Garfunkle

    Strongly linked to full general AI

    I think this goal is strongly linked to full general AI...

    It's because of the context sensitivity in our understanding of human language. One word can mean one thing in one context, and another thing in another context. Words vary in meaning, with everything from slight variation to extreme variation, depending upon the context which they're uttered in.

    Because of this, in order for a computer to understand language fully, like a human being, the computer will also have to understand the world fully (or, at least as fully as a normally intelligent human being would understand it today).

    It means that this goal cannot possibly be reached fully until we've also fully simulated the whole general intelligence of human beings, in all its breadth and in all its depth...

    1. Craig 2

      Re: Strongly linked to full general AI

      "fully simulated the whole general intelligence of human beings"

      A computer with that comprehension would likely tell any human it came into contact with to go fuck themselves...

  15. Howard Hanek

    My Fair Lady

    Let's exhume Prof. Henry Higgins and find out once and for all why the English can't learn to speak. Apologies to George Bernard Shaw.
