Stealthy UK startup drops veil on next frontier of speech wizardry

If you've been amazed by Amazon's Alexa, Microsoft's Cortana and Google Assistant, you might think continuous speech recognition is done and dusted – and that there are no mountains left to climb. However, a young British company has developed a radical new approach with spectacular results, based on low-level signal processing …

  1. Pete 2 Silver badge

    Send it to the colonies

    > an educational tool designed to improve an English* speaker's pronunciation

    When this is on the Google Play store for 99¢ I'll tell some people I know about it.

    Some of them claim to have "English" as their first language!

  2. Jay Lenovo
    Thumb Up

    5 Stars

    Think of it as Guitar Hero for Speech (Singstar, Lips, etc)

    Soon you'll unlock Martin Luther King's "I Have a Dream" or maybe Winston Churchill's "Finest Hour"

  3. Steve Aubrey
    Coat

    Selectively

    "applying that power selectively every a few milliseconds, a humble phone can perform better than a company with a vast investment in server farms"

    Selectively, and for a few milliseconds duration, I can run faster than Usain Bolt. Selections being that he's asleep, and I'm at the top of a large-ish hill.

    Mine's the one with the track shoes still in the pockets.

  4. Anonymous Coward

    But which native pronunciation will they treat as the "correct" one? Groups of native English speakers in the UK sound quite different from each other - not to mention those from other English-speaking countries.

    1. Irongut

      I wonder what it would make of my Glaswegian accent or my Geordie mate's.

      1. Anonymous Coward

        Received Pronunciation (RP). It's the version of English commonly heard on World Service radio, as well as that commonly encountered by businessmen.

        RP doesn't mean talking like the Queen, or even hiding all traces of your regional accent.

        1. Anonymous Coward

          "It's the version of English commonly heard on World Service radio, [...]"

          In my insomniac listening to the World Service - the "English" of presenters or reporters is often unintelligible. Radio 4 seems to be more consistent in what could be considered an RP - while allowing regional accents. No doubt Radio 3 would be similar.

        2. Def Silver badge

          Received Pronunciation

          That's probably close to my English accent. My friends always joked I had a BBC accent. I'm not sure where it came from. It's not entirely local to my home town (Portsmouth area), I'm certainly not posh, and I doubt many of my former teachers would claim I'm educated - through no fault of theirs... Well, if they'd made the classes even slightly interesting or challenging, I might have turned up more.

          Rather bizarrely a lot of non-English people I meet think I'm Australian these days, which I really don't understand.

  5. LeahroyNake

    don't tell

    Alexa or Cortana that this can be done locally on the device... there are £millions/billions of revenue from abusing this tech!

    Please please please give me the non cloud version of speech to text.

    1. Anonymous Coward

      Re: don't tell

      If Amazon or Google had the speech to text working locally they'd just send the text instead of the speech to the cloud. They're going to collect your info and advertise to you no matter what, that's their whole purpose for existing.

    2. John Brown (no body) Silver badge

      Re: don't tell

      "Please please please give me the non cloud version of speech to text."

      Me too! I've often wondered why no one tried to do something like this sooner. My theory is that there are very few programmers and too many code monkeys, seasoned with tight deadlines and no budget. Too often, good and elegant code is forgone in favour of "doing what works" and getting it out the door.

  6. Alister

    Speaking of learning English, in Britain by convention one would lift the veil to reveal something, not drop the veil.

    The phrase comes from the traditional marriage ceremony where the bride's face is veiled until she meets the groom at the alter, at which point the veil is lifted.

    1. Anonymous Coward

      "[...] until she meets the groom at the alter, [...]"

      Skitt's Law: the typo should be "altar".

  7. Anonymous Coward

    "[...] in Britain by convention one would lift the veil to reveal something, not drop the veil."

    It is a subtle difference. Lifting the veil refers to showing something previously hidden. Dropping the veil refers more to the exposing of a false position aka dropping the pretence.

    1. Alister

      So in the context of the headline, it should be lifting the veil, then.

  8. LenG

    Song

    It is not just the non-native English speakers who need this - a significant proportion of the population speak with an accent which is close to incomprehensible to another significant portion. The problem is summarized in the song from My Fair Lady, "Why Can't the English (teach their children how to speak)", beautifully articulated by Rex Harrison.

  9. Anonymous Coward

    So can I learn other accents?

    Would be interested to know if I (UK native speaker) could learn other strains of English accents with this...

  10. Anonymous Coward

    I need this

    I'm told I sound like Antonio Banderas, which my ladyfriend back then found sexy enough to marry me.

    16 years later, she's a bit tired of still trying to "improve" my speech, even though I speak English good...

    And the audio compression 'rhythms in cellphones haven't improved any in 25 years; if anything, maybe worse. Even when she calls me with her iPhone (is it because I do Android? Either way, can-and-string quality too often) it's guessing 1/3 of what she says. Good she's patient, some of the time.

    Read Tomatis to understand why one-size-fits-all compression makes a mess of understanding.

    1. Mage
      Boffin

      Re: one-size-fits-all compression makes a mess of understanding.

      Also lower bit rate DAB, MP3 etc is MUCH more distorted and harder to follow for people with impaired hearing. Deafness is a bit misleading, like Colour Blind people don't see B&W.

      All audio compression is using an average psycho-acoustic model to throw away content. At higher rates (128K MP2, 64K on DAB+) there is too much loss of quality and intelligibility if your hearing is poor. 256K on MP3 is a reasonable minimum level of artefacts. Often speech codecs on phones are compressing too much. 3G has two qualities: one worse than GSM and one better. 4G has no native speech, so so-called 4G voice calls either use 3G or VoIP on the data channel.

  11. Mage
    Black Helicopters

    The Cloud?

    They CLAIM all those TVs, phones, search on Desktop, home hubs/speakers need the Cloud for the AI / Machine learning (there is no AI, it's just a database). If that's true, it's lazy, insecure, unpaid parasitical crowdsourcing like Google's pick-the-road-sign "captcha". The reason for the Cloud is mostly to spy on you. The speech recognition doesn't seem much different to the built-in XP speech-to-text (once trained), or some old Ford radios or Nokia phones. Google's Android TV and Phone speech recognition actually seems worse than stand-alone stuff 15 years ago. A lot of their Translation is garbage and search is turning into bookmarks and adverts.

    Anyone else used speech and search before Google was famous? Like Dragon and Altavista?

    1. Alister

      Re: The Cloud?

      "Anyone else used speech and search before Google was famous? Like Dragon and Altavista?"

      I used Microsoft Speech API (v4) to speech enable some software for blind users, back in the late 1990s / early noughties, using both text-to-speech and speech recognition, written in Delphi. I think I used Dragon's voice files for that at one stage but then swapped to Lernout & Hauspie British voices.

      1. DCFusor

        Re: The Cloud?

        I wrote an MFC wrapper for IBM's ViaVoice for a startup back in the day, and if it was at all trained on a particular speaker (it could handle many but it wanted to be told who was talking) - it was super good. In many opinions better than the Dragon stuff, particularly in the case of custom vocabularies - this was used to transcribe doctors' patient notes, so it needed to know medical jargon and a multitude of ways to say any number (one hundred, a hundred, one zero zero, and on and on for more complex ones). It got so if a doctor often coughed in the middle of saying some weird drug name, it'd still get it right - due to regular human transcriptionists error-checking and telling the speech engine what was really said.

        Adapting a thing written for Unix to Windows raised some serious eyebrows back then and won the odd award. It was definitely a complex thunk operation. I've thought about resurrecting the codebase, this time just using it in Linux, as I abandoned Windows around the time of .NET and the VB'ing of Visual Studio, since no one was paying me to fix Windows anymore - Linux ever since.

        It's long been known in the speech recog biz that working for one person (or a few known ones) is a metric ton easier than "all ya'll out there". It is in fact easier to tell who is speaking (biometric fashion) than what they are saying for a limited population.

        This is one reason the big boys use the cloud. The other is of course, the obvious snooping and slurping.

        1. T. F. M. Reader Silver badge

          Re: The Cloud?

          @DCFusor: It's long been known in the speech recog biz that working for one person (or a few known ones) is a metric ton easier than "all ya'll out there".

          But then, as a consumer, that's all I am interested in - recognize what one person (me) or a few (members of the household) are saying, after a bit of training. Give speech-to-text to me on a pocket-size device in flight mode (to ensure nothing I say goes to "the cloud"), and I may consider spending some pounds on the app if I find a compelling enough use case. I'd consider "no cloud" an essential requirement.

  12. Soulhand

    Nice rework of a press release. Did you see the tech demo'd? Try it yourself? Is there any evidence at all that the claims are real?

    I'll believe it when it actually exists and can be used by someone other than a company employee.

    1. Matthew Karas

      We demonstrated the system working live to Andrew at our London office, as part of an interview.

      I don't think I've been mentioned in a press release, since leaving FutureLearn to set up this company with Josh, in 2013, but I might be wrong.

      If you are involved in language learning, feel free to get in touch, and we can show you what we are working on: https://eloqute.com/contact
