VALL-E AI can mimic a person’s voice from a three-second snippet

Microsoft researchers are working on a text-to-speech (TTS) model that can mimic a person's voice – complete with emotion and intonation – after a mere three seconds of training. The technology – called VALL-E and outlined in a 15-page research paper released this month on the arXiv research site – is a significant step …

  1. Anonymous Coward

    "there are still problems with the similarity of the voices used"

    Yes, those problems have long posed a challenge to scientists, so it's great that MS are FINALLY working to solve this!

  2. Anonymous Coward

    "This technology could be extremely dangerous in the wrong hands,"

    This is NEVER gonna happen!


    No, I'm NOT gonna add a wink! What d'you mean I will! What are you doin to me, where are you takin me!? No, stop, STO

  3. thosrtanner

    And they've managed to get blockchain into it. We're done for.

  4. dwodmots

    It'll be very useful for fan made animated pr0n, but probably little else. :-)

    1. Lil Endian Silver badge

      "but probably little else"

      How long before the first claim by a politician that "That's not me, that's faked by cyber scoundrels!"?

      Sadly, it could be true. If said scoundrels use GPT-3 to produce a speech for TTS input, it might have enough errors in it to sound like a politician. "ChatGPT, produce a speech on how encryption can be backdoored safely..."

  5. breakfast Silver badge

    So Johnny Scamstain phones me up, I tell him to piss off, he uses that sample to create a copy of my voice and phone my gran saying it's an emergency and I need money quick and the solution our "expert" from MS recommends is the Blockchain?

    The blockchain is not going to help my gran.

    It seems to me that a legislative framework built around this kind of AI tooling must include strict liability of the providers for the way their tool is used. If Johnny Scamstain has used this to con my gran, Microsoft are liable. It is the only thing I can think of that will persuade them to take it seriously.

    1. Anonymous Coward

      Nah, it just means they'll add another scoop of lawyers to the mix. Lawsuits are about the only thing Microsoft can handle reliably.

    2. JimboSmith Silver badge

      My bank have my voice on file to validate me over the phone. What could possibly go wrong with this technology, apart from my accounts being drained? Somebody could impersonate me and call my boss and tell him what I think of him. Trouble is, he already knows.

      1. Lil Endian Silver badge

        Good idea! You can get a recorded message from your boss promising that long overdue huge pay rise!

        You make a good point. Many orgs (e.g. HMRC & DWP) want to use IVR-type systems to detect not only the caller's ID, but also their (e.g.) honesty. It was never a good idea, and now it's arguably redundant.

      2. Dante Alighieri

        Fixed passphrase

        Me too, and they use a fixed passphrase, not one of your own choosing, massively simplifying the attack.

        1. that one in the corner Silver badge

          Re: Fixed passphrase

          "Hi, my name is Werner Brandes. My voice is my passport. Verify Me."

  6. Primus Secundus Tertius Silver badge

    Accents needed

    "...There also isn't enough coverage of speakers with accents..."

    English is a world language with many variants. Every nation gets restless if some different variant is forced upon them. The same applies to other languages: Spanish and Arabic especially, German and French also. And others.

    1. NiceCuppaTea

      Re: Accents needed

      Not only global accents but regional ones as well. A quick Google reckons there are at least 40 regional dialects of English in the UK alone.

      1. Dante Alighieri

        Re: Accents needed

        'aa divnn knaa aboot thaat

        like, kidda.

        (native speaker)

        icon cos,

        (at least it's not the rising inflection, high pitched "ahmm gowing howim" (midlands)!!)

  7. Anonymous Coward

    Future leadership

    "raises various ethical and legal issues"

    not at all - we have the most political corruption ever on earth, and we have AI that can imitate anyone remotely, good!

    I for one welcome our new robot overlords, humans have done a poor job with the planet.

    Not sure if they will treat us better, but a Terminator future is looking better than where we are going now.

  8. that one in the corner Silver badge

    isn't enough coverage of speakers with accents

    Time to point out that there is no such thing as someone speaking without an accent?

    Obviously, they meant there isn't much of a *range* of accents, but now I'm wondering which one is the dominant one in the data set. What is an easy accent to capture a lot of to make a public data set; they mention speaking on 'phones and it'll be one that is easy to record over any background noise...

    It's Essex girls on buses, isn't it.

    1. Piro Silver badge

      Re: isn't enough coverage of speakers with accents

      Yup, it's a peculiarly American thing to say "no accent". It's not possible to speak without an accent.
