VALL-E AI can mimic a person’s voice from a three-second snippet

Microsoft researchers are working on a text-to-speech (TTS) model that can mimic a person's voice – complete with emotion and intonation – after a mere three seconds of training. The technology – called VALL-E and outlined in a 15-page research paper released this month on the arXiv research site – is a significant step …

  1. Anonymous Coward

    there are still problems with the similarity of the voices used

    yes, those problems have long posed a challenge to scientists, so great MS are FINALLY working to solve this!

    www.youtube.com/watch?v=MT_u9Rurrqg

  2. Anonymous Coward

    "This technology could be extremely dangerous in the wrong hands,"

    This is NEVER gonna happen!

    ...

    no, I'm NOT gonna add a wink! What d'you mean I will! What are you doin to me, where are you takin me!? No, stop, STO

  3. thosrtanner

    and they've managed to get blockchain into it. we're done for

  4. dwodmots

    It'll be very useful for fan made animated pr0n, but probably little else. :-)

    1. This post has been deleted by its author

  5. breakfast Silver badge
    Facepalm

    So Johnny Scamstain phones me up, I tell him to piss off, he uses that sample to create a copy of my voice and phones my gran saying it's an emergency and I need money quick, and the solution our "expert" from MS recommends is the Blockchain?

    The blockchain is not going to help my gran.

    It seems to me that a legislative framework built around these kinds of AI tooling must include strict liability of the providers for the way their tool is used. If Johnny Scamstain has used this to con my gran, Microsoft are liable. It is the only thing I can think of that will persuade them to take it seriously.

    1. Anonymous Coward

      Nah, it just means they'll add another scoop of lawyers to the mix. Lawsuits are about the only thing Microsoft can handle reliably.

    2. JimboSmith

      My bank have my voice on file to validate me over the phone. What could possibly go wrong with this technology apart from my accounts being drained? Somebody could impersonate me and call my boss and tell him what I think of him. Trouble is he already knows.

      1. This post has been deleted by its author

      2. Dante Alighieri
        Facepalm

        Fixed passphrase

        Me too, and they use a fixed passphrase, not one of your own choosing, which makes an attack massively simpler.

        1. that one in the corner Silver badge

          Re: Fixed passphrase

          "Hi, my name is Werner Brandes. My voice is my passport. Verify Me."

  6. Primus Secundus Tertius

    Accents needed

    "...There also isn't enough coverage of speakers with accents..."

    English is a world language with many variants. Every nation gets restless if some different variant is forced upon them. The same applies to other languages: Spanish and Arabic especially, German and French also. And others.

    1. NiceCuppaTea

      Re: Accents needed

      Not only global accents but regional ones as well. A quick google reckons there are at least 40 regional dialects in the UK alone for the English language.

      1. Dante Alighieri
        Megaphone

        Re: Accents needed

        'aa divnn knaa aboot thaat

        like, kidda.

        (native speaker)

        icon cos,

        (at least it's not the rising inflection, high pitched "ahmm gowing howim" (midlands)!!)

  7. Anonymous Coward

    Future leadership

    "raises various ethical and legal issues"

    not at all - we have the most political corruption ever on earth, and we have AI that can imitate anyone remotely, good!

    I for one welcome our new robot overlords, humans have done a poor job with the planet.

    Not sure if they will treat us better, but a Terminator future is looking better than where we are going now.

  8. that one in the corner Silver badge

    isn't enough coverage of speakers with accents

    Time to point out that there is no such thing as someone speaking without an accent?

    Obviously, they meant there isn't much of a *range* of accents, but now I'm wondering which one is the dominant one in the data set. What is an easy accent to capture a lot of to make a public data set; they mention speaking on 'phones and it'll be one that is easy to record over any background noise...

    It's Essex girls on buses, isn't it?

    1. Piro

      Re: isn't enough coverage of speakers with accents

      Yup, it's a peculiarly American thing to say "no accent". It's not possible to speak without an accent.
