This open text-to-speech model needs just seconds of audio to clone your voice

Palo Alto-based AI startup Zyphra unveiled a pair of open text-to-speech (TTS) models this week said to be capable of cloning your voice with as little as five seconds of sample audio. In our testing, we generated realistic results with less than half a minute of recorded speech. Founded in 2021 by Danny Martinelli and Krithik …

  1. kmorwath

    In Italy such technologies were used to impersonate the Minister of Defence...

    ... and ask rich entrepreneurs to send one million euro to a bank account, citing the need to pay secretly to free kidnapped journalists in the Middle East. At least one paid.

    I wonder why so many people are investing resources in technologies that have few - if any - good uses but are a huge help to crooks.

    1. Eclectic Man Silver badge
      Meh

      Re: In Italy such technologies were used to impersonate the Minister of Defence...

      Some fraudsters opened a bank account in my name, using stolen post as 'proof of address'. They then tried to empty one of my pension funds. I found out and contacted the pension fund company, and they called me back on some pretext. Having recorded the voice of the fraudster 'front man' and listened to mine, they concluded that we were two different people and that something naughty was going on. Had the fraudsters obtained enough recordings of my voice to fake it I don't know what would have happened. The issue is whether the AI voice can fool an AI voice recognition application well enough to transfer my pension.

      1. Filippo Silver badge

        Re: In Italy such technologies were used to impersonate the Minister of Defence...

        The idea that a fund transfer can be authorized on the basis of someone recognizing someone's voice is... disturbing. It would be a problem regardless of the existence of this tech.

        1. kmorwath

          Re: In Italy such technologies were used to impersonate the Minister of Defence...

          I think the fraudster called the bank and gave them some proof of identity using data common in countries that do not have ID cards. But they recorded the call and then were able to understand they were different people.

          Anyway, even an ID card copy I guess could be easily forged nowadays with AI - it has been possible since Photoshop became powerful enough a long time ago, but it still requires some skills.

          We are now into a full "Zero Trust" age - nothing we see or hear can be regarded as true. Maybe banks will need to sniff at us. But making it more and more difficult to tell fake from true can lead to deadly mistakes, too.

          I do not understand if those companies are just made of children who like to play with matches near a leaking gasoline tank, or have a worse agenda. My guess is this will lead to demands for more and more biometric data "that cannot be faked", so some companies will be able to track us better wherever we go, whatever we do. And then make even more money selling ad placement based on such tracking.

        2. tip pc Silver badge

          Re: In Italy such technologies were used to impersonate the Minister of Defence...

          if only the financial behemoths had local branches where you could talk to a human and important paperwork could be handled securely and in person.

    2. Anonymous Coward

      Re: In Italy such technologies were used to impersonate the Minister of Defence...

      Fine ... as long as they don't use it to impersonate pizza! ...

    3. Anonymous Coward

      Re: In Italy such technologies were used to impersonate the Minister of Defence...

      The best use case I know of is voice banking for people who have motor neurone disease. Consider the robotic voice we came to regard as Stephen Hawking’s compared to the recognisable Yorkshire accent Rob Burrow was able to speak with, still a little robotic but much more like his own. Things have come a long way.

      It used to take a lot of work to produce some really patchy results. This new technology produces far better results with so much less effort.

      I agree it’s frightening to see these new technologies abused. But I also see the difference it makes to a person living with MND when they’ve lost their voice to be able to use one more or less exactly their own. It’s empowering. It gives them dignity amidst one of the most undignified experiences anyone could ever have.

      1. Flightmode

        Re: In Italy such technologies were used to impersonate the Minister of Defence...

        After seeing the founder's TED Talk, I registered as a "voice donor" with a company, VocalID, whose mission was to provide voices for people who have lost theirs, as I thought it sounded like a wonderful idea. At the time I was going through some family issues and a move, so I didn't get the time to sit down to bank my voice with them for over a year after registering. It wasn't that you needed to record everything in one go, but it relied on you being able to record audio in a controlled, quiet setting; and that wasn't my everyday life at the time. When things eventually calmed down, I thought I'd give it another shot and went to their site only to find a message that they'd pivoted from their initial plan of helping people to "research into commercial applications of speech synthesis and artificial intelligence" or something similarly vague. I'm guessing they're into these kinds of applications now, so I'm pretty grateful that I never got around to it. (I tried going to their site now, but our office proxies even block access to the site as "suspicious", so that's pretty telling.)

  2. Androgynous Cupboard Silver badge

    Magic

    Because the one thing I love more than anything is hearing my own voice on recordings.

  3. Tron Silver badge

    Cool.

    Do they do software to create lifelike masks, 3D print keys and rob banks too?

    Bonus points if you manage to phone Putin or Xi and declare war, bigly, in Trump's voice.

    1. Wang Cores

      Re: Cool.

      It would be deeply funny if someone just eroded party politics (not democracy) by generating extreme partisan AI candidates that get funding from the parties while ramping the rhetoric to an extreme. Then maybe we can move past this stupid era of party lines.

      1. kmorwath

        Re: Cool.

        You know "Big Brother" and "Emmanuel Goldstein" might never have existed in "1984", don't you? You would just get people voting for these fictional characters and giving power to those behind them. You'll just get more extremism, not less.

        1. Wang Cores

          Re: Cool.

          Not arguing that's the most likely consequence, just an idle fantasy of turning tools used to manipulate against the manipulators.

          The romantic ideal of hack the planet, ha.

    2. RAMChYLD Bronze badge

      Re: Cool.

      Now some shock jockeys on a youth radio channel are probably going to do it. I've read of one Canadian troupe managing to prank the president of France, and of a South American troupe managing to prank one of the dictators of Cuba. And they didn't even have access to AI voice technology. If those two can fall for it, surely the annoying orange, lacking any working brain cells, would easily fall for it hook, line and sinker.

  4. Ken Moorhouse Silver badge

    My bank...

    ...makes it extremely difficult to get a simple bank balance over the phone. They coerce me into saying a set phrase so that they can identify me by my voice. I never comply with this and I'm then stuck in a long queue to speak to someone. They then - somehow - after answering a few questions, say they've verified me by my voice. I want to be identified by *proper* security questions, not by the latest fad.

    This thread should demonstrate to them that they are playing fast and loose with my money [overdraft]. (I'm not holding my breath). So if someone rings up and hacks my voice and siphons money out of my account, how can I prove that I did not make that call?

    Speechless? Yes, probably the best policy under the circumstances.

    1. heyrick Silver badge

      Re: My bank...

      "say they've verified me by my voice"

      Good grief. Wasn't voice recognition demonstrated to be flawed thirty three years ago in the movie Sneakers? If they could pull something like that off by splicing tape, imagine what modern tech could do, especially given the number of people likely to have sufficient voice samples available in social media posts.

      The basic authentication rule is very simple: Something you have and something you know. If they want to use voice recognition as the "have" then fair enough. But it shouldn't ever be the only test made.

      "This thread should demonstrate to them that they are playing fast and loose with my money"

      Banks do that as a matter of course. The pay-by-bonk limit just keeps going up "because convenience" (so I've deactivated it on my cards). The PIN for the card is a mere four digits. The special prove-I'm-me PIN is only five digits, and rarely asked for. They still seem to think that possession of my phone means it is me, thus giving a possibility for somebody who stole my phone to access the necessary and required bank app [1], set up the passcode forgotten routine, and get a temporary code by text...to that same phone. Clever.

      I think it says a lot that it is more complicated to set up K9 Mail to access a mailbox with the likes of Google [2] than it is to access my money.

      1 - God only knows what happens for people who don't have (or want) a smartphone.

      2 - The "terribly insecure" application specific password stuff.

      1. Anonymous Coward

        Re: My bank...

        Then again, if they also analyze your breathing pattern (oddly missing from the genAI samples), and simultaneously diagnose you with asthma, COPD, or bronchitis, while you wait, there could be a net positive for everyone involved! ;)

        1. Anna Nymous Bronze badge
          Facepalm

          Re: My bank...

          And rat you out to your health insurance so that they can jack up your insurance premium, all without telling you the reason why (and get a cut in exchange for providing such a valued service to their "partner"), you mean?

          What? You thought you would be the one receiving your diagnostic results? That's funny!!!

      2. IamAProton

        Re: My bank...

        Bank accounts requiring a smartphone are automatically off my list.

        Unfortunately said list is getting shorter and also the "security by complexity" is getting more popular, which isn't great.

        Hopefully this will cause more accounts to be hacked and security will be reconsidered overall.

      3. Flightmode

        Re: My bank...

        "The basic authentication rule is very simple: Something you have and something you know. If they want to use voice recognition as the "have" then fair enough. But it shouldn't ever be the only test made."

        My suggestion is to agree with everyone you speak to regularly - at least parents, children, your spouse and close friends - on a "vocal handshake" you can perform so that you can always verify that the person asking you for money for a new phone / ticket home / medical bills / whatever is the person they say they are. It is absolutely vital that the information used to complete the handshake is not available anywhere online and can't easily be inferred from your profile. Either make it something you had together Before The Internet that you've never discussed in emails or on social media, or agree on a nonsense question with a deliberately wrong or fake answer - "Q: What's the capital of Bulgaria? A: Mike Dinosaur Junior" - or a favourite movie quote that needs to be responded to with an unrelated line from a song from a band you don't normally listen to. Make it a habit to always ask this question when being asked for money or whatever it is and NEVER volunteer the answer before being asked the question - "Can you please help me transfer 1000 moneys to this offshore account? Oh, and before you ask, the capital of Bulgaria is Mike Dinosaur Junior."

        We all love to think that we'd never fall for a scam, but I'm not so sure. Especially not with voice replication technology at this level. Better to have one more safeguard.

    2. Ken Moorhouse Silver badge

      Re: My bank...

      I rang up for an account balance today. I found that if you answer their questions as briefly and quickly as possible, it's not enough for them to verify your voice. Their next tactic is to ask "how're you doing today?" Just reply "ok". They then asked me to repeat my surname and DOB. I did so as quickly and briefly as possible. There was a delay whilst their system gave up: "Sorry I can't verify you by your voice. I'm going to have to ask you for numbers from your PIN."

  5. NapTime ForTruth
    Mushroom

    Fortunately...

    ...there is no way anyone would use this with existing media to clone, say, a national leader who has access to top secret information or nuclear weapons.

    Or every national leader who has similar access.

    So that's a relief.

    1. Anonymous Coward

      Re: Fortunately...

      Not a national leader, but somebody faked audio of London Mayor Sadiq Khan in 2023. The Met said no crime had taken place.

      https://www.bbc.co.uk/news/uk-england-london-67389609

  6. mark l 2 Silver badge

    While there are a few genuinely good uses for this tech, as the article points out, I fear it's overwhelmingly going to be used for nefarious purposes rather than genuinely useful ones.

    Those scams that try to con people into believing that their loved ones desperately need money will become way more convincing if all they need is 30 seconds of speech to create an AI voice they can then use to send voice messages to people.

    Jeff Geerling, the YouTuber who does a lot of videos about Raspberry Pis, had his voice cloned by a Chinese company and used in one of their own videos without his permission last year. And he's a relatively small YouTuber.

  7. Long John Silver Bronze badge
    Pirate

    Does this do (received) English accents properly?

    Only asking.

  8. Simon Harris

    Things have come a long way since Stepford…

    When the wives had to read long lists of words.

  9. that one in the corner Silver badge

    The data was acquired from the web

    and not purchased from a data broker. Or anyone else, like the rights holders?

    Nah, it must be ok, because as we all know there is no copyright material on the web and if it just happened to be scraping every podcast, every bit of online radio playable in the browser, one or two million YouTube videos...

    1. heyrick Silver badge
      Pirate

      Re: The data was acquired from the web

      That's the way it is now. If it's online in any form then it's fair picking.

      Icon, because fair's fair, right?

  10. Michael Hoffmann Silver badge

    Does listening to it sound weird to yourself, like listening to your actual voice on a recording, or come across as someone else's?

    1. veti Silver badge

      For me at least, AI audio is in a weird kind of Uncanny Valley space right now. It sounds good at first, but the longer it goes on the more my flesh starts to creep.

  11. Mage Silver badge
    Black Helicopters

    I read the book

    Program for a Puppet

    by Roland Perry.

    Originally published 1979.

    Mine is the 1980 edition with a cruise missile on the cover.

    I think part of it was faking voices.

    Cheetah: The world's fastest and most powerful supercomputer. Lasercomp's plan is to install its own pawn as President of the United States.

  12. Rustbucket

    Difference

    I thought that the AI generated speech samples had better diction.

    However this is another warning to alert your loved ones to the dangers and not trust even family if they're suddenly asking for money or access.

    Exchanging prearranged code words for such occasions as children traveling overseas is a good idea.

  13. Sorry that handle is already taken. Silver badge
    WTF?

    Zyphra's Zypher Zamba Zonos

    Zounds!

  14. Anna Nymous Bronze badge
    Unhappy

    Current crop of AI has only one purpose

    The current crop of AI has only a single purpose: deception.

    GenAI's explicit purpose is to deceive you, to make you think a human was involved when none was.

    And deception is not exactly the behavior of honest folk.

    The argument for "scale" ("but we can do so much more with AI, that's why we use it, not to deceive you") falls flat on its face. If that were genuine, then they'd make it abundantly clear that the thing you're interacting with is a machine. But they don't, they hide that, and work very hard at hiding it from you. This is exactly what this article is about: trying to be the best at impersonating someone else to a believable level.

    Machines should identify themselves as machines, so that those forced to interact with them know exactly how much the other side of the line values them. "Your call is very important to us"??? Bah, lies!

  15. stiine Silver badge
    Facepalm

    signalized?

    Did I miss the OED announcement about this new word?

    1. doublelayer Silver badge

      Re: signalized?

      It's a Jules Verne book and likely an earlier translation of it. Modern translations of those are pretty good, but I wouldn't be surprised if that paragraph came off Gutenberg. Gutenberg has the original 1860s and 1870s English translations, and almost all of those are terrible translations by inept translators who mangled scientific details even when they were trying. Some of them also decided to change names and locations because who cares about accuracy. I am perfectly able to assume that one of them would have invented some words, possibly due to a misunderstanding of French or English, only one of which they understood too well.

  16. Winkypop Silver badge
    Thumb Up

    Great

    I’m going to teach it some Spike Milligan.

    Eccles will live again!

    1. DJV Silver badge

      Re: Great

      At last, a valid use for this!

    2. that one in the corner Silver badge

      Re: Great

      "'Ere, ooo stuck dat needle in my nut?"

  17. Howard Sway Silver badge

    I hope nobody

    trains it on the voices of Zuckerberg, Musk, Altman and other AI hypesters, and puts together a recording of a meeting where they all supposedly got together and plotted to use AI to destroy democracy and take over the world. I mean, that would be really bad. Especially if it leaked out.

    1. Anna Nymous Bronze badge
      Terminator

      Re: I hope nobody

      Even the dirtiest rag journos only publish stuff that is at least new or unexpected to their "readers". Who'd publish this?

    2. Anonymous Coward

      Re: I hope nobody

      Why not just use the actual recording of the meeting they made because they can't trust one another?

  18. conwaytwt

    To my ears, the second sample got the vocal pitch mostly correct, while the third didn't. But the third one eliminated the "straining to defecate" pauses audible in the second.

  19. druck Silver badge

    They Were So Preoccupied With Whether Or Not They Could, They Didn’t Stop To Think If They Should

    Jeff Goldblum, Jurassic Park.

  20. Paul Hovnanian Silver badge

    Seconds of audio

    Anyone want to hear my Gilbert Gottfried impersonation?

  21. KaptainKhaos

    Back to in-person banking!

    This kind of tech could kill things like online or telephone banking.

    All those branch closures might have to be reversed, because we'll have to go back to doing things in person!

    And I guess the T800 now has its voice-copying part done. Now it just needs the cybernetics, the rest of the AI and a long-life battery. Okay, not going to happen, the battery will go flat too quick!

  22. tip pc Silver badge
    Big Brother

    Don't talk to phone scammers

    When I get a call from an unknown number I wait till they speak before I speak.

    I might start putting on a phone voice, so if they clone my voice, others I know hopefully won't be duped.
