there are still problems with the similarity of the voices used
yes, those problems have long posed a challenge to scientists, so great, MS are FINALLY working to solve this!
Microsoft researchers are working on a text-to-speech (TTS) model that can mimic a person's voice – complete with emotion and intonation – after a mere three seconds of training. The technology – called VALL-E and outlined in a 15-page research paper released this month on the arXiv research site – is a significant step …
but probably little else
How long before the first claim by a politician that "That's not me, that's faked by cyber scoundrels!"?
Sadly, it could be true. If said scoundrels use GPT-3 to produce a speech for TTS input, it might have enough errors in it to sound like a politician. "ChatGPT, produce a speech on how encryption can be backdoored safely..."
So Johnny Scamstain phones me up, I tell him to piss off, he uses that sample to create a copy of my voice and phone my gran saying it's an emergency and I need money quick and the solution our "expert" from MS recommends is the Blockchain?
The blockchain is not going to help my gran.
It seems to me that a legislative framework built around these kinds of AI tooling must include strict liability of the providers for the way their tool is used. If Johnny Scamstain has used this to con my gran, Microsoft are liable. It is the only thing I can think of that will persuade them to take it seriously.
Good idea! You can get a recorded message from your boss promising that long overdue huge pay rise!
You make a good point. Many orgs (eg. HMRC & DWP) desire the use of IVR-type systems to detect not only the caller's ID, but also their (eg) honesty. It was never a good idea, but now that's arguably redundant.
"...There also isn't enough coverage of speakers with accents..."
English is a world language with many variants. Every nation gets restless if some different variant is forced upon them. The same applies to other languages: Spanish and Arabic especially, German and French also. And others.
"raises various ethical and legal issues"
not at all - we have the most political corruption ever on earth, and we have AI that can imitate anyone remotely, good!
I for one welcome our new robot overlords, humans have done a poor job with the planet.
Not sure if they will treat us better, but a Terminator future is looking better than where we are going now.
Time to point out that there is no such thing as someone speaking without an accent?
Obviously, they meant there isn't much of a *range* of accents, but now I'm wondering which one is the dominant one in the data set. What is an easy accent to capture a lot of to make a public data set; they mention speaking on 'phones and it'll be one that is easy to record over any background noise...
It's Essex girls on buses, isn't it.