@AC
It's been done already.
Microsoft has pushed its Custom Neural Voice service to general availability, although you'll have to ask the company nicely if you want to use the vaguely unsettling text-to-speech service. Unsettling, because unlike the usual text to speech we've come to know and love over the years, which require a substantial amount of …
This post has been deleted by its author
The magician (and early technology adopter, friend of Silicon Valley types like Gates and Jobs) Penn Jillette was presented with a magic trick where the contestant had trained an artificial voice on hundreds of hours of Penn Jillette speaking (from TV shows and podcasts). After performing the trick, the contestant told Penn that for ethical reasons he would delete the artificial voice - unless Penn would like a copy for himself.
Penn pointed out that he was the only person on Earth who had zero conceivable use for an artificial voice that sounded like Penn Jillette.
There is an enormous number of books out there in electronic text format, and they are often quite expensive in audio, if available at all.
Personally, I use them on long drives, but there are a lot of visually impaired people out there who could benefit from improved "auto voicing" of text material. Some of the existing stuff is pretty good, but it can get tedious when listening to longer texts such as novels, and it can be quite disconcerting when it comes across strange words, names or places that don't exist in dictionaries, e.g. in SF and fantasy.
I did once convert a series of 8 novels for a blind friend, manually finding all the highlighted "I don't know this word" markers and respelling the words phonetically so they were pronounced more realistically, then adding them to the custom dictionary so at least I only had to do each new word once. It was OK, but a long chore and not ideal for long listening sessions. Things have moved on since then, but this sounds like at least a small, possibly a large, leap forward.
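For anyone wondering what that custom dictionary chore amounts to, here is a minimal sketch in Python. It is not what the software I used actually did internally; the names `CUSTOM_DICTIONARY`, `find_unknown_words` and `apply_custom_dictionary`, the example respellings and the crude tokeniser are all made up for illustration.

```python
import re

# Hypothetical custom dictionary: unfamiliar word -> phonetic respelling.
# The entries are illustrative, not taken from any real speech product.
CUSTOM_DICTIONARY = {
    "Cthulhu": "kuh-THOO-loo",
    "Daenerys": "duh-NAIR-iss",
}

# Crude tokeniser: runs of ASCII letters only (so "don't" splits in two).
WORD = re.compile(r"[A-Za-z]+")


def find_unknown_words(text, known_words, custom=CUSTOM_DICTIONARY):
    """List words found in neither the ordinary word list nor the custom dictionary."""
    unknown = []
    for word in WORD.findall(text):
        if word.lower() in known_words or word in custom:
            continue
        if word not in unknown:  # report each new word only once
            unknown.append(word)
    return unknown


def apply_custom_dictionary(text, custom=CUSTOM_DICTIONARY):
    """Swap each listed word for its respelling before the text goes to the synthesiser."""
    def substitute(match):
        word = match.group(0)
        return custom.get(word, word)
    return WORD.sub(substitute, text)
```

The idea is to run `find_unknown_words` over a chapter, add a respelling for each word it reports, then pass the output of `apply_custom_dictionary` to whatever speech engine you use. Once a word is in the dictionary it never gets flagged again, which is the "only do each new word once" part.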
Blind person here. It isn't that great a step forward for quality, at least it doesn't sound like it from the examples. I don't have any reason to believe this is better for pronunciation. Also, you can get used to anything if you use it all the time. My typical screenreading voice is very robotic. I chose it purely for pronunciation accuracy and fast reading speed.
The main reason this doesn't help is that the neural speech synthesis is only available as a cloud service. I can't install it for local speech, which means it's out for most of my reading. It only works if I want to send some text in, get an audio file back, and listen to that later. Given that I already have relatively high-quality speech software which does run locally, and I'm used to poor-quality speech which modern software far exceeds, I'm unlikely to run up a cloud account for this. For the many others with a visual disability, it won't even be an option if they want it, because it requires use of the API, which is going to confuse most nontechnical people.
Uhm errr yes.
Actually you could use it off the cloud.
The issue is training the model and then using it.
If you built a device with a decent enough GPU or similar (Nvidia has some options), you could get it to work locally.
The issue would be that you would routinely have to connect to the cloud to update as their models improve over time.
The problem is that it's the 'cloud' companies that make money from this. They also use your use of their model to help improve it.
I'll be the first to admit I haven't read very much about this, but it doesn't look like that's an option. The pricing pages include the price for creating a model, storing a model and running said model. I don't see anything about downloading the model, let alone downloading the engine that uses the model. All prices there are about sending text to the cloud, where it is converted to audio using the previously trained model. If people want to run it locally, they'll need the synthesis software along with their created model. If that's actually available, I haven't found anything about it. I think the original contention about cloud-only may be correct.
... which will shortly be fronting up all manner of businesses that are hostile, careless and incompetent.
The technology is quite impressive (though did they really give human Zoe such downbeat material to read?), but I fear it would be more honest, and indeed more cost effective, for most "Customer Service" operations simply to have their CEO record the message "You're not getting your money back" directly to their premium rate answerphone.
So basically you can take a montage of various actors and create a face and body that is unique but based on several people.
And you can create a natural sounding human voice which is also 100% artificial.
Now you can replace actors / actresses with completely computer generated scenes or overlay onto some stock footage.
Imagine the next X-Men movie where it's not a cartoon, but none of it is real.
The next stop after that would be to replace the infinite room of monkeys with an AI to recycle old movie classics.
Being the smart person I am, I'm going to start working on the AI talent agency. Real Humans need not apply.
That at least one of the Just Eat adverts has a VoiceOver that's computer generated. It sounds... creepy; wrong somehow.
Of course, if it happens to be a real human, someone who has managed to impersonate a creepy AI trying to imitate a realistic human voice, then hats off to that voice actor!