I'm sorry, Dave. I'm afraid I can do that: Microsoft unveils Custom Neural Voice – synthetic, but human-sounding speech

Microsoft has pushed its Custom Neural Voice service to general availability, although you'll have to ask the company nicely if you want to use the vaguely unsettling text-to-speech service. Unsettling, because unlike the usual text to speech we've come to know and love over the years, which require a substantial amount of …

  1. Anonymous Coward

    only a matter of time before I get a voice mail from the CEO asking for a wire transfer to a new vendor in Nigeria

    1. Anonymous Coward

      It's been done already.

  2. Falconitservices


    It’s only a matter of time before I get a voice mail from the CEO requesting a wire transfer for a new vendor in Nigeria

  3. Dave 126 Silver badge

    The magician (and early adopter of technology, friend of Silicon Valley types like Gates and Jobs) Penn Jillette was presented with a magic trick for which the contestant had trained an artificial voice on hundreds of hours of Penn Jillette speaking (from TV shows and podcasts). After performing the trick, the contestant told Penn that for ethical reasons he would delete the artificial voice - unless Penn would like a copy for himself.

    Penn pointed out that he was the only person on Earth who had zero conceivable use for an artificial voice that sounded like Penn Jillette.

  4. chivo243 Silver badge

    computer voices

    Love them in movies, but dislike them in real life.

  5. Stuart Halliday

    We know.

    Of course we all know what it'll be used for....

    That age-old subject - pornography

    Well, what other use has technology got?

  6. Def Silver badge

    synthetic, but human-sounding speech

    Well, you wouldn't be able to pass it off as Stephen Hawking, but "human-sounding" is still a bit of a stretch if you ask me.

  7. John Brown (no body) Silver badge

    I see a use case

    There is an enormous number of books out there in electronic text format, and they are often quite expensive in audio, if available at all.

    Personally, I use them on long drives, but there are a lot of visually impaired people out there who could benefit from improved "auto voicing" of text material. Some of the existing stuff is pretty good, but can not only get tedious when listening to longer texts such as novels, but can be quite disconcerting when they come across strange words, names or places that don't exist in dictionaries, eg SF and Fantasy.

    I did once convert a series of 8 novels for a blind friend, manually finding all the highlighted "I don't know this word" markers and changing them by spelling and phonetics to pronounce them more realistically, adding them to the custom dictionary so at least I only had to do each new word once. It was ok, but a long chore and not ideal for long listening sessions. Things have moved on since then, but this sounds like at least a small, possibly a large leap forward.

    1. Anonymous Coward

      Re: I see a use case

      Blind person here. It isn't that great a step forward for quality, at least it doesn't sound like it from the examples. I don't have any reason to believe this is better for pronunciation. Also, you can get used to anything if you use it all the time. My typical screenreading voice is very robotic. I chose it purely for pronunciation accuracy and fast reading speed.

      The main reason this doesn't help is that the neural speech synthesis is only available as a cloud service. I can't install it for local speech, which means it's out for most of my reading. It only works if I want to send some text in, get an audio file back, and listen to that later. Given that I already have relatively high-quality speech software which does run locally, and I'm used to poor-quality speech which modern software far exceeds, I'm unlikely to run up a cloud account for this. For the many others with a visual disability, it won't be an option even if they want it, because it requires use of the API, which is going to confuse most nontechnical people.

      1. Anonymous Coward

        @Blind AC Re: I see a use case

        Uhm errr yes.

        Actually you could use it off the cloud.

        The issue is training the model and then using it.

        If you built out a device with a decent enough GPU or similar (Nvidia has some options)

        You could get it to work locally.

        The issue would be that you would routinely have to connect to the cloud to update as their models improve over time.

        The problem is that it's the 'cloud' companies that make money from this, and they also use your use of their model to help improve it.

        1. doublelayer Silver badge

          Re: @Blind AC I see a use case

          I'll be the first to admit I haven't read very much about this, but it doesn't look like that's an option. The pricing pages include the price for creating a model, storing a model and running said model. I don't see anything about downloading the model, let alone downloading the engine that uses the model. All prices there are about sending text to the cloud, where it is converted to audio using the previously trained model. If people want to run it locally, they'll need the synthesis software along with their created model. If that's actually available, I haven't found anything about it. I think the original contention about cloud-only may be correct.

  8. FelixReg

    Language learning

    They mention Duolingo as a user of the tech. Interesting.

    Around 1990, I got close to nowhere trying to get a computer to speak natively in a foreign language with my own voice as I hear it in my own head.

    So, yeah, this is cool!

  9. Warm Braw Silver badge

    A natural-sounding voice that conveys friendliness, empathy, and professionalism...

    ... which will shortly be fronting up all manner of businesses that are hostile, careless and incompetent.

    The technology is quite impressive (though did they really give human Zoe such downbeat material to read?), but I fear it would be more honest, and indeed more cost effective, for most "Customer Service" operations simply to have their CEO record the message "You're not getting your money back" directly to their premium rate answerphone.

  10. fitzpat

    These Microsoft voices can't be any worse than Heathrow Airport's slightly pissed off yet disinterested synthetic female announcer.

    "Flight whatever is now boarding at gate 21, you might want to get on your plane at some point, meatbags"

  11. Anonymous Coward

    Deep Fakes

    So basically you can take a montage of various actors and create a face and body that is unique but based on several people.

    And you can create a natural sounding human voice which is also 100% artificial.

    Now you can replace actors / actresses with completely computer generated scenes or overlay onto some stock footage.

    Imagine the next X-men movie where it's not a cartoon, but none of it is real.

    The next stop after that would be to replace the infinite room of monkeys with an AI to recycle old movie classics.

    Being the smart person I am, I'm going to start working on the AI talent agency. Real Humans need not apply.

  12. david1024


    I want the HAL-9000 voice on my echo.

  13. TRT Silver badge

    I'm convinced...

    That at least one of the Just Eat adverts has a VoiceOver that's computer generated. It sounds... creepy; wrong somehow.

    Of course, if it happens to be a real human, someone who has managed to impersonate a creepy AI trying to imitate a realistic human voice, then hats off to that voice actor!

  14. TWB

    I sometimes wonder...

    ...if soon we will have nothing to do at all.


Biting the hand that feeds IT © 1998–2021