But really, something like this needs to work universally with whatever services I choose, and that would require all of those services to work together. Not a chance in hell of that ever happening!
The UI is basically a final project for a graduate CS class in speech recognition. You could cobble it together out of free software like CMU Sphinx. Then just use it to drive each service separately.
Hell, I could have done pretty much all this stuff in the late '90s with a copy of OS/2 Warp Connect¹ and some www³ scripting. There's no magic here.
If it's not in your wheelhouse, wander down to the nearest university with a CS grad program and see if they have any SR or NLP courses. Then see if you can talk a couple students in one of those courses into helping you build the system. If you want remote "cloud" processing for some reason, it's easy enough to spin up a little AWS instance, and S3 is cheap. Once you have the UI, it's just a matter of learning to drive the web APIs for the services of your choice. My undergrad web design students all managed to do that, and they were all professional-writing majors (so mostly not very technical).
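For what it's worth, the "drive each service separately" part could look roughly like the following. This is a minimal Python sketch, assuming the speech recognizer (whatever it is) hands you a plain-text utterance. The command phrases, handler names, and example.com URLs are all made up for illustration; a real version would swap in each service's actual API and send the request.

```python
import re
from urllib.parse import quote_plus

# Hypothetical per-service handlers. Each one only builds the request URL
# it would call; the example.com endpoints are placeholders, not real APIs.
def play_music(query):
    return "https://music.example.com/api/play?q=" + quote_plus(query)

def order_item(item):
    return "https://shop.example.com/api/order?item=" + quote_plus(item)

def weather(city):
    return "https://weather.example.com/api/forecast?city=" + quote_plus(city)

# One pattern per spoken command, each routed to its own service.
ROUTES = [
    (re.compile(r"^play (?P<arg>.+)$"), play_music),
    (re.compile(r"^(?:order|buy) (?P<arg>.+)$"), order_item),
    (re.compile(r"^what(?: is|'s) the weather in (?P<arg>.+)$"), weather),
]

def dispatch(utterance):
    """Map recognized text to a service request URL, or None if no match."""
    text = utterance.strip().lower()
    for pattern, handler in ROUTES:
        match = pattern.match(text)
        if match:
            return handler(match.group("arg"))
    return None

if __name__ == "__main__":
    print(dispatch("Play Miles Davis"))
    print(dispatch("open the pod bay doors"))
```

The services never have to know about each other; the only shared piece is the routing table, and adding a service means adding one pattern and one handler.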
¹ And I have one, still shrinkwrapped, in the basement. Alas, none of the VM supervisors or hypervisors support OS/2², and I don't have a spare machine to dedicate to it.
² For good technical reasons. There was an article in CACM a while back by some of the original VMware developers that went into it a bit. Basically, the OS/2 kernel did some obscure things that weren't worth a lot of extra development effort to support. (There's a right way, a wrong way, and an IBM way.)
³ I don't think curl or wget had been written in the late '90s. Too lazy to check.