Re: Throwing the baby out with the bathwater
Just because it benefits you doesn't mean it should be allowed or legal.
It is theft, plain and simple. If you applied your logic to anything else, you probably wouldn't agree with it. Is stealing money from people fine if it is given to charity or to poor people? Is stealing houses fine if they are given to poor people? If I buy one copy of a book, print many copies of it myself and give them out, is that fine? No, none of that is fine. It is all theft.
That code you are writing with speech recognition, are you actually speaking every word and piece of punctuation yourself or are you basically telling the LLM how you want other people's code stitched together?
Everything you described can be done legally, without using people's stuff without permission. Yes, it won't be as easy and it will be more expensive, but it is possible.
Weather forecast models shouldn't have any problem finding legal training data, and neither should medical models or financial models. Your statement that "all the systems are based on a mixture of public domain and copyrighted information" has absolutely nothing to back it up. Also, the medical industry has privacy requirements and laws made specifically to deal with medical privacy. At least where I am, if anyone other than the people actually treating me wants access to any of my medical records for any reason, they have to ask permission.
Research has also found that if you prepare the training data better and are more selective with it, it is possible to train a smaller model that is just as good or better on less training data. One example of better preparation is detailed manual labelling, which can be used for images.
Why do you need to use data without permission to get good speech recognition? Either hire people to create training data, or ask for volunteers and market it as helping disabled people. Ask people to spend a little time, maybe even just 5 minutes, reading out some text to be used as training data. This is something that people like you could do for yourselves: if it is just your hands that don't work well, you can still read and speak. Yes, it would be more work than just stealing every bit of audio with a transcript that you could find online, but it is the morally and ethically better way of doing it. There would also be a requirement to use the data for a specific purpose and nothing else. For example, if it is collected for speech recognition, it can't be used for text to speech without asking for permission again. If code is collected for a code speech recognition model, it is only used for that, not to train a code generator model.
There is no excuse for stealing, even if you believe it is being used for good. There are ways to get training data in a moral and ethical way, but these companies don't use them because that is more work and would cost more money.