"Which has no relevance at all to there being almost no choice in voices."
It really does. Let me explain. Your idea of how complex things are is flawed in multiple ways:
"Or to there being no easy way to make the voice models - they did this in 2002. It's clearly not cutting edge, mega gpu, nuclear powered datacentre work."
If the only measure of how hard something is were how much computing power it needs, you'd be right. Obviously, that is not the only measure here. The article should make this plain. To build that model in 2002, they needed many days of hours-long recording sessions in a professional studio with a professional voice actor who can take very specific instructions, not half an hour with a laptop mic. And that's not all they needed. I can guarantee you that they had a lot of audio editors chopping up that source data and programmers figuring out how to stitch the pieces back together. I know this because open source groups have been doing the same thing. When you can't afford to spend a lot of time on those details, you get robots. When you try to do it with a small amount of source data, for example in projects that use the technology to give people who are losing their ability to speak a computer voice that sounds like them, you get this. And that work has to be done separately for each person you record.
Nowadays, some systems use machine learning to automate a lot of this, and quality is much improved. However, that puts us in lots-of-GPUs territory for training, and even though you don't need nearly as much computing power to run the generated models, they are still too large and intensive to run in real time on embedded devices, such as the phones and navigation units on which you would want them. So yes, the lack of choice is because you can't make a functioning model with a little time and effort.
Now, we have the complaint about Apple denying you choice. They are truly evil for denying you voice options. Looking through a modern iPhone's speech settings, they are cruelly providing only 48 choices for English alone, covering 7 accents. Imagine being so restricted.