Re: Too much complexity
There's a reason that microkernels and parallel computing aren't ruling the world.
It doesn't work as well in practice as it does in theory, because most things just aren't parallelisable, and the overhead of managing the logistics of either one eats all the (theoretical) performance gains you'd hope to get.
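To put a rough number on why the gains evaporate: Amdahl's law caps the speedup from N cores at 1 / (serial_fraction + parallel_fraction / N). A back-of-the-envelope sketch in C - the serial fractions and core counts below are illustrative assumptions, not measurements of any real workload:

#include <stdio.h>

/* Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N), where s is the
 * fraction of the work that stays serial. The figures below are
 * made-up round numbers purely for illustration. */
int main(void)
{
    double serial_fractions[] = { 0.05, 0.10, 0.50 };
    int    cores[]            = { 2, 8, 64, 128 };

    for (int i = 0; i < 3; i++) {
        double s = serial_fractions[i];
        printf("serial fraction %.0f%%:", s * 100.0);
        for (int j = 0; j < 4; j++) {
            int n = cores[j];
            double speedup = 1.0 / (s + (1.0 - s) / n);
            printf("  %d cores -> %.1fx", n, speedup);
        }
        printf("\n");
    }
    return 0;
}

Even with only 10% of the work stuck being serial, 128 cores buys you less than a 10x speedup, and that's before counting any of the coordination overhead at all.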
In the case of microkernels, the point is better isolation and security, not performance. And on the average home machine your GPU is basically an isolated machine for a reason: it's inherently insecure in itself, and the parallel code it executes can play with all kinds of stuff on the chip that it's not supposed to. The performance-vs-security tradeoff means they made it very fast and very insecure, and then only secured the interface through which you can send stuff to the card.
It's been that way with OSs and CPUs for far longer - Windows was performant and simple, and sacrificing security for that caused almost all of the problems that older and even modern Windows suffers from. Even basic memory protection requires so many more checks, balances, isolation layers and middle-men handling the data that it slows things down. Remember Spectre/Meltdown? Caused, at root, by trading security for performance in speculative execution (literally a performance-first feature).
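For anyone who hasn't looked at it since 2018, the Spectre v1 (bounds-check bypass) gadget is worth seeing just for how innocent it looks. A minimal sketch - the array names and the 512-byte stride are the illustrative values from the public write-ups, and a real attack also needs a cache-timing step afterwards to read the leaked byte back out:

/* Sketch of the Spectre v1 "bounds check bypass" gadget.
 * array1, array2 and x are illustrative names only. */
#include <stddef.h>
#include <stdint.h>

uint8_t array1[16];
uint8_t array2[256 * 512];   /* 256 probe slots, 512 bytes apart, so each
                                possible secret byte lands in its own cache line */
size_t  array1_size = 16;

uint8_t victim_function(size_t x)
{
    /* If the branch predictor guesses "in bounds", the CPU speculatively
     * reads array1[x] even when x is out of range, and the dependent
     * load from array2 leaves a cache footprint an attacker can time. */
    if (x < array1_size) {
        return array2[array1[x] * 512];
    }
    return 0;
}

Nothing in that source breaks any rule the language gives you; the leak lives entirely in the performance-first machinery underneath it.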
And all the "first technologies" are insecure for years buy that helps them work faster and get to market first, and that almost always wins. 3DFX cards allowed arbitrary DMA to any part of memory for decades before anyone noticed that's how they worked.
Despite everyone's wrangling, few people actually care about security - they just want stuff to work, and work fast, and they'll pay for a faster chip over a more secure one. People's primary gripe about the Spectre/Meltdown fixes was "Oh, but it slows my computer down".
Parallelism, though, is a very different beast - we can go to 128-core chips, and we all have GPUs running highly parallel workloads. What do we use them for? Games. Or for farming off the antivirus and other stuff that we don't want to "get in our way", for performance. Do we actually do anything that takes significant advantage of 128 cores and their parallelism? No, we just run 128 single-threaded programs at the clock speed we already had. No one program sees the benefit of that parallelism; we just run more programs (each unsuited to parallelism) simultaneously, and we're still bound by the clock speed. And if your 128-core chip runs at a slower clock than a 64-core chip, people aren't going to queue up to buy the 128-core one.
Even our most popular programming languages struggle to express any significant degree of parallelism, our software architectures simply aren't built for it, and our hardware is designed for single-thread speed first.
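As a concrete taste of the "struggle to express it" point, here's what the most trivial data-parallel job - summing an array - looks like in plain C with pthreads. The thread count, array size and names are arbitrary illustrative choices; the point is the amount of partitioning, bookkeeping and joining you need before you've parallelised anything at all:

/* Summing an array: one serial loop, but with raw pthreads even this
 * toy case needs explicit slice structs, partitioning and a join step.
 * Thread count and array size are arbitrary illustrative choices. */
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4
#define N_ITEMS   (1 << 20)

static double data[N_ITEMS];

struct slice { int begin, end; double partial; };

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    s->partial = 0.0;
    for (int i = s->begin; i < s->end; i++)
        s->partial += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N_ITEMS; i++)
        data[i] = 1.0;

    pthread_t    threads[N_THREADS];
    struct slice slices[N_THREADS];
    int chunk = N_ITEMS / N_THREADS;

    /* Hand each thread its own range of the array. */
    for (int t = 0; t < N_THREADS; t++) {
        slices[t].begin = t * chunk;
        slices[t].end   = (t == N_THREADS - 1) ? N_ITEMS : (t + 1) * chunk;
        pthread_create(&threads[t], NULL, sum_slice, &slices[t]);
    }

    /* Wait for everyone, then merge the partial results. */
    double total = 0.0;
    for (int t = 0; t < N_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += slices[t].partial;
    }
    printf("sum = %f\n", total);
    return 0;
}

A one-line serial loop turns into forty-odd lines of scaffolding, and that's before you worry about false sharing, load imbalance or debugging a race.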
It's not any one component... it's that nobody is prepared to sacrifice raw single-thread performance for anything else - even security, in many cases. It's a market problem, a software problem, a hardware problem, a sales problem and a programming-language problem, and ultimately nothing is set up to go that way; nothing that has gone that way has really taken off the way you'd expect, and hasn't in all the decades we've had it. Most cloud computing is just bog-standard PC architectures lumped into highly-available groups, with workloads distributed by third-party software - no real parallelism, even though it's a perfect use case for it. Instead we deploy individual VMs on specific hosts, with software load balancers in front, rather than farming the work out across dozens of connected machines that could parallelise it between them.
Because, when you get down to it, parallel computing is hard to achieve in hardware, much more difficult to program, understand and debug, often slower than the alternatives, and needs far greater interconnect and memory bandwidth to get close to matching the performance of just lobbing the job at one dedicated processor - or even chopping it up crudely in software and lobbing it at ten of them.
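And the overhead point shows up even in a toy cost model. Assume (purely for illustration - these are made-up round numbers, not measurements) 1 ms of actual work and roughly 20 µs of dispatch-and-join cost per thread:

/* Toy cost model: parallel time = serial_work / n_threads
 *                               + per_thread_overhead * n_threads.
 * All figures are assumed round numbers purely for illustration. */
#include <stdio.h>

int main(void)
{
    double serial_work_us = 1000.0;   /* 1 ms of actual computation   */
    double overhead_us    = 20.0;     /* assumed dispatch+join per thread */

    for (int n = 1; n <= 128; n *= 2) {
        double parallel_us = serial_work_us / n + overhead_us * n;
        printf("%3d threads: %.0f us (vs %.0f us serial)\n",
               n, parallel_us, serial_work_us);
    }
    return 0;
}

The sweet spot lands around a handful of threads, and by 128 threads the "parallel" version is slower than just running the job serially on one core - which is roughly the story of most real workloads that aren't embarrassingly parallel.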