Re: As an amateur
I can speak to one of your questions with some authority, this one:
"2.) How can an OS (or any other significant s/w) update get into release without it having been tried on a good range of commonly sold machines?"
as I'm more or less in charge of QA for an OS (Fedora).
So, there are two key points to make in answering the question. First: what exactly is "a good range of commonly sold machines"?
The amount of PC hardware out there is practically uncountable. It's difficult to overstate just *how much goddamn hardware* is out there. Even if you say "well, Lenovo laptops seem like a reasonable subset to cover, how many of those can there be?", the answer is *still* tens of thousands, counting all the variants on all the base models they've sold in all geos over the years. So even just this part is...incredibly difficult. Especially for a relatively small OS, like Ubuntu or Fedora, that releases rapidly (we both release every six months). For a small QA team working on a rapidly-releasing OS, it is, practically speaking, an inescapable truth that there will be quite popular hardware you don't test on. There's just too damn much PC hardware out there.
I don't know the specifics for Ubuntu, but for Fedora, for example, we have about 10 full-time paid QA staff, plus community volunteers. At a very rough guess, I'd say each Fedora release has probably been run on a few thousand different systems before it's released. But after release, it probably gets run on something like 100x or 1000x more hardware than it was tested on. There's just no realistic way to 'solve' this. It's always going to be a problem unless you make like Apple and say "we are only going to support a very small number of systems that we engineered ourselves".
Bigger, slower-moving OSes like Windows and RHEL can do *better* here, but they still can't be anywhere close to perfect.
Second, there's the specific failure mode here: what happens, AIUI, is that the system firmware effectively gets placed in a read-only mode, which notably means EFI variables like the boot order can't be changed. But that's not a particularly *obvious* failure mode. If your test is "install the OS, check it works", then your test would *pass* on an affected system. You're only going to catch this failure if your test is "install the OS, check it works, then try to change the system's firmware settings in some way".
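To make that concrete, here's a rough sketch (emphatically *not* our actual test suite, just an illustration) of the kind of extra post-install check that would catch this. The path and variable name follow the standard Linux efivarfs conventions; the logic is simplified.

```python
import os

# Sketch of a post-install firmware check: after installing, verify the
# firmware will still accept changes to EFI variables such as BootOrder.
# This is an illustrative mock-up, not a real distro QA test.
EFIVARS = "/sys/firmware/efi/efivars"
# BootOrder lives under the standard EFI global-variable vendor GUID.
BOOT_ORDER = "BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c"

def check_efi_writable(efivars_dir=EFIVARS):
    """Return 'skip', 'pass', or 'fail' for a post-install firmware check."""
    if not os.path.isdir(efivars_dir):
        return "skip"  # not a UEFI boot, or efivarfs isn't mounted
    var = os.path.join(efivars_dir, BOOT_ORDER)
    if not os.path.exists(var):
        return "fail"  # the installer should have left a BootOrder behind
    # efivarfs marks variables immutable by default; a real test would clear
    # that flag, attempt an actual write, and restore the previous value.
    # Checking write permission is the simplified stand-in here.
    return "pass" if os.access(var, os.W_OK) else "fail"

print(check_efi_writable())
```

The point is that "install, then poke the firmware again" is a *different* test from "install, then check it boots" - and only the former trips over this bug.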
Which sounds like a reasonable test, and sure, it *is*. But there are thousands of potential 'reasonable tests' you could perform on any given bit of hardware - work in QA for six months and you personally will have a list far longer than you'll ever have time to run. And again, an OS releasing on a six month cycle with a QA team that doesn't number in the hundreds of thousands *absolutely does not have the ability* to perform all those 'reasonable tests' on all the hardware the OS might get used on. Or even a reasonable subset of the tests, on a reasonable subset of the hardware.
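For a sense of scale, here's a back-of-envelope calculation with made-up (but plausible-ish) numbers - none of these are real Fedora figures, they're just there to show how fast the arithmetic gets away from you:

```python
# Why "all the reasonable tests on all the reasonable hardware" is fantasy.
# Every number below is an illustrative guess, not a real QA statistic.
hardware_variants = 10_000   # e.g. just the Lenovo laptop variants alone
reasonable_tests = 2_000     # per-machine tests a QA engineer could name
minutes_per_test = 10
testers = 10                 # full-time QA staff
cycle_days = 180             # six-month release cycle

total_minutes = hardware_variants * reasonable_tests * minutes_per_test
available_minutes = testers * cycle_days * 8 * 60  # 8-hour days, no days off

print(f"work needed: {total_minutes:,} tester-minutes")
print(f"work available: {available_minutes:,} tester-minutes")
print(f"shortfall: {total_minutes / available_minutes:,.0f}x the whole cycle")
```

Even with these conservative guesses, the full matrix needs a couple of hundred times more tester-time than the team actually has - which is why you sample, prioritize, and inevitably miss things.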
Frankly, trying to QA an operating system at all is like trying to bottle the ocean with an eye-dropper. Trying to QA one that releases every six months with a small team is more or less an exercise in futility. I have an awful lot of sympathy for my counterparts at Ubuntu on this one. We (OSes) have all had fuckups like this before - Ubuntu, other Linux distros, Windows. I remember the kernel code that could brick some CD/DVD drives, for instance, thanks to a bad interaction between some perfectly reasonable code in the driver and an appalling choice made by the firmware implementation for those drives. Honestly, I'm only astonished this stuff doesn't happen *more often*. (I have a theory that the longer you work in QA, the more surprised you are that anything works at all...)