Reply to post: disable mitigations in OS - probably doesn't override firmware mitigations?

Do you want speed or security as expected? Spectre CPU defenses can cripple performance on Linux in tests

Nate Amsden

disable mitigations in OS - probably doesn't override firmware mitigations?

Most of my systems predate the Spectre stuff, so I haven't installed the firmware updates that contain those fixes (I've read some nasty stories about them, most recently the worst of them here: https://redd.it/nvy8ls). I have seen tons of firmware updates for HPE servers that are just updated microcode, or fixes to other microcode, which implies some serious stability issues with the microcode.

I have assumed the Linux commands to disable mitigations operate only at the kernel level and are unable to "undo" microcode-level mitigations.
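For what it's worth, the kernel's own view of each vulnerability can be read from sysfs, which at least shows what the OS-level switches (such as booting with mitigations=off) changed. Below is a minimal Python sketch, assuming a Linux host that exposes /sys/devices/system/cpu/vulnerabilities; the function name read_mitigation_status is made up for illustration, and the output only reflects what the kernel reports, not a full picture of what the microcode is doing.

```python
from pathlib import Path

# Standard sysfs location on Linux kernels that report CPU vulnerability status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_line} as reported by the kernel.

    Hypothetical helper for illustration. Returns an empty dict if the
    directory does not exist (older kernel or non-Linux system).
    """
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    for name, line in read_mitigation_status().items():
        print(f"{name}: {line}")
```

Comparing the output before and after changing the kernel boot options would show which entries flip from a "Mitigation: ..." line to "Vulnerable".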

On top of that, on my VMware systems (ESXi 6.5) I have kept the VIB (package) for microcode on the older version since this started. The risk associated with this vulnerability is so low in my use cases (and in pretty much every use case I've dealt with in the past 25 years) that it's just not worth the downsides at this time. I can certainly understand it being different if you are a service provider with no control over what your customers are doing.

I can only hope CPU/BIOS/EFI vendors offer an option to disable the mitigations at that level, so you can get the latest firmware with its other fixes and just disable that functionality. That probably won't happen, which is too bad, but at least I've avoided a lot of pain for myself and my org in the meantime (pain as in having VM hosts randomly crash as a result of buggy microcode).

I do have one VM host that crashes randomly, 3 times in the past year so far. The only log entry indicates that it loses power, sometimes 2-3 times in short succession (and there is 0 chance of an actual power failure). No other failure is indicated, and it's not workload related. HPE wants me to upgrade the firmware, but I don't think it's a firmware issue if dozens of other identical hosts aren't suffering the same fate. They say the behavior is similar to what they see with the buggy microcode, but that buggy microcode is not on the system. So in the meantime I just tell VMware DRS not to put more critical VMs on that host, as I don't want to replace random hardware until I have some idea of what is failing (or can at least reliably reproduce the behavior; I ran a 72-hour full burn-in and full hardware diagnostics after the first crash, and everything passed). I'm sort of assuming the circuit board between the power supplies and the rest of the system is flaking out, but I'm not sure. The first time, it crashed so hard that the iLO itself got hung up (I could not log in) and I had to completely power cycle the server from the PDUs (that has personally never happened to me before). The iLO did not hang on the other two crashes. The server is probably 5 years old now.

Another Q: if version "X" of the microcode is installed at the firmware/BIOS/EFI level and the OS tries to install microcode "V" (older), does that work? Or does the CPU ignore it (perhaps silently)? I haven't looked into it but have been wondering about that for some time now. I'm not even sure how to check the version of microcode that is in use (haven't looked into that either). It seems like something that should be tracked though, especially given that microcode can come from either a system BIOS/firmware update or the OS itself, or both.
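On checking the version in use: on x86 Linux, /proc/cpuinfo carries a "microcode" line per logical CPU with the revision currently loaded, so tracking it can be as simple as parsing that file. Here is a rough Python sketch under that assumption; the function name microcode_revisions is invented for illustration, and this does not cover ESXi hosts, which would need a different method.

```python
import re
from collections import Counter


def microcode_revisions(cpuinfo_text):
    """Count the microcode revisions reported across logical CPUs.

    Hypothetical helper for illustration. Expects the text of
    /proc/cpuinfo from an x86 Linux system, where each logical CPU
    has a line such as "microcode : 0x...".
    """
    revisions = re.findall(
        r"^microcode\s*:\s*(\S+)", cpuinfo_text, flags=re.MULTILINE
    )
    return Counter(revisions)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux system, or /proc is unavailable
    for rev, count in microcode_revisions(text).items():
        print(f"{rev}: {count} logical CPU(s)")
```

Recording this value before and after a BIOS update or an OS microcode package update would show which source actually supplied the running revision, and whether an older OS-supplied revision was applied or ignored.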
