A 'bug' involving prime number maths, eh?
Guessing that Skylake's microcode backdoor tripped up on all the prime multiplication ops and ran out of space to store copies of P and Q on the management engine secret flash.
Intel has confirmed it's pushing out a BIOS fix for a bug that can freeze its Skylake processors. The good news is that the bug's triggered by complex workloads. It was turned up by the prime number experts of the Great Internet Mersenne Prime Search (GIMPS), who use Intel machines to identify and test new large prime numbers. A few …
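For the curious: the workload GIMPS hammers CPUs with is essentially the Lucas-Lehmer test, squaring numbers millions of digits long over and over. A minimal Python sketch of the test itself (Prime95 does the same thing with FFT-accelerated arithmetic, which seems to be the sort of heavy number crunching that tripped the bug):

    # Minimal Lucas-Lehmer primality test for Mersenne numbers 2**p - 1.
    # GIMPS's Prime95 runs this with enormous p and FFT-based squaring;
    # this plain-Python version is only practical for small exponents.
    def lucas_lehmer(p: int) -> bool:
        """Return True if 2**p - 1 is prime (p must itself be an odd prime)."""
        m = (1 << p) - 1          # the Mersenne number under test
        s = 4
        for _ in range(p - 2):
            s = (s * s - 2) % m   # the whole test is repeated modular squaring
        return s == 0

    # 2**7 - 1 = 127 is prime; 2**11 - 1 = 2047 = 23 * 89 is not.
    print(lucas_lehmer(7), lucas_lehmer(11))  # True False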
Interesting you should mention P's and Q's and Stroud. My recollection (correct me if my memory fails me) of Laplace Transforms is that everyone uses "s" except people doing engineering who use "p". (Stroud's books are geared more towards an Engineering readership).
Ugh. Stroud. Had to wade through some of his books doing engineering in the 1980s too. The main problem we had was that the mathematicians all used "i" for complex numbers while we used "j" to avoid confusion with amps. Our maths was taught by mathematicians from the maths department, and things did occasionally get a bit confused. Not helped by the "clever" calculators with dot-matrix displays just coming on the market as the decade turned...
M.
Had similar problems around the same time - profs from the math faculty teaching would-be engineers. When worlds collide... However, my Casio something-or-other (replaced by an HP48sx as soon as possible) and comrades Bronstein / Semendajew / Musiol / Mühlig helped a lot in getting me through it.
We had the same problem, only years earlier: a Pure Maths teacher trying to teach people who were very applied. Most had come back into education, often after years in industry. The pure maths textbook was just gobbledegook to most of us.
Enter 'Engineering Mathematics' and suddenly things became a lot clearer.
The Maths lecturer soon grasped that he had to teach us differently to those studying Maths.
One size did not fit all.
This was at a Polytechnic. Now that they are all universities it is a lot harder for people to take that route.
I know that I would not have had the career I did if I'd started out even in the mid-1980s.
Ah - Stroud, or "Engineering Mathematics for Dummies" as it would probably be called now, got me through the seriously tedious world of partial differential equations.
Always loved the tone in books 1 and 2, I think he slipped a bit in the Laplace Transforms tome and the Fourier Series book sent me catatonic.
"Your BIOS basically uploads a microcode update to the cpu and by updating"
Picky, but....
I'm pretty sure the microcode is not so much upgraded as patched, and that patch needs to be applied on every boot.
The hassle with microcode patching is that it likely runs far slower than the native (hardwired) microcode it replaces (as a bit of a stretched metaphor, think software floating-point emulation rather than hardware floating point). That's likely going to make these processors suck for number crunching of the form that revealed the bug.
I'm out of touch - the BIOS can now update CPU microcode? F'k me, is nothing sacred? Real security would appear to be an impossible dream with arrangements like this.
It seems that the CPU makers are jumping on the same ship the OS makers have been on for years: "We push out the crap, and the customers find the bugs for free."
BIOSes (and UEFI firmware) have been able to update Intel CPU microcode for >years<, ever since the Pentium Pro.
The threat to security from the microcode update process is quite low, since the update is checked by the CPU itself before being applied and will be rejected if the signature check fails. Unless you get access to Intel's private key, you cannot do anything useful with the microcode BLOB, except to try sending it to the CPU and watch the process fail unless the BLOB is unchanged and newer than the microcode currently "running" on the CPU.
And anyway, after a cold boot the CPU just reverts to the original microcode stored in it during production of the specific stepping. The BIOS/UEFI then updates the uCode with its latest version, followed by the OS, which typically has the latest one.
None of which is to deny that Intel has royally f*cked up with this bug, but there is no justification for blaming the microcode update mechanism for being a security liability. It is not.
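In pseudo-Python terms, the acceptance flow described above looks something like this (a toy model only - field names and layout are illustrative assumptions, not Intel's actual blob format):

    from dataclasses import dataclass

    # Toy model: the checks mirror the behaviour described above, nothing more.
    @dataclass
    class MicrocodeBlob:
        revision: int     # must be newer than what the CPU is running
        payload: bytes    # the opaque microcode body
        signature: bytes  # verified by the CPU before anything else happens

    def cpu_accepts(blob, current_revision, verify_signature):
        if not verify_signature(blob.payload, blob.signature):
            return False  # tampered or foreign blob: rejected outright
        if blob.revision <= current_revision:
            return False  # no "updating" to older or identical microcode
        return True       # only now is the update applied (until next reset)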
I did not state that the systems are infallible, but at least in Intel's and AMD's case (as far as we are talking about the x86 world only), there are no known faults in the microcode update implementations. Absence of evidence is not evidence of absence, but so far uCode has proven safe - and not for lack of trying.
I cannot say for sure, but I would place my bet that Intel >really< tested the microcode management part of the silicon. Maybe not primarily for the benefit of customer security, but because of their own business.
Apart from the microcode used for bug fixing and, sometimes, for implementing future instructions (such as the AVX2 GATHER in Haswell), there is also a part which the typical client almost never sees - stuff used for debugging, feature enabling/disabling and operating-point control. With those, it is possible to have more control over the CPU than the "normal" MSRs allow, and to enable facilities which are "not there" as per the model information.
And >that< is protected with the strongest cryptography, for the manufacturer's own sake. Basically, the CPU is policing itself in this case: without the blob passing signature checking, it will refuse to do anything with it, and you cannot "force" it from the outside, other than by either:
a) Breaking whatever encryption Intel/AMD are using, which I am sure is the strongest available
or
b) Physically manipulating the CPU by cutting open the package and modifying the silicon directly. Let's forget the part where multi-million-dollar equipment is needed and such a CPU would last only a few hours; this method is hardly undetectable
or
c) Finding and exploiting a bug in the uCode validation procedure on the CPU. I doubt this is realistic, since such a procedure can be (and probably is) written as simple and mathematically provable code. Maybe this is way more likely than a), but I really doubt such a bug exists.
I do not know for sure, but I would be willing to bet that the authentication and checking of the microcode, and the operating-point protection, are most likely the most audited and checked parts of the CPU design :-)
If I were to think of ways to "exploit" an Intel or AMD x86 CPU, the uCode update would be very low on my list.
Old Intel microcode updates just use some symmetric crypto. If you were really hell-bent on it and had a huge budget, you'd probably be able to grab the keys off the silicon.
Old AMD microcode updates aren't even encrypted.
Neither poses much, if any, security risk, however, as the size and abilities of the code are both VERY limited. You have a (low... 16 or so?) number of patch registers to override locations in the microcode ROM with some (small) amount of microcode instructions, and that's it.
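Something like this, conceptually (a toy model in Python - the slot count and layout are guesses from the description above, not vendor documentation):

    # Toy model of the match/patch register idea: a handful of slots
    # redirect fetches from specific microcode-ROM addresses to patch RAM.
    class MicrocodeSequencer:
        MAX_PATCHES = 16  # the "low number" of patch slots guessed at above

        def __init__(self, rom):
            self.rom = rom       # address -> hardwired micro-op
            self.patches = {}    # address -> replacement micro-op

        def apply_patch(self, address, micro_op):
            if address not in self.patches and len(self.patches) >= self.MAX_PATCHES:
                raise RuntimeError("out of patch registers")
            self.patches[address] = micro_op

        def fetch(self, address):
            # A match-register hit overrides the ROM; everything else untouched.
            return self.patches.get(address, self.rom[address])

    seq = MicrocodeSequencer({0x10: "buggy-uop", 0x11: "fine-uop"})
    seq.apply_patch(0x10, "fixed-uop")
    print(seq.fetch(0x10), seq.fetch(0x11))  # fixed-uop fine-uop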
Certainly nothing that can be used as a remote backdoor. At most, with an insane amount of work, you could - possibly - use it for a local ring 3 => ring 0 or VMX escape. So this is nothing like, e.g., the ME or other truly evil backdoorable "features".
And then there's the small issue of persistence. Microcode updates do not persist across resets - they are loaded by the BIOS and OS on each startup. If an attacker can modify BIOS/EFI or OS drivers, there are countless well-published ways to hose you that are undetectable from the running system, and none of them involves microcode updates.
There are ways to make this extremely hard and unlikely.
Whether Intel adheres to such practices, I cannot say for sure. But considering that their multi-billion-dollar business literally depends on it, and the company does not lack excellent security and engineering talent, I would say that the process is probably as secure as it can be.
Microcode is a lousy place for hiding rogue code anyway, since it reverts to "factory" code after every reboot. You would have to exploit the system firmware or OS and re-apply the rogue microcode on every boot - and if you have already got that far, you probably already have what you need. There is no point investing millions in trying to "crack" a CPU, only to redo the job with every process shrink / generation change.
Also, the microcode is almost certainly tied to the particular architecture, which means it changes at least every two years.
It does not make much sense from a cost/benefit point of view.
I would be more worried about system firmware. Many modern PC / notebook motherboards use system firmware that is at least partly based on the TianoCore UEFI code. If somebody smart found a good hole there, it is likely that such a hole could be exploited on multiple generations of system boards, since a big part of that code is platform-independent and probably not touched much by firmware vendors.
Of course, since at least the Haswell platform (and, I think, at least Sandy Bridge EP), there are ways to prevent this by forcing hardware validation of the firmware images (making flashing patched UEFI images impossible), but such things are rarely enforced - and probably not even wanted in some parts of the home market, where it was/is desirable to be able to "patch" the UEFI image for software piracy reasons.
But, at least in theory, it would be possible to make this process very >very< hard by insisting on hardware validation which cannot be overridden in software at all. Even that is less secure than it sounds, since with many OEMs there is a higher chance that a private key leaks - which is why I would prefer a system with a jumper which prevents >any< firmware update, but this seems too much to ask for today :(
"It seems that the CPU makers are jumping on the same ship the OS makers have been on for years: "We push out the crap, and the customers find the bugs for free.""
Loading microcode at boot isn't a new thing; one of the steps of booting a VAX-11/78x was loading the microcode... The microcode was even documented, so you could cook up your own - and some people did. Note: VAXen weren't the only big-iron boxes that loaded ucode at boot time. :)
VAX even changed its microcode with one or more of the OS upgrades. The new OS came on some tapes with a wee box containing the new microcode in EPROM.
IIRC one of the Burroughs machines flipped microcode on the fly depending on the process executing. That allowed it to use different microcode (e.g. different instruction sets) for, say, COBOL vs FORTRAN programs. Pretty neat trick.
I also remember there being a bug in the VAX hardware on the Venus machine (can't remember the number - 7300 or 7400 maybe) and some bright spark reprogrammed the microcode on-site to work around it.
Plus some recollection of something called MEEP which basically converted an ICL 2900 machine (aka MU5 V2 :-) ) into a 1900 machine so that it could run GEORGE 3 on newer faster hardware with better performance than the native VME operating system.
I think the difference is that these days the maths capability of the CPU is just software that gets loaded into it. I remember this sort of thing from the 68060 and 68040 days: functionality removed from the CPU and loaded in as code.
Transmeta took it to the max by making the whole x86 instruction set something you load into the CPU.
I don't think I ever had a P60; I was still on a 486-DX2 then. I'm pretty sure it was a 90 (I couldn't afford a 100, or didn't want to. This was also a time when rumours surfaced that you could easily clock a 90 at 100 'if you had a good one'). The wiki also mentions the relevant steppings, but as with all wikis, YMMV.
I had a Compaq Presario with a 75 MHz Pentium and a little heatsink on it. I replaced it with a larger fan-assisted one and fiddled some links on the motherboard to up the processor to 90 MHz. No problems at all.
Nice to see the art of drop dead simple overclocking is still alive with the Pi.
> Pentium 90 because of an FDIV bug
I was working at MotRot at the time, battling our 'technical architect' about which processors to use - AMD or Intel. He eventually decided (in his infinite wisdom) that we would use Intel because 'they have never had a problem and you always know what you are getting'. The next day the news of the FDIV bug came out...
Oh, how we laughed.
Bugger ... just looked up 'using FFT to multiply large numbers'.
Why oh why do I allow my curiosity to get the better of me? It's like mathematical spam - enticing the reader in with promises of mathematical benefits, then confuzzling them totally with a random splurge from the symbol font table.
Instead of dangling a fat mathematical worm on a hook, can El Reg in future use such phrases as 'using a method', 'doing a mathematical thing' or 'magically'?
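For what it's worth, the idea is less scary than the symbol splurge suggests: multiplying two numbers is a convolution of their digit sequences, and convolution is cheap in the frequency domain. A quick Python sketch (base 10 and floating-point FFT for readability; real implementations like Prime95's use far more careful arithmetic):

    import numpy as np

    def fft_multiply(a, b):
        # Digits, least-significant first, in base 10 for readability.
        da = [int(d) for d in str(a)[::-1]]
        db = [int(d) for d in str(b)[::-1]]
        n = 1
        while n < len(da) + len(db):
            n *= 2                       # pad to a power of two for the FFT
        fa = np.fft.fft(da, n)
        fb = np.fft.fft(db, n)
        # Pointwise product in the frequency domain == convolution of digits.
        raw = np.fft.ifft(fa * fb).real
        digits = [int(round(x)) for x in raw]
        result, carry = 0, 0
        for i, d in enumerate(digits):   # propagate carries back to base 10
            carry, digit = divmod(d + carry, 10)
            result += digit * 10 ** i
        return result

    assert fft_multiply(12345, 67890) == 12345 * 67890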
We had this with large consumer software for a while: wait for Service Pack 2 before buying.
With Skylake, it looks like we have reached that phase with CPUs as well, and the advice will be: "wait for the second / third stepping before buying, unless you want to be a beta tester".
Anybody who has worked at the system/firmware level of modern CPUs, or has seen/worked with the BIOS Writer's Guide, knows that the complexity of the software needed just to initialize the CPU up to OS boot time has become staggering - probably greater than that of entire operating systems of the late 90s.
This is no defense of Intel - a CPU crapping out while doing hard maths is simply inexcusable - but it does not surprise me one bit, considering what modern CPUs have become: there are so many things (PCIe controllers, GPUs, complex power management, multiple levels of cache and ring buses shared by different on-die peripherals, memory controllers, etc.) which could go wrong when various "weird" conditions are created.
Of course, Intel should have found this on their own during the engineering phase, but it just shows why server CPUs need at least a year of extra quality assurance in order to be "releasable" - maybe that is actually what the minimum quality bar should be, and what we are seeing in the consumer space is just a degradation of quality in the CPU itself :(
I am not optimistic in this regard; to me it looks like we will be seeing worse - the TSX fiasco was not just a fluke. These things are becoming more complex and the market wants them sooner.
In this particular case, the HPC and financial industries were just lucky the bug showed up before the Skylake EP/EX platform launched. I guess they would not be amused if their brand-new Xeon chips crapped out while doing heavy maths.
My i7-3770 will hang within about an hour of boot, running anything, if you don't load the microcode updates.
I got the new motherboard and was running Debian without the intel-microcode package, which updates the microcode at boot (since the BIOS was bypassed and GRUB doesn't do it), so it was running whatever was on the chip.
It kept mysteriously hanging - except an identical board/RAM/CPU did the same, so it wasn't bad hardware. I installed intel-microcode and iucode-tool and then it was all fine.
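If anyone wants to check whether the update actually took, the loaded revision shows up as the "microcode" field in /proc/cpuinfo on x86 Linux; a wee Python snippet (the revision value in the comment is just an example):

    # Report the microcode revision(s) the kernel sees per core - useful for
    # confirming the intel-microcode package took effect after a reboot.
    def microcode_revisions(path="/proc/cpuinfo"):
        revisions = set()
        with open(path) as f:
            for line in f:
                if line.startswith("microcode"):
                    revisions.add(line.split(":", 1)[1].strip())
        return revisions

    print(microcode_revisions())  # e.g. {'0x21'}; all cores should agree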
So apparently Intel CPU microcode is shit to start with.