I can hear a spectre howling...
I can hear a spectre arising from the grave of Intel's performance advantage over AMD and giggling madly at the benchmarks.
OpenBSD has disabled Intel’s hyper-threading technology, citing security concerns – seemingly, Spectre-style concerns. As detailed in this mailing list post, OpenBSD maintainer Mark Kettenis wrote that “SMT (Simultaneous Multi Threading) implementations typically share TLBs and L1 caches between threads. “This can make cache …
Hyperthreading always was a kludge to desperately keep overlong pipelines full.
It is useful for stuff that has been properly optimised, but in the consumer space, where the malware hammer mostly lands, not so much.
Frankly, I wish Microsoft would take these issues as seriously and disable cruft like this in the consumer OS's by default.
"Frankly, I wish Microsoft would take these issues as seriously and disable cruft like this in the consumer OS's by default."
Yes, this would be great. However, it would annoy their long-term buddy, Intel.
That said, if they really are planning their own line of chips, maybe they'll do it to cripple Intel kit and bolster sales of their own silicon. (Cynical, moi? Yep.)
The whole point of hyper threading is that it works well on code which *hasn't* been thoroughly optimised. If you've got two vector instructions lined up for each tick, hyper threading can't get you anything; if your thread is waiting two hundred ticks for the L3 cache to divulge the next operand, having another thread running until it too needs to wait for the L3 cache is extremely helpful.
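A toy model of that latency-hiding argument, purely as a sketch: it assumes each thread alternates a fixed run of useful ticks with a fixed stall on memory, and that the core switches to another thread during a stall at no cost. Real cores are nowhere near this tidy, and the function name and parameters are made up for illustration.

```python
def core_utilisation(compute_ticks: int, stall_ticks: int, threads: int) -> float:
    """Fraction of ticks in which the core does useful work (toy model).

    Assumptions (not from any vendor documentation): every thread repeats
    `compute_ticks` of work followed by `stall_ticks` of waiting on memory,
    and another hardware thread can run during a stall for free.
    """
    if compute_ticks < 0 or stall_ticks < 0 or threads < 1:
        raise ValueError("ticks must be non-negative and threads at least 1")
    period = compute_ticks + stall_ticks
    if period == 0:
        return 0.0
    # Each thread can keep the core busy for compute_ticks out of every period;
    # the core itself can never be more than 100% busy.
    return min(1.0, threads * compute_ticks / period)
```

Under these assumptions, 50 ticks of work per 200-tick stall gives 20% utilisation with one thread and 40% with two, while a thread that never stalls already sits at 100% and a second thread adds nothing.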
I am very sad at the way that people are using fairly hypothetical security arguments to disable the features that make processors actually good at computing: I am willing to do my banking on my phone if that means my actual computers can crunch numbers at a higher percentage of their peak speed.
It's a valid point, but border security is dead (why else would Trump, eh, trump it?); the future is encryption everywhere, even between machines on a LAN, which implies all machines need to be capable of ‘continent computing(TM)’.
The dispatcher should be able to identify two (threads, processes, actors) as privacy-equivalent and permit intimacies such as SMT without surrendering anything. Counter to that, the vendors have been deceptive about these attack surfaces, so at least somebody should be sceptically conservative in response.
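One way to picture a dispatcher that only lets privacy-equivalent tasks share a core, as a rough sketch: tasks carry a trust-domain label, and two tasks become SMT siblings only if their labels match. The labels, the function name and the 2-way SMT assumption are all hypothetical; no real scheduler's API is being described here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def pair_siblings(tasks: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Place tasks on 2-way SMT cores so siblings share a trust domain.

    `tasks` maps a task name to a (hypothetical) trust-domain label.
    Returns one tuple per core: (task_a, task_b), or (task_a, None) when a
    task has no same-domain partner and keeps the whole core to itself.
    """
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for name, domain in tasks.items():
        by_domain[domain].append(name)

    cores: List[Tuple[str, Optional[str]]] = []
    for names in by_domain.values():
        # Walk each domain two tasks at a time; an odd one out runs alone.
        for i in range(0, len(names), 2):
            pair = names[i:i + 2]
            cores.append((pair[0], pair[1] if len(pair) == 2 else None))
    return cores
```

The cost of this policy is visible in the odd task out: it leaves a sibling thread idle rather than sharing a core with a stranger.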
"Hyperthreading always was a kludge to desperately keep overlong pipelines full."
I don't think that's fair. It keeps an OoO processor core busier than it would otherwise be if it could only take instructions from one thread of execution. You lose if the second stream of instructions blows your caches (so your workload is memory-bandwidth-limited) and you win if it doesn't (so your workload is computation-limited). It presumably takes fewer transistors than a second core but makes fault handling more gnarly.
End-users have always been able to switch it off in the BIOS, so OpenBSD are merely making it easy to do that after you've booted the OS. Obviously I don't know whether this is inspired by advance warning of another Spectre-like bug, but it seems more likely to me that they just thought it would be prudent in the current climate to have a switch already in place for their users. As the article points out, it is quite believable that being able to share a core with your victim makes a timing attack easier.
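For anyone wanting to check the current state from a script: OpenBSD's switch is the hw.smt sysctl, while the sketch below is for Linux instead, where reasonably recent kernels expose a sysfs file for the same purpose. The function name is made up, and returning None for "cannot tell" is just a choice made here.

```python
from pathlib import Path
from typing import Optional


def smt_active(path: str = "/sys/devices/system/cpu/smt/active") -> Optional[bool]:
    """Report whether SMT is active, or None if it cannot be determined.

    Assumes a Linux kernel new enough to provide the sysfs file above;
    on other systems (including OpenBSD) the file will simply be missing.
    """
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None  # file missing or unreadable
    if text == "1":
        return True
    if text == "0":
        return False
    return None  # unexpected contents
```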
IIRC AMD said they don't allow cross thread access to those caches... or something about security checks before releasing the cache.
They may have a different exploit (it's still possible other ways), but mostly, other than the Spectre variants, they were doing things differently to Intel.
...of things we know are optimized for SMT. I'm willing to bet any hypervisor is. That means KVM, VirtualBox, Xen, and VMware. I am almost positive that gcc would be on the list.
What about:
Python?
Java?
HTTP servers?
Databases?
Please add a reply if you know for sure about anything that is optimized for SMT so people can be ready if needs be.
Mostly true, maybe, but I am willing to bet that any mitigation will be multi-part: microcode, kernel and applications. Just like the rest of the Spectre work that has gone on. People will want to know if they have to patch an application that they depend on. Also, the way that code is optimized for SMT may have to change.
I could be wrong, but I don't think OpenBSD would make the change unless they have a damn good reason to. They are a very conservative distro known for tight security. If they are going so far as turning off SMT altogether then my money is on something big and complicated.
So then it would be up to the host kernel and only the host kernel for VMs? I suppose that makes some sense. It's just a little surprising that they would have no other optimizations for such a common technology.
As for languages, I sometimes wonder how many CPU cycles Java spends being Java instead of executing code ;-)
Java computational apps benefit from HT being turned on. The language has been gradually undergoing changes to improve threading performance. 4 to 10 concurrently working hardware threads is no big deal. A couple days of contention profiling can get that over 30 without any drastic architectural changes.
Java web servers are much harder to optimize for concurrency. Even the tiniest GC pauses or external disruptions will cause I/O unblocking to clump up. If that clump of waking threads serializes through the same bottleneck, other threads will start getting caught up too. It snowballs and you get weird latency spikes. You can find these with profiling but they're not necessarily in your own code.
This was actually the vector of attack that designers were seriously concerned about fifteen years ago. It's a bit surprising to me that it has taken this long for this to come up.
However.....
Two threads per core ==> very easy to snoop on the other thread.
Four threads per core ==> much harder to know which thread you are snooping on.
Also, if an application says to the OS, "Please start this thread for me. I trust it to play nice with my main thread." I'm hard pressed to understand why the OS should say, "no".
There are a number of processor architectures that have taken things further than the Xeon 2 threads per core model. Off the top of my head the SPARC T1 had 4 threads per core, rising to 8 threads/core by the time we got the T3.
In current use, the XMOS processors use this technique. I think there are between 4 and 8 round-robin slots per physical core, so a 500 MHz processor appears to run 4 independent threads at 125 MHz each, for example (which is handy, as it hides fetch latency etc.).
If you are writing memory-bound software, hyperthreading isn't a win. If you are compute-heavy, it can help, and of course it really depends on what else is running on the cores (outside your control).
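The arithmetic behind that, as a small sketch: it assumes strict round-robin issue with every slot occupied, so each thread gets an equal share of the core clock. The slot counts are the recollection above, not figures checked against a datasheet, and the function name is hypothetical.

```python
def per_thread_clock_mhz(core_clock_mhz: float, slots: int) -> float:
    """Apparent clock of each thread on a strictly round-robin core.

    Assumes every one of the `slots` is in use and each gets one issue
    opportunity per rotation, so the core clock is split evenly.
    """
    if slots < 1:
        raise ValueError("a core needs at least one slot")
    return core_clock_mhz / slots
```

With 4 slots a 500 MHz core gives each thread 125 MHz; with 8 slots the same core gives each thread 62.5 MHz.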
I'm wondering how this interacts with AMD's memory encryption technology. In principle the data of one process can be encrypted, and not comprehensible to another process. However if this new problem is within the CPU, where of course a process's data is handled unencrypted, then it might be a problem.
Interesting stuff - can't wait to find out what it is!
(Can't remember if AMD bother with hyperthreading)
...and asking a bit of advice, I decided to share this.
First the two replies to the message on the OpenBSD mailing list:
https://www.mail-archive.com/source-changes@openbsd.org/msg99159.html
https://www.mail-archive.com/source-changes@openbsd.org/msg99161.html
This tells us who discovered it (Ben Gras of VUSec) and what they named it (TLBleed).
A Google search for VUSec and four mouse clicks leads to this:
https://www.blackhat.com/us-18/briefings/schedule/#tlbleed-when-protecting-your-cpu-caches-is-not-enough-10149
After reading a bit I disabled hyperthreading from the BIOS on my systems. I think that until I know more the folks at OpenBSD are right and it's better to play it safe. I am writing this because I think that security by obscurity is no security at all.
Personally I prefer the 400 thread count cotton sheets. They are very nice and comfortable.
Oh, you were talking about some computing thing? Never mind.
Regards to Emily Litella
The effects of hyperthreading vary quite a bit based on the workload you are feeding it. The simplest method would probably be to profile a day's worth of work, disable HT, then repeat the exact same workload to see if there is a performance difference.
The easiest way might be to build two systems exactly the same (Same hardware, OS, software, etc. but one has HT turned on and the other doesn't), then run some sort of mirroring device so that both machines get the exact same data and do the exact same work.
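Whichever way the two sets of timings are collected, comparing them could look something like this sketch. It assumes you already have wall-clock times in seconds for several repeats of the same workload in each configuration; the function and key names are invented for illustration.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def compare_runs(ht_on_seconds: Sequence[float],
                 ht_off_seconds: Sequence[float]) -> Dict[str, float]:
    """Summarise repeated timings of one workload with HT on and HT off.

    `change_percent` is the change in mean run time after disabling HT:
    negative means the workload got faster with HT off.
    """
    if len(ht_on_seconds) < 2 or len(ht_off_seconds) < 2:
        raise ValueError("need at least two timed runs per configuration")
    on_mean = mean(ht_on_seconds)
    off_mean = mean(ht_off_seconds)
    return {
        "ht_on_mean": on_mean,
        "ht_off_mean": off_mean,
        "change_percent": (off_mean - on_mean) * 100.0 / on_mean,
        # Spread of each set of runs, to judge whether the change is just noise.
        "ht_on_stdev": stdev(ht_on_seconds),
        "ht_off_stdev": stdev(ht_off_seconds),
    }
```

If the change is smaller than the run-to-run spread, it probably isn't telling you anything.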
I've seen MySQL databases do everything from falling to pieces to flying like a speed demon with Hyperthreading in different states. I've seen it vary that much with the same data, but slightly different queries used to process the data. One of our web applications went from an application-based spin-lock structure to using MySQL's atomic operations; in this case disabling HT actually increased performance about 10-15% despite having half as many threads available.
Judging by the parts of the hardware that hyperthreads share (though they are increasingly becoming separate processes by virtue of less hardware being shared as time goes on), hyperthreads were designed to speed up different threads in the SAME program.
It's never been a good fit for unrelated applications as it causes too much non-shared resource contention which can result in slower performance than running the two separate apps on different cpus.
We turned hyperthreading on because it seemed to improve our compilation efficiency. If we have a 6-core 12-thread machine, we compile with 12 parallel processes. In theory, most processes are waiting for the source file to be read from the disk (or the object file to be written to the disk); that's a lot of downtime during which that same CPU core can be used to compile another file that has already been loaded. You have thousands of source files and that time adds up appreciably.
We don't do any heavy computing on our PCs.
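The "one compile process per hardware thread" setup above can be sketched like this. It assumes a make-based build, which is a guess rather than anything stated above; `os.cpu_count()` reports logical CPUs, so it returns 12 on a 6-core machine with hyperthreading on. The function name is made up.

```python
import os
from typing import List, Optional


def make_command(logical_cpus: Optional[int] = None) -> List[str]:
    """Build a `make -jN` command with one job per logical CPU.

    If `logical_cpus` is not given, ask the OS; fall back to a single job
    when the count cannot be determined.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    if logical_cpus < 1:
        raise ValueError("need at least one job")
    return ["make", f"-j{logical_cpus}"]
```

Turning HT off halves the number the OS reports, so a script like this drops to 6 jobs on its own.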
This always made me wonder about AMD's older "FX" architecture where it had two integer cores that shared an FPU. So '8 cores' meant 8 ALU cores, but only 4 FPU cores. Seems fine for most jobs, but if you had a lot of math, it had to be terribly inefficient.
I use an AMD FX processor on my main machine (haven't upgraded to a Ryzen yet).
Honestly, if there is a performance impact in heavy math workloads, it's not really that noticeable, but it's kind of a moot point anyway as compute's almost always handled by the GPU(s).
I've always wondered if the cooling effects of a core not being used all the time would give you enough room in the thermal budget to increase the number of cores overall.
I figure that if a core's usage with hyper-threading turned off is less than half of its usage when it is turned on, that could free up enough electrical energy and thermal capacity to add in another entire core. The break-even might even be higher than that because you'd be dissipating the same amount of heat over a larger surface area.
It'd add to the complexity of the chip, but if you can squeeze out more performance, it could be worth it.
I like the way you think--but the designers are WAY ahead of you. For instance, Apple laptops would sleep between keystrokes--in 2000. I know that the AMD Athlon would power down the parts of the FPU not in use _every cycle_. That's not as big a deal as clock distribution, but I know I heard designers talking about powering down the clocks to the FPU when the FPU enable bit was not set.
Yes, crazy, really crazy, power saving measures became a thing near the death of bipolar (transistor) gates. The design teams saw no reason to ignore those lessons when MOSFET happened. Much has changed in the ten years since I moved on, but I am completely confident that they are only adding techniques when it comes to power savings.
It's only a way to come up with a large thread count without costing more in silicon.
If you're really lucky a hypermarketing fictional thread is worth 15% of a real one. But it is more likely to slow things down.
So glad I have an honest i5.
The performance advantage of the i7 over the i5 has a lot to do with the larger cache, not so much HT.