Hmmm...
I hope the kernel patches are intelligent enough to ONLY apply the 'slowness feature' when absolutely necessary, and not simply slow EVERYONE down to be "fair" (or, on the part of devs, LAZY).
A fundamental design flaw in Intel's processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug. Programmers are scrambling to overhaul the open-source Linux kernel's virtual memory system. Meanwhile, Microsoft is expected to publicly introduce the necessary changes …
Who knows. With Linux of course we can tell. With Windows, possibly benchmarks are our friend if MS are keeping schtum about the matter.
From a security point of view it would be better to leave things as they are if the hardware is not affected; better to be running mature code than to be running what seems like a major update put together in a big hurry.
The other issue here is what MS does regarding Windows 7. It would not surprise me in the least if they tried a clever/efficient patch for Windows 10 and a simpler (and slower) bodge-job for Windows 7/8. Still, I guess we'll find out soon enough. They'd also better make sure that the changes only apply to Intel machines. I don't want MS to arbitrarily slow down my AMD PC as a result of this - you'll note that AMD submitted a Linux patch to ensure their CPUs weren't caught up in this, will MS do the same?
“you'll note that AMD submitted a Linux patch to ensure their CPUs weren't caught up in this, will MS do the same?”
We’ll find out, I guess - the two paths MS can take being “do we release patches that only impact performance on Intel” or “are we working together so closely with Intel that the competition authorities should be involved”?
By the sound of things it looks like the bug allows users (non-admin) to see the guts of the OS. This means that, once published, hackers will be given details as to how to take over the computer if you haven't upgraded (security risk).
What is naturally to be feared is that the first-generation code (patch) will be rushed out just to block these details. New, ad-hoc code means there may be stability issues (return of the blue screen??). If the OS developers run all their security test vectors, no old holes should open up. The question I think you have is whether there will be some new hole; I'm not sure anyone can know for sure - we can only demand that old bugs don't return.
What we can expect is for code to run slower. If I interpret the description well, the more threads you have the slower the code should run, i.e. it should impact Java code (scripts/interpreted code) more than C++ (compiled code). It'll expose individual coding styles and change coders' tactics for performance optimization.
If written intelligently, the kernel would detect the CPUID and supported instruction sets, and apply itself appropriately.
Ideally the kernel would flag all Intel chips, and once Intel fixes their problem, Intel releases a new feature flag that indicates a fix to the problem that the kernel could also flag off of.
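For illustration, that sort of gating might look like the sketch below. Only the vendor check is real CPUID behaviour; the policy function and any future "fixed" feature bit are assumptions on my part, since Intel has published no such flag.

    #include <cpuid.h>    /* GCC/Clang wrapper for the CPUID instruction */
    #include <stdbool.h>
    #include <string.h>

    /* Read the CPU vendor string via CPUID leaf 0 (returned in EBX, EDX, ECX). */
    static bool is_genuine_intel(void)
    {
        unsigned eax, ebx, ecx, edx;
        char vendor[13] = {0};
        if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
            return false;
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        return strcmp(vendor, "GenuineIntel") == 0;
    }

    /* Hypothetical policy: enable the page-table isolation workaround only on
     * affected vendors, and drop it once future silicon advertises a fix via
     * some (not yet defined) CPUID feature bit. */
    static bool need_isolation_workaround(void)
    {
        return is_genuine_intel();  /* && !cpu_advertises_fix() -- hypothetical */
    }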
Kernel memory is mapped into user-mode processes to allow syscalls (a request to access hardware/kernel services) to execute without having to switch to another virtual address space. Each process runs in its own virtual address space, and it's quite expensive to switch between them, as it involves flushing the CPU's Translation Lookaside Buffer (TLB, used for quickly finding the physical location of virtual memory addresses) and a few other things.
This means that, with every single syscall, the CPU will need to switch virtual memory contexts, flushing that TLB and taking a relatively long amount of time. Access to memory pages which aren't cached in the TLB takes roughly 200 CPU cycles or so, while access to a cached entry usually takes less than a single cycle.
So different tasks will suffer to different extents. If the process does much of the work itself, without requiring much from the kernel, then it won't suffer a performance hit. But if it uses lots of syscalls, and does lots of uncached memory operations, then it's going to take a much larger hit.
That's what I make of it from understanding of it, which might not be 100% correct.
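One crude way to check the syscall-cost part of this yourself is to time a trivial syscall in a loop before and after the patch lands. A sketch (Linux, GCC), not a proper benchmark; numbers will vary wildly by CPU and kernel:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        const long N = 1000000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < N; i++)
            syscall(SYS_getpid);   /* trivial syscall: cost is nearly all kernel entry/exit */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per syscall\n", ns / N);
        return 0;
    }

If the separate-address-space patch lands, the per-call figure is where the extra TLB flushing should show up.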
"So different tasks will suffer to different extents." But the hardware...?
"The downside to this separation is that it is relatively expensive, time wise, to keep switching between two separate address spaces for every system call and for every interrupt from the hardware."
So now we'll have a little tax charged for every interrupt - *every* *interrupt*. How much software do you run that doesn't use disk or network or any I/O?
This was not the financial micro-transaction future I was thinking of.
I think we need to return to PDP11, where you had an alternative set of memory management registers for program and supervisor (kernel) mode. When you issued the instruction to trigger the syscall, the processor switched the mode, triggering an automatic switch to the priv. mode registers, mapping the kernel to execute the syscall code..
This meant that it was not necessary to have part of the kernel mapped into every process.
IIRC, S/370-XA and the Motorola 68000 with an MMU also had a similar feature. I do not know about the other UNIX reference platforms like the VAX (the BSD 3.X & 4.X development platform) or the WE320XX (AT&T's processor family used in the 3B family of systems - the primary UNIX development platform for AT&T UNIX for many years), but I would suspect that they had it as well.
I first came across the need to have at least one kernel page mapped into user processes on IBM Power processors back in AIX 3.1, where page 0 was reserved for this purpose. In early releases, it was possible to read the contents of page 0, but sometime around AIX 3.2.5, the page became unreadable (and actually triggered a segmentation violation if you tried to access it).
That's true. An address space change would have to disable speculative execution, because the CPU would also have to try to predict which address space it would be in.
Actually, thinking about it, it still has to, because if the mapped page is protected from view, there still needs to be some mechanism to lift the protection to allow speculative execution of the branch of code before the decision is taken. But in theory, the results of the branch-not-taken should be discarded as soon as the decision is made, so that the information gathered cannot be used. Maybe there is something in the combination of speculative execution and instruction re-ordering (not mentioned yet) which allows data to be extracted from later in the pipeline.
Maybe this is the problem, and if it is, it's probably a design flaw rather than a bug. Interesting.
The leak may not even be data; it could be as small as a timing change from the spec-ex path taking longer to fault, or not, allowing the attacker to probe the kernel space for invalid/valid pages - so defeating, to an extent, the kernel memory-map randomisation. Worse might be a spec-ex branch on speculatively-read secret data which affects timing similarly, without directly exposing the data itself.
Now that would be embarrassing to the Rust evangelists, who think their one trick pony solves every possible security issue! They'll suppress that idea with their usual fervor I bet. To me it's the height of vanity to think you can pre-define and therefore pre-solve every possible side channel and backdoor attack. The endless game of thief vs locksmith is...endless...
"the results of the branch-not-taken should be discarded"
From the AMD hint, the problem isn't speculative execution as such; it's fetch hardware that reads memory before checking permissions, presumably changing cache state irreversibly. If speculative execution suppresses the privilege violation because the code path is discarded, there's no way to detect the event or take any remedial action like invalidating the TLB. However, invalidating the TLB would leak address layout information anyway!
The correct thing is blocking the fetch ops completely while still potentially raising an exception if that path is taken. Better, raise an exception anyway. Which appears to be the AMD approach. Intel look like they saved some transistors and maybe gained a tiny speed advantage without thinking it through.
I think we need to return to PDP11
While the elegance of the PDP-11 design is beyond dispute, it wouldn't scale to the performance of current systems. Notably, its memory management was very simple: 8 base and length registers for each execution mode. As soon as you go much beyond a 64k address space it becomes infeasible to have a full set of mapping registers on the CPU and you're forced down the TLB road.
What might well make sense is if it were possible to run the (key parts of the) kernel in a "protected real" mode (i.e. with no virtual address translation at all, but with some form of memory protection using keys or limit registers). If you don't have enough physical memory to contain a kernel, you're not going to make much progress anyway. And it's only one of many areas in which improving performance with caching of one kind or another leads to potential anomalies with specific execution patterns.
Not that any speculation (sorry!) of that kind helps with the current debacle - but it does illustrate how the growing complexity and unauditability of processor designs has largely been overlooked until recently, while we've been principally worried about software security.
"What might well make sense is if it were possible to run the (key parts of the) kernel in a "protected real" mode"
If the kernel is in the same address space as user code and using the same pipe then nothing changes. Either a separate CPU/cache for kernel code and without speculative execution, or scrap the whole model.
Basically no kernel code should be run before security has been validated, and that means stalls in kernel code execution.
"If the kernal is in the same address space as user code and using the same pipe then nothing changes. Either a seperate CPU/cache for kernal code and without speculative execution or scrap the whole model.\"
Even in user space page permissions exist. There's nothing intrinsic to a flat, shared address space that stops a CPU enforcing all permissions at the thread/page/exe page level, all the way down to prefetch and cache access. Separate cache/memory systems per ring is a high price to pay to replace access control logic in the memory system. Dumping cache state is an even higher price to pay for not having that logic.
Not really. "key parts of the" kernel are interrupt service functions... and they need to be able to address the entire memory.
Real mode sucks.
What is really needed is better architecture.
One set of registers for each of interrupt, kernel, supervisor, and user modes at a MINIMUM. Then add a separate cache for each level - though not necessarily all the same size.
The problem with the PDP11 method is that the kernel code has to do fiddly read-userland-from-kernel operations to get the call's parameters and data. If the kernel is paged in and running in kernel memory at &o004010 and the caller has its parameters in userland at &o004010, the kernel can't read them by just reading from &o004010, as that would read kernel memory; it has to use Move From Previous Instruction space (MFPI) instructions to copy the parameters to local (kernel) space, and Move To Previous Instruction space (MTPI) instructions to copy stuff back to userland, such as loaded data.
That is true, but for volume data moves was mitigated by DMA from disk directly into memory-mapped buffers in the process address space, using the UNIBUS address mapping registers, which allowed raw DMA transfers to addresses outside of the kernel address space.
Of course, not all PDP11 models had the UNIBUS (or, I presume, a similar QBUS) feature, but pretty much everything after an 11/34 would have. I had an unusual 11/34e that also had 22-bit addressing, which made it much more useful.
Only for the parameters.
Once past that, the kernel could use a single page table entry to map to whatever user memory was needed.
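The same discipline survives in modern kernels: a user pointer is never dereferenced directly, it goes through a validating copy helper (copy_from_user and friends on Linux). A much-simplified userland sketch of the shape of such a helper - the address-space limit is a made-up constant, and a real kernel uses a fault-tolerant copy rather than memcpy:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical top of the user half of the address space. */
    #define USER_TOP 0x0000800000000000ULL

    /* Reject any range that isn't entirely within user space. */
    static bool user_range_ok(uintptr_t uaddr, size_t len)
    {
        return len <= USER_TOP && uaddr <= USER_TOP - len;
    }

    /* copyin: fetch syscall parameters from user space into a kernel buffer.
     * On a PDP-11 this is where MFPI would reach into the previous
     * instruction space; on a flat-mapped kernel it's a checked copy. */
    static int copyin(void *kdst, uintptr_t usrc, size_t len)
    {
        if (!user_range_ok(usrc, len))
            return -1;   /* would be -EFAULT in a real kernel */
        memcpy(kdst, (const void *)usrc, len);
        return 0;
    }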
For the Intel processors... this bug just about kills any microkernel: microkernels were already slow, and now they become 20% slower.
Microsoft's hybrid microkernel is going to have fits with this.
@Gathercole
Back in the 1970s I was asked to make a PDP11 system go faster. It was running DEC's real time OS, viz RSX11. The critical part of the program needed to make a lot of OS calls. I arranged for that region of the program to be mapped into kernel space (using a 'connect to interrupt' facility) and got my speedup. The cost was a section of high-risk code.
There is a reason why interrupts used to return control to the kernel. It may be that a disk transfer has finished, thus allowing some more important program to resume. In more general terms, the context has changed and the system should adjust, particularly if several programs are running. The PDP11 could respond to interrupts within microseconds, perhaps to capture volatile data from fleeting registers on special hardware; but it could be a long time returning to the point of interrupt.
The reason why PDP-11 could respond so quickly to interrupts was this facility to switch address spaces without having to save the register context. On other architectures, in order to take an interrupt, the first thing that you need to do is save at least some of the address registers, and then restore them after you've handled the interrupt.
IIRC, the PDP-11 not only had duplicate address mapping registers, but also had a duplicate set of some of the GP registers, so you had to do pretty much nothing to preserve the process context that's just been interrupted. This is what made the interrupt handling very fast.
The time to return from the interrupt was entirely down to the path length of the code handling the interrupt. The actual return mechanism was as quick as the calling mechanism. There were unofficial guidelines about how long your interrupt code should take, which I believe were conditioned by the tick length for the OS. If you took too long, you would miss a clock-tick, which would result in the system clock running slow.
In addition, I also believe that there were a small number of zero page vectors that were left unused by either UNIX or RSX11/m (the version of RSX I was most familiar with) that allowed you to add your own interrupt handlers for certain events.
"I think we need to return to PDP11, where you had an alternative set of memory management registers for program and supervisor (kernel) mode."
There are already plenty of non x86 derivatives out there that don't have this bug, all that's required is folks to make the move. :)
Would be nice if vendors updated their benchmark results in the light of a 30% performance hit, so we can get an apples-apples comparison against processors that don't suffer from this particular fault.
I think that you should say there were plenty of non-x86 processors out there. There really aren't any more, with just AMD (which is an x86 derivative, but may not be affected), Power, zSeries, ARM, the tail end of SPARC, and I suppose Itanium (just) still being around.
You could also say, I suppose, that there is a MIPS processor around still, but you'd have to buy it from the Chinese.
A lot of other architectures never made the switch to 64 bit (although Alpha was always 64 bit). Architectures we've lost include Alpha, PA-RISC, VAX, all Mainframe apart from zSeries, DG Eclipse, Motorola 68000 and 88000, Nat. Semi. 32XXX, Western Electric 32XXX, various Honeywell, Burroughs and Prime systems, and various Japanese processors from NEC, Hitachi and Fujitsu.
This is largely the cost of wanting cheap, commoditized hardware. You end up with one dominant supplier, and suffer if they get something (or even a lot of things) wrong.
"I think that you should say were plenty of non-x86 processors out there."
There are still plenty out there, not all of them will be a viable alternative for your application...
"There really aren't any more, with just AMD (which is an x86 derivative, but may not be affected),"
In my view AMD share the same problem as Intel: the x86 ISA (64 bits, extensions, warts and all) is simply too complex to test properly. It's a scalability limit in the design space - and this isn't a new problem - it goes back decades. We are seeing bugs span multiple steppings AND generations of product line as a matter of routine. The x86 vendors are physically unable to sell us a fully functional chip even if we pay top dollar for it.
As I see it, as customers, we have no alternative but to go to other ISAs over the long run - simply to get a working chip without the "feature-itis" imposed by 30+ years worth of workarounds.
It seems to be an architectural bug from what I'm seeing - probably related to the need for speed being the highest priority for marketing while security takes a back seat. We never worried about "security" in the old days of processor design, we were far more worried about incorrect access causing a crash and that took priority - with the result that modern security issues were mostly nonexistent.
"We never worried about "security" in the old days of processor design"
How old is old ? MMUs have been around a long time now.
"We never worried about "security" in the old days of processor design, we were far more worried about incorrect access causing a crash and that took priority - with the result that modern security issues were mostly nonexistent."
Seems to depend on where you worked - some vendors never embraced KISS. The protection features of the DEC Alpha were far easier to understand, use, test and verify than the equiv plumbing on the much older i386 for example.
No, you don't get a new CPU, you get a software workaround.
There is no time like the present for China to become a world player in CPU design.
Around the turn of the millennium I was able to read and write SCO Unix kernel variables (specifically uptime) as root using a simple shell script. Utilities such as 'ps' likewise ran in user space and needed to read process tables from kernel memory.
"Hmm, good timing... I bought a new PC just before Christmas and decided to go AMD after about a decade of Intel..."
Good call. I just bought a new laptop myself and went with AMD as well. It's going to be some time before updated chips from Intel come out as this is a hardware problem with speculative instruction execution on the pipeline. Hardware means that Intel will have to redo all the masks that are used for the fabrication process.
There is precedent here. The kernel still tests for the F00F bug on boot and applies the kernel-level workaround only on systems that need it. Thus the slowdown is only applied where it is needed. I expect the same will happen in this case, although we are talking about a significant architectural change here.
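You can already see that precedent surfaced to userspace: errata the kernel has detected and worked around are listed in the "bugs" field of /proc/cpuinfo on reasonably recent kernels. A small check for the F00F flag (whatever flag ends up naming this new bug is unknown at the time of writing):

    #include <stdio.h>
    #include <string.h>

    /* Scan /proc/cpuinfo for the kernel's per-CPU "bugs" line. */
    int main(void)
    {
        char line[512];
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f) { perror("/proc/cpuinfo"); return 1; }
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "bugs", 4) == 0 && strstr(line, "f00f"))
                printf("kernel has flagged the F00F erratum: %s", line);
        }
        fclose(f);
        return 0;
    }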
Here I was, all happy with my i7 6700 that has served me well for the past two years, and now I learn that I'm basically going to have to replace the hardware if I want to stay secure and have good performance. What a nuisance.
Another round of Windows reinstall, with another fracking call to Redmond to justify that I am indeed the owner of this shit. I hate the idea already.
Ah, the day games are made for Linux first . . .
My uneducated guess is that the brute force protection code is being implemented. This code should give Intel some time to arrive at a more sophisticated microcode solution where the overhead is perhaps one or two dozen microcode instructions. With a microcode fix, the patch can be removed, or made specific to certain models of CPU.
I like to be optimistic, not pessimistic. A low overhead fix will be developed AQAP. (As quickly as possible)
A problem seems to be that the data is fed into the data pipeline (and L1 cache?) via speculative execution. To simplify the problem... if you have some code like:

    if (false) {
        x = some_data_that_shouldnt_be_readable;   /* speculatively executed anyway */
    } else {
        do_something_slow();                       /* delays resolution of the branch */
    }
    y = some_data_that_shouldnt_be_readable;

The x= gets the data loaded into the cache, while the slow code is slow enough to make sure it gets there. Then the y= pulls data that is already in the cache (and whatever makes up the other 64 bytes of its cache line), and that might not be checked against the permission bits in the virtual memory tables. I can't think of any situations where the x86 does speculative writes that would hit memory, so this should be limited to reading data. The trick might work to slow down memory sharing on multi-core systems. Some x86 I/O is read-based, so reading a memory location could reset a counter or buffer; that would be a problem limited to some I/O devices. If someone can come up with a way to have the speculatively read data written back through the cache, the security game is over.
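The missing piece in that sketch is how an attacker would observe what the speculative x= left behind, and the usual answer is a cache-timing probe: flush a line, let the speculative code touch it (or not), then time a read. A minimal sketch of just that probe primitive, assuming x86 with GCC/Clang intrinsics - the calibration, probe array and branch mistraining a real attack would need are all omitted:

    #include <stdint.h>
    #include <x86intrin.h>   /* _mm_clflush, __rdtscp */

    /* Evict one cache line, so a later fast read can only mean
     * something (e.g. a speculative load) brought it back. */
    static void flush_line(volatile uint8_t *addr)
    {
        _mm_clflush((const void *)addr);
    }

    /* Time a single read in cycles: tens of cycles suggests a cache hit,
     * a couple of hundred suggests it came from DRAM. */
    static uint64_t probe_cycles(volatile uint8_t *addr)
    {
        unsigned aux;
        uint64_t start = __rdtscp(&aux);
        (void)*addr;                     /* the access being timed */
        return __rdtscp(&aux) - start;
    }

If the speculation above is right, a full attack would flush 256 such lines, have the speculative read index one of them with the stolen byte, and whichever line then probes fast reveals that byte's value.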
"IANAL but surely the chips we have, once patched, will no longer be performing as advertised?"
I was thinking something similar. The Lenovo laptop I bought less than 18 months ago is still in warranty so is it reasonable to ask for a refund/replacement?
Intel's CEO Just Sold a Lot of Stock
https://www.fool.com/investing/2017/12/19/intels-ceo-just-sold-a-lot-of-stock.aspx
(Via Jackie Stokes)
Wonder if the SEC is going to investigate this.
Of course they're not going to be doing that; most regulations in the US that held back predatory forms of capitalism have been neutered, and the ones that are still in place are not being enforced.
If you are a UK based consumer (not business) you have up to 6 years to make a claim against the business that sold you the CPU (not Intel) because of the wonderful Consumer Rights Act. If the manufacturer admits the fault, then all the relevant criteria have been met. A 30% performance loss would be considered unreasonable without compensation.
The "First 6 months" and "since months or more" paragraphs on the Which website explain it best here https://www.which.co.uk/consumer-rights/regulation/consumer-rights-act
Under Australian law this probably fails the ACL guarantee and would be classed as a major problem, in which case consumers are entitled to a remedy of repair, refund, replace. The choice of remedy is down to the consumer.
Quoting the ACCC:
A product or good has a major problem when:
* it has a problem that would have stopped someone from buying it if they’d known about it
* it is unsafe
* it is significantly different from the sample or description
* it doesn’t do what the business said it would, or what you asked for and can’t easily be fixed.
I think you can certainly argue that it has failed the first of these, and probably the 3rd and 4th as well.
More details here:
https://www.accc.gov.au/consumers/consumer-rights-guarantees/consumer-guarantees
https://www.accc.gov.au/consumers/consumer-rights-guarantees/repair-replace-refund
I'm wondering how this will affect the contractual relationships between compute cloud vendors and their customers when the N processors the customers are paying for no longer get the job done. Oh to be a fly on the wall during those discussions...
If you're in the United States then the phrase you want to suggest to your lawyer is "Defective By Design". I am not a lawyer, but if it takes a major software fix to ensure security, that's pretty much spot-on the definition of a DBD claim.
If an auto maker had to fix a serious issue that robbed the engine of nearly 30% of its power delivery capacity, there would probably be a general recall & more lawsuits than you could shake a stick at. So a CPU that has to be fixed & the fix robs it of ~30% computational capacity? Yup. DBD.
Intel's GOJF card
They are not slowing the processor, the OS is, so no comeback on Intel
OS is slower because "security" and they are informing you before it happens, so no recourse to MS
So basically we all get screwed and Intel, MS et al. get to keep their mega bucks.
> They are not slowing the processor, the OS is, so no comeback on Intel
I don't think that argument will work for Intel. Their chips are not working to specification: the user-mode/kernel-mode separation is supposed to be policed by the CPU, and it's leaky. There is no chance of them hiding this fact.
They could try to spread the blame by claiming that some of the slowdown is due to badly coded workarounds by Microsoft etc but they can't escape the blame for workarounds being needed.
@EnviableOne
"They are not slowing the processor, the OS is, so no comeback on Intel"
I don't think this will wash under Australian consumer law as one of the key guarantees a product must meet is that it has a major problem entitling the consumer to a repair/refund/replace remedy if (quoting the ACCC):
* it has a problem that would have stopped someone from buying it if they’d known about it
If you had been told at the point of purchase that x months down the line, a software patch is going to be required due to a flaw in the CPU that would slow it down by a significant margin would you still have bought it?
Going a bit off topic...
If you are really stuck for something to read, the legislation setting out these guarantees is part of Schedule 2 of the Competition and Consumer Act 2010
https://www.legislation.gov.au/Details/C2015C00327
For something more readable, I would heartily recommend that everyone has a look at this:
https://www.accc.gov.au/publications/electrical-whitegoods-an-industry-guide-to-the-australian-consumer-law
I always print this out to take with me whenever I need to take something back to a retailer. It is amazing how quickly they change what they are saying when you show them that you know the law.
Shout out for KMart, Bunnings and BigW as in my experience they usually provide excellent customer service and will refund or replace even when they don't have to under the law.
Big boo to JB HiFi who basically lie to you about your consumer rights and have to be shown the document above to get them to do anything. Their weasel word "Minimum Voluntary Warranty Policy Guide" states that your rights can't be restricted then goes on to attempt to restrict your rights by stating that you are not allowed a refund after 3 or 6 months, depending on the price of the goods in question. The in store staff will always tell you that these are hard limits and you can't get a refund after these times, assuming that they haven't instead tried to fob you off to the manufacturer, which is also illegal.
https://www.jbhifi.com.au/Documents/Consumer%20Warranties%20and%20Refunds%20docs/YourRights_July2014_HR_02_v3.pdf
This is from personal experience at two separate JB stores. Maybe others are better, but considering this guide comes from their head office I won't be holding my breath.
Please note that I don't blame the shop floor staff. I have worked in retail and know what it can be like. The problem comes from corporate policy and training. The staff seem genuinely surprised when you show them the ACL guide and point out where they are breaking the law.
Sorry. Rant over.
I'm typing this from a PC with an AMD CPU and running Ubuntu. Aren't I feeling smug right now.
Given how many people are affected, I can't see Intel replacing the hardware for free. This is worse than the infamous Intel floating point math bug.
I'm now waiting for people to re-run loads of benchmarks after the patches come out to see how much performance was lost.
2017 was truly AMD's year. First they introduced several interesting CPUs (if you are into multi-core designs), followed by some decent GPUs, and now this.
I switched from AMD when Intel introduced the Core 2 Duo. This year I think I will switch back to AMD. Finally some competition (again)!
"This year I think i will switch back to AMD. Finally some competition (again)!"
Well, for the last ten years AMD have been competitive; it's been a trade-off between a performance hit on AMD and a security hit on Intel.
AMD published their performance specs, Intel didn't tell you their hardware was compromised, but made much of their higher performance with regard to AMD.
Turns out your confidential data has been exposed on Intel hardware for ten years. But you got improved performance. That trade off could have been a commercial decision, Intel concealed the information, that makes it a problem.
I have a Ryzen, and have yet to notice any noticeable performance problems, be those from unoptimised software or just generally.
And I'm running Win7 (which apparently can't be done...?), so if there is a patch released to cripple my machine then I'm not installing it, on the basis that this bug doesn't affect AMD chippery anyway.
And a point? The opposition has just slowed down by between 5% and 30%, depending on workload. I think this has just handed AMD's Ryzen processor the performance crown.
While I tend to roundly ignore benchmarks, on the basis of them being utterly artificial and very unlike real-world conditions, I'd be very interested to see how much of a difference this makes.
"While I tend to roundly ignore benchmarks on the basis of them being utterly artifical and very unlike real world conditions i'd be very interested to see how much of a difference this makes."
Not enough. AMD's top offerings benchmark about 25% lower than Intel's. Unless Passmark have added a lengthy SQL operations section to their benchmarking this year, AMD are unlikely to be taking the top spot.
In price-per-GHz, on the other hand, they were already running away with it, and this has just turned that into an even more substantial lead.
And apparently the earth's core (magnetic, not DDR) is slowing down enough to cause massive earthquakes in 2018.
I imagine that if one of those plate tectonic things decides to nudge its neighbor a bit more forcefully than usual then we won't really give two shits about a couple of nano-seconds off our processor speed.
And apparently the earth's core (magnetic, not DDR) is slowing down enough to cause massive earthquakes in 2018.
Oh hey, someone found another copy of "Nostradamus' Terror Predictions That Are The Same As Last Year" while cleaning the attic on the 1st?
I predict war with Iran on behalf of Israel btw. My bookie says odds are good.
**Intel Marketing Memo - Strictly Confidential**
OPERATION SNAILPACE
You’ll have read in the press about this new ‘bug’ which cripples the performance of our CPUs - this applies, of course, only to older models for which we are no longer making any money. Unfortunately, customers had begun to notice that our latest chips are not significantly faster than our older models. With this exceedingly cunning plan we are now able to market our new silicon as 50% faster than the old stuff (after adding on the standard 20% hyperbole)
This isn’t a one off either - from now on each tock will come with a crippling bug that destroys performance of older CPUs and each tick will fix the problem.
But remember - this is top secret so don’t let the press find out.
To just leave my Windows 7 install offline (except for games... I'm not expecting too many exploits through there) and Linux for online?
I'll need a bigger SSD for the dual boot. :( But at least I'll be safe and avoid slowdowns (30% on webpages in Linux is OK, but not in games :( )
If you're running Amazon S3 or Microsoft's cloud then you can't possibly ignore this. The threat is really in multi-user environments where one user could run software that compromises another.
However, this is not such a critically serious threat for a single-user desktop. OK, the flaw exists and can be exploited and could be as serious as a keylogger. But... we've been dealing with that with AV software since forever, and one could choose to take the view that they accept the risk of a malware infection that could exploit this in a single-user environment.
Short answer: no. The problem is in the cpu, not the OS, so your course of action is to either buy an AMD cpu, or use an older version of an OS that doesn't apply the throttling and then make sure you only use trusted software (e.g. an audio or video production workstation where you control every application that is installed and you don't browse random websites). Switching from Windows to a *n*x while keeping your affected cpu trades the same problem for the same problem.
To just leave my Windows 7 install offline (except for games... I'm not expecting too many exploits through there)
Ben, you might want to consider some additional research. Malware targeting online games and digital marketplaces has been around for quite some time. Comparatively speaking, game consoles probably offer a somewhat reduced attack surface, but hacks of both the PlayStation Network and Xbox Live attest that nothing is 100% secure.
Intel likely to sell a bunch of new chips - $PROFIT$
How about a free-issue replacement chip of the appropriate generation to everyone, so that we don't have to take a performance penalty for their mistake - think it's called warranty. MS will probably need to disable the hardware change = new licence thing though.
Intel could also regain trust by wrapping in the fixes for other big screw-ups too - like the management chip, which can be disabled or better still removed.
Oh and don't forget all the microcode changes too.
Intel could also regain trust
What, after this? And following so quickly on the Puma 6 fuckup, and the Intel Management Engine fuck up, and a good fistful of older mistakes. Intel are as trustworthy as Microsoft, Google, or even Uber. The US tech sector at its finest.
Intel: Fuckups R Us.
"Intel could also regain trust by wrapping in the fixes for other big screw-ups too - like the management chip, which can be disabled or better still removed."
That wouldn't regain my trust at all. Trust is only regained over time, by having a track record of not doing bad things.
replacing free of charge
I don't see that happening, ever. Especially not for Skylakes and older. The cost would be exorbitant, especially for all the machines where it's soldered on, like my new laptop. At best, they'll make a deal on the class-action lawsuit, giving everyone $5 for their troubles.
... selling new CPUs without upping any specs. How convenient indeed.
Technically it is upping the specs, since the new processors wouldn't need to be kneecapped by software in order to be "secure". I imagine the marketing line will be something like 'our new processor runs as fast as the old one was supposed to, without pissing ring-0 data everywhere!'
Intel could also regain trust by wrapping in the fixes for other big screw-ups too - like the management chip, which can be disabled or better still removed.
That would be a step in the right direction, getting rid of their mis-management chip might help clear complications in design.
It's like that damn IOT, always adding new 'features' and 'snoop'.
When exactly did good sensible engineering go out the window in favour of marketing ideas which seemed great in endless ego preening meetings and then were badly implemented in a hurry on the way to the next ego preening session,..
...Just asking....I've not bought Intel since Atom. The only d/w purchase I regret of late, an Nvidia GPU, nothing but an ugly boot up and a series of weird incompatibilities...
When exactly did good sensible engineering go out the window in favour of marketing ideas which seemed great in endless ego preening meetings and then were badly implemented in a hurry on the way to the next ego preening session,..
Relatively speaking....around the time that "return shareholder value" became a corporate catch-phrase and started being used in just about every corporate operation manual and press release.
How about a free issue replacement chip of the appropriate generation
Which *might* be OK for desktops and mini-PC formats (assuming that the processor isn't soldered in place) but isn't going to work for laptops (I've yet to see a laptop made in the last 20 years where the processor isn't soldered in place).
Nope - it's going to have to be a nice new AMD-based MacBook Pro for me, paid for by Intel..
(And 4 new Xeon chips for my server, without the bug plus compensation for the downtime and effort of replacement - probably easier for Supermicro to just send me a new server sans drives and let me just swap them over..)
And 400 new processors for work.
You may be right about soldered processors in Ultra books or NetBooks, but I can assure you that many business laptops from people like IBM/Lenovo, HP and Fujitsu still have their processors in sockets.
It's just the ones where the supplier does not care about maintenance and also tend to use glue to hold the systems together that don't.
Want to bet Apple is considering expediting their homegrown A series chips into products that currently use Intel processors?
Apple just released their iMac Pro that starts at $5000. How happy are they going to be telling their customers they’re about to see a 30% processor performance hit?
Intel has been the Gold standard in processors, turns out it’s Copper Inside(TM).
Apple has Intel at their beck and call via the implied threat to switch to AMD or their own SoC, they might be able to get Intel to supply replacement CPUs if the ones in the just-released iMac suffer from the bug. Issuing a recall and replacing the CPU for free would be good PR after the black eye from the iPhone battery business, and wouldn't cost a whole lot because they couldn't have sold that many of the new iMacs yet.
I think you'll find that Core, i series and Xeon processors are all installed in sockets.
Atom processors are designed in packages intended to be soldered onto system boards. Everything else is in sockets that allow the processor to be replaced. But the problem here is that Intel keep changing the socket design, so you just can't put new processors into old motherboards.
This means that if you are upgrading a system piecemeal, rather than all at once, you end up having to replace not only the processor, but also the motherboard and probably the memory as well.
I would very much like Intel to be forced to support older sockets for longer, so you could give a system a relatively non-intrusive processor upgrade without having to tear the whole system down.
But the problem here is that Intel keep changing the socket design, so you just can't put new processors into old motherboards.
To be fair, AMD do that too, all that 'AM' number malarkey, so you're never 100% certain what you've bought is actually going to fit what you've got until they're both sitting in front of you and it's 'meccano time'. Then there's all that fun with fans and coolers that makes me sweat.
It could be an expensive repair
Yup. Cheaper by far to replace the whole motherboard in one go. Unless your SSD is also soldered on. In which case it's a tad more complex.
And would you trust your data to a hardware replacement company? Let alone having a problem with a Bitlocker or APFS-encrypted drive where the local key is in the TPM locker.
I foresee that it could get very expensive for Intel, very, very quickly.
Intel has been the Gold standard in processors.....
That was a joke, right ??
Going right back to the original 80386, which had a pretty serious Virtual 8086 mode bug (it broke the EMM emulators, needed a motherboard fix), on to the famous FP bug - "it's not a problem, it only happens every few million instruction executions...."
All Intel have offered that's a "gold standard" is backwards compatibility, and the only time they tried to drop that (at least in the hardware - Itanium), it didn't end well and let AMD in to define the 64-bit x86 architecture.
Although, to be fair, the Itanium issues were more about the difficulties with VLIW architecture.
>>Intel has been the Gold standard in processors.....
>That was a joke, right ??
There have been other black marks too. The 3.8GHz Prescott P4s that throttle under repeated high load. The whole dirty Rambus saga, the 1.13GHz PIII that had to be recalled due to stability issues, the CPU serial number privacy scandal, etc.
Although Intel seemed to have turned a corner since Core 2 Duo came along, they've made loads of previous muck-ups.
About 15 years ago I worked for a graphics software co that discovered a bug in Intel graphics chipsets. Our drawing software used a command (new, I think, in Win98) to draw a rectangle in a single call. Our competitors continued to use the old method of drawing four lines in consecutive steps (automatically from the user's POV, but slower). Intel hadn't implemented the new command, so our software suddenly crashed if a user tried to draw a rectangle. Other, less elegant, applications worked fine. We of course were blamed, despite our devs making a simple demo app whose only function was to draw a rectangle, crash and log the problem. Intel refused to talk to us even after the company sent a copy of the app on disk via registered mail. We sent out an upgrade that probed the hardware before deciding which command set to use, which would have slowed the operation slightly but fortunately not really significantly.
If your 'Intel Inside' PC suddenly goes 30% slower, what chance they'll talk to you?
Glad I use Linux on AMD.
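For the curious, that probe-then-fallback fix pattern looks roughly like this in miniature. All function names are hypothetical stand-ins, since the real driver API isn't named above:

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for the real driver entry points. */
    static bool hw_rect_works(void) { return false; }  /* pretend the probe failed */
    static void hw_rect(int x, int y, int w, int h) { printf("rect %d,%d %dx%d\n", x, y, w, h); }
    static void hw_line(int x0, int y0, int x1, int y1) { printf("line %d,%d -> %d,%d\n", x0, y0, x1, y1); }

    /* Use the single-call rectangle only if the startup probe succeeded;
     * otherwise fall back to the slower four-line method. */
    void draw_rect(int x, int y, int w, int h)
    {
        if (hw_rect_works()) {
            hw_rect(x, y, w, h);
        } else {
            hw_line(x,     y,     x + w, y);
            hw_line(x + w, y,     x + w, y + h);
            hw_line(x + w, y + h, x,     y + h);
            hw_line(x,     y + h, x,     y);
        }
    }

    int main(void) { draw_rect(10, 10, 100, 50); return 0; }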
I worked for Intel.
Other parts of Intel would not talk to us, even when we were finding significant issues with the silicon.
The way Intel works is that every now and then, it throws lots of silicon at the wall. Some sticks. Some falls off. Some slowly slides down.
Once a product meets an internal gate - poof - all the people go off and work on other stuff.
In another lifetime I did onsite warranty fixes on IBM PCs/laptops for Intel. We would get the occasional laptop whose Wi-Fi did not work. We'd get the laptop in and it worked fine. Then one day we found out that all of these laptops were not working at Starbucks. So you can guess what we thought. Then we found out that certain Intel Wi-Fi chips used a buggy driver that would not work on certain Cisco APs if channel 7 or 11 was used. Kind of like kissing a frog under a blue moon causing the destruction of the earth. Turns out Intel did have a driver update buried deep in their web site. Kicker is you could never update the driver from IBM on Windows again, else you ran the risk of Wi-Fi not working when waking from sleep (solution was another driver update from Intel that was also well hidden). You would think, given we were fixing computers for Intel, someone at Intel would have told us.
That's not new. All SoCs come with an errata sheet of hardware bugs that must be corrected by software. Be it Intel, AMD, ARM, Infineon, Qualcomm... there is no such thing as bug-free hardware. That's because hardware validations are much more costly than software ones. A CPU vendor will not modify a verified design if a bug can easily be worked around with a software patch.
That said, this Intel bug is of epic proportions. A huge optimization flushed down the toilet.
While possible, it seems more a case of "don't ask, don't fix". The performance cost of fixing it in silicon, I assume, is only a percentage point or two, whereas the software fix above is noted as up to 30% for some.
So while Intel had an advantage, I think they could have afforded to give up that tiny bit and still lead. But if it slipped through... they could only keep going, at risk of big problems if found out. And after 10 years? Well, even I would think the little "problems" will just go away... only to be told by the Dr they are worse. :(
Have they been doing this for years, knowingly, for a performance gain against the competition?
It sounds closer to what I would call a "Pinto Defect" were I to make up a clever term (without all that nasty exploding gas tank business, of course). I suspect Intel may have known, and hoped no one would notice (it was cheaper to ignore it than to fix it).
Is there actually a single intel chip that performs as advertised?
What about their other platforms? Do they make anything remotely secure? I ask as I seriously wonder if Intel actually makes anything that works properly, or even just as advertised.
For years Intel has been allowed to hide its failures, and personally I think it is high time for their ignored customers to be put first for a change.
Or, you could just develop on processors from a company that actually cares about transparency. Get a Talos with POWER9 and you know *exactly* what's going on in every corner of that chip. Or, stay with Intel and AMD and wonder just which proprietary blob or NDA-restricted and hidden silicon bug is interacting with your application in bad ways or leaking your GDPR-protected data....
Or, use ARM. NXP has some nice little 2U boxes for sale, and there's lots of Chromebooks and such with ARM inside. Literally anything is better than x86 right now!
IBM have had their share of processor bugs.
I was working on an HPC account when we had to break it to the customer that there was a fault in one of the complex floating point instructions in Power 7 that could, under certain sequences of instructions, lead to unreliable results. The fix was to put a No-Op after each of the affected instructions (added by the compiler), resulting in a low single digit percentage performance hit.
Because they were an HPC customer (with a very large number of processors in their systems), who were doing large amounts of floating point arithmetic, this was a concern to them.
IBM ended up paying damages (I think it was by reducing the maintenance charges) to the particular customer I was working with. Not sure what happened to other customers.
But at the end of the fiscal life of these systems (they were only about three and a half years old, but hey, there was new money to spend), they were still providing more than adequate performance, and the customer was a little sad to see them go (as was I, as it was the end of a very enjoyable assignment).
I believe that Intel makes a best effort. If not, rumors would have already destroyed the company.
They hire engineers; these are mostly talented designers. A project is created to release a CPU update (same chip, only the internal CPU number goes up by one). All kinds of work and testing go into the chip upgrade until the freeze date. Thereafter, whatever is fixed is part of the new version number. If chip volumes are still significant, a new project would be created for the next bug-fix release.
You cannot have a trickle of fixes to silicon occurring as discoveries are found. The real attempt is to fix the problem with microcode updates.
No, I consider that one releases versions, so as to be able to manage the manufacturing, distribution/supply chain and microcode releases.
Employing the best engineers is no use when PHBs insist on RDRAM. I cannot see rumours doing Intel any damage whatsoever when headlines across the tech press never did any serious damage before. Take a look for Intel's previous epic cockups in the mainstream news. If they are mentioned at all it is only in a few words, because non-techies will tune out the moment a news reader tries to explain what speculative execution and virtual memory translation buffers are. Outside the tech news this will be forgotten by Monday. Customers will keep buying Intel despite FUCKWIT because most of them do not realise they have a choice.
Almost everyone who bought or sold Intel kit will pay for this mistake, and only a small portion of the damage will land on Intel. A few of the big players like Google and Amazon might get a financial apology from Intel - if they can switch their orders to AMD/ARM. If you do not believe me, join the class action lawsuit and three years from now watch Intel settle ... with the lawyers.
"Is there actually a single intel chip that performs as advertised?"
What is "advertised" ? Where have you seen the precise definition ?
Yes, there are HW bugs; yes, they suck and always have. And yes, this one is quite massively epic.
But frankly, the "works as advertised" stance is meaningless. I'm sure 100% of your home appliances also have flaws, and they are a LOT less complex, therefore more inexcusable.
Back 15ish years, I had a part time job helping to maintain a Novell network. Mostly clearing jammed print queues, changing backup tapes, keeping Windows and AV signatures up to date and sorting out the big boss with a new monitor every other week. Oh, and every morning resetting the clock on the Novell server that lost near enough to 10 minutes a day. And every time a user signed in their local PC time got synced to the wrong time. By the time I would get back into the office it could be out by 30+ minutes.
We got it fixed eventually. CMOS battery I hear you ask? No, nice try but guess again. Ah, your UPS? No, that was fine too. Come on, what's the obvious cause that you're missing? Oh of course, there must be a bug in the CDROM driver running on the wrong kernel ring and causing the ticks to be slightly slower than they should. Well spotted.
Companies like Apple that offer a 'no questions asked' refund policy are going to be very, very busy refunding every Christmas gift with an 'Intel inside'. You think Apple (and other vendors) are just going to take that hit? Intel will be paying compensation to vendors for sure, certainly for every chip shipped in the past 3 months - but more likely 6-12 months, since this will affect the pipeline and inventory too. Who's going to buy anything with 'Intel inside' unless the vendor can guarantee that it's new silicon?
Consumer law will also come into play as Aqua Marina detailed in "I wonder where we stand legally now?" (above).
But the really interesting thing will be whether companies like HPE go to bat for their enterprise support customers. Because that'll be a killer whitebox shakedown. i.e.: 'I bought HPE and they replaced my server CPU' and 'I bought a whitebox and now it runs 30% slower and I've got no recourse'. It's little cost to HPE and a marketing windfall - they just have to jump on the cueball-intel bandwagon.
This is going to be good fun to watch.
Wow, this seems like a bad one. The Pentium floating-point bug was, IIRC, a somewhat unlikely one to hit, which is why it took a long time to spot - it really didn't affect most people too often, and Intel could probably claim it worked well enough for most users. In any case, quoting Wikipedia:
On December 20, 1994, Intel offered to replace all flawed Pentium processors on the basis of request, in response to mounting public pressure.[5] Although it turned out that only a small fraction of Pentium owners bothered to get their chips replaced, the financial impact on the company was significant.
A really nasty security exploit doesn't have that unlikely-ness protection - every black hat is going to have at it.
If Intel did the decent thing and replaced the CPUs (very unlikely) to their OEMs and assuming warranty/fit-for-purpose protections do their job and force vendors to make good (equally unlikely)...
then Apple may suddenly rue their tendency to solder things everywhere. Ditto for the Surface and its well-publicized glue-it-all.
One of the compilers that I use still has an option for Pentium-safe floating point division. A lot of people didn't bother swapping them over, I guess, because OS and compiler vendors whacked together a quick workaround and pulling out a CPU is beyond the technical knowledge of most.
Either way, I'm not looking forward to this patch. No customer is going to call support and say "hey there, because of Intel's screw-up we're not getting adequate performance". It's going to be "your product is a bunch of ... fix it yesterday."
Apple would be able to force Intel to provide replacement CPUs to them to deliver into purchases over the last x days/months, thanks to their implied threat to switch to AMD or their own SoC. Maybe even pay the costs of the recall.
Intel will just ignore the PC OEMs, because they can't make changes overnight to switch and will forget about it by the time it could happen when Intel throws some marketing dollars their way.
Take a look at what happened with the Memory Translator Hub. Intel will watch calmly while the smaller vendors get sucked into the wood chipper. Even if the big distributors get free replacement chips from Intel, the cost of distribution and installation will land on the distributors. Anyone - big or small - who soldered Intel CPUs to the PCB is in for a world of hurt.
Given that when Cisco was royally screwed over by Intel's Atom issues and subsequently has just passed the costs onto buyers of their kit, what do you think will happen with this latest Intel failure? Pretty much the same and I'm pretty sure that Intel's contracts (hidden under several miles of NDA) will disclaim any responsibility for anything.
It is understood the bug is present in modern Intel processors produced in the past decade. It allows normal user programs – from database applications to JavaScript in web browsers – to discern to some extent the contents of protected kernel memory.
That's taken a long time to surface, then.
Guess it has been quite useful for 5-eyes.
Given that a major share of the CPUs out there are from one vendor, this is what you get. A hardware bug that permeates over several chips. Nice to see that AMD (minority report) doesn't have the problem.
Maybe Chipzilla Intel is too big and needs to be sliced and diced.
Thought experiment: What would Intel be if IBM had picked a different processor for its PC back in 1981 (Motorola 68000?)?
>What would Intel be if IBM had picked a different processor for its PC back in 1981 (Motorola 68000?)?
I'm guessing Intel would be a footnote in that case… Question is, how would the market have turned out if IBM had chosen Motorola's 68000 or Texas Instruments' 16-bit TMS9900?
There'd have been Amigas and Macs everywhere as software would have been easy to port and nobody would have bought a PC if they could have chosen one of the other two machines.
Also Microsoft wouldn't exist as they only got where they were today by making quick and dirty ports onto x86 and Windows.
It'd have been brilliant.
Nah.
National Semiconductor's 320xx CPUs were pretty nice. And if Moto had finally succeeded in doing the 88000, who knows.
The 68000 wasn't ready for production when IBM were looking for a CPU for the desktop PC (Back in the early 80s I played around with a dev board where the 68k was clocked down to 4MHz, half its rated speed). Motorola also didn't have the support chips needed to build a complete system so everything like interrupt controllers, floppy disc controllers, UARTs, graphics chip drivers etc. would have to be implemented with lots of TTL.
The 8086 (and the version IBM actually used, the 8-bit-bus 8088) was designed to use existing 8080-series support chips, being bus-compatible with the older device. In addition its internal registers and addressing modes were also backwards-compatible so migrating existing programs from 8080-series CP/M versions was piss-easy. The 68000's "clean sheet" instruction set and internal register structure meant everything would have to be rewritten from scratch, especially boot code, low-level device drivers and kernel code.
Whilst I don't have the experience of the 68000 that you obviously had, I believe that there were a significant number of support chips from the 680X 8-bit family that worked with the 68000.
I'm trying to think back, but I'm sure I saw a working 68000 system around the time that the IBM PC was new. Of course, that could have been because small companies were more agile than IBM, but I think that the IBM PC was a very quick development which didn't start until 1980, a year after the 68000 was released.
No, I think that the reason why IBM went with Intel was mainly cost.
If the 68000 had been chosen for the single-tasking, floppy disk only original IBM PC, I think that we would have had multi-tasking desktop systems much earlier, because the 68000 was designed as a 32 bit family of processors from the outset, rather than being an 8/16 bit kludge of a processor that the 8088 and 8086 processors were (and became worse still with 32 bit and 64 bit evolutions)
Whilst I don't have the experience of the 68000 that you obviously had, I believe that there were a significant number of support chips from the 680X 8-bit family that worked with the 68000.
There were indeed, the 68K was designed to be compatible with them, and would switch to a synchronous bus cycle when it detected a 68xx peripheral. I remember using 68xx-series UARTs with them, as well as the 68230 parallel/timer chip.
I'm trying to think back, but I'm sure I saw a working 68000 system around the time that the IBM PC was new.
The Sun-1 would have come out around the same time, with the Sun-2 and Sun-3 roughly aligning with the PC-AT and PC-XT286s.
No, I think that the reason why IBM went with Intel was mainly cost.
I had been to a talk some years back (probably like 25 years ago) given by one of the original IBM-PC designers. The story I heard there was even more ironic, given the situation these days. IBM had been concerned at the time about having multiple sources for all the components. At the time the 68000 was only sourced from one location, while Intel had licensed the 808x processors to multiple manufacturers.
Ironic in that Intel has long since decided they wanted to be the only manufacturer of their chips. Had the manufacturing situation back then been like it is now, IBM would have gone with the ARM (OK, the equivalent would probably have been the Z80).
Motorola also didn't have the support chips needed to build a complete system so everything like interrupt controllers, floppy disc controllers, UARTs, graphics chip drivers etc. would have to be implemented with lots of TTL.
Bollocks. With peripherals there are some support chips you actually want from that particular CPU family to make life easier, but as long as those UARTs, FDCs, video controllers etcetera use the same signal levels and are roughly compatible with regards to clock speed, then you can just mix and match with maybe a bit of address select and glue logic. And custom chips to do the heavy lifting regarding tacking all those things together were quite common back then already.
Look at the Acorn BBC B: 6502 CPU. 6845 CRT controller. 6850 ACIA (UART). 6522 VIA. Two custom ULAs, which replace several tens of 74xx logic gates each. uPD7002 ADC. 8271 FDC.
And the IBM PC? It was actually using that same Motorola 6845 CRTC for its MDA and CGA interfaces.
ULAs weren't available in the early 1980s hence the IBM's use of lots of TTL to get it to work at all. Yes the 68000 eventually got a number of dedicated support chips but that was a long time after the original IBM PC was in production. I think the 68008 (the cut-down 8-bit-bus version of the full-sized 68k) which any 68k-based IBM desktop would have used was also late.
The 68000's bus was asynchronous, relying on a data-available strobe from each peripheral and memory controller, which was tricky to make work with regular clocked peripherals. It had advantages: it made mixing slow and fast devices on the CPU bus easy. But it didn't work easily with the simpler existing chip families, not even the Motorola 8-bit designs like the 6800. It could be bodged to do so (I designed circuitry to do just that back in the day) but it took extra glue logic and wasn't elegant.
IBM had to go with what they could buy in predictable quantity numbers at a decent price that would do the job and the 68000 just wasn't there when the door closed.
ULAs weren't available in the early 1980s hence the IBM's use of lots of TTL to get it to work at all.
So the ZX81 and BBC B were built with technology that didn't exist? Interesting.
Now if IBM decided to just use standard components, that's another matter. But ULAs and other MSI-level custom chips were definitely available at the time the PC was designed.
ULAs (Uncommitted Logic Arrays) were a UK innovation, mainly designed by Ferranti.
They allowed some of the layers of the wafers to be a common design, with the last few acting as a customization to get the chip to do what was needed. You could think of them as a half-way house to an FPGA, but with the configuration baked into the last few layers of silicon rather than after manufacture.
I don't believe that any US company really bought into using ULAs, but they were used, as already pointed out, for the ZX81, BBC Micro, ZX Spectrum and Acorn Electron to reduce the chip count.
But production problems with ULAs were one of the main reasons why several of these systems were delivered late. Ferranti eventually disappeared into Marconi, which was sold off when that company went bust, so the technology disappeared.
Thought experiment: What would Intel be if IBM had picked a different processor for its PC back in 1981 (Motorola 68000?)?
Oooh - that would have been nice. An unsegmented 32 bit memory space, unlike Intel's 16 bit addressing with memory segments bolted on... Although the Intel CPUs came with a defined MMU (Memory Management Unit), which the M68k chips did not; I seem to remember 3 different sorts of M68k MMUs floating around.
If IBM had gone M68k, would Bill Gates have been able to get in on the act? We might be living in a very different world in which Amiga was the big boy.
Well we'd have a much nicer assembly language to deal with. I still cry on the, admittedly now very rare, occasions that I have to drop down to x86 assembly level debugging and suffer the brain ache of an architecture that produces code that often seems to spend more time swapping values between limited registers than doing anything overly useful.
Remember the P67 Chipset. I laid down big $$ for an i7 2600K and had to yank my Intel mobo and send it back to Newegg, then I had to relicense WinDoze on the replacement board they sent me.
Details of the P67 failure and complete recall were always sketchily reported, but allegedly the speed of the chipset degraded 6% over two years.
Never got even an apology from anyone.
> I have to wonder if this wasn't something that the NSA insisted get put in under the whole umbrella of "National Security".
No, no... you don't get it. The NSA doesn't need to "put things in". The Clipper Chip demonstrated to them that if they applied themselves a bit, like they do with TEMPEST, they could find the stuff idiots put in themselves, and just get back to work reading things in plain text. After all, "idiots" who didn't work for the NSA were able to crack the Clipper protections, which demonstrated that the adversary was more technically capable than the NSA had expected to need to be.
In the computer world companies are pretty well matched on what they can do within a given instruction set architecture, so they have to find ways to "cheat" more performance out of a given ISA in order to get their competition over a barrel. There is an engineering axiom "You don't get ANYTHING for NOTHING". What we have here is the problem being improperly constrained from an engineering standpoint - someone said: "Let's get more performance without using more power or area on the die." Nobody said: "Let's get more performance without using more power or area on the die, and without impacting security".
It's a common theme seen all over tech that security is considered an after-the-fact add-on, rather than integral to the design. The NSA knows that, and they can demand the full, documented ISA from Intel and we'd never know it - the government are also not obligated to inform Intel of any problems found. Kind of like the FBI doesn't need to tell Apple how they cracked those phones after the terrorist shooting. They can just quietly go about their jobs and no one needs to be the wiser (one of the reasons I wish the FBI would just shut up about the encryption back-door thing). I think these agencies talk too much for their own good.
First, this is news and while I don’t buy into the whole fake news thing, I do buy into fantastic headlines without proper information to back it up.
There are some oddities here I’m not comfortable with. The information in this article appears to make a point of it being of greatest impact to cloud virtualization, though the writing is so convoluted, I can’t be positive about this.
I can’t tell whether this is an issue that will actually impact consumer-level usage. I also can’t tell whether there would actually be a 30% performance hit or whether there would be something more like 1% except in special circumstances. The headline is a little too fantastic and it reminds me of people talking about how much weight they lost... and they include taking off their shoes and wet jacket.
Everyone is jumping to conclusions that AMD or Intel is better at whatever. Bugs happen.
Someone claims that the Linux and Windows kernels are being rewritten to execute all syscalls in user space. This is generally crap. This sounds like one of Linus’s rants about to go haywire. Something about screwing things up for the sake of security as opposed to making a real fix.
Keep in mind, syscalls have to go through the kernel. If a malformed syscall is responsible for the memory corruption, making a syscall in another user thread will probably not help anything as the damage will be done when crossing threads via the syscall interface.
Very little software is so heavily dependent on syscalls. Yes, there are big I/O things, but we’re not discussing the cost of running syscalls, we’re talking about the call cost itself. Most developers don’t spend time in dtrace or similar profiling syscalls since we don’t pound the syscall interface that heavily to begin with.
Until we have details, we’re counting chickens before they’ve hatched. And honestly, I’d guess that outside of multi-tenant environments, this is a non-issue otherwise Apple would be rushing to rewrite as well.
In multi-tenant environments, there are 3 generations Intel needs to be concerned with:
Xeon E5 - v1 and v2
Xeon E5 - v3 and v4
Xeon configurable
If necessary, Intel could produce 3 models of high-end parts with fixes en masse, and insurance will cover the cost.
Companies like Amazon, Microsoft and Google, which may have a million systems each running this stuff, could experience issues, but in reality, in PaaS, automated code review can catch exploits before they become a problem. In FaaS, this is not an issue. In SaaS, this is not an issue. Only IaaS is a problem, and while Amazon, Google and Microsoft have big numbers of IaaS systems, they can drop performance without the customer noticing, scale out, then upgrade servers and consolidate. Swapping CPUs doesn’t require rocket scientists and, in the case of OpenCompute or Google cookie-sheet servers, shouldn’t take more than 5 minutes per server. And to be fair, probably 25% of the servers are generally due for upgrades each year anyway.
I think Intel is handling this well so far. They have insurance plans in place to handle these issues and although general operating practice is to wait for a class action suit and settle it in a fashion that pays a lawyer $100 million and gives $5 coupons to anyone who fills out a 30 page form, Amazon, Google and Microsoft have deals in place with Intel which say “Treat us nice or we’ll build our next batch of servers on AMD or Qualcomm”.
I’d say I’m more likely to be affected by the lunar eclipse in New Zealand than this... and I’m in Norway.
Let’s wait for details before making a big deal. For people who remember the Intel floating point bug, it was a huge deal!!! So huge that after some software patches came out, there must have been at least 50 people worldwide who actually suffered from it.
The KPTI patches force every syscall through a page-table switch, so the kernel gets mapped in/out of the address space on each call. This is expensive, and up till now would have been called pointless... there's a reason Linus wanted to call it 'Fuckwit'.
Do you imagine Linus would have let this in if it was as low impact as the floating point bug? He'd have had one of his famous rants and told them to go away and think again. Instead he basically fast tracked it - even into the stable kernel which is a major deal in itself. Microsoft have done the same..
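If you want to put a number on that kernel round trip yourself, a crude benchmark is easy enough. A minimal sketch, assuming Linux and gcc (SYS_getpid is issued via syscall() to force a real kernel entry and defeat any library-side caching); run it before and after the patch and compare:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iters = 10000000;            /* ten million kernel entries */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);                /* a do-nothing syscall, pure entry/exit cost */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per syscall\n", ns / iters);
    return 0;
}

The absolute numbers will vary wildly by CPU and kernel config; it's the before/after delta that tells you what KPTI costs your box.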
Reading the comments on the kernel mailing list, it doesn't appear that it was just Linus who wanted to call it Fuckwit, if at all...
https://lkml.org/lkml/2017/12/4/709
2) Namespace
Several people including Linus requested to change the KAISER name.
We came up with a list of technically correct acronyms:
User Address Space Separation, prefix uass_
Forcefully Unmap Complete Kernel With Interrupt Trampolines, prefix fuckwit_
but we are politically correct people so we settled for
Kernel Page Table Isolation, prefix kpti_
Linus, your call :)
https://lkml.org/lkml/2017/12/4/758
On Mon, Dec 4, 2017 at 6:07 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Kernel Page Table Isolation, prefix kpti_
>
> Linus, your call :)
I think you probably chose the right name here. The alternatives sound
intriguing, but probably not the right thing to do.
Linus
Amen. A 5%-30% performance impact, with major changes to fundamental kernel operation? And no epic Linus rant on even the merest suggestion to merge this, but rather fast-tracked and back-ported to a stable kernel version?
Let's face it: hell has indeed just frozen over.
Don't worry, Hell has thawed, Linus has just ranted.
@CheesyTheClown
> The information in this article appears to make a point of it being of greatest impact to cloud virtualization, though the writing is so convoluted, I can’t be positive about this.
Imagine you are running your company's data sharing with manufacturing in a cloud hosted server on shared hardware. This bug basically means that if any other service running in user-space on the same shared hardware had the code required to poke at the kernel, it could bypass ALL virtualization boundaries and take ownership of the whole platform at Priv-0 level. Essentially, if this bug is not quashed and RAPIDLY, the entire virtualization market on these Intel platforms is at risk - as well as the sanctity and security of the data currently trusted to this market.
> Keep in mind, syscalls have to go through the kernel...
Yeah, and now imagine that you have to stop everything that needs an interrupt so the kernel can lock down and handle kernel-level operations while the rest of the user-space tasks sit there and twiddle their thumbs... every time this happens. That's potentially a 5 millisecond hit every 15 milliseconds, and that's where the potential performance impact lies. Systems that have a kernel-level VM handler running at Priv-0 will have to UNLOAD anything belonging to a less privileged worker so that they can flush the speculative cache, then handle the kernel task and flush again to continue with the VM guest tasks.
There are some systems that will have a much worse impact than others, for example machines that run over-provisioned guest VMs that need to share a common resource pool will be impacted more during the VM switching (reducing the value of each VM host); machines that run VoIP bridges have a 125µs interrupt (the 8kHz telephony sampling rate) for analog sample-to-packet timing; and machines that do anything with a physical serial port will trip interrupts constantly.
> I think Intel is handling this well so far. They have insurance plans in place to handle these issues and although general operating practice is to wait for a class action suit and settle it in a fashion that pays a lawyer $100 million and gives $5 coupons to anyone who fills out a 30 page form, Amazon, Google and Microsoft have deals in place with Intel which say “Treat us nice or we’ll build our next batch of servers on AMD or Qualcomm”.
Well, you may be bitter and think that your buying dollar (or krone) doesn't provide any power anymore, but the truth couldn't be farther from that. Yeah, so you're aware that Intel will slime their way out of it, but that has a PR cost. Yes AMD has CVEs, but I can't recall them having a Pentium 90 math coprocessor issue like Intel, a SATA failure like SandyBridge, a Floating Point bug that needs to be fixed in SW (by third parties), a Management Engine that can't be turned off that leaves systems exposed unless you stop feeding the system power, and now this cache accelerator bug that can adjust performance numbers down to 0.66% of the advertised specs under the only SW fix available (again done by third parties). I'm also aware that the tempo of fairly embarrassing problems is increasing, so if I were a person building a system and saw an increase in the level of ineptitude from a multi-national company, and they only left me with a stack of paper to fill out for my $50 and a still-broken POS system that's 30% slower than the day I bought it, I'd be so jaded I wouldn't buy their crap any more (and I wouldn't be alone).
If I worked for the Intel PR team in the EU, the first thing I would have thought when reading this article is "uff da..." Even Apple can't get away with a known design flaw affecting the product near the end of the design life - see their battery fiasco as of late. Allowing your lawyers and your insurance to cover your screw-ups only works a few times... Personally I foresee an investor meeting where someone's head is going to need to be offered as a result of the stock price hit.
>There are some systems that will have a much worse impact than others, for example machines that run over-provisioned guest VMs that need to <
Hmmm. Does this mean it will have no impact at all on my WinXP virtual machines on ESXi 4, because (apart from the fact all components are out of support), wasn't this context switch on every interrupt the reason XP ran so crap on ESXi 4?
> Hmmm. Does this mean it will have no impact at all on my WinXP virtual machines on ESXi 4, because (apart from the fact all components are out of support), wasn't this context switch on every interrupt the reason XP ran so crap on ESXi 4?
If you're running an ESXi that old and an OS that old, it doesn't sound like "patching" is in your world-view, so yeah, it wouldn't "impact" you more than knowing your system is more vulnerable today than you knew it was yesterday. ;-)
>so yeah, it wouldn't "impact" you more than knowing your system is more vulnerable today than you knew it was yesterday. <
Well, interesting point: if it was this relentless context switching that made XP run slower than Win7, and made XP run like crap on VMWare, it would seem to indicate that XP is immune.
"Yes AMD has CVEs, but I can't recall them having a Pentium 90 math coprocessor issue like Intel, a SATA failure like SandyBridge, a Floating Point bug that needs to be fixed in SW (by third parties), a Management Engine that can't be turned off that leaves systems exposed unless you stop feeding the system power, and now this cache accelerator bug that can adjust performance numbers down to 0.66% of the advertised specs under the only SW fix available (again done by third parties)."
So much this! (PS - I'm sure you meant 66% rather than 0.66%)
Initial benchmarks say there's an 18% hit on I/O heavy operations on Linux.
https://hothardware.com/news/intel-cpu-bug-kernel-memory-isolation-linux-windows-macos
Click through to Phoronix to see more Intel share price graphs.
I'll be surprised if this shows up anywhere besides El Reg. If it does make the majors here in the States, they will trivialize the issue and, in the end, all the tech-inept populace will ignore the baffling techno jibber-jabber they just heard because it wasn't explained in a catchphrase.
But really, for the tasks the average user puts their laptops to, they won't notice a performance hit. They might notice a battery hit, but many CPUs are faster than their users' needs. The enthusiasts (gamers etc) and professionals may notice, and they're the types more likely to read tech blogs.
and professionals may notice
And *all* the big cloud providers will now need to buy 25-30% more hardware to meet the expected capacity plan. I don't imagine that Google/Amazon/Microsoft are going to be too chuffed with Intel right now. And since they (with Apple/Twitter/Facebook et al.) are probably the biggest buyers with the most clout, their voice is going to count.
If they all switch to buying AMD (assuming AMD has the capacity to build that many processors) or re-tool for ARM then Intel sales are going to nosedive.
21 hours after you posted that, I can report that it is on the front page of the BBC news website and at least one major UK newspaper. Yes, that surprises me, too, but perhaps it is just too good a *story* to pass over and, after all, even normal people use computers these days.
@whitepines
You mean like Intel's Management Engine? I'd much rather have AMD, at least they work as advertised and (as yet) aren't known to be full of security flaws like the Management Engine is. Obviously I'd rather have no 'security' processor whatsoever inside my CPU, but given the known bugs with Intel's Management Engine I'd still rather take my chances with AMD.
Ryzen is already a far better deal than Intel's offerings, especially when you factor in this latest flaw that makes Intel's chips run even slower than advertised. Intel trying to flog buggy, half-working hardware is getting beyond a joke.
@conscience
Oh, I fully agree on the ME, but AMD is just the other side of the same coin. You need to look beyond x86 to have any chance of getting away from the security problems and general bugginess of both Intel and AMD.
The simple fact is, the ME just got done being put under the microscope in 2017, and it was found to be swiss cheese. It wasn't even looked into for the previous near-decade it was present on Intel platforms, and for all that time it looked impenetrable / no one cared.
Read that last sentence again. Something can have 0 CVEs published just because none of the "good guys" bothered to try to hack it.
Before assuming the PSP is better "just because it's AMD", wait for the PSP to eventually go under that same microscope. I'm sure at least one critical bug will be found, that's really the nature of having closed-source secret sauce "god-mode" security processors built into your chips.
For what it's worth, I prefer civil discussion to anonymous downvotes, too!
What stopped me from the Ryzen (I almost bought the 16-core for $699 on sale) was the fact it had issues with Hyper-V running 32-bit VMs, and other Hyper-V issues such as features not supported on AMD. If they had full support, the Ryzen 1950X would certainly have been my next choice.
"So you think the security problems in the world don't matter other than the influence of which processor you buy"
Focus. Tackle each issue in its own place. We've discussed other security issues in other contexts. Actually, in this context, the issue isn't so much the security issue, because like many others, it can be fixed, but the cost of fixing it.
@Doctor Syntax
On the desktop and server end absolutely, but that's an awful big chip to stuff in a laptop. For laptops or small desktops I've been recommending ARM mostly, since ppc doesn't have anything decent in that space. Of course, if you run Windows you need x86, but if you run Linux you're probably mostly set to switch already.
ARM just got a lot more competitive against a 30% slowed Intel laptop processor, that's for sure!
@Ecofeco
My suggestion is to wait a few months for people who are concerned about this issue to replace their CPUs with whatever Intel brings out (or AMD).
I'm expecting the second-hand market to be flooded with cheap Intel chips in the next few months. I might be able to upgrade my 2nd-gen i7 for something much faster (even with the performance hit) for very few ££!
Let's say that I am already running workloads on a virtualized x86, say in EC2 or GCE. Doesn't that make me immune to a first-order approximation, because I'm not really running on "true Intel"?
I say "first order approximation" because under the covers my cloud hoster is. So I am exposed insofar as say Amazon's servers could be compromised and then they could come for me.
I'm trying to work out which parts of this are a cluster and an opportunity at the same time for AWS and friends.
Well Bloomberg and The Grauniad have had a go, both citing The Reg.
Murdoch also had a go, citing their own special expert who got everything arse about face while he handwaved some technobabble claiming all vendors are bad as each other (well, the other vendor is AMD and their CPUs don't have this problem).
Except for this part:
Whenever a running program needs to do anything useful – such as write to a file or open a network connection – it has to temporarily hand control of the processor to the kernel to carry out the job.
Not a good way to think about this IMHO.
The way to think about this is that the CPU is in charge, but gets dragged along by the code (alternatively, there is a token being handed down the instruction path) (alternatively, the CPU plays "parser" of the code, leaving a foliation of the changing memory state along the time dimension). The switches between protection rings just follow a small finite state machine, where the transitions occur on kernel call, interrupt or return-from-kernel.
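Since the comment invites it, here's that finite state machine spelled out as toy C - purely illustrative, with made-up names; real x86 has four rings and more trap types than this:

#include <stdio.h>

typedef enum { USER_MODE, KERNEL_MODE } cpu_mode;
typedef enum { EV_SYSCALL, EV_INTERRUPT, EV_KERNEL_RETURN } event;

/* The whole protection-ring dance, as a two-state transition function. */
static cpu_mode step(cpu_mode m, event e)
{
    switch (e) {
    case EV_SYSCALL:
    case EV_INTERRUPT:     return KERNEL_MODE;  /* any trap enters the kernel */
    case EV_KERNEL_RETURN: return USER_MODE;    /* iret/sysret drops back down */
    }
    return m;
}

int main(void)
{
    cpu_mode m = USER_MODE;
    m = step(m, EV_SYSCALL);        /* user -> kernel */
    m = step(m, EV_KERNEL_RETURN);  /* kernel -> user */
    printf("back in %s mode\n", m == USER_MODE ? "user" : "kernel");
    return 0;
}

The expensive part under discussion is what now has to happen on each of those transitions: with KPTI, a page-table swap rides along on every one.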
If true then this is a monumental cock-up. A 30% performance hit is huge and costly - I can't see corporate customers walking away from this without demanding compensation or new chips. If you sold a car to someone and its performance was 30% less than advertised you wouldn't get away with it, and at least in the EU there is little protection for Intel. This is essentially a manufacturing defect within the limitation period of EU law.
An 'if' has been committed which disables the changes for AMD.
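For the curious, the 'if' (from the patch an AMD engineer posted to LKML; reconstructed from memory here, so treat the exact form as approximate rather than a verbatim diff) is a one-line vendor check in arch/x86/kernel/cpu/common.c:

/* Previously: assume for now that ALL x86 CPUs are insecure.  The patch
 * wraps that in a vendor check so AMD parts don't get PTI forced on: */
if (c->x86_vendor != X86_VENDOR_AMD)
	setup_force_cpu_cap(X86_FEATURE_CPU_INSECURE);

Everything downstream that decides whether to isolate the page tables keys off that one feature bit.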
It depends on what's going on in a system. If it's IO heavy, this could be quite bad (lots of interaction with the kernel). If it's compute heavy, possibly this isn't too bad. And it also depends on whose code is running. It's only a problem if you run someone else's code arbitrarily on your computer.
For Google this isn't too bad. The bulk of their machines run Google's own code and dish up search results to Internet clients. For search, maps and Gmail servers, Google could take the risk and ignore the patches because they're not running arbitrary code. That's a good thing because the bulk of Google's costs is energy.
For outfits running other people's code (Amazon?) this could be bad because they're all about running other people's code for them. So they need the patches, it will slow them down. And a lot of their cost is energy, so their cost is going to rise.
For the rest of us mere users our computers are going to be slower and therefore use more energy for the same tasks.
The real killer is if this is exploitable in Javascript, because a huge amount of what happens these days relies on users having Web browsers that are configured to accept and run Javascript from anywhere. Which all of a sudden looks hideously dangerous. That could be a massive problem for Google; if we all switch off Javascript then Google's services don't work. And nor does anyone else's.
This is not going to be exploitable in pure Javascript. (With sufficient understanding of the VM you could control the machine code that's executed. But you can't get the VM to execute arbitrary instructions. So it's very unlikely this will be exploitable without help from a bug in the VM.)
WASM, on the other hand, might afford you enough flexibility. It will depend on the nature of the bug.
"It depends on what's going on in a system."
Firing up top in Linux shows several processes, mostly daemons, actively using CPU with nothing actually being done with the system, so even in the absence of IO there are context switches taking place, even if it's just a matter of waking up daemons to find that there's nothing to do. I'd guess that much the same situation applies with Windows.
I don't quite get all this griping about one company or another, along the lines of "this company doesn't give one hoot about security, you're much better off with this one".
If one thing can be learned from various security warnings over the years, it is that everyone suffers from them. In some cases they can be patched without issue; in others (which is unfortunately the case here) the patch causes a performance hit.
For those crowing about AMD, here was a security issue on their Opteron:
https://www.theregister.co.uk/2016/03/06/amd_microcode_6000836_fix/
Luckily for them it could be fixed with a microcode update.
A lot of big American companies are risk-averse due to the large fines that can be imposed for negligence; however, humans are involved in all parts of the process, so mistakes can be, will be, and are made.
Lessons will be learned but unfortunately another issue will slip through.
Apple are facing class action suits for slowing their CPUs because of old batteries.
Other than not telling Apploids what they were doing I see no huge problem with that.
I wonder when the first CA is going to hit Intel.
I've been using AMD CPUs since the K series in the late 90s... damn, I feel old now.
Smug but old :)
"I've been using AMD cpu's since the K series in the late 90's **** I feel old now
Smug but old :)"
I know what you mean. I've been using solely AMD since my 486 DX/2 66MHz CPU.
What can I say, I like supporting the underdog.
>I've been using AMD CPUs since the K series in the late 90s
As you're a long-time AMD fan, would you satisfy my curiosity? Was the increase in your electric bill through the use of Bulldozer-class CPUs offset by the ability to supplement your furnace with your computer during the winter?
In my case yes, seriously. My room was consistently 10 degrees warmer than the rest of the house. Toss in a decent video card and I had to use a monster aftermarket CPU cooler or the damn CPU would overheat and shut down.
Just wondering, what are the chances that the Linux fix is the best possible? Perhaps Apple and/or Microsoft will come up with a faster solution, perhaps a better general solution which could be applied to Linux as well. Or, perhaps OS-specific fixes taking advantage of characteristics of Mach or Windows that Linux doesn't happen to have. (Not a slam on Linux, just noting that the OSes have different architectures and features that could conceivably come into play in designing a fix.)
There's a reason AMD didn't follow Intel down this hole. They've known about it for nearly 20 years, and Intel did too and chose to implement anyway. Find an old copy of VMS Internals and Data Structures. If VMS couldn't keep you out, it killed itself to limit the damage: "Page Fault IPL too High". Intel got the engineers but AMD got the IP when the lab/fabs were sold off. Evidently, AMD paid attention.
Intel got the engineers but AMD got the IP when the lab/fabs were sold off. Evidently, AMD paid attention.
The DEC engineers that ended up at Intel were, for a large part, from the software side of things; the compiler group went over almost lock, stock and barrel. AMD got a number of Silicon Wranglers who had been working on AXP and chipsets; several AMD processor subsystems bear a strong resemblance to their AXP counterparts.
More than you know... I believe, without knowing, that early AMD64 and Alpha processors were designed to be pin compatible. Certainly if you looked at the motherboard for early AMD64 you would see an Alpha southbridge.
I reckon there was a lot of IP sharing, and Jim Keller of course.
"early AMD64 and Alpha processors were designed to be pin compatible."
Close enough. See Athlon, Hypertransport, and such.
Start at e.g. https://en.wikipedia.org/wiki/Athlon
"The Athlon architecture also used the EV6 bus licensed from DEC as its main system bus. Intel required licensing to use the GTL+ bus used by its Slot 1 Pentium II and later processors. By licensing the EV6 bus used by the Alpha line of processors from DEC, AMD was able to develop its own chipsets and motherboards, and avoid being dependent on licensing from its direct competitor."
As it is, .Not spends about 30% of its time in the kernel according to Task Manager when loaded and working hard. How much will this cripple .Not? M$ pushes the fact that the code is secure, and this security is done in the kernel.
It is certainly going to be amusing to watch the fallout from this.
It's the move to and from the kernel that is penalized; time spent inside the kernel and time spent outside the kernel isn't penalized. Of course, hardly any system monitoring programs will tell you how many syscalls or context switches different programs cause.
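True, though half of it is obtainable: getrusage() reports voluntary/involuntary context-switch counts per process, and strace -c <command> will tally the syscalls themselves. A rough sketch of the former, with "sleep 1" as a hypothetical stand-in for the workload of interest:

#include <stdio.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* hypothetical workload - substitute the program you care about */
        execlp("sleep", "sleep", "1", (char *)NULL);
        _exit(127);                       /* exec failed */
    }
    waitpid(pid, NULL, 0);

    struct rusage ru;
    getrusage(RUSAGE_CHILDREN, &ru);      /* accumulated stats of reaped children */
    printf("voluntary: %ld, involuntary: %ld context switches\n",
           ru.ru_nvcsw, ru.ru_nivcsw);
    return 0;
}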
> Think of the kernel as God sitting on a cloud, looking down on Earth. It's there, and no normal being can see it, yet they can pray to it.
No I won't, because I understand how a fucking kernel works. This is a remarkably stupid comparison, especially when we know how kernels work. We can evidence their existence; there are books relating to the design of them by mere mortals. They do not care about trivial areas of our lives!
I'm more mad about this than about the 20-30% CPU limiting that's going to go on.
*slow clap*
And this hoo-haa triggered this memory of the Usenet Oracle and Windows 95...
The original can be found here : https://internetoracle.org/special/windoze2.cgi
The Usenet Oracle has pondered your question deeply.
Your question was:
> O great Oracle, the one who sees all and knows all, please accept
> this humble question from thy grovelling supplicant...
>
> Why is Windows 95 Beta so bug-ridden it's not funny?
And in response, thus spake the Oracle:
} THE SCENE: A dark antechamber of the Gates estate, dimly lit by three
} 20" monitors suspended from the ceiling. In the middle of the room is
} a Pentium/100Hz, sheathed in a black casing. Three programmers dance
} around the machine, chanting horribly. Their pale, clammy complexion
} is cast hideously by the light of the monitors, rendered even more
} repugnant to the watchful eye by the 60Hz flicker of the monitors.
}
} FIRST PROGRAMMER: Thrice the brinded net hath mewed.
}
} SECOND PROGRAMMER: Thrice, and once the Warp-pig whined.
}
} THIRD PROGRAMMER: MacHarpier cries. 'Tis time, 'tis time!
}
} FIRST: Round about the terminal go;
} In the poisoned upgrade throw.
} Code, which by a student done
} In minutes numbering sixty-one.
} Run-time error, protection fault,
} Crash ye first, crash ye shalt.
}
} ALL [as they dance around the Pentium]:
} Double, double, toil and trouble;
} Tempers burn and data bubble.
}
} SECOND: Fillet of a Sound Card bake,
} In the Pentium no sound make;
} Point of arrow, click of mouse,
} Scream of user, frightened spouse,
} OS/2's net use appeal,
} Steve Jobs' look and Wozniak's feel.
} For a charm of powerful trouble,
} Like a hell-broth boil and bubble.
}
} ALL: Double, double, toil and trouble;
} Tempers burn and data bubble.
}
} THIRD: Click "Start" button, speed of slug,
} You would think you forgot the plug.
} Multitasking, ha ha ho
} If just one worked you'd be good to go.
} This should grab those straggling few
} Who aren't using DOS 6.22.
} Now we shall the Mac eclipse,
} While curse words cross our users' lips.
} Leave the errors in so we can fix
} And sell more...Windows 96!
} And so we will release the Beta
} For corruption of their data.
}
} ALL: Double, double, toil and trouble;
} Users buy, our profits double.
}
} SECOND: Compile it with errors through,
} Since the users have no clue.
}
} [Enter BillGate to the other three programmers.]
}
} BillGate: O, well done! I commend your pains,
} And everyone shall share i' the gains.
} And now about the program get,
} But NEVER use it on OUR net.
} Security is scarce put in.
} [Beeps of PONG heard in the background.]
} [Exit BillGate.]
}
} SECOND WITCH: By the usage of my UMBs
} Wicked Windows this way comes.
} Open locks,
} Whoever knocks!
}
} [Fade to black.]
}
} Remember, Obsolescence isn't an accident, it's an art form.
}
} You owe the Oracle a signed, handwritten manuscript of MacBeth, and a
} copy of the Windows upgrade for the P6.
Why has no one asked the most critical question... do I need to patch my Pi?
Also, given the number of recent SNAFUs like this and Apple's login cock-up etc... I'd like to request a new +1 level of FAIL icon so that we can ring in 2018 with a new, appropriate level of numbnuttedness that the existing FAIL icon er... fails to encapsulate. Something like a double face-palm icon?
Many years ago when I was a hardware developer we used to call this the Intel Factor. In the spec sheets you used to get min / typical / max figures. You knew that when the production chips came to replace the test ones, the figures would be the least advantageous. You had to always design for the very worst case + 10%.
And for that matter, my 8 racks of Intel-based servers in the server room?
Seriously, that was my first thought.
If this were a car, and, under certain circumstances the brakes were applied without instruction from the driver, there would be a recall, and the problem would be fixed, with the manufacturer taking the financial hit.
Why is it different with computer hardware? Why does the world just shrug its shoulders and just go "Oh well"?
I've got a lovely Lenovo i3 laptop that runs Linux Mint beautifully, and I'm delighted with it. This has majorly pissed me off.
In the server room, we run racks of Wintel HP servers running mission-critical SCADA software, and it looks like we're suddenly going to get a lot less bang for our buck.
Why can't we return them to the manufacturer, who in turn returns them to Intel?
It's a rhetorical question, I suppose. I mean, what would they be replaced with, since Intel has no iron that, you know, actually works *properly*.
But that's the point, isn't it. Why should we put up with this?
Mistakes happen, of course. But there's no reason why we should just "put up with it".
Author: Dave Hansen <dave.hansen@linux.intel.com>
Date: Mon Dec 4 15:07:34 2017 +0100
x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y
Global pages stay in the TLB across context switches. Since all contexts share the same kernel mapping, these mappings are marked as global pages so kernel entries in the TLB are not flushed out on a context switch.
But, even having these entries in the TLB opens up something that an attacker can use, such as the double-page-fault attack
...
Funny, I thought video conversion would be minimally hit, as it consists mostly of:
read(very many bytes); process (very long cpu intensive code); write(very many bytes)
Which, if the read and write are implemented as sending big requests to the kernel, should be minimally affected. The processing portion of it is surely 99% of the whole processing time anyway?
I could believe things like a database would slow down, when it's hopping all over the place on disk looking for/writing data. I could believe Facebook slows down a lot, because browsers are doing lots of itsy bitsy tiny reads and writes to both disk and net, and lots of small updates of the screen to animate all the gifs and whatnot.
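Quite - the arithmetic favours bulk I/O. A sketch of why (hypothetical input file name): the number of kernel crossings scales with request size, so reading a gigabyte in 1MiB chunks costs about a thousand crossings where 16-byte reads would cost tens of millions:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    static char buf[1 << 20];                 /* 1 MiB per request */
    int fd = open("input.bin", O_RDONLY);     /* hypothetical input file */
    if (fd < 0) return 1;

    ssize_t n;
    size_t calls = 0, bytes = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0) {  /* one syscall per MiB */
        calls++;
        bytes += (size_t)n;
    }
    printf("%zu bytes in %zu read() calls\n", bytes, calls);
    close(fd);
    return 0;
}

Shrink sizeof buf to 16 and watch the call count - and, post-KPTI, the syscall toll - explode.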
It's already been established on Phoronix that games are not affected - if it's mostly user mode code there isn't going to be a noticeable impact. I would have thought video editing/transcoding also fit into that category.
Personally I'm more worried about virtualisation.
If people are running their servers in a virtual environment (VMware ESXi), does this issue potentially open VM-to-VM communication vulnerabilities, or is the hypervisor still effectively isolating privileged memory correctly between VMs? I can understand that this may still leave the issue open inside the VM OS if those are unpatched, but as long as the hypervisor is still providing isolation, the risk is restricted to issues inside the VMs themselves.
That's the best question I've heard asked here. What about Hyper-V as well? Host hypervisors themselves would have their own exposure to this bug. Hijacking a VM is one thing, hijacking the Host Hypervisor is something of a different order. I'll bet VMware and MS Hyper-V are testing the crap out of this as I type.
I took a little time in my lunch break to try and figure out what was so big that it had Intel and co running scared...
I found a video from 2017 (wow, that long ago!) showing reading one EC2 instance from another EC2 instance without any kind of permissions...
Is this what the fuss is about? It looks pretty scary to me!
https://www.youtube.com/watch?v=yPZmiRi_c-o
Intel is obviously cutting back on its hardware validation procedures in their rush to put products on the market to compete with AMD Ryzen, Threadripper and EPYC processors.
As a for instance, AMD took almost 3 years to validate Ryzen and EPYC prior to launch. Intel seems to be taking less than a year.
Of course QC will suffer. In fact this bug has been known for several months.
When I worked at Intel, apart from the fabs, which were given a pass (mainly as sod-all people worked in them and they were so key to Intel's profit), the rest of Intel operated in a lunatic, rank-and-rate, paranoia, run-around hell.
Intel need to realise that their core competence is running software.
Presumably, knowing that the CPUs have a fundamental design flaw in them, Intel must now ask all vendors to cease selling defective processors.
It's one thing to discover a problem in your processor that requires an OS change to fix. It is entirely another to knowingly sell any more of these defective chips.
This is quite a big story, as when the updates eventually get rolled out we will see a significant drop in computer resource availability. I expect to see:
- Websites / services becoming slow or going offline
- Lower productivity, workers waiting even longer as their computer grinds away
- Increased business cost to cover the loss of computer resource
- When the embargo is lifted and details of the bug become known, there will be a slew of new malware, especially targeting those who refuse to update
- Big jump in AMD's share price and a crash in Intel's
- But no SEC insider-dealing investigation into why the Intel CEO sold his shares before Christmas
"- Websites / services becoming slow or going offline"
Good point about going offline. A server currently running under high CPU load could have its throughput reduced below the required workload and then there will be backlogs or service unavailability. This could be a headache for admins who will have to decide whether servers can tolerate the performance hit before installing the patch.
According to HotHardware.com (see https://hothardware.com/news/intel-cpu-bug-kernel-memory-isolation-linux-windows-macos) the Linux page table isolation is being applied to all x86 CPUs, not just the Intel ones with the problem - and according to the Linux kernel diff log the patch was submitted by an Intel engineer (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c313ec66317d421fb5768d78c56abed2dc862264)
Haven't used AMD in a long time, but as AMD can't really be used in a hackintosh, I won't be switching to AMD any time soon. Besides, I've had trouble with every AMD processor I've ever had, and not with Intel, so I'm going to stick with Intel unless AMD gets better and can be used in a hackintosh as easily as an Intel processor.
ARM is also affected by this, has no discoverability of built-in devices on many platforms, and is infested by binary blobs for many devices. Its documentation is generally appalling.
Its only saving graces are the lower power and the price. Passable for a phone that everyone accepts is landfill after two years (idiocy, but the prevailing opinion), unforgivable for general computing devices.
I can't help wondering how this will affect people who've encrypted their entire hard drive with VeraCrypt, or at least run something out of an encrypted file 24/7.
Because surely the reading and writing, not to mention the use of an encryption driver, will be pretty heavy on system calls.
Drive encryption should not require more system calls than before, being just further processing. But there is the theoretical possibility of user land (i.e. a browser) discovering your keys and then somebody physically thieving the machine.
ARM may have issues, but being a forty-year-old 8-bit micro with stuff lashed on isn't one of them. Well, it has evolved, but it started as a 32 bit machine (26 bit address-limited earlier).
The hypervisor is, in a fuzzy way, kernel, so in an unqualified way I think it can leak just the same, but I may be wrong.
(I rather like AMD, used them preferentially back when I was building my own kit, in P4 days. I've been thinking of building a mini-ITX with my son and was already preferring AMD's Ryzen).
Once the dust settles, this may end up more negative to Intel than positive to AMD, sadly.
- it will be a big hit on Intel's financials
- assuming that it is relatively easy to design and implement a fix in silicon, this is a massive suck on Intel's current CPU lineup, but won't affect them much once the fix is in.
- Intel won't lose OEM business on existing systems (though the OEM's sales might drop). It need not lose business on future systems once they've fixed their silicon. It's vulnerable on a narrow range of systems where the choice of CPUs is still not finalized.
- Apple's rumored switch to the ARM arch concretizing? That would be a massive hit on Intel, but would do nada for AMD.
- OEMs may not at all be happy, but at least Intel has the $ to indemnify them, should it decide to do so, or be forced to. AMD's $400M/Q losses just wouldn't allow it.
- Intel has always won because most premade systems use them. A big part of that is their Intel Inside bribery incentive program. After this, they will just double down on spending big $ to wine and dine the OEMs and there's little AMD can do about it. Yes, massive reputational damage to Intel, but will it stick? Marketing $$$ is Intel's strength.
This will distract Intel, for sure. And it will cost them too. But will it change things a lot, much as we'd like more competition in the X86 space?
On the plus side for AMD, this couldn't have happened at a better time for them. They've had their moments when they were significantly better than Intel and right now is very much one of them. Had this happened 2 years ago, they would have had little to capitalize on.
Now, with the new Ryzens they have a much better story to tell to customers. If they can develop that into more OEM opportunities long term that'll be awesome.
It's not a flaw - it's an NSA demand! How else can they spy on the whole world easily?
Just keep in mind that all those used computers/CPUs being sold by China and Russia have very new silicon guts with their hardware hacks. Yup, innocent Americans set up by hardware as fifth-column attackers.
Conspiracy anywhere 2 people gather. ROFLMAO
VMS to x86 happened the first time back in the late 80s. A USAF unit I was working with weighed running an air combat simulation model designed for VAX on x86 hardware for about $5,000, plus a cheaper version of the software for x86 at about $25K. Even live support was cheaper, by a factor of 1/3.
But as always, study directors assumed General-level staff would be more impressed if we ran on Genuine VAX equipment, which we got at an incredible bargain of about $250K, plus the bonus that the software on VAX hardware was 20x the price as well on government contract.
Virtualization science is approaching the point where native instruction sets will soon not matter much. Heck, in 20 years you will probably be able to 3D print your own designed CPUs at home. Then, with a little generic Virtual Mapping and kernel-hypervisor building assisted by COTS software on another computer... you can run VMs on your own unique instruction set CPU.
Mind you, for 7-10 years those VMs will probably still be running mostly x86 software. But eventually Computer Scientists and hobbyists will get their dream of running software based on whatever arbitrary symbolic operations language is currently in vogue or that they want to invent (Forth Reborn etc)... and x86, ARM and all those hardware vendor instruction sets will be dead.
But in the meantime, going rogue to avoid 90% of current software invention needs more specific justification than "I hate big groups and companies" and "I want a smaller pond so I look like a bigger fish". So low-cost, low-power embedded or supercomputing still tend to be the more common refuges from x86.
Speaking of speculative processing... I know it sounds like some wingnut conspiracy stuff, but one does have to wonder if some group like the NSA (or the Russians/Chinese if you want to be really crazy) insinuated a ubiquitous yet subtle design bug into the core design back a decade+ ago during the height of the CARNIVORE craziness? This could have allowed them covert access nearly worldwide for almost any Intel-based Windows, Linux, and Mac systems, apparently. Just an idea from my food-for-thought processor. ;-)
I don't know the details about this bug, but if speculative execution from user to kernel mode is triggering the issue as described in this article, then wouldn't memory barriers fix the problem? Couldn't they just insert them at the start of system calls, interrupts, and context switches? There must be more to it.
I believe it's speculative execution within the kernel, resulting in information disclosure to user mode due to a timing attack on the shared processor cache which can undermine KASLR.
So they split the user/kernel page table set, which had always been shared before for performance (and only split to provide 4GB/4GB space for both on x86-32, which suffered the same kind of impact).
One interesting way to approach this might be to limit the cache allocated to certain processes, but that's an advanced feature found only on recent Xeons, and I don't think anyone's actually planning to do that - it might have an even worse impact.
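For anyone wondering what "a timing attack on the shared processor cache" actually measures, the primitive at the bottom of it is tiny: flush a cache line, let the code under suspicion run, then time one load. A sketch, assuming x86 with gcc/clang intrinsics - the threshold is machine-dependent, and a real attack needs far more scaffolding than this:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* _mm_clflush, __rdtscp */

/* Evict a line so a later timed read reveals whether it was re-touched. */
static void flush(const void *addr) { _mm_clflush(addr); }

/* Time a single load; a fast (below-threshold) read means a cache hit. */
static uint64_t time_read(const volatile uint8_t *addr)
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*addr;                     /* the load being timed */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

int main(void)
{
    static uint8_t target[64];
    flush(target);
    uint64_t cold = time_read(target);   /* likely a miss: slow */
    uint64_t warm = time_read(target);   /* now cached: fast */
    printf("cold: %llu cycles, warm: %llu cycles\n",
           (unsigned long long)cold, (unsigned long long)warm);
    return 0;
}

The hit/miss gap is typically well over a hundred cycles, which is exactly the measurable side effect the KPTI split is trying to starve of kernel addresses.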
I have an Intel i7 3770 and have been running the Windows Insider preview, which has apparently had the patch applied since November, and I haven't noticed any performance drop in anything. Even testing since I heard of this, I can't find anything more than 1-3%, which can easily be for other reasons. Even gaming with a CPU-intensive game (BF1 64-player multiplayer on Amiens) which maxes my CPU out, I can't see any difference. Servers and virtualized environments might be different, but there will be little or no effect on standard desktop use.
Gaming is pretty much unaffected - it doesn't involve the kernel, you're talking direct to the GPU. Most desktop apps are not IO intensive so you won't see a big hit. It's not great news for stuff that slams the disk and network, or works in real time - however, as we said, if you have PCID supported, the hit is minimized.
C.
Not only have I had to put up with the fact that Bluetooth does not function properly on all the Intel devices I have, but now we are possibly going to lose 5-30% processing power? I want a 20% refund. I think that's fair! I have always found Intel tech to be worse than AMD. And now AMD lost the battle and we have to have Intel junk. People are so fickle to want to buy the fastest... Why don't you all learn we want the most reliable, then the fastest!
If the fix is just
<If CPU=Intel, do this
else
do that>
Then any fixed CPUs (e.g. the i7-8xxx apparently) will not benefit from it at all. So, will MS/Linux etc. do the right thing and test properly?
ALSO - if Intel knew about it when they were designing Coffee Lake (i7-8xxx), then how come it's taken so long to get a patch in place? Surely the design was years ago.
They have been doing that, yes - the Meltdown stuff isn't applied to AMD at all. There's more that they could be doing w.r.t. PCID when INVPCID is not available (most 3xxx/4xxx/G32xx/G1xxx Intel chips), and maybe they will in due course, but that's an optimization that can be done a little later. (Roughly speaking, PCID lets the TLB tag entries with an address-space ID so the user/kernel page-table switch needn't flush everything; INVPCID lets the kernel invalidate entries for a given context directly.)
Intel's press release is a masterpiece.
https://newsroom.intel.com/news/intel-responds-to-security-research-findings/
Translation: This critical security exploit that will be disclosed soon (though not by us, even though we've known about it forever) affects only systems that are working as designed (to leak sensitive data, presumably), running specific workloads (not general ones?), via a software analysis method that is definitely not a "bug" or "flaw" (it doesn't, like, delete your secrets or anything, it just gives them away). We are working tirelessly with companies like AMD to develop a fix (even though AMD's products are immune and they should be pissed we're mentioning them), and to roll out software and firmware updates that mitigate this definitely-not-a-bug-or-flaw. In conclusion, Intel believes that Intel's products are for sure totally the most secure.
I do like the way they're throwing all the shit at the fan and seeing what sticks. In one paragraph they claim it's not confined to Intel and in the next they say they're working with other manufacturers which they go on to name. The insinuation is that every manufacturer has this problem.
They probably thought about mentioning VIA who have just announced they're returning to x86 too but that'd be too obvious.
I'm disappointed in Reuters, Bloomberg, and others for reproducing this piece of news uncritically, especially the 30% slowdown claim.
While The Register has a very specific audience, and I'm sure among us everyone understands that a 30% slowdown for one application may mean nothing for others (and even a speed-up), Reuters and Bloomberg, among others, should know better than to quote a 30% slowdown without saying it comes from a single PostgreSQL developer running a very rudimentary speed test. And what % of apps that run on Intel chips fit that description and that environment reasonably well? So how could that figure possibly be authoritative? I'd say more: how could it possibly mean anything at all and be newsworthy?
The Register's article's well within their own "flair" and kind of journalism and that's ok. But for others to quote it carelessly without double-checking with enough qualified experts (who may have to take time, perhaps months, to investigate the seriousness of this and the actual consequences, speed and otherwise) is irresponsible.
I can only surmise the people responsible for airing this news at those other outlets are techies no better than the average Register reader, and not mindful of the consequences. In other words, their job's too big for them. They should probably write for The Register and not for the global mainstream news. Ditto for the editors in charge at those outlets: allowing this to make front page news is IMO careless and irresponsible.
The Register unscathed.
So what happens to my well-isolated single-user systems that only I have access to? Am I going to get UNIX, Linux, and Windoze auto-updates that will slow these systems down, dramatically, even though there may be no compelling need for that in the context of well-isolated single user systems?
It doesn't really matter that they're single user, since obvious attacks for something like this include root privilege escalation, and escape from isolation mechanisms like VMs and containers. And simply apps reading stuff from other apps which they're not supposed to be able to (like some Javascript in your browser being able to poke around anything else that happens to be running on your system).
WTF? What happens when an airbag manufacturer produces exploding mines that hit you in the face when you have a low speed prang? Do you just accept the performance penalty and drive at 10mph?
Why hasn't anyone mentioned compensation? This isn't the only fuck-up by Intel. What about the great lock elision debacle on the first-gen Broadwells and Haswells? We accepted that we had our processors crippled, and now we have to take a performance hit as well?
I say Chipzilla should pay. I believe 25% of list price for the processor is a decent compensation amount.
Good point. Maybe they will end up in a class action, make a few lawyers ridiculously rich, and the rest of us... if we're lucky enough to still have laying around all the stuff most people throw out, proving we have said computer chip, we can then submit those things and get a coupon for 50% off a Starbucks coffee.
In answer to your question: no, the manufacturer goes bankrupt, moves to another country, changes their name and starts fresh. Of course, all done legally, with the money from decades of previous sales of the bad chip held in offshore accounts.
Of course I might just be a pessimist :)
A better question to ask is what other flaws has Intel been hiding from us? They apparently knew about this for a while. Evidence from Linux sources shows as much, since July or so when the developers started working on the fix.
Anyone remember the Pentium F00F bug? That's the one where any user in any operating mode can halt the processor in a denial of service attack. The only way to recover the system was to hit the big red button labeled 'RESET'.
Or how about the FPU bug, where six entries from a lookup table took a permanent vacation and screwed up floating point calculations? That one cost Intel $400,000,000 to fix.
Intel's recent statement about other CPU vendors being vulnerable was pointed directly at ARM and AMD. ARM, to its credit, has stated that some of their chips have the problem as well. AMD has come out and flatly said they are not vulnerable. It seems to me that Intel is trying to make others look bad (especially AMD) so they don't stink as much.
"A better question to ask is what other flaws has Intel been hiding from us? They apparently knew about this for awhile. Evidence from Linux sources show as much since July or so when the developers started working on the fix."
You, uh, realize this is how security *works*, right?
When responsible researchers discover an issue they don't just immediately go and plaster it all over the press. They disclose it to other relevant parties, behind what's called an embargo, which basically means everyone agrees not to go and tell the press about it.
Then all the relevant parties work together to come up with a comprehensive fix. *Then* they ship the fix and declare the vulnerability once everything is nicely lined up.
If they *don't* do this you have a zero-day vuln - where the vuln is publicly disclosed, but no *fix* is yet available - which is a very bad thing. Embargoes and delayed disclosure exist precisely to prevent this happening.
The reason this issue was still embargoed is that fixing/mitigating it is complex and requires co-ordination among many parties, because it can't just be conveniently fixed in one place. People were busy lining up comprehensive fixes to various OS kernels and to things like web browsers to try and prevent exploitation via malicious scripts.
Whichever numpty went and prematurely blew the gaff to the press has caused a whole ugly mess, particularly since they didn't do a very good job of explaining it. That has led to lots of coverage confusing one specific exploit variant (which is Intel-only) with the entire class of potential exploits (which is certainly *not* Intel-only; weaponizable exploits are already known to exist for Intel, ARM, s390 and PPC CPUs, and for AMD CPUs with a non-default Linux kernel configuration, and it seems extremely naive to believe there won't be *more* along very soon).
Taking this to the automotive side, and mixing in the kernel page-table jokes:
- Is Intel still selling CPUs with Takata airbags on them?
- Will they now send you a new catalytic converter that cuts up to 30% of your vehicle's horsepower?
- Is nobody suing VW over CPUs that had 30% extra horsepower but polluted the environment?
The page-fault qualification from AMD is interesting. It either means that the Intel flaw occurs primarily when speculative execution pulls from virtual memory (i.e. pages swapped out to disk)...
or that AMD did have the same flaw, but takes advantage of page-fault interrupts to add a simpler microcode privilege check as a patch.
*** Best guess: Intel's speculative execution is hardwired such that it begins immediately, without waiting for the OS's address-protection scheme to be re-applied to pages newly loaded after a fault. ***
So very likely the protections on kernel data/instructions work well for those portions of the kernel actually resident in memory (via the memory controller's mapping bits?). But portions of the kernel currently residing in virtual memory do not get those protection bits as an integral part of the initial page load (different protections apply while on disk). That is, the address-protection bits on newly loaded pages are probably set by ad hoc OS software routines (multiple instructions running in kernel mode)... a delay of hundreds of clocks.
The future silicon fix is therefore simple: new page loads start as kernel buffers that default to NX and kernel-mode protection, THEN load the page data, and LATER, at leisure (an OS routine), decide whether the protection bits for that page should be set to user space or executable instead. Seems fairly easy to do as a software patch as well... unless current page-load instructions mess with the protection bits in some other way by default (like setting user mode).
Or Intel could add a Page Not Ready bit that would block all CPU access except page-loading instructions until a new load-page-protection instruction successfully completed and reset the PNR bit.
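To put the bits in that speculation into concrete terms: on x86-64 the permission bits live directly in each page-table entry. A rough C sketch follows - the flag values are from Intel's published PTE layout, while the "Page Not Ready" bit above is the commenter's hypothetical, not a real flag.

    #include <stdint.h>
    #include <stdbool.h>

    #define PTE_PRESENT  (1ULL << 0)    /* page is mapped in */
    #define PTE_WRITABLE (1ULL << 1)    /* writes allowed */
    #define PTE_USER     (1ULL << 2)    /* user-mode access allowed (U/S bit) */
    #define PTE_NX       (1ULL << 63)   /* no-execute */

    /* The architectural rule: user code may only touch a page whose
       entry has both PRESENT and USER set. The open question in the
       comment above is whether speculative execution forwards the
       data before or after this check takes effect. */
    static bool user_access_allowed(uint64_t pte)
    {
        return (pte & PTE_PRESENT) && (pte & PTE_USER);
    }

Whether speculation forwards data before or after that check is enforced is precisely what the Meltdown flaw comes down to (on affected Intel parts: before).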
This post has been deleted by its author
There has been lots of discussion about the impact on individual operating systems, but what about VMware, Hyper-V, and Xen? Are they immune to the issue? Can someone running a VM get access to hypervisor memory, and from there to all the other VMs on the machine?
Seems to me this is the big issue.
Shouldn't we be upset with The Register for exposing this before anyone's had a chance to patch for it? My understanding is that the big players affected by this vulnerability have known for months and are working on mitigating it, but were asked to keep it quiet so hackers didn't come up with a way to take advantage of it. Now that The Register has publicly spilled this (apparently against the wishes and advice of everyone involved), aren't hackers off to the races to exploit it? Looking at the ongoing conversations taking place out there, it looks like it won't take hackers terribly long to put something together... just hopefully longer than it will take to put preventative measures in place.
I hope The Register reporter(s) are happy with their "we-got-it-first" award and can sleep well at night knowing they may have given hackers a head start.
<Just a thought.>
All of the Windows servers I am using on AWS EC2 use AMD64 processors. AWS and Azure can swap out Intel hardware for AMD hardware with just a virtual machine restart so this may be a preferred option for them.
AWS went through an exercise over the summer of force-rebooting some servers. We were told it was for essential maintenance of the host hardware; it affected two of my servers. Perhaps the real reason for the exercise was to move us onto AMD processors (I now wish I knew which CPUs were used before the reboot).
I wonder if we are going to hear from the design engineer at Intel who said, many years ago,
"You know, if we do this pre-fetch thingee we are losing data security..."
and was promptly shut down by his boss, who was far more concerned about AMD having faster processors than about data security?
Given this flaw has been in Intel CPU designs for over a decade, it is reasonable to assume that security agencies within the US, and at least Russia, are intimately familiar with this capability, have tools to read core kernel details as a result, and can compromise any system previously thought of as secure. I will be applying the patch ASAP - but will now start looking more closely at a new CPU. Perhaps it is time to switch to AMD.
This design error is history repeating itself. Back in the 1970s the CDC 7600 had a similar design flaw. A process would reference an out-of-bounds memory address, which would stop execution; however, the contents of the out-of-bounds address would be left in a register of the stopped process's image. By using a parent process to inspect a child process that repeatedly accessed low memory, all of low memory could be read. This is where passwords were stored, unencrypted at the time. Seymour Cray made a hardware fix, which was the right thing to do.
Normally I would apologize for not reading the preceding comments, but with over 400 and counting, I will stay shtum on that account. Still, in my heart, I apologize. Thanks to El Reg for a highly educational article; were I a "real" systems analyst, I might have understood it all!
I'm wondering if the bug applies to 32-bit Intel processors. The article says x86-64, but a comment mentions that all Intel x86 processors get patched in Linux. So I'm wondering if my now fairly ancient ThinkPad T60 (32-bit Intel Core Duo T2400) running XP is, with care, as secure as or more secure than a contemporary machine running Bo Derek. If it is, well, chortle.
We all know that Microsoft the OS-maker introduced undocumented features and that Microsoft the application-maker exploited said undocumented features to stay ahead of the application competition. Yes, from experience, I am expecting to get downvoted for that. Go ahead, fill your buffers. I am wondering if any parallel could be drawn with the current case. I'm not after the obvious, that Intel was trying to keep ahead of AMD and, ah, jumped the shark.
Please don't insist that everyone update to macOS High Sierra 10.13 if they don't want to, as 10.11 and 10.12 both appear to have been patched back on Dec 6. If you look at the kernel updates, there is a new entry added/updated today indicating that the 2017-002 update for 10.12 and the 2017-005 update for 10.11 (along with 10.13.2 for High Sierra) all fix the Meltdown vulnerability, aka CVE-2017-5754.
NONE of the entries list any fixes for the other two Spectre vulnerabilities -- CVE-2017-5753 or CVE-2017-5715. There have been no other new security updates listed on the Apple support site.
Reference (all Apple security updates) -- https://support.apple.com/en-us/HT201222
Reference (with fixes listed for CVE-2017-5754) -- https://support.apple.com/en-us/HT208331
"Kernel
Available for: macOS High Sierra 10.13.1, macOS Sierra 10.12.6, OS X El Capitan 10.11.6
...
Entry added January 4, 2018"
Is PCID supported for Windows 7? I've updated two Windows 7 systems on CPUs that claim to have both PCID and INVPCID support, but that support doesn't show as enabled.
I used the PowerShell 5.1 Get-SpeculationControlSettings function to check. (See https://support.microsoft.com/en-hk/help/4073119/guide-to-protect-against-speculative-execution-side-channel-vulnerabil ).
"Speculation control settings for CVE-2017-5754 [rogue data cache load]
Hardware requires kernel VA shadowing: True
Windows OS support for kernel VA shadow is present: True
Windows OS support for kernel VA shadow is enabled: True
Windows OS support for PCID optimization is enabled: False
Suggested actions
...
BTIHardwarePresent : False
BTIWindowsSupportPresent : True
BTIWindowsSupportEnabled : False
BTIDisabledBySystemPolicy : False
BTIDisabledByNoHardwareSupport : True
KVAShadowRequired : True
KVAShadowWindowsSupportPresent : True
KVAShadowWindowsSupportEnabled : True
KVAShadowPcidEnabled : False"
I booted Mobile Windows 10 on one of these systems, and under Windows 10 it does show
"Windows OS support for PCID optimization is enabled: True".
So, I was wondering if they've omitted PCID optimization under Windows 7.
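One way to rule out the hardware side is to ask the CPU directly what it advertises, independent of what Windows enables. Here is a small sketch using GCC/Clang's cpuid.h helpers - the bit positions are from Intel's CPUID documentation, and it assumes a reasonably modern x86 toolchain.

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1, ECX bit 17: PCID supported */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;
        printf("PCID:    %s\n", (ecx & (1u << 17)) ? "yes" : "no");

        /* CPUID leaf 7 (subleaf 0), EBX bit 10: INVPCID supported
           (assumes max leaf >= 7, true on anything recent) */
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        printf("INVPCID: %s\n", (ebx & (1u << 10)) ? "yes" : "no");

        return 0;
    }

If both report "yes" but Get-SpeculationControlSettings still shows the PCID optimization disabled, the limiting factor is the OS rather than the silicon - which would fit your observation of the same machine under Windows 10.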
The questions now are:
1> Will new chips be vulnerable?
2> When will we get new chips without the vulnerability?
3> When to buy your new PC - because if you're like me, you've just cancelled the order for your new iMac.
How much more can we take? It's all getting a bit boring!