Can't help feeling
That if someone tried to shove 512 amps through me there would be crashes, too!
One game developer says it's had enough of Intel's 13th and 14th-generation Core microprocessors, calling them "defective." Australia-based indie dev studio Alderon Games made its frustrations with Intel's latest chips public in a write-up titled, "Intel is selling defective 13-14th Gen CPUs," authored by the studio's founder …
The article seems to be saying it allows the CPU to ask for that, not that it is putting that through the CPU unasked.
If this is correct, then I think it is safe to say the CPU is defective. I do wonder if OEM systems exhibit the same behaviour though, because if they do, this would provide a very obvious avenue to really hurt the OEMs by burning out the CPUs while under warranty.
I might have misunderstood, but from what I've already read about this case, I get the impression that the problem is that while they've provided specs, Intel haven't been sufficiently clear in translating these into definitive limits under which one can expect these CPUs to work reliably.
I read it as meaning the cpu-exogenous limits are so large (run a kettle at 500 amps!?) as to be meaningless/not limits.
Implying that the CPU design might have lucked its way through testing by always operating in constrained state on the test-bench, but the unconstrained behaviour is self-damaging/borken.
I upvoted your comment, but based on what I read there are hints that Intel's practice of approving any and all motherboard power limits over the last five years is also causing degradation. This is still Intel's fault, as they have consistently approved every motherboard vendor's implementation of whatever limits the vendor chose! The motherboard vendors are particularly screwed: the vendor who pushes the CPU harder gets better benchmark scores and sells better, so all are forced to constantly raise the power limits to stay economically competitive.
WTF are they expecting to be powering???
Knowing nothing of this sort of thing, the only comment I'd make is that I don't think there is a normal, domestic socket anywhere in the world which is designed to supply over 4kW of power (4kW is - apparently - just the CPU limit set by the m/b; what about the other parts of the system, say the GPU?), so this is obviously a setting which means "take what you want". Seems daft when there obviously must be a physical limit; why not just report it accurately?
M.
"I don't think there is a normal, domestic socket anywhere in the world which is designed to supply over 4kW of power"
<glances at 20A three phase socket in the corner of the living room>
That being said, surely it isn't asking for 4kW at 230V; it'll be 4kW at, what, 1.8V? 3.3V? Still bloody ridiculous, mind...
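For a sense of scale, here's the arithmetic (the voltages below are my own illustrative picks, nothing measured):

# What 4096 W means in amps at various voltages: I = P / V.
# These voltages are illustrative guesses, not board measurements.
for volts in (230.0, 12.0, 1.2):  # mains, ATX 12V rail, rough CPU core voltage
    amps = 4096.0 / volts
    print(f"4096 W at {volts} V -> {amps:,.0f} A")
# ~18 A at the wall, ~341 A on the 12V rail, ~3,413 A at core voltage.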
I *severely* doubt that MSI's voltage regulators can actually handle that kind of output without turning into molten plastic very, very quickly. So that's fraud on MSI's part, at least.
But if motherboard manufacturers are indeed claiming to support these levels of supply output, it could be a voltage stability issue causing these crashes.
We have a server cluster where each server can pull 5kW if it wants, through 4 redundant 2.5kW power supplies. In their current configuration they are pulling 2.5kW, which is distributed (roughly evenly) between the 2 CPU sockets and the 4 GPU boards, as well as a little bit of memory and networking.
So a 4kW limit on power for a motherboard isn't unreasonable, but it's probably the total for the CPU sockets rather than per CPU and reflects what the PSU can deliver if pushed.
Jon
Tech Jesus (Steve) of Gamers Nexus and Wendell of L1 Tech just covered that in a joint episode. Steve indicated there is a follow-up with some inside info he got.
Their scoop is that there is something bad, as in "no firmware fix, but wholesale CPU replacement" across an entire production line.
I can't help feeling relief that, after 25 years of stodgily sticking with Intel, I had finally made the switch to AMD. About the worst issue I had was that intractable USB issue several years ago, and I no longer need a personal nuclear reactor just to power the Intel PC.
Over the years, I've had computers with Intel, NEC, Cyrix, Transmeta, VIA, and AMD x86-compatible CPUs. I've been lucky enough to have not had any CPU hardware faults.
These days, next-process-node development requires so much money that there's little effective competition in x86-compatible CPU manufacturing. No start-up will have the needed cash.
ARM begs to differ, as they - or rather CPUs based on their designs - have been chipping away at Intel (and by extension AMD) for some years now.
Though I'd take the various predictions I've found while searching ("ARM to take 50% of notebook CPU share by 2027") with a huge chunk of sodium chloride. But, who knows!
Anecdotal, I appreciate, but I've had two AMD 7950X3Ds fail in exactly the same way in the same system in under 6 months. Burnt out in one specific area of the chip, then fails to POST. Plenty of pictures online of hundreds of people experiencing the exact same thing on a variety of MBs and configurations. Wish I could say I was overclocking or something, but it's literally stock clocks on everything, with high-end components throughout (Gigabyte MB and Corsair power).
Pretty annoyed by it :(
Burnt out in one specific area of the chip
This. The article pretty much confirms that this is the issue with the Intel chips when it says that the eventual failure rate is 100%; failures increase over time. The 4kW thing is a red herring. The actual die power consumption is a function of the frequency, voltage, and capacitance driven; chips like these are very carefully designed to power up specific sections only when required, and to limit the frequency and voltage to keep the die temperature to an acceptable level. The problem is that the tools which predict temperature distribution across the die are not very good, you can never be sure whether the MB has adequate heatsinks, and you can never be entirely sure when silicon on a new process will fail. Eventually, some part of the chip will pop unless you're very conservative. Microcode is going to be a very blunt instrument for controlling this.
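To put rough numbers on the frequency/voltage point (a back-of-envelope sketch; the capacitance and activity figures are pure assumptions, not real die parameters):

# Classic dynamic power estimate: P = a * C * V^2 * f.
# a = activity factor, C = switched capacitance, V = core voltage, f = clock.
def dynamic_power(a, c_farads, volts, hertz):
    return a * c_farads * volts**2 * hertz

base  = dynamic_power(0.2, 30e-9, 1.10, 4.0e9)  # ~29 W at modest volts/clock
boost = dynamic_power(0.2, 30e-9, 1.45, 6.0e9)  # ~76 W: +50% clock, +32% volts
print(f"boost burns {boost/base:.1f}x the power of base")  # ~2.6x

The V-squared term is why chasing the last few hundred MHz costs so much heat, and why where that heat lands on the die matters.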
If the motherboard advertises 4096 W and 512 A and cannot deliver that for real, I think it's perfectly fine for the CPU to crash if it tries to use lots of power. Voltage ripple effects will make any CPU unstable.
And you would need a really fast oscilloscope to verify the actual performance of the motherboard power delivery so you basically have to trust the motherboard manufacturer claims.
And it doesn't help that RAM manufacturers often advertise timings that do not actually work in all cases either. For example, the rowhammer attack shouldn't be possible with correctly set timings, but many RAM sticks are vulnerable because manufacturers advertise timings that merely appear to mostly work.
Hardware manufacturers must stop lying about their products. At first, it was only GPU manufacturers claiming that their card uses "180 W" of power while you still needed a 750 W PSU as the minimum requirement! But now more and more manufacturers are selling imaginary specs, and the product fails if you actually try to use the advertised specs.
Bear in mind that GPU manufacturer recommendations for what PSU to use are very, very conservative estimates, to account for cheap shit PSUs, or billy-basic ones used by OEMs/ODMs etc.
i.e. I have a 7800XT, which I'm sure recommends a >790w PSU or some such. Which is utter rot, on a technical level, but it's a necessary margin to take into account that not everyone has a high-quality PSU, or maybe they're running four spinning disks in there that'll draw knocking on 100w at startup, etc.
I'm happily running it on a 550w PSU, because the power profile at absolute max is about as follows:
CPU - if it draws more than 90w, something's gone badly wrong (Ryzen 7600, rated 65w, but give it some margin for boosting etc)
RAM/Mobo/NVME overall: maybe ~30w or so
GPU - 300W if it spikes badly (rated for 265w IIRC, which is about what I've seen it draw when fully loaded up and benchtesting)
Throw ~20w on there for fans etc.
That's a total of ~450w if there's a major wobble while I'm fully loading the CPU and GPU at the same time with all the fans running full whack while also loading up the disk and network - for the most part, it's gonna be closer to 300w when gaming.
So I wanged a mid range, decent quality (Corsair) 550w semi-modular PSU in there, and it's been just fine.
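The same budget, tallied (these are my own rough figures from above, not vendor numbers):

# Worst-case PSU budget for the build described above.
draws_watts = {
    "CPU (Ryzen 7600, 65 W rated, boost margin)": 90,
    "RAM / mobo / NVMe": 30,
    "GPU (7800XT, 265 W rated, spike margin)": 300,
    "Fans etc.": 20,
}
total = sum(draws_watts.values())  # 440 W worst case
print(f"{total} W worst case; a 550 W PSU leaves {550 - total} W of headroom")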
With respect to the 4096w/512A, that's basically saying to the CPU "draw whatever you think you can draw to run as you see fit" - the motherboard manufacturers will have only specced their power delivery for, say, 500w to the CPU on a serious overclocking board, and it doesn't appear to be the power delivery crapping out that's killing these CPUs.
Let's say the CPU says "I have the thermal overhead to run 400w, so give me 400w, motherboard" and the motherboard says "tough shit, you're getting no more than 240w" - those CPUs are still dying.
That's the case for people using workstation motherboards (which have far more conservative power limits, for stability). It's not that the CPUs are being blasted with power in those cases. They're still crashing even when run with sensible power limits.
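If you want to see what limits your platform is actually enforcing, Linux exposes them through the standard intel-rapl powercap sysfs interface (a quick sketch; the rapl:0 path assumes package 0 on an Intel box):

# Read the package power limits via Linux's intel-rapl powercap driver.
from pathlib import Path

pkg = Path("/sys/class/powercap/intel-rapl:0")
print((pkg / "name").read_text().strip())  # e.g. "package-0"
for n in (0, 1):  # constraint 0 is the long-term limit, 1 the short-term one
    name = (pkg / f"constraint_{n}_name").read_text().strip()
    microwatts = int((pkg / f"constraint_{n}_power_limit_uw").read_text())
    print(f"{name}: {microwatts / 1e6:.0f} W")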
From what interested parties have seen, it's not specifically an overabundance of power delivery that's killing them, and it can't be fixed with microcode - so one can only assume there's a "hard stop" problem with the manufacturing process, likely from when they started pushing the limits of what the 12th gen architecture could do for the 13th and 14th gen. Those are refinements / very light refreshes of that architecture (more L3 cache, tuned to draw more power if it's available, etc) to try to keep up with the AMD X3D chips, which blew everyone's socks off by drawing (well, being rated for from a cooling perspective - give it 20% wiggle room) 105W while kicking the shins of the >250w (often way over 300w) Intel offerings.
It's going to be very interesting to see what Gamers Nexus (actually a pretty serious benchmarking channel, rather than Capital G Gaming type content) and Level1 Techs (less hardcore, but leaning more towards enterprise with consumer stuff in the mix) come up with from their respective investigations, as this sounds like Intel have proper "done goofed".
Steven R
> very, very conservative estimates to take into account for cheap shit PSUs
Indeed. I don't know much about PSUs, but I *do* know that the one thing about dirt-cheap, no-name models is that you'd be very foolish to rely upon them being able to deliver the specified maximum power, at least reliably and consistently over an extended period of time. I've heard horror stories about some catching fire when pushed to do so.
I suspect that a 500W power supply from an even half-decent "name" manufacturer is going to cost a lot more than a bottom-of-the-range, no-name 750W model, but I know which one I'd trust more to run the same machine.
I've seen a number of PSUs in circa-2000 Dell computers fail after emitting the magic smoke, so the problem with PSUs was not limited to "no-name models".
I'm suspecting that there is a thermal issue that Intel glossed over: local heating of the die can slow down the logic elements, which could then lead to timing glitches causing the crash. I'm also wondering if the new process nodes are putting stricter limits on maximum junction temperatures, to prevent diffusion of the N and P dopants.
Game devs and publishers often have racks of consumer-CPU systems running workstation-class boards for realistic QA testing, and some use them for hosting remote game servers etc - having high single-thread performance makes a difference for those.
You could run them on Xeons, but the games themselves aren't designed to run on massively multicore, relatively low-speed CPUs, so they aren't as well suited for it.
Don't get me wrong, it's pretty niche so you might not be familiar with it, but it's absolutely a thing.
Steven R
No point squandering a legitimate excuse to mention your company's product, I suppose, but I wasn't aware that "multiplayer dinosaur survival game" was even a genre...!
How does that even work? Do you have to avoid the doomsday asteroid heading for your home in Mexico by organising a plane trip to Europe then finding enough for you and your descendants to eat under hostile environmental conditions for the next several million years?
Having ensured your descendants' survival into the modern era, do they become the co-stars of an infamous dinosaur/human buddy cop film starring Whoopi Goldberg?
And why would you call such a game "Path of Titans", which doesn't even begin to hint at "dinosaur survival" and sounds like the most generic, pay-to-win freemium game title ever?
Ah, I'd forgotten about Jurassic Park. It seemed more obvious once you mentioned that, and I thought briefly that I'd been stupid for misinterpreting "dinosaur survival" as meaning you were trying to survive *as* a dinosaur rather than a person trying to escape from them.
But then I checked the game's website and it turns out that, no, you *are* playing as a dinosaur and I was right in the first place.
Weird.
4096 W and 512 A.
My guess is these limits are stored in 12 and 9 bits.
Zero watts or ampères doesn't make a lot of sense here, so zero probably means no limit, or the maximum representable value + 1, ie (2^12 - 1) + 1 = 4096 and (2^9 - 1) + 1 = 512.
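If that guess is right, the decode would look something like this (the field widths and the zero-means-unlimited convention are the guess above, not anything from a datasheet):

# Hypothetical decode of a limit field where a raw value of 0 means
# "no limit", reported as one past the largest encodable value.
def decode_limit(raw, bits):
    top = 1 << bits                 # 4096 for 12 bits, 512 for 9 bits
    return top if raw == 0 else raw

print(decode_limit(0, 12))  # -> 4096, the "no limit" wattage
print(decode_limit(0, 9))   # -> 512, the "no limit" amperage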
Appears these chips weren't given enough magic smoke, or the wrong coloured smoke, so fiddling with overclocking settings and microcode updates isn't going to fix this. Everyone knows what happens when you let the smoke out of an electronic component. :)
I like the developers' touch of having their game pop up a dialog when running on these CPUs:
[ Sorry mate. You went shit* with your CPU. :((. ]
* Unsurprisingly the branding (shiteinside)® is not yet a thing despite near universal enshittification.
Being an AMD AM5 desktop user (7950X, using Linux of course) I have been following this situation with the 13th/14th Gen K/KS models with amusement, yes, amusement. Intel have always pushed the envelope because they get the advertising / positive propaganda. Anyway, my 2 pence worth concerning this situation: Intel, you are F3CKED this time. I personally think Intel knows what is wrong with the 13th/14th Gen, it potentially being the glue, metaphorically speaking, that they are using to hold them together. And unless Intel stands up and immediately replaces every 13th/14th Gen CPU that a customer asks to have replaced, I can see a class action lawsuit coming its way. The shit is about to hit the fan and it is a doozy, so god help the fan, let alone Intel!
What is super bad news for Intel: would a user who has had a defective Intel 13th/14th Gen CPU now want a next-gen Intel CPU that hasn't had 6 months of real-world testing, rather than an AMD 7000/9000 one? Maybe they will demand their m/b and CPU be replaced with the upcoming Intel next-gen parts, or maybe 100% money back so that they can purchase an AMD m/b and CPU!
All I can say is, the quicker Intel jumps on this to resolve it, the less future hassle it is gonna get, and it's gonna need mighty deep, deep pockets to keep everyone happy!
It's all nonsense: current is limited by resistance and voltage...
My kettle is connected to the national grid, which can supply thousands of amps, but I don't need to program my kettle not to use that current...
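The kettle point in numbers (element resistance assumed, typical for a ~3kW kettle):

# A kettle draws what its element's resistance dictates, not what the grid
# could supply. Ohm's law: I = V / R.
volts, ohms = 230.0, 18.0  # UK mains, assumed element resistance
amps = volts / ohms
print(f"{amps:.1f} A -> {volts * amps / 1000:.1f} kW")  # ~12.8 A, ~2.9 kW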
The reality is Intel f**ked up... they are allowing the microcode in their CPUs to pull more current than the silicon can handle.
It's the same misinformation and piss-poor design they have continually delivered.
Like saying their CPU can run at "XYZ GHz" - it's all nonsense, because as soon as you try to hit that limit, all the other cores scale back, so maybe a limited number can run close to that speed.
Then you actually look at what they classify as "GHz" and it's not actually GHz speed, it's multithreading that gives something similar to what might be seen at that speed, when using manipulated code.
Obligatory "x86's days are numbered" thought. AMD have the edge for now, but the opportunities to improve are few and far between. Transistors are more or less at the limits imposed by quantum tunnelling, and recent performance gains have mostly come from being clever in how one physically builds and arranges the transistors.
The diminishing returns of throwing more power at the 6502, Z80, 68000, POWER, and endless other architectures eventually all prompted a rethink - and why should x86 be any different in that regard?
The ARM architecture is probably the best known one, though we generally haven't seen the bleeding edge of manufacturing applied to it. There's probably a place for it; especially to cut datacentre power usage.
And then there are other concepts like ternary processors (allowing each digit, or "trit", to be -1, 0 or 1), rather than having to have hardware logic and effort expended on flagging and dealing with values as positive or negative.
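For the curious, balanced ternary in a few lines (a toy illustration of the digit system, nothing to do with any real silicon):

# Toy balanced-ternary conversion: each digit ("trit") is -1, 0 or +1,
# so the sign falls out of the representation for free.
def to_balanced_ternary(n):
    trits = []
    while n:
        n, r = divmod(n, 3)   # Python's divmod floors, which suits negatives
        if r == 2:            # rewrite a 2 as 3 - 1: carry one, emit -1
            n, r = n + 1, -1
        trits.append(r)
    return trits[::-1] or [0]

print(to_balanced_ternary(7))   # [1, -1, 1]  =  9 - 3 + 1
print(to_balanced_ternary(-5))  # [-1, 1, 1]  = -9 + 3 + 1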
I have both a Ryzen 9 7950X and an i9-14900K. Both are rock-solid platforms on any app or game if run on stock profiles. Most of these issues are bad motherboard BIOS profiles. This generation, many of the motherboard manufacturers have had a massive drop in quality. I want to see AMD and Intel make their own motherboards, because I no longer trust the Asus/MSI/Asrock/Gigabyte motherboard cartel to make quality boards. Their BIOSes now run with extreme overclocks by default, and it's extremely difficult to turn these profiles off. Also, Windows 11 now enforces weird power-saving kernel-level rules that make many applications react in strange ways, often boosting the first core to maximum (especially if the default overclock BIOS profile is on), which causes the app to crash. Just run Windows 10, and build a stock profile with extreme boosting disabled.
Fair. The plethora of options is so large it's difficult to know what to flip. I would not know who to pick for a mobo today, having been an ASUS convert for over a decade. Their recent shenanigans with RMAs (see GamersNexus) have earned them the NOPE award.
I spent considerable time plotting out which BIOS options worked well; the defaults from enabling XMP got so much wrong it was painful to see, and unstable. Memory timings, fclk, voltages: all badly off. $deity help anyone who doesn't know what they are doing.