Well to be fair...
...unexpected thermal throttling isn't quite on the same level as parts melting (and then blaming the customer).
I feel like they're both pushing the limits of what's sensible with 350W+ sustained power draw.
AMD may have spoken too soon when it roasted Nvidia a couple months ago over the melting cable issue of the rival GeForce RTX 4090 graphics cards. That's because the Ryzen-shine designer is now faced with multiple reports of an overheating issue with its own reference design for its new Radeon RX 7900 XTX graphics card. AMD …
Any desktop computer part other than a PSU that pulls >300W stretches the definition of sensible.
If this kind of power draw is to be commonplace then an evolution of ATX to something using bars rather than cables would be beneficial. Declutter the case and reduce resistance on critical power flow routes within the system.
Apple's thirsty G5 towers used bars for this reason.
The shift to ATX12V in PSUs potentially offers some options along these lines.
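To put rough numbers on the resistance point, here is a minimal back-of-envelope sketch. It assumes the full 350 W is delivered at 12 V through a single path (a simplification, since some power comes via the PCIe slot), and the two resistance figures are made-up placeholders for illustration, not measurements of any real cable or bus bar.

```python
# Back-of-envelope I^2 * R loss in the power delivery path.
# All resistance values below are hypothetical placeholders.

def current_amps(power_w: float, volts: float) -> float:
    """Current drawn for a given power at a given rail voltage (I = P / V)."""
    return power_w / volts


def path_loss_watts(power_w: float, volts: float, resistance_ohm: float) -> float:
    """Heat dissipated in the delivery path itself (P_loss = I^2 * R)."""
    return current_amps(power_w, volts) ** 2 * resistance_ohm


CARD_POWER_W = 350.0          # sustained draw mentioned in the thread
RAIL_VOLTS = 12.0             # standard 12 V rail
ASSUMED_CABLE_OHM = 0.010     # hypothetical total cable + connector resistance
ASSUMED_BAR_OHM = 0.001       # hypothetical bus bar resistance

if __name__ == "__main__":
    amps = current_amps(CARD_POWER_W, RAIL_VOLTS)
    print(f"Current at {RAIL_VOLTS:.0f} V: {amps:.1f} A")
    for label, ohms in (("cable", ASSUMED_CABLE_OHM), ("bar", ASSUMED_BAR_OHM)):
        loss = path_loss_watts(CARD_POWER_W, RAIL_VOLTS, ohms)
        print(f"Loss in {label} at {ohms * 1000:.0f} mOhm: {loss:.2f} W")
```

Since the loss scales linearly with resistance and with the square of the current, cutting path resistance by a factor of ten cuts the heat dumped into the wiring by the same factor, whatever the real figures turn out to be.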
On the other hand: one failure mode tanks performance, while the other one might burn down your house.
If I had to be stuck with one of these cards, I'd rather it be the one that fails safely, thank you very much.
I'd recommend watching the videos by der8auer, who really dug deep here. It seems Igor's Lab found it, but der8auer ripped a number of cards apart (including ones he bought from people reporting the issue).
It's looking like the vacuum chambers may be the culprit - which would mean full on recall territory, not just a "few" RMAs.
Heat pipes used in graphics cards are essentially vapour chambers.
They contain liquid that evaporates then condenses back to a liquid to efficiently move the heat.
It seems there is a design flaw that shows up on some (not all) cards, where the assumption is that the vapour is not condensing again, or some such, so the heat pipes are not working properly.
Too much / not enough coolant, or just the design. We will know when AMD find the cause.
I was just going to post a mea culpa. Watched the original in German, then again in English.
Yes, it's "vapor chamber" - oops! Roman at several points refers to vacuum and how the structure of the chambers acts as support to prevent it getting crushed, so apparently I got that mixed up in my aging brain.
Basically, some liquid evaporates in the hot area, carrying away some heat. The vapour then moves to a colder area and condenses, releasing the heat, and the liquid travels back to the hot area so the cycle can repeat. All very clever, with no moving parts. However, if the whole thing gets hot enough that the vapour doesn't condense, it stops working and just gets hotter and hotter until enough heat is removed for it to condense again. It will also stop working if it isn't designed so that the condensed liquid flows back to the hot area. That needs to work whichever way the vapour chamber is orientated, and some experiments show the cards run cooler when mounted vertically rather than horizontally.
I'm no expert but it sounds like the former issue can be solved by setting the amount of liquid and the air pressure in the chamber so that the liquid always condenses over the desired temperature ranges (but you have to make sure it still evaporates at the lower end of your temperature range too). The latter issue is down to the design of the chamber itself and could be harder to fix.
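To illustrate how the pressure inside the chamber sets the temperature at which the liquid boils and condenses, here is a rough sketch. It assumes the working fluid is plain water (nobody in this thread has confirmed what AMD's chamber uses) and relies on the commonly quoted Antoine equation constants for water between roughly 1 and 100 °C. It says nothing about fill volume or wick design, which is where the real problem may lie.

```python
import math

# Antoine equation constants for water, pressure in mmHg, temperature in deg C.
# Commonly quoted set, valid for roughly 1-100 deg C.
# ASSUMPTION: water as the working fluid; the actual fluid is not stated.
A, B, C = 8.07131, 1730.63, 233.426


def saturation_pressure_mmhg(temp_c: float) -> float:
    """Vapour pressure of water at temp_c: log10(P) = A - B / (C + T)."""
    return 10 ** (A - B / (C + temp_c))


def boiling_point_c(pressure_mmhg: float) -> float:
    """Temperature at which water boils/condenses at the given pressure."""
    return B / (A - math.log10(pressure_mmhg)) - C


if __name__ == "__main__":
    # Hypothetical chamber pressures, from ambient down to a partial vacuum.
    for pressure in (760.0, 200.0, 50.0):
        print(f"{pressure:6.0f} mmHg -> boils at about {boiling_point_c(pressure):.1f} deg C")
```

The lower the pressure in the sealed chamber, the lower the temperature at which the fluid starts to evaporate, which is why these things are evacuated before being filled.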
I watched the der8auer video and was impressed by the thoroughness of his investigation. I found it on the ExtremeTech website, which included this:
"Interestingly, fellow overclocking YouTuber Igor from Igor’s Lab has also chimed in according to Videocardz. He said he spoke with an AMD partner that agrees with De8auer, it’s the vapor chamber. This partner reportedly said a batch was made with an insufficient volume of liquid in the chamber."
And this statement from AMD:
"We are working to determine the root cause of the unexpected throttling experienced by some while using the AMD Radeon RX 7900 XTX graphics cards made by AMD. Based on our observations to-date, we believe the issue relates to the thermal solution used in the AMD reference design and appears to be present in a limited number of the cards sold. We are committed to solving this issue for impacted cards. Customers experiencing this unexpected throttling should contact AMD Support."
AMD has one thing going for them - very pro-active troubleshooting and diving into figuring out (with the help of third parties) what the problem is... unlike NVIDIA, who by all accounts just shrugged and said "User error, plug your cable in right".
Either way, the smug neener-neener behaviour from AMD certainly has come home to roost... I hope the PR people won't be jumping up and down in glee next time something happens to NVIDIA (or any other competitors).
"AMD has one thing going for them - very pro-active troubleshooting and diving into"
That's because a fix won't cost them much. Replacing expensive hardware is.. expensive.
Don't get me wrong, I'm an AMD fan (on some Ryzen system now). I'm also an Nvidia fan and user.
They are both excellent and serious companies.