Maximise Profit Margins.
There, explained in three words.
Linux creator Linus Torvalds has accused Intel of preventing widespread use of error-correcting memory and being "instrumental in killing the whole ECC industry with its horribly bad market segmentation." ECC stands for error-correcting code. ECC memory uses additional parity bits to verify that the data read from memory is …
ZFS has a lot of advantages, even without ECC (and the "scrub of death" is a myth).
I would like to have ECC on my home-brew FreeNAS, but I can't justify the additional motherboard and RAM costs or the extra power draw (server grade boards tend to be much more power hungry).
However, I insist on it for business grade NAS solutions.
"I would like to have ECC on my home-brew FreeNAS"
ECC RAM is almost the same price as non-ECC (seriously, it's about 3-4% different, if that).
The cost difference is in the Intel ECC-supporting boards and CPUs vs the non-ECC ones - the solution is not to use Intel CPUs.
You don't need a server grade board nowadays. I've got a Gigabyte X570 Gaming X board coupled with a Ryzen 7 2700X and 64GB (2x32GB) Samsung ECC happily running TrueNAS Core 12.
Cost of board about £170, memory £325, processor £200. Runs my RAID-Z1 vdevs very well!
Fair comment if you are indeed just running a very simple file server. For me, I'm also running NFS and iSCSI to support vSphere datastores, as well as a few jail-based apps incl. Plex, which sometimes benefits from the CPU when transcoding.
I guess one's perspective of what is reasonable cost depends on the direct benefits that will arise. I'm hosting around 60TB of data, so it was worth the extra pennies to get the additional protection.
I would like to have ECC on my home-brew FreeNAS, but I can't justify the additional motherboard and RAM costs or the extra power draw (server grade boards tend to be much more power hungry).
My nas4free is running on HP Microserver (AMD Athlon II (N36L)) with ECC happily and isn't that power hungry. The 4 x 3.5" spinning rust are most likely the largest consumer of power. Newer Microservers have Opteron and IIRC come with iLO (whereas on the older ones iLO was available as add-on card).
The Linuxheads will flame me for this, but I'm sorry Linus, I don't believe you are correct.
My Lenovo is equipped with a Xeon, and supports ECC. Yet it didn't come equipped with it. And the huge majority of the models in the same line didn't, from the factory.
Intel built the Xeon with ECC support. Yet here's an entire model range that needed to be ordered as a custom config in order to have it from the factory.
Why? It wasn't Intel's fault: ECC memory is just too expensive for most people's cost/benefit equations.
Nothing to do with Intel. You want more ECC market penetration? Tell the OEM manufacturers, from memory to computer, to stop price gouging on that single extra bit. Some computer manufacturers charge a real premium just to upgrade to ECC.
I don’t believe the cost argument. The cost difference should be minimal (and of course would evaporate almost entirely if ECC were used everywhere). If ECC memory is significantly more expensive than non-ECC then that’s an artificial sales/marketing thing - it’s not because of any intrinsic significant additional expense
You need 9 RAM chips instead of 8, right? That is in line with the last time (years ago, I admit) I compared prices for ECC vs. non-ECC RAM (all from the same place). ECC was about 15 per cent more expensive, as to be expected. And that dealer's prices were quite OK. Has that changed so much?
Even longer ago ECC was just overpriced (similar to Xeons), costing twice as much or more than non-ECC RAM.
But it's not. As one example, a quick search through Newegg showed only one 2666MHz ECC SODIMM from a major manufacturer, a 16GB part from Hynix
https://www.newegg.com/p/pl?storeName=Laptop-Memory&pageTitle=Laptop+Memory&N=100007609+601204087+600006161&Submit=ENE
at $91.86. A 32GB non-ECC dual-SODIMM kit from a major manufacturer only averages around $122 or so
https://www.newegg.com/p/pl?storeName=Laptop-Memory&pageTitle=Laptop+Memory&N=100007609+500002048+601204087&Submit=ENE
Yes, that's laptop. But downgrade a Dell Precision 5820 desktop workstation from a Xeon to an Intel i7, and the 16GB RAM kit is $124.98 less expensive. As shown, you can double the RAM for that amount of money.
So there is DEFINITELY a price difference.
Though to be fair, as ECC has been relegated mostly to the corporate desktop and server worlds, there is a magic price point for ECC where it starts to get cheap. You won't find it on the top megahurtz parts, and the top speeds will be punitively expensive unless you are buying data-centre volumes, but midrange memory (like the no-frills Kingston bare sticks) is much closer to the non-ECC parts. The performance hit (the other old argument against ECC, not mentioned in the article) is also small on modern architectures, at least compared to the losses from mitigating the various side-channel attacks.
When you start actually using large GBs of memory heavily, you start seeing simple memory errors often enough that ECC is a good investment just for system stability, but it's also an important protection for the security reasons that Torvalds pointed out. It's often possible to get binned parts that have better latency and eliminate the speed penalty for less $$ than going to higher-MHz-rated parts. People tend to forget that faster CAS timings also affect memory performance, not just the base clock speed.
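That CAS point is easy to illustrate. As a rough sketch (illustrative numbers, not vendor specs): first-word latency is the CAS cycle count divided by the I/O clock, which for DDR is half the transfer rate.

```python
# Rough first-word latency in ns: CL cycles / I/O clock (MT/s ÷ 2 for DDR).
# Illustrative only; real access latency involves more timings than CL.
def latency_ns(mt_per_s: int, cl: int) -> float:
    return cl * 2000 / mt_per_s

print(latency_ns(3200, 16))  # 10.0 ns
print(latency_ns(3600, 20))  # ~11.1 ns: higher clock, yet slower first access
```

So a lower-CL 3200 part can beat a "faster" 3600 part on access latency, which is the point about binned low-latency sticks above.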
That said, I'd also like to see the artificial price segmentation for SAS and SATA addressed, and then near-term SATA could just go away on the motherboard side (since SAS is backwards compatible).
"a quick search through Newegg showed only one 2666MHz ECC SODIMM from a major manufacturer"
Quite.
And using basic economics, what happens when there is effectively a single supplier and no volume market?
So, with little or no competition, the price premium is only about 50%.
This tells me that the margin on ECC is high due to the aforementioned reasons, not that the manufacturing cost is (particularly) high.
If Intel hadn't hobbled ECC so that we had actual volume and competition in the ECC segment, don't you think that the price premium of ECC would reflect the manufacturing cost increase, about 15%, as opposed to the current price premium of 50% or more?
Everyone is completely discounting the pricing factor of RAM binning. ECC RAM might also be more expensive because the RAM manufacturers save the top-binned parts for ECC, on the basis that the top parts will likely produce fewer errors and, whilst ECC corrects those errors, many systems flag them with DANGER, WILL ROBINSON!
Nobody wants to see system error messages - that's the entire point of buying ECC in the first place. So you use the best parts in ECC to give the part the best reliability record and the best customer satisfaction ratings. You got what you paid for - no error messages, right?!
And top-binned parts cost money.
So ECC might ALWAYS be more expensive: you're paying for reliability, and to guarantee it they save the best parts for those products.
You are paying for more chips to make ECC (one extra chip for every eight data chips for standard ECC memory, although other schemes are available to cover higher levels of performance/capacity/reliability).
Because the ECC calculation takes a few cycles, ECC is generally slower than the equivalent chips (i.e. identical manufacturing process/manufacturer) used in non-ECC RAM, so it is unlikely to be binned for high speeds.
The lower speeds generally mean that ECC RAM will be more reliable than non-ECC RAM, but the same chips running at the same speed will generally be equally reliable, as the timing tolerances between the chips are greater at those slower speeds - versus high-speed RAM, where manufacturers bin chips with near-identical performance to enable them to run faster.
TL;DR: You make a million RAM chips and bin by speed. All the ones that pass the QC checks at base speeds will have a similar level of reliability. 99%+ of the chips will be used in non-ECC configurations. Chips that can run faster are binned at a higher speed and sold at a premium. There is little difference in quality between the base-spec chips. The cost difference comes from the extra chip plus the additional costs for lower-volume parts.
One extra bit gets you Parity checking. It can detect an error, but can't figure out how to correct it.
Unlike parity memory, which uses a single bit to provide protection to eight bits, ECC uses larger groupings. Five ECC bits are needed to protect each eight-bit word, six for 16-bit words, seven for 32-bit words and eight for 64-bit words.
Source: https://www.pctechguide.com/computer-memory/ecc-memory
see also:
https://www.realworldtech.com/parity-and-ecc-explored/
https://en.wikipedia.org/wiki/Hamming_code
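Those bit counts fall straight out of the standard Hamming SECDED construction. A quick sanity check (a sketch, not production code):

```python
# Hamming SECDED: r check bits must satisfy 2**r >= m + r + 1 for m data
# bits, plus one overall-parity bit for double-error *detection*.
def secded_bits(m: int) -> int:
    r = 1
    while 2 ** r < m + r + 1:
        r += 1
    return r + 1  # +1 overall parity bit

for m in (8, 16, 32, 64):
    print(m, secded_bits(m))  # 8->5, 16->6, 32->7, 64->8
```

Which matches the quoted figures: five bits per 8-bit word, eight per 64-bit word - hence the extra eight bits on a 64-bit DIMM.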
ECC simply adds an extra 8 bits of the same memory device to a 64 bit interface for DDRx SDRAM systems (which is what just about everything in this arena uses).
So the cost of parts is 12.5% more for the memory device (and probably far less than 1% at the system level).
The ECC is done in the memory controller (part of the microprocessor) and as it is a standard thing, the marginal cost of adding it is tiny - a little bit more silicon real estate.
There are some extra PCB tracks to be added, but given how many are already present they add little cost (they might make PCB routing more interesting, but it really is not that much more complexity). Initialisation of memory takes a bit longer (every memory location has to be written with a valid value for the ECC to be valid), although it is possible to simply initialise a process space when it is first used.
The only reason I can see for Intel to take this stance is to make sure that those with a need for ECC have to pay a hefty premium.
I have done designs with ECC for decades and the marginal cost doesn't really even show up in the grand scheme of things.
I have to agree with you. ECC requires an ecosystem, not just a chip, and Intel doesn't control that. As the article points out, AMD's chip doesn't work without supporting motherboards and memory chips either. It's not just the Xeon chip prices, the Xeon ecosystem prices are higher across the board with MB and RAM vendors charging more as well to fully support ECC.
If you are using a Ryzen-based AMD system (and maybe earlier models - I just haven't checked), all support at least unbuffered ECC out of the box, as long as you can stand the potential performance hit of running at 2666MHz/CL19 vs faster non-ECC memory, and potentially being told by the motherboard vendor that "the RAM isn't on the approved memory list for this motherboard" - although most vendors will support it on a selection of their offerings.
I have a 64GB Ryzen platform that hasn't had any unexplained crashes in a year, versus its Intel equivalent without ECC that has hung twice. It's a home lab, so it's just annoying rather than critical, but it's nice not having to fix VMs following a crash. If only time was worth something... Kingston/Crucial do "affordable" unbuffered ECC, so I only paid around 20% more for the privilege.
"Why? It wasn't Intel's fault: ECC memory is just too expensive for most people's cost/benefit equations."
The CPU supports ECC, but does the chipset? And who makes the chipset? That is the artificial segmentation Linus is referring to and yes, it is created by Intel to differentiate their products, and then OEMs are left to choose between the different pricing tiers.
In terms of the cost difference between ECC and non-ECC, it SHOULD be around 15% (i.e. the cost of adding a ninth RAM chip for every eight existing RAM chips, plus a little more on the chipset) but is instead often 100%+ more because of the relatively low volumes.
And as for error rates? A computer that is on 24x7 will see ~3 correctable errors a year (http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf), although that may be a little on the low side with more than 4GB now being common.
> It wasn't Intel's fault: ECC memory is just too expensive for most people's cost/benefit equations.
Intel kept ECC for the server market to charge a premium for Xeons
If you had ECC on all CPUs then cheap RAM would become available with ECC and everyone would just buy desktop machines instead
Remember you have to buy Xeon (or preferably Itanium) for 'real' work
"If you had ECC on all CPUs then cheap RAM would become available with ECC"
That's a plausible theory, but unfortunately one that can't be proven. For example, the market may end up with a different stratification: speed, rather than ECC vs non-ECC. What I mean by that is that "good" ECC RAM might still end up more expensive due to the RAM manufacturers' internal needs or demands; "cheap" ECC RAM would be settled on as being slower for the application, whilst fast ECC, the stuff you really want as an enthusiast, would still be kept expensive.
Just because they could.
Never underestimate the greed of industry. We have no way of proving that fast ECC would be cheaper if only it were more popular. Many RAM circuit-board designs already have the traces for the ECC; the parts only need to be populated. Yet the RAM companies are often charging big, big premiums just to add those missing chips to an existing design - again, because they can.
That's a plausible theory but unfortunately one that can't be proven. For example, the market may end up with a different stratification, speed rather than ECC vs non-ECC.
I've done that for decades at home. Anything server-like, I happily traded speed for reliability and chose slower memory, but with ECC. For a gaming rig, who cares if it crashes, so I opted for speed rather than ECC.
That's not to say I wouldn't like to have my cake and eat it...
"Intel kept ECC for the server market to charge a premium for Xeons"
That's only part of the story - there are desktop chipsets that don't allow ECC while the equivalent workstation/server chipset does.
There's nothing inherent in the CPU that stops Intel CPUs supporting ECC - on older platforms the memory controller sat in the northbridge portion of the chipset, and on modern parts it is on the CPU die. While there are reasons to discourage ECC for some chipsets (i.e. integrated GPUs), supporting it is an almost zero-cost option - or would be, if Intel didn't charge extra for "Xeon support" - and there are a number of server boards that support non-Xeon CPUs as long as they don't have integrated graphics. As Linus says, he can buy a Xeon and pay 5x the cost of a desktop CPU for 2x the performance.
Intel have segmented mobile/desktop/workstation/low-end server/mid-range server/high-end server by limiting support for the number of PCIe lanes, ECC memory and similar features. It was great when AMD weren't providing any competition and Intel could rely on tick-tock production cycles, but 10nm has left them sitting on bad decisions from 5 years ago.
In fact, Intel's Celeron, Pentium and Core i3 CPUs have ECC enabled, but only when used on a 'server' chipset like a C2xx, not on a desktop/laptop chipset. And the only reason Core i5 and i7 have it disabled is so Intel can sell more expensive Xeon chips which are functionally identical.
If 4GB systems have an average of 3 single bit faults a year, then a 16GB system would have 12/year. My desktop and laptop both have 16GB and both are 5+ years old.
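The scaling being applied there is linear in capacity. As a sketch (assuming the quoted ~3/year baseline for a 24x7 4GB system; the cited Schroeder et al. data actually shows per-machine rates varying by orders of magnitude):

```python
# Naive linear scaling of the quoted ~3 correctable errors/year for a
# 24x7 4GB system. Real-world per-DIMM rates vary hugely.
def errors_per_year(capacity_gb: float, base_rate: float = 3.0,
                    base_gb: float = 4.0) -> float:
    return base_rate * capacity_gb / base_gb

print(errors_per_year(16))  # 12.0
```

Of course, without ECC those errors are never logged, so "5+ years with no visible faults" doesn't rule them out - most single-bit flips land in data that's never read back or never matters.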
You are quoting prices which are based on volumes, market verticals, and price gouging. If the volumes are there the parts suppliers will kill each other to get you the ECC modules for at most a 15% higher cost.
The entire chip industry works on razor-thin margins except in the higher end server markets where paying double for a component is a small blip when you are dealing with high MTBF's.
The whole idea of reliability is to build your parts to be reliable.
The official talking point was that the added circuitry for ECC memory (including the extra bits of storage) would actually reduce the reliability of most systems because there would be more parts to fail. This while simultaneously claiming ECC was needed for servers with their massive memory capacities of up to 4 GB! (Windows NT for servers) Considering a typical consumer build of the last decade had as much memory as a server of the Y2K era, that argument sounds a little weak, doesn't it?
> that argument sounds a little weak, doesn't it?
You have about the same chance of a single-bit error as you have with non-ECC. But that error is corrected! If you have two errors, you at least find out. Note that the chance of a double error should be monumentally small, so a double error is an indication that some module may be dying.
So, no, ECC really does not have any disadvantage to speak of.
Depends if you count slow performance (as opposed to repeated kernel panics) to be an issue. In my experience, when ECC RAM goes bad, it makes machines monumentally slow with no logical reason as to why, until you use the paid MemTest86 and discover thousands of "corrected errors" over a few days worth of rounds.
I've recently bought a Ryzen system, and just finding memory sticks that would work was a pain. The mainboard is clearly advertised as able to use DDR4 3800, but the first 2 sets I tried at merely 3600 did not work, even though the 2nd was on the manufacturer compatible list. And their support told me that AMD only supports 3200, anything else not working is too bad, but not their problem.
And that's not getting into the GPU issues I got afterwards.
So I'm not likely to bother hunting for ECC on AMD until it's officially supported. I'll continue stuffing important data on my Xeon which has it.
My B550 mobo claims to support ECC, though I haven't tried it. I just built a pretty nice system on one. Since I only use it for video editing, it doesn't need huge long uptimes in one go; I use a less power-hungry system as a daily driver.
Doesn't the ECC check slow things down a little bit? Or is it always pipelined so the CPU finds out a couple cycles later (if it hasn't already crashed)?
IIRC that's how it used to be done.
"Doesn't the ECC check slow things down a little bit? "
Yes and no.
If you're using buffered ram (which you should!) then the access latency is a little longer (one cycle)
Unbuffered ram has "other issues" including being touchier about disturbances to supply voltages or system noise and you can't put as much into the machine
Apart from that, there's no penalty in normal operation
I'd argue that we should have got away from DRAM a long time ago. Random-access latency has become the single biggest bottleneck in systems and is why there's caching/branch prediction/etc. all over the place. Lower-latency RAM would remove most of the necessity for it.
The move to other RAM is ongoing - look at the slew of storage class memories and PMEM coming. The reality is that as a first off-cpu memory where you have to balance latency, density, cost, bandwidth, power and reliability there is nothing superior yet. Not saying that won't change, but even the much trumpeted 3d x-point has not made a dent.
"If you're using buffered ram (which you should!) then the access latency is a little longer (one cycle)"
Non-ECC vs 9-chip unbuffered ECC should be around 2 cycles longer, to compute the ECC.
Buffered or registered ECC will be around 1 cycle longer again, as it has the additional step of copying memory contents to or from the buffers/registers.
The primary advantage of buffered/registered memory (ECC, or non-ECC where registered non-ECC memory is used) is that it separates the memory-bus electrical load from the memory chips themselves, which is why you see motherboards quoted as supporting 64GB unregistered or 256GB registered memory (or similar numbers).
Comparing ECC speeds (i.e. the speed quoted in MHz): finding 9 chips (or a multiple of 9) that perform well doesn't have a large target market, so generally you will see 2400MHz or 2666MHz at CL17-19, versus non-ECC DIMMs at 2400/2666 CL16-17. I haven't looked recently, but I was unable to find ECC RAM running at more than 2666MHz, and they overclock poorly...
I understand that x86 may be here for a long time. Most of the world does not use a desktop. It is ARM all the way for most people.
x86 will probably go the way of the VAX instruction set. I bet there are still machines running FORTRAN on VMS in some corner of NASA.
> There's probably some emulators running FORTRAN on VMS somewhere in NASA as well, to give the maintainers a sandbox.
It annoys me when people use their imagination to fill in the blanks, lol - thinking that Fortran is so old you need a VMS emulator.
Well, the truth is that Fortran 2018 is the latest version; it integrates with .NET and C, compiles to portable code and, although not the most popular language, is still used extensively in scientific circles.
Presumably your point was that NASA would have an emulator for old hardware, like the Voyager probes. Maybe they do; I can't think why. Those probes are way too far away to patch. As for NASA and development hardware, well, they usually just chuck the stuff in the tip when the project ends.
https://ourcodingclub.github.io/tutorials/fortran-intro/
> Most of the world does not use a desktop
Citation needed, but going with the assumption I suspect most of the world is indeed using laptops, the vast majority of which are based on x86-64 mostly using intel chips.
Oh you were thinking of tablets? Yeah some people may do banking on those when they are in bed or caught short in the shop, so they need ECC too.
> I bet there are still machines running FORTRAN on VMS in some corner of NASA.
Why would they need that? Fortran 2018 is the latest version and integrates with .Net
Years ago, Intel was pushing ECC, but only on RAMBUS, as they owned patents on it. Once their wannabe monopoly failed (the licensing would have been so money-generating), they dropped the ball and walked off the field like a crybaby. Don't cry when the world doesn't want to pay you fees forever for what should be open source. Keep the ball rolling, and let the game get better.
My FreeNAS server uses an ASRock E3C224D2I server motherboard and a Core i3-4150 CPU. Although the motherboard is a server version, complete with IPMI and a "just trust us" BMC to make a headless life easier, the CPU is most definitely described by Intel as a desktop part.
The combination supports ECC just fine.
"The combination supports ECC just fine."
It's the memory controller that limits the use of ECC on Intel motherboards and anything from the last 12 years should support ECC if the chipset supports it, although I'm sure Intel will have a handful of exceptions...
You have a C224 chipset and yes, it supports ECC for all supported CPUs.
Yes, yours does do it, as the Core i3-4150 supports ECC; most Intel Core CPU SKUs don't have an ECC-capable memory controller - the top-of-the-line Core i7-4790K doesn't. The only reason that combination works is because Intel's BIOSes allow it (on a very specific chipset) and the CPU's memory controller can handle ECC. Strictly speaking, ECC should work on any motherboard combination as long as your CPU's memory controller can handle it.
With a Ryzen system, ECC works on workstation boards like an ASRock Rack X470D4U, or a gaming-type board like a Gigabyte X570 Aorus Pro, because it isn't artificially prevented in the BIOS and all Ryzen CPUs have an ECC-capable memory controller.
It has been quite some time since I was actively following such stuff, but at least since "64-bit" memory busses/cards came around, the number of bits needed for SECDED and simple byte parity has been the same.
Performance of ECC was/can be an issue, in that correction on read might be needed, and sub-64-bit writes needed to be read-modify-write. Not such an issue when most writes are coalesced in the cache. Anyway, I'm not seeing the "need more bits of DRAM" argument.
Are non-ECC systems also blowing off even byte parity (which would also require 72 bits total for 64 payload) like so many of the 16 and 32-bit era PCs did?
Yeah, I recall memory "cards" that had "generated parity", so even if the system "needed" Parity, what it got was freshly generated from whatever crap data fell out of the RAM.
Parity is "one bit detected". ECC is "one bit corrected, two bits detected." When I worked on validating the Cell Microprocessor at IBM in 2003 timeframe, the L2 cache lines were ECC. 64 bits of data, 10 for the ECC.
My understanding was that OS on IBM servers ran a low-priority process to constantly read & re-write all the ram. This was to prevent pairs of errors from accruing.
And that prompts the question: does ECC really provide guarantees against RowHammer? It's all about the number of bits that RowHammer can flip. With ECC, 1 bit gets corrected, 2 get flagged (or cause a reboot or something), 3 sneak through.
So to my mind ECC just shifts the target and / or reduces the scope for a RowHammer attack, but does not necessarily eliminate it as a threat. The fundamental problem remains.
I've not dug far enough into RowHammer to know what successes there have been in the field, but flipping additional bits is going to be much, much harder than flipping just the one. Mind you, even before RowHammer was announced, a proper validation effort involved testing against that class of attack. Apparently not enough testing, mind you...
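A toy model shows why scrubbing plus SECDED keeps accumulated errors in check - and why it assumes independent flips, the assumption RowHammer deliberately breaks by targeting bits in the same row:

```python
# Probability that >= 2 of the 72 bits in one ECC word flip within a
# single scrub interval, given independent per-bit flip probability p.
# Plain SECDED corrects 1 flip and only detects 2, so this is the
# region scrubbing is trying to keep vanishingly small. Toy model only.
def p_multi_flip(p: float, bits: int = 72) -> float:
    p_zero = (1 - p) ** bits
    p_one = bits * p * (1 - p) ** (bits - 1)
    return 1 - p_zero - p_one
```

For, say, p = 1e-6 per interval this comes out around 2.6e-9 per word, i.e. frequent scrubbing keeps things in the easily corrected single-bit regime; a RowHammer attack flipping several bits in one word at once simply isn't modelled by independent p.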
Strangely enough, macOS keeps a lot of data compressed in RAM. Reclaiming memory that way avoids a read/write to disk: you can utilise your spare memory and get at the bits in it faster by decompressing RAM-to-RAM, rather than going disk-to-RAM, decompressing, then accessing.
This scheme obviously will not work if the memory is being continuously accessed, so I believe they have a good multi-level paging scheme: Mem -> compressed-mem -> disk.
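As a crude illustration of the idea (zlib here is just a stand-in; macOS's actual compressor, WKdm, is far faster and works on fixed-size pages):

```python
import zlib

# Compress an idle, mostly-empty 4 KiB page. Decompressing this back
# RAM-to-RAM is far cheaper than a round trip to swap on disk.
page = b"\x00" * 4096
packed = zlib.compress(page)
print(len(page), "->", len(packed))      # 4096 -> a few dozen bytes
assert zlib.decompress(packed) == page   # lossless round trip
```

Idle pages full of zeros or repeated structures compress extremely well, which is why the scheme pays off for memory that isn't being continuously accessed.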
""1 in 3 systems experience one or more correctable memory errors a year..."
Thirty years ago a customer was doing a tech refresh using PCs as branch-office terminals. Their supplier advised them to leave the PCs always loaded and powered up, to save the network bandwidth of remotely reloading them every morning.
I asked about the effect of environmental soft errors on the PCs' no-parity, no-ECC DRAM and received blank looks. I went off and dug out the man in our company who specialised in such considerations. He calculated that, given the number of branch terminals, there would be an undetected soft error somewhere every two weeks, with indeterminate consequences. The customer decided that the terminals would be reloaded every morning - as ECC was not being offered.
It's called, hold on, marketing. Come on Linus, these are the same guys that tell us that a monolithic kernel is better. The same guys that write a driver three or four times, well soon to be three for Intel (Firmware/UEFI, Windozzzzzze and Linux, for Intel - no more Mach based OSes).
I remember reading a paper a long time ago about how using ECC was a MUST for systems used for large data sets. Our laptops have reached that stage already. The logic cost in terms of real estate and timing will be minuscule for doing ECC. The only cost is the extra DRAM.
Yes, but of that 25% of total system price spent on RAM, how much cheaper were similar-spec 4x512MB non-ECC RAMs? If the non-ECC stuff was 12% cheaper than the ECC stuff, and the RAM was 25% of the total, then the premium on the total system price of the ECC was more like 3%.
Oh, and the Opteron was AMD's server CPU, marketed against the Intel Xeons.
I'm with Linus on this. PCs stuffed full of large amounts of RAM are routinely used for things like finite-element analysis. Life-critical calculations are now performed by everyday average PCs in every engineering application in the land. Sure, the probability of a bitflip screwing your results is low; but it's not zero.
Finding an ECC-capable laptop is nigh impossible (probably because, for the same workload, ECC power draw is greater); and one really does not want to have to get a server rack, or pay rip-off pricing on a "workstation", to find a suitable motherboard.
With quantities of RAM now routinely up in 64GB+ territory, is it too much to ask for at least the motherboard option to cover ECC without quadrupling the price? (Before buying the RAM, which I accept should be somewhere in the order of 1/10th more expensive, due to the need to have 9 RAM chips for every 8, plus the little bit of extra circuitry to make it fly.)
TBF, running life-critical software on a laptop (assuming it leaves the office/house) is a bit of a no-no. And that software and its results would be better served by being run on something more efficient and not so portable. I've got a screamer of an 8-core laptop but am moving my crunching off to a more cost-effective GPU on a shit motherboard - or I was, until I read this and started wondering about ECC-packed GPUs. Though I think I'd probably do my error correcting by re-running certain proofs; I can probably cope with 10 errors a year as I'm still in learning/exploratory mode, but in a business mode I could easily get away with selling that error rate - I just wouldn't feel happy about it.
This does make me wonder about the problem domain. I have just tried running some tests on some AI problems, and these seem largely immune to bit-flip-type errors, as they tend to converge and an error just slows down the convergence in training. And I guess one could use an ECC mainboard to drive a GPU relatively safely by refreshing the model on the GPU on a regular basis.
Our office has gone through the process of moving everyone onto a laptop in recent years; a trend that's unlikely to go away, and I imagine exacerbated by Covid. The problems I'm thinking of in particular only need to be looked at once or twice a year and solve in fractions of a second, yet are of a life-critical nature. Against this use case, getting hardware to meet requirements? Forget it; there are obligatory cost targets to reach. Service the 5000 users that need email and Excel, not the handful of engineers that need proper tools!
I have code written in the 70's in FORTRAN-IV the results of which are still being used today. In original form, that code ran on an IBM mainframe. Today; if it needs rolling out for a one-off result; it runs inside a DOS VM inside a Windows VM. Talk about number of layers that could go wrong. The original manual for it is hilarious - it even estimates the power consumption, time and cost of running the mainframe.
The hardware requirements are of course negligible by today's standards; but as the number of layers goes up, so does the potential for errors. And one suspects the old battleaxe of a mainframe had solid error-checking hardware.
The odds of the one-off run coinciding with a bitflip are obviously low; however, subtle errors are where it's most dangerous. Gross error would be immediately spotted by a knowledgeable user. A 5-10% deviation would be tough to spot - and outside the safety tolerances / design fat in your average system.
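That class of subtle error is easy to demonstrate (a sketch; the bit position is chosen for illustration): flipping a single mid-mantissa bit of an IEEE-754 double shifts the value by about 1%, well inside typical design margins and invisible to a sanity check.

```python
import struct

# Flip one bit of a float64 and return the corrupted value.
def flip_bit(x: float, bit: int) -> float:
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return y

print(flip_bit(100.0, 46))  # 101.0 -- a quiet 1% error
```

Flip a high exponent bit instead and you get a gross, obvious error; it's the mantissa flips that produce the plausible-but-wrong numbers.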
15 years ago a decent motherboard could be had for sub-£150 with ECC support, and a 256MB DIMM was absolutely more than adequate to the task at hand. Today, if you want ECC, there's this assumption you want massive performance and vast quantities. I do have uses for that capability (finite-element especially), but good luck finding something of "average" cost, as opposed to "workstation" pricing, to do the job. I'm semi-wondering if getting an older ECC system off-network might be a good idea.
Regarding GPUs: I haven't really explored ECC for those yet. The FEA application I use doesn't use the GPU for anything other than rendering the screen (mostly because GPU RAM is limited in quantity, and when a model can fill a 4TB swap file the limit comes down to the speed of your storage - although a few workstations are in principle able to support such bonkers quantities of RAM!)
It would (have) be(en) their job to tell their readers about ECC.
Today, every moron has a PC with 16 GB RAM, and smartphones come with 12 GB of RAM. Not so long ago, only the "big irons" had so much memory, and the structures of that memory were bigger by orders of magnitude. To translate this: you will experience bit flips in memory. If you're lucky, your computer/phone freezes up. If you're unlucky, your (file) system gets corrupted. If you are even more unlucky, a money transfer will get corrupted and put you in debt. Shit happens. It's just a matter of luck. ECC doesn't cost more than normal RAM and it isn't slower - really (I'm not a fan of homeopathy).
Dear journalists, it's YOUR job to tell the people that an Intel iXXXXX is crap, because it doesn't support ECC. ECC being vital for systems with more than 256 MB of RAM...
> every moron has a PC with 16 GB RAM
Mines 12, and before I upgraded to a new machine I had 8.
> SmartPhones come with 12 GB of RAM
Sure they do. You must be thinking of the overpriced flagships. My phone and tablet have 2GB. I have 16GB of storage, of which I can use about 4 if I have nothing installed.
Intel didn't kill ECC and parity memory in consumer and business end-user PCs. Rather, the customers and clone makers did, and the trade press was complicit.
Early personal computers did not have either parity or ECC memory. Then the IBM Personal Computer came out for business use, with advertising referring to it as "The IBM of Personal Computers." IBM's middle name is Business, as in "International Business Machines."
IBM PCs carefully tested all memory at power-on. That took time, which users did not appreciate. The PC stopped hard if it encountered any error, even a momentary one, while running. People did not appreciate losing their unsaved work in progress.
Clone makers won business by doing several things:
* They offered BIOS settings to skip the memory test at power-on. People loved the time savings.
* They offered computers without parity memory at a lower price. People loved the lower price.
The trade press was complicit. It made fusses about all sorts of things, but their editorials did not educate users about the risks of skipping tests and not having parity checking. Furthermore, their reviews of these clones did not downgrade them for lacking parity memory or offering an option to skip the power on test.
During those years, IBM strategically shifted from telling customers what is good for them to being "Market Driven." That meant giving customers what they want. Even business and health care customers voted with their pocketbook that they did not value valid results. The same IBM executives who said it was a bad decision technically, said it was the right market driven business decision to drop parity memory from desktops and laptops. I heard this directly from executives of both the PC and the memory chip divisions sitting together addressing an internal IBM audience.
Given the lower quantity of RAM fitted to PCs during the period you describe, and the larger process size (bigger transistors are harder to flip) such RAM was built on, is the issue of corrupted RAM the same then as it is now?
I thought DDR5 will have on-die ECC included, so Intel won't be able to differentiate Xeon CPUs based on that.
Example:
https://www.overclock3d.net/news/memory/ecc_ecc_for_everyone_sk_hynix_spills_the_beans_on_its_ddr5_dram_tech/1
https://www.rambus.com/blogs/get-ready-for-ddr5-dimm-chipsets/