New year, new rant: Linus Torvalds rails at Intel for 'killing' the ECC industry

Linux creator Linus Torvalds has accused Intel of preventing widespread use of error-correcting memory and being "instrumental in killing the whole ECC industry with its horribly bad market segmentation." ECC stands for error-correcting code. ECC memory uses additional parity bits to verify that the data read from memory is …

  1. IGotOut Silver badge

    Maximise Profit Margins.

    There, explained in three words.

    1. Terry 6 Silver badge

      Re: Maximise Profit Margins.

      Maximise short term profit margins

      FTFY

  2. Anonymous Coward

    Alternatives are available

    For example, see https://www.intelligentmemory.com/dram-components/ecc/

    This allows automatic single-bit correction for CPUs that don't support ECC natively.

    1. In total, your posts have been upvoted 1337 times

      Re: Alternatives are available

      That's very nifty. I wonder how they do it without adding read latency. Are they underspeccing the array performance?

    2. John H Woods Silver badge

      Re: Alternatives are available

      I wouldn't dream of hosting a ZFS file server on hardware without ECC. What's the point of being paranoid about your disks and not your RAM?

      1. Anonymous Coward

        Re: Alternatives are available

        ZFS has a lot of advantages, even without ECC (and the "scrub of death" is a myth).

        I would like to have ECC on my home-brew FreeNAS, but I can't justify the additional motherboard and RAM costs or the extra power draw (server grade boards tend to be much more power hungry).

        However, I insist on it for business grade NAS solutions.

        1. Alan Brown Silver badge

          Re: Alternatives are available

          "I would like to have ECC on my home-brew FreeNAS"

          ECC RAM is almost the same price as non-ECC (seriously, it's about 3-4% different, if that)

          The cost difference is in the Intel ECC-supporting boards and CPUs vs. non-ECC ones - the solution is not to use Intel CPUs

        2. Andy Humphreys

          Re: Alternatives are available

          You don't need a server grade board nowadays. I've got a Gigabyte X570 Gaming X board coupled with a Ryzen 7 2700X and 64GB (2x32GB) Samsung ECC happily running TrueNAS Core 12.

          Cost of board about £170, memory £325, processor £200. Runs my RAID-Z1 vdevs very well!

          1. Trenjeska

            Re: Alternatives are available

            That's still too much for something that only serves files. Plus it sips too much juice. Looking for a max 35W system to run disks. The disks themselves already add 5W each when spinning.

            1. Andy Humphreys

              Re: Alternatives are available

              Fair comment if you are indeed just running a very simple file server. For me, I'm also running NFS and iSCSI to support vSphere datastores, as well as a few jail-based apps incl. Plex, which sometimes benefits from the CPU when transcoding.

              I guess one's perspective of what is reasonable cost depends on the direct benefits that will arise. I'm hosting around 60TB of data, so it was worth the extra pennies to get the additional protection.

          2. katrinab Silver badge
            Happy

            Re: Alternatives are available

            I have an i7-3770 that was retired from desktop duties with 32GB of RAM running vanilla FreeBSD. Storage is 2x 2TB Crucial MX500s + 4x 10TB IronWolves in RAID-Z2. Probably massively over-specced for the task.

        3. Down not across

          Re: Alternatives are available

          "I would like to have ECC on my home-brew FreeNAS, but I can't justify the additional motherboard and RAM costs or the extra power draw (server grade boards tend to be much more power hungry)."

          My nas4free is running on HP Microserver (AMD Athlon II (N36L)) with ECC happily and isn't that power hungry. The 4 x 3.5" spinning rust are most likely the largest consumer of power. Newer Microservers have Opteron and IIRC come with iLO (whereas on the older ones iLO was available as add-on card).

          1. Anonymous Coward

            Re: Alternatives are available

            My complete setup, including 4 HDDs and three SSDs uses less than 20 watts (average).

  3. Snake Silver badge

    I don't see it that way

    The Linuxheads will flame me for this, but I'm sorry Linus, I don't believe you are correct.

    My Lenovo is equipped with a Xeon, and supports ECC. Yet it didn't come equipped with it. And the huge majority of the models in the same line didn't, from the factory.

    Intel built the Xeon with ECC support. Yet here's an entire model range that needed to be ordered as a custom config in order to have it from the factory.

    Why? It wasn't Intel's fault: ECC memory is just too expensive for most people's cost/benefit equations.

    Nothing to do with Intel. You want more ECC market penetration? Tell the OEM manufacturers, from memory to computer, to stop price gouging on that single extra bit. Some computer manufacturers charge a real premium just to upgrade to ECC.

    1. Rich 2 Silver badge

      Re: I don't see it that way

      I don’t believe the cost argument. The cost difference should be minimal (and of course would evaporate almost entirely if ECC were used everywhere). If ECC memory is significantly more expensive than non-ECC then that’s an artificial sales/marketing thing - it’s not because of any intrinsic significant additional expense

      1. Anonymous Coward

        Re: I don't see it that way

        Isn't that the point? The costs should be minimal, but aren't.

        1. GrumpenKraut

          Re: I don't see it that way

          You need 9 RAM chips instead of 8, right? That is in line with last time (years ago, I admit) I compared prices for ECC vs. non-ECC RAM (all from the same place). ECC was about 15 per cent more expensive, as to be expected. And the prices of that dealer were quite OK. Has that changed so much?

          Even longer ago ECC was just overpriced (similar to Xeons), costing twice as much or more than non-ECC RAM.

          1. Snake Silver badge

            Re: 15% surcharge

            But it's not. As one example, a quick search through Newegg showed only one 2666MHz ECC SODIMM from a major manufacturer, a 16GB part from Hynix

            https://www.newegg.com/p/pl?storeName=Laptop-Memory&pageTitle=Laptop+Memory&N=100007609+601204087+600006161&Submit=ENE

            at $91.86. A 32GB non-ECC dual SODIMM kit from a major manufacturer only averages around $122 or so

            https://www.newegg.com/p/pl?storeName=Laptop-Memory&pageTitle=Laptop+Memory&N=100007609+500002048+601204087&Submit=ENE

            Yes, that's laptop. But downgrade a Dell Precision 5820 desktop workstation to an Intel i7, from a Xeon, and the 16GB RAM kit is $124.98 less expensive. As shown, you can double the RAM for that amount of money.

            So there is DEFINITELY a price difference.

            1. GrumpenKraut
              Unhappy

              Re: 15% surcharge

              Oh dear, we seem to be back at "double price". Thanks for the information.

              1. Snake Silver badge

                Re: "double price"

                No problem amigo :-) I knew this because I shopped the difference when I was upgrading the Lenovo from 32GB to 64GB. The stock 32GB was non-ECC so, if I wanted ECC, I would need to purchase the full 64GB kit.

                Not at those prices I wouldn't!!! O.O Yeow! Ouch!

                1. Anonymous Coward

                  Re: "double price"

                  Though to be fair, as ECC has been relegated mostly to the corporate desktop and server worlds, there is a magic price point for ECC where it starts to get cheap. You won't find it on the top megahurtz parts, and the top speeds will be punitively expensive unless you are buying data center volumes, but midrange memory (like the no-frills Kingston bare sticks) is much closer in price to its non-ECC counterparts. The performance hit (the other old argument against ECC, not mentioned in the article) is also small on modern architectures, at least compared to the losses from mitigating the various side-channel attacks.

                  When you start actually using large GBs of memory heavily, you start seeing simple memory errors often enough that ECC just for system stability is a good investment, but it is also an important protection for the security reasons that Torvalds pointed out. It's often possible to get binned parts that have better latency and eliminate the speed penalty for less $$ than going to higher MHz-rated parts. People tend to forget that faster CAS timings also affect memory performance, not just the base clock speed.

                  That said, I'd also like to see the artificial price segmentation for SAS and SATA addressed, and then near term SATA could just go away on the motherboard side (since SAS is backwards compatible).

                  1. katrinab Silver badge
                    Meh

                    Re: "double price"

                    I think SATA will be replaced with NVMe on mainstream desktops, and be relegated to use cases where you need massive amounts of storage that doesn’t need to be particularly fast.

            2. eldakka
              Holmes

              Re: 15% surcharge

              "a quick search through Newegg showed only one 2666mHz ECC SODIMM from a major manufacturer"
              Quite.

              And using basic economics, what happens when there is:

              • limited demand due to deliberate segmentation practices of the near-monopolistic largest desktop and server CPU supplier;
              • limited competition;
              • limited supply?

              So, with little or no competition, the price premium is only about 50%.

              This tells me that the margin on ECC is high due to the aforementioned reasons, not that the manufacturing cost is (particularly) high.

              If Intel hadn't hobbled ECC so that we had actual volume and competition in the ECC segment, don't you think that the price premium of ECC would reflect the manufacturing cost increase, about 15%, as opposed to the current price premium of 50% or more?

              1. Snake Silver badge

                Re: 15% surcharge

                Everyone is completely discounting the pricing factor of RAM binning. ECC RAM might also be more expensive because the RAM manufacturers save the top-binned parts for ECC, under the guise that there will likely be less errors on the top parts and, whilst ECC corrects those errors, many systems flag the errors with DANGER, WILL ROBINSON!

                Nobody wants to see system error messages, that's the entire point of buying ECC in the first place. So use the best parts in ECC in order to grant the part the best reliability record, the best customer satisfaction ratings. You got what you paid for - no error messages, right??!

                And top-binned parts cost money.

                So ECC might ALWAYS be more expensive, you're paying for reliability, and to grant that they save the best parts for them.

                1. Anonymous Coward

                  Re: 15% surcharge

                  You are paying for more chips to make ECC (a parity chip for every byte for standard ECC memory although other schemes are available to cover higher levels of performance/capacity/reliability).

                  Because the ECC calculation takes a few cycles, ECC is generally slower than the equivalent chips (i.e. identical manufacturing process/manufacturer) when used in non-ECC RAM, so it is unlikely to be binned for high speeds.

                  The lower speeds generally mean that ECC RAM will be more reliable than non-ECC RAM, but generally the same chips running at the same speed will be equally reliable, as the timing tolerances between the chips are greater at those slower speeds versus high-speed RAM, where manufacturers bin chips with near-identical performance to enable them to run faster.

                  TL;DR: You make a million RAM chips and bin by speed. All the ones that pass the QC checks at base speeds will have a similar level of reliability. 99%+ of the chips will be used in non-ECC configurations. For chips that can run faster, you bin them at a higher speed and charge a premium. There is little difference in quality between the base spec chips. The cost difference comes from the extra chip plus the additional costs for lower volume parts.

                2. CrackedNoggin Bronze badge

                  Re: 15% surcharge

                  Just like "organic vegetables".

          2. dakra

            Re: I don't see it that way

            One extra bit gets you Parity checking. It can detect an error, but can't figure out how to correct it.

            Unlike parity memory, which uses a single bit to provide protection to eight bits, ECC uses larger groupings. Five ECC bits are needed to protect each eight-bit word, six for 16-bit words, seven for 32-bit words and eight for 64-bit words.

            Source: https://www.pctechguide.com/computer-memory/ecc-memory

            see also:

            https://www.realworldtech.com/parity-and-ecc-explored/

            https://en.wikipedia.org/wiki/Hamming_code
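
            For anyone wondering where those check-bit counts come from, here is a rough sketch. It assumes the usual construction: a Hamming code for single-error correction plus one overall parity bit for double-error detection. The function name is made up for illustration.

```python
def secded_check_bits(data_bits):
    """Check bits needed for a SECDED code over `data_bits` of data (illustrative helper)."""
    r = 0
    # Hamming condition for single-error correction: 2**r >= data_bits + r + 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit upgrades single-error correction
    # to single-error correction plus double-error detection.
    return r + 1


for width in (8, 16, 32, 64):
    print(width, "data bits ->", secded_check_bits(width), "check bits")
```

            The relative overhead shrinks as the word gets wider, which is why 8 check bits per 64-bit word is the common arrangement on DIMMs.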

      2. Electronics'R'Us
        FAIL

        Re: I don't see it that way

        ECC simply adds an extra 8 bits of the same memory device to a 64 bit interface for DDRx SDRAM systems (which is what just about everything in this arena uses).

        So the cost of parts is 12.5% more for the memory device (and probably far less than 1% at the system level).

        The ECC is done in the memory controller (part of the microprocessor) and as it is a standard thing, the marginal cost of adding it is tiny - a little bit more silicon real estate.

        There are some extra PCB tracks to be added but given how many are already present they add little cost (might make PCB routing more interesting but it really is not that much more complexity). Initialisation of memory takes a bit longer (every memory location has to be written a valid value for the ECC to be valid) although it is possible to simply initialise a process space when it is first used.

        The only reason I can see for Intel to take this stance is to make sure that those with a need for ECC have to pay a hefty premium.

        I have done designs with ECC for decades and the marginal cost doesn't really even show up in the grand scheme of things.
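
        The 12.5% figure is just the bus-width arithmetic. A minimal sketch, assuming a 64-bit data bus with 8 extra check bits built from the same devices (the helper name is illustrative):

```python
def ecc_overhead(data_bits=64, check_bits=8):
    """Return (extra parts relative to a non-ECC module, share of the ECC module spent on check bits)."""
    extra_vs_non_ecc = check_bits / data_bits
    share_of_ecc_module = check_bits / (data_bits + check_bits)
    return extra_vs_non_ecc, share_of_ecc_module


extra, share = ecc_overhead()
print(f"extra memory devices vs non-ECC: {extra:.1%}")
print(f"share of a 72-bit module used for ECC: {share:.1%}")
```

        The first number is the added parts cost; the second is the same overhead expressed as a fraction of the finished 72-bit module.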

      3. confused and dazed

        Re: I don't see it that way

        The cost is a x72 DIMM versus a x64 DIMM - i.e. an extra 1/9 DRAM. This gives you single error correct, double error detect (SECDED).

        The price is however driven by commodity pricing ...

    2. Anonymous Coward
      Boffin

      Re: I don't see it that way

      I have to agree with you. ECC requires an ecosystem, not just a chip, and Intel doesn't control that. As the article points out, AMD's chip doesn't work without supporting motherboards and memory chips either. It's not just the Xeon chip prices, the Xeon ecosystem prices are higher across the board with MB and RAM vendors charging more as well to fully support ECC.

      1. Anonymous Coward

        Re: I don't see it that way

        If you are using a Ryzen-based AMD system (maybe earlier models - I just haven't checked), all support at least unbuffered ECC out-of-the-box as long as you can stand the potential performance hit of running at 2666MHz/CL19 vs faster non-ECC memory and potentially being told that "the RAM isn't on the approved memory list for this motherboard" by the motherboard vendor although most vendors will support it on a selection of their offerings.

        I have a 64GB Ryzen platform that hasn't had any unexplained crashes in a year versus its Intel equivalent without ECC that has hung twice. It's a home lab so it's just annoying rather than critical but it's nice not having to fix VMs following a crash. If only time was worth something... Kingston/Crucial do "affordable" unbuffered ECC so I only paid around 20% more for the privilege.

    3. Anonymous Coward

      Re: I don't see it that way

      "Why? It wasn't Intel's fault: ECC memory is just too expensive for most people's cost/benefit equations."

      The CPU supports ECC but does the chipset? And who makes the chipset? That is the artificial segmentation Linus is referring to and yes, it is created by Intel to differentiate their products and then OEMs are left to choose between the different pricing tiers.

      In terms of the cost difference between ECC and non-ECC it SHOULD be around 15% (i.e. the cost of adding a 9th RAM chip for every 8 existing RAM chips plus a little more on the chipset) but is instead often 100%+ more because of the relatively low volumes.

      And as for error rates? A computer that is on 24x7 will see ~3 correctable errors a year (http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf) although that may be a little on the low side with more than 4GB now being common.

      1. Yet Another Anonymous coward Silver badge

        Re: I don't see it that way

        >It wasn't Intel's fault: ECC memory is just too expensive for most people's cost/benefit equations.

        Intel kept ECC for the server market to charge a premium for Xeons

        If you had ECC on all CPUs then cheap RAM would become available with ECC and everyone would just buy desktop machines instead

        Remember you have to buy Xeon (or preferably Itanium) for 'real' work

        1. Snake Silver badge

          Re: Cheap

          "If you had ECC on all CPUs then cheap RAM would become available with ECC"

          That's a plausible theory but unfortunately one that can't be proven. For example, the market may end up with a different stratification, speed rather than ECC vs non-ECC. What I mean by that is that "good" ECC RAM might still end up more expensive due to internal RAM manufacturer's needs or demands; "cheap" ECC RAM would be settled on as being slower for the application whilst fast ECC, the stuff you really want as an enthusiast, would still be kept expensive.

          Just because they could.

          Never underestimate the greed of industry. We have no way of promising that fast ECC would be cheaper if only it was more popular. Many RAM circuit board designs already have the traces for the ECC, the parts only need to be populated. Yet the RAM companies are often charging big, big premiums just to add those missing chips to an existing design - again, because they can.

          1. Down not across

            Re: Cheap

            "That's a plausible theory but unfortunately one that can't be proven. For example, the market may end up with a different stratification, speed rather than ECC vs non-ECC."

            I've done that for decades at home. Anything server-like I happily traded speed for reliability and chose slower memory but with ECC. For gaming rig, who cares if it crashes so opted for speed rather than ECC.

            That's not to say if I could have my cake and eat it..

        2. Anonymous Coward

          Re: I don't see it that way

          "Intel kept ECC for the server market to charge a premium for Xeons"

          That's only part of the story - there are desktop chipsets that don't allow ECC while the equivalent workstation/server chipset does.

          There's nothing inherently in the CPU that stops Intel CPUs supporting ECC - the memory controller is in the northbridge portion of the chipset. While there are reasons to discourage ECC for some chipsets (i.e. integrated GPUs), supporting it is an almost zero cost option, or would be if Intel didn't charge extra for "Xeon support", and there are a number of server boards that support non-Xeon CPUs as long as they don't have integrated graphics. As Linus says - he can buy a Xeon and pay 5x the cost of a desktop CPU for 2x the performance.

          Intel have segmented mobile/desktop/workstation/low-end server/mid-range server/high-end server by limiting support for the number of PCIe lanes, ECC memory and similar features. It was great when AMD weren't providing any competition and they could rely on tick-tock production cycles but 10nm has left them sitting on bad decisions from 5 years ago.

        3. John Geek

          Re: I don't see it that way

          In fact, Intel's Celeron, Pentium, and Core i3 CPUs have ECC enabled, but only when used on a 'server' chipset, like a C2xx, not on a desktop/laptop chipset. And the only reason Core i5 and i7 have it disabled is so they can sell more expensive Xeon chips which are functionally identical.

          If 4GB systems have an average of 3 single bit faults a year, then a 16GB system would have 12/year. My desktop and laptop both have 16GB and both are 5+ years old.
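
          Spelled out as a back-of-envelope sketch: this assumes the error rate scales linearly with capacity, which is my simplification rather than something the linked study promises, and the function name is only illustrative.

```python
def expected_errors_per_year(capacity_gb, base_errors=3.0, base_gb=4.0):
    """Scale a per-year correctable-error figure linearly with memory capacity (assumed model)."""
    return base_errors * capacity_gb / base_gb


per_year = expected_errors_per_year(16)
print(f"16GB: ~{per_year:.0f} errors/year, ~{per_year * 5:.0f} over 5 years")
```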

    4. A random security guy

      Re: I don't see it that way

      You are quoting prices which are based on volumes, market verticals, and price gouging. If the volumes are there the parts suppliers will kill each other to get you the ECC modules for at most a 15% higher cost.

      The entire chip industry works on razor-thin margins except in the higher end server markets where paying double for a component is a small blip when you are dealing with high MTBFs.

      The whole idea of reliability is to build your parts reliable.

  4. Dvon of Edzore

    The Party Line

    The official talking point was that the added circuitry for ECC memory (including the extra bits of storage) would actually reduce the reliability of most systems because there would be more parts to fail. This while simultaneously claiming ECC was needed for servers with their massive memory capacities of up to 4 GB! (Windows NT for servers) Considering a typical consumer build of the last decade had as much memory as a server of the Y2K era, that argument sounds a little weak, doesn't it?

    1. GrumpenKraut

      Re: The Party Line

      > that argument sounds a little weak, doesn't it?

      You have about the same chance of a single-bit error as you have with non-ECC. But that error is corrected! If you have two errors you at least find out. Note that the chance of a double error should be monumentally small, so a double error is an indication some module may be dying.

      So, no, ECC really does not have any disadvantage to speak of.
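
      A toy version of "one error is corrected, two are at least detected", for the curious. This is a sketch of an extended Hamming (8,4) code on a 4-bit value, not what a real memory controller does on 64-bit words, and the function names are made up.

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: index 0 is overall parity, 1..7 is Hamming(7,4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # overall parity makes the total even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of set positions is 0 for a valid word
    parity_broken = sum(code) % 2 == 1
    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        code[syndrome] ^= 1                 # single flip: syndrome is its position (0 = parity bit)
        status = "corrected"
    else:
        return None, "uncorrectable"        # parity holds but syndrome is non-zero: two flips
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = encode(0b1011)
word[5] ^= 1                                # one bit flips in "memory"
print(decode(word))                         # value recovered, flagged as corrected
word[2] ^= 1                                # a second flip in the same word
print(decode(word))                         # detected, not silently wrong
```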

      1. Woodnag

        Absolutely

        Although the article overstates with "ECC memory solves these problems" when actually "ECC memory mitigates these problems" by correcting single bit errors reliably.

      2. needmorehare

        Re: The Party Line

        Depends if you count slow performance (as opposed to repeated kernel panics) to be an issue. In my experience, when ECC RAM goes bad, it makes machines monumentally slow with no logical reason as to why, until you use the paid MemTest86 and discover thousands of "corrected errors" over a few days worth of rounds.

        1. Anonymous Coward

          Re: The Party Line

          Maybe you should post the results of Memtest86 on a machine with faulty ECC RAM and faulty non-ECC RAM to allow us to compare the difference?

          Next you will be saying that a degraded RAID volume runs slower as well...

        2. Alan Brown Silver badge

          Re: The Party Line

          "when ECC RAM goes bad, it makes machines monumentally slow with no logical reason as to why,"

          really? My linuxen all log it as Machine Check Errors
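
          If you would rather poll than grep logs, the kernel's EDAC subsystem also exposes per-controller counters in sysfs. A minimal sketch, assuming an EDAC driver is loaded for your memory controller (otherwise the directory is simply empty or missing); the function name is mine.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs, or {} if unavailable."""
    base = Path(root)
    if not base.is_dir():
        return {}
    counts = {}
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        # ce_count = corrected errors, ue_count = uncorrected errors
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


for controller, values in read_edac_counts().items():
    print(controller, values)
```

          A steadily climbing ce_count on one controller is the early warning that a module is on its way out.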

  5. don't you hate it when you lose your account

    Rant?

    From here it looks like the anger management classes are doing the trick. And breathe

    1. Pirate Dave Silver badge
      Pirate

      Re: Rant?

      Yep. A rather pale rant from the Chief Penguinista. He's capable of so much more...lol.

  6. Anonymous Coward

    I've recently bought a Ryzen system, and just finding memory sticks that would work was a pain. The mainboard is clearly advertised as able to use DDR4 3800, but the first 2 sets I tried at merely 3600 did not work, even though the 2nd was on the manufacturer compatible list. And their support told me that AMD only supports 3200, anything else not working is too bad, but not their problem.

    And that's not getting into the GPU issues I got afterwards.

    So I'm not likely to bother hunting for ECC on AMD until it's officially supported. I'll continue stuffing important data on my Xeon which has it.

  7. DCFusor

    Recent AMD mobo

    My B550 mobo claims to support ECC, though I haven't tried it. I just built a pretty nice system on one. Since I only use it for video editing, it doesn't need huge long uptimes at one go, I use a less power hungry system for a daily driver.

    Doesn't the ECC check slow things down a little bit? Or is it always pipelined so the CPU finds out a couple cycles later (if it hasn't already crashed)?

    IIRC that's how it used to be done.

    1. Anonymous Coward

      Re: Recent AMD mobo

      ECC is built for long uptimes - speeds will be slower than the fastest RAM but likely within 1%-3% of most value RAM worst case. Caching will typically hide the difference on systems with large L2/L3 caches.

    2. Alan Brown Silver badge

      Re: Recent AMD mobo

      "Doesn't the ECC check slow things down a little bit? "

      Yes and no.

      If you're using buffered ram (which you should!) then the access latency is a little longer (one cycle)

      Unbuffered ram has "other issues" including being touchier about disturbances to supply voltages or system noise and you can't put as much into the machine

      Apart from that, there's no penalty in normal operation

      I'd argue that we should have gotten away from DRAM a long time ago. Random access latency has become the single biggest bottleneck in systems and is why there's caching/branch prediction/etc all over the place. Lower-latency RAM would remove most of the necessity for it.

      1. confused and dazed

        Re: Recent AMD mobo

        The move to other RAM is ongoing - look at the slew of storage class memories and PMEM coming. The reality is that as a first off-CPU memory where you have to balance latency, density, cost, bandwidth, power and reliability there is nothing superior yet. Not saying that won't change, but even the much trumpeted 3D XPoint has not made a dent.

      2. Anonymous Coward

        Re: Recent AMD mobo

        "If you're using buffered ram (which you should!) then the access latency is a little longer (one cycle)"

        Non-ECC vs 9-chip unbuffered ECC should be around 2 CAS refresh cycles longer to compute ECC

        Buffered or registered ECC will be around 1 cycle longer again as they have the additional step of copying memory contents to or from the buffers/registers.

        The primary advantage of buffered/registered ECC (or non-ECC memory if registered non-ECC memory is used) is that it separates memory bus electrical load from the actual memory load, which is why you see motherboards quoted as supporting 64GB unregistered or 256GB registered memory (or similar numbers).

        Comparing ECC speeds (i.e. the speed quoted in MHz), finding 9 chips (or a multiple of 9 chips) that perform well doesn't have a large target market so generally you will see 2400MHz or 2666MHz and CL17-19 versus non-ECC DIMMs at 2400/2666CL16-17. I haven't looked recently but was unable to find ECC RAM running at more than 2666MHz and they overclock poorly...
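
        To put those speed/CL pairs on one scale, here is a quick sketch that converts them to first-word CAS latency in nanoseconds. It assumes the quoted "MHz" figure is really the DDR transfer rate (MT/s), so the clock is half of it; the function name is illustrative.

```python
def cas_latency_ns(transfer_rate_mts, cas_cycles):
    """Convert a DDR transfer rate (MT/s) and CAS latency (cycles) into nanoseconds."""
    clock_mhz = transfer_rate_mts / 2       # DDR moves data twice per clock
    return cas_cycles / clock_mhz * 1000    # cycles / MHz gives microseconds, so scale to ns


for rate, cl in ((2400, 16), (2400, 17), (2666, 16), (2666, 19)):
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

        On those numbers the gap between a CL19 ECC stick and a CL16 non-ECC stick at the same rate is a couple of nanoseconds per access, which is the kind of difference caches tend to hide.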

  8. IGnatius T Foobar !

    Who is this "Intel" ?

    Why is Linus complaining about the predecessor to AMD? Those people are old news.

    It could also be argued that complaining about x86 is a waste of time because the world is poised for a move to ARM anyway.

    1. spuck

      Re: Who is this "Intel" ?

      Is this the year of Linux on the Desktop on ARM? :)

      Sure, Linux is running on a huge number of ARM devices now, but to say that "the world is poised for a move to ARM" seems a bit premature. x86 will be around for a *long* time to come...

      1. heyrick Silver badge

        Re: Who is this "Intel" ?

        "x86 will be around for a *long* time to come..."

        Sadly.

      2. A random security guy

        Re: Who is this "Intel" ?

        I understand that x86 may be here for a long time. Most of the world does not use a desktop. It is ARM all the way for most people.

        x86 will probably go the way of the VAX instruction set. I bet there are still machines running FORTRAN on VMS in some corner of NASA.

        1. Zippy´s Sausage Factory
          Devil

          Re: Who is this "Intel" ?

          There's probably some emulators running FORTRAN on VMS somewhere in NASA as well, to give the maintainers a sandbox. And it's probably running much faster than the original hardware ever could...

          1. DuncanLarge Silver badge

            Re: Who is this "Intel" ?

            > There's probably some emulators running FORTRAN on VMS somewhere in NASA as well, to give the maintainers a sandbox.

            It annoys me when people use their imagination to fill in the blanks, lol thinking that Fortran is so old you need a VMS emulator.

            Well the truth is that Fortran 2018 is the latest version, it integrates with .Net and C, compiles to portable code and although not the most popular language is still used extensively in scientific circles.

            Presumably your point was NASA would have an emulator for old versions of hardware like the Voyager probes. Maybe they do, can't think why. Those probes are way too far away to patch. As for NASA and development hardware, well they usually just chuck the stuff in the tip when the project ends.

            https://ourcodingclub.github.io/tutorials/fortran-intro/

        2. DuncanLarge Silver badge

          Re: Who is this "Intel" ?

          > Most of the world does not use a desktop

          Citation needed, but going with the assumption I suspect most of the world is indeed using laptops, the vast majority of which are based on x86-64 mostly using intel chips.

          Oh you were thinking of tablets? Yeah some people may do banking on those when they are in bed or caught short in the shop, so they need ECC too.

          > I bet there are still machines running FORTRAN on VMS in some corner of NASA.

          Why would they need that? Fortran 2018 is the latest version and integrates with .Net

  9. Anonymous Coward

    RAMBUS and ECC systems

    Years ago - Intel was pushing ECC, but only on RAMBUS as they owned patents on it. Once their wannabe monopoly failed (licensing would have been so money generating) they dropped the ball and walked off the field like a cry baby. Don't cry when the world doesn't want to pay fees to you forever for what should be open source. Keep the ball rolling, and let the game get better.

    1. Anonymous Coward

      Re: RAMBUS and ECC systems

      Fuck RAMBUS!

      (sorry, old habits die hard)

  10. Anonymous Coward

    I guess the lack of ECC support on desktop processors must be new?

    My FreeNAS server uses an ASRock E3C224D2I server motherboard and a Core i3-4150 CPU. Although the motherboard is a server version, complete with IPMI and a "just trust us" BMC to make a headless life easier, the CPU is most definitely described by Intel as a desktop part.

    The combination supports ECC just fine.

    1. Anonymous Coward
      Anonymous Coward

      Re: I guess the lack of ECC support on desktop processors must be new?

      "The combination supports ECC just fine."

      It's the memory controller that limits the use of ECC on Intel motherboards and anything from the last 12 years should support ECC if the chipset supports it, although I'm sure Intel will have a handful of exceptions...

      You have a C224 chipset and yes, it supports ECC for all supported CPUs.

  11. Anonymous Coward
    Anonymous Coward

    Yes, yours does it because the Core i3-4150 supports ECC. Most Intel Core CPU SKUs don't have an ECC-capable memory controller; the top-of-the-line Core i7-4790K doesn't. The only reason that combination works is that Intel's BIOSes allow it (on a very specific chipset) and the CPU's memory controller can handle ECC. Strictly speaking, ECC should work on any motherboard combination as long as your CPU's memory controller can handle it.

    With a Ryzen system, ECC works on workstation boards like an ASRock Rack X470D4U or a gaming-type board like a Gigabyte X570 Aorus Pro, because it isn't artificially prevented in the BIOS and all Ryzen CPUs have an ECC-capable memory controller.

    1. Yet Another Anonymous coward Silver badge

      There was probably a requirement from a big customer for a cheap embedded board with ECC, e.g. ATMs or medical.

      There is no risk of a micro i3 board stealing server sales, that's why you can't get the i7 version

  12. Anonymous Coward
    Anonymous Coward

    Linux article, so

    In before Jake!

    Good job IR35 ain't on the agenda (though he will probably get it mentioned, possibly by even claiming he doesn't know what it is)

    1. Anonymous Coward
      Anonymous Coward

      Re: Linux article, so

      If you're playing a game, it's nice not to do it as AC... it makes it hard for spectators to follow the action.

  13. Mike 16

    Ninth bit?

    It has been quite some time since I was actively following such stuff, but at least since "64 bit" memory busses/cards came around, the number of bits needed for SECDED and simple byte parity have been the same.

    Performance of ECC was/can-be an issue in that correction on read might be needed, and sub-64-bit writes needed to be Read-Modify-Write. Not such an issue when most writes are coalesced in the cache. Anyway, I'm not seeing the "need more bits of DRAM" argument.

    Are non-ECC systems also blowing off even byte parity (which would also require 72 bits total for 64 payload) like so many of the 16 and 32-bit era PCs did?
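
    The bit counts above can be checked with a short sketch. It assumes a plain Hamming code extended with one overall parity bit as the SECDED scheme (real memory controllers may use other codes), and the function names are made up for illustration.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """SEC check bits plus one overall parity bit for double-error detection."""
    return hamming_check_bits(data_bits) + 1


def byte_parity_bits(data_bits: int) -> int:
    """One parity bit per 8-bit byte."""
    return data_bits // 8


# Columns: bus width, byte-parity bits, SECDED check bits.
for width in (8, 16, 32, 64):
    print(width, byte_parity_bits(width), secded_check_bits(width))
```

    Under those assumptions the two costs only meet at 64 bits: narrower words need proportionally more check bits for SECDED than for byte parity.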

    Yeah, I recall memory "cards" that had "generated parity", so even if the system "needed" Parity, what it got was freshly generated from whatever crap data fell out of the RAM.

    1. Claptrap314 Silver badge

      Re: Ninth bit?

      Parity is "one bit detected". ECC is "one bit corrected, two bits detected." When I worked on validating the Cell Microprocessor at IBM in the 2003 timeframe, the L2 cache lines were ECC. 64 bits of data, 10 for the ECC.

      My understanding was that the OS on IBM servers ran a low-priority process to constantly read & re-write all the RAM. This was to prevent pairs of errors from accruing.

      1. bazza Silver badge

        Re: Ninth bit?

        And that prompts the question, does ECC really provide guarantees against RowHammer? It's all about the numbers of bits that RowHammer can flip. With ECC, 1 bit gets corrected, 2 gets flagged (or causes a reboot or something), 3 sneaks through.

        So to my mind ECC just shifts the target and / or reduces the scope for a RowHammer attack, but does not necessarily eliminate it as a threat. The fundamental problem remains.
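
        The "1 corrected, 2 flagged, 3 sneaks through" behaviour can be seen on a toy code. This is only a sketch using an extended Hamming(8,4) code on 4 data bits, not the 72-bit code a DIMM would use, and the helper names are invented for the example.

```python
from functools import reduce
from operator import xor

DATA_POSITIONS = (3, 5, 6, 7)  # non-power-of-two slots carry the data bits


def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = reduce(xor, code[1:])  # overall parity over positions 1..7
    return code


def decode(received):
    """Return (status, data) the way a SECDED decoder would see it."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = reduce(xor, code)  # 0 for any valid codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assumed to be exactly one, so "fix" it.
        # Syndrome 0 here points at the overall parity bit itself.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: flag, don't guess.
        status = "uncorrectable"
    return status, [code[pos] for pos in DATA_POSITIONS]


def flip(code, positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


word = [1, 0, 1, 1]
stored = encode(word)
for flips in ((5,), (5, 6), (3, 5, 6)):
    status, data = decode(flip(stored, flips))
    print(len(flips), "flips:", status, "data intact:", data == word)
```

        With three flips the decoder reports a successful correction while handing back the wrong data, which is the case that matters for a RowHammer-style attack.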

        1. Claptrap314 Silver badge

          Re: Ninth bit?

          I've not dug far enough into RowHammer to know what successes there have been in the field, but flipping additional bits is going to be much, much harder than just the one. Mind you, even before RowHammer was announced, a proper validation effort involved testing against that class of attack. Apparently, not enough testing, mind you...

    2. jtaylor

      Re: Ninth bit?

      I recall memory "cards" that had "generated parity", so even if the system "needed" Parity, what it got was freshly generated from whatever crap data fell out of the RAM.

      SUN workstations used all 9 bits. I loathed those fake-parity modules.

  14. LateAgain

    Memory Compression?

    Does anyone know if the memory compression used by OS X and Windows 10 will spot corruption?

    Just asking :-)

    1. Claptrap314 Silver badge

      Re: Memory Compression?

      Separate issue. Memory compression is about storage on the HD. It has nothing to do with the memory chips on the motherboard.

      1. A random security guy

        Re: Memory Compression?

        Strangely enough, macOS keeps a lot of data compressed in RAM. The read/write to disk does not need compression/decompression. You can then utilize your spare memory and access the bits in it faster by decompressing RAM to RAM rather than going disk to RAM, decompressing, then accessing.

        This scheme obviously will not work if the memory is being continuously accessed so I believe they have a good multi-level paging scheme. Mem->compressed-mem->Disk.

      2. DuncanLarge Silver badge

        Re: Memory Compression?

        Do your research, memory compression is used to COMPRESS the contents of RAM.

        It has been in the Linux kernel since 3.14 and is used by Android 4.4 and above. Windows 10 does it too.

  15. VLSI

    Out: lifetime warranty RAM

    In: lifetime warranty DDR5 ECC RAM

    I'll take it, if modules actually only end up costing 12.5% more. I have no problems running memtest every so often and replacing faulty modules otherwise.

  16. Anonymous Coward
    Anonymous Coward

    "1 in 3 systems experience one or more correctable memory errors a year..."

    Thirty years ago a customer was doing a tech refresh using PCs as branch office terminals. Their supplier advised them to leave the PCs always loaded and powered up to save network bandwidth in remote reloading them every morning.

    I asked about the effect of environmental soft errors on the PC no-parity, no-ECC DRAM and received blank looks. Went off and dug out the man in our company who specialised in such considerations. He calculated that given the number of branch terminals - there would be an undetected soft error somewhere every two weeks with indeterminate consequences. The customer decided that the terminals would be reloaded every morning - as ECC was not being offered.

    1. Anonymous Coward
      Anonymous Coward

      Don't understand. The error rate is unchanged and still not deterministic. Were they saying that correcting a fault within 1 day by hard reset was preferable to having a machine in a bad state for an indefinite period? If so I think the logic is flawed.

      1. Dave 126 Silver badge

        > Don't understand. The error rate is unchanged

        Rate is unchanged, but time period is greater. The local office PC terminals were being left on overnight with programmes and data in RAM instead of being loaded at the start of the working day.
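
        A rough sketch of that point, assuming soft errors arrive as a Poisson process at the fleet-wide rate quoted above (one every two weeks) and that a reload wipes any corrupted state. The constant and function name are illustrative only.

```python
import math

# Assumption taken from the anecdote: one undetected soft error somewhere
# in the fleet every two weeks.
FLEET_ERROR_INTERVAL_DAYS = 14.0


def p_fleet_has_stale_error(days_since_reload: float) -> float:
    """Chance that at least one terminal is running on corrupted RAM."""
    rate_per_day = 1.0 / FLEET_ERROR_INTERVAL_DAYS
    return 1.0 - math.exp(-rate_per_day * days_since_reload)


# Daily reload versus leaving the terminals up for a month or a year.
for days in (1, 30, 365):
    print(days, round(p_fleet_has_stale_error(days), 3))
```

        The error rate is the same in every row; what the daily reload changes is how long a flipped bit can sit in memory before it is thrown away.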

  17. trist

    "I never understood how you get twice the CPU for five times the price"

    It's called, hold on, marketing. Come on Linus, these are the same guys that tell us that a monolithic kernel is better. The same guys that write a driver three or four times, well soon to be three for Intel (Firmware/UEFI, Windozzzzzze and Linux, for Intel - no more Mach based OSes).

  18. A random security guy

    Hadoop Map Reduce

    I remember reading a paper a long time ago about how using ECC was a MUST for systems that were used for these large data sets. Our laptops have reached that stage already. The logic cost in terms of real estate and timing will be minuscule for doing ECC. The only cost is the extra DRAM.

  19. Old Used Programmer

    Sometimes....it was required...

    In 2002, I built a system based on 2 AMD Opteron-240 CPUs. They *required* ECC DRAM. The 2GB of ECC DRAM (4x512MB DIMMs) was 25% of the total system cost.

    1. John Geek

      Re: Sometimes....it was required...

      Yes, but of that 25% of total system price spent on RAM, how much cheaper were similar-spec 4x512MB non-ECC DIMMs? If the non-ECC stuff was 12% cheaper than the ECC stuff, and the RAM was 25% of the total, then the ECC premium on the total system price was more like 3%.

      Oh, and the Opteron was AMD's server CPU line, marketed against the Intel Xeons.
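
      The premium estimate is just two fractions multiplied together. A tiny sketch, with the 12% discount being the hypothetical figure from the comment rather than a real price:

```python
def ecc_premium_share(ram_share: float, non_ecc_discount: float) -> float:
    """Fraction of the ECC system's price that is attributable to ECC."""
    return ram_share * non_ecc_discount


def premium_over_non_ecc_build(ram_share: float, non_ecc_discount: float) -> float:
    """Same premium, expressed relative to the cheaper non-ECC build."""
    saving = ecc_premium_share(ram_share, non_ecc_discount)
    return saving / (1.0 - saving)


# RAM is 25% of the system price; non-ECC DIMMs assumed 12% cheaper.
print(ecc_premium_share(0.25, 0.12))
print(premium_over_non_ecc_build(0.25, 0.12))
```

      Either way of expressing it lands at roughly 3% of the system price, not 25%.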

  20. Binraider Silver badge

    I'm with Linus on this. PCs stuffed full of large amounts of RAM are routinely used for things like Finite Element Analysis. Life-critical calculations are now performed by everyday average PCs in every engineering application in the land. Sure, the probability of a bitflip screwing your results is low; but it's not zero.

    Finding an ECC-capable laptop is nigh impossible (probably because for same workload, ECC power draw is greater); and one really does not want to have to get a server rack or rip-off pricing on a "Workstation" to find a suitable motherboard.

    With the quantities of RAM now routinely up in the 64GB+ territory, is it too much to ask for at least the motherboard option to cover ECC without quadrupling the price? (Before buying the ram, which I accept should be somewhere of order 1/10th more expensive due to the need to have 9 ram chips for every 8 plus the little bit of extra circuitry to make it fly).

    1. Tom 7

      TBF running life critical software on a laptop (assuming it leaves the office/house) is a bit of a no-no. And that software and its results would be better served by being run on something more efficient and not so portable. I've got a screamer of an 8 core laptop but am moving my crunching off to a more cost effective GPU on a shit motherboard, or I was until I read this and am wondering about ECC packed GPUs. Though I think I'd probably do my error correcting by re-running certain proofs - I can probably cope with 10 errors a year as I'm still in learning/exploratory mode, but in a business mode I could easily get away with selling that error rate but wouldn't feel happy about it.

      This does make me wonder about the problem domain. I have just tried running some tests on some AI problems and these seem largely immune to bit flip type errors as they tend to try and converge and an error just slows down the convergence in training. And I guess one could use an ECC mainboard to drive a GPU relatively safely by refreshing the model on the GPU on a regular basis.

      1. Binraider Silver badge

        Our office has gone through the process of loading everyone onto a laptop in recent years; a trend that's unlikely to go away and I imagine exacerbated by Covid. The problems I'm thinking of in particular only need to be looked at once or twice a year; solve in fractions of a second- yet are of life critical nature. Against this use case, getting hardware to meet requirements? Forget it, there are obligatory cost targets to reach. Service the 5000 users that need email and excel; not the handful of engineers that need proper tools!

        I have code written in the 70's in FORTRAN-IV the results of which are still being used today. In original form, that code ran on an IBM mainframe. Today; if it needs rolling out for a one-off result; it runs inside a DOS VM inside a Windows VM. Talk about number of layers that could go wrong. The original manual for it is hilarious - it even estimates the power consumption, time and cost of running the mainframe.

        The hardware requirements are of course negligible by today's standards; but as the number of layers go up, so does the potential for errors. And one suspects the old battleaxe of a mainframe had solid error checking hardware.

        The odds of the one-off run coinciding with a bitflip are obviously low; however, subtle errors are where it's most dangerous. Gross error would be immediately spotted by a knowledgeable user. A 5-10% deviation would be tough to spot - and outside the safety tolerances / design fat in your average system.

        15 years ago a decent motherboard could be had sub £150 with ECC support; and a 256MB DIMM absolutely more than adequate to the task at hand. Today, you want ECC; there's this assumption you want massive performance and vast quantities. I do have uses for that capability (Finite-element especially); but good luck finding something of "average" cost as opposed to "workstation" pricing to do the job. I'm semi-wondering if getting an older ECC system off-network might be a good idea.

        Regarding GPU's; I haven't really explored ECC for those yet. The FEA application I use doesn't use the GPU for anything other than rendering the screen (mostly because GPU RAM is limited in quantity, and when a model can fill a 4TB swap file the limit comes down to the speed of your storage - although a few workstations are in principle able to support such bonkers quantities of RAM!)

  21. _LC_
    Alert

    Don't forget the “Tech Press”

    It would (have) be(en) their job to tell their readers about ECC.

    Today, every moron has a PC with 16 GB RAM and SmartPhones come with 12 GB of RAM. Not so long ago, only the “big irons” had so much memory and the structures of that memory were bigger by magnitudes. To translate this: You will experience bit flips in memory. If you're lucky, your computer/phone freezes up. If you're unlucky, your (file) system gets corrupted. If you are even more unlucky, a money transfer will get corrupted and put you in debt. Shit happens. It's just a matter of luck. ECC doesn't cost more than normal RAM and it isn't slower – really (I'm not a fan of homeopathy).

    Dear journalists, it's YOUR job to tell the people that an Intel iXXXXX is crap, because it doesn't support ECC. ECC being vital for systems with more than 256 MB of RAM...

    1. GrumpenKraut

      Re: Don't forget the “Tech Press”

      > ...Intel iXXXXX is crap, ...

      If you believe some of the posts above those chips actually do support ECC if using a Xeon chipset/board (I always thought the sockets are different). Makes matters even more depressing.

    2. DuncanLarge Silver badge

      Re: Don't forget the “Tech Press”

      > every moron has a PC with 16 GB RAM

      Mine's 12, and before I upgraded to a new machine I had 8.

      > SmartPhones come with 12 GB of RAM

      Sure they do. You must be thinking of the overpriced flagships. My phone and tablet have 2GB. I have 16GB of storage, of which I can use about 4 if I have nothing installed.

  22. dakra

    Don't blame Intel. Blame the customers who didn't value valid results.

    Intel didn't kill ECC and parity memory in consumer and business end-user PC's. Rather the customers and clone makers did, and the trade press was complicit.

    Early personal computers did not have either parity or ECC memory. Then the IBM Personal Computer came out for business use, with advertising referring to it as "The IBM of Personal Computers." IBM's middle name is Business, as in "International Business Machines."

    IBM PCs carefully tested all memory at Power On. That took time which users did not appreciate. The PC stopped hard if it encountered any error, even a momentary one, while running. People did not appreciate losing their unsaved work in progress.

    Clone makers won business by doing several things:

    * They offered BIOS settings to skip the memory test at power-on. People loved the time savings.

    * They offered computers without parity memory at a lower price. People loved the lower price.

    The trade press was complicit. It made fusses about all sorts of things, but their editorials did not educate users about the risks of skipping tests and not having parity checking. Furthermore, their reviews of these clones did not downgrade them for lacking parity memory or offering an option to skip the power on test.

    During those years, IBM strategically shifted from telling customers what is good for them to being "Market Driven." That meant giving customers what they want. Even business and health care customers voted with their pocketbook that they did not value valid results. The same IBM executives who said it was a bad decision technically, said it was the right market driven business decision to drop parity memory from desktops and laptops. I heard this directly from executives of both the PC and the memory chip divisions sitting together addressing an internal IBM audience.

    1. Dave 126 Silver badge

      Re: Don't blame Intel. Blame the customers who didn't value valid results.

      Given the lower quantity of RAM fitted to PCs during the period you describe, and the larger process size (bigger transistors are harder to flip) such RAM was built on, was the issue of corrupted RAM the same then as it is now?

  23. IkerDeEchaniz

    DDR5 with ECC included

    I thought DDR5 will have ECC included, so Intel won't be able to differentiate Xeon CPUs based on that.

    Example:

    https://www.overclock3d.net/news/memory/ecc_ecc_for_everyone_sk_hynix_spills_the_beans_on_its_ddr5_dram_tech/1

    https://www.rambus.com/blogs/get-ready-for-ddr5-dimm-chipsets/

  24. sitta_europea Silver badge

    Haven't used Intel for years (except in my second-hand laptop) for precisely this reason.

  25. six_tymes

    just when I thought he couldn't get any crazier. sorry to all you followers, he is bloody crazy.
