A moment of silence for all the drives that died in the making of this Backblaze report

Cloud storage and backup provider Backblaze has released a report on its hard drive failure rates for 2022, which appears to confirm that the age of a drive is a key metric for predicting potential failure. Backblaze regularly releases stats covering the failure rates of all the hard drives under its management, and these …

  1. Yet Another Anonymous coward Silver badge

    1% failure rate

    Anyone stopped to think how amazing that actually is?

    Given the speed these things spin at, and the precision with which the heads need to hit a bit of data, the fact that only 1% fail each year is incredible. (I haven't read the report to see if that includes drives that failed so early they should really have been caught by the manufacturer.)

    1. In Like Flynt

      Re: 1% failure rate

      Sure, it's amazing... unless you're in that 1%. At which point the adjective changes from "amazing" to "fuuuuuuuuuuuuh..."

      1. John Robson Silver badge
        Devil

        Re: 1% failure rate

        Then it goes to the phase of starting to take backups...

    2. Timop

      Re: 1% failure rate

      This is easy if you only have to manufacture a single golden sample.

      Try adding variation from part tolerances, the assembly process, and so on, while producing truckloads of drives weekly. Or maybe even daily.

    3. Potemkine! Silver badge

      Re: 1% failure rate

      I'm not sure this is that amazing.

      First, those HDDs aren't that old: five years at the most. Next, for the mechanical part, the physics involved is well enough understood to make a robust design; the deciding parameter there is cost, since being very reliable is more expensive. The less reliable part is IMHO the electronics of the HDD. AFAIK, the reliability of electronic components is not studied as much as that of their mechanical counterparts.

      1. Anonymous Coward
        Anonymous Coward

        Re: 1% failure rate

        Which part of all the drives that are 10 TB or smaller (bar one model that is 4 3/4 yrs old) already being over 5 yrs old means "those HDDs aren't that old, 5 years at the maximum"? Is 7.5 now magically less than 5?

        I got asked a question 10 years ago: "what are the two types of hard drives?" The answer wasn't FC/SATA or SAS/NL-SAS; it was drives that have failed, and drives that haven't failed YET.

        Mechanical devices fail - and given the cost of hard drives, and the precision at which they work, it is amazing that the failure rate isn't higher.

        Have you ever bought a car, a washing machine, dishwasher, etc, etc?

      2. Version 1.0 Silver badge
        Happy

        Re: 1% failure rate

        I've got a Windows XP system running with a 10Mb drive; it's still fine after about 24 years now. And there's another 10Mb drive that's still running fine after about 40 years, but it hasn't been running full time: I only put the RL02 disks into the drive when I need to access data from back then.

        And about 35 years ago I was working with a DEC customer of ours who was having problems with her system failing to boot from a hard drive I had installed many years earlier. Yes, it was "dead": the disk wasn't spinning. But we talked about it, and the next day she'd got it up and running to copy the data onto a new drive ... she solved the problem so quickly. She had unscrewed the top cover of the hard disk and given the platter a little flick with her finger to get it moving - it started spinning, and they got everything fixed that day.

        1. Lost Neutrino

          Re: 1% failure rate

          Sounds like the perfect definition for "robust". I wish modern cars were still like that: open hood, whack engine with a hammer, et voilà - vrooom!

        2. DuncanLarge

          Re: 1% failure rate

          How the hell did you fit XP into 10 megabits?

          1. systemBuilder22

            Re: 1% failure rate

            I am pretty sure they screwed up their message and it's not a 10MB drive. Typical 1999 drives were 1GB. I started using the IBM PC XT in 1983 with a 10MB drive. Nobody would be using that 5" drive 17 years later, let alone 40 years later.

  2. An_Old_Dog Silver badge

    I'd like to see a report like this for flash drives and SSDs

    The good thing about hard drives is that they usually give advance clues that they're about to fail (check your S.M.A.R.T. logs). The bad thing about flash drives and SSDs is that, in my experience, they simply fail unrecoverably, all at once, but I'd like to see some volume stats on this.
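
    (For illustration, a minimal sketch of that kind of pre-failure check, assuming smartmontools is installed, with smartctl 7.0 or later for its JSON output, and that /dev/sda is the drive in question; the attribute IDs are commonly watched ones, nothing Backblaze-specific.)

    import json
    import subprocess

    # SMART attributes widely treated as pre-failure warnings on HDDs.
    WARNING_ATTRIBUTES = {
        5: "Reallocated_Sector_Ct",
        187: "Reported_Uncorrect",
        197: "Current_Pending_Sector",
        198: "Offline_Uncorrectable",
    }

    def smart_warnings(device="/dev/sda"):
        """Return the warning attributes whose raw value is non-zero."""
        out = subprocess.run(["smartctl", "--json", "-A", device],
                             capture_output=True, text=True).stdout
        table = json.loads(out).get("ata_smart_attributes", {}).get("table", [])
        return {a["name"]: a["raw"]["value"] for a in table
                if a["id"] in WARNING_ATTRIBUTES and a["raw"]["value"] > 0}

    if __name__ == "__main__":
        print(smart_warnings() or "no pre-failure warnings raised")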

    1. alain williams Silver badge

      Re: I'd like to see a report like this for flash drives and SSDs

      Backblaze do provide ssd-drive-stats.

      1. An_Old_Dog Silver badge

        Re: I'd like to see a report like this for flash drives and SSDs

        Their SSD report shows the number of failures, etc., but not whether or not there were pre-failure indications, which is what I wanted to know about.

    2. Anonymous Coward
      Anonymous Coward

      They (Backblaze) have done them before

      But they broke them out into a separate list.

      SSDs are tricky because they are so usage-dependent, though. But there is also a real problem with SSDs keeling over well before their write-endurance limit kills them; there were a ton of mundane controller failures out there a few years ago.

      1. DS999 Silver badge

        Re: They (Backblaze) have done them before

        Controller failure is a problem for HDDs as well - that's usually what causes those early failures in the first few months of operation.

        It would be interesting if they tracked S.M.A.R.T. stats for remaining life on their SSDs along with failure rates. Storage Review did a long-term test a few years ago, running SSDs flat out for as long as it took them to fail. Some exceeded their rated write life by as much as 3x.

        EDIT: I read the article after posting this and it looks like they are doing exactly what I wished for above so it'll be interesting to check back in a few years.
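
        (A minimal sketch of the sort of wear tracking wished for above, assuming an NVMe SSD at /dev/nvme0 and smartctl 7.0 or later for JSON output; SATA SSDs expose similar data through vendor-specific attributes instead, such as 231 SSD_Life_Left.)

        import json
        import subprocess

        def nvme_wear(device="/dev/nvme0"):
            """Read the drive's own estimate of its consumed write life."""
            out = subprocess.run(["smartctl", "--json", "-a", device],
                                 capture_output=True, text=True).stdout
            log = json.loads(out)["nvme_smart_health_information_log"]
            # percentage_used may exceed 100 on drives past their rated life.
            return {"percentage_used": log["percentage_used"],
                    "data_units_written": log["data_units_written"]}

        if __name__ == "__main__":
            print(nvme_wear())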

    3. Sampler

      Re: I'd like to see a report like this for flash drives and SSDs

      In an environment like this, though, unrecoverable failure isn't really an issue, as the data will always exist in multiple copies elsewhere. It's only really a problem for home users who haven't been taught better, and for badly managed infrastructure that should know better.

  3. Andy Non Silver badge

    At least spinning rust hard drives seem more stable nowadays

    I remember back in the '80s there was a tendency for brand-new hard drives to fail within the first week or two of use. If they survived beyond that period, they tended to last for several years.

    1. JoeCool Silver badge

      Re: At least spinning rust hard drives seem more stable nowadays

      That's the classic "bathtub curve".

      1. Andy Non Silver badge

        Re: At least spinning rust hard drives seem more stable nowadays

        Interesting. Not heard the phrase before, but it is quite apt.

        https://en.wikipedia.org/wiki/Bathtub_curve
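
        (For the curious, an illustrative sketch of how the curve is often modelled: the sum of a falling Weibull hazard for infant mortality, a constant random-failure rate, and a rising Weibull hazard for wear-out. The parameters below are made up purely to show the shape, not fitted to any drive data.)

        def weibull_hazard(t, shape, scale):
            """Instantaneous failure rate of a Weibull distribution."""
            return (shape / scale) * (t / scale) ** (shape - 1)

        def bathtub_hazard(t_years):
            infant = weibull_hazard(t_years, shape=0.5, scale=50.0)   # falls over time
            random_failures = 0.01                                    # flat floor
            wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)  # rises late in life
            return infant + random_failures + wear_out

        for year in (0.1, 1.0, 3.0, 5.0, 7.0, 9.0):
            print(f"year {year:>4}: hazard rate ~ {bathtub_hazard(year):.3f}")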

    2. An_Old_Dog Silver badge

      1980s Hard Drives

      Back then, we would run a testing program overnight on the drives going into the computers we were building. We caught many bad blocks not on the manufacturer's defect list (fewer than a dozen per drive), and a few outright drive failures, but by and large they passed, and the computers they went into were not brought back to the shop for repair.

      We also had a 100% component testing policy, and I caught a shipment of "Hexa" brand multifunction I/O boards from China in which ALL twenty were defective!

      1. mattaw2001

        Re: 1980s Hard Drives

        Hands up all of us who remember the handwritten table of bad blocks on the hard disk!

        1. Anonymous Coward
          Anonymous Coward

          Re: 1980s Hard Drives

          Wasn't there a DOS debug command you had to type to set up the drive and type that info in?

          1. An_Old_Dog Silver badge
            Windows

            Re: 1980s Hard Drives

            From the Debug prompt ("-") ...

            for Western Digital controllers, g=c800:5 [Enter]

            for Data Technology Corporation controllers, g=c800:5 [Enter]

            for Adaptec controllers, g=c800:ccc [Enter]

            for OMTI controllers ... I've forgotten.

      2. GraXXoR

        Re: 1980s Hard Drives

        We used to call them soak tests, IIRC.

        All our machines were run for 24 hours using an automated test suite that marked off all the bad sectors found on the disks, and which was also good at picking up failures in the SIMMs or CRT.

      3. simonlb Silver badge

        Re: 1980s Hard Drives

        Not the '80s, but the early '90s: Samsung SHD3062A 120MB HDDs suffered from degrading platters, so that after less than a year there would be more bad blocks than good ones.

  4. alain williams Silver badge

    How busy are the devices ?

    Do we assume that all disks are equally busy? I would have thought that the ones doing more work might fail earlier. I cannot see any sort of I/O count.

    They exclude boot devices as presumably they are not that busy.

    1. DS999 Silver badge

      Re: How busy are the devices ?

      I never observed much difference in life between drives that were active 24x7 in DBs and drives that had a more sedate life as lower utilization file servers. There may be some slight correlation there but it wouldn't be worth them selecting for.

      The reason they pull boot drives out of the stats could be something else, like choosing a smaller size for those, or maybe they recycle veteran drives that were used for the main file store and don't want the physical handling involved to count against them.

      1. Richard 12 Silver badge

        Re: How busy are the devices ?

        Boot drives have a very different usage pattern to the data drives.

        Depending on the OS and setup, a boot drive might be read just the once during boot and then spun down until the next update, spinning all the time but almost entirely read-only, or have near-continuous writes to log files.

        They also tend to be much smaller.

        Either way, it makes sense to monitor the two separately. There might be clues as to which drive types fare better for each kind of workload.

        1. JMartin

          Re: How busy are the devices ?

          > Boot drives have a very different usage pattern to the data drives.

          From the link to ssd-drive-stats:

          “Boot drives in our environment do much more than boot the storage servers: they also store log files and temporary files produced by the storage server. Each day a boot drive will read, write, and delete files depending on the activity of the storage server itself. In our early storage servers, we used HDDs exclusively for boot drives. We began using SSDs in this capacity in Q4 2018. Since that time, all new storage servers, and any with failed HDD boot drives, have had SSDs installed.”

    2. Happy_Jack

      Re: How busy are the devices ?

      I'd be surprised if many organisations still bought spinning rust drives for boot disks. This is an obvious place to use SSDs given the smaller capacities needed.

    3. Timop

      Re: How busy are the devices ?

      Vibration might be a large risk, though it is easy to simulate resonant frequencies nowadays and make sure that nothing starts resonating at full rpm.

      Speed changes etc. might be the trickier thing, because you need to set the component resonant frequencies somewhere, and definitely not at the full-rpm frequencies. So there might be some edge cases that wreak havoc inside the drive.

    4. Kevin McMurtrie Silver badge

      Re: How busy are the devices ?

      Less work can mean more failures. Let us not forget the IBM Deskstar 75GXP: there was a limited number of times the head could fly over a track before the media flaked off. Idle drives had rapid data loss and then head crashes within one to two years.

      Less crappy drives seem to spend their idle time scanning for weak blocks to remap. That keeps the head moving, and moves data before the error-correction bits run out.

      Needless to say, I read the Backblaze reports before buying now.

  5. This post has been deleted by its author

  6. Graham Cobb

    Thanks to Backblaze

    I just wanted to say thanks to Backblaze for doing this and publishing it. I read the reports each time they come out, and I am sure the manufacturers do as well.

    Good on them!

  7. Anonymous Coward
    Anonymous Coward

    It's all relative.

    A chart showing the average age of each drive model deployed by Backblaze against size shows clearly that the smaller a drive is in capacity, the older it tends to be.

    We have Seagate drives that are still running that are over 22 years old and run 24/7. To be honest, I'm not impressed with those reliability figures for the 16TB drives.

    1. DJV Silver badge

      Re: It's all relative.

      Having been in IT in one form or another for nearly 40 years, I now refuse to buy Seagate drives due to the number of failures I or the companies I have worked for have experienced.

      1. Anonymous Coward
        Anonymous Coward

        Re: It's all relative.

        With all the buyouts over the last few years there are now effectively two HDD suppliers.

        Pick your poison, Seagate or WD.

        Personally, I've had major issues with WD drives as well.

        1. eldel

          Re: It's all relative.

          Yeah, I've been a Hitachi loyalist for HDDs for years, and was really smug when the Backblaze stats seemed to validate that bias. Now that they're owned by WD, though, I expect the QC to slip and the drives to fail more often.

          I did the last refresh two years ago, so I'm hopeful that by the time they are ready for replacement there will be viable SSD options (home NAS).

          1. systemBuilder22

            Re: It's all relative.

            Hitachi bought the IBM drive division, the inventor of the sealed Winchester disk drive. So it's no wonder they are head-and-shoulders more reliable than anybody else: they invented the tech and have been building it longer than anybody.

  8. John Geek

    Re SMART data: different vendors use it quite differently. Look at a Seagate drive and you'll see realistic soft-error stats; look at a WD drive and soft errors will be ZERO until the drive dies. This makes doing any predictive analysis via SMART statistics quite meaningless.
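
    (A minimal sketch of the practical consequence, assuming smartctl 7.0 or later and /dev/sda: because the raw values are vendor-encoded, a cross-vendor health check is better built on the normalised value measured against the vendor's own failure threshold.)

    import json
    import subprocess

    def near_threshold(device="/dev/sda", margin=10):
        """List attributes whose normalised value is within `margin` of the
        vendor's failure threshold; unlike raw values, these are at least
        nominally comparable across brands."""
        out = subprocess.run(["smartctl", "--json", "-A", device],
                             capture_output=True, text=True).stdout
        table = json.loads(out).get("ata_smart_attributes", {}).get("table", [])
        return [(a["name"], a["value"], a["thresh"]) for a in table
                if a["thresh"] > 0 and a["value"] - a["thresh"] <= margin]

    if __name__ == "__main__":
        print(near_threshold() or "nothing close to its threshold")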
