Backblaze's geriatric hard drives kicked the bucket more in 2023

Cloud backup and storage provider Backblaze has published a report on hard drive failures for 2023, finding that rates increased during the year due to aging drives that it plans to upgrade. Backblaze, which focuses on cloud-based storage services, claims to have more than three exabytes of data storage under its management. …

  1. Jou (Mxyzptlk) Silver badge

    ST12000NE0007 vs ST12000NE0008

    My little server was upgraded to four ST12000NE0007 drives, I think in 2018 or 2019, in RAID5 (actually Storage Spaces Parity). Now they are two ST12000NE0007 + two ST12000NE0008, because two of them failed within the warranty period and were replaced by Seagate without any fuss. No other problems have surfaced so far.

    The funny part is: it somehow matches the Backblaze statistics showing the -7 to be the less reliable of the two.

    Being paranoid about watching and logging the SMART values, I got an early heads-up before the actual failure of each drive. And they failed about one year apart. But sending a "this is how the SMART values developed over the last four weeks for ALL drives" overview helps get the drive swapped without the slightest discussion.
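
    For anyone who wants to do the same kind of trending, a minimal sketch is below (Python, assuming smartmontools is installed; the device paths and the set of watched attribute IDs are only illustrative). Run it from cron and graph the resulting CSV over a few weeks to spot a drive drifting towards failure.

      #!/usr/bin/env python3
      # Minimal SMART-trending sketch. Assumes smartmontools is installed and
      # the drives show up as /dev/sda../dev/sdd (adjust to your system).
      import csv
      import subprocess
      from datetime import datetime, timezone

      DRIVES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]
      # Attributes that tend to move before a failure.
      WATCH = {"5": "Reallocated_Sector_Ct",
               "187": "Reported_Uncorrect",
               "197": "Current_Pending_Sector",
               "198": "Offline_Uncorrectable"}

      def read_smart(dev):
          """Return {attribute_id: raw_value} parsed from `smartctl -A`."""
          out = subprocess.run(["smartctl", "-A", dev],
                               capture_output=True, text=True, check=False).stdout
          values = {}
          for line in out.splitlines():
              parts = line.split()
              if parts and parts[0] in WATCH:
                  values[parts[0]] = parts[-1]  # RAW_VALUE is the last column
          return values

      if __name__ == "__main__":
          stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
          with open("smart_log.csv", "a", newline="") as fh:
              writer = csv.writer(fh)
              for dev in DRIVES:
                  for attr_id, raw in read_smart(dev).items():
                      writer.writerow([stamp, dev, WATCH[attr_id], raw])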

  2. Lee D Silver badge

    Confirming, more than ever, that I'm not trusting my data to anything that isn't Western Digital.

    1. Jou (Mxyzptlk) Silver badge

      Yeah, the WD 500 GB RE2 were the most reliable ones. That is why no one is using them any more, 'cause they are the best in servers. Only needed to swap them seven times within four weeks across three customers before I convinced Fujitsu to send Seagates instead.

      The 1.5 TB Seagate server drives were the most reliable ones when that size was new. That is why no one is using them any more, 'cause they are the best in servers. Only needed to swap them five times within eight weeks across three customers before I convinced Fujitsu to send WD instead.

      (Repeat with every HDD manufacturer that existed or still exists - only those who survived a bad series are still left.)

    2. ldo

      Re: not trusting my data

      If you are “trusting” your data to any storage medium, you’re doing it wrong.

      To paraphrase Ronald Reagan, “trust, but backup”.

      1. Lee D Silver badge

        Re: not trusting my data

        What I run active servers on is largely a matter of trust.

        This is not a question of backups; that's an entirely different and separate topic (given that you wouldn't generally put backups on the same medium as your active data, for a start).

        But not having to service a working production server or array regularly means being able to trust that it's going to hold up well enough that you're not constantly there replacing 4% of the drives through failure (rather than choice).

        4% of the drives in even a small server setup each year is a drive every few weeks or so.
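
        For a sense of scale, a quick back-of-the-envelope sketch (the fleet sizes are made up for illustration; only the 4% annualized failure rate comes from the discussion above):

          # Expected gap between drive failures at a given annualized failure rate.
          def weeks_between_failures(n_drives: int, afr: float = 0.04) -> float:
              """Average weeks between failures, assuming they are spread evenly."""
              failures_per_year = n_drives * afr
              return 52.0 / failures_per_year

          for fleet in (50, 100, 250, 500):
              print(f"{fleet:4d} drives at 4% AFR -> one failure roughly every "
                    f"{weeks_between_failures(fleet):.1f} weeks")

        At a few hundred drives that works out to a swap every month or so, from the statistics alone.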

        1. ldo

          Re: not having to service a working production server

          That’s a question of “uptime” and “availability”, not so much about data integrity.

  3. ldo

    No Use Of SMART To Predict Failures

    I don’t think they bother using SMART to try to predict that a drive is going to fail. They simply keep using drives until they actually fail, and then replace them.

    For a company whose business is data integrity, is that the wisest course?

    Yes.

    1. Jou (Mxyzptlk) Silver badge

      Re: No Use Of SMART To Predict Failures

      Of course it is when you have a "pack of 10 disks where two may fail at any time, and mirror between those two packs".

      I don't know the exact Backblaze layout, i.e. the pack size, but with that or a similar config I would definitely wait until a drive actually fails too, or do a bulk swap of failed and "soon to fail" drives once a day.

      1. VicMortimer Silver badge

        Re: No Use Of SMART To Predict Failures

        They publish that info.

        https://www.backblaze.com/cloud-storage/resources/storage-pod

  4. Quando

    Do Backblaze record power cycles for each drive? I suspect their drives go a long time between reboots, but it would be interesting to see whether there is enough power cycling to note any correlation with failure rate.

    Another interesting value would be failure rate / TB.

    Maybe I should just grab the raw data and have a play - now where was that spare day I had lying around?
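
    If anyone does find that spare day, a rough pandas sketch along these lines would answer both questions. The column names (model, capacity_bytes, failure, and smart_12_raw for the power-cycle count) are as I remember the published daily CSVs, and the glob pattern assumes the quarterly zips were unpacked into data_* directories, so check the schema of whatever you download:

      # Per-model annualized failure rate, rough failures-per-PB-year, and
      # average power-cycle count from the Backblaze drive-stats CSVs
      # (one row per drive per day).
      import glob
      import pandas as pd

      cols = ["model", "capacity_bytes", "failure", "smart_12_raw"]
      df = pd.concat((pd.read_csv(f, usecols=cols)
                      for f in glob.glob("data_*/*.csv")), ignore_index=True)

      per_model = df.groupby("model").agg(
          drive_days=("failure", "size"),
          failures=("failure", "sum"),
          avg_tb=("capacity_bytes", lambda s: s.mean() / 1e12),
          avg_power_cycles=("smart_12_raw", "mean"),
      )
      per_model["afr_pct"] = per_model["failures"] / per_model["drive_days"] * 365 * 100
      per_model["failures_per_pb_year"] = per_model["failures"] / (
          per_model["drive_days"] / 365 * per_model["avg_tb"] / 1000)
      print(per_model.sort_values("afr_pct", ascending=False).head(20))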

    1. Dvon of Edzore

      From previous blogs I understand Backblaze waits to replace drives in a storage pod until a certain number of drives in it have failed. The coding scheme used for their multi-drive redundancy permits normal operation with several drives failed in each 20-drive RAID set. That way the power cycling and other costs of pulling a pod from operation can be spread across multiple drive replacements at a time. Each drive in a set is in a separate pod, so if a whole pod goes down, whether from maintenance or system failure, it only takes out one drive from each set, and operation can continue using the remaining pods.

      If a set has more than a couple of drives in a failed state, the data is migrated to a different set before enough fail to risk data integrity. This allows waiting to replace drives in their 45- or 60-drive pods until it is cost-effective to do so. Their massive number of active drives, and the design to manage them, gives Backblaze additional flexibility most small organizations don't have.
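
      To put rough numbers on how much headroom that buys, here is a small sketch. The 17-data-plus-3-parity split per 20-drive set is the commonly cited Backblaze Vault layout, but treat it, and the 2% chance of a drive dying before the set gets serviced, as assumptions for illustration:

        # Probability that a 20-shard set loses data, i.e. that more shards fail
        # than the parity can cover, assuming independent drive failures.
        from math import comb

        def p_set_loss(n: int = 20, parity: int = 3, p_fail: float = 0.02) -> float:
            return sum(comb(n, k) * p_fail**k * (1 - p_fail)**(n - k)
                       for k in range(parity + 1, n + 1))

        # With a 2% per-drive failure chance before repair, the chance of any
        # one 20-shard set exceeding its parity is well under one in a thousand.
        print(f"{p_set_loss():.2e}")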
