That seems rather specific. Is there some sort of self-destruct code in the firmware they forgot to remove?
HPE fixes another SAS SSD death bug: This time, drives will conk out after 40,000 hours of operation
HPE has told customers that four kinds of solid-state drives (SSDs) in its servers and storage systems may experience failure and data loss at 40,000 hours, or 4.5 years, of operation. The IT titan said in a bulletin this month the “issue is not unique to HPE and potentially affects all customers that purchased these drives.” …
COMMENTS
-
Thursday 26th March 2020 12:30 GMT Version 1.0
For home users, if you are going to buy a NAS, then buy a pair of them and swap the disks between them so that each NAS ends up with non-sequential serial numbers.
You can then get a safe network backup by making the second NAS invisible on the network and using rsync to pull an incremental copy of all the data from the visible NAS every night.
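A minimal sketch of that nightly pull, in Python for the sake of an example; the host name, user and paths are placeholders, and it assumes rsync plus SSH access from the hidden NAS to the visible one:

import subprocess
import sys

# Placeholders -- substitute your own visible-NAS host, user and volumes.
SOURCE = "backup@visible-nas:/volume1/data/"  # trailing slash: copy contents
DEST = "/volume1/mirror/"                     # local path on the hidden NAS

# -a (archive) keeps permissions, times and links, and only transfers
# files that changed since the last run, i.e. the incremental copy.
result = subprocess.run(["rsync", "-a", "--stats", SOURCE, DEST])
sys.exit(result.returncode)  # non-zero exit makes cron mail you the failure

Run it from the hidden NAS's crontab, so the visible NAS never needs credentials for the backup box.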
-
Friday 27th March 2020 18:44 GMT Stoneshop
For home users, if you are going to buy a NAS,
Buy a bare unit, and buy the disks from two sellers, preferably with multiple brands per order as well if the NAS doesn't object (it shouldn't). If you want to stick with one seller, at the very least split the order: the NAS plus half the disks first, and the other half a week or two later.
-
Thursday 26th March 2020 13:17 GMT hoola
If you are using enterprise hardware then you get what you get. There is no option to specify the manufacturer of the actual disk in the carrier; in fact they are usually all the same. If you are really concerned then you do a rolling replacement at various points in the life of the system, though how you do that with 1,000 disks in a storage solution is an interesting question.
As ever, what you do with a generic box, an array controller and a handful of disks is not the same at enterprise scale.
-
Thursday 26th March 2020 13:55 GMT Alan Brown
"I quickly discovered that you should not buy a bunch of disks from the same vendor, always use different batches "
Something that seems "lost" on vendors pushing arrays into the enterprise. It's extremely difficult to convince 'em to supply anything other than a boxful of identical HDDs with sequential serial numbers to drop into the chassis. Bad enough when it's a small array; a potential nightmare scenario if you're standing up servers with 60+ drives in 'em. (Been there, done that - looking at you, HP)
-
Thursday 26th March 2020 08:21 GMT gnasher729
It's a bug where a huge circular buffer is used, which needs to wrap back to the start of the buffer after 40,000 hours of operation, and the code checking for that condition is off by one. Since they are telling people about this now, which means they will update the firmware or be held responsible for it, I'd assume this is entirely bad luck rather than anything deliberate.
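A minimal sketch of that kind of off-by-one, in Python for readability; the buffer size, the one-slot-per-hour layout and the faulty comparison are assumptions for illustration, not HPE's actual firmware:

BUF_SIZE = 40000  # hypothetical: one log slot per power-on hour
log = [0] * BUF_SIZE
write_ptr = 0

def record_hour(stats):
    # Store this hour's entry, then advance the write pointer.
    global write_ptr
    log[write_ptr] = stats
    write_ptr += 1
    # The wrap check: '>' lets write_ptr reach BUF_SIZE without wrapping,
    # so the write after 40,000 hours of operation lands one slot past the
    # end of the buffer. Python would raise an IndexError here; C firmware
    # silently corrupts whatever sits next in memory. The correct test is '>='.
    if write_ptr > BUF_SIZE:
        write_ptr = 0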
-
Thursday 26th March 2020 16:23 GMT Wayland
32768, a number I remember well: it was the location of the first character on a Commodore PET display.
POKE 32768,65
would put an A there
It's actually 2^15, i.e. the 16th address line (A15) on a 6502
65 being the ASCII code for A
It sounds like there must be some sort of clock inside these SSDs, because why else would it be counting hours? Also, perhaps it's supposed to die when that number runs out, but it was either supposed to be a bigger number or a slower count.
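Drives do keep exactly that sort of counter: power-on hours, ticked over by the controller while the drive has power and exposed through SMART, no real-time clock needed. A quick way to read it, assuming smartmontools is installed and /dev/sda is the drive in question (both assumptions; SAS drives report the figure in a different format than this ATA-style parse expects):

import subprocess

# Hypothetical device node; needs root and the smartmontools package.
out = subprocess.run(
    ["smartctl", "-A", "/dev/sda"],
    capture_output=True, text=True,
).stdout

for line in out.splitlines():
    if "Power_On_Hours" in line:
        # The raw value is the last column of the ATA attribute row.
        print("Power-on hours:", line.split()[-1])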
-
Thursday 26th March 2020 16:22 GMT Michael Wojcik
Re: just like the printer cartridges
Well, no, it isn't.
The inkjet cartridges are planned obsolescence, and they self-destruct on a programmed date, regardless of how much they've been used. The SSDs fail after doing a certain amount of (presumably useful) work, and if the comment above regarding a circular buffer is accurate, it's an actual mistake in the firmware (albeit one that should never have made it out the door).
Inkjet cartridges (and inkjet printers) are a scam. This is a stupid bug.
And, of course, HPE doesn't sell inkjet cartridges; that's HP Inc.
-
Wednesday 25th March 2020 22:13 GMT Anonymous Coward
Dell
Dell sent me an email with the service tag of an affected server I look after. Nice email, with the tag in a URL to the download page for the drivers, and the name of what you needed.
So good on them. But then again they sell servers with five-year warranties, so they would not want them all failing at 4 years and 200 days. Then again, I would not want all the drives to fail at the same time either.
-
Thursday 26th March 2020 14:00 GMT Alan Brown
Re: Planned obsolescence?
"The blunder seems to be that they give 5y warrant"
Few customers buy 5 year support. 3 years is standard from most vendors unless you push hard for more (and going to 5 years usually adds a 100% premium over 3 year support contracts)
Ergo most vendors aren't going to give a flying F*, and warranty beyond that point is your problem.
(worse - most drive makers sell OEM drives with _only_ the warranty provided by the vendor - so if you bought 3 year support on a system containing drives with a supposed 5 year warranty, then 3 year warranty is what you actually get)
-
Thursday 26th March 2020 10:47 GMT Anonymous Coward
Can someone explain why an SSD needs a clock.....
....and if it isn't a clock, then what is it counting? And why would 40,000 turn out to be a magic number?
If it's a matter of maintenance data being needed ("How old is this drive?", etc), then what's wrong with writing a date and time stamp record somewhere at install time?
-
Thursday 26th March 2020 13:06 GMT Anonymous Coward
HPE MTBF
"HPE has told customers that four kinds of solid-state drives (SSDs) in its servers and storage systems may experience failure and data loss at 40,000 hours, or 4.5 years, of operation."
Looks like a freaking unusually high MTBF for any HPE gear these days!
The last Superdomes I've seen installed had HPE staff connected to them 24x7 for a couple of weeks before they ran by themselves ...