Obviously, Backblaze hasn't yet needed to recover any data from an SSD. It's nearly impossible, as I learned the hard way.
Backblaze thinks SSDs are more reliable than hard drives
Cloudy backup and storage provider Backblaze has found that flash SSDs are more reliable than hard drives, at least as far as the boot drives deployed in its datacenters go, but cautions this could change in future as the SSDs age. These findings come from Backblaze's latest report detailing reliability statistics on the …
COMMENTS
-
-
Tuesday 13th September 2022 14:17 GMT phuzz
If you're at the stage of trying to recover data from dead media, then several other things have gone wrong already. Such as RAID and backups.
(Although, as these are boot drives, I wouldn't be surprised if they're not backed up per se, but instead Backblaze have a procedure for rebuilding on a fresh drive, as if it was a brand new server)
-
-
Wednesday 14th September 2022 12:22 GMT ChrisC
Not sure it's valid even for consumer usage - if a spinny-rust drive has failed to the point where you can't see it *at all*, then the costs involved in recovering any data from said drive are likely to be beyond the level at which the average consumer will just give themselves a metaphorical kick up the arse for not already having backed up the lost data, and take it as a lesson learned for next time.
And IMO, the average consumer isn't likely to be aware enough of what the signs of impending drive failure can be, or to have their system configured to proactively monitor the SMART parameters for them, which means the first time they realise their drive has a problem is most likely going to be when it suddenly ceases to exist as far as their OS is concerned.
This is all assuming the drive is even in a position to be trying to give any sort of warnings that it's on its last legs - the last drive of mine that failed did so completely out of the blue due to a component failure on the controller PCB which, although it was probably degrading over time, wasn't able to be detected by SMART or any manual observation of drive characteristics (spin up time, noise etc.), and only became apparent when, on the next power cycle, it went "phut" and rendered the drive completely inoperable.
-
Wednesday 14th September 2022 13:37 GMT Jou (Mxyzptlk)
Windows 7 does give that warning. Seen it myself on an SSD and HDD in a laptop which were just about to die (the drives, not the laptop). I got a call from my brother and niece about the warning popup from Windows, and my reaction was: "Total red alert, copy everything you need onto a USB disk right about now, don't reboot, just plug in and copy!" Later we created image backups too, which reported about 450 unreadable sectors on the system SSD. They had that message on screen for a few weeks. The SSD was an OCZ; the HDD I don't remember, but it was dead too.
Was migrated to one 1TB SSD, and the system survived (and upgraded to Win10 right away too).
I doubt that MS removed that warning on later Windows versions. I think I saw that warning on Windows XP too, but that is too far away to be sure.
-
Thursday 15th September 2022 14:22 GMT ChrisC
Interesting, I've never seen Windows natively give me any sort of pre-emptive warnings about drive health, but I may be forgetting just how long ago it's been since the last gradual failure I had where SMART monitoring would have been of use - it's therefore entirely possible that 7 and later do include this natively and I've simply yet to experience it due to the only drive failure I've had since switching to 7 being the one I mentioned earlier where it went from fully functional to brick in the space of a single system power cycle.
-
Thursday 15th September 2022 16:04 GMT Jou (Mxyzptlk)
Well, my personal opinion is that Windows could warn earlier, but that would require keeping statistics over time to notice when specific numbers suddenly start to rise, and warn BEFORE the drive itself goes "Pre-Fail".
But I do that type of monitoring with a combination of PowerShell and smartmontools, though pure PowerShell would be enough for most out there.
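For what it's worth, a minimal sketch of that kind of trend-watching in Python, assuming smartmontools' JSON output (`smartctl -j`) on an ATA drive; the watched attribute IDs below are common failure predictors, not an official list:

```python
# Watch a few raw SMART counters and flag any that rise between samples.
# Assumption: smartmontools 7+, which can emit JSON via `smartctl -j /dev/sda`.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
}

def parse_smart(report: dict) -> dict:
    """Extract raw values of watched attributes from `smartctl -j` output."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {
        WATCHED[a["id"]]: a["raw"]["value"]
        for a in table
        if a["id"] in WATCHED
    }

def rising(prev: dict, cur: dict) -> dict:
    """Attributes whose raw value increased since the previous sample."""
    return {k: cur[k] - prev.get(k, 0) for k in cur if cur[k] > prev.get(k, 0)}
```

In practice you'd run `smartctl -j` on a schedule, store each sample, and alert on any non-zero delta; on Windows, PowerShell's `Get-StorageReliabilityCounter` covers some of the same ground natively.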
-
-
-
-
-
-
Wednesday 14th September 2022 09:25 GMT Piro
Re: Maybe
Ouch, that's a hell of a take.
The vast majority of computing wouldn't be possible. There are few systems that allow the running of memory in a mirrored configuration, and fewer still that allow entire systems to run in lockstep across separate hardware (think VMware FT).
I agree that important data should regularly be copied to extra locations, not part of the original system, but to say all unique data in time and space is worthless is quite the thing to suggest.
-
-
-
Friday 30th September 2022 14:32 GMT VicMortimer
Backblaze figured out many years ago that there is ZERO advantage to using 'enterprise' hard drives, because they're no more reliable than 'consumer desktop' drives. And of course the 'enterprise' drives are a lot more expensive.
That's how their drive report got started. They did the testing, realized that they were being scammed by drive manufacturers, and published their reliability numbers because they thought it would be nice to let everybody else know just how much of a scam 'enterprise' hard drives actually are.
-
-
Wednesday 14th September 2022 04:12 GMT doublelayer
They're almost certainly not bothered about recovery. Their business is storing lots of data, so I'm sure they're aware that getting to the point of having to recover from failed hardware means they'd have completely failed the customer. It's not that useful even in personal usage, as there are a lot of failure modes where recovery from anything, mechanical or SSD, isn't possible. By all means recover if it looks possible and could help, but never count on having that option.
-
-
-
-
Tuesday 13th September 2022 21:08 GMT Richard 12
Re: SSD is still a physical storage device
The published specifications are a typical bathtub curve, with the "high failure rate" occurring at around 50-100 years old in a write-rarely, read-continuously situation, which is exactly what a typical server boot drive sees.
It's nice to see that the specification isn't wildly wrong.
Storing logs on the drive will greatly shorten the lifetime; how much depends on how the OS and drive firmware handle appending.
We had one batch of drives with a firmware fault that killed them within a few weeks of taking logs, as they did a full block erase-write for every tiny log line...
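A back-of-envelope sketch of why that firmware bug is so lethal (all numbers illustrative, not from the batch in question): compare rated endurance against the write amplification of a full erase-block rewrite per log line.

```python
# Illustrative endurance estimate: a 240 GB drive rated for 150 TBW,
# appending ~1 GB of log data per day.
TBW = 150 * 10**12        # rated total bytes written
DAILY_LOGS = 10**9        # bytes of logs appended per day

days_normal = TBW / DAILY_LOGS            # firmware coalesces appends sensibly
# Faulty firmware: a full erase-block rewrite (say 512 KiB) per ~100-byte line,
# so the flash absorbs ~5,243x more writes than the data actually appended.
amplification = (512 * 1024) / 100
days_buggy = days_normal / amplification

print(round(days_normal / 365), round(days_buggy))  # ~411 years vs ~29 days
```

With those assumed numbers, sane append handling outlasts the server by centuries, while per-line block rewrites kill the drive in about a month, which is roughly the timescale described above.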
-
-
-
Tuesday 13th September 2022 14:51 GMT elwe
I doubt very many of these are hypervisors. When you run a scale-out service you tend to use many whole boxes as service nodes, with no need for a hypervisor on most boxes. Typically you do have a very small number of hypervisors to run VMs for DNS, LDAP directories, syslog servers and other ancillary services.
-
Tuesday 13th September 2022 17:19 GMT J. Cook
Not quite.
Unless you've configured your ESXi hosts to write log files somewhere else, by default they'll write to the local datastore. If you've been booting your hosts off SD cards, your log files get written to a ramdrive due to the failure rate of SD cards that get lots of writes. (Ask anyone who runs a handful of Raspberry Pis about media failure...)
VMware, at one point, declared that installing and booting off SD cards would no longer be allowed for ESXi 7, which was walked back after a lot of outrage. It's still planned for a 'future version', probably 8 or a dot release of 8. (Which is why I was rather annoyed that the new servers we just bought did not have any local drives on them...)
-
Wednesday 14th September 2022 03:46 GMT DougMac
Re: Not quite.
Tried doing SD card boot for VMware.
I had to replace the cards about every 4-6 months.
Gave up and spec'd out systems with local SSD boot after that.
Granted, some brands of server hardware seem to do better than others, but it was no surprise to me that VMware unspec'd SD card boot off the HCL.
-
Wednesday 14th September 2022 10:13 GMT 43300
Re: Not quite.
Our previous set of servers had SD cards as boot media when new - each one had a pair for alleged redundancy. Trouble was that the failure reporting didn't work, so the first we knew was when we started getting issues first with patching, then with not booting at all.
Fortunately the front bays were wired in, so after a few years I just bought some SSDs to use as the boot drives. No further problems after that, and I would never consider using SD cards again - either standard SSDs (as in the current servers) or BOSS cards.
-
-
-
-
-
Wednesday 14th September 2022 12:30 GMT ChrisC
Re: These guys deserve a Nobel for "Science you can use"
And lovely to see a commercial entity releasing data like this for public use, rather than keeping it all firmly under lock and key for internal consumption only. Makes you stop and ponder just how much data (for all manner of things) is being gathered around the world and which could be of use in some way to someone other than the gatherer, but which never gets seen by anyone inside the gathering organisation...
-
Wednesday 14th September 2022 16:16 GMT ChrisC
Re: These guys deserve a Nobel for "Science you can use"
Or even *outside* the organisation, doh...
Though given how much data is captured as a matter of routine these days, it wouldn't surprise me in the slightest if my earlier typo isn't a million miles from the truth in many cases as well.
-
-
-
Tuesday 13th September 2022 17:33 GMT Ball boy
The choice isn't really about reliability
No one should ever rely on a single disk for anything. Given that, then the reliability (providing it is within a reasonable percentage of the previous tech) isn't that critical - but speed is: SSD is simply way, way faster than platters.
However, if an SSD does fail for any reason (be it the memory array or just in the control logic) I very much doubt you stand much chance of getting any data back. At least with a platter there's a fighting chance some of the information can be recovered - that is, assuming you forgot rule 1: never have a single point of failure unless you're planning to fail.
-
Tuesday 13th September 2022 21:32 GMT Richard 12
Re: The choice isn't really about reliability
Missed the point.
Reliability matters, but not for the data (it's a boot drive, the data is a clone of a million others). It matters for the downtime.
Losing the boot drive takes out the server for a few hours, and consumes the valuable in-person time of a technician to go and physically swap it out.
-
Wednesday 14th September 2022 04:23 GMT doublelayer
Re: The choice isn't really about reliability
Depending on what you're doing with it, reliability can become more important and speed isn't always critical. Losing a disk doesn't just mean losing the data on it; I'm sure anyone running a storage business is aware of that and has redundancy. It means the cost in time and money to allocate a replacement disk and add that to the array holding the data. It means an eventual call to a technician to remove the failed hardware and replace it with fresh devices. It means buying replacements faster. There are reasons people care about that.
You don't always need speed, either. In my personal machines, the boot disk is always an SSD because speed is very important there. In my storage server, it's mechanical drives. I can deal with it taking a couple more milliseconds to fetch a file I've moved over there, and if I couldn't, I wouldn't be using a network link anyway. This allows me to have more storage in it than I could afford if it was an all-SSD setup (when I was buying disks, SSDs were running about 4-5 times as expensive per terabyte as HDDs). Although my primary consideration was financial cost, I'd definitely consider reliability more than speed.
-
-
Tuesday 13th September 2022 18:00 GMT Jou (Mxyzptlk)
I had only ONE SSD failing on me.
And that one was *drumroll* OCZ Vertex 3 240 GB. They died like flies at the end of their time, and luckily I noticed it early enough.
No other SSD failures so far; mostly Samsung top-line (I think six of them?), some WD Blue (four of them, but only two in active use; WD = SanDisk = their own flash memory). So indeed, the reliability is so good that they get too small instead of failing.
-
Tuesday 13th September 2022 19:16 GMT Pirate Dave
Well, maybe
Just one tiny anecdotal point - last week, our 4 year old Compellent/Dell SCv3020 finally blew its first drive. One of the SSDs. There are 10 SSDs and 20 spinning rustbuckets, all 30 of which were installed at the same time, and if memory serves, were relatively close in manufacturing date.
Not a datacenter full of disks, and nowhere near a significant sample size, but enough to make me chuckle at the irony of Backblaze's statement. As always, YMMV.
-
Wednesday 14th September 2022 15:56 GMT phuzz
Re: Well, maybe
Might be worth ordering some spares when you get the replacement drive. One thing I've noticed with SSDs is that all the drives in a single batch tend to fail quite close to each other.
I've always found hard drives to be a bit more random in their failures: two hard drives with consecutive serial numbers might fail years apart, but with SSDs they're often only months apart.
-
-
Tuesday 13th September 2022 19:36 GMT Henry Wertz 1
Read-only workload
I would expect SSDs to be more reliable, especially in this use as a nearly read-only medium -- (probably?) no swap enabled, minimal writes other than when the software is first installed and during software updates. Since these systems just boot off this disk and all the main work is done by (spinning rust) storage drives, I honestly assumed SSDs have a near-zero failure rate other than failure due to excessive writes. Good to see though!
Personally, I had 2 HP SSDs die -- both a $20 trashy "controllerless" 240GB model. The first one, I was running some tasks that needed about 24GB of RAM on a 16GB system and used the SSD as swap -- it croaked in a month or two. Disappointed but not surprised. The second one, I used as a regular data drive -- the piece of junk STILL failed in under 6 months. DO NOT BUY CONTROLLERLESS SSDs! And HP, shame for selling them! I still run mainly spinning rust, but must admit I have not had any other SSD failures, or failures of the ~24GB MMC they stuck in a few netbook-type systems. (On the system my mom has with that, though, only the Ubuntu / is on the 24GB; the /home and swap are on a 750GB HDD -- works a treat, software loads fast off the MMC, and since Ubuntu is small it still has like 12GB free on it; the big stuff goes on the hard disk, which also has plenty of free space.)
-
Tuesday 13th September 2022 20:02 GMT Jou (Mxyzptlk)
Re: Read-only workload
12 GB for the OS feels like A LOT in Linux terms! When did Linux get so bloated? Server 2022 OS is about 14 GB (normal non-core install including a small swap), and with updates it can go up to 17 GB until the next self-clean cycle - which is very little in Windows terms.
Soon they will trade places, but I doubt Windows will shrink; rather, Ubuntu will grow.
(And yes, I don't dare to compare with bloaty MS-Client OS-es, even though Win10 needs way less than Win7/Win8/Win11).
-
-
-
Wednesday 14th September 2022 07:44 GMT Chz
Re: SSD Failure
It's my personal suspicion - based entirely on anecdotal data, I should add - that SSD failures are lower than HDD for the main part of their lifetimes, but the far edge of the bathtub curve looks different. Long-running HDDs do indeed keep going. I have one that's 15 years old. No essential data, I use it as temp space. I have a suspicion that the 15 year survival rate for SSDs will be lower than for spinning rust. Pure gut instinct on that though, as Enterprise flash is at most 10 years old and I don't think they ever sold enough of the consumer drives before then to make a decent data point.
-
Wednesday 14th September 2022 13:48 GMT Jou (Mxyzptlk)
Re: SSD Failure
Needs more detail: which SSDs? It makes quite a difference which manufacturer you choose.
My ranking for consumer SSD quality, for as long as SSDs have been around:
1. Samsung, using their own chips of course.
2. Crucial, which is the Micron consumer brand, using their own chips too.
3. WD, which bought up Sandisk, using their own chips too.
4. Well, it COULD be Hynix, if their website would finally work outside of UK/US/Korea/Japan. Using their own chips too.
5. Don't care for the rest :D - and that includes Intel.
The ranking for professional server SSDs is different: "Don't care how many of them die, they have support for X years". However, I've not seen one dying yet, including the famous Intel-32767-hours bug drives.
-
Wednesday 14th September 2022 15:47 GMT J. Cook
Re: SSD Failure
The ranking for professional server SSDs is different: "Don't care how many of them die, they have support for X years".
Correct. For server and enterprise the rule is also 'one is none, two is one, three or more is better'.
Granted, I have seen cases where the controller falls over and takes the data on the drives with it as it goes down. (A previous boss liked to spin a tale of an EMC firmware update that proceeded to corrupt the data on the entire appliance, which is why he was very, very suspicious of firmware updates on storage appliances.) I don't blame him, but the only bit of EMC storage we've ever had in house was a Data Domain, and that's been shrink-wrapped until the data on it passes retention in a couple more years...
-
Thursday 15th September 2022 01:56 GMT doublelayer
Re: SSD Failure
The full dataset includes not only manufacturers but specific models. It can allow you to find out in great detail what would have been the best disks to have bought five years ago, and although the SSD data is not yet as extensive, it should eventually provide similar levels of detail. This is only slightly useful in determining what disks to buy now, unfortunately.
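As a sketch of what you can do with the full dataset: Backblaze publishes daily per-drive snapshot CSVs (one row per drive per day, with `model` and `failure` columns, among others), from which an annualized failure rate per model falls out in a few lines. The AFR formula here, failures per drive-year observed, is the standard one; treat the column names as an assumption to check against the files you download.

```python
from collections import defaultdict

def afr_by_model(rows):
    """Annualized failure rate per model from daily drive snapshots.

    Each row represents one drive on one day; `failure` is 1 only on the
    day a drive fails, so summing it counts failures and counting rows
    counts drive-days.
    """
    drive_days = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        drive_days[row["model"]] += 1
        failures[row["model"]] += int(row["failure"])
    # AFR = failures / drive-years observed
    return {m: failures[m] / (drive_days[m] / 365) for m in drive_days}
```

Feed it `csv.DictReader` rows from the published files; a model observed for 365 drive-days with one failure comes out at an AFR of 1.0, i.e. 100%.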
-
-
-
Saturday 17th September 2022 18:42 GMT Richard Pennington 1
Sample size of 1 ...
I am typing this on a vintage Apple machine (iMac 15,1 from 2014) which features a Fusion drive. This means that its main internal HD is actually a 1TB spinning-disk HD combined with a 120GB SSD.
About March this year, this machine suffered repeated crashes (at intervals of 1-2 days). Long phone calls to Apple support later, the solution was: [1] all user data was backed up to Dropbox (which I was using anyway in the background to get my various machines to talk to each other); [2] delete the internal HD completely, and reinstall the system from Apple's Internet recovery system; [3] manually recover about 5 days' emails which got lost in the middle but which were captured on other machines.
It turned out that the SSD had failed completely, but the spinning-disc HD was - and is - still in full working order. The machine now runs on just the spinning-disc HD. It has now gone over 80 days (and 3 system updates) since its last crash.