Why are enterprises being irresistibly drawn towards SSDs?

SSDs have been the subject of hype, hype and more hype. Any of us who have used them in our personal computers know the benefits SSDs bring, but personal experiences and hype alone don't explain the robustness of enterprise flash adoption. Flash has a lot of naysayers. Any article on The Register about flash inevitably invites …

  1. Borg.King

    Change in Flash technology to eliminate finite write lifetime?

    I would imagine that eliminating the lifetime constraints of a flash cell has to be one of the goals current and future manufacturers are striving for. The recent Aluminium / Graphite rechargeable battery work gives me some hope that a similar approach might be feasible for long term storage solutions.

    Of course, once your storage device has a long life time, the number of replacement units you sell is going to decline and reduce your income.

    1. Trevor_Pott Gold badge

      Re: Change in Flash technology to eliminate finite write lifetime?

      A) The Al/C battery work doesn't port to silicon chips. It is unlikely we will ever see flash chips without write limits.

      B) You'll sell just as many new units even if your units last forever because our demand for data is insatiable. What use is a SATA 32GB SSD today, excepting in some very niche applications? Hell, what use is a 120GB? Would you buy a 240GB for your notebook?

      Flash write lives aren't being artificially suppressed. It's just physics.

      1. frank ly
        Happy

        @Trevor_Pott Re: Change in Flash technology to eliminate finite write lifetime?

        "What use is a SATA 32GB SSD today ....."

        As the root, /home, swap and /{data} partitions of my desktop computer. If I want 'big data' I just push a 1TB spinning rust or 128GB SSD into one of the front panel slots (or use a USB-3 adapter cable to connect a big SSD to my laptop). I don't anticipate this situation changing for me for many years and I'm sure most people at home (a big market) would be able to do the same.

        1. Archaon

          Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

          "What use is a SATA 32GB SSD today ....."

          Even taking frank ly's *Nix example as a given, I've got a machine with a pair of 30GB (not even 32GB) SSDs in RAID 1 which runs Server 2012 R2 Standard quite happily. Believe it typically sits at around 11GB free.

          1. Trevor_Pott Gold badge

            Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

            "Even taking frank ly's *Nix example as a given, I've got a machine with a pair of 30GB (not even 32GB) SSDs in RAID 1 which runs Server 2012 R2 Standard quite happily. Believe it typically sits at around 11GB free."

            And I've got OS-only installs of Server 2012 R2 Standard that eat the better part of 80GB.

            You *might* be able to convince me if you tried to make a case for 32GB SSDs as an ESXi disk, except that's probably useless since there are USB keys that are better fits for that job, and just plug directly onto the motherboard (or into the SATA plug).

            Dragging along 32GB SSDs is an exercise more in being spectacularly cheap than anything else. I get it - I am an SMB sysadmin, we have to do this all the time. But the hassle of migrating components from system to system as everything else dies (or the system isn't worth the electricity it consumes) gets old fast.

            A dirt cheap thumb drive solves the problem of a place to put a hypervisor, and the ancient SSD from the beforetimes isn't going to help me run my datacenter. It might be useful to the poorest of the poor consumers, or people in some extreme niches, but as a general rule storage devices aren't much use to general market past about 5, maybe 6 years. After that they're just too small.

            A great example is the 1TB magnetic disk. I have an unlimited number of these things. I can't and won't use them. It costs me more to power up storage devices to run those drives for the next three years than it would to just go get 4TB drives. To say nothing of space, cooling, OPEX, etc.

            Even if all our storage devices lasted forever, they would eventually stop being used. Just like my Zip drive. Just like my Blu-ray. Newer devices hold more, and they are less of a pain in the ASCII to use.

            1. Archaon

              Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

              "And I've got OS-only installs of Server 2012 R2 Standard that eat the better part of 80GB. etc"

              There's shades of grey (not 50), not just black and white, you know? The question, as always, is what is correct for the customer and their environment. 30GB SSDs are not wrong. 80GB SSDs are not wrong. USB drives are not wrong. It depends.

              I understand that just because I have Server 2012 R2 running at 19GB on a 30GB RAID 1 set does not mean everyone can or wants to do that. If you need 80GB, then you need 80GB, it's that simple. You should also understand the reverse is true, that if you only need 30GB there's no point paying for 80GB or 120GB, especially if the disks are going to go in the bin (or the dark corner of a cupboard) once the server is decommissioned. Call it 'spectacularly cheap' if you like, because yeah, in this instance that was the point.

              What I suspect you really mean is that relative to the cost of buying a slightly larger disk it's not worth the hassle to watch over machines with small system disks and maintain them. After all the time, effort and wage/consultancy costs of IT staff are a finite resource just like anything else, and I'm sure there's better things to spend on than worrying about whether a cached Windows Update has taken one of your systems over the edge. And that's a completely understandable and completely correct/justified point of view in many organisations.

              Many (not all) customers prefer redundant (i.e. RAID 1) boot drives. That kicks out USB drives but you'd get away with a RAIDed dual SD card module. Of course a USB key is fine if the customer doesn't mind.

              As I said, it all depends on the customer and the environment. The point is to take a step back, look at it objectively and not be blinkered into a default position.

              1. Trevor_Pott Gold badge

                Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

                No, Archaon, in your non-objective, blinkered position on things you've missed the whole thrust of my argument: namely that there is no value - except in some very niche situations, including outright poverty - in recovering 10 year old drives from systems and reusing them, even if their lifespan was infinite.

                Just because you can take a 32GB SSD out of some ancient system and reuse it in a newer one (with a whole metric ****pile of TLC and babying) doesn't mean it's sane, rational, profitable or otherwise a good idea. It's also not something that the majority of individuals or businesses will do.

                You, personally, may do it. That doesn't make it a good plan. It doesn't make it what the majority will do, would do, or even should do. And that, right there, is the whole damned point. Which you seem to be unable to grok.

                1. Archaon

                  Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

                  "No, Archaon, in your non-objective, blinkered position on things you've missed the whole thrust of my argument: namely that there is no value - except in some very niche situations, including outright poverty - in recovering 10 year old drives from systems and reusing them, even if their lifespan was infinite."

                  What you don't seem to "grok" from your own blinkered position is that I pretty much agreed with you. Just making the point that there's alternatives. As I said, shades of grey. Not just Mr Pott's opinion and wrong.

                  1. Trevor_Pott Gold badge

                    @Archaon

                    I fairly explicitly stated in my original comment that there were other possible alternatives. I also made it pretty clear that they were niche and not very relevant. You then decided that the alternatives had to be spelled out and attempted to make them seem relevant.

                    That was not only pointless, you did not succeed in making them seem relevant at all. Which has now become the point of this thread.

                    If I said "for the purposes of creating the circle used in $company logo, the value of pi used was 3.14159{irrelevant additional numbers}" you'd be the guy explaining "pi is more than 3.14159, and sometimes it matters that you use {long string of numbers}". And you'd be explaining that to the guy who owns http://www.tastypi.com.

                    Rock on!

                    1. Anonymous Coward

                      Re: @Archaon

                      On Trevor's articles, the only pedantry allowed is Trevor's. Anyone else's pedantry will be mocked until either you give up or become so enraged that you lose objectivity.

                      1. Archaon

                        Re: @Archaon

                        "On Trevor's articles, the only pedantry allowed is Trevor's. Anyone else's pedantry will be mocked until either you give up or become so enraged that you lose objectivity."

                        That would appear to be the case. The former, that is.

                      2. Trevor_Pott Gold badge

                        Re: @Archaon

                        The tears of enraged commenters power my happiness.

          2. Steven Raith

            Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

            @Archaon

            ""What use is a SATA 32GB SSD today ....."

            Even taking frank ly's *Nix example as a given, I've got a machine with a pair of 30GB (not even 32GB) SSDs in RAID 1 which runs Server 2012 R2 Standard quite happily. Believe it typically sits at around 11GB free."

            Come back to me in a couple of years when WinSXS has eaten up the rest of that ;-)

            Steven R

            1. Archaon
              Trollface

              Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

              "Come back to me in a couple of years when WinSXS has eaten up the rest of that ;-)"

              Oh no. The terror. You mean I have to set up a StartComponentCleanup scheduled task to run once in a blue moon? Oh no. Save me from this tragedy please. The world hath ended and the fiery gates of hell have opened to swallow me up.

              Come off it.

              1. Trevor_Pott Gold badge

                Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

                StartComponentCleanup does not prevent WinSXS from growing unchecked. It just slows the progression somewhat.

        2. Trevor_Pott Gold badge

          Re: @Trevor_Pott Change in Flash technology to eliminate finite write lifetime?

          "As the root, /home, swap and /{data} partitions of my desktop computer. "

          Desktop linux is pretty goddamned niche. From my original comment:

          "What use is a SATA 32GB SSD today, excepting in some very niche applications?"

          Funny how when you quote it you leave off the last bit.

          Also: " I'm sure most people at home (a big market)" won't be using Linux on the desktop. Doubleplus when we talk about putting different directories on different drives. Sorry, mate. You're not so much in a class by yourself as homeschooling from a tree in the middle of the Yukon.

    2. Archaon

      Re: Change in Flash technology to eliminate finite write lifetime?

      "I would imagine that eliminating the lifetime constraints of a flash cell has to be one of the goals current and future manufacturers are striving for."

      Proper enterprise SSDs are rated for 20+ full drive writes per day (DWPD) over a 5 year period. Endurance is not a problem in most cases, it's just a case of choosing the right SSD for the job. A good quality SSD being used for the correct task for the SSD type shouldn't hit its endurance limit within the usable life of a system; and in terms of manufacturing defects etc I believe I'm right in saying that the failure rate of SSDs is considerably lower than that of spinning disk.

      If you put a consumer grade SSD (normally rated well under 1 DWPD) into a server that's constantly caching or writing small data (as per one of Trevor's examples) then yeah sure, it will murder it in short order. For the exact opposite reason you're not going to put one of those 20+ DWPD enterprise grade drives in your desktop PC because they're expensive and there's no benefit to you in doing it.
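
      To put rough numbers on those ratings, here is a minimal sketch of the DWPD arithmetic. The 400GB capacity and the 0.3 DWPD consumer figure are assumptions picked purely for illustration, not the specs of any particular drive.

```python
def total_writes_tb(capacity_gb: float, dwpd: float, years: float = 5.0) -> float:
    """Total data written (in TB) if the drive absorbs `dwpd` full-drive
    writes every day for `years` years. Assumes 365-day years."""
    days = years * 365
    return capacity_gb * dwpd * days / 1000.0


# Hypothetical 400GB drives: one enterprise (20 DWPD), one consumer (0.3 DWPD).
enterprise_tb = total_writes_tb(400, 20)
consumer_tb = total_writes_tb(400, 0.3)

print(f"enterprise: {enterprise_tb:,.0f} TB written over 5 years")
print(f"consumer:   {consumer_tb:,.0f} TB written over 5 years")
print(f"ratio:      {enterprise_tb / consumer_tb:.0f}x")
```

      The gap between the two figures is the whole point: the same sustained write load that an enterprise drive is rated to absorb for 5 years would exhaust the consumer rating in a small fraction of that time.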

      A part of me dies every time I see someone decide to try and be clever and save money by putting Kingston SSDNow drives* in their production servers or - god forbid - a storage array.

      * No hate for Kingston, just an example of a cheap consumer drive that sprang to mind.

      1. Anonymous Coward

        Re: Change in Flash technology to eliminate finite write lifetime?

        A Kingston SSD Now has about a 10th of the write endurance of a good enterprise SSD but it is also about a 10th of the price. It is perfectly suitable for some uses.

    3. prof_peter

      Re: Change in Flash technology to eliminate finite write lifetime?

      Until recently the *only* goal of flash designers was to deliver more bits for less money - until maybe 3 years ago maybe 95% of the flash that was produced went into iPods, cell phones, thumb drives, and SD cards, with SSDs accounting for a total of 3% of flash production. In the 10 years before that point, flash performance went down by nearly a factor of 10, and lifespan of low-end flash went down by far more.

      A lot can change in 3 years, though, and there is now enough demand for flash in performance-critical roles to at least halt this decline. (whether enough customers will actually pony up the money for higher-performance chips to make them profitable is debatable, though.)

      The reason we don't see flash devices failing left and right due to wear-out is because they've been getting bigger more rapidly than they've been getting faster. (or more accurately, than computer I/O workloads have been getting faster) The internal wear-leveling algorithms distribute writes evenly over the chips, so if you build an SSD out of consumer-grade chips with a 3000-write lifetime, you have to over-write the *entire* device 3000 times before the first chips start failing. (Sort of. A small number will die earlier, but there's spare capacity that can deal with that.) For a 500GB laptop drive, that's 1.5PB of writes, or 35GB/hr, 24/7, for 5 years. For a 1TB enterprise drive full of 30,000-cycle eMLC chips, you'd have to run it at 700GB/hr (200MB/s) for those 5 years to wear it out.
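
      The arithmetic behind those figures can be sketched as below. It assumes ideal wear levelling and no write amplification (every host write costs exactly one flash write), which real drives will not quite achieve, and it uses decimal units (1TB = 1000GB).

```python
HOURS_PER_YEAR = 365 * 24


def wearout_budget(capacity_gb: float, pe_cycles: int, years: float = 5.0):
    """Return (total GB writable, GB/hour, MB/s) needed to exhaust every
    program/erase cycle of the device in `years` years of 24/7 writing."""
    total_gb = capacity_gb * pe_cycles
    gb_per_hour = total_gb / (years * HOURS_PER_YEAR)
    mb_per_sec = gb_per_hour * 1000 / 3600
    return total_gb, gb_per_hour, mb_per_sec


for label, capacity_gb, cycles in [
    ("500GB consumer, 3,000 cycles", 500, 3000),
    ("1TB eMLC, 30,000 cycles", 1000, 30000),
]:
    total_gb, gb_h, mb_s = wearout_budget(capacity_gb, cycles)
    print(f"{label}: {total_gb / 1e6:.1f} PB total, "
          f"{gb_h:.0f} GB/hr, {mb_s:.0f} MB/s sustained")
```

      The results land slightly under the rounded 35GB/hr and 700GB/hr quoted above, which is close enough for the argument.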

      And wear-out isn't just a problem for flash anymore - the newest high-capacity drives are only rated for a certain total volume of reads and writes, due to head wear effects, which works out to a shorter lifespan than flash. (and reads count, too, while they're free on flash) See http://www.wdc.com/wdproducts/library/other/2579-772003.pdf for the gory details...

  2. Anonymous Coward

    I like spinning disks. They may not be as fast, but in case of failure a disk has the opportunity to get some info off it 90% of the time.

    SSD's can go from good to bad in a microsecond taking all data with it. I use SSD's on my gaming rig, but keep my long term data on a pair of traditional RAID-1 4TB SATA drives.

    1. This post has been deleted by its author

      1. Rebecca M

        OK, you knew that, but did you realise that the risk of silent data corruption is actually higher with a RAID-1 array than it is with a single disk? No? Well, it is.

        You now have two disks, each with its own on-board electronics, cache RAM, and firmware bugs, storing the same data. The same data is read from one drive one time you access it, and another drive another time. So the chance of any block of data being silently affected by a drive fault is doubled. Your RAID controller won't even notice, and will pass the bad data up to the application level twice as often.

        Bollocks. The overwhelming majority of hard drive errors (>99%) are between the platter and the head. Controller gremlins are so rare that you can essentially ignore them from a statistical perspective. Look at where the real errors happen - either the head didn't record the right thing in the first place (e.g. undervoltage coil), spontaneous corruption (i.e. a bit is toggled on the disk surface) or the head is unable to read valid data from the disk as the result of some issue after writing.

        In all of these cases the read will fail the CRC check built into any hard drive for decades, and the drive firmware will typically retry the operation a few times. If it manages to get the data subsequently the sector is mapped out and re-written in a spare block. This is standard single drive stuff, no need even for RAID at this point. The error only gets reported when the drive abandons the read. If that happens in a system with no redundancy you have a problem. With RAID it is not an issue - the sector is reconstructed from parity information. The possibility that data can be misread from the disk, passes the integrity tests even though it is invalid and passed on to the application without comment is remote in the extreme - the error detection mechanisms built in as standard work. When people talk about silent errors on the hard drive they are generally talking about spontaneous transitions that prevent the data being read as these checks fail, not bad data coming out of the drive without comment.

        As said before that eliminates well over 99% of errors but we'll call it 99% for ease of analysis. If we eliminate 99% of errors through RAID1 but double the remaining 1% does that mean the system as a whole is more or less reliable? You can't simply pretend that 60 years of research in data storage hasn't happened. Basic ignorance of hard drive integrity checking here, Blurays as the backup gold standard last week, a robust backup strategy that had no redundancy the week before. Perhaps it is time to stop pontificating and start learning.

        1. This post has been deleted by its author

          1. Rebecca M

            Secondly, as for the idea that these problems are so rare that you can essentially ignore them, just a few weeks ago, Corsair released a firmware update for one of their 60 Gb drives, (and I'm not picking on Corsair particularly, but it was an issue that affected me, so I have the details to hand):

            Great, so you've managed to prove that controller errors occur. Great, but it doesn't get you anywhere. You've also accepted that notional 1% error rate from the electronics which is what you need to bear in mind when defending the point you originally made:

            You now have two disks, each with its own on-board electronics, cache RAM, and firmware bugs, storing the same data. The same data is read from one drive one time you access it, and another drive another time. So the chance of any block of data being silently affected by a drive fault is doubled.

            With that 1% figure in mind you have to show not that controller errors exist or even that they increase with the number of drives. You have to show that doubling the number of drives increases the error rate of the associated electronics not by a factor of two but by a hundredfold, simply to get back to where you were, or increase 200x to get to that claimed doubling of risk.

            That's a big claim to make. Arguing about whether the errors attributable to that circuitry are 1% or 2% is most of the statistical insignificance to which I was referring. The rest is of course a read error that causes a CRC to pass even after corruption. Yes that's always possible too, but generally a one in 2^16 chance even for a completely garbaged sector (we're not talking one or two bit errors here). Even combined the two effects are small enough to make nonsense of the entire argument.
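
            A minimal sketch of that arithmetic, using the notional 99%/1% split from this thread rather than any measured data. It counts a detected but unrecoverable read on a single disk as a bad outcome, which is the framing used above; the function names are made up for illustration.

```python
# Normalise the single-disk error rate to 1.0 and split it as in the thread:
# 99% of errors are caught by the drive's own checks, 1% slip through.
DETECTED = 0.99
SILENT = 0.01

# Chance that a completely garbaged sector still passes a 16-bit check.
CRC16_MISS_CHANCE = 1 / 2**16


def single_disk_bad_reads(detected=DETECTED, silent=SILENT):
    """With no redundancy a detected error is still lost data, so both count."""
    return detected + silent


def raid1_bad_reads(silent=SILENT, electronics_factor=2.0):
    """The mirror repairs detected errors; only silent ones remain, scaled by
    however much the extra drive electronics are assumed to add."""
    return silent * electronics_factor


def break_even_factor(detected=DETECTED, silent=SILENT):
    """How much silent errors must multiply before RAID1 is no better."""
    return single_disk_bad_reads(detected, silent) / silent


print(f"single disk:          {single_disk_bad_reads():.2f}")
print(f"RAID1, silent x2:     {raid1_bad_reads():.2f}")
print(f"break-even factor:    {break_even_factor():.0f}x")
print(f"factor to double risk: {2 * break_even_factor():.0f}x")
print(f"garbage passing CRC16: {CRC16_MISS_CHANCE:.6f}")
```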

            1. This post has been deleted by its author

          2. the spectacularly refined chap Silver badge

            Even if it was 99.9999999999%, then statistically you would experience 110 corrupted bytes in every terabyte of data you move. Doesn't that worry you at all? When you rebuilt the imaginary disk array you speak about above, you'll have no redundancy, (unless you're running RAID 6, and even then you could only hope to IDENTIFY the error, not correct it).

            You are using the wrong set of figures in your calculations. If 99% of errors are eliminated you can do nothing to extrapolate the number of errors remaining without reference to the starting error rate - here you assume that all reads even from a good disk are errors. Use the same set of figures consistently and the position becomes a lot more difficult to justify.

            1. This post has been deleted by its author

              1. the spectacularly refined chap Silver badge

                I know, and I didn't make the original claim, Rebecca did. I simply took her argument and pointed out that even changing 99% to 99.9999999999% still did not produce the result she wanted to, thereby reducing her argument to absurdity.

                Your original position is clear enough - RAID1 increases the scope for corruption. You have twice as much data to go wrong, which copy is read is essentially random so if data is constantly read and written back without checks the scope for corruption across both disks is increased. This is all fine and I would agree with it.

                Rebecca's point is that checks are made in the form of the CRC attached to each and every sector. For any read that results in an error (detected or not) there is a good chance that it will indeed be detected at the CRC checking stage - that is the 99% figure being bandied about. The remaining 1% are the cases that either slip through CRC checking or occur subsequently to it.

                In the 99% cases the error is reported to the RAID controller (hardware or software) so the data is retrieved from the mirror. No data loss results thanks to the presence of that mirror. It is only the other 1% of cases that go undetected that can spread corruption from one disk to the other. This is what happens when an error occurs: it says nothing of the probability of an error occurring in the first instance.
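
                The two cases can be illustrated with a toy model. This is only a sketch: the disks are Python dicts, `zlib.crc32` stands in for the drive's real per-sector check, and every name here is hypothetical rather than any controller's actual API.

```python
import zlib


class SectorReadError(Exception):
    """Raised when a sector fails its integrity check (the drive gives up)."""


def make_sector(data: bytes):
    """Store the payload together with a checksum, as the drive would."""
    return (data, zlib.crc32(data))


def read_sector(disk: dict, lba: int) -> bytes:
    data, stored = disk[lba]
    if zlib.crc32(data) != stored:
        raise SectorReadError(f"sector {lba} failed its check")
    return data


def raid1_read(primary: dict, mirror: dict, lba: int) -> bytes:
    """Read from one disk; on a *reported* error fall back to the mirror
    and rewrite the bad copy."""
    try:
        return read_sector(primary, lba)
    except SectorReadError:
        data = read_sector(mirror, lba)
        primary[lba] = make_sector(data)
        return data


disk_a = {0: make_sector(b"payload")}
disk_b = {0: make_sector(b"payload")}

# Detected error: the stored bits changed, so the check fails and the
# mirror supplies good data.
disk_a[0] = (b"pAyload", disk_a[0][1])
recovered = raid1_read(disk_a, disk_b, 0)
print(recovered)

# Undetected error: bad data was stored with a matching checksum (say, a
# firmware bug), so nothing is reported and it is passed up as good.
disk_a[0] = make_sector(b"garbage")
silent = raid1_read(disk_a, disk_b, 0)
print(silent)
```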

                1. This post has been deleted by its author

                  1. Solmyr ibn Wali Barad

                    "OK, we agree up to a point - great."

                    Good call. It's an interesting discussion, and there's no need to get too emotional about it, because everybody had valid points to make.

                    Please consider two topics that were not mentioned yet:

                    1) PRML encoding method, which was introduced around 1991, and has been prevalent ever since. With major updates of course.

                    PRML means that reading bits off the magnetic media is not a straightforward process, it's a highly sophisticated guesswork. So an uncertain amount of uncertainty is to be expected.

                    2) T10-PI is a system of additional checksums that is designed to combat silent corruption problems discussed here. PI-capable drives can be formatted with 520-byte sectors, 8-byte checksum is kept inside the same data frame as 512 bytes of payload, and consistency checks are performed along the whole data path.
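
                    A rough sketch of what such a 520-byte sector might look like. The comment above only says 512 bytes of payload plus 8 bytes of checksum; the split of those 8 bytes into a 2-byte CRC guard tag, 2-byte application tag and 4-byte reference tag (the low bits of the block address), and the 0x8BB7 polynomial, are my understanding of the standard and should be treated as assumptions here. The function names are made up.

```python
import struct

PAYLOAD_SIZE = 512
PI_FORMAT = ">HHI"  # guard tag, application tag, reference tag (8 bytes)


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0 (assumed)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(payload: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append 8 bytes of protection information to a 512-byte payload."""
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must be exactly 512 bytes")
    guard = crc16_t10dif(payload)
    return payload + struct.pack(PI_FORMAT, guard, app_tag, lba & 0xFFFFFFFF)


def verify(sector: bytes, expected_lba: int) -> bool:
    """Check both the payload CRC and that the sector is the one asked for."""
    payload, pi = sector[:PAYLOAD_SIZE], sector[PAYLOAD_SIZE:]
    guard, _app_tag, ref = struct.unpack(PI_FORMAT, pi)
    return guard == crc16_t10dif(payload) and ref == (expected_lba & 0xFFFFFFFF)


sector = protect(bytes(PAYLOAD_SIZE), lba=7)
print(len(sector), verify(sector, 7))

# Flip one payload bit, or read the right data from the wrong address:
damaged = bytes([sector[0] ^ 0x01]) + sector[1:]
print(verify(damaged, 7), verify(sector, 8))
```

                    The reference tag is what lets each hop along the data path catch a misdirected read or write, which a plain per-sector CRC cannot.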

                  2. Rebecca M

                    I know, we all know that. One of the incorrect assumptions made by some people, is that these checks always either correct the error or return an error flag to the RAID controller or OS, or that the cases where they do not, (bad data is passed upwards as good), are so rare as to be ignorable.

                    Go back to my original post and you see that I acknowledge their existence but point out that is the uncommon case. You have accepted that uncommon nature. Now let's consider your different error types:

                    1. Complete and sudden failure. The classic, 'doesn't spin up'.

                    An irrelevant distraction from your original point which specifically excluded drive failure. Let's move on immediately.

                    2. Media errors which enter the drive's onboard controller, are corrected by the controller logic, and good data is passed upwards to the RAID controller or OS. You don't even know this is happening, unless you notice reduced performance or monitor SMART statistics.

                    Well, yes you do since the block remap table is available to any software that asks for it - this isn't even a SMART feature. Again, it doesn't alter the analysis one jot since it happens in both single drive and RAID configurations.

                    3. Media errors which enter the drive's onboard controller, which it detects but cannot correct, an error flag is passed upwards to the RAID controller or OS. The classic 'Error reading drive C:, (r)etry, (i)gnore, (a)bort, or (f)ail?'.

                    Now this is the key point - this is the instance a single drive configuration can't recover from but a RAID1 setup can, by reading the other disk.

                    4. Controller firmware bugs, (or other rare causes), that pass BAD data upwards to the RAID controller or OS, as if it was GOOD data. Rebecca originally claimed that this never happens. Now she is claiming that it is a very low percentage of errors.

                    I never claimed anything of the sort, just that the effect is so small we can ignore it when considering your claim. Remember your claim: that a RAID1 configuration suffers from more silent corruption than a single drive setup. We have already established that the case 3 errors are the vast majority of errors of this kind and that RAID1 virtually eliminates them. Fiddling around with this tiny percentage of errors does nothing to make that original claim correct unless the rate of them goes up by several orders of magnitude. That is what you have to show and what you have failed to do.

                    It is your job to show where all those extra errors come from. In the absence of that I consider this closed, I'm not letting you continue to redefine and clarify everything you say and misrepresent every argument to the contrary.

                    1. This post has been deleted by its author

                      1. the spectacularly refined chap Silver badge

                        Show me the code to retrieve this information from the controller of a standard SATA or SAS drive.

                        What, you mean like bad144 included as standard on this very (NetBSD) system? Call yourself a data storage expert? It is one of the things in our routine monitoring.

                        READ THE CERN PAPER AND UNDERSTAND IT, then try to find a solution.

                        I've read both papers you cite. Neither supports you. You can quote the specific sentence or paragraph if you like but I'll tell you now: it isn't there. You made a very specific claim about a head to head comparison between RAID and a single disk. Neither makes such comparisons because that is not the goal of either paper. Both are concerned with the system as a whole rather than the specific characteristics of the array - indeed the CERN paper acknowledges that the errors they recorded were primarily attributable to memory, i.e. before it even got to disk.

                        It is not enough to show that data corruption exists. It is not enough to show that the most widely used error detection and correction systems are not bulletproof. Rebecca has kept a laser focus on the original claim: each time you have introduced irrelevances and distractions that bring in factors that affect the legitimacy of your claim neither one way nor the other. Neither deals with RAID1 at all.

                        Although neither paper addresses your point the NEC paper supports the argument to the contrary:

                        "Disk drives commonly experience errors that are recoverable via the ECC..."

                        "At some point, data becomes unreadable leading to “read failures,”..."

                        "RAID can also catch and address errors flagged by hard drives..."

                        The silent corruption case not reported by the drive is restricted to a small fraction of cases. It doesn't tackle that head on but does hint at the reduced magnitude of the problem:

                        "However, there is a set of disk errors not caught by the hard drive..."

                        Please do comment back, I'm enjoying watching how you can dance on the head of a pin for so long without admitting the mistake.

                        1. This post has been deleted by its author

                          1. Solmyr ibn Wali Barad

                            @1980s_coder

                            "Go ahead, upvote Rebecca and let her upvote you"

                            For what it is worth, I upvoted every long comment on this subthread, because they contained one or more reasonable points.

                            Which does not mean I condone personal attacks, or unconditionally agree with every claim made.

        2. Paul Crawford Silver badge

          @Rebecca M

          The majority of HDD errors are indeed detected by the controller and/or reported by the disk itself when a read request cannot be honoured. That is what classical RAID protects against.

          With a periodic "scrub", where the system attempts to read all HDD sectors so errors are seen and re-written to hopefully fix the problem via sector reallocation, you get a good chance of not ever suffering from known RAID failure under normal conditions (data read, or more commonly when a HDD is replaced and a rebuild is needed).
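
          In outline, a scrub pass looks something like the sketch below. It is a toy model under stated assumptions: each disk is a dict of sectors, `zlib.crc32` stands in for whatever check the drive or array really uses, and the function names are invented for illustration.

```python
import zlib


def make_sector(data: bytes):
    return (data, zlib.crc32(data))


def checksum_ok(sector) -> bool:
    data, stored = sector
    return zlib.crc32(data) == stored


def scrub(disks: list):
    """Read every sector on every mirror; rewrite any copy that fails its
    check from a copy that passes. Returns (repaired count, lost LBAs)."""
    repaired = 0
    unrecoverable = []
    for lba in sorted(disks[0]):
        good = next((d[lba] for d in disks if checksum_ok(d[lba])), None)
        if good is None:
            # Every copy is bad: this is what a scrub is meant to pre-empt.
            unrecoverable.append(lba)
            continue
        for disk in disks:
            if not checksum_ok(disk[lba]):
                disk[lba] = good
                repaired += 1
    return repaired, unrecoverable


blocks = [b"alpha", b"bravo", b"charlie"]
disk_a = {lba: make_sector(data) for lba, data in enumerate(blocks)}
disk_b = {lba: make_sector(data) for lba, data in enumerate(blocks)}

# Sector 1 decays on one disk only; sector 2 decays on both.
disk_a[1] = (b"brAvo", disk_a[1][1])
disk_a[2] = (b"chArlie", disk_a[2][1])
disk_b[2] = (b"charliE", disk_b[2][1])

result = scrub([disk_a, disk_b])
print(result)
```

          Run often enough, the pass finds the single-copy failure while the mirror is still good, rather than discovering it during a rebuild when there is no second copy to fall back on.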

          But today where you might have massive data sets you can't ignore the problems of "silent errors" where the HDD's correction/detection system, or any one of a number of other sub-systems, has messed with your data. You might want to read this paper on the subject:

          http://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf

          (There is another from CERN but I don't have the link to hand)

          1. This post has been deleted by its author

          2. Rebecca M

            Re: @Rebecca M

            With a periodic "scrub", where the system attempts to read all HDD sectors so errors are seen and re-written to hopefully fix the problem via sector reallocation, you get a good chance of not ever suffering from known RAID failure under normal conditions (data read, or more commonly when a HDD is replaced and a rebuild is needed).

            Yes, I'm familiar with ZFS but that actually makes the point for me. ZFS's integrity checks really come into their own for data that is not accessed for years (decades?) at a time and ensuring that it remains readable by correcting any errors as they occur and hopefully while they are still correctable, rather than remaining undetected for years by which time things have decayed to the point that you don't have enough left from which to reconstruct the original data.

            However, there is an implicit assumption made, namely the errors that result from that additional processing are more than offset by the reduction in errors caused by the underlying storage. There is always an outside chance that the maintenance process itself introduces errors as a result of bugs or disturbances somewhere along the path - electronics, firmware, interconnects, system software etc. The scrub process is still regarded as a good thing to do because those errors are rare enough to be not worth considering when set against against the much greater risks of corruption on the underlying media.

        3. Tom 13

          Re: Bollocks. The overwhelming majority of hard drive errors

          Admittedly a very small sample size, but my personal experience suggests otherwise. Granted it was a solution dreamed up by a cheap CIO* on non-server systems, but 50% of our "we've lost this system because of a drive problem" incidents were the result of bad data on both drives. IIRC it was 6 systems in two years with a user group of about 30 people. Part of the reason you're missing the point is you're being hyper-technical in saying "hard drive error". Yes, you're technically correct about that. But data corruption that gets replicated to both disks comes from other sources as well. The first is some damn fool who thinks holding the power button for 4 seconds is the same thing as Shut Down. Data corrupts on one drive as a result and gets copied to the other after reboot. The most common of course is a malware infection that corrupts the boot sector. RAID tech being what it is, even at the cheap level we were using (PCI IDE solution from Adaptec) this was immediately copied to the mirrored drive. So it wasn't just a matter of breaking the array then using the good drive to recover the system. I do think in all cases we were able to recover the data. And yeah, the cheap CIO was using it as a backup, not as drive failure protection. The previous backup solution was IDE from some now thankfully defunct outfit who couldn't properly engineer their tape backup to comply with the IDE spec.

          *In fairness to the cheap CIO, he was saddled with a horrendous system by the CEO of the company. The CEO had sold a government agency a solution for using a statistical programming solution running on desktops instead of on a proper server. Yes it was true when the department was 3 people, but not once they got to 30.

      2. Daniel B.

        RAID

        If you scribble random data all over just one of the drives, your RAID controller won't notice, and will return that data 50% of the time, when it reads the relevant sectors from the corrupted drive.

        Um... That only applies to RAID1. RAID5/6 does actual parity check on stuff and thus won't return corrupted data. Even better if you're using ZFS, which actually has data integrity checks. ZFS+raidz1 is the best option out there, if you really care that much about corruptible data.
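As a hedged aside on what single parity can and cannot do: the sketch below is a toy three-data-block stripe, not any real controller's implementation, and `xor_blocks`, `data` and `parity_ok` are made-up names. It shows parity rebuilding a block that is known to be missing, and a parity mismatch revealing that something in the stripe is wrong without saying which block, which is the gap that per-block checksums such as those in ZFS are meant to fill.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single parity, RAID5-style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# A drive that reports its own failure: rebuild its block from survivors + parity.
rebuilt = xor_blocks([data[0], data[2], parity])

# Silent corruption: drive 1 returns wrong bytes without reporting any error.
corrupted = [data[0], b"BBXB", data[2]]
# Recomputing parity exposes a mismatch, but not which block is the bad one.
parity_ok = xor_blocks(corrupted) == parity
```

Whether a given array actually recomputes parity on every read, rather than only during a scrub or rebuild, depends on the implementation; the sketch makes no claim either way.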

        1. This post has been deleted by its author

    2. Natalie Gritpants
      Boffin

      Hope you have a backup system too. That you've tested recovery on.

    3. Paul Crawford Silver badge

      You have to start with the assumption that if a storage device fails, you won't ever/economically get any/trustworthy data back off it.

      From that starting point, you ought to have enough paranoia to assume the worst, so you begin with the question of what happens when (not if) your device fails/corrupts?

      RAID saves you down-time, both use (machine keeps working) and admin (no need to restore your backup) but RAID!=Backup as we are always told.

      Also most RAID & file systems don't have integrity checks so you can have data corruption and not know until something starts playing up. Once you realise this and the vast amount of data you may need to store (comparable to the 10^14 bits of HDD error rate) you might want that, so you then invest in ECC memory and a file system like ZFS or GPFS that has checks. They also support snapshots, a vastly under-rated feature that can save a lot of hassle in restoring a just deleted/modified file, or simplifying a consistent backup point-in-time.
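To illustrate the integrity-check idea in the paragraph above, here is a minimal sketch assuming a two-way mirror where a checksum for each block is stored separately from the data. It is not how ZFS or GPFS are actually implemented; `checksum` and `read_mirrored` are hypothetical names, and SHA-256 is simply a convenient stand-in for whatever hash a real file system uses.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Stand-in block checksum; real file systems choose their own hash."""
    return hashlib.sha256(block).hexdigest()

def read_mirrored(copies, expected):
    """Return the first copy whose checksum matches, repairing bad copies."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        # Every copy is bad: this is the point where you need the backup.
        raise IOError("all copies fail checksum; restore from backup")
    for i, c in enumerate(copies):
        if c != good:
            copies[i] = good   # overwrite the silently corrupted copy
    return good

original = b"payroll record 42"
expected = checksum(original)
copies = [b"payroll rekord 42", original]   # first mirror silently corrupted
result = read_mirrored(copies, expected)
```

A plain mirror without the stored checksum has no way to tell which of the two differing copies is the right one.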

      And then there is your backup, which ought to be in another building and not on-line as a mounted file system or you might get ransomware screwed (something that snapshots can also help with, if you notice soon enough).

      Really the arguments for SSD vs HDD that matter are cost/GB and IOPS, and smarter systems will use both to give lots of storage at good price and responsiveness.

  3. Hellcat

    Why is enterprise loving SSDs.

    Simple. Money.

    An SSD easily knocks 2 minutes (it's probably closer to 5 minutes) off our fat client bootup and login to usable desktop.

    2 minutes (1/30th hour) x 5 days x 48 weeks x average hourly rate > Cost of buying an SSD.

    And that's only over 1 year. Multiply by 3 or more years, and over 10,000 client devices and the productivity savings are massive*. Even the bean counters have to agree with that one.

    *Yes I know it probably doesn't save the company real money as the users go for a smoke/coffee/chat at Dave's desk about that holiday he's just got back from but... when they get back it's ready for them, not sitting at 'applying your settings'.
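The sum above can be written out as a small sketch. The 2 minutes, 5 days and 48 weeks come from the comment; the SSD price of 100 is a placeholder assumption (no price is given in the thread), and the function names are made up for illustration.

```python
def annual_hours_saved(minutes_per_day=2, days_per_week=5, weeks_per_year=48):
    """Hours of waiting removed per user per year."""
    return minutes_per_day * days_per_week * weeks_per_year / 60

def break_even_hourly_rate(ssd_cost, years=1, **kwargs):
    """Hourly rate above which the time saved is worth more than the SSD."""
    return ssd_cost / (annual_hours_saved(**kwargs) * years)

hours = annual_hours_saved()            # 2 * 5 * 48 / 60 hours per year
rate = break_even_hourly_rate(100.0)    # 100 is a placeholder SSD price
rate_3y = break_even_hourly_rate(100.0, years=3)
```

With those inputs the saving is 8 hours per user per year, so a drive costing 100 pays for itself within a year at any hourly rate above 12.50, and stretching it over three years lowers that threshold further.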

    1. Daniel B.

      Well...

      An SSD easily knocks 2 minutes (it's probably closer to 5 minutes) off our fat client bootup and login to usable desktop.

      Most companies where I've worked keep all PCs turned on. Desktop boot times don't matter if you aren't booting up that much.

      1. Archaon

        Re: Well...

        Most companies where I've worked keep all PCs turned on. Desktop boot times don't matter if you aren't booting up that much.

        If they're like most companies where I've worked then the actual company policy is to turn all PCs off to save power; however, staff don't do it because they're working on old junker PCs that connect to a convoluted environment (in our case in a different country) and take 20 minutes to boot up and get logged in each morning.

        When someone else is footing the electricity bill most people - myself included - slip into the habit of just leaving their machine on for convenience.

  4. Howard Hanek

    Not Fit For Production?

    I imagine that there are some data centers where spinning wheels sit in the corner just waiting for that mending job......

  5. Tom 38
    Headmaster

    RAID controllers that are not from before-time-began also know how to talk to SSDs so as not to wear a hole in them. As you move higher up the chain into enterprise SSDs, you find that the individual drives have supercapacitors and thus can do a lot of this directly at the drive level, saving further on wear

    Supercaps are a feature of enterprise SSDs, but have FA to do with wear levelling.

    All SSDs, enterprise or not, have wear levelling in their firmware - on an SSD an LBA does not refer to a fixed storage block, it refers to an internal pointer to a block lookup table, wear levelling rejigs the table according to use.
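A toy model of the lookup-table idea described above, purely for illustration: `ToyFTL` and its fields are hypothetical, and real firmware also has to deal with pages versus erase blocks, garbage collection and over-provisioning, none of which is modelled here.

```python
class ToyFTL:
    """Toy flash translation layer: LBAs map to physical blocks via a table."""

    def __init__(self, physical_blocks):
        self.table = {}                            # LBA -> physical block
        self.erase_counts = [0] * physical_blocks
        self.free = set(range(physical_blocks))
        self.store = {}                            # physical block -> data

    def write(self, lba, data):
        # Pick the least-worn free block instead of overwriting in place.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.store[target] = data
        old = self.table.get(lba)
        self.table[lba] = target                   # rejig the table
        if old is not None:
            # The previous block is erased and returned to the free pool.
            del self.store[old]
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, lba):
        return self.store[self.table[lba]]

ftl = ToyFTL(physical_blocks=4)
for i in range(8):
    ftl.write(0, f"version {i}".encode())          # hammer a single LBA
```

Even though the host rewrites the same LBA eight times, the erases end up spread across all four physical blocks rather than landing on one.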

    However, this doesn't mean that enterprise SSDs are a con - an SSD is a small computer of its own, and the quality of the firmware on the SSD greatly impacts the performance of the device.

    Consumer SSDs can do *daft* things that are merely daft when they happen in your home PC, but cost money when they happen in your server - one example is the firmware changing its allocation approach based upon free capacity in the device, so going above 70% usage causes it to stop responding until it has restructured its internal tables, which can take several minutes. This might make sense in a home PC - users expect devices to have good performance right up until completely full, so a little lockup once is acceptable.

    1. Trevor_Pott Gold badge

      "Supercaps are a feature of enterprise SSDs, but have FA to do with wear levelling."

      I apologize for not being more explicit in my article. Supercaps - and the functionality they provide - allow write buffering and write coalescing to be handled by the drive itself, rather than relying entirely on the controller or OS. Because of the supercaps, writes can be stored in buffer on the drive until there is enough to write a full block.

      SSDs without supercaps do not all do this. Some do, some don't, and there is some debate about whether or not those that do should.

      So you are partly correct: supercaps do not directly have anything to do with wear leveling. What they enable is write coalescing, which enables a more efficient form of wear leveling than would otherwise be possible.
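A minimal sketch of write coalescing as described in this comment, assuming a drive-side buffer that holds small writes until a full block's worth is pending. `CoalescingBuffer` and the block size of 4 are illustrative assumptions, not any vendor's design.

```python
class CoalescingBuffer:
    """Collect small writes until a full flash block's worth is pending."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.pending = {}        # LBA -> latest data; rewrites replace in place
        self.block_writes = 0    # full-block programs issued to flash

    def write(self, lba, data):
        self.pending[lba] = data
        if len(self.pending) >= self.block_size:
            self.flush()

    def flush(self):
        # On a real drive this is the step that must complete before power is lost.
        if self.pending:
            self.block_writes += 1
            self.pending.clear()

buf = CoalescingBuffer(block_size=4)
for lba in [0, 1, 0, 2, 1, 3, 4, 5, 6, 7]:
    buf.write(lba, b"x")
```

Ten host writes become two block programs here. Anything still sitting in `pending` lives only in volatile memory, which is the window the supercap is there to cover.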

      1. Duncan Macdonald

        Write buffering and coalescing can be done without supercaps

        For example the Samsung 840 EVO and 850 EVO SSDs use part of the array in SLC mode which allows writes to be combined reducing the write amplification for the main TLC array.

        Also a number of SSD controllers combine writes by initially buffering them in the controller RAM even without supercaps. (A major investigation into SSD power fault handling proves that this happens - see https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf for more details)

        1. Trevor_Pott Gold badge

          @Duncan Macdonald

          Regarding your comment "Write buffering and coalescing can be done without supercaps"

          I would like to refer you to my previous comment, wherein I stated the following: "SSDs without supercaps do not all do this. Some do, some don't, and there is some debate about whether or not those that do should."

          I acknowledge that write buffering and coalescing can be done without supercaps. It is the supercaps, however, that allow these operations to occur safely and thus make SSDs that implement these features fit for the enterprise.

          1. Duncan Macdonald

            Re: @Duncan Macdonald

            One question that I have never seen mentioned let alone answered - when the power to a computer fails - which happens first - loss of the power good signal on the motherboard shutting down the motherboard or loss of the 3.3v supply to the SSDs ? Also what is the time difference between the two ? Will the SSD power hold up long enough to flush pending writes ?

            1. Trevor_Pott Gold badge

              Re: @Duncan Macdonald

              The rest of the system tends to go down between one and three seconds before the SSD. Mobo power powers the CPU, RAM and PCIe cards. It gets drained essentially instantly. Enterprise SSDs are rigorously tested to be able to finish their writes before the supercap gives out. SSDs without supercaps will NOT finish writes.

              Also: not all SSDs with supercaps are the same. (Front pages versus back pages.)

              1. Duncan Macdonald

                Re: @Duncan Macdonald

                If there is one or more seconds from motherboard powerdown to SSD powerdown then a very simple algorithm on the SSD could suffice - after 100ms idle flush all pending writes. This would still allow write combining for frequent small write requests and for infrequent small write requests the write amplification does not matter as the rate of page writes is low.
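The "flush after 100ms idle" idea can be sketched as a small simulation. This is only a model of the proposal in the comment, driven by explicit timestamps rather than a real clock; `IdleFlushBuffer` and `tick` are hypothetical names, and no claim is made that any shipping SSD works this way.

```python
class IdleFlushBuffer:
    """Buffer writes; flush once no new write has arrived for idle_ms."""

    def __init__(self, idle_ms=100):
        self.idle_ms = idle_ms
        self.pending = []
        self.last_write_ms = None
        self.flushes = []          # (time_ms, number of writes combined)

    def tick(self, now_ms):
        # Called periodically by a (hypothetical) firmware timer.
        if self.pending and now_ms - self.last_write_ms >= self.idle_ms:
            self.flushes.append((now_ms, len(self.pending)))
            self.pending = []

    def write(self, now_ms, data):
        self.tick(now_ms)          # flush anything that went idle first
        self.pending.append(data)
        self.last_write_ms = now_ms

buf = IdleFlushBuffer(idle_ms=100)
for t in (0, 10, 20, 30):          # a burst of small writes gets combined
    buf.write(t, b"x")
buf.tick(130)                      # 100 ms after the last write: flush
buf.write(1000, b"y")              # a lone, infrequent write
buf.tick(1100)
```

The burst is written out as one combined flush and the lone write as another, so data never sits in the buffer for much longer than the idle window, provided power really does last that long.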

                1. Trevor_Pott Gold badge

                  Re: @Duncan Macdonald

                  But that powerdown timing isn't guaranteed. Hence why data loss occurs on consumer SSDs during power out events. Thus why supercaps are a thing.

    2. Howard Hanek
      Childcatcher

      SSDs as a system partition

      I've been using 128GB SSDs as a system partition, backing up daily to a second mechanical data drive for our desktops. I've used this configuration numerous times without problems or data loss by simply not storing data on the SSD.

      1. Trevor_Pott Gold badge

        Re: SSDs as a system partition

        Again, going to have to call that pretty niche. Your average punter wouldn't know how and your average enterprise admin wouldn't bother. Few folks have the knowhow and the time to do what you do...not that it isn't a good idea. :)

  6. W Donelson

    a 3/4 page article stretched to two pages....

  7. Nick Dyer

    Workload profiles also matter

    Nice article Trevor. It's also worth mentioning that whilst SSDs/Flash are/is fantastic, it's not the answer in certain workloads. Random IO of course screams with flash and is well documented... but running sequential workloads (especially reads) will typically reap very disappointing results outside of synthetic benchmarks.

    This is by far the most common misconception of flash in the enterprise today, and is why careful consideration needs to be taken when designing storage deployments that will be fit-for-purpose for their intended workloads. The notion of an all-flash datacenter for all workloads is a marketing step too far....for now.

    Disclaimer - I work for Nimble Storage, but this subject is not a marketing message for our tech.

    1. Gerhard Mack

      Re: Workload profiles also matter

      That hasn't been true for several years now. My SSD based setups easily outrun my spinning disk by a very noticeable amount and even sequential writes tend to do better. Did you miss the point where flash has been moving to PCIe because the SATA/SAS ports were too slow?

      1. Archaon

        Re: Workload profiles also matter

        That hasn't been true for several years now. My SSD based setups easily outrun my spinning disk by a very noticeable amount and even sequential writes tend to do better. Did you miss the point where flash has been moving to PCIe because the SATA/SAS ports were too slow?

        I would assume that the bod from Nimble was talking about storage arrays.

        Yes there is potential for an SSD to outstrip a spinning disk for sequential tasks; however on a storage array that's running mostly sequential workloads with a high capacity requirement it's massively cheaper to go down the disk route, and because the workload is sequential rather than random the benefits of SSD are considerably lower.

  8. Anonymous Coward
    Anonymous Coward

    There have been legitimate reasons...

    ...for consumers to NOT buy SSDs. Those reasons are dwindling thankfully, but they have been very real. Lost data and bricked SSDs are very common. In addition we had almost the entire consumer SSD industry selling defective SSDs for a while due to them all using one particular brand of controller that had flaws in it. If you suffered this experience you know it caused Hell for a lot of people especially small Biz.

    Even now with new SSDs we still see some controller issues, mysterious capacity loss and even data loss. Contrary to popular belief, regular back ups won't prevent a serious system failure or lost data. It might mitigate how much data is lost, but that data could be very critical to mainstream users who do not routinely use data backup. The fact of the matter is that SSDs are just starting to reach the reliability of quality mechanical drives. The inexpensive mechanical drives have always been rubbish so using them as a reference point, as SSD enthusiasts and makers like to do, is pointless.

    Hopefully PCIe SSDs will prove far more reliable and much faster than the currently popular SATA SSDs.

  9. Solmyr ibn Wali Barad

    "Flash naysayers also tend to leave out the part where traditional magnetic disks have mechanical components that break down and flash doesn't."

    /stares grimly at a pile of SSDs that were killed by a leaking supercap/

    Oh. Carry on.

    1. Trevor_Pott Gold badge

      I would qualify that as electrochemical rather than mechanical. Any of a squillion electronic bits - from capacitors to volt regs - can go on either a magnetic disk or an SSD. Outside of the electronics driving the storage components themselves, SSDs have write life due to being a solid state medium, and magnetics have mechanical bits that can seize, are affected by vibration, air pressure differences, etc.

  10. Stevie

    Bah!

    Interesting overview.

    My own experience with various overpriced vendor-specific spinners is that the mechanical components, the moving parts, are the most reliable part of a modern disc drive. By far the majority of failures I see are in the support electronics.

    I speculate, after talking to the pater who was in the electronics biz for decades, that the problem is all-but dry joints in the circuit boards from new. Surface mounts should have made these a thing of the past, but I've had a board in my trembling hand where the failure was honest-to-gosh rust under a terminal that wasn't soldered at all, just held down by pressure maintained by the other connection that was soldered, so I don't know.

  11. Alan Sharkey

    I'd like people to name names. For example, I have a Samsung Extreme 250GB SSD that has just failed on me. It wasn't very old but it was my O/S disk. I didn't lose any data but I did have to re-install all my programs. I also have a Crucial 512GB SSD in my laptop that, so far, has been very reliable and fast.

    So, what's good and what's bad?

    Alan

    1. Gerhard Mack

      I'm not sure that is helpful. Some will be bad manufacturers but some will be early model bugs and even old cases where SSD was new as a product (I recall a high failure rate when TRIM was new) and some will just be dumb luck.

      Point of example: my 5 year old 32 GB OCZ drive just died 3 months ago and the failure rate of that model was famously high.

    2. Anonymous Coward
      Anonymous Coward

      @Alan Sharkey: As with all things IT - it depends, and as you hang around here you know that already! Workload, quality of drivers, etc etc and of course the inevitable "Act of $DEITY".

      I'm late to the party because although I've been impressed with other people's use - they have been bloody expensive and I totally agree with TP's assessment of those who stick low grade flash into a production server with an incompatible workload and wonder why it dies.

      I removed one of the two 1TB Tosh spinning rusts out of my laptop and popped in a Crucial_CT512MX100SSD1. Creative use of a sysrescuecd, cp -a, gparted and a horrific fstab got me back up and running.

      So far "erase count" is embarrassingly low after four months. This is on a laptop that runs a shit load of stuff (including MariaDB, PostgreSQL, Apache et al) and is installed with a compiler - something like 1GB of source gets converted into the latest updates in a monthly session (mmm Gentoo). I'll start moving stuff back to the SSD and see what effect it has with time.

      That fstab in full with truncated UUIDs:

      # <fs> <mountpoint> <type> <opts> <dump/pass>

      UUID="f1" /boot ext2 relatime,discard 1 2

      UUID="16" / ext4 relatime,discard 0 1

      UUID="0b" none swap sw 0 0

      UUID="1f" /var/lib/libvirt ext4 defaults,relatime 0 0

      UUID="14" /var/lib/docker btrfs defaults,relatime 0 0

      UUID="50" /var/log ext4 defaults,relatime 0 0

      UUID="f6" /portage ext4 defaults,relatime 0 0

    3. Trevor_Pott Gold badge

      Micron enterprise SSDs have been amazing to me. Micron M500DC? ****ing spectacular drive. Micron P420m? Life changing.

      Also up there are the Intel drives. 3500, 3700, even the 520. Anything out of that Micron/Intel fab has been extremely good to me.

      To contrast, OCZ is shit, covered in shit, with added shit, layered in shit, all wrapped up in a shit sandwich. The rest all fall somewhere between, with the consumer stuff generally being shite and the enterprise stuff being pretty passable.

      1. Anonymous Coward
        Anonymous Coward

        I'm no expert in who's who here but my Crucial SSD is described as a Crucial/Micron by GSmartControl so I'm guessing that Crucial is the consumer brand and Micron is the enterprise brand. I bought it based on a Toms Hardware review, there was a Samsung jobbie that was newly released at the time that claimed far superior performance for about 2x price.

        So far I'm really happy and I really have thrashed it. Anecdotally, I'm seeing a boot time reduced from say 2 mins to desktop to around 30 secs (KDE doesn't start quickly for anyone). However (cold) boot to login screen is now a few seconds, quick enough that I happily reboot again instead of quoting silly uptimes to Windoze users 8)

        So much for the consumer stuff. I'm still evaluating SSD for enterprise before I start deploying it in anger. I have my own views here but so far they pretty much jibe with TPs - I'm a little conservative ...

        Cheers

        Jon

        1. Trevor_Pott Gold badge

          Crucial is the consumer brand, Micron the enterprise brand. Crucial has a cult following thanks to their RAM. Micron has traditionally sold as an OEM to others who rebrand. That's changing, and Micron is selling more and more under the Micron brand.

          But yes, overall, Crucial = consumer, Micron = enterprise. Easier than remembering which model lines are which with Intel! :)

  12. BornToWin

    Where's the value?

    I doubt the cost/benefit factor is even remotely reasonable on larger SSDs for enterprise vs. conventional HDDs. I have SCSI drives that have been running 24/7 for over ten years with zero failure so it's hard to claim these drives are not reliable compared to SSDs.

    1. Anonymous Coward
      Anonymous Coward

      Re: Where's the value?

      It's tools for the job. You are no doubt familiar with when you should be deploying SAS or SATA and when to use a particular RAID level or not. Now we have another weapon in our arsenal with a set of characteristics. You sit down and study it, prod it and poke at it, play with it (ooo er) and listen to others' advice which you filter through your own experience. You monitor it and compare statistics with other storage devices. You read stuff on t'intertubes and you keep developing your knowledge on the subject.

      Then you deploy it appropriately (*). I'm a consultant, me. You?

      Oh, sorry: the value is in the benefit - calculate it!

      Cheers

      Jon

      (*) Err, this comment and real life might not completely match up

    2. Archaon

      Re: Where's the value?

      I doubt the cost/benefit factor is even remotely reasonable on larger SSDs for enterprise vs. conventional HDDs.

      It depends on what metric you're using. Cost per GB/TB is high on SSD, that's undeniable. On the other hand cost per IOPS is considerably lower on SSD than spinning disk.

      Say for example you need 20,000 IOPS from a storage array for a VDI cluster. The main server/storage environment (domain, email, databases, whatever) is all set up on an existing virtualised cluster backed off to an existing SAN (very common) so there is not a requirement for hosting virtual servers, file data etc, so in effect the array is just storing images for the VDI sessions.

      The majority of users will likely be run off of a small number of gold images so in terms of capacity the storage requirement would typically be quite small.

      To exceed 20,000 IOPS out of spinning disk (assuming 90% read/10% write) you need approximately 120 15k 2.5" SAS drives. To run that you need a storage array, which will likely be in a 2U/24-bay format (say £6k for a Powervault or MSA level machine). You'll then need 4 additional 2U/24-bay shelves for said storage array (say £2k each/£8k total). You then need to whack on 120 disks at £180 a pop (£22k). Now assuming you're a good little customer and buy support you not only need to purchase support for the head array but also the 4 drive shelves (say £9k for half decent cover over 3 years).

      Total for that is £45k and you've got a 10U monster requiring 10 power inputs, running 120x drives that not only use around 7-8W each but also kick out a considerable amount of heat. So it's an expensive unit, it costs a lot to power, and it costs a lot to turn your air conditioning up to cope with it.

      If we were to be fairly conservative and say you needed 8 SSDs to hit 20,000 IOPS, you would obviously still need the array head (£6k) and support (£3k as there are no shelves). And we'll go for, say, half decent mainstream endurance 400GB drives (it's a read intensive workload so there's not a huge need for the high endurance drives). 8 of those would come out to say £15k. So the total for that system is £24k - over £20k cheaper up-front, using considerably less power, less rack space and less cooling.

      Yeah you end up with ~1.5TB not ~7.5TB but in this scenario you don't need the capacity, just the performance.
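The two quotes above can be totted up in a short sketch. All prices are the round figures from this comment; the per-SSD price of £1,875 is simply the quoted £15k divided by 8, and `array_cost` is a made-up helper.

```python
def array_cost(head, shelves, shelf_price, drives, drive_price, support):
    """Up-front cost in GBP: head unit + shelves + drives + 3-year support."""
    return head + shelves * shelf_price + drives * drive_price + support

# 120 x 15k SAS drives across a head unit and 4 extra shelves.
hdd = array_cost(head=6_000, shelves=4, shelf_price=2_000,
                 drives=120, drive_price=180, support=9_000)

# 8 SSDs in the head unit alone; 1,875 each is the quoted 15k / 8.
ssd = array_cost(head=6_000, shelves=0, shelf_price=2_000,
                 drives=8, drive_price=1_875, support=3_000)

saving = hdd - ssd
```

Unrounded, the spinning-disk build comes to £44,600 (the comment rounds the drives up to £22k, giving £45k) against £24,000 for flash, a difference of £20,600 before power and cooling are counted.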

      And ok, I know you said high capacity drives, so for the sake of argument if I run the sums with 10x 1.6TB SSDs it comes out at £44k. That's marginally cheaper up-front than the spinning rust, massively cheaper to run and also offers ~7.5TB storage.

      And that's at a basic level (HP MSA or similar as I said). If you were to be a really good customer and buy a bigger, shinier storage array like a 3PAR you would also have to pay support and licensing costs per disk. Would you rather pay that bill for 120 drives (no word of a lie that could easily hit £50-100k just for the basic functionality) or 10?

      I know, it's an extreme use-case - but it's a perfect example.

    3. nijam Silver badge

      Re: Where's the value?

      For rotary rust, it's the 24/7 that makes it reliable. Spinning up/down (and the power changes to do so) is what kills them.

  13. Alan Brown Silver badge

    The real reason

    Flash is cheaper than spinners.

    Yes really.

    For any given task set where you specified spinning drives you can do it cheaper overall with flash and usually for a longer period of time between maintenance windows.

    This is _especially_ true in high speed applications where a couple of high performance flash drives might end up replacing 20 shortstroked spinning drives all hanging off a hellaciously expensive raid controller.

    About the only area where flash is still more expensive than spinning media is bulk storage - and the fact that just about all enterprise flash comes with warranties at least twice that of their equivalent spinning media gives a good indication of manufacturer confidence.

    I've replaced more 1TB Seagate enterprise spinning drives (Constellations are best avoided) than I care to think about, but 500GB flash drives installed at the same time just keep on trucking. By the time you factor in downtime, access speed and labour costs the inescapable conclusion is that the extra money upfront is worth it - especially as the reliability of spinning media is getting worse every year.

  14. martinusher Silver badge

    It's because they're small

    All our new computers come with SSDs. The reason for this isn't that they're big but rather you can still get smallish disks. Our IT department would prefer us to have no disks at all but because that's impractical they equip the desktops with the smallest disks they can get away with. On my old machine it was 80GBytes, the new system with the SSD is about 120GBytes. Try buying a spinning disk that's smaller than 320GBytes these days...

    1. Sixtysix

      Re: It's because they're small

      Our corporate build is deployed onto a 24GB partition... the rest of the drive is virgin territory, untouched since it left the factory (and yes, that means over 460GB "wasted" on most of our desktops!)

  15. Anonymous Coward
    Anonymous Coward

    Another day...

    ...another SSD fix. Like this is something new - NOT.

    http://www.fudzilla.com/news/memory/37539-new-samsung-840-evo-firmware-fix-coming-later-this-month

  16. SnapperHead
    Happy

    Trevor - nice job on this article. Avoided the hype, focused on reality.
