NVMe SSDs tormented for months in some kind of sick review game

NVM Express (NVMe) is the next-generation specification for accessing non-volatile memory such as flash; traditional technologies such as SAS and SATA are just too slow. To demonstrate how much of a difference NVMe makes, Micron has provided 12 of its 9100 NVMe flash drives, 800GB each, in the HHHL (standard PCIe card) format …

  1. lansalot

    cratered during file copy?

    How were you copying, because if it wasn't unbuffered then no wonder it died...

    1. lansalot

      Re: cratered during file copy?

      Ignore the "how were you copying" - screenshots (that I couldn't see too well on mobile) clearly show Windows Explorer.

      That's buffered IO and it absolutely WILL bring a server to its knees. Next time, watch the memory tab go through the roof; when it approaches maximum, that's when your server starts dying. If you're using Windows Explorer to copy files for benchmarking, you're doing it wrong - the amount of memory in your server is taking up the slack, and your results are therefore invalid.

      Next time, use "xcopy /j".
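
      A minimal unbuffered copy along those lines might look like this (a hedged sketch; the paths are placeholders):

      rem /E copies subdirectories (including empty ones); /J requests unbuffered I/O
      xcopy D:\testdata E:\target /e /j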

      1. Charles 9

        Re: cratered during file copy?

        What about robocopy, which is designed for bulk jobs and also supports unbuffered copy?
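
        Something like this, perhaps (a hedged sketch; paths are placeholders, and /J and /MT exist in modern robocopy builds):

        rem /E recurse, /J unbuffered I/O, /MT:16 use sixteen copy threads
        robocopy D:\testdata E:\target /e /j /mt:16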

    2. Destroy All Monsters Silver badge
      Paris Hilton

      Re: cratered during file copy?

      > if it wasn't unbuffered then no wonder it died...

      One would hope to imagine that running low on I/O buffers would not mean "Buckle your seatbelt, Dorothy, 'cause Kansas is going bye-bye!"

      1. Trevor_Pott Gold badge

        Re: cratered during file copy?

        Hey guys, I did buffered, I did unbuffered, I did every kind of copy I could imagine. I tried multiple operating systems; I even used LiveCDs in an attempt to remove the local disks and the SATA controllers from any use whatsoever. I tried every conceivable kind of anything I could imagine, and I regularly ended up with the SSDs faster than any of the operating systems in play could talk to before running out of CPU.

        1. Danny 14

          Re: cratered during file copy?

          Interesting. I remember when I got our database SAN, a Dell MD3200 with a mixture of SSD and 15K spinning rust. It runs over quad-HBA SAS to a PowerEdge 720 with dual HBAs (there are actually two of them in a two-node cluster). It had 192GB of RAM too, so well under your spec.

          Obviously the first thing you do is bugger the redundancy and see how fast you can MPIO them: the SSDs were RAID 0 on a single LUN, MPIO across the lot, and any sort of redundancy on the SAN turned off.

          W2K12 (not R2) had no problems with Windows Explorer or robocopy copying big fat ISOs or 100Ks (not millions, I accept) of website files. I got 2.5GB/s out of it copying email datastores (the largest files I could find). Nothing borked.

          1. Danny 14

            Re: cratered during file copy?

            After edit time: perhaps the MPIO drivers were non-certified? Shadow file copy disabled?

          2. Trevor_Pott Gold badge

            Re: cratered during file copy?

            Windows rarely gives me issues until I hit 2.5M files. Around 5M you really start to notice it. By 10M it usually starts behaving very oddly.

        2. Alan Brown Silver badge

          Re: cratered during file copy?

          "I regularly ended up with the SSDs faster than any of the operating systems in play could talk to before running out of CPU."

          That was my experience on a far smaller scale too. These babies are _FAST_, which is good news when messing around with databases and spooling ~100 simultaneous backups (Bacula) across the network.

          I was using an Intel NVMe HHHL card, but it was clear the card was outrunning the systems for everything real-world I wanted it to do.

      2. Alan Brown Silver badge

        Re: cratered during file copy?

        "One would hope to imagine that..."

        One would, but that's Windows.

  2. Pascal Monett Silver badge

    "a software RAID wall"

    Hmm, seems like the old "640K is enough for anybody" is still lurking in the back, somewhere.

    Might it have something to do with CRC? Like, Windows is expecting a nice little 100 files a second, gets 1,000,000 and goes "WHAT!!", then kernel panics and beats a full retreat because the code was written with a loop still controlled by a 16-bit integer.

  3. m0rt

    "For fun, I tried creating a Windows RAID across the 10 iSCSI devices presented to Neon. "

    Mr Pott, I salute you for your dedication to your craft. :)

    1. Roo
      Pint

      "Mr Pott, I salute you for your dedication to your craft. :)"

      I second that, fair play Trevor. :)

    2. Anonymous Coward
      Pint

      Worthy of the Order of Torquemada Medal for most bat-shit insane testing practices in the biz.

      [And I thought I was alone in being this extreme.]

  4. G2
    Facepalm

    wrong image?

    https://regmedia.co.uk/2016/06/24/nvme_cards.jpg

    the ones in the picture look awfully close to RAM modules to me... or those NVMe cards have shrunk from their supposed HHHL format?

    1. Nick Ryan Silver badge

      Re: wrong image?

      They do look like something I'd have serious problems connecting to the PCIe bus in a system without a little help.

      Maybe they meant this instead? https://regmedia.co.uk/2016/04/13/micron_9100s.jpg

  5. Cem Ayin
    Meh

    "Drives"

    With non-volatile storage devices such as these available, it's a pity the concept of a single-level store (as implemented in MULTICS [now defunct] or AS/400 [now IBM i]) never caught on in the mass market. These devices would neatly fit in the storage hierarchy between a DRAM write cache (to ease the wear on the flash storage) and the remaining, higher-latency stuff. Am I the only one who thinks it's really a waste having to use these devices as "drives", for lack of a software abstraction able to leverage their power?

    1. Destroy All Monsters Silver badge

      Re: "Drives"

      From "Adapting to Thrive in a New Economy of Memory Abundance", Kirk M. Bresniker, Sharad Singhal, and R. Stanley Williams, Hewlett Packard Labs - IEEE Computer 2015/12, pp 44-53:

      Simultaneous adoption of massive NVM pools unifying storage and main memory, centimeter-scaling photonics, application-specific computation acceleration, and relegation of I/O to peripheral interfaces could indicate a fundamental shift in information processing that harkens back to Turing. With today’s emphasis on cheap computation, scarce volatile memory, and abundant nonvolatile I/O storage, systems must constantly manage data flow into and out of memory. The application code provides the translation mechanisms between the efficient, dense in-memory representation and the serialized, buffered persistent or communication representation, while the OS maintains application state and mediates hardware resources. Without the state provided by the OS and application code, the in-memory representations are meaningless. Data must be computed to be useful, but what happens when a vast in-memory representation lives much longer than the now ephemeral computation? Data might need to carry its own metadata and be packaged with its own applications and OSs. As with Turing’s universal machine, the heart of the new machine will be memory, with demonstrably correct access to data in perpetuity. Given that this concept of computing could be the catalyst for many profound insights, we have christened it Memory-­Driven Computing. Having emancipated memory from computation and made it the centerpiece of computing, how do we guarantee its correctness? Augmenting the interfaces to memory with a state-change mechanism based on a functional language could provide a formally provable evolution of data without side effects as well as a self-describing type system to guarantee continuity of data interpretation. Adding strong cryptography and a capabilities-based permission system could give future generations the confidence that our information legacy is trustworthy.

      [Image: the future memory hierarchy]

      1. Cem Ayin

        Re: "Drives"

        Yes, the architectural concept behind the project dubbed "The Machine" by HP management is certainly interesting, but IIRC it all hinges on the availability of technically and commercially viable memristor memory (it's supposed to be built not around a single-level /store/ but really a flat, single-level, persistent main memory, as in the diagram linked to in your post), and it remains to be seen if HP, currently a company very much in distress, still has the power to make this a reality.

        The problem, as I (with my limited competence in the field) see it, is twofold: 1. on the level of electrical engineering, providing the chips for a single-level persistent memory with good-enough yields at a competitive price point; and 2. providing a software abstraction and SW development model that gives enough benefits to make abandoning legacy code attractive and commercially viable.

        Neither is a trivial task, to say the least...

        If they do manage (and I hope they do), interesting times could be ahead indeed.

  6. David Roberts

    Hardware RAID couldn't compete?

    Time for better hardware RAID?

    The message seemed to be that the drives were just too fast for software RAID built into any common OS.

    1. phuzz Silver badge

      Re: Hardware RAID couldn't compete?

      Trevor couldn't hardware RAID them because the drives were on individual PCIe cards.

      So unless someone builds a RAID controller that has PCIe slots on it, software RAID is the only way.

      (Software RAID is less of an issue than normal, because the CPU has a decent amount of bandwidth to the storage, and plenty of oomph)
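
      For what it's worth, a minimal Linux sketch of that software-only approach might look like the following (device names are assumptions; check what lsblk reports first):

      # stripe twelve NVMe namespaces into one md device, then format and mount it
      sudo mdadm --create /dev/md0 --level=0 --raid-devices=12 /dev/nvme{0..11}n1
      sudo mkfs.xfs /dev/md0
      sudo mount -o noatime /dev/md0 /mnt/fast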

    2. Sil

      Re: Hardware RAID couldn't compete?

      It's probably true of hardware RAID too.

      Driving a dozen high-performance NVMe drives in RAID 5 or 6 would require some serious computing and I/O capability.

    3. Lusty

      Re: Hardware RAID couldn't compete?

      When you hit the limitations of PCIe v3 on the mobo and CPU, adding a hardware abstraction will only make it worse. There are ways to improve the situation: implementing PCIe v4 would pretty much double throughput; adding lanes might improve throughput; and choosing your motherboard, CPU and design very carefully might improve things. For instance, a single socket would probably make design easier, so that Windows doesn't accidentally split IO between two PCIe buses with multiple cards. Also make certain that all PCIe lanes are actually real PCIe lanes; there are various ways to add more lanes at the cost of performance.

      The tests looked good, and he rightly said that at this level you're just moving bottlenecks. I once got 1s latency out of a Violin by increasing queues to ramp up IOPS :)
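
      If you want to check whether a card actually negotiated the lanes you paid for, something like this should do it on Linux (a hedged sketch; the bus address is an assumption - find yours with lspci | grep -i nvme):

      # LnkCap is what the link is capable of; LnkSta is what was actually negotiated
      sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'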

      1. Alan Brown Silver badge

        software raid and singlethreaded IO

        With Linux: make sure you use the multi-queue block layer (blk-mq) when dealing with fast SSDs, otherwise all IO is single-threaded (this is different to cfq/noop/deadline - those are all single-threaded schedulers) and you'll max out (as Trevor discovered).

        Which in practice means "add scsi_mod.use_blk_mq=1 to your grub boot options"
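
        Something like this, assuming a Debian/Ubuntu-style GRUB setup:

        # in /etc/default/grub, append the parameter to the kernel command line:
        GRUB_CMDLINE_LINUX_DEFAULT="quiet splash scsi_mod.use_blk_mq=1"
        # then regenerate the config and reboot:
        sudo update-grub && sudo reboot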

        It does make a difference and it'd be interesting to see if Trevor can quantify it.

    4. Alan Brown Silver badge

      Re: Hardware RAID couldn't compete?

      "Time for better hardware RAID?"

      Software RAID has been eating hardware RAID's lunch for years. The last time I bothered with HW RAID cards was five years ago (£1,200 apiece) - running SW RAID on the same system was actually faster and had lower latency (SATA SSDs). The only advantage of HW RAID was battery-backed write caching, and once you have SSDs in play that advantage is mostly negated.

      You can get PCIe expansion buses, but the problem is that the bus itself becomes the bottleneck before very long.

      1. John 104

        Re: Hardware RAID couldn't compete?

        Software RAID has its place; we use it here at work. But if your host operating system eats it, you've lost your storage...

        1. Alan Brown Silver badge

          Re: Hardware RAID couldn't compete?

          "But if your host operating system eats it"

          Then you can usually rescue it with a CD boot. (Dunno about Windows, but definitely true for BSD and Linux.)

          If your hardware card goes phut, you usually need to acquire an exact replacement to rescue your RAID.

  7. Anonymous Coward
    Anonymous Coward

    ZFS etc

    What if the drives were instead set up as a ZFS pool (or whatever they are called), or any other suitable RAID-like array other than Linux software RAID?
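
    Something like a plain striped pool, say (a hedged sketch; the pool name and device names are made up):

    # RAID-0-like zpool across three of the cards; ashift=12 assumes 4K sectors
    sudo zpool create -o ashift=12 nvmepool /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
    sudo zfs set atime=off nvmepool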

    1. Trevor_Pott Gold badge

      Re: ZFS etc

      Tried ZFS. Same problems as any other software RAID.

  8. allthecoolshortnamesweretaken

    Quick poll, fellow commentards:

    "Trevor the Tormentor"

    or

    "Trevor the Torturer"

    ?

    1. GrumpenKraut
      Windows

      "Trevor, the guy owning a machine with 384GB of RAM". Me: bloody envious.

      1. Daggerchild Silver badge

        "Trevor, Terrifier of Transistors"?

      2. Trevor_Pott Gold badge

        Thank Micron! They supplied the RAM for this little endeavour. I will be reviewing it separately. The full lab is here: http://www.trevorpott.com/thelab/

  9. Alistair
    Pint

    NVMe devices

    I'm old enough that calling them (or for that matter SSDs, SD cards etc) "drives" just doesn't quite feel correct.

    @ ATCSNWT:

    This was an example of Pottorture.

    Trevor:

    Given a distributed computing environment, and these being available in the 2G range, I'm actually considering one per node as scratchpad tmp space - we've an ETL that runs and generates up to 200,000 temporary files in a 100GB LV, which *might* just be precipitating internal timeouts. If we can serialize that, we'd be able to keep it under 2G of data at a time. These just might be fast enough for that. Thoughts?

    Beer for playing with toys.
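
    For the scratchpad idea, the per-node setup could be as simple as the following (a hedged sketch; the device name and mount point are assumptions):

    # dedicate one card per node as ETL scratch space
    sudo mkfs.xfs -f /dev/nvme0n1
    sudo mkdir -p /scratch
    sudo mount -o noatime,discard /dev/nvme0n1 /scratch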

    1. Trevor_Pott Gold badge

      Re: NVMe devices

      SHOULD work... but I'd need to test to confirm. Ping me by e-mail and I'll see about getting you remote access to a node with some cards in; you can then test before you set about buying.

  10. Ellipsis

    When the number of files being read and written goes above a few tens of thousands, Resource Monitor will tie up a core just managing and displaying the list box in the “Disk Activity” section of the UI. Switch to the CPU tab (which doesn’t display that list) and watch it become responsive again.

    As ever, the act of observing affects the outcome…

  11. Jarndyce
    Devil

    Software RAID impact - ZFS?

    It would have been nice to see if ZFS could have made any impact on the utilization (Software RAID wise)

    1. phuzz Silver badge

      Re: Software RAID impact - ZFS?

      Or Storage Spaces come to that.

      1. Trevor_Pott Gold badge

        Re: Software RAID impact - ZFS?

        Both ZFS and Storage Spaces were unable to cope with these units any better than regular Windows or Linux software RAID. Well, actually, that's a lie: they coped "better", but not "better enough". The software "lash it all together" solutions were clearly bottlenecks in all cases.

  12. John Robson Silver badge

    So basically...

    These are fast enough to move the bottleneck somewhere else for most loads.

    To suggest that other bottlenecks will appear is hardly groundbreaking. That's what a bottleneck is: the thing you concentrate on until something else takes its place...

  13. Brian Miller

    Use a benchmark or compile a large project

    Want to know disk performance? Either use a disk benchmark program, or else compile a large project. For instance, see how fast Gentoo will compile itself on your system. Copying a bunch of files using Windows Explorer is ... just not good.

    1. Trevor_Pott Gold badge

      Re: Use a benchmark or compile a large project

      If you had read the review you would have learned that I tried rather a lot of things. For months. Here are the benchmarks I've used.

      Databases

      Hammerora http://hammerora.sourceforge.net/  Microsoft SQL, MySQL, Postgres, OracleDB (if you have it).

      OStress http://blogs.msdn.com/b/psssql/archive/2014/04/24/version-9-04-0013-of-the-rml-utilities-for-x86-and-x64-has-been-released-to-the-download-center.aspx Microsoft SQL, as part of the SQL RML Utilities.

      SQLIO http://www.microsoft.com/en-us/download/details.aspx?id=20163  This writes all zeros. It tells us a very specific thing about how "zero blocks" are dealt with. It's tricky. Follow http://www.mssqltips.com/sqlservertip/2127/benchmarking-sql-server-io-with-sqlio/ and http://www.brentozar.com/archive/2008/09/finding-your-san-bottlenecks-with-sqlio/

      SQLIOSIM https://support.microsoft.com/en-us/kb/231619?wa=wsignin1.0 This is to test stability, not performance. https://www.simple-talk.com/sql/database-administration/the-sql-server-sqliosim-utility/

      General disk tests

      FIO http://freecode.com/projects/fio Read http://support.sas.com/resources/papers/proceedings13/479-2013.pdf and all will be revealed.
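
      A representative fio invocation, for the curious (a hedged sketch; every parameter here is illustrative, not a record of what was actually run, and the device name is an assumption):

      # 4K random reads, direct I/O, 8 jobs at queue depth 32, for 60 seconds
      fio --name=randread --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio --rw=randread --bs=4k --iodepth=32 --numjobs=8 --runtime=60 --time_based --group_reporting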

      Iometer http://www.iometer.org/  Various configurations

      Exchange

      Jetstress 2013 http://www.microsoft.com/en-ca/download/details.aspx?id=36849

      Jetstress 2010 http://www.microsoft.com/en-ca/download/details.aspx?id=4167

      Background work tests

      Using Iometer, determine your peak global IOPS as per the above test. Load the system to 25%, 33%, 50%, and 75% of IOPS capacity. Now run various common administrative tasks and time them.

      1) Full VM backup using VM backup software

      2) Snapshot

      3) Clone

      4) Creation of VM from template

      5) SQLIO test runs on a single VM (testing mixed workloads!)

      6) Exchange Jetstress (testing mixed workloads!)

      7) SQLIO and Exchange Jetstress (testing mixed workloads!)

      1. lansalot

        Re: Use a benchmark or compile a large project

        May be worth trying diskspd as well?

        http://www.happysysadm.com/2016/06/measuring-iops-part-2-diskspd.html
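
        Something along these lines, perhaps (a hedged sketch; the numbers and path are illustrative):

        rem 10GB test file, 60s, 30% writes, 4K random, 32 outstanding I/Os, 8 threads, caching off
        diskspd.exe -c10G -d60 -r -w30 -b4K -o32 -t8 -Sh D:\testfile.dat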

        I did see an article recently about some kit that could stress disks magnificently, but can't find it now...

  14. gypsythief

    How to _really_ stress these drives...

    10 Echo "Hello" > C:\hello.txt

    20 GOTO 10

    That would've taught 'em a lesson!

    1. Charles 9

      Re: How to _really_ stress these drives...

      I think what you mean is (using Linux notation):

      cat /dev/urandom > randomfile

      I tried finding the equivalent for a DOS/Windows Command Prompt but found any analogue to be nontrivial.
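
      A rough PowerShell approximation (not plain cmd) might look like this - a hedged sketch, with the target path made up:

      # stream pseudo-random bytes to a file until stopped or the disk fills
      $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
      $buf = New-Object byte[] 1MB
      $fs  = [System.IO.File]::OpenWrite('D:\randomfile.bin')
      try { while ($true) { $rng.GetBytes($buf); $fs.Write($buf, 0, $buf.Length) } }
      finally { $fs.Dispose() }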

  15. fnj

    Endurance???

    What the heck is the endurance? Endurance is the sore spot of flash. Anybody with a clue is more concerned with the endurance than with any other spec.

    1. Nick Ryan Silver badge

      Re: Endurance???

      ...and there I was thinking that endurance really isn't that much of an issue with modern SSD technology. The things generally have enough wear space and good enough cycling (can't remember the actual term) algorithms that they will last for 5 years even when treated horribly. That's more than we'd usually trust spinning rust for.

  16. GeorgeWHerbert

    Endurance

    I always want to know performance but, almost more importantly, the endurance or lifespan of the SSDs. I keep melting them to slag and need to know replacement intervals and redundancy-planning concerns.

    You obviously went to some effort to beat these up, but didn't report total lifetime write activity. Any comment on how far you got, and on whether you can just sit and cycle one or more to destruction to satisfy my hunger for magic smoke (and the budget and ops planning for the bosses for whom I might have to buy a bunch of them...)?

    Thanks.

    1. Trevor_Pott Gold badge

      Re: Endurance

      I abused the piss out of those things for two months and they have since moved into regular lab use. Josh has two in his video editing desktop that he abuses all day long. I have them scattered about the lab in every server I can find. I have yet to see them go below 99% lifetime, according to the diagnostics.

      1. GeorgeWHerbert

        Re: Endurance

        The magic here is the front side - bus, controller and buffer channels - not flash device physics. They're going to die at some endurance point. The last time I pounded on devices for two months I ate 65% or more of rated life and indicated wear. Do you have SMART data?...

  17. Elden

    More Details Please

    I am not sure what "Windows RAID" means...

    Were you using Storage Spaces or legacy Dynamic Disks?

    Were you using FAT32 or NTFS or ReFS file system?

    Were you using a Stripe or Mirror or Parity for resiliency?

    1. Trevor_Pott Gold badge

      Re: More Details Please

      I have used both Storage Spaces and the dynamic-disk-based RAID. NTFS is the filesystem I tested most, but I did run a few ReFS tests. (ReFS can handle a few million more files, but honestly the difference, at least in Server 2012 R2, isn't that great.) I was using a stripe/RAID 0 rather than parity or mirror.
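
      For reference, a simple (striped) Storage Spaces layout like the one described can be stood up roughly like this (a hedged sketch; the pool and disk names are made up):

      # pool every poolable disk, then carve a simple (striped) space across all 12 columns
      $disks = Get-PhysicalDisk -CanPool $true
      New-StoragePool -FriendlyName "NVMePool" -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $disks
      New-VirtualDisk -StoragePoolFriendlyName "NVMePool" -FriendlyName "Stripe" -ResiliencySettingName Simple -UseMaximumSize -NumberOfColumns 12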

  18. msroadkill

    For a very different NVMe milieu, howzabout local RAID 0 NVMe storage on a GPU?

    http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-fiji-with-m2-ssds-onboard
