A few thoughts
Hi everybody,
Disclaimer: My name is Nick Triantos and I am a Nimble Storage employee.
Lots of interesting comments in the thread, some with good intent, so I will dignify those.
For those not familiar with the Nimble Architecture...
CASL is a Log Structured Filesystem. Two of the basic principles of Log Structured File Systems are that you only write in free space, and that space that has already been written to can't be overwritten until it has been garbage collected. If this sounds familiar, it's because the same principle also exists in Flash.
https://lwn.net/Articles/353411/
In fact, all SSDs under the hood use a form of a Log Structured File System (link above). Why? Because, unlike disk, reads and writes in Flash are very asymmetric. It takes much more time to erase and write a flash cell than it takes to read one. Flash also has a finite lifetime, so how storage writes to it becomes very important.
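To make the two principles concrete, here is a minimal sketch of an append-only log in Python. It is a toy illustration only, not CASL or any SSD firmware: the class name, the slot model and the method names are all made up for this example.

```python
class LogStructuredStore:
    """Toy append-only log: writes go to free space only; stale data waits for GC."""

    def __init__(self, capacity):
        self.log = [None] * capacity   # physical slots; None means free
        self.head = 0                  # next free slot to write into
        self.index = {}                # logical key -> slot holding the live copy
        self.stale = set()             # slots holding superseded data

    def write(self, key, value):
        if self.head >= len(self.log):
            raise RuntimeError("no free space; run garbage_collect() first")
        if key in self.index:
            # Never overwrite in place: the old copy just becomes stale.
            self.stale.add(self.index[key])
        self.log[self.head] = (key, value)
        self.index[key] = self.head
        self.head += 1

    def read(self, key):
        return self.log[self.index[key]][1]

    def garbage_collect(self):
        """Copy live entries forward, free everything else, return slots reclaimed."""
        live = [self.log[slot] for slot in sorted(self.index.values())]
        reclaimed = len(self.stale)
        self.log = live + [None] * (len(self.log) - len(live))
        self.index = {key: slot for slot, (key, _) in enumerate(live)}
        self.head = len(live)
        self.stale.clear()
        return reclaimed
```

Updating the same key three times consumes three slots, and that space only comes back after garbage_collect() has copied the live data forward.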
CASL writes to SSDs in chunks. A chunk is the amount of data that will be written to one SSD before moving on to the next. Our chunk size is an even multiple of a Flash Erase Block, which leads to lower write amplification and wear.
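A rough way to see why erase-block alignment matters is to count how many erase blocks a chunk touches. The sketch below uses a deliberately pessimistic model in which every erase block a chunk touches must eventually be erased and rewritten in full. The 4096 KiB erase block size and the function names are assumptions for illustration, not Nimble's actual values.

```python
def erase_blocks_touched(offset, length, erase_block):
    """Number of erase blocks overlapped by a write of `length` at `offset`."""
    first = offset // erase_block
    last = (offset + length - 1) // erase_block
    return last - first + 1


def worst_case_write_amplification(offset, length, erase_block):
    """Flash rewritten divided by data written, in the pessimistic model above."""
    touched = erase_blocks_touched(offset, length, erase_block)
    return touched * erase_block / length


ERASE_BLOCK_KIB = 4096  # hypothetical erase block size, in KiB

# Chunk is exactly two erase blocks and starts on a boundary.
aligned = worst_case_write_amplification(0, 2 * ERASE_BLOCK_KIB, ERASE_BLOCK_KIB)

# 3072 KiB chunk that straddles an erase block boundary.
misaligned = worst_case_write_amplification(3072, 3072, ERASE_BLOCK_KIB)
```

In this model the aligned chunk has an amplification of 1.0, while the misaligned one touches two erase blocks to store 3072 KiB, for roughly 2.67.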
Additionally, our RAID layout has changed. We write across 20 data drives vs 9, which means our Segment size has increased by 2.5x compared with our Hybrid arrays. We also use some of the SSD overprovisioned space as spare chunks. That allows us to reserve less space for rebuilds as well as have one of the highest raw:usable percentages in the industry.
Furthermore, not only do we protect against ANY Three Simultaneous SSD failures vs the standard industry approach of 2, but we also provide Intra-Drive Parity which affords us the ability to recover from 1 sector failure on the remaining SSDs.
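As a back-of-the-envelope illustration of why a wide stripe keeps the cost of the extra protection low, the sketch below computes the fraction of each stripe that holds data. It assumes one parity drive per tolerated failure (three for triple protection, two for dual) and ignores intra-drive parity, spare chunks and metadata, so it is not a statement of actual usable capacity.

```python
from fractions import Fraction


def data_fraction(data_drives, parity_drives):
    """Fraction of a stripe holding data rather than parity (simplified model)."""
    return Fraction(data_drives, data_drives + parity_drives)


triple_wide = data_fraction(20, 3)    # 20 data drives, triple protection (assumed layout)
dual_wide = data_fraction(20, 2)      # same width, dual protection
triple_narrow = data_fraction(9, 3)   # narrower stripe, triple protection
```

Under these assumptions, a third parity drive on a 20-drive stripe moves the data fraction from 20/22 to 20/23, whereas on a 9-drive stripe triple protection would leave only 9/12 for data.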
Anyone who has read Google's recently published USENIX paper on a six-year SSD reliability study will understand why we provide these extensive levels of Data Protection, along with Data Integrity techniques for Lost and Mis-directed writes, segment checksums, snapshot checksums, Quick RAID rebuild and many more only found in Tier 1 Enterprise systems.
http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf
CASL also does system-level GC and uses the TRIM command to notify the SSD's Flash Translation Layer (FTL) of copy-forwarded blocks. We do this to optimize write efficiency.
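The idea can be sketched roughly as follows: the system copies the live blocks of a segment forward, then tells the drive that the old range no longer holds useful data, so the FTL does not waste effort preserving it. FakeSSD, collect_segment and the write_forward callback are hypothetical names for this illustration. A real system would issue the actual TRIM command through its storage stack, and this is not a description of CASL's internals.

```python
class FakeSSD:
    """Stand-in for a drive; just records which LBA ranges were trimmed."""

    def __init__(self):
        self.trimmed = []

    def trim(self, start_lba, count):
        self.trimmed.append((start_lba, count))


def collect_segment(segment, live_lbas, ssd, write_forward):
    """Copy live blocks out of `segment`, then TRIM the whole old range.

    segment       -- (start_lba, block_count) of the segment being reclaimed
    live_lbas     -- set of LBAs that still hold current data
    write_forward -- callback that rewrites one live block into free space
    """
    start, count = segment
    moved = 0
    for lba in range(start, start + count):
        if lba in live_lbas:
            write_forward(lba)   # live data is appended elsewhere in the log
            moved += 1
    ssd.trim(start, count)       # the old copies are now safe for the FTL to drop
    return moved


ssd = FakeSSD()
forwarded = []
moved = collect_segment((100, 8), {101, 105, 300}, ssd, forwarded.append)
```

Here only LBAs 101 and 105 fall inside the segment, so two blocks are copied forward and a single TRIM covers the whole eight-block range.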
Throughout our development and almost 6-month beta process we've used IDC's vdbench guidelines to test our AFA, to make sure our performance isn't impacted, like some of our competitors', when capacity utilization climbs to dangerous levels and inline data reduction is on. So we don't take our foot off the pedal, and we continue to process *everything* inline.
Lastly, as a common sense point: any architecture that can Garbage Collect as effectively and transparently on SATA and NL-SAS drives as CASL has done for the last 6 years can Garbage Collect on SSDs.
Thank You
Nick Triantos