three astronauts in a Soyuz capsule
Looks nice and cozy, as long as you don't have to pee.
NetApp announced compaction as an extra form of data reduction in ONTAP v9.0. What is it? Compaction applies to NetApp’s all-flash FAS arrays. A blog post by Adam Bergh, data centre practice lead at Presidio, explains that multiple IOs can be written more efficiently, from a WAFL space point of view, if they are stored in a …
From the picture it seems as if reading a single block will need to uncompress the entire contents and then recompress what's not needed? Doesn't that mean more back-end I/O? Also, wouldn't a lot of I/O potentially be concentrated in specific locations? No? It looks like it'll be effective for white space, but I'm not sure how well it'll work for everything else.
Why not have a variable extent layer so you don't have to worry about this?
Note that this applies only to all-flash configurations. With all flash, back-end reads are essentially free from a resource perspective. Also, if I/O does end up concentrating in a specific location, that is actually a great thing. That is what memory in a storage array is for.
Even if you have a variable extent layer, you still have a fundamental smallest unit of allocation under the covers. In theory you could make this as small as 512 bytes, since that was the sector size of disk drives. But more and more devices are switching to 4k at that layer, which is the size WAFL has always used.
With inline compression in particular, the ability to compact multiple sub-4k chunks into a single 4k block will have a measurable impact on storage efficiency. Not as much on its own as inline deduplication or compression, but it all helps.
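To make that concrete, here is a minimal sketch (my own illustration with made-up chunk sizes, not NetApp's actual WAFL logic) of packing sub-4k compressed chunks into fixed 4 KiB physical blocks with a simple first-fit heuristic:

```python
# Hypothetical illustration of inline compaction: pack variable-length,
# sub-4k compressed chunks into as few fixed 4 KiB physical blocks as
# possible, using a simple first-fit heuristic. Not NetApp code.

BLOCK_SIZE = 4096  # fixed 4 KiB block size in bytes

def compact(chunk_sizes):
    """Return a list of blocks; each entry is [used_bytes, [chunk sizes]]."""
    blocks = []
    for size in chunk_sizes:
        for block in blocks:
            if block[0] + size <= BLOCK_SIZE:   # first block with enough room
                block[0] += size
                block[1].append(size)
                break
        else:
            blocks.append([size, [size]])       # open a new 4 KiB block
    return blocks

# Example: six compressed chunks that would otherwise each occupy a 4 KiB block
chunks = [900, 1500, 3000, 700, 2100, 400]
packed = compact(chunks)
print(f"{len(chunks)} chunks -> {len(packed)} physical blocks")
# 6 chunks -> 3 physical blocks (instead of 6 without compaction)
```

First-fit is just the simplest heuristic for the example; a real implementation would have to balance packing density against metadata overhead and garbage-collection cost.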
A variable extent layer/variable block size would solve this perfectly. If you are at NetApp, ask anyone familiar with FlashRay about the ExtentStore layer (not sure if this was inspired by that)... Since you are dealing with an FS that lays out in 4K blocks, you have to do this for blocks that compress down into variable sizes.
Just because you have to read from the storage device at a particular block size doesn't mean your data has to be structured in that manner. Even with this, you are going to have to read the full 4k, I'm assuming.
Filesystems based on variable extent sizes are cool
Given that, by the sound of it, you're an ex-NetApp person, we'll not talk about how discussing NDA material after you've left isn't exactly kosher. But you'll also know that there are a LOT of interesting filesystem designs floating around NetApp. Arguably there's more filesystem design work done at NetApp than in the rest of the industry combined.
Having said that, there are certainly some interesting engineering challenges around having a completely variable extent size when the smallest physical allocation unit on the media becomes less granular over time. For example, what's the point of having 512-byte granularity in your extent sizes when the smallest amount of data that can be committed to an SSD is limited by the page size, which today is typically around 8 KiB? Given that it's not possible to read or write less than a complete page to the media, is it worth optimising transfers to and from the SSD at less than the page size of that SSD? What happens when you change to media that's byte-addressable rather than block-addressable? Does that mean all your careful erase block size optimisation is now useless?
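As a back-of-the-envelope illustration of that trade-off (my own numbers, assuming an 8 KiB page size; real devices vary): allocating in extents finer than the device page doesn't change what actually gets programmed to the media for an isolated write.

```python
# Back-of-the-envelope: how much data actually hits an SSD that can only
# commit whole pages, for different logical allocation granularities.
# The 8 KiB page size is an assumption for illustration only.

PAGE = 8 * 1024  # assumed SSD page size in bytes

for extent in (512, 4096, 8192):
    # a single extent-sized write still forces a full page program
    amplification = PAGE / extent
    print(f"{extent:>5}-byte extent -> {amplification:.0f}x write amplification "
          f"if written alone (before any coalescing in NVRAM/cache)")
```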
The answers to many of those questions are unfortunately buried under a lot of vendor NDAs... but for those who are interested in the problem space, check out http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/
Having an extent-based filesystem solves a number of problems in an elegant fashion, but it is by no means the only way of solving that problem. Compaction is just one example that scales really well with compression (I've seen > 30:1 for some highly compressible workloads). Could that be combined with an extent-based approach? I really wish I was allowed to say :-)
Regards
John
Why recompress if you read? Just throw away what you don't need...
Also keep in mind that the in-memory cache will be compacted too, leading to "cache amplification", thereby *reducing* back-end I/O for reads. Especially if "the I/O is concentrating on specific locations"!
Writes, on the other hand, might get a little more complicated, slightly mitigated by the fact that WAFL always writes to new (4KB) blocks. (No read-rebuild-write, but trickier garbage collection, I imagine.)
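A toy model of that cache-amplification effect (all numbers invented for illustration, and it naively assumes a uniformly accessed working set): if each cached 4 KiB block carries several logical chunks, the same cache covers more of the working set and fewer reads fall through to flash.

```python
# Toy model of "cache amplification": compacted 4 KiB cache entries hold
# several logical chunks each, so the same cache covers more logical data.
# All numbers are made up for illustration.

cache_entries = 1_000_000            # 4 KiB entries the cache can hold
working_set_chunks = 5_000_000       # logical chunks the workload touches

for chunks_per_block in (1, 2, 3):   # 1 = no compaction
    logical_coverage = cache_entries * chunks_per_block
    hit_rate = min(1.0, logical_coverage / working_set_chunks)
    backend_reads = (1 - hit_rate) * working_set_chunks
    print(f"{chunks_per_block} chunk(s)/block -> hit rate {hit_rate:.0%}, "
          f"~{backend_reads:,.0f} reads go to flash")
```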
Sebastian
(Disclaimer: I teach storage, mostly NetApp, but not affiliated with them)
If you have a fixed-block architecture, implementing block folding (aka compaction) is a good way to avoid wasting space on partially filled 4k blocks. Architectures with variable-block filesystems implementing variable-block compression don't need to solve for this.
Compaction addresses ONTAP-specific inefficiencies given the fixed 4k architecture.
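To put rough numbers on the contrast (hypothetical sizes, reusing the chunk sizes from the earlier sketch): a fixed-block layout without folding burns a full 4 KiB per sub-4k chunk, folding/compaction approaches the ideal packing, and a variable-extent layout stores roughly the compressed bytes plus per-extent metadata.

```python
# Hypothetical comparison of physical space for the same compressed chunks:
#  - fixed 4 KiB blocks, one chunk per block (no folding)
#  - fixed 4 KiB blocks with folding/compaction (ideal packing, ignores fragmentation)
#  - variable-length extents (each chunk stored at its own size, ignores metadata)
import math

BLOCK = 4096
chunks = [900, 1500, 3000, 700, 2100, 400]   # compressed sizes in bytes

no_folding = len(chunks) * BLOCK
folded = math.ceil(sum(chunks) / BLOCK) * BLOCK
variable_extents = sum(chunks)

print(f"no folding:       {no_folding} bytes")
print(f"with compaction:  {folded} bytes")
print(f"variable extents: {variable_extents} bytes")
```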
C'mon now! Nothing to see here.
3PAR released something similar about three years ago, but it was specific to I/O optimization, not data storage: optimizing the I/O to and from SSDs to minimize wear and maximize efficiency, which also improved bandwidth efficiency.
You could google "Adaptive Read Caching" and "Adaptive Write Caching" for HP 3PAR for more info on the topic (you could throw in the search term 'techopsguys' for my blog post on the topic back in 2013).
I remember Novell Netware doing this back in Netware 4 days (circa 1995) and I am sure they weren't the first.
The only difference I can see here is they are doing it with flash instead of spinning rust...
Made limited disk space on academic server stretch that extra bit further...
Although 2,000+ user directories on a 36GB RAID 5 array (5x9GB SCSI + 1 hot spare) was still quite a squeeze (around 20MB per user).
All market-leading flash array vendors must utilize a minimum, fixed, atomic size of capacity to address storage on their SSDs or flash modules. Here's my list of their minimum addressable block sizes:
XtremIO 8k
VMAX flash 512 bytes
Pure 512 bytes
Violin 4k
NetApp AFF 4k
Unity (VNX) 520 bytes
3PAR 16k
IBM FlashSystem 512 bytes or 4k
Whether they're grouping that minimum addressable block size into extents, compression groups or allocation areas is irrelevant. How many can consolidate partial block writes of their minimum addressable unit of capacity into fewer of those blocks, across different LUNs and files? Only NetApp is advertising the ability to do it. Tell me, who else can do this? Links?