Re: Apples and cheese?
Liran,
as the CEO of WEKA, the company in this case trying to twist results to compete with Spectrum Scale, you do a lot of guessing and spreading of false information on how spectrum scale works, please stop doing this. you should have disclosed in your post that you work for the company, in fact that you are the CEO of it as this is considered professional behavior, when making comments about a product your company is trying to compete with.
i for myself work on spectrum scale and actually know how it works, i don't post here as a IBM employee but simply as a person with knowledge and interest to correct some of your wrong claims and educate you on the entry bar for claiming some bragging rights.
first, there is no write cache in Spectrum Scale on the filesystem clients for the exact reason you mentioned, committed data would get lost in case of a power loss. all writes that are requested by the application to be stable are on stable storage before Scale would ever acknowledge them back to the application layer, like any of the other filesystem solutions published results with SPECSFS, including yours.
what the parameter you refer to actually triggers is making sure that all data that has NOT been explicit committed by the application to be FORCED within that time, not delay it in any way. the default if not set is whatever the linux distro sync interval is, which can be anything between 5 seconds and 120 seconds, so explicitly setting it to 30 seconds, simply means, even if the OS didn't sync data within 30 seconds make sure its committed to stable storage within that time.
i am glad you are agreeing that the latency on spectrum scale are exceptional low and almost unbelievable fast, unfortunate the reason for that is not what you claim it is, the reason the latency numbers are so exceptional low on Scale has to do with multiple factors, just to mention a few :
1) scale has a very advanced distributed coherent read cache, so some of the stats/reads SPECSFS does can be served from the cache, the data itself of the benchmark is far too big to cache a reasonable amount data to be relevant performance wise, but for metadata operations Scale gets some benefits from that.
2) scales latency, despite your spread of FUD that its optimized for HDD's is exceptional low, much lower than most any other filesystem on the market, even compared to some local filesystems. the Scale filesystem including network overhead is in double digit usec range, lower than most NAND devices itself.
to put some credibility to my claims, this presentation --> http://files.gpfsug.org/presentations/2017/SC17/SC17-UG-CORAL_V3.pdf shows end-to-end latency with spectrum scale against remote , not local NVME devices with a latency of ~70 usec on read and ~80 usec on write which includes media, network, filesystem and path to/from the application buffer, so the SPECSFS response time numbers of ~100 usec are absolutely possible.
that now actually begs a couple of interesting question. how can a filesystem that claims "flash native file system such as WekaIO Matrix™, out-performs Spectrum Scale by a wide margin regardless of the workload" , which is a pretty bold claim btw, have a minimum latency of ~500 usec for a mix of requests when the media used (link to the spec --> https://www.micron.com/~/media/documents/products/data-sheet/ssd/9100_hhhl_u_2_pcie_ssd.pdf) has a rated latency of 120 usec for read and 30 usec for write ? where is the remaining time spend ? Filesystem inefficiency ? bad raid implementation ?
for your own education on what you need to achieve, to have bragging rights in the distributed filesystem space, you should take a closer look at some of the other charts in the presentation shared above. you claim to outperform Spectrum Scale "by a wide margin regardless of the workload", can you share an example on how you deliver 23 GB/sec from a single client (page 13) ? btw this is limited by the 2 x 100Gb it Network Links used, not the Scale SW stack.
or how you do 2.5 TB/sec in a single filesystem ? or a workload like doing 2.6 Million File creates /second , writing 32k into each file and flush them after write and you need to sustain that for 20 minutes ? its only ~3.2 billion files within 20 minutes to help with the math.
how about a easy target of doing 14 Million Stat operations/sec on a 11 node cluster (page 24) ?
the publication by E8 uses 1/1000th of the in production supported scalability of Spectrum Scale (which is limited by a #define and test ressources), the system is proven to scale linear for independent work entities, essentially what SWBUILD does. Scale has multiple customers with thousands of nodes, its simply a matter of double the HW and the build number of SPECSFS will double, you multiply it by 4 and it will be 4x the results and we can continue this for a while as long as somebody is willing to spend the money for the HW costs for such a excessive benchmark setup.
Sven