* Posts by NinWalker

3 publicly visible posts • joined 29 Jan 2016

SPC says up yours to DataCore

NinWalker

These classes of benchmarks are pure crap and a relic of the 90s. Cheesy makes a point about how modern architectures scale with core count. Yes, that is the fundamental principle of modern scale-out software. And they shard data across nodes and use message passing to coordinate.
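
To put what I mean by "shard and message-pass" in concrete terms, here's a toy Python sketch; shard_for, Node and route are my own illustrative names, not anyone's actual product:

    import hashlib

    def shard_for(key: str, n_shards: int) -> int:
        # Deterministically map a key to one of n_shards partitions.
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % n_shards

    class Node:
        # Each node owns one shard of the key space and holds only its own data.
        def __init__(self):
            self.store = {}

        def handle(self, op, key, value=None):
            # Stands in for the message a coordinator passes to the owning node.
            if op == "put":
                self.store[key] = value
            return self.store.get(key)

    nodes = [Node() for _ in range(4)]

    def route(op, key, value=None):
        # The coordinator does no work on the data itself; it just forwards the request.
        return nodes[shard_for(key, len(nodes))].handle(op, key, value)

    route("put", "user:42", "some record")
    print(route("get", "user:42"))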

Of course he didn't mention that to coordinate at scale, almost all of these classes of applications either increase the replication factor or relax consistency. The former creates hardware inefficiencies while the latter creates application logic headaches for anything except really trivial problems. ACID is a property that is much nicer to reason about for anything that requires correctness guarantees. Having one global view of the universe across all nodes is a much easier thing for most developers to understand.
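
The back-of-envelope version of that trade-off, in Dynamo-style N/R/W quorum terms (numbers are mine, not any particular vendor's defaults):

    def strongly_consistent(n_replicas: int, r: int, w: int) -> bool:
        # A read quorum and a write quorum must overlap in at least one replica,
        # otherwise a read can miss the latest write.
        return r + w > n_replicas

    print(strongly_consistent(3, 2, 2))  # True: overlap guaranteed, at the cost of extra IO per operation
    print(strongly_consistent(3, 1, 1))  # False: cheap and fast, but reads can be stale

Either you pay for more replicas and bigger quorums, or you accept that the application has to cope with stale reads itself.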

For storage though, and especially centralized (and, by necessity, multi-application) storage that interacts with distributed applications, there are no discernible patterns that are really cacheable. Spatial and temporal locality of data doesn't work for inputs into centralized sister systems that aggregate shards of data from heavily distributed applications. Cache misses become the norm and tail latencies dictate true performance. Vendors claim average latencies as proof points of end-user experience, but in reality spiky tail latencies and transactional semantics are rapidly becoming the bigger problem. And most apps can't bound the response to trade off speed for correctness the way many of the web properties do.
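
To make the averages-versus-tail point concrete, a quick sketch with made-up numbers:

    import random, statistics

    random.seed(1)
    # Synthetic latencies: mostly cache hits, plus a handful of misses that go to media.
    samples_ms = [0.2] * 990 + [random.uniform(20, 80) for _ in range(10)]

    mean = statistics.mean(samples_ms)
    p99 = sorted(samples_ms)[int(0.99 * len(samples_ms))]
    print(f"mean = {mean:.2f} ms, p99 = {p99:.2f} ms")

The mean looks perfectly healthy while the p99 is two orders of magnitude worse, and once every request fans out across many shards, the tail is what the end user actually feels.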

Which leads me to the benchmark. It's crap. It's built for an era of tech that is rapidly disappearing. The underlying assumption in these tests is fat database servers with a concentration of IOs, thanks to B-tree-style data structures used to find data when DRAM caches miss. That assumption is flawed in a world where a storage device is expected to cater to a few apps running on 1,000 physical machines and 10,000 cores. Data access is concurrent, random and heavy in metadata.
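
For what it's worth, the rough arithmetic behind that IO-concentration assumption (all numbers made up):

    import math

    # A point lookup in a B-tree of N keys with fanout B touches about log_B(N) pages.
    N = 1_000_000_000   # keys
    B = 200             # fanout per page
    height = math.ceil(math.log(N, B))
    print(f"~{height} page reads per fully uncached lookup")

With the hot inner nodes sitting in DRAM that collapses to roughly one IO per lookup, which is exactly the locality these benchmarks bake in, and exactly what evaporates when thousands of cores hit the same box with random, metadata-heavy access.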

So you don't have this environment, you say?

You will, either by yourself as you are forced to evolve beyond your legacy Oracle/Microsoft apps, or when you land in the cloud and your provider builds this for you. Either way, DRAM-accelerated systems are a thing of the past and at best a detour in history versus the modern constant-latency systems that the hyperscalers are building to cope with concurrent, arbitrary-length processing streams.

The network: Your next big storage problem

NinWalker

Hah humbug! Redux

To the guy who made the comment that the various hyperscalers don't use arrays: you've never walked through one of their data centers. There are far more enterprise arrays in these properties than you think, especially for critical datasets.

NinWalker

Naselus

I don't believe that Andy is making a protocol/transport comment; he is making an architecture comment. Most storage arrays don't increase connectivity (e.g. the number of clients that can access the same dataset with a linear increase in performance) as capacity grows. Even much of the available scale-out doesn't behave this way, and relies on partitioning schemes or gateway schemes to forward IOs to a master node for that part of the namespace. The ideal would be a system whose network path is sufficiently abstracted and virtualized that adding underlying resources increases bisection bandwidth and connections linearly to the entire dataset, all the way from the client and not only behind the entry point to the system. Linearly scalable front ends and back ends that parallelize enough for efficient resource utilization without hot spots is hard, both for old-school NUMA computing systems and for scalable storage.

Or at least this is what I think is the argument the author is trying to make.
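
A crude sketch of the two access models I read him as contrasting; all names are mine and no particular product is implied:

    import hashlib

    NODES = ["node0", "node1", "node2", "node3"]

    def owner(key: str) -> str:
        # Placement function that any participant can evaluate locally.
        digest = hashlib.sha1(key.encode()).digest()
        return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

    # Gateway model: every client funnels through one entry point, which forwards
    # the IO to the node owning that slice of the namespace. Adding nodes adds
    # capacity behind the gateway, but client-facing connectivity stays flat.
    def gateway_read(key: str) -> str:
        return f"client -> gateway -> {owner(key)}"

    # Scaled front end: the client (or a stateless layer of entry points) resolves
    # placement itself and talks to the owning node directly, so connections and
    # bandwidth grow with the node count.
    def direct_read(key: str) -> str:
        return f"client -> {owner(key)}"

    print(gateway_read("volume7/block123"))
    print(direct_read("volume7/block123"))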