Is the StoreOnce Catalyst/B6200 8-node cluster a single system?
Well, Registerites, with its 4 separate dedupe realms, is it a single system?
It is probably one more example of the benchmarks all vendors indulge in to outsmart one another.
I say this because I wonder how many clients will actually need a dedupe system that can do 100 TB/hr.
Having said that, I read the comments on this article questioning the "Single Namespace, Multiple Indexes" design of this HP benchmark vis-a-vis other benchmarks, including EMC's latest one, which uses a "Single Namespace, Single/Global Index"...
I believe there is merit in having multiple dedupe indexes rather than one single large global index, especially as we see more and more of these massive dedupe/disk-based backup systems backing up many terabytes.
Back then, when nodes had limited capacities and people needed a sort of "federated multi-node" solution for more storage, having a single namespace and a global index was a logical choice.
Today each node has massive storage capacity (the B6200 maxes out at 192TB raw per couplet, which is really big), and that translates to 768TB raw across 4 couplets... What would the performance be with a single global index? I presume it doesn't need second-guessing.
There are other downsides to having one single large global index: you can use few or no tricks to speed up specific backup jobs...
So, somehow I find this particular HP benchmark does make technical sense, though in practice there may not be many customers needing 768TB in a single namespace doing 100TB/hr.
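To make the single-vs-multiple-index trade-off concrete, here is a toy model. It does not reflect how StoreOnce or Data Domain are actually built. It assumes fixed-size chunks, SHA-256 fingerprints and an in-memory dict per index, and every class name and the per-data-type routing rule are hypothetical. Splitting the namespace into realms keeps each index smaller, so each lookup searches less. The price is that a chunk seen in two realms gets stored twice.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real appliances typically chunk on content boundaries


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split a byte stream into fixed-size chunks (the last one may be shorter)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


class DedupeRealm:
    """One dedupe domain with its own fingerprint index (toy model)."""

    def __init__(self, name: str):
        self.name = name
        self.index = {}  # SHA-256 fingerprint -> stored chunk

    def ingest(self, data: bytes) -> int:
        """Store chunks not already in this realm's index; return bytes actually written."""
        written = 0
        for chunk in chunks(data):
            fp = hashlib.sha256(chunk).digest()
            if fp not in self.index:
                self.index[fp] = chunk
                written += len(chunk)
        return written


class PartitionedStore:
    """One namespace in front of several realms, one realm per data type (hypothetical routing)."""

    def __init__(self):
        self.realms = {}

    def ingest(self, data_type: str, data: bytes) -> int:
        if data_type not in self.realms:
            self.realms[data_type] = DedupeRealm(data_type)
        return self.realms[data_type].ingest(data)

    def largest_index(self) -> int:
        """A lookup only ever searches one realm, so this bounds the index size per lookup."""
        return max((len(r.index) for r in self.realms.values()), default=0)


# Two unrelated data sets (2 unique chunks each) that happen to share one zero-filled chunk.
hr = b"".join(hashlib.sha256(b"hr%d" % i).digest() for i in range(256))
ar = b"".join(hashlib.sha256(b"ar%d" % i).digest() for i in range(256))
shared = b"\x00" * CHUNK_SIZE

partitioned = PartitionedStore()
p_written = partitioned.ingest("hr_docs", hr + shared) + partitioned.ingest("accounts", ar + shared)

global_realm = DedupeRealm("global")
g_written = global_realm.ingest(hr + shared) + global_realm.ingest(ar + shared)

print(p_written, g_written)                                      # bytes written to disk
print(partitioned.largest_index(), len(global_realm.index))      # entries searched per lookup
```

Here the partitioned store writes the shared chunk twice (24576 vs 20480 bytes). However, no single index it searches grows beyond 3 entries, while the global index holds 5. At 768TB scale, that per-lookup saving is what the argument above is about.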
Many senseless benchmarks by many vendors come to mind, not necessarily limited to storage but mostly in the systems benchmark space..
1. Most recent is the VMAX-40K mammoth with 2400 drives
2. HP & IBM started the TPC-C benchmark competition way back, trying to outdo each other on who could do the most millions of tpmC on a single system/single OS.
---- If they had partitioned their Superdomes and p595/p795s and achieved higher tpmC aggregates across multiple partitions, those benchmarks would probably have had real-life relevance
3. So, when one vendor commits a crime another is infected too, and Oracle does a 30-million tpmC benchmark across a cluster of 27 SPARC servers. TPC is not HPC, so why run a transactional workload benchmark on a massive SuperCluster..
The list of such gross insensibilities is long. We users (who buy these technologies and run our businesses on them) have come to a point where we take a holistic view of these.
HP recommends that you keep each data type together in its own VTL, as this improves dedupe rates. So I think they can probably argue it's all one unit on that basis. If it were supposed to be one big storage pool and you actually had to have four because of the nature of the software/hardware, I would say they're separate units.
How do the two systems compare price-wise? If they cost about the same, and the HP system has a single addressable namespace, then I'd say that EMC is splitting hairs. If the HP system is 4x the cost, then you're really buying four different systems and slapping a manager in front of them.
But is global dedupe inherently useful?
Do you want to check your HR policy documents against your accounts receivable data? You might pick up some speed increases by not doing that.
As data volumes increase, so does the benefit from applying intelligence rather than mass-processing.
It will probably depend on your own particular circumstances as to whether you want to use machine time or human time.
Mark Twomey here.
And just to be clear, EMC does offer Enterprise Manager for Data Domain which can manage 20 appliances from a single pane of glass.
I'd rather marketing not claim 620TB/hr throughput in such a case, but since HP appear to have decided it would be okay to do so...
Well, according to HP's QuickSpecs, the above allows management and monitoring of up to 400 D2D appliances. So by your logic, the marketing number for managing multiple B6200 systems from a single console would be 40,000TB/hr for HP vs 620TB/hr for EMC DD.
But the point is that these systems use totally different architectures. The B6200 takes a scale-out approach and offers high availability within a couplet, whereas EMC still seems to be stuck on a single controller, which I would think greatly simplifies the task of providing global dedupe. Yet both manage to provide very similar dedupe ratios and, given the market, I would think similar price points, so speed looks to be a good comparison point.
Regardless of the underlying architecture, if it's ordered, managed and maintained as a single system, then I don't see the problem. I suppose if it's that simple, EMC could just strap a few DD boxes together. It's not as if this system has just launched either. The real issue seems to be that this new Catalyst API has allowed HP to leapfrog EMC's highest-performing, just-launched DD box by a very wide margin.
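Worked out, the reductio multiplies each vendor's per-system throughput by the number of systems one console can manage. This assumes 100TB/hr per managed B6200 system, the benchmark figure discussed in this thread. It also reads EMC's 620TB/hr as 20 equal DD appliances, which is an inference rather than a stated spec:

```latex
\begin{align*}
\text{HP:}  &\quad 400 \times 100\ \text{TB/hr} = 40{,}000\ \text{TB/hr} \\
\text{EMC:} &\quad \frac{620\ \text{TB/hr}}{20\ \text{appliances}} = 31\ \text{TB/hr per DD appliance}
\end{align*}
```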
I work for IBM.
We make dedupe products as well, although reading the Register you'd never know it.
We'll just sit back and watch these two marginal players (HP and Sepaton) knock each other about, and while they're busy chipping their credibility to pieces in public, we'll concentrate on beating the real enterprise competition, i.e. EMC/DD.
HP had a big storage event in Las Vegas last week and made some dedupe claims. Sepaton had its own dedupe news last week, so it seems reasonable enough to cover both companies (and EMC's rejoinder to HP here).
We'll be sure to cover EMC and IBM when they have news to share with the world.
"....We make dedupe products as well...." Oh, is that a storage product you actually make for yourself, or just another one you badge from NetApp?
"....We'll just sit back and watch....." I doubt anyone in IBM's storage division is sitting back at all. They're all busy running round trying to find a way to spin the news that their market share has declined again, and NetApp have overtaken them for the number two spot (http://www.theregister.co.uk/2012/06/08/idc_storage_tracker_q1_2012/). And that's still including the external storage that is really just badged NetApp products.
Have to wonder where IBM comes in the disk storage stakes when HP is allowed to count its NAS and LeftHand systems, which IDC does not count as external storage because they are server-based....
It looks like Sepaton don't understand some of the fundamental operations of StoreOnce. You shouldn't multiplex data into a StoreOnce because it screws with the dedupe algorithm. The device is smart enough to identify data types and adapt to them, so if you're multiplexing data into it, it's less efficient at learning your data. Now, this isn't actually a problem, as you can present loads of virtual tape drives to your backup software to make up for the bandwidth lost. HP even advises you to create a separate virtual tape library (effectively a dedupe domain) for each data type you back up, for maximum dedupe efficiency.
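The toy model below does not reproduce StoreOnce's internals. It shows one generic way multiplexing can hurt dedupe. It assumes fixed-size chunking and that the backup application interleaves two clients in a different order on the second night, for example because of timing. Both of those assumptions are illustrative, and content-defined chunking would soften the effect. The hypothetical `multiplex` helper stands in for a backup application writing several clients to one drive. Writing each client to its own virtual drive keeps every chunk identical from night to night.

```python
import hashlib

CHUNK = 4096  # hypothetical fixed chunk size, for illustration only


def make_stream(seed: str, n: int) -> bytes:
    """Deterministic pseudo-random backup data: n blocks of 32 bytes."""
    return b"".join(hashlib.sha256(f"{seed}-{i}".encode()).digest() for i in range(n))


def fingerprints(stream: bytes) -> set:
    """SHA-256 fingerprint of each fixed-size chunk of a stream."""
    return {hashlib.sha256(stream[i:i + CHUNK]).digest() for i in range(0, len(stream), CHUNK)}


def multiplex(streams, block: int) -> bytes:
    """Interleave streams block by block in the given order (simplified tape multiplexing)."""
    out = bytearray()
    longest = max(len(s) for s in streams)
    for off in range(0, longest, block):
        for s in streams:
            out += s[off:off + block]
    return bytes(out)


a = make_stream("client-a", 1024)  # 32768 bytes = 8 chunks
b = make_stream("client-b", 1024)

# Multiplexed onto one drive; the interleave order differs between nights (assumption).
night1 = multiplex([a, b], 1000)
night2 = multiplex([b, a], 1000)
overlap_mux = len(fingerprints(night1) & fingerprints(night2))

# One virtual drive per client: the same data yields the same chunks every night.
sep1 = fingerprints(a) | fingerprints(b)
sep2 = fingerprints(a) | fingerprints(b)
overlap_sep = len(sep1 & sep2)

print(overlap_mux, overlap_sep)  # chunks the second night could dedupe against the first
```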