* Posts by marc.villemade

1 publicly visible post • joined 16 Aug 2012

Is Object storage really appropriate for 100+ PB stores?


Re: Is Object storage really appropriate for 100+ PB stores?

Hey Chris,

[Disclaimer for others than Chris: I work for Scality, Object Storage vendor]

I think there is a real need out there for object storage. The problem is that it's hard to argue against that if you're focused on the technology (i.e. working for one of the vendors), while if you're not, it's much easier to dismiss it completely. My point is that there ARE projects out there in the hundreds of PBs. For those going to tape (like CERN or NCSA), I'm not entirely sure whether it's because of a lack of good marketing/sales from the object storage vendors or because tape has a real competitive advantage (technical, financial, or both).

I am almost convinced that it's the former. And I think that I'm not the only one in the object storage world, as I pointed out in our blog: http://www.scality.com/the-object-storage-summit/ - most of us object storage vendors see the way people perceive our technology as something that could be improved. A lot.

I am convinced that object-based storage is the future for unstructured data. One of the points you raise in your article is that "The El Reg storage desk is cynical because no company is actually storing hundreds of petabytes of data in billions of files inside a single namespace across hundreds of data centres."

It is true that there is no publicly known storage infrastructure spread across hundreds of sites. But the reason for this is that I think it will never really happen. Dave McCrory's "Data Gravity" is completely at play here: applications can be moved more easily than data. But we can also replicate only subsets of data.

My vision of this is that we will see a lot of large data stores replicated across a few sites for redundancy, failover and whatnot. For processing the data, workloads will be moved to those sites; for data access, smaller sets of data will be replicated to other sites closer to the end-user to reduce latency. This will involve a lot of new technology in terms of usage patterns and analysis to make sure that the data is there when the user wants it. I hate to bring it up here because it's really over-hyped these days, but some aspects of Big Data will be involved in this, to move the right data in advance of users actually requesting it.

We don't have this technology yet. However, there is no doubt that we have the technology to "store hundreds of petabytes of data in billions of files inside a single namespace" with object storage today. These stores will grow to hundreds of exabytes in the next 10 years, and technology will be built on top of them to make the data available to applications and/or end-users in the best way possible (moving workloads or data subsets to optimize latency, security, bandwidth, processing power...).
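To put that growth in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 EB = 1000 PB) and a constant year-over-year growth rate, and it takes "hundreds" to mean 100 at both ends. Those figures and the function name are illustrative assumptions, not measurements from any real deployment.

```python
def implied_annual_growth(start_pb: float, end_pb: float, years: int) -> float:
    """Constant yearly multiplier that takes start_pb to end_pb over `years`."""
    return (end_pb / start_pb) ** (1.0 / years)


PB_PER_EB = 1000  # decimal units (assumption)

start_pb = 100              # "hundreds of petabytes" today (illustrative)
end_pb = 100 * PB_PER_EB    # "hundreds of exabytes" in 10 years (illustrative)

growth = implied_annual_growth(start_pb, end_pb, years=10)
print(f"Implied growth: about {growth:.2f}x per year")
```

Under those assumptions, going from hundreds of PB to hundreds of EB in 10 years is a 1000x increase, which works out to roughly a doubling of stored data every year.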

It is really going to be an incredible journey to go there from here, and such an interesting time to be in the industry.

I'd be happy to hear what you think about my rant ;)