* Posts by @jgilmart

4 publicly visible posts • joined 12 Apr 2012

Does it make sense to use shared storage with Hadoop?


Re: Does it make sense to use shared storage with Hadoop?

As Hadoop gains traction, we've noticed that data growth in HDFS often outpaces compute requirements. Scaling out compute and disk as a single unit can strand (and thus waste) CPU cycles. Many organizations are now looking for ways to scale compute and storage independently, often with very dense compute infrastructures, without sacrificing the simplicity, performance, or cost profile of local disk. Shared storage can also shorten recovery times after compute failures and reduce replication overhead.
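To put a number on the replication-overhead point: HDFS keeps three copies of every block by default, tripling raw capacity needs. With shared storage that provides its own redundancy, an operator could in principle lower that factor. A minimal sketch of the relevant hdfs-site.xml setting (the value of 2 is purely illustrative, assuming the array supplies the extra protection):

```xml
<!-- hdfs-site.xml (sketch): dfs.replication controls how many copies
     of each block HDFS keeps. The default of 3 triples raw capacity;
     on externally protected shared storage, a lower factor is
     sometimes used. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value> <!-- assumption: the shared array handles redundancy -->
  </property>
</configuration>
```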

With Ethernet SAN, we can provide independent scaling and data protection without sacrificing simplicity or performance. We use commodity hardware, a parallel scale-out design, and a simple interface to deliver the simplicity and performance of DAS along with the benefits of shared disk. As Hadoop adoption grows in the enterprise and data ingest volumes climb, I expect the push for this kind of "virtual DAS" model to grow.

-John Gilmartin, Coraid

Running applications in the storage array


This is definitely an interesting question, but it's hard to predict how it will evolve. As the previous poster points out, there is a historical pendulum in system design that swings between integrated and modular. For some applications, such as Hadoop, this could be the answer, but many applications don't share the same requirements profile for performance, capacity, and cost. A big challenge will be orchestrating applications and infrastructure so that compute and data are in the right place at the right time in the right ratios. Applications change and grow constantly, and we need systems that can adapt dynamically.

Sony optical disk archive


The challenges in archiving data are unique, so perhaps there is a market there. But on the surface, it seems like disks have a number of advantages (price and capacity trends, always online, etc.) that would make them preferable. There are plenty of file systems (e.g., ZFS) that can continuously monitor for bit rot to address the concerns about physical media degradation.
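To ground the ZFS point: a periodic scrub re-reads every block in the pool, verifies it against its stored checksum, and repairs silent corruption from redundancy where possible. A rough sketch of the standard workflow (the pool name `tank` is a placeholder):

```shell
# Kick off a scrub: ZFS re-reads every block and verifies it against
# its checksum, repairing "bit-rotted" blocks from mirror/RAID-Z copies.
zpool scrub tank

# Check scrub progress and any checksum errors found so far.
zpool status tank

# Typical practice is to schedule the scrub from cron, e.g. weekly:
# 0 3 * * 0  /sbin/zpool scrub tank
```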

Are SANs right for converged stacks


Re: Are SANs right for converged stacks


When it comes to traditional SAN infrastructure, I absolutely agree. You're adding unnecessary cost and complexity into the overall system. The resulting solution is inflexible and difficult to scale.

However, there is still value in separating compute from storage: independent scaling, shared access, data protection with lower overhead, etc. The trick is to make that shared storage as simple and cost-effective to deploy and manage as local disk while retaining all the benefits of the network. You can't do it with FC, FCoE, or iSCSI. The key ingredients are Ethernet SAN, a scale-out architecture, and off-the-shelf hardware to turn shared disks into "virtual DAS" with low-latency, massively parallel access from every compute node.
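As a concrete illustration of the "virtual DAS" idea: Coraid's Ethernet SAN speaks ATA over Ethernet (AoE), so on Linux a shared LUN appears as an ordinary block device via the standard aoetools package. A rough sketch (the shelf/slot numbers and mount point are placeholders):

```shell
# Load the AoE driver and discover EtherDrive targets on the local Ethernet.
modprobe aoe
aoe-discover

# List discovered targets; each shows up as a local block device,
# named /dev/etherd/e<shelf>.<slot>.
aoe-stat

# Use the shared LUN exactly like direct-attached disk:
# make a filesystem and mount it.
mkfs.ext4 /dev/etherd/e1.0
mount /dev/etherd/e1.0 /mnt/data
```

The point of the sketch is that nothing above involves an FC fabric, an initiator configuration, or a LUN-masking step; the network disk is addressed like local disk.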

John Gilmartin, Coraid