Post a comment link missing on several articles.
Okay, not a relevant post to this thread, but I have had 404s on links and missing 'Post a Comment' buttons on El Reg today. Is the hyperlink bloke on leave this week?
HP will resell Seagate's Lustre-ous ClusterStor 1500 and 9000 high-performance computing arrays, while Seagate is adding IBM's Spectrum Scale parallel file system to ClusterStor alongside Lustre. These scale-out HPC arrays run the Lustre parallel file system and are OEM'd by Cray (as Sonexion storage) and SGI. The ClusterStors …
These things are parallel file systems for HPC and Big Data workloads.
I suppose you could say they are NAS on steroids.
The architecture of these things is that you have metadata controllers, which look after, well, the metadata, and object storage targets. The OSTs store the files.
A lot of attention is paid to having no bottlenecks - i.e. there can be multiple metadata controllers. Similarly, you add more OSTs to add capacity.
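The split described above can be sketched as a toy model - this is not real Lustre code, just an illustration of the idea that a metadata service records each file's stripe layout while the data itself is striped round-robin across object storage targets, so adding OSTs adds capacity and bandwidth:

```python
# Toy sketch (NOT real Lustre code) of the metadata-controller / OST split:
# the metadata service maps each filename to a stripe layout, and the data
# chunks are striped round-robin across object storage targets (OSTs).

STRIPE_SIZE = 4  # bytes per stripe - tiny, purely for illustration

class ToyParallelFS:
    def __init__(self, num_osts):
        self.osts = [dict() for _ in range(num_osts)]  # per-OST object stores
        self.mds = {}  # metadata: filename -> list of (ost_index, object_id)

    def write(self, name, data):
        layout = []
        chunks = [data[i:i + STRIPE_SIZE]
                  for i in range(0, len(data), STRIPE_SIZE)]
        for n, chunk in enumerate(chunks):
            ost = n % len(self.osts)          # round-robin striping
            obj_id = f"{name}#{n}"
            self.osts[ost][obj_id] = chunk
            layout.append((ost, obj_id))
        self.mds[name] = layout

    def read(self, name):
        # one metadata lookup, then reads that could hit many OSTs in parallel
        return b"".join(self.osts[ost][obj] for ost, obj in self.mds[name])

fs = ToyParallelFS(num_osts=4)
fs.write("results.dat", b"0123456789abcdef")
assert fs.read("results.dat") == b"0123456789abcdef"
assert all(len(ost) == 1 for ost in fs.osts)  # stripes landed on all 4 OSTs
```

The point of the design is visible even in the toy: the metadata lookup is one small operation, after which the bulk data traffic fans out across independent targets instead of funnelling through a single head.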
Connection to these things is typically over InfiniBand as well.
Lustre can also integrate with a Hadoop cluster these days too.
So no, not a direct drop in for a 3Par.
But definitely what you need if your company is handling vast volumes of data.
HPC parallel file systems operate under different sorts of reliability and even performance constraints than enterprise or even SMB storage. You'll find Lustre, especially, much less reliable in the medium to long term, and it has terrible, terrible transactional performance. GPFS isn't quite as bad in either department, but I wouldn't trust Seagate to implement it well.
Also Lustre lacks almost all of the functionality that has become expected for enterprise and SMB storage, like snapshots and replication. Both 3PAR and MSA are far superior products unless you have the need to dump/load very large files, very very fast and don't really care what happens to them when you're done with the high IO processing. This is purely a niche play for HP so they can participate in HPC bids that require Lustre or GPFS in the RFP.
Huh, the days when academic storage environments could get along with lower reliability/availability figures are over! Today the cost of not being able to store or reproduce one-off experimental data is huge, not to mention the reputation loss. Imagine telling a researcher that his data is lost, his samples - $50,000 apiece - were destroyed in the experiment, his residency time is over, and the next available time slot is in two years. Do that three times and you're out of (funded) business!
The storage piece is not academic research, it is a strict service provider activity. Organisations beyond the do-it-yourself size protect their business with advanced redundancy techniques, like erasure-coded data rebuild without tangible impact on write or read performance. Few enterprise arrays can do that! This stuff comes native with Spectrum Scale; don't know about Lustre. [http://bit.ly/1WJ0M4o - last slide #39, impact of repair]
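The rebuild idea mentioned above can be shown with a minimal single-parity sketch. Real systems such as Spectrum Scale's declustered RAID use much wider Reed-Solomon codes spread across many disks, but the principle is the same: recompute lost data from the survivors plus parity. All function names here are made up for the illustration:

```python
# Minimal single-parity erasure sketch (toy code, not any vendor's
# implementation): XOR all data chunks to get a parity chunk, then
# recover a single lost chunk from the survivors plus the parity.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Return the parity chunk for a list of equal-length data chunks."""
    parity = bytes(len(chunks[0]))
    for c in chunks:
        parity = xor_bytes(parity, c)
    return parity

def rebuild(surviving_chunks, parity):
    """Recover the single missing chunk from survivors plus parity."""
    lost = parity
    for c in surviving_chunks:
        lost = xor_bytes(lost, c)
    return lost

data = [b"expe", b"rime", b"nt42"]   # three data chunks on three "disks"
p = encode(data)

# simulate losing disk 1 and rebuilding its chunk from the others + parity
recovered = rebuild([data[0], data[2]], p)
assert recovered == b"rime"
```

The performance argument in the comment comes from *declustering*: because every disk holds a mix of data and parity stripes, a rebuild reads a little from every disk rather than hammering a dedicated spare, which is why repair need not visibly dent client throughput.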
And while the first "pod" of a parallel file system might be harder to set up than a NAS box, that relationship reverses at larger scale: the parallel file system will still be a single instance ("single namespace, wide metadata"), while the countless NAS or SAN appliances will require substantial migration and balancing effort to replace old tech or eliminate organisational hot spots. Which is why "wide metadata" is so important: we're not talking about clustered NAS. Of course, if you don't have scale-out plans in the beginning, you wouldn't think that way. I'm pretty sure this has happened to more than one NAS buyer.