Time to increase the budget
for NSA, GCHQ etc.
Who else needs that sort of storage?
Fujitsu has got itself a 50PB-plus scale-out array for very big data, the CD10000. Properly named the ETERNUS CD10000 Hyperscale Storage System, the box uses Ceph, open source software that presents file, block and object storage from a distributed cluster of object storage nodes across which data is striped. Theoretically, …
You'd be surprised. Many web companies have more than that already, and the nature of storage management within the traditional enterprise is changing. There are hundreds of UK-based orgs with 1PB-plus of distributed and disparate storage silos (file and block, object and cloud, on multiple vendors' technology, at different stages in their life-cycle). If someone could unify and consolidate that, it would not only drastically simplify data management and significantly reduce costs, it would let more people access more relevant data and make quicker, better-informed business decisions. Think of the analytics angle too: it's much easier to run Hadoop if all the data is managed centrally. You also don't start with 50PB; you start with a few hundred TB and scale, knowing that you're covered. In the traditional enterprise, file storage is growing at around 50% per annum (block at about 35%). It won't take long for a 500TB org to pass 2PB (roughly 3 to 3.5 years). Think about it.
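The "3 to 3.5 years" figure above is just compound growth; a quick sketch checks it (only the 50% and 35% growth rates come from the comment, the rest is plain compound-interest arithmetic):

```python
import math

def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> float:
    """Years of compound growth for start_tb to reach target_tb."""
    return math.log(target_tb / start_tb) / math.log(1 + annual_growth)

# 500TB estate growing to 2PB (2000TB):
file_years = years_to_reach(500, 2000, 0.50)   # file storage at ~50%/yr
block_years = years_to_reach(500, 2000, 0.35)  # block storage at ~35%/yr
print(f"file:  {file_years:.1f} years")   # ≈ 3.4 years
print(f"block: {block_years:.1f} years")  # ≈ 4.6 years
```

So the claim holds for file storage; block workloads at 35% take closer to four and a half years.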
How do you back up that much storage?
"The system’s usable capacity depends upon the number of data replicas, two or three for example, set up to protect against data loss."
Data replicas, assuming they're on the same hardware, don't protect against a fire in the DC, or a few burly gentlemen on a dark and stormy night with sledgehammers and a Transit van.
These systems have replicas, and serious outfits run multiple geographies with placement rules that ensure there are always replicas in more than one geography. If the burly men can be in several places at once with sledgehammers and Transit vans, you are in big enough trouble that backups aren't the concern. A good lawyer or a fake passport is in order.
Not having RAID does not magically protect you from disk failures; you are still going to have significant rebuild workloads as the array re-protects data after a failed disk. And once you factor in protection, that 50PB rapidly becomes 25PB or less, depending on how many times you store the data.
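The capacity hit is simple division: with N full replicas, usable capacity is raw capacity over N. A minimal sketch, using the article's 50PB figure (Ceph replicated pools store whole copies in this way; erasure coding, not discussed here, changes the ratio):

```python
def usable_pb(raw_pb: float, replicas: int) -> float:
    """Usable capacity when every object is stored as `replicas` full copies."""
    return raw_pb / replicas

print(usable_pb(50, 2))  # 25.0 PB usable with two copies
print(usable_pb(50, 3))  # ~16.7 PB usable with three copies
```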
Ceph is also not the core storage, or even the core object store, of OpenStack; Fujitsu were corrected on this at the press launch. Yes, we commonly see it implemented with OpenStack, but the phrasing makes it sound as if it is part of OpenStack.