MTBF?
What's the MTBF for any one of the individual drives in the array failing?
(conjures up a picture of valves popping at Bletchley Park).
It must surely be some sort of landmark? Object storage supplier Cleversafe now has a 10-exabyte system customers can buy. Cleversafe reckons the largest single storage box out there is a 3.6EB tape library, SpectraLogic's T-Finity. It now has a tape-beating system for storing data, it would argue, because it provides online …
I think you might be thinking of ENIAC in its early days, when its uptime was approximately 50%. That was remedied by the same means adopted at Bletchley, which was not to turn the machines off, obviating thermal stresses on the heater filaments.
By the same expedient, not spinning down arrays of drives will prolong the lives of their bearings and drive motors. I suspect having a decent filing system which manages drive failure will also be on the cards. ;-)
I read quite a few years ago about a report by (I think) Fujitsu.
The report concluded that about 70% of all data (in all forms - computer, paper, whatever) was effectively useless because it was either (a) wrong/incomplete or (b) uneconomic to retrieve in a useable way.
I wonder if this is still true? I'm suspecting, at best, it isn't far off...
Care to put money on that claim? That's approaching 8 orders of magnitude above current capacities, or about 23 doublings. Depending on whether you take the doubling period of Moore's law at 18 months or two years, that's still somewhere in the 35-50 year range.
Of course if you want to be ambitious, 10 exabytes is 10^19 bytes, or roughly 10^20 bits. If we manage to store 1 bit per silicon atom, that's around 10^20 atoms (before we consider any form of access circuitry). 10^20 silicon atoms have a mass of only a few milligrams, so some might argue it isn't completely impossible, but that's about the only grounds I could give for it.
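Just to make that back-of-the-envelope arithmetic easy to poke at, here's a rough Python sketch of both halves (the 23-doublings figure is the one used above; the silicon constants are the standard textbook values):

    target_bytes = 10 * 10**18            # 10 exabytes (decimal), in bytes

    # How long until one device holds that, assuming the ~23 capacity doublings
    # mentioned above and a Moore's-law doubling period of 18 months or 2 years.
    doublings = 23
    for period_years in (1.5, 2.0):
        print(f"{doublings} doublings x {period_years} yr = {doublings * period_years:.0f} years")

    # Mass of silicon needed at 1 bit per atom, ignoring any access circuitry.
    atoms = target_bytes * 8              # ~8e19 bits, so roughly 1e20 atoms
    avogadro = 6.022e23                   # atoms per mole
    molar_mass_si = 28.09                 # grams per mole of silicon
    mass_mg = atoms / avogadro * molar_mass_si * 1000
    print(f"about {atoms:.0e} atoms of silicon, weighing about {mass_mg:.1f} milligrams")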
> a "portable datacenter", containing 21 racks with 189 nodes and 45 3TB drives per node
So that's 21 racks each with 9 x 4U servers holding 45 drives each.
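For anyone who wants to check those numbers, a trivial Python sketch (the 3TB-per-drive figure comes from the quote; the total is raw capacity, before any redundancy or filesystem overhead):

    racks, nodes, drives_per_node, drive_tb = 21, 189, 45, 3   # figures from the quote

    nodes_per_rack = nodes // racks                  # 189 / 21 = 9 x 4U servers per rack
    raw_tb = nodes * drives_per_node * drive_tb      # raw capacity, before any redundancy
    print(f"{nodes_per_rack} nodes per rack")
    print(f"{raw_tb:,} TB raw, roughly {raw_tb / 1000:.1f} PB per portable datacenter")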
As it happens, Backblaze have published the details of how they build 4U servers containing exactly 45 drives. Details in the links from here:
http://venturebeat.com/2011/07/20/diy-secrets-of-creating-a-cloud-storage-farm-revealed-by-backblaze/
If you're buying the custom cases in that sort of quantity you should get a good price. Then stick Gluster or OpenStack Swift on top of it, and away you go.
I think it was Sun that first sold 4U storage servers with the disks "standing up". Normally disks are inserted from the front, but in Sun's storage servers they are lowered in from above, by removing the top lid. That is how the x4500 could fit 48 disks in 4U.
A normal 4U storage chassis, such as the Norco 4224, only holds 24 disks, inserted from the front.
Later, other vendors copied Sun's design. Backblaze fits 45 disks in 4U, and there is another vendor that fits 60 disks in a 4U chassis.
Regarding this 10 exabytes, there are several problems. First, the data will rot and there will be a lot of data corruption. RAM sticks suffer data corruption caused by cosmic radiation, current spikes, etc. - that's why ECC RAM is necessary to detect flipped bits and correct them. The same bit rot will occur on all these disks, so you need checksums to detect and correct randomly flipped bits.
Second, 10 exabytes is too much for most filesystems, such as Btrfs, because they are 64-bit. You need a 128-bit filesystem to handle 10 EB in a single namespace.
Incidentally, ZFS solves both of these problems above.
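To illustrate the checksumming half of that, here's a toy Python sketch of the end-to-end checksum idea that ZFS applies per block - it is not ZFS itself, just the principle, and the block size and hash choice are arbitrary:

    import hashlib, os

    BLOCK_SIZE = 128 * 1024   # 128 KiB; a typical record size, arbitrary for this demo

    def write_block(data):
        # Keep a checksum alongside the data, as it would be stored on disk.
        return data, hashlib.sha256(data).digest()

    def read_block(data, stored_checksum):
        # Verify on every read, so a silently flipped bit is detected
        # instead of being handed back as good data.
        if hashlib.sha256(data).digest() != stored_checksum:
            raise IOError("checksum mismatch: block has rotted, fetch a good copy instead")
        return data

    block, checksum = write_block(os.urandom(BLOCK_SIZE))

    rotted = bytearray(block)
    rotted[12345] ^= 0x01                       # simulate a single flipped bit

    read_block(block, checksum)                 # passes silently
    try:
        read_block(bytes(rotted), checksum)
    except IOError as exc:
        print("detected:", exc)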
If you were a Hollywood studio that takes its assets seriously and you wanted to scan film into a format that is as good as it gets today, knowing it won't be good enough tomorrow, you'd use a modern film scanner capable of 8K. At 8K (which is about 35 megapixels) a 35mm film frame can actually be scanned without major problems related to the Modulation Transfer Function, and it is therefore a good level for mastering. Here's some math:
8K = 8192 x 4320 resolution, or...
35,389,440 pixels.
1.5 bytes per channel for 3 channels of color is 4.5 bytes per pixel, or 159,252,480 bytes per frame.
48 frames per second (modern) is 7,644,119,040 bytes per second
60 seconds in a minute, 60 minutes in an hour, therefore requiring 27,518,828,544,000 bytes per hour.
Assuming 2.5 hours of film are shot per single hour of finished film, the masters will require 68,797,071,360,000 bytes per hour of final film.
But since you will keep the master AND the final footage, that's 96,315,899,904,000 bytes per hour of master + final footage.
Assuming the film is 2 hours in length, you'd need 192,631,799,808,000 bytes for storing a single hollywood motion picture for archival purposes properly.
Now figure that one copy is useless, since it costs too much to redo the scan every time you need it, so you want two copies. That's 385,263,599,616,000 bytes per film.
So, to make the numbers a little more manageable, let's convert that to terabytes by dividing by 2 to the 40th power. That's 350.39 terabytes.
Therefore, 3 two hour films can be stored per petabyte or 3000 films per exabyte.
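For anyone who wants to check the rounding, here's the same arithmetic as a small Python sketch (same assumptions as above: 8K frames, 4.5 bytes per pixel, 48 fps, a 2.5:1 shooting ratio, a 2-hour film and two copies):

    pixels = 8192 * 4320                     # 35,389,440 pixels per frame
    frame = pixels * 4.5                     # 1.5 bytes per channel x 3 channels
    hour = frame * 48 * 3600                 # 48 fps, 3600 seconds per hour
    master_plus_final = hour * 2.5 + hour    # 2.5 hours shot per finished hour, plus the final cut
    film = master_plus_final * 2             # a 2-hour feature
    two_copies = film * 2

    print(f"{two_copies / 2**40:.2f} TB per archived film")    # ~350.39
    print(f"{2**50 / two_copies:.1f} films per petabyte")      # ~2.9
    print(f"{2**60 / two_copies:.0f} films per exabyte")       # ~2993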
The list of Warner Bros films Wikipedia knows about runs to roughly 1,850 titles. That means you could do a proper film scan of every Warner Bros film ever made... almost twice over per exabyte. Probably four times over if you consider the lower frame rates in the past. And you'd still have room left over for all the audio tracks and projects involved.
Still, consider that there are a lot of movie studios in the world and a lot of films. In Sweden, there's a movie series about a wrinkly old detective who always wears the same gray overcoat, and that series alone probably runs to 900 films :)
In the future, we should strive to make proper archives of our films, music, photos and other cultural treasures (using that word on a rather sliding scale) and then store them properly for the future.