Re: Not Magic, Just Maths
Here's the text of a comment on The Register's forums from last year, courtesy of "foo_bar_baz". As far as I understand things, he nails the explanation:
"Internet commenting is always entertaining, statements and declarations based on guesswork and conjecture. Let me indulge in some guesswork myself. Maybe the "rebuild" time is not about rebuilding a single disk, but about rebuilding the entire storage system from "degraded" to "healthy".
Let's say your data is distributed across 100 storage nodes. Any one chunk of data in a "healthy" system is stored on n nodes where 100 > n > 2. One node dies, so the array is now degraded. To "rebuild" the system you just have to copy the chunks to free space on a sufficient number of nodes to satisfy the above requirement. Given that GPFS gets its high performance from storing files in small chunks across many nodes (not just 2 as in RAID1), it follows that rebuilding is also very fast. "Rebuilding" does not even necessarily have to involve replacing the broken node with a new one."
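To put some rough numbers on that intuition, here's a quick back-of-the-envelope sketch in Python. The disk size, node count, and bandwidth figures are all made-up assumptions for illustration, not GPFS specifics:

# Back-of-the-envelope comparison: single-spare-disk rebuild vs. a
# distributed ("declustered") rebuild across many nodes.
# All figures below are illustrative assumptions, not GPFS specifics.

DISK_TB = 8                  # capacity of the failed disk/node, in TB
DISK_WRITE_MBPS = 150        # sustained write speed of one spare disk, MB/s
NODES = 100                  # surviving nodes that can join the rebuild
PER_NODE_REBUILD_MBPS = 100  # rebuild bandwidth each node contributes, MB/s

TB_TO_MB = 1_000_000

def hours(seconds: float) -> float:
    return seconds / 3600

# Classic RAID1-style rebuild: every lost byte funnels through one spare disk.
raid_seconds = (DISK_TB * TB_TO_MB) / DISK_WRITE_MBPS

# Declustered rebuild: the lost chunks are scattered across the cluster, so
# all surviving nodes re-replicate their share of the chunks in parallel.
declustered_seconds = (DISK_TB * TB_TO_MB) / (NODES * PER_NODE_REBUILD_MBPS)

print(f"RAID1-style rebuild:  {hours(raid_seconds):6.1f} h")
print(f"Declustered rebuild:  {hours(declustered_seconds):6.2f} h")

With those (hypothetical) numbers the mirror rebuild takes roughly 15 hours while the declustered rebuild finishes in minutes, because every surviving node rebuilds a small slice in parallel. That's why "rebuild" times for these systems sound implausibly fast if you're picturing a single spare disk being filled.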
Given that GPFS has been around as a product since 1998 (with heritage going back to "Tiger Shark" in 1993), I think it's unlikely that IBM is "following the Isilon data lake play" (Isilon being a company that wasn't founded until 2001) - although I see that kind of assumption a lot.... ;-)