Crowd-sourcing interpretation of IBM RAID 5 extension paper

PeteMaths

So the first thing to understand is a simple error-correcting code. If you want to store some binary string, the simplest thing you can do is just store the string. If your storage is faulty, a failure of even one bit loses everything (or at least you have to do something clever to recover). So you could try to get round this, for example, by saving the same data three times. Now one bit failure can be corrected, since you can take the majority vote on each bit - in fact, this way you can correct many bit failures, provided bit number i doesn't fail in more than one of the copies. But in practice, if you think many bits may fail, then it's likely that for some i, bit i does fail in more than one copy and you lose everything. Usually you assume bit failures occur independently at random, and you want to be 99.99% sure that you will not lose data (or something similar, in practice with more nines). This save-three-times scheme isn't great protection (the bit failure probability has to be very small or you lose data) and it takes a lot of space. You can do better by doing something clever: using some 'code' which transforms your binary string into some longer codeword in a more complicated way, with the property that any two codewords are very different: to get from one to the other you have to change a lot of bits, so after a small number of failures you can still recognise which codeword must have been stored.
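
To make the save-three-times idea at the start of that paragraph concrete, here is a minimal Python sketch (my own illustration, nothing from the paper) of a 3-copy repetition code with majority-vote decoding:

    # A 3-copy repetition code with majority-vote decoding (illustration only).

    def encode_repetition(bits, copies=3):
        # Keep `copies` identical copies of the bit string.
        return [list(bits) for _ in range(copies)]

    def decode_repetition(stored):
        # Recover each bit position by majority vote across the copies.
        copies, length = len(stored), len(stored[0])
        return [1 if sum(copy[i] for copy in stored) > copies // 2 else 0
                for i in range(length)]

    data = [1, 0, 1, 1, 0]
    stored = encode_repetition(data)
    stored[0][2] ^= 1                         # bit 2 fails in the first copy
    assert decode_repetition(stored) == data  # majority vote corrects it

As long as no bit position fails in two or more copies, the vote recovers the original string; once the same position fails twice, the vote can go the wrong way.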

This works if you can find a suitable set of codewords. But it's practically useless if the only way you can do that is some brute-force search: then your computer takes a century (or more) to set up your storage system. So you need some clever construction that can be computed fast; usually this comes from some piece of linear algebra over finite fields. What you want is to minimise both the disc space used (given the failure protection you want) and the time taken to encode and decode. Various sorts of clever constructions are well known these days; the problem is essentially solved.
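
To give a flavour of what "linear algebra over finite fields" means here, a sketch (again my illustration, not the paper's construction) of the classic Hamming(7,4) code over GF(2): four data bits become a seven-bit codeword, encoding and decoding are just small matrix multiplications mod 2, and any single flipped bit can be located and corrected.

    # Classic Hamming(7,4) code over GF(2). Illustration only.
    import numpy as np

    G = np.array([[1,0,0,0, 1,1,0],    # generator matrix [I | P]
                  [0,1,0,0, 1,0,1],
                  [0,0,1,0, 0,1,1],
                  [0,0,0,1, 1,1,1]])
    H = np.array([[1,1,0,1, 1,0,0],    # parity-check matrix [P^T | I]
                  [1,0,1,1, 0,1,0],
                  [0,1,1,1, 0,0,1]])

    def encode(data4):
        return (np.array(data4) @ G) % 2

    def decode(word7):
        word7 = np.array(word7)
        syndrome = (H @ word7) % 2
        if syndrome.any():
            # the syndrome equals the column of H at the failed position
            bad = next(i for i in range(7) if np.array_equal(H[:, i], syndrome))
            word7[bad] ^= 1
        return [int(b) for b in word7[:4]]

    codeword = encode([1, 0, 1, 1])
    codeword[5] ^= 1                   # one bit fails
    assert decode(codeword) == [1, 0, 1, 1]

Real storage codes are much larger and more elaborate, but the pattern is the same: fast matrix arithmetic over a finite field rather than any brute-force search.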

Now there is a new problem: your failures are no longer independent. If a disc fails, you lose the entire disc. If a sector fails, then other sectors on the disc are unaffected but that sector is gone. The classic codes don't work well (or at least they aren't known to work well) in this sort of model, because they assume that the failure of lots of bits is unlikely wherever those bits happen to be, and would happily place an entire codeword in one sector of one disc - lose that sector and the data is gone. You need a code which will cope well with this sort of thing. RAID 5 does this to a certain limited extent, but maybe even that is not good enough in (some) applications. Because you put lots of extra requirements on your code, you have to be even more clever in coming up with one.

Before this paper, there was RAID 6 (which is a more clever version of the save-multiple-times idea) and some brute-force schemes which showed that such codes are possible but which were practically useless. This paper gives something you could use in practice; it's not too wasteful and the construction is fairly quick. But you would only do so if RAID 5 failures are a frequent complaint for you, and this paper's method only gives you a moderate amount of extra protection, so it won't help you that much. What you would really like is a more general scheme that lets you specify how many discs you can allow to fail before data loss. That doesn't exist yet.
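
For reference, the basic RAID 5 trick mentioned above (this sketch is only the standard parity idea, not the paper's new construction) is that the parity block is the XOR of the data blocks in a stripe, so any one lost disc's block can be rebuilt from the survivors:

    # The standard RAID 5 parity idea (not this paper's construction):
    # the parity block is the XOR of the data blocks, so any one lost
    # disc's block can be rebuilt by XORing all the surviving blocks.
    from functools import reduce

    def xor_blocks(blocks):
        # Bytewise XOR of equal-length blocks.
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    data_blocks = [b"disc0 data", b"disc1 data", b"disc2 data"]  # one stripe
    parity = xor_blocks(data_blocks)                             # kept on a fourth disc

    # Disc 1 dies: rebuild its block from the other data blocks plus parity.
    rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
    assert rebuilt == data_blocks[1]

That protects against one disc failing per stripe; the point of the paper is to get meaningfully more protection than this without the cost of simply storing everything extra times.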
