Crowd-sourcing interpretation of IBM RAID 5 extension paper

jimplank

Please read the FAST paper for context

Hi Chris -- this is Jim Plank, from the University of Tennessee. I know a few of your readers have posted this already, but I'll reaffirm. The best way to put this paper into context is to read our FAST paper, "SD Codes: Erasure Codes Designed for How Storage Systems Really Fail". The FAST web site has the paper, talk slides and talk video all posted at https://www.usenix.org/conference/fast13/tech-schedule/fast-13-program

The idea is to design erasure codes that tolerate the loss of entire disks, plus additional sectors, by devoting m disks and s sectors per stripe to coding. This is as opposed to RAID 6, which devotes two whole disks to coding, or Reed-Solomon coding, which devotes m out of n whole disks to coding. The intent of the SD codes is to devote less space to coding, but still to tolerate useful failure scenarios (m disks + s sectors). Whether or not SD codes are truly useful, of course, depends on a lot of factors, which many of your readers have made very insightful comments on. However, the research that Mario and I have been doing is to present them to the storage community so that they may be considered as useful alternatives to standard erasure codes.
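To make the space argument concrete, here is a rough sketch that counts how much of a stripe goes to coding. It assumes a stripe of n disks with r sectors per disk; r, the function name `coding_fraction`, and the sample values n = 8 and r = 16 are illustrative assumptions, not figures from the paper.

```python
from fractions import Fraction


def coding_fraction(n, r, m, s=0):
    """Fraction of a stripe's sectors devoted to coding.

    n: disks in the stripe
    r: sectors per disk within one stripe (assumed parameter)
    m: whole disks devoted to coding
    s: additional individual sectors devoted to coding
    """
    total = n * r
    coding = m * r + s
    if n <= 0 or r <= 0 or m < 0 or s < 0 or coding >= total:
        raise ValueError("coding sectors must leave room for data")
    return Fraction(coding, total)


# Hypothetical stripe geometry, chosen only for illustration.
n, r = 8, 16

raid6 = coding_fraction(n, r, m=2)      # two whole disks of coding
sd = coding_fraction(n, r, m=1, s=2)    # one disk plus two sectors of coding

print(f"RAID 6 coding overhead: {raid6} ({float(raid6):.1%})")
print(f"SD (m=1, s=2) overhead: {sd} ({float(sd):.1%})")
```

The two layouts do not tolerate the same failures: the m=1, s=2 layout covers one disk plus two sectors per stripe rather than any two whole disks, so the smaller overhead only pays off if that failure scenario matches how the system really fails.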

Reed-Solomon codes have a general construction that makes them really practical. Unfortunately, the SD codes only have general constructions in limited cases (s = 1). Mario proved those in his PMDS paper, which, like this TR, is also a difficult read. For the other cases, I threw hardware at the problem and empirically "discovered" SD codes for them -- those are summarized in Figure 4 from the FAST paper.

What Mario has done in the tech report that you're reading is derive a general construction for the case when m=1 and s=2, and prove that it's SD. It's not an easy read, but for good reasons, Mario wants to document it, which is why he's written the tech report. I don't really think he meant for anyone to read it, except for when we cite it either in the journal follow-on, or in the software that I'll post. Perhaps a coding theorist may want to corroborate or consult the proof.

A follow-up to the FAST paper has been recommended to ACM Transactions on Storage, which we're submitting tomorrow. If you want, I'll make a UTK TR out of it and post it next week. In it, Mario has proven constructions for m=1, s=2 and m=2, s=2. I've derived a construction for m=3, s=2. Mario hasn't proved that it's SD yet, but I've demonstrated that it works for all the cases I've tested. We don't include the proofs in the paper (they'll probably be put into an IBM TR -- ha ha), because they make the paper too hard to read.

I know Mario's work is hard to read, but the work he's done with this (especially these follow-on constructions and proofs) is brilliant. It has been a sheer pleasure working with him on it, and I'm looking forward to continuing our collaboration on coding & storage.

Sincerely -- Jim Plank
