Re: Does anyone have stats on how often hash collisions occur in zfs or netapp?
A few of my friends on NetApp's dedup team used to say that it is extremely rare to hit a hash collision, where two blocks have matching fingerprints but different contents and so cannot actually be deduplicated. Still, NetApp didn't trust the fingerprints (hashes) alone. If you trust the hashes and dedup two totally different blocks just because their hashes match, you corrupt the file system: the file's content is silently wrong, the file system can never recover from it, and only the application knows that the file is corrupted.
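To make "not trusting the fingerprints" concrete, here is a minimal sketch of verify-before-dedup: the hash is used only to find candidate blocks, and a byte-for-byte comparison decides whether to share one. This is an illustration, not NetApp's or ZFS's actual code; the class `DedupStore`, its methods, and the choice of SHA-256 are all my own assumptions.

```python
import hashlib


class DedupStore:
    """Toy block store: fingerprints locate candidates, bytes decide (hypothetical)."""

    def __init__(self, fingerprint=None):
        # Assumption: SHA-256 as the default fingerprint; any hash function works.
        self.fingerprint = fingerprint or (lambda data: hashlib.sha256(data).digest())
        self.blocks = []    # block id -> block contents
        self.by_hash = {}   # fingerprint -> list of block ids with that fingerprint

    def write(self, data: bytes) -> int:
        """Store a block and return its id, sharing an existing block only if
        the contents are byte-for-byte identical."""
        candidates = self.by_hash.setdefault(self.fingerprint(data), [])
        for block_id in candidates:
            # Fingerprints match; verify the actual contents before deduping.
            if self.blocks[block_id] == data:
                return block_id
        # Either a new fingerprint or a genuine collision: keep a separate copy.
        self.blocks.append(data)
        block_id = len(self.blocks) - 1
        candidates.append(block_id)
        return block_id

    def read(self, block_id: int) -> bytes:
        return self.blocks[block_id]
```

With a deliberately weak fingerprint (say, only the first byte of the block), two different blocks collide but still get separate ids, so a collision costs a missed dedup rather than a corrupted file. The price is an extra read and compare on every fingerprint match.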
But I also hear the other side of the argument: because the case "hashes match but the blocks' contents don't" is extremely rare, and the probability of hitting it is smaller than the probability of disk errors, it is OK to trust the fingerprints/hashes. I don't buy that argument. File systems can usually recover from disk failures, but they cannot recover from a corruption induced by trusting the fingerprints.
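For anyone who wants to put a number on "extremely rare", the usual back-of-the-envelope estimate is the birthday approximation: with n distinct blocks and a b-bit hash, the chance of at least one collision is roughly 1 - exp(-n(n-1) / 2^(b+1)). The sketch below assumes the hash behaves like a uniformly random function and that nobody is deliberately crafting collisions; the function name and the example sizes are hypothetical, and I am not supplying a disk error rate to compare against.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday approximation of the chance that at least two of
    num_blocks distinct blocks share a hash_bits-bit fingerprint.

    Assumes an ideal, uniformly distributed hash (an assumption, not a
    property of any particular product).
    """
    pairs = num_blocks * (num_blocks - 1) // 2   # number of block pairs
    expected_collisions = pairs / (2 ** hash_bits)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-expected_collisions)


if __name__ == "__main__":
    # Hypothetical example: 2**35 blocks (128 TiB of 4 KiB blocks), 256-bit hash.
    print(collision_probability(2 ** 35, 256))
```

However small the result, it only describes how often the bad case happens, not what it costs when it does, which is the point above.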