
This storage startup dedupes what to do what? How?


OK, so I'm going back on my promise to not write any more

Well then, you're going to have to start doing your own research. This is the last time I'm going to spoon-feed you.

For your argument to work you'd have to explain why subspacing preferentially removes more colliding inputs, proportionately speaking.

We don't care about "proportionally"; we care about "absolutely".

For a given hash function and block size, there is a finite number of blocks that will collide under that hash. By removing some of that finite set, we have fewer potential collisions. It is that simple.
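A minimal sketch of the absolute-count point, assuming a toy setup that is not any real dedupe system: SHA-256 truncated to a hypothetical 12 bits (`HASH_BITS`) over hypothetical 2-byte blocks (`BLOCK_LEN`), small enough to enumerate every possible block, with printable ASCII standing in for "textual data". The helper names `toy_hash` and `colliding_pairs` are illustrative only.

```python
import hashlib
from collections import Counter
from itertools import product
from math import comb

HASH_BITS = 12  # hypothetical toy digest width, far smaller than any real dedupe hash
BLOCK_LEN = 2   # hypothetical 2-byte blocks, so the whole input space can be enumerated


def toy_hash(block: bytes) -> int:
    """Keep only the top HASH_BITS bits of SHA-256 so collisions are easy to count."""
    digest = int.from_bytes(hashlib.sha256(block).digest(), "big")
    return digest >> (256 - HASH_BITS)


def colliding_pairs(blocks) -> int:
    """Count unordered pairs of distinct blocks that share a toy_hash value."""
    buckets = Counter(toy_hash(b) for b in blocks)
    return sum(comb(n, 2) for n in buckets.values())


# Every possible block versus blocks made only of printable ASCII (0x20..0x7E).
all_blocks = [bytes(t) for t in product(range(256), repeat=BLOCK_LEN)]
text_blocks = [bytes(t) for t in product(range(0x20, 0x7F), repeat=BLOCK_LEN)]

full = colliding_pairs(all_blocks)
text = colliding_pairs(text_blocks)
print(f"all {len(all_blocks)} blocks:  {full} colliding pairs")
print(f"text {len(text_blocks)} blocks: {text} colliding pairs")
```

Because the text blocks are a subset of all blocks, every colliding pair among them is also a colliding pair in the full set, so the absolute count can only go down; the toy sizes just make that countable.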

It is clear that collisions are a problem in the general case: we know the number of collisions for any particular block by way of the Counting Argument. The only way limiting the input set to textual data could fail to improve this is if the hash were designed to maintain the same bit spectrum for similar input data, which would make it a very poor hash. And so, the number of possible colliding blocks is necessarily reduced by limiting the input dataset. Like I said.
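The Counting Argument can be written out as a worked equation. The symbols and figures here are illustrative assumptions, not from the thread: b-bit blocks, a k-bit digest, a hash that spreads inputs evenly, and, for the example, 4 KiB blocks with SHA-256 and "text" taken as printable ASCII (95 values per byte). By pigeonhole at least one digest has 2^(b-k) or more preimages; with an even hash, every digest has roughly that many.

```latex
% b = block size in bits, k = digest size in bits, S = restricted input set
\[
\frac{2^{b}}{2^{k}} = 2^{\,b-k}\ \text{blocks per digest (all inputs)},
\qquad
\frac{|S|}{2^{k}}\ \text{blocks per digest (inputs restricted to } S\text{)}
\]
% Example: b = 32768 (4 KiB), k = 256 (SHA-256), S = printable-ASCII blocks
\[
2^{32768-256} = 2^{32512}
\quad\text{vs.}\quad
\frac{95^{4096}}{2^{256}} \approx \frac{2^{26910}}{2^{256}} = 2^{26654}
\]
```

Since S is a subset of all 2^b blocks, |S|/2^k is smaller than 2^(b-k) whenever |S| < 2^b.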

And from this point on, you're on your own.


Biting the hand that feeds IT © 1998–2020