This storage startup dedupes what to do what? How?


By your logic, git having less entropy in the source should imply less entropy in the output hashes, which would in turn imply more collisions.

No, not at all.

The set of possible inputs that could generate a given hash is much reduced by the requirement that the input in question is source code (with a limited character set). The same cannot be said for the general case of a file.
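To put rough numbers on that, here is a back-of-envelope sketch rather than anything rigorous. It assumes a 160-bit digest (the size of git's SHA-1), an idealised hash that spreads inputs evenly, and a hypothetical 96-symbol printable alphabet standing in for "source code". The function names are my own, purely for illustration.

```python
from fractions import Fraction

HASH_BITS = 160  # assumption: a SHA-1-sized digest, as git uses


def expected_preimages(alphabet_size, length, hash_bits=HASH_BITS):
    """Average number of inputs of exactly `length` symbols that land on
    one hash value, assuming the hash spreads inputs evenly."""
    return Fraction(alphabet_size ** length, 2 ** hash_bits)


def restriction_factor(restricted_alphabet, length, full_alphabet=256):
    """How much smaller the candidate input set gets when each symbol is
    drawn from `restricted_alphabet` values instead of `full_alphabet`."""
    return Fraction(restricted_alphabet, full_alphabet) ** length


if __name__ == "__main__":
    for n in (20, 100, 1000):
        general = expected_preimages(256, n)   # arbitrary bytes
        source = expected_preimages(96, n)     # hypothetical source-code alphabet
        print(n, float(restriction_factor(96, n)), source < general)
```

This only counts candidate inputs of a given length under the stated assumptions. The 96-symbol alphabet is a stand-in, not a measurement of real source files.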

this risk is "vanishingly small enough"

And I'm saying that, in the general case, this is simply not true. Only if you constrain your input set, as is the case with git, can you make it so.

