This storage startup dedupes what, to do what? And how?


By your logic, git having less entropy in the source should imply less entropy in the output hashes, which would in turn imply more collisions.

No, not at all.

The set of possible inputs which could generate a given hash is much reduced by the requirement that the input in question is source code (with a limited character set). The same cannot be said for the general case of a file.

this risk is "vanishingly small enough"

And I'm saying that, in the general case, this is simply not true. Only if you constrain your input set - such as is the case with git - can you make it so.
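
To put rough numbers on "vanishingly small", here is a minimal sketch of the standard birthday-bound approximation. It assumes an ideal hash whose outputs are uniformly distributed, which is an assumption and not something shown for real data. The function name collision_probability and the item counts are hypothetical, picked only for illustration. It shows how the estimate depends on how many distinct inputs you store, not only on the hash width.

```python
import math

def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items distinct inputs
    share a hash value, assuming an ideal, uniformly distributed hash.

    Birthday-bound approximation: p ~= 1 - exp(-n * (n - 1) / 2^(b + 1)).
    """
    exponent = -num_items * (num_items - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

# Hypothetical sizes: a modest repository versus a very large block store,
# both using a 160-bit hash.
for label, n in [("10^6 objects", 10**6), ("10^15 blocks", 10**15)]:
    print(label, collision_probability(n, 160))
```

Under that idealised model the estimate grows roughly with the square of the number of stored items, so the size of the input set matters as much as the hash width.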


