This storage startup dedupes what to do what? How?

Anonymous Coward

Like Vic said...

If the hash is different between two blocks, you know (with 100% certainty) that the blocks are different.

If the hash is the same for two blocks, you have no certainty whether the data is the same in the two blocks.

To be certain whether the data is the same you have to look at the data. All of it - or at least until you find a difference, at which point you know the blocks are different. If you reach the end of a block without finding a difference, THEN you *know* the blocks are the same.
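To make that concrete, here is a minimal Python sketch. The `weak_hash` function is a deliberately feeble 8-bit hash I made up so that a collision is easy to show (no real product uses it), and `blocks_identical` is a hypothetical helper name. The point is only that a matching hash and matching data are two different things.

```python
def weak_hash(block: bytes) -> int:
    """Deliberately weak 8-bit hash (illustration only): sum of bytes mod 256."""
    return sum(block) % 256


def blocks_identical(a: bytes, b: bytes) -> bool:
    """Compare two blocks byte by byte, stopping at the first difference."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # found a difference: the blocks differ
    return True  # reached the end without a difference: the blocks are the same


block_a = b"ab"
block_b = b"ba"

same_hash = weak_hash(block_a) == weak_hash(block_b)  # the hashes collide
same_data = blocks_identical(block_a, block_b)        # the data does not match
```

A real de-dup product would use a far stronger hash, which makes a collision far less likely. The comparison is still the only step that looks at the data itself.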

Any claim beyond that is statistical mumbo jumbo or vendor hype or both.

Hash tables themselves are old technology, dating back at least to the era when 64 kB was all a computer could address. That's assuming the storage industry hasn't redefined the meaning of hashing, of course.

Hash tables were primarily used for fast lookup: do I already have this value in my table or (more usually) not? There's plenty of material out there describing how they work in general.
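For anyone who hasn't met one, a classic hash table lookup looks roughly like the sketch below. `TinyHashSet`, its bucket count and its toy hash are all invented for illustration. Note that even here the hash only picks the bucket; the answer comes from comparing the stored values themselves.

```python
class TinyHashSet:
    """Toy bucketed hash table for membership tests (illustration only)."""

    def __init__(self, n_buckets: int = 8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, value: str) -> int:
        # Toy hash: sum of the encoded bytes, reduced to a bucket number.
        return sum(value.encode()) % len(self.buckets)

    def add(self, value: str) -> None:
        bucket = self.buckets[self._index(value)]
        if value not in bucket:  # compares the actual values
            bucket.append(value)

    def __contains__(self, value: str) -> bool:
        # The hash narrows the search to one bucket;
        # the real values in that bucket are then compared.
        return value in self.buckets[self._index(value)]


table = TinyHashSet()
table.add("ab")

has_ab = "ab" in table  # stored, so it is found
has_ba = "ba" in table  # same bucket as "ab", but a different value
```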

Knowledge of the hash value and the hash algorithm is not in general sufficient to re-create the input data.

Hashing has been repurposed by the de-dup industry to answer one question: do I already have a matching chunk of data in the store? If the hash is different, as it frequently will be, it is 100% certain that this data block is not there already (otherwise its hash would already exist in the lookup table). So I have to write the block to persistent storage, enter the hash value in the lookup table, and make a note of where I wrote it so I can read the data back later when something else has that hash value. (Yes, I know that's massively oversimplified, but it doesn't matter here.)
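The hash-miss path described above might be sketched like this. The `storage` list standing in for persistent storage, the `lookup` dict and the `write_block` name are all my own assumptions, and SHA-256 from Python's `hashlib` is just one plausible choice of hash, not a claim about what any vendor uses.

```python
import hashlib

storage = []  # stand-in for persistent storage (assumption for this sketch)
lookup = {}   # hash value -> location of the block that produced it


def write_block(block: bytes):
    """Return (location, was_written) for a block offered to the store."""
    h = hashlib.sha256(block).hexdigest()
    if h not in lookup:
        # Hash never seen before: this block is certainly not in the store.
        storage.append(block)
        lookup[h] = len(storage) - 1
        return lookup[h], True
    # Hash already present: this location is only a candidate match,
    # nothing here has checked that the data is the same.
    return lookup[h], False


loc_hello, wrote_hello = write_block(b"hello")
loc_world, wrote_world = write_block(b"world")
loc_again, wrote_again = write_block(b"hello")
```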

If the hash is the same, the matching data *may* be there already, but only a fool would think that they *know* the data is the same, or that they can re-use the existing data without fear simply because the hash matches. Sadly the IT industry has no shortage of fools.
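A store that refuses to trust the hash alone could look something like the sketch below. `VerifyingDedupStore` is a hypothetical name, the in-memory list stands in for persistent storage, and the feeble sum-of-bytes hash in the demo is there only to force a collision on demand. On a hash match it reads the stored block back and compares all of it before re-using it.

```python
import hashlib


class VerifyingDedupStore:
    """Toy de-dup store that verifies the data on every hash match."""

    def __init__(self, hash_fn=None):
        # Default to SHA-256; the demo swaps in a weak hash to force a collision.
        self.hash_fn = hash_fn or (lambda b: hashlib.sha256(b).digest())
        self.blocks = []  # stand-in for persistent storage
        self.index = {}   # hash value -> locations of blocks with that hash

    def write(self, block: bytes) -> int:
        h = self.hash_fn(block)
        for loc in self.index.get(h, []):
            if self.blocks[loc] == block:  # full comparison of the data
                return loc                 # verified duplicate: safe to re-use
        # New hash, or same hash with different data: store it separately.
        self.blocks.append(block)
        loc = len(self.blocks) - 1
        self.index.setdefault(h, []).append(loc)
        return loc

    def read(self, loc: int) -> bytes:
        return self.blocks[loc]


store = VerifyingDedupStore(hash_fn=lambda b: sum(b) % 256)
loc_ab = store.write(b"ab")
loc_ba = store.write(b"ba")      # same weak hash as b"ab", different data
loc_ab_dup = store.write(b"ab")  # genuine duplicate
```

The cost is an extra read and compare on every hash match, which is exactly the work a hash-only design skips.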


Biting the hand that feeds IT © 1998–2020