Reply to post: Re: I'm a little confused

This storage startup dedupes what to do what? How?

Frumious Bandersnatch Silver badge

Re: I'm a little confused

The analogy put me in mind of dictionary-based compression (including LZW) too, as well as Merkle trees (for the sync problem).

The idea of re-using dictionaries sounds good, but you need good algorithms and data structures to make it tractable. You can't just blindly try compressing every block against every dictionary to find the best compression ratio: with N blocks and M dictionaries that's N × M compression runs, which would take forever. I think that some clever tree-based/recursive construction (probably along with Bloom filters or maybe Golomb-Rice coding) and some dynamic programming could make this work.
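To make the "don't try every dictionary" point concrete, here is a rough sketch of one way it might go. It is only my guess, not anything the startup has described. Each dictionary gets a cheap fingerprint (the set of CRC32 hashes of its 8-byte windows), a block is matched to the dictionary with the largest fingerprint overlap, and only that one dictionary is used for the actual compression. The fingerprint set is a simple stand-in for the Bloom filter or tree construction mentioned above. The names `shingles`, `pick_dictionary`, `compress_with` and `decompress_with`, and the two sample dictionaries, are made up for illustration. The only library feature relied on is the preset-dictionary (`zdict`) support in Python's `zlib`.

```python
import zlib


def shingles(data: bytes, k: int = 8) -> set:
    """Cheap fingerprint: CRC32 of every k-byte window in the data."""
    return {zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)}


def pick_dictionary(block: bytes, fingerprints: dict) -> str:
    """Pick the dictionary whose fingerprint shares the most windows with the block.

    This costs one set intersection per dictionary, not one compression run.
    """
    block_fp = shingles(block)
    return max(fingerprints, key=lambda name: len(block_fp & fingerprints[name]))


def compress_with(block: bytes, zdict: bytes) -> bytes:
    """Deflate a block against a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(block) + c.flush()


def decompress_with(payload: bytes, zdict: bytes) -> bytes:
    """Inflate a payload; the same dictionary must be supplied again."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(payload) + d.flush()


# Hypothetical shared dictionaries (illustrative data only).
dictionaries = {
    "html": b"<html><head><title></title></head><body><div class=></div></body></html>",
    "logs": b"2020-01-01 INFO request handled in ms status=200 path=/api/v1/ ",
}
fingerprints = {name: shingles(d) for name, d in dictionaries.items()}

block = b"2020-03-14 INFO request handled in 12 ms status=200 path=/api/v1/users"
chosen = pick_dictionary(block, fingerprints)
packed = compress_with(block, dictionaries[chosen])
plain = zlib.compress(block, 9)  # baseline: same block, no dictionary

print(chosen, len(block), len(plain), len(packed))
```

A real system would need something smarter than a linear scan over fingerprints once there are millions of dictionaries, which is presumably where the tree-based construction and the dynamic programming would come in.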


Biting the hand that feeds IT © 1998–2020