Re: For corruption checking
"For corruption checking a 32-bit CRC is sufficient."
That depends on how many files you have to check. I once got 195 false positives for 1.32E6 files with CRC32 (same CRC32, but the differing file sizes gave them away).
IOW, 1.32E6 files is apparently already enough, relative to the 2^32 possible CRC values, to get collisions by pure chance. If somebody could show the math that this wasn't a freak accident but was to be expected, I would be grateful.
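For what it's worth, a rough birthday-problem estimate suggests this was to be expected rather than a freak accident. The sketch below assumes the checksums behave like uniformly random values and that all files have distinct contents; the function name is made up for illustration. With n files and a b-bit checksum, the expected number of colliding pairs is roughly n(n-1)/2 divided by 2^b.

```python
def expected_colliding_pairs(n_files, hash_bits):
    """Expected number of file pairs sharing a checksum by pure chance.

    Assumes (hypothetically) uniformly distributed checksums and
    distinct file contents: each of the n*(n-1)/2 pairs collides
    with probability 1 / 2**hash_bits.
    """
    pairs = n_files * (n_files - 1) / 2
    return pairs / 2 ** hash_bits


n = 1_320_000  # file count from the post

crc32_pairs = expected_colliding_pairs(n, 32)   # about 203 pairs
md5_pairs = expected_colliding_pairs(n, 128)    # on the order of 1e-27

print(f"CRC32: {crc32_pairs:.1f} expected colliding pairs")
print(f"MD5:   {md5_pairs:.2e} expected colliding pairs")
```

About 203 expected colliding pairs is in the same ballpark as the 195 false positives observed, and the vanishingly small MD5 figure fits the clean second run. Whether the 195 counted pairs or individual files does not change the order of magnitude.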
I repeated the calculation with MD5 (which took only 5 min longer, for a total runtime of about 3 hours) and got no false positives this time. So I learned that choosing CRC32 for speed reasons was a false economy in my case.
I agree that CRC32 + filesize might be sufficient, provided malicious tampering can be ruled out.
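A minimal sketch of what CRC32 + filesize could look like for finding duplicate candidates. This is not the script I actually used; the function names and chunk size are assumptions for illustration. Files are grouped by the pair (size, CRC32), so two files only count as candidates if both values match.

```python
import os
import zlib
from collections import defaultdict


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of a file's contents, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc & 0xFFFFFFFF


def candidate_duplicates(paths):
    """Group paths whose file size and CRC32 both match.

    Returns only groups with more than one member. These are
    candidates, not proof: a byte-for-byte comparison (for example
    filecmp.cmp(a, b, shallow=False)) can confirm them.
    """
    groups = defaultdict(list)
    for path in paths:
        key = (os.path.getsize(path), crc32_of_file(path))
        groups[key].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Calling candidate_duplicates() on a list of scan paths returns lists of files that agree in both size and checksum. This guards against accidental collisions only; CRC32 is easy to forge deliberately.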
BTW: My task was to find duplicate document scans resulting from a user error.