"read up the specs on your favorite hard drives and look at the specs for error rates. "

Many people are fully aware of hard drive failure rates. That's one of the reasons some people choose disk mirroring, or some flavour of RAID: to protect valuable data against random hardware failure. You usually can't predict exactly when a hardware failure will happen, but you can be confident that one will happen sooner or later.

"if the hash collision was even more unlikely [...]"

If the de-dupe system doesn't handle hash collisions in a sensible way and someone is unlucky enough to hit one, there is no recovery: the original data is gone forever, and the data allegedly on "disk" stays wrong forever. An actual disk error, which is inevitable sooner or later, doesn't need to matter: precautions are easy (RAID etc.), and in desperation there are data recovery services that can sometimes help.

You're comparing apples with oranges: random hardware problems with "defective by design" systems.

Badly done de-dupe may work for Google's search engine data and a handful of other applications where nobody cares much if the results are occasionally a bit (or a lot) wrong.

Badly done de-dupe has no place in run-of-the-mill applications, where it matters that the data you get back from your storage subsystem really is the data you originally put in.
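
For what it's worth, here is a minimal sketch of what I mean by handling collisions "in a sensible way": when a new block's hash matches an existing one, compare the actual bytes before deciding it is a duplicate. Everything in it is my own illustration, not any real product's design: the class name VerifyingDedupeStore, the (digest, index) key, and the deliberately terrible weak_hash (first byte only), which is there purely so a collision is easy to provoke.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[bytes, int]  # (digest, position within that digest's bucket)


class VerifyingDedupeStore:
    """Hypothetical block store that verifies content on every hash match."""

    def __init__(self, hash_fn: Optional[Callable[[bytes], bytes]] = None):
        # Default to SHA-256; a weaker function can be passed in for testing.
        self._hash_fn = hash_fn or (lambda block: hashlib.sha256(block).digest())
        self._buckets: Dict[bytes, List[bytes]] = {}

    def put(self, block: bytes) -> Key:
        digest = self._hash_fn(block)
        bucket = self._buckets.setdefault(digest, [])
        for index, stored in enumerate(bucket):
            if stored == block:  # byte-for-byte check: a true duplicate
                return digest, index
        # Either a new digest or a collision: keep the new block as well.
        bucket.append(block)
        return digest, len(bucket) - 1

    def get(self, key: Key) -> bytes:
        digest, index = key
        return self._buckets[digest][index]


# Assumption for the demo only: a hash so weak that collisions are trivial.
weak_hash = lambda block: block[:1]

store = VerifyingDedupeStore(hash_fn=weak_hash)
k1 = store.put(b"apple")
k2 = store.put(b"avocado")  # same weak hash as "apple", different content
k3 = store.put(b"apple")    # genuine duplicate, stored only once
```

With a store that trusts the hash alone, "avocado" would silently come back as "apple". Here both blocks survive, and the price is an extra read and compare whenever a hash matches.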

