I've worked with a lot of SAN disk, at a lot of firms. I've deployed both dedupe primary storage and backup systems that dedupe data on the fly — block- and file-level dedupe.
The average savings for firms on SAN is about 30% (local storage and DASD are so cheap it's not worth the cost of the fiber controllers, let alone $10K/TB licensed SAN disk, just to have them deduped). Yep: not 7:1, more like 1.3:1.
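To make the ratio-versus-percentage comparison concrete, here's a quick sketch of the arithmetic (the conversion only, not anyone's actual figures):

```python
# Back-of-envelope conversion between a dedupe ratio and percent of disk saved.
# The ratios below are illustrative, not vendor measurements.

def ratio_to_savings(ratio: float) -> float:
    """A dedupe ratio of R:1 means R logical TB fit in 1 physical TB,
    so the fraction of disk saved is 1 - 1/R."""
    return 1.0 - 1.0 / ratio

def savings_to_ratio(savings: float) -> float:
    """Inverse: saving fraction s of your disk implies a ratio of 1/(1-s) : 1."""
    return 1.0 / (1.0 - savings)

print(f"7:1   -> {ratio_to_savings(7.0):.0%} of disk saved")  # the marketing number
print(f"1.3:1 -> {ratio_to_savings(1.3):.0%} of disk saved")  # real-world territory
```

The gap is the whole argument: a 7:1 brochure number implies saving the vast majority of your disk, while the ratios I actually see translate to savings in the 20–30% range.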
Fact is, when we're talking TBs of data, we're talking record data, scanned images, e-mail, etc.; it's highly unique. The true duplicates in circulation (many people, same file) are typically sent via e-mail, which natively dedupes (except when crossing servers, and even Exchange 2010 now handles that cross-server). A large enterprise with a DRM system sends links to begin with, not raw files. Most of the duplication is on local workstation HDDs, not servers.
A 30% savings is still significant when we're talking multi-dozen-TB savings, especially on SAN, but the license cost of SAN in the first place, versus tying together a bunch of smaller, less advanced, dedicated disk clusters (server- or node-specific), is so much higher that it generally eats the cost savings. You buy SAN for IOPS and for reliability, not because dedupe can save you money. Adding dedupe licensing on top of the SAN cost rarely breaks even, except for user-oriented file systems.
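The break-even question above boils down to one comparison: does the disk you avoid buying outweigh the dedupe license uplift? A minimal sketch, with all figures as hypothetical placeholders (no vendor's actual pricing):

```python
# Illustrative break-even check for adding dedupe licensing to a SAN.
# Every number here is a made-up placeholder, not a quote from any vendor.

def dedupe_breaks_even(logical_tb: float,
                       cost_per_tb: float,       # licensed SAN disk, $/TB
                       savings_fraction: float,  # e.g. 0.30 for 30% disk saved
                       license_premium: float    # dedupe uplift, e.g. 0.20 = +20%
                       ) -> bool:
    """True if the disk cost avoided exceeds the dedupe license cost."""
    base_cost = logical_tb * cost_per_tb
    disk_saved = base_cost * savings_fraction
    license_cost = base_cost * license_premium
    return disk_saved > license_cost

# 50 TB of user file data at $10K/TB, 30% dedupe, hypothetical 20% license uplift:
print(dedupe_breaks_even(50, 10_000, 0.30, 0.20))  # savings beat the license
# Same array, but VM/database data that only dedupes 10%:
print(dedupe_breaks_even(50, 10_000, 0.10, 0.20))  # license eats the savings
```

Note that the capacity and $/TB cancel out: under this simple model, dedupe pays off exactly when the savings fraction beats the license premium, which is why user file shares clear the bar and databases usually don't.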
Since e-mail already dedupes, DRM already does as well, and so do most backup systems (in the D2D backup world, anyway; not so much tape), the cost of adding dedupe across an entire SAN is often not worth it. For this reason, I've been suggesting that most of my clients buy two or more SAN infrastructures: a high-performance, high-reliability SAN for VMs, databases, and app servers, which have little to dedupe, plus a smaller SAN without the advanced performance, but with dedupe, for file servers hosting user data.
If the license cost of dedupe were cheap, say 5% extra, or better yet a function of the disk savings (save 30% disk, pay 10% more; save only 3% disk, pay 1% more), then I could more openly agree.
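That savings-proportional pricing idea can be sketched in a few lines. The 3:1 ratio below is just what matches my two examples (30% saved → 10% premium, 3% saved → 1% premium); it's a proposal, not any vendor's actual scheme:

```python
# Sketch of a dedupe license premium that scales with realized savings.
# The one-third factor is chosen only to match the examples in the text
# (30% saved -> 10% premium, 3% saved -> 1% premium); it is hypothetical.

def proportional_premium(savings_fraction: float) -> float:
    """License uplift, as a fraction of base SAN cost: one third of savings."""
    return savings_fraction / 3.0

print(f"save {0.30:.0%} -> pay {proportional_premium(0.30):.0%} more")
print(f"save {0.03:.0%} -> pay {proportional_premium(0.03):.0%} more")
```

Under a scheme like this you can never lose money on the feature, since the uplift always stays well below the disk savings — which is exactly why vendors don't price it this way.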
In mass document systems, especially where you're dealing with storing mostly identical documents, it works, but most of those are moving to "document on demand" models: storing only the unique differences in the database to begin with, plus a few scanned bits like signature lines, and rebuilding the document when it's requested, rather than storing the whole thing locally anyway. That has made SAN-based dedupe unnecessary for a lot of people.
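The "document on demand" idea is simple enough to show in a toy sketch: keep one shared template, store only each document's unique fields, and reconstruct the full text on request instead of storing every copy. This is purely illustrative; real systems are far more elaborate:

```python
# Toy sketch of "document on demand": one shared template, per-document
# unique fields only, full text rebuilt at request time. All names and
# data here are invented for illustration.

TEMPLATE = "Dear {name},\nYour balance is {balance}.\nSignature: {sig}"

# Per-document storage holds just the unique bits, not whole documents.
doc_store = {
    "doc-001": {"name": "A. Smith", "balance": "$120.00", "sig": "<scan-001>"},
    "doc-002": {"name": "B. Jones", "balance": "$75.50",  "sig": "<scan-002>"},
}

def rebuild(doc_id: str) -> str:
    """Reconstruct the full document from the template plus stored fields."""
    return TEMPLATE.format(**doc_store[doc_id])

print(rebuild("doc-001"))
```

The storage win is structural rather than after-the-fact: the shared boilerplate exists exactly once, so there are no duplicate blocks left for a dedupe engine to find.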