As below, so above.
I was over at my parents' place this weekend. Dad is trying to organize his "documents" - which in this case turns out to mean a collection of files going back to the Windows 3.11 days that have been copied, zipped, copied, renamed, put on an external hard drive, copied to a new computer, zipped again, renamed again, copied to a NAS, backed up from the NAS to another machine, renamed again* ... and so on and so on for about 30 years.
An entire afternoon's digging left me with the impression that in about 600 GB of data spread across no fewer than 15 separate disks, mostly various sizes of spinny - some of them even 40-pin IDE** - there was probably about 5 GB of actual stuff, the rest being either duplicates or desperately irrelevant garbage, like the drivers for a scanner he'd owned 15 years ago that had somehow ended up in the "Family Photos (2) - Copy" directory. Probably twice.
I tried all sorts of things that I thought would be clever ways to automate cleaning this up, but after a couple of hours of messing around I came to the conclusion that the only way to clean this mess up is with the Mk1 eyeball and some common sense. It's going to take a human with some context to work out what the hell to do with all of this, and even then it's pretty much inevitable that something that should not be lost is going to get lost.
We like to think that digital data will exist forever and that all the problems of duplication, degradation and physical decomposition don't apply. We are wrong.
In short, none of this should come as a surprise to anyone, and if they're serious about cleaning this up the only realistic answer is an absolute army of civil servants doing it by hand. It'll take decades, and by the time they're done, whatever format they've converted it all into will be obsolete.
* lost, found, and finally recycled into soft compost.
** and finding a working IDE card to read those was a challenge, let me tell you.