Not just useful for data centres
I've had similar thoughts about my little home network. I've got a bunch of Raspberry Pi boards, each with its own external USB drive. I use them mainly for archival data and I've wired up an old PC power supply to some GPIO and a little circuit board so that I can use a software trigger to turn on the drives when needed.
It's nice giving the Pis something undemanding to do, like basic NAS duties, but I've wondered what else they could be doing when the power to the drives is switched on. Some ideas:
* periodically run something like shatag to calculate hashes for any changed files
* feed those reports into a central database that can do on-disk file de-duplication or identify files that only exist on one drive (and thus need more replicas)
* examine archived tar, zip, arj, etc. files and generate a table of contents for each (and perhaps feed those into the hash database)
* re-compress some files so that they take up the least amount of space
* full-text indexing on any documents found
* frequency analysis on documents (then ignore all frequent English words to get an idea of the specific terminology used in that doc; collate to try to cluster similar documents together)
* collect and collate metadata from selected file types (eg, video, audio files) or dir types (eg, git repositories)
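To make the first two ideas a bit more concrete, here's a rough sketch of the hashing and collating steps in Python. It doesn't use shatag itself, just the standard hashlib module, and the function names, drive labels and report format are all made up for illustration. Each Pi would run scan_drive over its own disk and ship the rows to whichever machine holds the central index.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to keep RAM use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_drive(drive_name, root):
    """Yield (digest, drive_name, relative_path) for every regular file under root.

    drive_name is a hypothetical label for the disk, eg, the Pi's hostname.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield hash_file(path), drive_name, str(path.relative_to(root))


def collate(reports):
    """Group report rows by digest: {digest: [(drive, path), ...]}."""
    index = defaultdict(list)
    for digest, drive, rel_path in reports:
        index[digest].append((drive, rel_path))
    return dict(index)


def single_copy(index):
    """Content that lives on only one drive (candidates for more replicas)."""
    return {
        digest: locs
        for digest, locs in index.items()
        if len({drive for drive, _ in locs}) == 1
    }


def same_drive_duplicates(index):
    """Content stored more than once on the same drive (de-duplication candidates)."""
    dupes = {}
    for digest, locs in index.items():
        per_drive = defaultdict(list)
        for drive, rel_path in locs:
            per_drive[drive].append(rel_path)
        repeated = {d: paths for d, paths in per_drive.items() if len(paths) > 1}
        if repeated:
            dupes[digest] = repeated
    return dupes
```

This rehashes every file on each run. Only hashing changed files (which is what shatag is for) would need something extra, such as remembering each file's size and mtime between runs.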
I can see where the map-reduce model could be quite useful in several of the above. The Pis don't have a lot of CPU or RAM for running, eg, complex databases, but they can definitely do plenty of the pre-processing work and leave it to a beefier machine to collate the results.
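As a toy illustration of that split, here's how the word-frequency idea might look in Python. The map step is the cheap part that would run on each Pi, and the reduce step is what the beefier machine would do with the partial results. The stop-word list is a tiny placeholder and the function names are my own invention, so treat it as a sketch rather than a design.

```python
import re
from collections import Counter
from functools import reduce

# Tiny placeholder stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "on", "for", "with", "that", "this",
})

WORD_RE = re.compile(r"[a-z]+")


def map_document(text):
    """Map step (on a Pi): count the non-stop-word terms in one document."""
    words = WORD_RE.findall(text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def reduce_counts(left, right):
    """Reduce step (on the collator): merge two partial term counts."""
    return left + right


def merge_partials(partials):
    """Fold any number of partial counts into one overall count."""
    return reduce(reduce_counts, partials, Counter())
```

Each Pi would call map_document on the documents it finds and send the resulting counters over the network; the collator then calls merge_partials on everything it receives. The per-document counters are also the raw material for clustering similar documents, though that part isn't shown here.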