
Data Centre
That shiny new data centre wasn't built exclusively for the HPC crowd. Lots of other University people have, or will have, kit in the building.
Cambridge University has been involved in high-performance computing for 18 years, starting with a “traditional” supercomputer, before a major restructuring eight years later led to what would now be recognised as HPC. We read much about how IT needs to become a profit centre rather than a cost centre. Well, as part …
Well, with ATLAS, f'rinstance, producing raw data at 1 PB/s (after zero suppression), you can forget about storing anywhere near all the data from major scientific experiments for a long time to come, even if storage reaches "commoditised" prices (I thought it already had, but never mind).
So, by extension, we'll be "throwing away" data pre-determined as "uninteresting" for a long while yet. Probably forever, as we can pretty much guarantee that the experiments will keep producing data faster than affordable (or even feasible) storage capacity grows.
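A quick back-of-envelope sketch makes the point. The 1 PB/s rate is the figure quoted above; the £20/TB price is a made-up "commoditised" round number, not a real quote:

```python
# Back-of-envelope sketch: why you can't keep raw detector output.
# 1 PB/s is the rate quoted above; the per-TB price is a hypothetical
# round number for illustration only.

PB = 10**15                       # bytes in a petabyte (decimal)
raw_rate_bytes_per_s = 1 * PB     # raw detector output, as quoted above

seconds_per_day = 86_400
bytes_per_day = raw_rate_bytes_per_s * seconds_per_day

price_per_tb_gbp = 20             # hypothetical "commoditised" price per TB
tb_per_day = bytes_per_day / 10**12
cost_per_day_gbp = tb_per_day * price_per_tb_gbp

print(f"Raw data per day:  {tb_per_day:,.0f} TB")
print(f"Disk cost per day: £{cost_per_day_gbp:,.0f}")
# ~86,400,000 TB per day, i.e. roughly £1.7bn of disk per day at £20/TB,
# before you even think about power, racks or bandwidth.
```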
>"commoditised" prices
As a grad student in the 90s, when the price of an external 1GB SCSI drive for a Sun dropped to £1000 we got one for every machine. No more booking scratch disk space in advance, reading data from tape, processing it and writing it back to tape; we would have all the disk space we needed.
>" "throwing away" data pre-determined as "uninteresting" for a long while yet. Probably forever"
The detectors have layers of processing simply because it's impossible to get the raw data out of the experiment fast enough (there isn't enough space for the cables).
So each layer tries to throw away as much as possible and only passes interesting events up the chain. When they were first designing ATLAS, I know they were planning to sample only a fraction of even the final data stream because they couldn't keep up.
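A toy sketch of that layered-trigger idea; the stage names and keep-fractions here are invented for illustration, not the real ATLAS trigger parameters:

```python
# Toy sketch of a layered trigger: each stage keeps only events its
# predicate calls "interesting" and discards the rest, so the data rate
# drops at every level. Thresholds are invented, not real ATLAS figures.
import random

def hardware_trigger(event):
    # fast, coarse decision made close to the detector
    return event["energy"] > 0.9

def software_filter(event):
    # slower, more detailed reconstruction on a compute farm
    return event["energy"] > 0.99

def trigger_chain(events, stages):
    for stage in stages:
        events = [e for e in events if stage(e)]
    return events

raw = [{"id": i, "energy": random.random()} for i in range(1_000_000)]
kept = trigger_chain(raw, [hardware_trigger, software_filter])
print(f"kept {len(kept)} of {len(raw)} events "
      f"({100 * len(kept) / len(raw):.2f}%)")
```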
The big problem of archiving all the results is the associated knowledge/context.
You can keep all the configuration/calibration info linked to the data and design the data formats to be future-proof, but it's harder to capture the institutional knowledge of "those results were taken when we were having problems with the flux capacitor, so anything above 1.21GW is probably a bit dodgy; Fred knows about it but he left".
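One (hypothetical) way to carry at least some of that context with the data is to write the caveats into a provenance record alongside the calibration links; every field name and value below is an invented example, not any experiment's actual metadata schema:

```python
# Minimal sketch of attaching provenance and free-text caveats to a data
# set, so some of the "Fred knows about it" knowledge travels with the
# files. All field names and values are invented examples.
import json

run_record = {
    "run_id": "2011-03-17-A",
    "detector_config": "config/v4.2.yaml",   # link to calibration/config
    "calibration_tag": "calib-2011-03",
    "format_version": 3,                      # bump when the layout changes
    "known_issues": [
        "Flux-capacitor instability during this run; treat readings "
        "above 1.21 GW as suspect. Contact: Fred (departed)."
    ],
}

with open("run_2011-03-17-A.meta.json", "w") as fh:
    json.dump(run_record, fh, indent=2)
```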
In HEP it rapidly becomes cheaper and easier to re-run the experiment with the newer kit than to wade through the old stuff.
In astronomy it's different: we can't rerun the universe, so we religiously save every scrap of imagery.