Boffins ponder 100-year archive made of TOMES

University of California at Santa Cruz (UCSC) researchers have designed Pergamum, a hundred-year archive system built from redundant, Ethernet-linked, powered-down intelligent disk drive building blocks. The Pergamum researchers are UCSC graduate students Mark Storer and Kevin Greenan, and researcher Kaladhar Voruganti of NetApp …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    Access protocol.

    I don't care what the storage medium is, punched cards or otherwise; it's the access protocol that matters.

    In this case it's some kind of Ethernet network disk variation. Why do they think that'll outlast anything else?

  2. Anonymous Coward

    Well done, you've preserved data for 100 years.

    Pity no-one will be able to view it.

  3. Andrew Barr
    Coat

    Do like the Egyptians

    And write the stuff on a wall and then put it in a big pyramid-shaped bunker.

    Their archives have lasted thousands of years and are still readable!!

    Mine's the one with the hieroglyphs on the back.

  4. Robert Grant

    As someone else once said...(better)...

    The only reliable, tested, long-term storage method is to write your data on papyrus, and stick it in a giant, slave-built pyramid in a desert. Everything else is just guesswork.

  5. Simon Ball

    Pyramids

    You’d still have the same problem. We couldn’t read what was written in the pyramids until we rediscovered a specification of the access protocol written in a language that was still extant - i.e. the Rosetta stone.

    However, to be honest, I really don’t see the overall problem. Proper long-term storage doesn’t just mean shoving the data in a vault and forgetting about it – it is a continuous process that involves regularly verifying the integrity of the data, and transferring it to new media when necessary. Provided you do this, there shouldn’t be an issue.

    And what makes print media any different? The words may not have degraded, but our ability to understand them (the access protocol) has. How many people are qualified to read Medieval English, never mind extinct languages, and determine with absolute certainty what their meaning is?
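    The "regularly verifying the integrity of the data" step described above could be sketched roughly as follows. This is only an illustration, not anything from the Pergamum design: the function names (`sha256_of`, `build_manifest`, `verify`) and the choice of SHA-256 are assumptions made for the example.

    ```python
    import hashlib
    from pathlib import Path


    def sha256_of(path, chunk_size=1 << 20):
        """Hash a file in chunks so large archives need not fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def build_manifest(root):
        """Map each file under root (by relative path) to its SHA-256 digest."""
        root = Path(root)
        return {
            str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*"))
            if p.is_file()
        }


    def verify(root, manifest):
        """Return the names whose contents are missing or no longer match."""
        current = build_manifest(root)
        return sorted(
            name for name, digest in manifest.items()
            if current.get(name) != digest
        )
    ```

    The idea would be to store the manifest alongside (and separately from) the archive, rerun `verify` on a schedule, and restore anything it reports from another copy before migrating to new media.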

  6. Mike

    Access vs viewing

    First-posting A.C. has it right, while JonB may be confused. The Ethernet wire protocols are well defined and documented in multiple places, as are the various upper layers (NFS, iSCSI, etc.) and the metadata associated with the common filesystems. I can still read data from paper tape, punch cards, and 8-inch floppy disks, recovering filenames, dates, and contents, and I have friends I can turn to for both 7- and 9-track reel-to-reel tape. (OK, I'm a nerd.) Again, all the info you need is available in multiple places. What is _not_ available (apparently even within MSFT) is exactly how to interpret the contents of a Word 1.0 file. Even PDFs are not 'P' across a decade or so, as Adobe fiddles with the format.

  7. amanfromMars Silver badge
    Pirate

    Well done, you've preserved data for 100 years ...... .

    ..... but what relevance does it have, such a library of the Past, to the needs of the Future, for will not the Future need New data to Open up what we don't know?

    One needs only to think of the relevance to our present existence of the generally accessible knowledge of 100 years ago [and we are learning, those who are studying that is, at an exponential rate nowadays for information is instantly available to answer our every question, even if that question and answer are blocked or spirited away, for that tells one as much as one needs to know] to realise that will be merely a quaint museum piece rather than any vital building block.

    With the Past being Unchangeable except through the use of Fiction which just replaces the Facts with another Spin for another Agenda and with the Present being non-existent, because in an instant it is rendered as the Past, the only thing of any Lasting Importance is the Future, and that is built in Beta, in IT and Communications, with Beta Imagination, with some of it Real Good and far too much of it nowhere near Good at all and even positively Bad [please excuse the contradiction there] ......meaning that Life and the Reality which Media and IT Deliver to you as ITs Pictures is Imagined and Virtualised for your Enjoyment and Pleasure, Disgust and Pain.

    So who you Gonna call to Change IT if you're following a limited and limiting Imagination?

  8. The Other Steve
    Alien

    No, don't do like the Egyptians!

    The Egyptians (of the ancient, pyramid-building variety) mostly used papyrus (an early precursor to modern paper, and not all that different) and ink to do their record keeping, which is why we are reduced to scrabbling around the ruins of their civilisation trying to decode the things that they wrote on the walls, which is mostly stuff like "That King Tut had a massive plonker, and a shed load of camels to boot. Oh, and he won all the wars, don't let anyone tell you different, especially those lying Assyrians, I wouldn't trust them with _your_ sacred cat, know what I mean?"*

    All we know about your basic Ancient Egyptians (if I remember primary school history lessons correctly) is that they were fond of pyramids, beer, slaves, cats, deities with novelty-eraser-shaped heads, and pulling people's brains out of their noses with great big hooks.

    OTOH what conclusions will be drawn by an enlightened civilisation some thousand years hence looking back at the data archives of today? "Wow, those ancients really, really liked porn. And really long, pointless, emotionally damaging arguments. And none of them could figure out how to switch on red eye reduction on their digital cameras. Bwahahah. Losers!"

    *Obviously they were somewhat more erudite than I have been in this example, which is why there's just so damn much of it.

    <--- Alien, you _know_ it was them that built the pyramids really.

  9. Dan
    Coat

    They've got it all wrong...

    As anyone with an even marginal amount of sense knows, the more complex a solution is, the more chance there is that something will go wrong. Ethernet? Wires? Hub/Switch??? That's not even considering the fact that no machine at that point will be able to interface with it at all. Good problem. Wrong answer. Each tome should have its own display and keyboard. It should be all-inclusive in a sealed box. No ports at all. All communication between tomes done wirelessly. I know it's redundant to have a full-blown PC integrated with each tome, but if you really want to safeguard the data, that seems the only logical choice. And this isn't even addressing power supply...

    Mine's the one with the iTome in it....

  10. Hollerith

    better than papyrus

    Papyrus isn't too bad, but parchment is even better. I did my thesis based on data from a set of medieval manuscripts and the text was as clear and clean as if it had been written in 1980, not 1180. All you needed to know was Latin.

    Sometimes solutions can be too clever.

  11. Graham Bartlett

    Papyrus? Paper? Parchment? Bah!

    All the people recommending paper or chiselled rocks as a storage medium are missing two vital points.

    The first is data density. Anyone who did any coding in the 80s and 90s knows all about the experience of churning out pages of fan-fold paper from a dot-matrix printer. If you've never had this particular pleasure, maybe you just don't have a grasp on how *big* computer programs are, line-wise, when stored on some physical medium. If you're one of these people, let's take an example. Let's say your paper can fit 200 lines per side. And let's say your code is something significantly complex, and something you might want to archive for posterity. Reagan's SDI project was estimated to be 6 million LOC, so let's take that for an example. At 200 lines per side and printing double-sided, that's 15,000 sheets of paper. Standard office paper is about 0.1mm thick per sheet - let's say we use lightweight paper that's half that, to save space. Then we're looking at 75cm of shelf space to store that. And this is just one version - if this is a serious project, we might have a couple hundred releases. That makes 150m of shelf space. Now let's think digital - at 6 million LOC and 80 chars per line, each release makes about 457MB, which would comfortably fit on a 2cmx1cmx0.5cm memory stick. If we cover that shelf with memory sticks side-by-side and stack them 10 high, that gets us 150,000 memory sticks. If we use triple-triple redundancy, throw away the 9 current memory sticks when one fails, and use memory sticks with a lifespan of 5 years, that buys us 3333 years of storage without loss of data. That's better than the Egyptians managed with carved rocks.

    The second point is security, particularly multi-site security. Sure, parchment stored safely lasts a long time - but one good fire will trash them all. Sure, it'll also trash computers, but synchronising data between computers on different sites is an exercise that has already been solved, requires no human interaction, and can be achieved without loss or corruption of data. Synchronising parchment documents across multiple sites, OTOH, requires thousands of monks painstakingly copying this information from one document to another, and there's always the risk of errors creeping in.
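    For anyone wanting to check the data-density figures above, here is a rough sketch that recomputes them. The constants are the ones quoted in the comment; the names are made up for illustration, and one byte per character and a 1cm stick edge along the shelf are assumptions.

    ```python
    # Figures quoted in the comment above; names are illustrative only.
    LINES_OF_CODE = 6_000_000
    LINES_PER_SIDE = 200
    SHEET_THICKNESS_UM = 50   # 0.05 mm "lightweight" paper, in micrometres
    RELEASES = 200            # "a couple hundred releases"
    CHARS_PER_LINE = 80       # assumption: one byte per character
    STICK_WIDTH_MM = 10       # assumption: the 1 cm edge runs along the shelf
    STACK_HEIGHT = 10         # sticks stacked 10 high


    def sheets_per_release():
        """Double-sided sheets needed to print one release."""
        return LINES_OF_CODE // (LINES_PER_SIDE * 2)


    def shelf_mm(releases=1):
        """Shelf length in millimetres taken up by the printed releases."""
        return sheets_per_release() * SHEET_THICKNESS_UM * releases / 1000


    def mib_per_release():
        """Size of one release as plain text, in MiB (2**20 bytes)."""
        return LINES_OF_CODE * CHARS_PER_LINE / 2**20


    def sticks_on_shelf():
        """Memory sticks that fit on the shelf the paper would have used."""
        return int(shelf_mm(RELEASES) // STICK_WIDTH_MM) * STACK_HEIGHT


    if __name__ == "__main__":
        print("sheets per release:", sheets_per_release())
        print("shelf per release (cm):", shelf_mm() / 10)
        print("shelf for all releases (m):", shelf_mm(RELEASES) / 1000)
        print("MiB per release:", round(mib_per_release(), 1))
        print("sticks on that shelf:", sticks_on_shelf())
    ```

    Changing the constants (thicker paper, more lines per side, wider sticks) shows how sensitive the comparison is to each assumption.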

  12. Paul

    Let's not forget..

    They've only specified how the data is to be preserved, they've not specified WHAT data is to be kept. That's for the next group of experts to decide with another few million in R&D costs....

    .... to decide that plain text is best with several large manuals explaining exactly what a plain text file is! :)

  13. Graham Marsden
    Alien

    I looked at this and wondered...

    ... Is amanfromMars writing articles for El Reg now...?!

  14. John Watts
    Pirate

    Dark Ages

    Aren't the dark ages called the dark ages because there's little record of them?

    From what I gather, when great civilizations collapse most of their records disappear.

    Looks like we're heading for the end of another era then.

    The question is, if we devoted some effort to making sure our data will survive and can always be read and understood, would that stop the collapse of civilization?

    I think that, for a whole heap of reasons I can't be bothered to type, it would, and that society would improve.

    That said, I'm still planning my bunker!

  15. Peyton
    Happy

    @amanfromMars

    How else are you gonna figure out the price of a cheese pizza and a large soda at Panucci's Pizza a hundred (or even a thousand ;) years in the future unless somebody has saved it for you?? duh

  16. Henry Cobb
    Thumb Down

    Luser errors?

    Since it's all online all the time what keeps some luser from fnording it over with...

    rm -rfv /

    Eh?

  17. Glenn Amspaugh
    Coat

    Does it need to be reinvented?

    Isn't the constantly fluctuating mess of connected computers known as teh interweb already doing this? I mean, there's giffy girl images out there from 15 years ago, still showing up in Google Searches.

    All we have to do is attach important data to porn and folks will keep it in play and active on the 'net forever.

  18. LaeMi Qian
    Thumb Up

    @Glenn Amspaugh

    Excellent use for stegging! Instead of hiding your porn in legit-looking documents, hide your legit data in porn!!

    Someone better go out to the museums and X-ray all those ancient phalluses and nekked goddesses just in case this isn't an original idea!

  19. Anonymous Coward

    Read Gregory Benford's book...

    Deep Time: How Humanity Communicates Across Millennia.

    http://www.amazon.com/Deep-Time-Humanity-Communicates-Millennia/dp/0380793466

  20. Andy Bright

    Arcane Tomes?

    So they're Scryer then, not Aldor - because then it'd be Fel Armaments.

  21. Pyros
    Paris Hilton

    The Long Now Foundation

    They need to talk to the peeps at the Long Now Foundation--they're doing a series of projects to preserve knowledge and culture by designing media such as their own Rosetta Stone.

    At least the huge orrery is a hoot to think about--an analogical binary clock that will keep ticking for a millennium.

    Paris, because I'm hoping she gets forgotten in the next century or so...

  22. Allan Dyer
    Flame

    Who pays the power bill?

    The TOMES "power down almost completely".. with "almost" being 13W. I make that about 11,396kWh over the 100 year archive life.

    On the other hand, a paper book or CD-ROM powers down to zero Watts. Add a little to copy it to a new medium once every decade.

    TOMES will contribute to global warming... a book in a library is a carbon sink.
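    The power-bill figure above is easy to reproduce. A minimal sketch, assuming a constant 13W draw for a single unit and an average year of 365.25 days; the function name is made up for illustration.

    ```python
    def standby_energy_kwh(watts, years, hours_per_year=24 * 365.25):
        """Energy used by a constant load over a number of years, in kWh."""
        return watts * hours_per_year * years / 1000


    if __name__ == "__main__":
        # 13 W for 100 years, per powered-down unit
        print(round(standby_energy_kwh(13, 100)), "kWh")
    ```

    Multiply by the number of units in the archive to get the total bill.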

  23. amanfromMars Silver badge

    Now I'm feeling hungry. Thanks, Peyton duh.

    "How else are you gonna figure out the price of a cheese pizza and a large soda at Panucci's Pizza a hundred (or even a thousand ;) years in the future unless somebody has saved it for you?? duh".... By Peyton Posted Friday 25th April 2008 18:18 GMT

    How else? In quite the same way as it has been done since forever, I'll imagine it and pretend that it is true, based upon the support imagination of others too.

    Love the tongue in cheek humour, though. :-)

  24. Anonymous Coward
    Jobs Horns

    Bad intentions

    Technology longevity aside, the biggest long-term threat to archived information is deliberate destruction. We have the merest glimpse of what the ancient Mayans knew, because the conquistadores destroyed all the codices etc. they could find. What an enormous crime!

  25. Infernoz Bronze badge

    No good for a civilisation collapse or 'library burning'

    Hard drives are probably the best short to medium term storage format (IMHO better than CD-ROMs and DVDs), provided that redundant drives and sites are used, that replacement drives are available (not guaranteed) and that they are protected against environmental change, radiation, EMP and deliberate damage.

    A sealed optical storage medium would probably be best; optical disks don't qualify because they use unstable dyes, are not adequately sealed and are a fragile moving part in reader/writer drives. I suggest a high density physical storage format like 3D laser etched, transparent, inert crystals. Uniform blank or pre-written crystals could be made using the same technology used to make artificial diamonds.

    Just like the seed banks, archive locations should include secure islands, to protect against natural and deliberate destruction of archive sites.

  26. Maty
    Paris Hilton

    Pergamum

    Pergamum != a settlement and library in ancient Greece. It was the capital of the Attalid kingdom in Anatolia. (i.e. modern Turkey)

    I'd suspect the name was chosen because Pergamum was where parchment was invented ('Parchment' itself takes its name from Pergamum) and as a poster has mentioned already, parchment is considerably more durable - and expensive - than paper.

    Paris, because she has Ph.Ds in ancient history, archeology and library studies. Doesn't she?
