Picture this: An exabyte of cat pix in the space of a sugar cube of DNA

University of Washington and Microsoft Research boffins have successfully used DNA as an image store. Their argument is that although it's hard to work with, DNA has a couple of killer characteristics as a data store: the “raw” storage limit is an exabyte per cubic millimetre, and its storage half-life is more than 500 years …

  1. Charles 9

    No word on actual data transfer rates. I know it's slow, but HOW slow? That's kind of important in a world where if your transfer rate is slow enough, the data can get stale before you finish writing it. Which is why enterprise tape speeds have needed to keep up along with the tape capacities.

    1. Mage Silver badge

      re: No word on actual data transfer rates.

      About 10 hours per access operation, minimum according to the article. Really slow but high density and long life. A USB stick might be 5 to 10 years. A writeable CD left on window sill less than a week?

      An old HDD in a drawer might last a long while (not sure about shingled or other very high density drives).

      Tape can last 20 years if stored properly, but you need to make copies, as you slowly get print-through.

      I think Magneto Optic storage was good (I remember 3.5" Sony cartridges in 1990s with 250Mbyte)

      1. Anonymous Coward

        Re: re: No word on actual data transfer rates.

        It also sounds like it's not random-access either, so you'll just have a soup of DNA strands and have to decode them all, to get the data you want.

        I suspect the time to decode an exabyte of data will be pretty high.

        For comparison:

        https://en.wikipedia.org/wiki/Human_genome

        an entire human genome is about 3.1-3.2 billion base pairs

        At 2 bits per base-pair that's about 0.8GB.

        So your sugar-cube exabyte could encode the DNA of about 1.25bn humans without compression. Allowing for the huge amount of duplication in that dataset, it should easily be able to encode the DNA of the whole human race.
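
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes the upper figure of 3.2 billion base pairs and a decimal exabyte (10^18 bytes); the constant names are made up for illustration.

```python
# Back-of-envelope check of the figures quoted above.
BASE_PAIRS_PER_GENOME = 3.2e9   # upper end of the 3.1-3.2 billion quoted
BITS_PER_BASE_PAIR = 2          # four possible bases -> 2 bits each
EXABYTE = 1e18                  # bytes; decimal exabyte (assumption)

genome_bytes = BASE_PAIRS_PER_GENOME * BITS_PER_BASE_PAIR / 8
genomes_per_exabyte = EXABYTE / genome_bytes

print(f"One genome: {genome_bytes / 1e9:.2f} GB")
print(f"Genomes per exabyte: {genomes_per_exabyte / 1e9:.2f} billion")
```

This reproduces the 0.8GB and 1.25bn figures; it says nothing about how much further compression would shrink the dataset.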

        The time and cost of sequencing the entire human race should give you an idea of the cost of data retrieval.

        OTOH: having a backup of the entire human race in a sugar cube is an interesting idea :-)

        1. Dave 126 Silver badge

          Random Access

          >It also sounds like it's not random-access either.

          Read the article again!

          We propose a method for random access that uses a polymerase chain reaction (PCR) to amplify only the desired data, biasing sequencing towards that data. This design both accelerates reads and ensures that an entire DNA pool need not be sequenced.

          We demonstrate the feasibility of our system design with a series of wet lab experiments, in which we successfully stored data in DNA and performed random access to read back only selected values.

        2. Inventor of the Marmite Laser Silver badge

          Re: re: No word on actual data transfer rates.

          Sweet!

        3. krakead
          Thumb Up

          Re: re: No word on actual data transfer rates.

          Reminds me of Harlan Ellison's Demon with a Glass Hand

          https://en.wikipedia.org/wiki/Demon_with_a_Glass_Hand

        4. Anonymous Coward

          Re:OTOH: having a backup of the entire human race in a sugar cube is an interesting idea :-)

          Until tea time in the marketing department...

          1. Mark 85

            Re: Re:OTOH: having a backup of the entire human race in a sugar cube is an interesting idea :-)

            From an SF point of view... they will have "interesting" offspring.

            1. Anonymous Coward

              Re: Re:OTOH: having a backup of the entire human race in a sugar cube is an interesting idea :-)

              If you labelled your CDs/DVDs with a spirit-based pen (ie, most pens) then they're probably hosed already. Copy what's left and this time use a water-based marker. Also, humidity, sunlight.

      2. Pascal Monett Silver badge

        Re: A writeable CD left on window sill

        is dead.

        An optical disk properly stored away from sunlight in a room with a low level of humidity should logically outlast tape and HDD, probably by a few decades.

        My first DVD backup disk was done in 2005, it is still perfectly readable. I don't actually have the date of my first CD backup, but I'm pretty sure it was done around 2000-ish, and it is also perfectly readable. Music CDs I bought in the 90s are also still perfectly readable.

        Optical storage is the best thing for personal use. Obviously, I've migrated to BluRay discs now, 25GB size. I'm confident that they'll all be good until the day I die, and then some.

        Of course, to be of any use, you have to make the backups first. Joe Public has some issues with that.

        1. BinkyTheMagicPaperclip Silver badge

          Re: A writeable CD left on window sill

          There's M-Disc, if a sunlight-resistant archival medium is required. Not all drives successfully read them, but sufficient numbers do.

        2. Anonymous Coward

          Re: A writeable CD left on window sill

          > Music CDs I bought in the 90s are also still perfectly readable

          However, they're not made the same way as CD-R or CD-RW which you write at home.

          They are physically pressed from a master - pretty much the same way as vinyl LPs in fact.

        3. John Robson Silver badge

          Re: A writeable CD left on window sill

          I have just gone through a data refresh project - discs which were written and then filed in individual sleeves in a disc storage case, with silica gel left inside. Stored in a fireproof safe.

          Those from 2005 are *mostly* readable. There was a ~5-10% failure rate, and those discs were looked after rather well. It is possible that the data was corrupt when written.

          Unfortunately I am no longer in that role, and the 2001 discs were next on the list...

          1. Charles 9

            Re: A writeable CD left on window sill

            That sounds about right. I did a similar migration out of books of CD-Rs and DVD-Rs. The inks and other substrates used in recordable discs simply degrade over time, until they eventually become unreadable. Inks can fade and phase-changing media can suffer unwanted changes. Depending on the quality of the manufacture, this can take anywhere from months to maybe 5 or 10 years, but that's gonna be true of any consumer-level archival medium today. After several years, a refresh definitely needs to be considered. The only reason archival quality opticals can last is because they're essentially laser-etching a stone-based medium. Thing is, they're NOT cheap. I do have a BD-XL burner capable of using something of the like, but the price/GB (not to mention it tops out at 100GB when hard drives regularly go 4-5TB these days) meant it was simpler and more affordable to just get a pair of hard drives and mirror them and keep parity archives within to deal with bit rot. Since I have to cycle my drives every several years anyway, this provided the greatest balance of affordability and risk mitigation.

        4. Sir Alien

          Re: A writeable CD left on window sill

          I use properly stored Blu-Ray discs as well but opt for using rewritable discs rather than write-once. Write-once discs tend to be based on dye staining/burning which over time can fade.

          Rewritable discs use a phase-change medium that physically changes the layout and should technically last longer. The closest analogue I can find is "It melts the damn disc". It also allows updating or refreshing the disc by rewriting it again.

          Personally I find backup to only have two options at this moment. Tape which is large, removable, robust. Optical which is smaller, removable and mostly robust.

          Hard drives are not a good alternative for storing long term archive data. Some hard drives will even seize after standing for a very very long time.

          - S.A

        5. Anonymous Coward

          Re: A writeable CD left on window sill

          Music/video CD/DVD use a different "printing" technology than user writeable CD/DVD, and also usually have better "labels" applied to protect the reflecting surface. In some ways, some older CDs were better built than later cheaper ones.

          1. IT Poser

            They don't build them like they used to

            Also get off my lawn you stinking kids.

            If we go that route I have vinyl that is over a half century old and provides far better sound quality than any mp3 I were to burn on a modern cd. That's not going to help if I need an exabyte of storage.

        6. Putters

          Re: A writeable CD left on window sill

          Magnetic media seems to last quite well, even under less than ideal storage conditions.

          Eg the games that went with the Spectrum I sold a year or two ago, including the home recorded "backups", loaded quite happily (mid 1980s)

          Similarly the QL microdrive cartridges, the Atari 3.5" disks and the IBM PC XT 5 1/4" disks, as well as the 10 meg hard drive (late 80's).

          Just stored in cardboard boxes in the (distinctly non temperature controlled) loft.

          1. sisk

            Re: A writeable CD left on window sill

            The life expectancy of an optical disc in my house is directly proportional to how well I hide it from my kids. One left on the window sill wouldn't survive long enough to be damaged by UV. Seriously. As an example I've got a couple Veggie Tales discs that are coated with what looks to be a mixture of peanut butter and super glue. (In case you're wondering, they're in my fix-it pile until the kids forget about them so that I can throw them away without drama).

            On the other hand, I can count on an optical disc in a case (be it a jewel case or a paper sleeve) in my non-environmentally controlled - and, even worse, non-dust-proof - storage unit to work regardless of how long it's been in there. I've got discs from the 90s that have been in storage, except when I need them, for as long as I've had the unit, and they still work just fine. In fact, thanks to the recent death of my home file server, I'll be pulling quite a few of them in the near future.

        7. Anonymous Coward

          Re: old CDs

          Music CDs I bought in the 90s are also still perfectly readable

          MUSIC CDs, yes, but early data backup CDs will have turned into hazards by now, due to innovation. Early data CDs were made to be written & read at up to 8x speeds compared to music, and that gets very interesting when you stick those in a later 48x speed drive.

          It will spin up, fail to read, slow down and speed up again, followed by a shattering sound as the disc disintegrates due to centrifugal forces it was never designed for.

          1. x 7

            Re: old CDs

            "Early data CDs were made to be written & read at up to 8x speeds compared to music, and that gets very interesting when you stick those in a later 48x speed drive"

            even worse were those early gold 2x CDs which suffered a fungal infection which ate the glue holding the metallic label onto the disk..........they'd delaminate leaving pretty flecks of gold dust in the drive. If you left them in the sun or near a radiator the plastic turned purple

      3. Roo
        Windows

        Re: re: No word on actual data transfer rates.

        "I think Magneto Optic storage was good (I remember 3.5" Sony cartridges in 1990s with 250Mbyte)"

        My experiences with MO were a bit disappointing. Despite storing "guaranteed for 25 years" Verbatim 5.25" cartridges at ~15C in sealed envelopes in a drawer, they all developed fatal write errors within 18 months, across several batches of disc. The backup hard drives & CD-Rs I have from that same era (late 90s) still work perfectly so all was not lost... Spinning rust can surprise you in good ways sometimes. ;)

      4. Dallas IT

        Re: re: No word on actual data transfer rates.

        M-Disc media (and supporting drives) burn DVDs or Blu-rays onto a granite-like medium (not ink-based) that is claimed to last 1,000 years.

    2. Lusty

      "the data can get stale before you finish writing it."

      This is for archive storage, not backup. You're making the same mistake as those who put backups in AWS Glacier and those who consider old backup tapes to be an archive. Backups (i.e. recovery points) need to be local and fast as a general rule because you are expecting to need them. Archives need to be protected, complete, and cheap because you are expecting to almost never need them but the data is important enough to keep or regs require it.

      1. Charles 9

        No, I'm saying that if it's SO slow it's not going to be practical even as an archive medium because the amount of data you have to archive accumulates faster than you can offload it into the archive medium.

        1. Lusty

          Apologies, that wasn't how I read your post, no offence intended. In that situation you could write in parallel. I was mainly addressing the "stale" part. Archive data wouldn't get stale because you'd capture all of it, so it's just a calculation of cache vs write speed and parallelisation to get the stuff out. All data can be written in parallel one way or another; it's just a matter of how complex you want to make it. In this instance, as has been pointed out all over the thread, the medium would probably break anyway so it's all moot :)
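
The "cache vs write speed and parallelisation" calculation can be sketched roughly as below. Every number here is hypothetical and chosen only for illustration (nothing in the article gives a write rate), and the function names are invented for the example.

```python
import math

def writers_needed(ingest_bytes_per_day, bytes_per_writer_per_day):
    """Smallest number of parallel writers that keeps up with incoming data."""
    return math.ceil(ingest_bytes_per_day / bytes_per_writer_per_day)

def backlog_after(days, ingest_bytes_per_day, bytes_per_writer_per_day, writers):
    """Bytes still waiting in the cache after `days`, never below zero."""
    net_per_day = ingest_bytes_per_day - writers * bytes_per_writer_per_day
    return max(0, net_per_day * days)

# Hypothetical figures, not taken from the article.
INGEST = 10e12        # 10 TB of new archive data per day
PER_WRITER = 1.5e12   # 1.5 TB written per writer per day

n = writers_needed(INGEST, PER_WRITER)
print("writers needed:", n)
print("backlog with one writer too few:", backlog_after(30, INGEST, PER_WRITER, n - 1))
print("backlog with enough writers:", backlog_after(30, INGEST, PER_WRITER, n))
```

With too few writers the cache grows without bound, which is the situation Charles 9 describes; with enough of them it drains to zero.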

  2. Kevin McMurtrie Silver badge
    Terminator

    This is all fine until

    Until your documents get a virus. A real virus.

    1. Anonymous Coward

      Re: This is all fine until

      Since the base pairs correspond to the binary data of the encoded document and therefore you could create any arbitrary sequence of pairs during the process, I'd be more worried about my document actually BEING a virus and wiping out humanity.

  3. Anonymous Coward

    500-year half-life?

    I'd be a little sceptical about the claimed half-life. DNA is very fragile; the reason that the mutation rates are quite low in living DNA is the constantly-operating repair machinery, which can often fix both the point mutations and structural damage (such as strand breaks). As the last resort, a living cell unable to repair its DNA will self-destruct - and its functions will hopefully be taken over by other, similar cells.

    Over time, non-living DNA tends to become a soup of short fragments. It is still possible to decipher at least some of the information thanks to the huge redundancy of the genome in your average multi-celled organism - but the computational effort needed to align the fragments is huge.

    Personally, I am not holding my breath.

    1. Dave 126 Silver badge

      500-year half-life? Yep.

      >I'd be a little sceptical about the claimed half-life.

      I'd rather look at the evidence myself - and an internet search isn't that bothersome, is it? A half life of 500 years has been observed in the DNA from bones of Moa, extinct birds, dating from between 600 to 8,000 years, preserved in similar conditions.

      - http://www.the-scientist.com/?articles.view/articleNo/32799/title/Half-Life-of-DNA-Revealed/

      Were this DNA archival process ever to be used, there is no reason why the archive couldn't be based somewhere cold - much like the Svalbard Global Seed Vault. (https://en.wikipedia.org/wiki/Svalbard_Global_Seed_Vault#Construction )

      Then of course error correction methods and redundancy can be built into any DNA-archival process.

      The mutations you mention are those seen in living cells, and usually occur during the copying stage (and yeah, our cells have several error-correcting mechanisms) - but this is very different to these inert strands of DNA that have been removed from the molecular machinery.

      1. Anonymous Coward

        Re: 500-year half-life? Yep.

        A half-life of 500 years means that only one half of your data will still be readable in 500 years, only a quarter in 1000 years etc.

        You will need a pretty good erasure coding strategy to deal with that.
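
The decay described above is plain exponential decay, sketched minimally below. It takes the 500-year figure from the thread and, as a simplification, treats "fraction intact" as "fraction of data readable"; `fraction_intact` is a name made up for this example.

```python
HALF_LIFE_YEARS = 500  # figure quoted in the thread

def fraction_intact(years, half_life=HALF_LIFE_YEARS):
    """Fraction of the original material expected to survive after `years`."""
    return 0.5 ** (years / half_life)

for years in (100, 500, 1000, 2000):
    print(f"{years:>5} years: {fraction_intact(years):.1%} intact")
```

Any erasure-coding scheme would have to tolerate the loss implied by the last storage year it is meant to survive.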

      2. Anonymous Coward

        Re: 500-year half-life? Yep.

        Were this DNA archival process ever to be used, there is no reason why the archive couldn't be based somewhere cold - much like the Svalbard Global Seed Vault. (https://en.wikipedia.org/wiki/Svalbard_Global_Seed_Vault#Construction )

        Certainly. And your average optical or magnetic medium might last quite long under the conditions of reduced ambient temperature and in oxygen-free, low-humidity air. With the added advantage of not being literally digested by any fungus or bacterium which manages to sneak in. Of course, these are rarely the conditions in which the real-life "archival" medium is stored - so YMMV.

        Like most research papers in flashy journals, this is a nice idea, demonstrated through hard work of many very smart people using cutting-edge technologies, and then served as a 5% solution in pure hype.

        I'll be very happy if it eventually leads to a practical storage product - but this is rather unlikely for a variety of reasons.

    2. Naselus

      Re: 500-year half-life?

      "I'd be a little sceptical about the claimed half-life."

      I wouldn't; the half-life of DNA is pretty well established and 500 years is used as a rule of thumb by most archaeologists. DNA can be extracted thousands of years after the death of the animal in question.

      The density is interesting, but this is pretty clearly too slow for any practical use - even ultra-slow high density archival ops need a better R/W timescale than this. I seem to recall a similar idea 3-4 years back using crystal-based storage that was also hugely space-efficient and very, very long-lasting, but simply too slow to have any serious purpose. I can see it being used to, for instance, make some kind of 'golden disk' for strapping onto a deep space probe or something, or a knowledge super-archive buried under Greenland (like the seed archive) in case of thermonuclear war... but other than that, this is just a curiosity.

    3. tony2heads

      Re: 500-year half-life?

      There are some trees and bushes that last for many hundreds to thousands of years (Bristlecone pine) so the replication needed is clearly possible.

      Bacterial spores have huge lifetimes, so maybe packaging is important.

      Check out this list

  4. Anonymous Coward
    Alien

    Maybe someone already did that?

    How would we know, if their data is compressed and/or encrypted? Maybe that's what the Grey's anal probes are for, to retrieve the data they stored in Egyptian times!

  5. DesktopGuy

    My wife works as a microbiologist and currently does PCR at her lab.

    I quite liked we were in totally different professions.

    Looks like in future she could be an "organic storage expert"!

    Imagine sending off some DNA for data retrieval - you think data recovery firms are expensive and slow...

  6. x 7

    Who the heck wants an exabyte of cat pix? You'd have to be twisted in the head to enjoy that.

    1. Mark 85

      Ok.... maybe that's code for what we really need.. an exabyte of porn.

  7. jake Silver badge

    So basically ...

    ... as you are archiving terabytes of data, the media is already breaking down?

    Trust me, DNA decomposes. We're in the middle of harvesting Spring critters ...

    1. Anonymous Coward

      Re: So basically ...

      Hmm. So what's needed is a way to *replicate* the DNA to stop it degrading over time, and to have multiple copies in case one is damaged, for example by injecting it into a living creature.

      Ah yes, it looks like there is a project for it already:

      https://www.cockroachlabs.com/

      If you find you have too many copies of your data, then you may need to make some adjustments to your RAID level.

      http://www.raidkillsbugs.com/

  8. Anonymous Coward

    Can haz exabyte ov locats pls?

    Iz mek mai hed esplode from da kewtnesss overlode!!1!

    Kthxbye!

  9. CrazyOldCatMan Silver badge
    Boffin

    So what would they see...

    ..if they ran an average person's[1] DNA through the retrieval method? A text file that says "I told you it didn't matter"? A long and complex equation that boils down to 42?

    Or a copyright notice from a small company somewhere in the Lesser Magellanic Cloud?

    [1] Were such a thing able to be located. Maybe they need to slurry together lots of DNA and pick the nicest^W best-preserved one..

  10. x 7

    DNA half lifes

    The problem with talking about DNA half-lives is that it's not relevant here.

    If you're talking about radiation, after storage until its half life is up, on average half the useful material is still there.

    However, with something that's encoded, such as DNA, just a tiny change could render the data unreadable. A change of 1% in the data could make the whole set unintelligible.

    You'd need to store multiple copies of the data in some kind of DNA RAID, but even then, without long-term reliability tests, you'd be taking a heck of a risk. DNA is inherently unstable: until that is overcome, this technology is nothing but an irrelevant curiosity.
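
One very simple form of the "multiple copies" idea is a per-position majority vote across replicas, sketched below. This is an illustration only, not the scheme used in the paper: it assumes substitution errors in equal-length copies, and the strands are invented.

```python
from collections import Counter

def majority_vote(copies):
    """Recover a consensus strand by taking the most common base at each position.

    Assumes all copies have the same length (substitution errors only);
    real strands also suffer insertions, deletions and breaks.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))

original = "GATTACAGATTACA"   # made-up example strand
damaged_copies = [
    "GATTACAGATTACA",  # intact
    "GCTTACAGATTACA",  # one substitution at position 1
    "GATTACAGATGACA",  # one substitution at position 10
]

recovered = majority_vote(damaged_copies)
print(recovered == original)
```

The vote only works while errors at any given position stay in the minority, so it postpones the problem rather than removing it.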

  11. x 7

    lisping

    don't try talking about cat pix if you lisp

    imagine an exabyte of cat piss

  12. DubyaG

    Hmmm, I wonder...

    How many cat pictures would I find in my own DNA? In the human race's?

  13. redpawn

    Sugar with your tea sir?

    Since it can be unpacked in about the time food is digested, can we absorb all the world's knowledge with our morning tea?

  14. JeffyPoooh
    Pint

    Sci Fi plot...

    The Human Genome Project database is accidentally captured in transit and loaded into the Utah Data Center. The NSA dutifully starts cracking the code. After a few weeks, the plain text emerges:

    "GATACAATCGGTACCATGGTACACA... Hi. God here. Please, call me Bob. I really do exist; and I've left this Human DNA encoded message as proof. That concept that I know that you'll call 'Last Thursdayism'? Yep, it's all true. Except about 100,000 years. Now, for Part II, please decode the DNA of cats. Good luck. GTCACTAGCATCATCGATCAT"

    It's all been predicted years ago at the Library of Babel.

    https://libraryofbabel.info/bookmark.cgi?godwuzhere

  15. Schultz

    An exabyte in the space of a sugarcube...

    but for reading out you need a lot of supplies to run the polymerase chain reaction and those will not be reusable. Add up the total volume or supplies required for writing and reading data, it'll look a lot less efficient. (As in many orders of magnitude less efficient.)

  16. madick

    Floppy discs will never replace mylar

    For those who don't remember, or perhaps never knew, mylar tape was the industrial/military version of paper tape. Although it wears each time it's read (it starts to look a little frayed after about 1000 passes through a tape reader), its long-term storage potential is pretty good. (Left over from the late 60s, in the back of a drawer, I've still got about 20 yards of mylar tape which shows no signs of degenerating.)

    As I recall, a full reel (about 9 inches diameter) typically held 100K bytes. So an exabyte of data on mylar tape would require a mere 3,995,417 Olympic size swimming pools to store it.

    Incidentally, for those worried about data retrieval time, a high speed tape reader could whistle through a full reel in just over three minutes. So the 1 exabyte of data stored on mylar could have been read completely over 217 times since the Big Bang.
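
The read-through estimate can be roughly reproduced as follows. The sketch assumes a decimal exabyte, reads "100K bytes" as 100,000 bytes, uses exactly three minutes per reel as a lower bound, and takes 13.8 billion years as the age of the universe; none of these constants come from the comment itself.

```python
EXABYTE = 1e18                   # bytes; decimal exabyte (assumption)
BYTES_PER_REEL = 100e3           # "100K bytes" per full reel, read as 100,000
MINUTES_PER_REEL = 3.0           # "just over three minutes"; 3.0 is a lower bound
AGE_OF_UNIVERSE_YEARS = 13.8e9   # commonly cited estimate (assumption)
MINUTES_PER_YEAR = 365.25 * 24 * 60

reels = EXABYTE / BYTES_PER_REEL
years_per_full_read = reels * MINUTES_PER_REEL / MINUTES_PER_YEAR
full_reads = AGE_OF_UNIVERSE_YEARS / years_per_full_read

print(f"Reels needed: {reels:.0e}")
print(f"Years for one full read: {years_per_full_read:.2e}")
print(f"Full reads since the Big Bang: {full_reads:.0f}")
```

At exactly three minutes per reel the count comes out somewhat above 217, which fits a reader that takes "just over" three minutes.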

    1. x 7

      Re: Floppy discs will never replace mylar

      I was surprised recently when I was told by an RAF tech that they were still using punch tape to program the RAF Jaguar's mission avionics right up to when they were retired in ~2007.
