
Now that Microsoft can add code to your DNA
They're adding a whole new dimension to the term "Blue Screen Of Death".
Scientists in the US, working alongside Microsoft, have managed to encode "hello" into a readable strand of synthetic DNA, using a fully automated data storage system. The unusual apparatus to perform this work was created as a potential first step to eventually bring the technology to data centres. The experiment is described …
“Hey,” he said, “is that really a piece of fairy cake?”
He ripped the small piece of confectionery from the sensors with which it was
surrounded.
“If I told you how much I needed this,” he said ravenously, “I wouldn’t have
time to eat it.”
He ate it.
The platter is 2D. You can stack platters, but that's just a bunch of 2D areas on top of each other, with relatively huge distances between them that don't hold any data. DNA can actually fill a volume with data.
That said, I strongly doubt that volume can scale much above the microscopic. If you actually packed an exabyte worth of DNA in the volume of a sugar cube, I suspect it would become too entangled to read.
Natural DNA has 4 nucleotides (A, T, G, C) at any position, so rather than base 2 (binary) it encodes in base 4 (quaternary). Hachimoji DNA, for example, has 8 nucleotides (A, T, G, C, P, Z, B, S), so it goes to base 8, and that's without additional modifications. In binary ASCII each letter takes 8 bits; in base 8 it would take only 3 nucleotides. Physically DNA is several orders of magnitude larger than the charge cells on an HDD, but the multiple possible outcomes at each position increase its storage density.
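As a quick sanity check on that arithmetic (my own sketch, not from the article), here's how many symbols one 8-bit byte needs in each alphabet:

```python
import math

def digits_per_byte(base: int) -> int:
    """Symbols needed to represent one 8-bit byte (256 values) in a given base."""
    return math.ceil(8 / math.log2(base))

print(digits_per_byte(2))   # conventional binary storage -> 8
print(digits_per_byte(4))   # natural DNA (A, T, G, C)   -> 4
print(digits_per_byte(8))   # Hachimoji DNA (8 letters)  -> 3
```

So an 8-letter alphabet really does get a byte into 3 positions, as claimed.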
Then DNA is also able to pack itself together in 3D incredibly tightly. Each of our nuclei fits 3 billion nucleotides into a space just 6 microns across, around 113 cubic microns. And that packing still allows for read/write access.
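Running those numbers (a back-of-the-envelope sketch of my own, assuming uniform nuclear-style packing at 2 bits per nucleotide and ignoring any room for read/write machinery):

```python
# Information density of DNA packed as densely as a human nucleus.
nucleotides = 3e9        # base pairs per nucleus (figure quoted above)
volume_um3 = 113         # a ~6-micron sphere, in cubic microns
bits_per_nt = 2          # base-4 alphabet -> 2 bits per position

density = nucleotides * bits_per_nt / volume_um3   # bits per cubic micron
sugar_cube_um3 = 1e12                              # 1 cm^3 in cubic microns
total_bytes = density * sugar_cube_um3 / 8

print(f"{density:.2e} bits per cubic micron")
print(f"{total_bytes / 1e18:.1f} exabytes in a sugar cube")
```

Which lands in the region of several exabytes per sugar cube, so the headline claim is at least arithmetically plausible.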
Natural DNA has 4 nucleotides ATGC at any position so rather than base 2 (binary) it encodes in Base 4
Well, not exactly, AIUI. The encoding in DNA is in terms of base pairs. The pyrimidines (cytosine and thymine) pair with the purines (guanine and adenine, respectively). The two strands of DNA are complementary, and RNA polymerase reads the code by unlinking the strands and generating the complementary RNA. In nature, the base pairs are read as triplets, with a mapping between the possible triplet patterns and the exact amino acid to be added to the growing protein product.
One doesn't have to replicate (ha!) this triplet encoding for non-biological purposes, but still at any given position on the double strand of "natural", i.e. ATCG DNA, there is either an AT pair, or a GC pair. I think that makes natural DNA a binary information store, unless there is a clever way to read only one specific strand.
Further reading on Chargaff's Rule, etc.: Principles of Biochemistry
Sadly you are incorrect. Binary encoding would allow only 8 possible triplets. There are 20 standard amino acids, two of which (tryptophan and methionine) have only a single codon. So base 4 encoding it is, in principle, but some amino acids have as many as 6 alternative codons. There are 3 stop triplets. It doesn't, to say the least, look "intelligently designed".
There was no indication of when, or if, this will be put to commercial use.
Apart from any other consideration, let's think about what DNA contains: a sugar (the "-ose" at the end of "ribose" is a clue), some nitrogen and some phosphorus. A selection of some of the most important elements in metabolism. A single bacterium could multiply itself through your sugar-cube-sized DC in a few hours, so you'd end up mostly decoding bacterial DNA when you convert it back into bits.
My worry would be not so much that some micro-organism could damage the data store, but that the data store itself could be a micro-organism. Suppose someone's database when placed in this DNA storage just happened to encode for, say, an airborne virus with 100% lethality to human beings. Wouldn't that mean any large-scale DNA data stores would have to be kept inside the highest possible level of biosecurity?
That's not directly possible. DNA by itself does nothing - you need all the array of cellular components (ribosomes and mitochondria and al that stuff) to go from DNA to an organism.
Basically, all DNA is is a recipe list for proteins. A cookbook cannot become a McDonald's and give everyone heart disease.
There's an outside chance that a bacterium might get in there and read the database like it was its own DNA (maybe? I guess? Seems unlikely, given that bacteria don't spontaneously become other bacteria when they eat 'em, as far as I know), but as already pointed out, this stuff will need to be kept in suuuuper clean conditions to stop micro-scale wildlife eating it anyway.
We have loads of DNA that originated from before K/T. Chordates were well advanced by then and the earliest mammals were around, from whom we are descended.
I guess too we will have RNA from before oxygenation. Muscles can carry out anaerobic respiration for short periods and that may well originate from very ancient processes.
Seems unlikely, given that bacteria don't spontaneously become other bacteria when they eat 'em, as far as I know
It's not quite the same thing, but bacteria can exchange DNA with each other via horizontal gene transfer. Maybe it'd be wise to use an encoding scheme that prevented replicons appearing, but the chances of random data encoding for something meaningful are likely to be in the bardic simian typist range.
Yes, what Mr Boyle says. To illuminate further, a lot of a protein's ability to do what it does (mostly, they're enzymes) depends critically on its folded-up structure in 3-D. What the gene encodes is simply its linear ("1-D") amino acid sequence. It seems that prions can catalyze the transformation of specific proteins into a different, defective, folded structure which nonetheless has the same amino acid sequence.
If it's sugar, why does it taste salty and slimy?
https://boingboing.net/2017/10/14/what-does-dna-taste-like-an-i.html
(yes, it is safe for work)
This medium would seem to have promise, but I doubt it will be used in a data centre. It is never going to be fast, which means it would only be useful for long-term storage. There are a number of solutions, though, that will last longer than this is likely to, and that are most likely faster than it will ever be. I wouldn't look to this to revolutionize the storage world. At the same time, developing this might lead to new technologies that will allow us to do things we cannot do today, haven't even considered... or not. The point, I think, is to learn what is possible first and figure out a real-world use for it later. Good luck to the researchers!
The University of Washington and Microsoft research team behind this latest experiment previously said DNA-based storage could fit the contents of an entire data centre into a sugar cube-sized unit ...an MIT startup called Catalog said it was designing a machine that could write a terabyte of data a day
It's cute. And if it ever comes to fruition, I'm sure that there will be applications. But allow me to point out:
. You can order sugar-cube-sized USB flash drives off the shelf today, as well as 2TB units, although the latter look to be closer to small-coffee-cup than sugar-cube size.
. Existing storage technologies are moving forward steadily with regard to capacity and speed. By the time DNA storage actually is usable, semiconductor competitors will likely be much more compact, faster, and greater in capacity than today.
. DNA can develop defects -- cancer and mutations are the subject of huge amounts of medical research. Might want to include ECC in your DNA storage technology.
. DNA is pretty sturdy. It doesn't last forever. But a few decades/centuries/millennia is probably good enough for most purposes. Here's a link to the Wikipedia article https://en.wikipedia.org/wiki/Ancient_DNA
How do you verify what you wrote?
Just like a disk or cloud file, or a stack of punched cards for that matter: parity, checksum, ECC, hash, etc., etc. Or maybe you can deep-fat fry it, then taste it? If it tastes off, or makes you sick, or causes your toenails to fall off, assume it has been corrupted.
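In the non-culinary direction, one obvious approach (a sketch of my own, with a hypothetical 2-bit mapping, not anything from the paper) is to map the strand back to bytes and run an ordinary checksum over it:

```python
import zlib

# Hypothetical 2-bit encoding for the base-4 alphabet.
NT_TO_BITS = {"A": 0b00, "T": 0b01, "G": 0b10, "C": 0b11}

def dna_to_bytes(strand: str) -> bytes:
    """Pack four nucleotides per byte (assumes length is a multiple of 4)."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        b = 0
        for nt in strand[i:i + 4]:
            b = (b << 2) | NT_TO_BITS[nt]
        out.append(b)
    return bytes(out)

written = "ATGCATGCTTAA"
checksum = zlib.crc32(dna_to_bytes(written))

# On read-back, a single-base corruption changes the CRC.
read_back = "ATGCATGCTTAC"
print(checksum == zlib.crc32(dna_to_bytes(read_back)))  # False
```

A CRC only detects corruption, of course; correcting it would need proper ECC on top, as suggested above.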
I am sceptical about the "long lasting" claim. DNA is so fragile that almost none of it survives more than a few years even though it is almost never exposed to anything approaching an "extreme environment". Only a very, very tiny proportion of all the DNA created actually survives for thousands of years.
Presumably you would keep your "data center in a sugar cube" in conditions ideal for the longevity of DNA, and since it is so small have lots of backups. The reason little survives is because DNA rarely ends up in ideal conditions.
For bonus points, create a life form with a LOT of junk DNA you can use as scratch space, and put it in a zoo. With enough redundancy you can get your data back from one of its descendants :)
I actually tried this. I thought I would be able to store and print some DNA info, but it turned out the printer had its own ideas and thought I was just supplying the toner.
The resulting hybrids will probably enable some information continuity, but I wouldn't really call them backups.
I don't want to diss efforts like this—there's always something to be learned, and I am not a biologist—but: is this ever going to be a practical route to high-density data storage? There is so much work going on with silicon-based, plasmonic, photonic and holographic approaches, many of them offshoots of developing nanotech ideas, that it's arguably likely that we'll eventually be able to read/write data at the molecular scale anyway. It may well be a failure of the imagination on my part, but while I think humanity may do some remarkable things with DNA (both wonderful and horrible, almost certainly), routinely stuffing petabytes of data into a vial of the stuff for ve-e-ery slo-o-ow retrieval just doesn't seem that likely. Perhaps it'll be a niche product for spies and smugglers?
(Didn't Friday have a special pouch concealed behind her navel ...?)
The article isn't clear as to how the data is read, but I'm guessing it is destructive so here's the dilemma.
Do we seek a biological way to replicate the data before it is destructively read - introducing the possibility of a random mutation, or recreate it from the read data - introducing the possibility of non-random editing over the thousands of years to come.
You know. I'm now looking at blockchain in a new light.
Did they actually develop any new technology, or did they just string together DNA synthesizers with DNA sequencers, both of which are quite established technologies and heavily automated / computer-controlled already?
Or is the fact that an article appeared in a renowned scientific journal sufficient to swallow any hyperbole?
Now, just to douse your enthusiasm a bit more, stop a moment to think on (1) what quantity of DNA molecules they synthesize, and (2) what quantity / volume of chemicals is required to write and read the information. Maybe factor that into the storage density equation before comparing it to actually functioning and commercially available data storage systems.