Researchers at the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) have discovered a viable way to store data in DNA. Molecular biologist Nick Goldman and colleague Ewan Birney, associate director of EMBL-EBI, published their paper on DNA storage in Nature in January.
There are approximately three zettabytes (3 000 billion billion bytes) of digital information in the world, and new information is generated constantly, so archiving has become something of a challenge. Hard disks are expensive and require a constant supply of electricity, while even the best ‘no-power’ archiving materials, such as magnetic tape, degrade within a decade.
DNA, by contrast, can last for tens of thousands of years and is extremely compact. The new method makes it possible to store at least 100-million hours of high-definition video in about a cup of DNA for extremely long periods of time. “We already know that DNA is a robust way to store information because we can extract it from the bones of woolly mammoths, which date back tens of thousands of years, and make sense of it,” explains Goldman.
The major hurdles in DNA storage have been writing and reading it. Firstly, using current methods, it is only possible to manufacture short segments of DNA at a time. Secondly, both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated several times in a row.
“We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail,” says Birney.
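The idea Birney describes can be illustrated with a short Python sketch. It is not the researchers' actual code and makes several simplifying assumptions: each byte is written as six base-3 digits (the published scheme is not reproduced here), each digit selects one of the three DNA letters that differ from the previous letter so that no letter ever repeats, the index of each fragment is kept as a plain number rather than being written into the DNA, and the fragment length and step are illustrative values chosen so that most letters appear in four fragments. All function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 >= 256)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_dna(trits, prev="A"):
    """Each digit picks one of the three letters that differ from the last one."""
    letters = []
    for trit in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        letters.append(prev)
    return "".join(letters)


def dna_to_trits(dna, prev="A"):
    """Invert trits_to_dna; a repeated letter raises ValueError."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def trits_to_bytes(trits):
    """Invert bytes_to_trits, six digits per byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def reverse_complement(seq):
    """Read a strand in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment(dna, length=100, step=25):
    """Cut overlapping (index, piece) fragments; odd ones are reversed.

    Fragments at the very end may be shorter than `length` in this sketch.
    """
    frags = []
    for index, start in enumerate(range(0, len(dna), step)):
        piece = dna[start:start + length]
        if index % 2:
            piece = reverse_complement(piece)
        frags.append((index, piece))
    return frags


def reassemble(frags, step=25):
    """Rebuild the sequence by majority vote over overlapping fragments."""
    votes = {}
    for index, piece in frags:
        if index % 2:
            piece = reverse_complement(piece)
        for offset, base in enumerate(piece):
            votes.setdefault(index * step + offset, Counter())[base] += 1
    # Assumes every position is covered by at least one fragment.
    return "".join(votes[pos].most_common(1)[0][0] for pos in range(len(votes)))


message = b"I have a dream"
dna = trits_to_dna(bytes_to_trits(message))
frags = fragment(dna, length=20, step=5)

# Simulate a read error in one fragment; the three other copies outvote it.
index, piece = frags[6]
wrong = "C" if piece[0] != "C" else "G"
frags[6] = (index, wrong + piece[1:])

recovered = trits_to_bytes(dna_to_trits(reassemble(frags, step=5)))
print(dna)
print(recovered)
```

In this sketch a single misread letter is outvoted by the other fragments that cover the same position, which is the intuition behind Birney's remark that the same error would have to occur on four different fragments before decoding fails.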
California-based company Agilent Technologies volunteered to help Goldman and Birney by synthesising DNA that encoded several files, including an MP3 of Martin Luther King’s speech ‘I have a dream’, a PDF of Watson and Crick’s seminal paper ‘Molecular structure of nucleic acids’ and a text file of all Shakespeare’s sonnets.
Agilent mailed the sample to the EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without any errors. “We’ve created a code that’s error tolerant using a molecular form we know will last in the right conditions for 10 000 years, or possibly longer,” says Goldman.