RNA functions as the reader that decodes this flash drive. This reading process is multi-step and there are specialized RNAs for each of these steps. Below, we look in more detail at the three most important types of RNA. The nitrogen bases in DNA are the basic units of genetic code, and their correct ordering and pairing is essential to biological function.
The four bases that make up this code are adenine (A), thymine (T), guanine (G) and cytosine (C). In the double helix, bases pair off with one another: A with T, and C with G. RNA molecules, by comparison, are much shorter 3. Eukaryotic cells, including all animal and plant cells, house the great majority of their DNA in the nucleus, where it exists in a tightly compressed form, called a chromosome 4. This compressed format means the DNA can be easily stored and transferred.
In addition to nuclear DNA, some DNA is present in energy-producing mitochondria, small organelles found free-floating in the cytoplasm, the area of the cell outside the nucleus.
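The A-T and C-G pairing rule described above can be written down in a few lines. The following is a minimal sketch, not part of the original text; the function names are illustrative, and the reverse complement relies on the well-known fact that the two strands of a double helix run in opposite directions.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base pairing partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying `complement` twice returns the original strand, which is one way to see why either strand of the helix carries the full information.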
The three types of RNA are found in different locations. Transfer RNA (tRNA), for example, works in the cytoplasm: if it receives the correct signal from the ribosome, it hunts down amino acid subunits in the cytoplasm and brings them to the ribosome to be built into proteins 5. Ribosomes are formed in an area of the nucleus called the nucleolus, before being exported to the cytoplasm, where some ribosomes float freely.
Other cytoplasmic ribosomes are bound to the endoplasmic reticulum, a membranous structure that helps process proteins and export them from the cell 5. DNA degradation can occur through several mechanisms, including oxidation and hydrolysis of the phosphate backbone or loss of a base from the sugar (depurination). After temperature, which generally accelerates the rates of most degradation reactions, pH is perhaps the next most obvious parameter to control for improving DNA stability against these mechanisms.
Both acidic and basic conditions can enhance the hydrolysis rate of DNA by either increasing the electrophilicity of the DNA or the nucleophilicity of water. For example, it was estimated that a change from pH 6 to 5 could increase the degradation rate of DNA by an order of magnitude 50 , and even just a min exposure to pH 4. Room temperature operation would not provide adequate energy to first melt any secondary structures that might have formed that would block hybridization, and would also result in non-specific sequences hybridizing with each other.
Shorter DNA sequence lengths that hybridize near room temperature would be too short to confer adequate sequence specificity amongst large, highly diverse libraries of DNA strands that will comprise most DNA storage systems.
Therefore, tightly controlling pH as well as the duration of exposure to elevated temperatures will be important design considerations for storage systems. This review has focused on empirical measurements of DNA stability under a range of different conditions. Together with theoretical analyses 4, 53, 54, there is strong evidence for the utility of DNA as a storage medium.
Nevertheless, DNA has finite stabilities, and is especially labile in conditions relevant for frequent access and dynamic processing. This does not preclude the use of DNA for storage applications but does affect the information density of systems by requiring higher redundancy in the number of copies of each distinct strand as well as in the encoding methods used to achieve certain reliability. Understanding the effect of DNA stability on these tradeoffs will be important in designing unit processes and systems for specific types of storage applications.
Here we discuss these tradeoffs through a series of analyses shown in Fig. The goal here is to show how models can reveal both intuitive and unintuitive relationships between parameters; note that the ranges of parameter values and the trends shown here may change depending on the specific details of each system.
Figure caption: (A) Reed-Solomon inner-outer encoding scheme. (B) Relationship between the log decoder error probability during RS decoding and DNA strand length, including the effects of symbol error rate (mutations, insertions, and deletions; p_error) and copy number. (C) Relationship between the information density of a DNA storage system and the probability of symbol erasure (strand loss due to breakage) as a function of strand length. (D) Relationship between information density and strand length as a function of the probability of strand breakage. (C) and (D) assume a copy number of 1.
These addresses typically are complementary to short DNA oligomers used to amplify the files through PCR 3 or to extract them through affinity-based methods. However, addresses cannot be much shorter without sacrificing diversity in addresses. The addresses take up space on each strand and do not encode data.
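The tension between address length and address diversity can be made concrete with a little counting. The sketch below is illustrative only: the function names and the numbers (one million files, 150 nt strands, 20 nt addresses) are assumptions, not values from the text. It also counts every possible sequence, whereas a practical address set would be much smaller because addresses must avoid cross-hybridizing with one another.

```python
def address_count(address_len: int) -> int:
    """Upper bound on distinct addresses: 4 choices (A, T, G, C) per position."""
    return 4 ** address_len


def min_address_len(num_files: int) -> int:
    """Shortest address length whose upper bound covers num_files."""
    length = 1
    while address_count(length) < num_files:
        length += 1
    return length


def payload_fraction(strand_len: int, address_len: int) -> float:
    """Fraction of a strand left over once the address is accounted for."""
    return (strand_len - address_len) / strand_len


# Hypothetical example values, chosen only for illustration.
print(min_address_len(1_000_000))
print(round(payload_fraction(150, 20), 3))
```

Each nucleotide removed from the address cuts the number of possible addresses by a factor of four, while each nucleotide added reduces the share of the strand available for data.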
Index sequences are also overhead, but they must be included to record how the strands comprising a file should be ordered. In addition to the address and index, error correction codes may add further overhead. Electronic storage systems employ many error correction mechanisms to ensure the reliability of stored data. Similarly, to cope with frequent errors, DNA storage systems can leverage error correction codes that detect and correct errors arising from strand breakage or loss and from substitutions, insertions, and deletions within strands.
Error correction codes work by adding enough redundancy to recompute the original data even in the presence of errors or missing strands. Hence, to maintain the same reliability of information transfer or recovery, error correction forces a trade-off between density of information and tolerance to errors.
The higher the likelihood of strand error or loss due to reduced DNA stability, the more overhead must be spent on error correction. While a variety of codes have been proposed for DNA storage, Reed-Solomon (RS) codes are particularly popular given their configurability and tolerance to errors 24, 56, 57. The error correction properties of RS codes are well known, and we use them here to explore the combined effects of strand errors, breakage, and length on the information density and reliability of DNA storage systems.
We will not focus on the details of different codes but rather use RS codes to illustrate some general trends to consider when designing DNA storage systems. Additional details of popular error correction approaches can be found in a few recent references 14, 15, while details about our implementation of RS codes and the parameters used (shown in brackets) can be found in the Supplemental Information.
In brief, the key features of the implementation used here are (1) that the RS code is capable of detecting and differentiating two kinds of errors: symbol errors such as insertions, deletions, and mutations, collectively having a probability of p_error, and symbol erasures such as the breakage or loss of a DNA strand, with probability p_strand_erasure; and (2) that inner and outer RS codes are interleaved, where the inner code protects against errors within a strand and the outer code corrects for missing or erroneous strands Fig.
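The distinction between symbol errors and symbol erasures matters because a Reed-Solomon code pays a different price for each. For a code with n total symbols and k data symbols, the standard decoding bound is 2 x (errors) + (erasures) <= n - k. The sketch below checks that bound for a single code; the function names and the n = 255, k = 223 parameters are illustrative assumptions rather than the configuration used in the review, and the interleaved inner-outer arrangement is not modeled.

```python
def rs_can_decode(n: int, k: int, errors: int, erasures: int) -> bool:
    """Check the standard RS bound: 2 * errors + erasures <= n - k.

    An erasure (position known, e.g. a missing strand) costs one parity
    symbol; an error (position unknown) costs two.
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return 2 * errors + erasures <= n - k


def code_rate(n: int, k: int) -> float:
    """Fraction of symbols that carry data rather than redundancy."""
    return k / n


# Hypothetical parameters: 32 parity symbols in a 255-symbol codeword.
print(rs_can_decode(255, 223, errors=16, erasures=0))   # True
print(rs_can_decode(255, 223, errors=0, erasures=32))   # True
print(rs_can_decode(255, 223, errors=10, erasures=13))  # False
```

The same parity budget tolerates twice as many erasures as errors, which is why treating a broken or lost strand as an erasure in the outer code is comparatively cheap.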
To provide a non-exhaustive example of the potential importance of trade-off analyses, here we focus on a major mode of DNA degradation, strand loss through hydrolysis or mechanical breakage. First, to better understand the impact of strand loss on the probability of a decoding error, we analytically model its effect on the decoding error probability of an outer RS[,,33] code. Based upon experimental observations, we conservatively model the probability of strand breakage as linearly dependent on strand length, and we assume that strand loss due to breakage is equivalent to an erasure in the outer code.
We further assume that multiple copies of a strand exponentially reduce the likelihood of loss because all copies of the strand must be lost to cause an erasure. Figure 2B shows the impact of longer strands on the decoder error probability for several system configurations with different copy numbers of each strand and different error rates. The y-axis shows the log decoding error probability and the x-axis is the strand length.
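The modeling assumptions stated above (breakage probability linear in strand length, an erasure only when every copy is lost, and erasures handled by the outer code) can be combined into a small calculation. This is a simplified sketch and not the authors' implementation: it assumes independent strand losses, ignores symbol errors within strands, and uses hypothetical values for the code size (n = 255, k = 223) and the per-nucleotide breakage rate.

```python
from math import comb


def strand_break_probability(strand_len: int, break_rate_per_nt: float) -> float:
    """Breakage probability modeled as linear in strand length, capped at 1."""
    return min(1.0, strand_len * break_rate_per_nt)


def erasure_probability(strand_len: int, break_rate_per_nt: float, copies: int) -> float:
    """A strand is erased only if all of its copies break."""
    return strand_break_probability(strand_len, break_rate_per_nt) ** copies


def decode_failure_probability(n: int, k: int, p_erasure: float) -> float:
    """Probability that more than n - k of n strands are erased.

    Erasures are treated as independent, so the count is binomial and the
    failure probability is its upper tail.
    """
    return sum(
        comb(n, i) * p_erasure ** i * (1.0 - p_erasure) ** (n - i)
        for i in range(n - k + 1, n + 1)
    )


# Hypothetical sweep over strand length and copy number.
for copies in (1, 3):
    for strand_len in (100, 200, 400):
        p = erasure_probability(strand_len, 1e-3, copies)
        print(copies, strand_len, decode_failure_probability(255, 223, p))
```

Under these assumptions the failure probability rises with strand length and drops sharply with copy number, matching the qualitative trends described in the text.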
Other parameters matter, too. A lower p_error significantly reduces the residual likelihood of error. Higher numbers of copies per strand also have a large effect, since it becomes exponentially less likely that all copies of a strand are lost. For a given strand length, there are many different possible designs for an RS inner-outer code. In Fig. We assume a single copy of each strand to maximize information density and fully exploit the error-correcting capability of RS codes.
In this system, shorter strands achieve superior density at high strand breakage rates. This is a result of shorter strands being overall less likely to break, and therefore the outer code needs fewer error correction symbols.
Longer strands have a higher overall likelihood to break and need more error correction in the outer code to compensate. However, as the overall probability of breakage per nt decreases (to the left), all strands are less likely to break, and this gives longer strands an advantage since they can hold a larger fraction of information per strand and the outer code can work effectively even with a relatively small number of error correction symbols.
A deeper analysis further shows that the magnitude of the strand breakage rate can fundamentally alter the relationship between information density and optimal strand length Fig. In fact, it is not simply that shorter strands are better for high strand loss rates but that there can be optimal lengths that balance the overhead needed for file addresses and indices with the encoding overhead required to account for strand loss. This analysis is just one example of many that can be performed interrogating the effects of diverse parameters of DNA storage systems.
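The existence of an optimal strand length can be reproduced with a toy model that balances the two overheads named above: the fixed address and index cost per strand, and the outer-code redundancy needed to survive strand loss. The sketch below is a rough illustration under stated assumptions, not the analysis behind the figure. It assumes a copy number of 1, independent erasures, a hypothetical 255-strand outer code, a hypothetical 40 nt of address plus index per strand, and a hypothetical target failure probability of 1e-9.

```python
from math import comb


def tail_probability(n: int, max_erasures: int, p: float) -> float:
    """Probability that more than max_erasures of n strands are erased."""
    return sum(
        comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(max_erasures + 1, n + 1)
    )


def min_parity(n: int, p: float, target: float):
    """Smallest number of parity strands meeting the target, or None."""
    for parity in range(n):
        if tail_probability(n, parity, p) <= target:
            return parity
    return None


def information_density(strand_len: int, overhead_nt: int, break_rate_per_nt: float,
                        n: int = 255, target: float = 1e-9) -> float:
    """Fraction of all stored nucleotides that carry user data."""
    if strand_len <= overhead_nt:
        return 0.0
    p = min(1.0, strand_len * break_rate_per_nt)  # linear breakage model
    parity = min_parity(n, p, target)
    if parity is None:
        return 0.0
    return (strand_len - overhead_nt) / strand_len * (n - parity) / n


def best_length(lengths, overhead_nt: int, break_rate_per_nt: float) -> int:
    """Strand length with the highest density among the candidates."""
    return max(lengths, key=lambda s: information_density(s, overhead_nt, break_rate_per_nt))


lengths = range(50, 1001, 50)
for rate in (1e-3, 1e-6):  # hypothetical per-nt breakage probabilities
    print(rate, best_length(lengths, 40, rate))
```

With a high breakage rate the best length in this model sits at an intermediate value, because very short strands are dominated by address overhead and very long ones by outer-code redundancy. With a low breakage rate the longest candidate wins.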
It underscores the need to tailor error correction and system parameters like strand length and copy number to different settings including error rates that vary according to environmental conditions and data storage applications. For example, long-term archival storage will likely encapsulate the data in silica helping keep the probability of strand loss or breakage low, allowing longer DNA strands to be used and achieving higher information densities.
In contrast, working storage or short-term dynamic storage would benefit from shorter strand lengths to compensate for higher strand loss rates. The robustness and failure rates of information storage systems are of utmost importance, as the reliability of data retrieval must be concretely reported, verifiable, and trustworthy 62, 63, 64. While we have some rough estimates and measurements of DNA stability in a variety of conditions, measurements often exhibit considerable noise and variability between experimentalists and research groups, in addition to substantial noise between samples in an individual experiment.
There are likely experimental details affecting the accurate interpretation of measurements including the manipulation of DNA itself in setting up experiments, or confounding parameters like DNA solubility.
Experiments exploring a more comprehensive set of parameters and that assess sources of variability in results could provide more confidence in the design and utility of DNA storage technologies. Fine-tuning exact buffer conditions, assessing changes in its composition, and maintaining strict control and provenance records over the environmental exposures of the DNA throughout its complete lifetime starting with DNA synthesis will be important for commercial DNA storage products.
In addition, while many studies have quantified DNA degradation through some form of quantitative PCR, mass measurements, or even next-generation sequencing, the definition of DNA degradation remains imprecise and too limited despite its considerable impact on the physical design and encoding of reliable storage systems. For example, there are many ways a DNA storage system can be degraded.
How each of these types of degradation mechanism is affected by environmental conditions is important for system design and should be carefully assessed.
There is also the intriguing possibility that the physical stability of storage systems could be enhanced by the use of another polymer that is chemically more stable than DNA, although substantial work would likely be needed to replicate the synthesis, processing, and sequencing technologies and infrastructure available for DNA. There are many other potential chemistries of nucleic acid polymer backbones that may offer differing stabilities tuned for specific environmental conditions or applications, including bicyclo-DNA or glycerol-DNA that have altered sugar backbone chemistries 67 or nuclease resistant nucleic acids. In addition to altering the biopolymer substrate itself, protective additives or even active repair systems similar to those in natural biological systems may improve storage system reliability.
There are clearly many opportunities and an important need to better understand, characterize, and improve DNA stability. While seemingly a straightforward concept, assessing the stability of DNA and making appropriate choices of storage methods is not trivial. DNA stability has been experimentally measured and reported in many diverse ways, including mutational rate, breakage rate per base, and loss of intact DNA strands. Degradation rates have also been reported in a mix of many different environmental, temperature, buffer, and temporal conditions.
Furthermore, the functional impact of different types of degradation will depend on the nature of the storage system.
For example, mechanical degradation may affect systems that use longer DNA strands more than those that use shorter strands; degradation rate may be complicated to predict depending on the density of storage systems, due to its potential nonlinear dependency on DNA concentration; and some encoding algorithms may sacrifice information density but be more resistant to the loss of strands. However, it is clear that even with our current nascent knowledge of DNA stability, reliable DNA storage systems can already be created with existing technologies.
In addition to developing a better understanding of DNA stability, it will be important to recognize that appropriate tradeoffs and limitations in system properties and capabilities must be made, and that these choices can be supported through models.
With improving measurements, a better understanding of degradation mechanisms, new technologies, models, and enhanced encoding algorithms, the efficiency of these systems will continue to improve the commercial viability of DNA-based information storage.
Further information on research design is available in the Nature Research Reporting Summary linked to this article. Plots in Fig.
Watson, J. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature.
Wiener, N. Interview: Machines Smarter Than Men?
Bornholt, J.
Ceze, L. Molecular digital data storage using DNA.
Adleman, L. Molecular computation of solutions to combinatorial problems. Science.
Cox, J. The complexities of DNA computation. Trends Biotechnol.
Shehabi, A.
Masanet, E. Recalibrating global data center energy-use estimates.
Jones, N.
Bednarz, A.
Mullis, K. The unusual origin of the polymerase chain reaction.
Hughes, R. Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology. Cold Spring Harb.
Burel, A. Macromolecules 50.
De Silva, P. New trends of digital data storage in DNA. BioMed Res.
Erlich, Y. DNA Fountain enables a robust and efficient storage architecture.
Byron, J. Measuring the cost of reliability in archival systems.
Allentoft, M. The half-life of DNA in bone: measuring decay kinetics in dated fossils. B Biol.
Bada, J. Amino acid racemization in amber-entombed insects: implications for DNA preservation. Acta 58.
Hofreiter, M.
Willerslev, E. Long-term persistence of bacterial DNA.
Organick, L. Small Methods.
Liu, Y. DNA preservation in silk.
Kohll, A.
Scientists don't yet fully understand what significance this might have or whether this has any evolutionary importance.
Based in San Diego, John Brennan has been writing about science and the environment since
References David L. Nelson and Michael M.