The metaphors are overstated, sometimes misleadingly so, but DNA is widely characterized as the Book of Life—a repository of the information of biology and the program from which organisms are computed. But this is very different from a book a human would write for the same reason that an organism is not truly a machine. A book or machine is assembled with a purpose in mind, from parts derived from all sorts of other knowledge. An organism is built from within, and its DNA must function that way.
We have described many ways that DNA fills a number of physiological roles in the cell. Only some of these are related to protein coding itself. Looking at the genome from without, as we have done, shows its proliferating, diversifying, modular nature. Remarkably, one might say that there are too many modular units to make any sense (biological or otherwise): any stretch of DNA contains nothing but sequences identical to codons. Every three base pairs of every enhancer has a codon sequence. Similarly, because of the tolerance of TFs for variation in their RE sequences, and the fact that they are short, the genome is also nearly saturated with possible REs. Even a given 10-mer (that is, a stretch of 10 consecutive nucleotides) occurs once every 4^10 nucleotides on average, that is, about once per Mb, or about 3,000 times in the human genome. In fact, the critical regions of REs are often shorter than 10 base pairs. This also means that any stretch of sequence is simultaneously saturated with wholly overlapping codon and enhancer sequences.
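The 10-mer arithmetic can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes equal base frequencies, independent positions, a single strand, and a round genome size of 3 billion base pairs, none of which is exactly true of a real genome. The function names are ours, chosen for illustration.

```python
# Back-of-envelope estimate of how often one fixed k-mer is expected by chance.
# Assumptions (simplifications): equal base frequencies, independent positions,
# one strand only, and a round figure for the size of the human genome.

GENOME_SIZE = 3_000_000_000  # assumed approximate human genome size, in bp


def expected_spacing(k: int) -> int:
    """Average number of nucleotides between chance occurrences of one k-mer."""
    return 4 ** k


def expected_count(k: int, genome_size: int = GENOME_SIZE) -> float:
    """Approximate number of chance occurrences of one k-mer in the genome."""
    return genome_size / expected_spacing(k)


if __name__ == "__main__":
    # Shorter motifs are expected far more often than 10-mers.
    for k in (3, 6, 10):
        print(k, expected_spacing(k), round(expected_count(k)))
```

For k = 10 the spacing is 4^10 = 1,048,576 nucleotides, about 1 Mb, which gives a little under 3,000 expected occurrences. A motif only 6 base pairs long would be expected hundreds of thousands of times under the same assumptions.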
Even if we only consider exons, the same codons are repeated millions of times in the coding parts of the genome, in no detectable order from the point of view of the sequence itself. Likewise for RE sequences. This is not an illusion because, for example, it is the functionally open nature of polypeptides, and hence of codons, that makes the diversity of life possible. As a consequence, one could view DNA sequence naively and say that it would be laughable to think it actually contained any information at all! Yet, these short otherwise essentially random motifs carry the most important biological information in the genome.
Thus, if DNA is so densely packed with information, how can an organism open and read the book? We don't have a very good answer to this, other than to say that it happened incrementally over eons of time. Also, because of cellular continuity from one generation to the next, the apparatus for reading the genome properly may always be changing as cells do what they have to do, but is never entirely missing. How scientists learned to read genomes (to the extent that we can) is more easily explained.
By what authorization do we turn Figure 4-2 into Figure 4-6B? We do not learn about DNA just by examining DNA. For example, it is not possible even to identify all the genes in a given genomic DNA sequence. There is no known signature by which all genes can be identified, and indeed it is not even clear what should constitute a "gene." If our understanding of the sloppy, contingent nature of evolution is accurate, there undoubtedly is no single signature or definition.
Instead, we have let organisms (sometimes enslaved by us experimentally) teach us their workings step by step. But it is important to understand the degree to which what we ask of organisms affects what we see and how we interpret what we see. Thus, our knowledge is built on observation filtered through and directed by theory. The theoretical understanding that DNA codes for mRNA that codes for proteins was discovered through experiments, but the information now can be applied to entirely new sequences.
For example, it is possible to separate mRNA from DNA in cells. We can then use mRNA sequences to determine the corresponding exonic DNA sequences in the genome. The exons are separated by introns, which are spliced out of the transcript and thus are not part of the mature messenger RNA, but computer-based sequence alignments can find the sections of genomic DNA that code for the successive exons that were assembled to form the mRNA. This shows us where the gene is in the genome, and, since only exonic sequence is included in mRNA, we can identify exon-intron boundaries by finding where an mRNA sequence is interrupted in the genome. This in turn makes it possible to identify the sequence characteristics of exon-intron boundaries and transcription start/stop sequences in the genome. (We can also work backward to DNA by first identifying the amino acid sequence of a protein, which can be done chemically.)
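The idea of finding where an mRNA sequence is interrupted in the genome can be illustrated with a toy sketch. It assumes exact matches, an mRNA already written in the DNA alphabet (T rather than U), and exons that appear in the same order on one strand. Real alignment programs tolerate mismatches and use splice-site signals; this greedy search does not, and the sequences below are invented for illustration.

```python
# Toy spliced alignment: place successive pieces of an mRNA on a genomic
# sequence by greedy exact matching. Illustrative only, not a real aligner.

def locate_exons(genome: str, mrna: str, min_exon: int = 4):
    """Return genome intervals (0-based, end exclusive) covering the mRNA."""
    exons = []
    g_pos = 0  # genome position from which the next exon is searched
    m_pos = 0  # first mRNA position not yet placed on the genome
    while m_pos < len(mrna):
        best = None
        # Try the longest remaining mRNA prefix first, then shorten it.
        for length in range(len(mrna) - m_pos, min_exon - 1, -1):
            hit = genome.find(mrna[m_pos:m_pos + length], g_pos)
            if hit != -1:
                best = (hit, hit + length)
                break
        if best is None:
            raise ValueError("mRNA could not be aligned to the genome")
        exons.append(best)
        g_pos = best[1]
        m_pos += best[1] - best[0]
    return exons


def introns_between(exons):
    """Genome intervals lying between consecutive exons."""
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


# Invented example: three exons separated by two introns.
TOY_GENOME = ("TTT" + "ATGGCC" + "GTAAGTTTTCAG" + "CTGAAGGA"
              + "GTGAGTCCCTAG" + "TTCTGA" + "AAA")
TOY_MRNA = "ATGGCC" + "CTGAAGGA" + "TTCTGA"

if __name__ == "__main__":
    exons = locate_exons(TOY_GENOME, TOY_MRNA)
    for start, end in introns_between(exons):
        # Print each inferred intron so its boundary sequences can be inspected.
        print(start, end, TOY_GENOME[start:end])
```

Once the intervening stretches have been collected from many genes, their first and last few bases can be compared to look for shared boundary signals, which is the step the text describes next.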
Of course, the Central Dogma of DNA → RNA → protein is only partly correct. DNA does many other things. Once we knew that genes were selectively expressed in ways affected by DNA itself, we could identify the sequence elements (REs) that are responsible. mRNAs are informative of gene regulation because genes used in the same tissue may share some regulatory mechanisms. For example, once we find their genomic location, we can search the chromosomal regions of coexpressed genes to see whether they share any sequence elements in their flanking regulatory regions. Genes and their flanking regions can also be inserted in expression systems (letting bacteria, flies, or even mice express our test region); systematic deletions can identify sequence elements necessary for particular expression patterns. Then, because our theory is that response elements are bound by transcription factor proteins, we can use these elements experimentally to "trap" proteins that are bound to them in cultured cells and use this in turn to identify the transcription factor genes themselves.
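The search for sequence elements shared among the flanking regions of coexpressed genes can be sketched as a simple k-mer comparison. This is a minimal illustration under strong assumptions: exact matches only, one strand, and a fixed motif length. The flanking sequences and the planted 6-mer below are invented, and the function names are hypothetical.

```python
# Toy search for short sequences shared by the flanking regions of
# coexpressed genes. Exact matching only; real REs tolerate variation.
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_motifs(flanks, k: int = 6, min_fraction: float = 1.0):
    """k-mers present in at least min_fraction of the flanking sequences."""
    counts = Counter()
    for seq in flanks:
        # Count each k-mer once per sequence, however often it recurs there.
        counts.update(kmers(seq.upper(), k))
    needed = min_fraction * len(flanks)
    return sorted(m for m, c in counts.items() if c >= needed)


# Invented flanking regions, each containing the same arbitrary 6-mer.
TOY_FLANKS = [
    "CCATTGGATAAGCTTAC",
    "AAGATAAGTTTGGCCAA",
    "TCGGCGATAAGACGTCA",
]

if __name__ == "__main__":
    print(shared_motifs(TOY_FLANKS, k=6))
```

Because short sequences recur by chance so often, a shared k-mer found this way is only a candidate. It takes experiments such as the deletions described above to show that it is actually necessary for expression.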
There are developing databases of known RE sequences (pdap1.trc.rwcp.or.jp/research/db/TFSEARCH.html). It is clear that simple catalogs of REs cannot suffice to identify or characterize regulation from bioinformatic (sequence data) approaches alone. Some success with such computerized searches has been reported, however, and candidate sequences have been experimentally confirmed (e.g., Bonifer 2000; Bussemaker et al. 2000; Chiu et al. 2002; Dermitzakis and Clark 2002; Dermitzakis et al. in press; DeSilva et al. 2002; Hardison 2000; 2001; Michelson 2002; Pennacchio and Rubin 2001; Spellman and Rubin 2002).
The theory of evolution instructs that related organisms share similar traits because they share common ancestry. The same is true of genes. From our understanding of evolution and the nature of DNA and mutation, we expect the amount of sequence variation in regions that are functionally important to selection to be constrained, relative to the amount of drift that occurs in less stringently functional elements. This allows us to search the genome of one species for regions that resemble the sequence of known genes in related species, and so to identify homologous regions. Similarly, because of our theory that genes arise by duplication, we can find new genes by looking for different sequences within the same genome that are similar to those of genes we have identified, and we can guess (often rightly) what types of cells the new genes will be expressed in, from the expression pattern of the known gene and the notion that related genes have related function.
We can align sequences from different species, first anchoring the alignments by known regions such as homologous genes. Then, we can search the aligned regions for conserved sequence shared between them. Since theory suggests that highly conserved sequence is likely to have some function, an analysis of the candidate sequence can show us what that function is. This approach can be used to identify at least some of the control elements that are conserved between the tested species (Schwartz et al. 2000). Alignments can identify previously known elements and other good candidates. Phylogenetic methods may be useful (e.g., Blanchette et al. 2002; Blanchette and Tompa 2002; Chiu, Amemiya et al. 2002; Dermitzakis and Clark 2002; Shashikant et al. 1998) in the search for conserved REs in subsets of related species that share a particular trait (e.g., type of teeth or limb or leaf structure).
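The search of aligned regions for conserved sequence can be sketched as a sliding-window identity scan over a pairwise alignment. The sketch assumes the two sequences have already been aligned to equal length, with "-" marking gaps. The window size, threshold, and toy sequences are arbitrary choices for illustration.

```python
# Toy conservation scan: slide a window along a pairwise alignment and
# report windows whose fraction of identical, ungapped columns is high.

def window_identity(aln_a: str, aln_b: str, window: int):
    """Fraction of identical, ungapped columns in each window."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both bases agree and neither is a gap.
    matches = [int(a == b and a != "-") for a, b in zip(aln_a, aln_b)]
    return [sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]


def conserved_windows(aln_a: str, aln_b: str,
                      window: int = 8, threshold: float = 0.9):
    """Alignment intervals (0-based, end exclusive) at or above threshold."""
    scores = window_identity(aln_a, aln_b, window)
    return [(i, i + window) for i, s in enumerate(scores) if s >= threshold]


# Invented alignment: a 9-column identical block flanked by mismatches.
TOY_SPECIES_A = "AAAAA" + "GATTACAGG" + "CCCCCC"
TOY_SPECIES_B = "TTTTT" + "GATTACAGG" + "AAAAAA"

if __name__ == "__main__":
    for start, end in conserved_windows(TOY_SPECIES_A, TOY_SPECIES_B):
        print(start, end, TOY_SPECIES_A[start:end])
```

A high-scoring window says only that the sequence has been retained in both species. What the function of that sequence is, if any, still has to be worked out by the kinds of analysis the text describes.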
We read the book of life by applying external information—viewing life from outside, whereas organisms have to do it all from within. They were also "designed" from within, that is, through evolution by phenotype, bit by bit over billions of years.
It is easy to see why we have trouble stereotyping the function of sequence too tightly, or thinking too deterministically about how it all works, or assuming that it must be as neatly organized and regular as something we might design from the outside. Nonetheless, the various methods described have allowed scientists to peer rather successfully into the private business of every species we choose to look at.
Chickens and Eggs—Literally
Because this all occurs in biological context, elements that on the surface do not appear patterned or modular or meaningfully repeated are in fact densely filled with absolutely vital information. The trick is not in the sequence alone but in its context along the chromosome in living cells, because that is what determines what the sequence means.
So then, what is primary? DNA or its context—the chicken or the egg? Life may not exist without DNA, but DNA by itself would just sit there, inert. Nor does DNA control the expression of proteins or their structure. That only happens in the natural cellular context of proteins and other elements already in a cell (which is why viruses are not "alive" by themselves).
TFs themselves are proteins, which means that, if they are actively regulating genes in a cell, their own regulating mechanisms must be active in the same cells. This shows the rolling circle of developmental regulation because these TFs must thus be activated by other TFs, and so on. The mechanism that activates each TF in a cell must either have been active in that cell's mitotic ancestors or must be stimulated in the new cell by developmental signals coming into the cell from outside.
Biologists give primacy to DNA and say that the egg only makes a chicken because it starts out with DNA. But this reasoning, that the proteins that make all this possible are encoded in the DNA, is inextricably circular. This chain of events goes back to the organism's first cell(s), and hence to its parent, and ... as fleas on fleas on fleas on dogs ... ad infinitum. To break the circle, and settle the ultimate chicken-or-egg argument, we would have to go back 4 billion years—to the hypothesized RNA world. If the original "soup" comprised chemical reactions between RNA and amino acids, then there was neither chicken nor egg: the code developed by the addition of function to elements already interacting. Neither came first in time or importance. Nor has one been able to function without the continued presence of the other for the consecutive billions of years of life on Earth.
It is basically for our own practical reasons, to provide a research and interpretive strategy, that today we assign evolutionary primacy to DNA: the circle of life continually rolls through a DNA component, and that component codes for the proteins that do everything else (including regulating DNA). This means that DNA is the entry point for new heritable variation, and to that extent the stream of evolution is mediated through this process. But that perspective on life is a decision of the culture that is science, not the only way we might view life, if we chose to feature other aspects of it. As discussed in Chapter 3, the flow of genetic information is not even the exclusive element of inheritance.