Sequencing an Entire Genome

The ultimate goal of structural genomics is to determine the ordered nucleotide sequences of entire genomes of organisms. The main obstacle to this task is the immense size of most genomes. Bacterial genomes are typically a few million base pairs long; many eukaryotic genomes are billions of base pairs long and are distributed among dozens of chromosomes. In addition, for technical reasons, it is not possible to begin sequencing at one end of a chromosome and continue straight through to the other end; only small fragments of DNA (usually 500 to 700 nucleotides) can be sequenced at one time. Therefore, determining the sequence of an entire genome requires that the DNA be broken into thousands or millions of smaller fragments that can then be sequenced. The difficulty lies in putting these short sequences back together in the correct order. As we will see, two different approaches have been used to assemble the short, sequenced fragments into a complete genome.
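To get a feel for the scale involved, the arithmetic can be sketched in a few lines of Python. This is only a rough lower bound: the function name `min_fragments` and the two genome sizes are illustrative assumptions, not figures from a particular sequencing project, and the calculation ignores the overlap between fragments that assembly requires, so real projects sequence many more fragments than this.

```python
import math


def min_fragments(genome_bp: int, read_bp: int) -> int:
    """Fewest fragments of length read_bp needed to cover genome_bp once,
    assuming no overlap and no gaps (an idealized lower bound)."""
    return math.ceil(genome_bp / read_bp)


# Illustrative genome sizes (assumptions chosen for round numbers).
genomes = {
    "bacterium (about 2 million bp)": 2_000_000,
    "eukaryote (about 3 billion bp)": 3_000_000_000,
}

for name, size in genomes.items():
    for read_length in (500, 700):  # fragment lengths quoted in the text
        count = min_fragments(size, read_length)
        print(f"{name}, {read_length}-bp fragments: at least {count:,}")
```

Even under these idealized assumptions, the bacterial genome needs thousands of fragments and the eukaryotic genome needs millions, which is why reassembling them in the correct order is the central difficulty.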

The first genomes to be sequenced were the small genomes of some viruses. The genome of bacteriophage λ, consisting of about 49,000 bp, was completed in 1982. In 1995, the first genome of a living organism (Haemophilus influenzae) was sequenced by Craig Venter and Claire Fraser of the Institute for Genomic Research (TIGR) and Hamilton Smith of Johns Hopkins University. This bacterium has a relatively small genome of 1.8 million base pairs (Figure 19.9). By 1996, the genome of the first eukaryotic organism (yeast) had been determined, followed by the genomes of Escherichia coli (1997), Caenorhabditis elegans (1998), and Drosophila melanogaster (2000). The first draft of the human genome was completed in June 2000.

Map-based sequencing. The first method for assembling short, sequenced fragments into a whole-genome sequence, called a map-based approach, requires the initial creation of detailed genetic and physical maps of the genome. These maps provide known locations of genetic markers (restriction sites, genes, or known DNA sequences) at regularly spaced intervals along each chromosome. The markers can later be used to help align the short, sequenced fragments in their correct order.
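The idea of using mapped markers to order fragments can be illustrated with a minimal sketch. Everything here is hypothetical: the marker names, their positions, the fragment identifiers, and the helper `order_by_map` are invented for illustration, and the sketch assumes each fragment carries exactly one marker whose position is already known from the map.

```python
# Hypothetical physical map: marker name -> known position (bp) on a chromosome.
marker_map = {
    "markerA": 10_000,
    "markerB": 250_000,
    "markerC": 480_000,
    "markerD": 900_000,
}

# Hypothetical sequenced fragments, listed in arbitrary order, each tagged
# with the single mapped marker found within it.
fragments = [
    {"id": "frag3", "marker": "markerC"},
    {"id": "frag1", "marker": "markerA"},
    {"id": "frag4", "marker": "markerD"},
    {"id": "frag2", "marker": "markerB"},
]


def order_by_map(fragments, marker_map):
    """Return the fragments sorted by the mapped position of their marker."""
    return sorted(fragments, key=lambda frag: marker_map[frag["marker"]])


ordered_ids = [frag["id"] for frag in order_by_map(fragments, marker_map)]
print(ordered_ids)
```

The point of the sketch is that the map, built beforehand, supplies the positional information; the fragments themselves only need to be matched to markers.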

Figure 19.9. Circular map of the Haemophilus influenzae genome. Numbers represent a scale in base pairs. Sites where restriction enzymes cleave DNA are shown with the enzyme name (for example, Sma I). The outer circle shows genes whose function is indicated in the key. The fourth circle shows ribosomal operons (green), tRNAs (black), and a Mu-like prophage (blue). The fifth circle shows positions of simple tandem repeats.