With the completion of the human genome sequence and the increasing availability of whole genome shotgun sequences (WGS) for numerous other eukaryotic species, we are poised to begin to understand the complexity and dynamic nature of chromosomes. Segmental duplications are nearly identical segments of DNA at two or more sites in a genome; in the human genome they comprise about 3.5 to 5% of the total DNA content (1,2). Segmental duplications also account for 1.2 to 2% of the mouse genome (3,4) and approximately 3% of the rat genome (5). Segmental duplications (also called low copy repeats [LCRs]) can predispose a region to nonallelic homologous recombination, leading to deletion, inversion, or duplication of large segments of DNA (6).

From: Methods in Molecular Biology, vol. 338: Gene Mapping, Discovery, and Expression: Methods and Protocols Edited by: M. Bina © Humana Press Inc., Totowa, NJ

These structural alterations may lead to the gain or loss of dosage-sensitive genetic material and may result in a spectrum of diseases defined as genomic disorders (7-9).

The presence of segmental duplications is a common feature of many mammalian genomes, and their involvement in chromosome evolution and natural variation is an area of active investigation (10-12). Duplication of large segments of DNA can generate duplicate genes in whole (13) or in part (14), and may lead to an expanding repertoire of similar gene products. Identifying recent segmental duplications therefore lets us map the origin and fate of duplicate genes, which are a driving force in species evolution (see Note 1).

Here we define recent segmental duplications as paralogous regions of a genome that are longer than 5000 nucleotides (nt) and share greater than 90% DNA sequence identity. We present a computational protocol for identifying and mapping recent segmental and gene duplications in eukaryotic genomes. The major procedures are comparing genomic sequences using BLAST (15), parsing and filtering the BLAST alignments, and mapping genes to segmental duplications to identify gene duplicates. We note that many of our methods arose from an ongoing initiative to map segmental duplications accurately in the human (2), chimpanzee, mouse (3), and other mammalian genomes as displayed at publicly available websites ( and http://
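The filtering step can be illustrated with a minimal sketch that applies the two thresholds from the definition above (length greater than 5000 nt, identity greater than 90%) to BLAST hits. It assumes the alignments were written in BLAST's 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the text does not say which output format the authors parse. The names `Hit`, `parse_tabular`, `is_self_hit`, and `candidate_duplications` are hypothetical and chosen only for this illustration.

```python
from typing import Iterable, Iterator, NamedTuple

# Thresholds taken from the definition in the text (both strict inequalities).
MIN_LENGTH = 5000     # nucleotides
MIN_IDENTITY = 90.0   # percent


class Hit(NamedTuple):
    """One alignment row; a hypothetical record type for this sketch."""
    query: str
    subject: str
    identity: float
    length: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse rows in the assumed 12-column tab-separated BLAST format."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def is_self_hit(hit: Hit) -> bool:
    """True when a sequence is aligned to itself at identical coordinates."""
    return (hit.query == hit.subject
            and hit.q_start == hit.s_start
            and hit.q_end == hit.s_end)


def candidate_duplications(hits: Iterable[Hit]) -> Iterator[Hit]:
    """Keep alignments that satisfy the length and identity thresholds."""
    for hit in hits:
        if is_self_hit(hit):
            continue  # trivial self-alignment, not a duplication
        if hit.length > MIN_LENGTH and hit.identity > MIN_IDENTITY:
            yield hit


if __name__ == "__main__":
    # Made-up rows for illustration only; not real alignment results.
    example = [
        "chr1\tchr1\t100.00\t8000\t0\t0\t1\t8000\t1\t8000\t0.0\t14000",
        "chr1\tchr7\t95.20\t6200\t290\t8\t1000\t7199\t50000\t56199\t0.0\t9800",
        "chr1\tchr7\t99.00\t1200\t12\t0\t100\t1299\t900\t2099\t0.0\t2100",
        "chr2\tchr5\t88.00\t9000\t1000\t80\t1\t9000\t300\t9299\t0.0\t9000",
    ]
    for hit in candidate_duplications(parse_tabular(example)):
        print(hit.query, hit.subject, hit.identity, hit.length)
```

Note that this sketch tests each alignment on its own, and the alignment length reported by BLAST includes gap columns. A full pipeline would likely also need to join neighboring alignments that together span a duplicated region, which is not shown here.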
