Perform Sequence Alignments of All Possible Pairs of Chromosomes Using MegaBLAST

The MegaBLAST program is used to perform the sequence alignments because it was designed to efficiently identify long alignments between similar sequences. Since we have defined recent segmental duplications as long stretches of DNA (>5000 nt) with greater than 90% sequence identity, MegaBLAST is well suited to identifying these paralogous regions of the genome. After the BLAST databases have been created for each chromosome, MegaBLAST is used to perform sequence alignments between all possible pairs of chromosomes. In other words, the FASTA file of each chromosome is compared with the BLAST database of every chromosome (see Note 10).

The following command is an example of using MegaBLAST to find sequence alignments between mouse chromosome 7 and mouse chromosome 3.

% megablast -d chr7.fa -i chr3.fa -D 2 -F 'm' -U T -o chr7.3.blast

In the example above, the "-d chr7.fa" option specifies that MegaBLAST use the mouse chromosome 7 BLAST database as the subject of the comparison, and the "-i chr3.fa" option specifies mouse chromosome 3 as the query sequence. Sequence alignments are written to the output file chr7.3.blast, as specified by the "-o chr7.3.blast" option, and the output format is "traditional BLAST output," as specified by the "-D 2" option. Furthermore, "-U T" specifies that lower case letters in the query sequence should be treated as repetitive elements. The "-F 'm'" option specifies that MegaBLAST should not seed word matches within the repetitive regions of the query sequence but should still allow sequence alignments to extend through these regions.
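The single comparison above can be extended to every database/query pair with a short shell loop. The following is a minimal sketch rather than part of the original protocol: the function name print_megablast_commands is hypothetical, and it assumes that every chromosome file is named chr<NAME>.fa and has already been formatted as a BLAST database under the same name. It only prints the commands so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (not run) one MegaBLAST command for every database/query pair.
# Assumptions (not from the protocol): files are named chr<NAME>.fa and
# each one has already been formatted as a BLAST database.
print_megablast_commands() {
    for subject in "$@"; do        # chromosome used as the BLAST database (-d)
        for query in "$@"; do      # chromosome used as the query FASTA file (-i)
            # Output name follows the chr<SUBJECT>.<QUERY>.blast pattern of the example.
            printf "megablast -d chr%s.fa -i chr%s.fa -D 2 -F 'm' -U T -o chr%s.%s.blast\n" \
                "$subject" "$query" "$subject" "$query"
        done
    done
}

# Example with three chromosomes: prints 3 x 3 = 9 commands,
# including the same-chromosome comparisons.
print_megablast_commands 3 7 X
```

Once the printed commands look correct, they can be saved to a file and executed, or piped directly to sh. The loop deliberately includes comparisons of a chromosome against itself, since each FASTA file is compared with each of the BLAST databases.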

Below is a detailed description of the command line options that are required to perform sequence alignments using MegaBLAST to identify segmental duplications in a genome:

megablast 2.2.10 arguments:

-d Database [String]
   default = nr

-i Query File [File In]

-D Type of output:
   0 - alignment endpoints and score
   1 - all ungapped segments endpoints
   2 - traditional BLAST output
   3 - tab-delimited one-line format
   [Integer]
   default = 0

-F Filter query sequence [String]
   default = T

-U Use lower case filtering of FASTA sequence [T/F] Optional
   default = F

-o BLAST report Output File [File Out] Optional
   default = stdout

A full description of this command and its options is included in the documentation supplied with the BLAST suite of programs and is also available at the NCBI website.

Sequence alignments generated by MegaBLAST between a subject database and a query sequence from the same chromosome are used to identify intrachromosomal segmental duplications (i.e., duplications that occur within the same chromosome). Sequence alignments generated between a subject database and a query sequence from different chromosomes are used to identify interchromosomal segmental duplications (i.e., duplications that occur between different chromosomes). Executing MegaBLAST on a subject database and a query sequence generates many sequence alignments. Not all of these represent sequences involved in segmental duplications, so further steps are required to convert, filter, and process the alignments according to a variety of criteria. These criteria are described in the sections below.
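When many output files have accumulated, it can help to label each one as an intrachromosomal or interchromosomal comparison before further processing. The sketch below is illustrative only: the function name classify_report is hypothetical, and it assumes that output files follow the chr<SUBJECT>.<QUERY>.blast naming pattern of the chr7.3.blast example.

```shell
#!/bin/sh
# Classify a MegaBLAST report by its file name.
# Assumption (not from the protocol): reports are named
# chr<SUBJECT>.<QUERY>.blast, as in the chr7.3.blast example.
classify_report() {
    name=${1##*/}          # drop any leading directory
    name=${name%.blast}    # drop the .blast suffix
    name=${name#chr}       # drop the chr prefix
    subject=${name%%.*}    # chromosome used as the database
    query=${name#*.}       # chromosome used as the query
    if [ "$subject" = "$query" ]; then
        echo "intrachromosomal"
    else
        echo "interchromosomal"
    fi
}

classify_report chr7.7.blast   # same chromosome on both sides
classify_report chr7.3.blast   # two different chromosomes
```

The label reflects only which files were compared; whether a given alignment actually represents a segmental duplication still depends on the filtering criteria applied afterward.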
