Table 5 List of Nucleotide Sequence Databases Available at NCBI
| Database | Description |
|---|---|
| nr | All nonredundant GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1, or 2 HTGS sequences) |
| month | Subset of nr that is new or modified in the last 30 days |
| dbest | Nonredundant database of GenBank + EMBL + DDBJ EST divisions |
| dbsts | Nonredundant database of GenBank + EMBL + DDBJ STS divisions |
| htgs | Unfinished High Throughput Genomic Sequences: phases 0, 1, and 2 (finished, phase 3 HTG sequences are in nr) |
| | Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences |
| | E. coli genomic nucleotide sequences |
| | Sequences derived from the 3-dimensional structure |
| kabat | Kabat's database of sequences of immunological interest |
| | … repeats from query sequences |
| | Eukaryotic Promoter Database |
| | Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences |

… stretch). Such regions of sequence can spuriously obtain extremely high scores. For this reason, filtering is enabled by default on the NCBI BLAST server: the SEG program masks low-complexity regions in protein sequences, and the DUST program masks them in DNA sequences. Neither program is guaranteed to catch every low-complexity region. Users should also be aware that when part of a sequence is masked, valid hits can occasionally be missed.
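The idea behind low-complexity filtering can be illustrated with a simplified stand-in. SEG and DUST each use their own complexity measures; the sketch below instead uses Shannon entropy over a sliding window, a common but different way of flagging repetitive stretches. The function name `mask_low_complexity`, the window length of 12, and the threshold of 1.5 bits are hypothetical choices for illustration, not parameters of either program.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residues in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 1.5,
                        mask_char: str = "N") -> str:
    """Replace every residue that falls in any low-entropy window with mask_char.

    Simplified illustration only: this is NOT the SEG or DUST algorithm.
    window and threshold are arbitrary illustrative values.
    """
    seq = seq.upper()
    masked = [False] * len(seq)
    # Slide a fixed-length window along the sequence and flag low-entropy windows.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else c for c, m in zip(seq, masked))


s = "ACGT" * 4 + "A" * 20 + "ACGT" * 4
print(mask_low_complexity(s))  # the poly-A stretch (and nearby flanks) becomes N
```

Masked DNA residues are replaced with N here; for proteins the conventional mask character is X. Note that a periodic repeat such as ACGTACGT... has high per-residue entropy and passes this filter untouched, which is one reason dedicated tools use more elaborate measures.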
It is advisable for users to keep the default scoring matrix and gap parameters. These parameters determine how the similarity between two sequences is scored. When two residues are aligned, the program looks up the pair in the matrix to decide whether the amino acids are similar or very different; the default matrix is BLOSUM62. Users should understand the evolutionary implications of the various matrices before choosing one for a sequence search. The gap parameters determine how heavily an alignment is penalized for containing gaps.

Other parameters control the heuristics that BLAST uses; by altering them, the user can trade sensitivity against speed. These parameters are complex and beyond the scope of this chapter, and users very rarely change them from the defaults. FASTA has one such parameter that can be beneficial to adjust: ktup, the length of the words (k-tuples) the program uses to find matching regions. A search with ktup = 1 is slower but more sensitive than a BLAST search, whereas ktup = 2 is faster but less sensitive.

A third set of parameters determines how many matches are reported. These can be changed at the user's discretion.
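To make the roles of the matrix and the gap parameters concrete, here is a minimal sketch that scores an already-given alignment using a handful of BLOSUM62 entries and affine gap penalties, where a gap of length k costs gap_open + k × gap_extend. Only the pairs needed for the examples are included; a real search uses the full 20 × 20 matrix. The default costs of 11 and 1 are assumed to mirror the existence/extension costs commonly paired with BLOSUM62 in blastp. The function names are hypothetical, and this only scores a fixed alignment; it does not search for the best one.

```python
# Illustrative subset of BLOSUM62 scores; a real search uses the full 20 x 20 matrix.
BLOSUM62_SUBSET = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4, ("D", "D"): 6,
    ("E", "E"): 5, ("K", "K"): 5, ("R", "R"): 5, ("W", "W"): 11,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("K", "R"): 2, ("D", "E"): 2,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]  # raises KeyError for pairs not in the subset


def score_alignment(top: str, bottom: str, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Score a pairwise alignment with affine gaps (gap of length k costs gap_open + k * gap_extend)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    open_gap_row = None  # which row ("top" or "bottom") the current gap is in, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != open_gap_row:
                score -= gap_open  # charged once when a new gap starts
                open_gap_row = row
            score -= gap_extend  # charged for every gap position
        else:
            score += pair_score(a, b)
            open_gap_row = None
    return score


print(score_alignment("LIVKW", "LIVRW"))    # 4 + 4 + 4 + 2 + 11 = 25
print(score_alignment("LIVKW", "LV-KW"))    # 4 + 3 - 12 + 5 + 11 = 11
print(score_alignment("LIDEKW", "L--EKW"))  # 4 - 13 + 5 + 5 + 11 = 12
```

The K/R substitution still scores positively because both residues are basic, while a single gap costs more than most individual matches earn, which is why gap settings strongly affect which alignments survive.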
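The effect of ktup can be seen in the word-lookup step that FASTA begins with: every word of length ktup in one sequence is indexed, and identical words in the other sequence are reported as hits, which are then grouped by diagonal. The sketch below covers only that first step (real FASTA goes on to rescore and join the best diagonal regions and refine the alignment); the function names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def ktup_hits(query: str, subject: str, ktup: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both sequences share an identical word of length ktup."""
    # Index every word of length ktup in the subject (a simple lookup table).
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)
    # Scan the query and report every exact word match.
    hits = []
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], []):
            hits.append((i, j))
    return hits


def diagonal_counts(hits: list[tuple[int, int]]) -> Counter:
    """Count hits per diagonal (subject_pos - query_pos); dense diagonals suggest similar regions."""
    return Counter(j - i for i, j in hits)


query, subject = "HEAGAWGHEE", "PAWHEAE"
print(ktup_hits(query, subject, 2))                      # [(0, 3), (1, 4), (4, 1), (7, 3)]
print(len(ktup_hits(query, subject, 1)))                 # 13
print(diagonal_counts(ktup_hits(query, subject, 2))[3])  # 2
```

With ktup = 1 the same pair of sequences yields 13 hits instead of 4, so there are more candidate diagonals to examine: more work per comparison, but a better chance of detecting similarities that contain few identical adjacent residues.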