trapped sequences, and Alu PCR sequences stretch). Such regions can spuriously obtain extremely high scores. For this reason, the NCBI BLAST server applies filtering by default. The SEG [51] program is used to mask protein sequences, and the DUST [52] program is used to mask DNA sequences. Neither program is guaranteed to filter every low-complexity region. Users should also be aware that valid hits can be missed when part of the sequence is masked.
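To make the idea of masking concrete, the following is a minimal sketch of a window-based low-complexity filter. It is not the SEG or DUST algorithm; it simply masks any window whose residue composition has low Shannon entropy. The function names, the window length of 12, the threshold of 2.2 bits, and the mask character X are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits per residue, of the composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        threshold: float = 2.2, mask_char: str = "X") -> str:
    """Mask every residue covered by at least one low-entropy window.

    The window and threshold defaults are hypothetical values for this
    sketch; they are not the settings used by SEG or DUST.
    """
    seq = seq.upper()
    masked = [False] * len(seq)
    # Slide a fixed-length window along the sequence, one residue at a time.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else c for c, m in zip(seq, masked))


if __name__ == "__main__":
    query = "ACDEFGHIKLMN" + "Q" * 12 + "ACDEFGHIKLMN"
    print(mask_low_complexity(query))
```

In this sketch, a few ordinary residues flanking the glutamine run are masked along with it, because windows that straddle the boundary still fall below the threshold. This is a small-scale version of the caution above: masking can hide residues that would otherwise contribute to a valid hit.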

4. Other Parameters

Users are advised to keep the default matrix and gap parameters. These parameters determine how the similarity between two sequences is scored. When two residues are aligned, the program consults the matrix to decide whether the amino acids are similar or very different. The default matrix is BLOSUM62 [53]. Users should understand the evolutionary implications of the various matrices before using them in a sequence search. The gap parameters determine how heavily an alignment is penalized for containing gaps. A second set of parameters controls the heuristics that BLAST uses. By altering these values, the user can change the sensitivity and speed of the search. These parameters are complex and beyond the scope of this chapter, and users very rarely change them from the defaults. The FASTA program has one such parameter that can be beneficial for users, called ktup. Searches with ktup = 1 are slower but more sensitive than BLAST; searches with ktup = 2 are faster but less sensitive. A third set of parameters determines how many matches are reported. These numbers can be changed at the user's discretion.
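The role of the matrix and the gap parameters can be illustrated by scoring a single, already aligned pair of sequences. The sketch below is only an illustration, not BLAST's implementation. The small score table is a toy subset with illustrative values and must not be taken as the real BLOSUM62 matrix. The function names are hypothetical, and the gap costs (11 to open, 1 per gapped position) and the convention that a gap of length k costs gap_open + k * gap_extend are assumptions made for this example.

```python
# Toy substitution scores: illustrative values only, NOT the full BLOSUM62 table.
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("L", "D"): -4, ("L", "E"): -3, ("I", "D"): -3, ("I", "E"): -3,
}


def pair_score(a: str, b: str, matrix: dict, default: int = -1) -> int:
    """Look up a residue pair in either order; unknown pairs get a default score."""
    return matrix.get((a, b), matrix.get((b, a), default))


def score_alignment(row_a: str, row_b: str, matrix: dict,
                    gap_open: int = 11, gap_extend: int = 1) -> int:
    """Score two aligned rows of equal length, where '-' marks a gap.

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_side = None  # which row currently has an open gap, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            if side != gap_side:
                score -= gap_open  # charged once, when a new gap starts
                gap_side = side
            score -= gap_extend  # charged for every gapped position
        else:
            score += pair_score(a, b, matrix)
            gap_side = None
    return score


if __name__ == "__main__":
    print(score_alignment("LDE", "IDE", TOY_MATRIX))    # ungapped alignment
    print(score_alignment("LDLE", "ID-E", TOY_MATRIX))  # one gapped position
```

With these toy values, a single gapped position removes most of the score earned by the matching residues. This shows why the choice of gap parameters strongly influences which alignments are reported.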

