Motif-Finding Techniques for Analysis of Composite Elements

The 'ab initio' motif-finding techniques, which do not use any prior knowledge about known motifs, often fail to find TF binding sites that are 'too weak', that is, sites that differ significantly from their consensus while still serving as targets for their TFs. As described before, such 'weak' sites can function because of synergism with other sites in CEs. Binding of a transcription factor to a weak site is usually stabilized by protein-protein interactions between this TF and other TFs bound at nearby sites. Because traditional motif-finding algorithms usually report one (or a few) high-scoring patterns, they often miss CEs that consist of pairs of weak TF sites, where one or both sites may not be statistically significant on their own. An example of such a composite element is shown in Figure 12.4.

Fig. 12.4 An example of an NF-AT/AP-1 CE in the promoter of the mouse interleukin-2 gene. The AP-1 site differs from the canonical AP-1 consensus.

In this composite element, the AP-1 site differs substantially from the canonical AP-1 consensus (see Figure 12.4). Such a site could not be detected by a search for AP-1 sites alone.
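Why joint evaluation rescues a weak site can be shown with a minimal sketch. It assumes a deliberately simplified scoring model (mismatch counts against an IUPAC consensus instead of a position weight matrix). The consensus strings GGAAA (an NF-AT-like core) and TGASTCA (an AP-1-like consensus, S = C or G), the synthetic test sequence, and the function names `scan` and `composite_hits` with their thresholds are all illustrative assumptions, not the actual IL-2 promoter or any published tool.

```python
from typing import List, Tuple

# IUPAC nucleotide codes needed by the illustrative consensus strings.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT", "N": "ACGT",
}


def mismatches(site: str, consensus: str) -> int:
    """Count positions where the site base is not allowed by the consensus code."""
    return sum(base not in IUPAC[code] for base, code in zip(site, consensus))


def scan(seq: str, consensus: str, max_mm: int) -> List[Tuple[int, int]]:
    """Single-site search: (start, mismatches) for every window within max_mm."""
    w = len(consensus)
    hits = []
    for i in range(len(seq) - w + 1):
        mm = mismatches(seq[i:i + w], consensus)
        if mm <= max_mm:
            hits.append((i, mm))
    return hits


def composite_hits(seq: str, cons_a: str, cons_b: str, max_mm_each: int,
                   max_mm_total: int, max_gap: int) -> List[Tuple[int, int, int, int]]:
    """Hypothetical paired search: sites A and B, in either order, separated by
    0..max_gap bases, each within max_mm_each mismatches and jointly within
    max_mm_total. Returns (start_a, start_b, mm_a, mm_b) tuples."""
    wa, wb = len(cons_a), len(cons_b)
    hits_b = scan(seq, cons_b, max_mm_each)
    pairs = []
    for i, ma in scan(seq, cons_a, max_mm_each):
        for j, mb in hits_b:
            # Bases between the end of the upstream site and the start of the
            # downstream one; a negative value means the sites overlap.
            gap = j - (i + wa) if i <= j else i - (j + wb)
            if 0 <= gap <= max_gap and ma + mb <= max_mm_total:
                pairs.append((i, j, ma, mb))
    return pairs


if __name__ == "__main__":
    seq = "TTGGAAACCTGTTTCATT"  # synthetic example, not a real promoter
    print(scan(seq, "TGASTCA", 1))  # → []
    print(composite_hits(seq, "GGAAA", "TGASTCA", 2, 2, 5))  # → [(2, 9, 0, 2)]
```

In the synthetic sequence, the best AP-1-like window (TGTTTCA at position 9) has two mismatches, so a single-site search with a one-mismatch threshold reports nothing. The paired search still accepts it, because an exact GGAAA site two bases upstream keeps the pair within the total mismatch budget.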

Recently, several new approaches have appeared for revealing such composite motifs in sets of sequences: BioProspector (Liu et al., 2001), Co-Bind (GuhaThakurta and Stormo, 2001), and MITRA (Eskin and Pevzner, 2002). The first two extend Gibbs sampling techniques to find significant motifs consisting of two modules separated by a flexible distance; they maximize the joint likelihood of the co-occurrence of the two motifs. MITRA is a pattern-driven approach based on the enumeration of l-mers. It uses a mismatch tree data structure to split the space of all possible patterns into disjoint subspaces that share a given prefix, and thereby avoids an 'explosion' of the search space for long composite motifs consisting of two parts.
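The pattern-driven idea can be sketched as a depth-first enumeration over pattern prefixes. Each node of the tree keeps the sequence placements that are still within d mismatches of the prefix, and a branch is pruned as soon as too few sequences support it. This is a simplified sketch, not the published MITRA implementation. The function `find_dyads` and its parameters (half-site lengths `l1` and `l2`, the allowed `gaps`, a mismatch budget `d` shared by both halves, and a `quorum` of supporting sequences) are hypothetical, and the statistical scoring a real tool would apply is omitted.

```python
from typing import List, Sequence, Tuple

ALPHABET = "ACGT"


def find_dyads(seqs: Sequence[str], l1: int, l2: int, gaps: Sequence[int],
               d: int, quorum: int) -> List[Tuple[str, str, int]]:
    """Enumerate two-part patterns (an l1-mer and an l2-mer separated by a gap
    from `gaps`) that match with at most d mismatches in total in at least
    `quorum` sequences. Returns (first_half, second_half, support) tuples."""
    total = l1 + l2

    def text_pos(start: int, gap: int, k: int) -> int:
        # Sequence position aligned with pattern character k; the second
        # half is shifted right by the gap length.
        return start + k if k < l1 else start + gap + k

    # Root of the tree: every placement (sequence, start, gap) that fits,
    # each with zero mismatches so far.
    placements = [
        (si, start, gap, 0)
        for si, s in enumerate(seqs)
        for gap in gaps
        for start in range(len(s) - (total + gap) + 1)
    ]
    results: List[Tuple[str, str, int]] = []

    def grow(prefix: str, live: List[Tuple[int, int, int, int]]) -> None:
        support = len({si for si, _, _, _ in live})
        if support < quorum:
            return  # prune: extending this prefix can only lose placements
        k = len(prefix)
        if k == total:
            results.append((prefix[:l1], prefix[l1:], support))
            return
        for base in ALPHABET:
            nxt = []
            for si, start, gap, mm in live:
                mm2 = mm + int(seqs[si][text_pos(start, gap, k)] != base)
                if mm2 <= d:
                    nxt.append((si, start, gap, mm2))
            grow(prefix + base, nxt)

    grow("", placements)
    return results


if __name__ == "__main__":
    toy = ["ACGTTTGCA", "ACGAAGCA", "GGGGGGGG"]  # toy data, gaps of 3 and 2
    for first, second, support in find_dyads(toy, 3, 3, [2, 3], 0, 2):
        print(first, second, support)
```

Because the set of live placements can only shrink as a prefix grows, cutting a branch whose support falls below the quorum never discards a valid pattern. This is what keeps the search from visiting all 4^(l1+l2) patterns. On the toy input, the dyad ACG/GCA is reported with support 2, since it occurs with a gap of 3 in the first sequence and a gap of 2 in the second.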

All these approaches have proved effective on some examples of composite motifs in yeast and bacterial genomes.
