The page at provides a link to an interface that allows you to query the database. Several queries were tailored to facilitate retrieval of information about the rank of each 9-mer in a given segment of the promoter region of human genes, and to obtain a listing of the genes with which each 9-mer is associated. Gene association was deduced from the definitions listed in GenBank files in February 2004 (see Note 4).

Figure 1 shows the control keys through which queries can be made. Below, we explain the type of information that can currently be retrieved.

1. Use "Search 9_mers by ID" to obtain information about a 9-mer according to its ID (Fig. 1). The format is RF, followed by a number, followed by r or f (providing the orientation of the 9-mer). For example, type RF129572f and click on Submit Query next to the query box. The output will provide:

a. The ID of the sequence (in that example, RF_ID = RF129572f ).

b. The sequence of the 9-mer (TTTCGCGCC) associated with that ID.

c. The number of times the sequence was found between positions +50 to -50 [Hits (+50/-50)], between positions -50 to -500 [Hits (-50/-500)], and between positions +50 to -500 [Hits (+50/-500)], of the analyzed genes.

d. The rank of each 9-mer, reflecting its statistical significance in the specified region.

2. Use "Search 9_mers by sequence" to obtain information about a 9-mer according to its nucleotide sequence (Fig. 1). For example, in the query box, type TTTCGCGCC. The query typed in that box should contain exactly 9 bases (see Note 5). You will get the same information as that obtained from the query "Search 9_mers by ID," described above. You can use the "save as" option in your web browser to save the results as a text file.

3. Use "List genes corresponding to a 9_mer" to obtain a listing of genes whose promoter region includes a specific 9-mer (see Notes 5 and 6). For example, TTTCGCGCC includes the consensus sequence (TTTSSCGC) for interactions with the E2F family of transcription factors. Therefore, you might be interested in determining whether the occurrences of that 9-mer (in promoter regions of human genes) could be correlated with genes associated with the control of the cell cycle. This information can be obtained by using the control key named "List genes corresponding to a 9_mer" (Fig. 1). The listing is for the occurrences between positions -500 to +50 of each gene. This region of some genes may contain more than one occurrence of a given 9-mer. In that case, the gene will be listed twice.

As an example, Table 1 shows a subset of the gene list obtained for TTTCGCGCC. The list includes the ID associated with that 9-mer and the GenBank accession number for the cDNA sequence with respect to which the promoter region of the gene has been defined. The list also includes the definition (Annotation) in the GenBank files (see Note 7). From the inspection of the listing in Table 1, we can identify several genes that are candidates for regulation during the cell cycle. Examples include dihydrofolate reductase; MCM5 minichromosome maintenance deficient 5, cell division cycle 46; and transcription factor DP-1 (Table 1). From the listing we can infer that TTTCGCGCC may function as a cis element in the regulation of genes associated with the cell cycle. Therefore, the results offer hypotheses that could be tested in experimental studies, for example, DNA binding assays and analyses of data obtained from microarrays.

[Screenshot of the query page, headed "Search for sequence elements in promoter regions of a subset of human genes" and citing Bina et al., Genomics 84:929-940 (2004). Query boxes, each with a Submit Query button: Search 9_mers by ID; Search 9_mers by sequence; List genes corresponding to a 9_mer; Search for 9 contiguous bases or less; List genes corresponding to 9 contiguous bases or less; Search for 9 bases containing ambiguity codes; List genes corresponding to 9-base elements containing ambiguity codes.]

Fig. 1. The web-interface to a database of 9-mers from promoter regions of human genes.

Table 1
Example of Output of Gene List

Accession no.    Annotation
                 Pyridoxal (pyridoxine, vitamin B6)
                 MCM5 minichromosome maintenance deficient 5
                 Dihydrofolate reductase
                 Histone 1, H2bk
                 Transcription factor DP-1
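The query formats used in steps 1 to 3 can be illustrated offline with a minimal Python sketch. It is not the code behind the web interface: the function names (`parse_rf_id`, `is_valid_9mer`, `list_genes`) and the toy promoter sequences are hypothetical, and the scan looks only at the strand of the sequence it is given. It shows the ID format (RF, a number, then r or f), the exactly-9-bases rule, and why a gene whose promoter contains a 9-mer more than once is listed more than once.

```python
import re

# Hypothetical helpers; the ID pattern follows the format described in step 1.
RF_ID_PATTERN = re.compile(r"^RF(\d+)([rf])$")


def parse_rf_id(rf_id):
    """Split an ID such as 'RF129572f' into (number, orientation)."""
    match = RF_ID_PATTERN.match(rf_id)
    if match is None:
        raise ValueError(f"not a valid 9-mer ID: {rf_id!r}")
    return int(match.group(1)), match.group(2)


def is_valid_9mer(sequence):
    """True if the query is exactly 9 unambiguous bases (A, C, G, or T)."""
    return re.fullmatch(r"[ACGT]{9}", sequence) is not None


def list_genes(nine_mer, promoters):
    """Return one entry per occurrence of nine_mer in each promoter.

    promoters maps a gene label to its promoter sequence; a gene whose
    promoter contains the 9-mer more than once appears more than once.
    """
    hits = []
    for gene, promoter in promoters.items():
        start = promoter.find(nine_mer)
        while start != -1:
            hits.append(gene)
            # Advance by one base so overlapping occurrences are also found.
            start = promoter.find(nine_mer, start + 1)
    return hits


# Toy promoter sequences, invented for illustration only.
toy_promoters = {
    "gene_A": "GGATTTCGCGCCAATTTCGCGCCTA",  # two occurrences
    "gene_B": "ACGTACGTACGTACGT",           # no occurrence
    "gene_C": "CCTTTCGCGCCGG",              # one occurrence
}

if __name__ == "__main__":
    print(parse_rf_id("RF129572f"))
    print(is_valid_9mer("TTTCGCGCC"))
    print(list_genes("TTTCGCGCC", toy_promoters))
```

In this toy input, `gene_A` appears twice in the result because its sequence contains the 9-mer twice, which mirrors the behavior described for the gene listing.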

4. Use the control key "Search for 9 contiguous bases or less" to obtain information about sequences that are shorter than 9 bases (see Note 8). For example, you may wish to identify sequence elements that include a potential E2F site. The results will be a listing similar to that obtained in step 2, but the list will include more hits (see Note 9).

5. Use "List genes corresponding to 9 contiguous bases or less" to type a sequence that is shorter than 9 bases and obtain a listing of genes whose promoters contain the sequence used as the query (see Note 9).

6. Use "Search for 9 bases containing ambiguity codes" to type a sequence that includes ambiguity codes. To include ambiguity codes, use brackets. For example, the query GGGG[CT]GGGG will search for GGGGCGGGG and GGGGTGGGG. TT[ATGC]AA will search for TTAAA, TTTAA, TTGAA, and TTCAA. You can include ambiguity at several positions. The total length of the sequence in the query should add up to 9 bases or less (see Note 8). The bases in the brackets are counted as one base. For example, for the consensus E2F site (TTTSSCGC), the query would be TTT[GC][GC]CGC. The result of that query will list all 9-mers that contain TTTSSCGC, as well as the ranking of each 9-mer in the specified regions. From the ranks, you can identify the statistically significant sequences.

7. Use "List genes corresponding to 9-base elements containing ambiguity codes" to type a sequence that includes ambiguity codes and obtain a listing of genes that contain that sequence in their promoter region. The format of the query sequence is the same as that described in step 6. Clearly, short sequence motifs for transcription factor (TF) sites would produce a long list. This would increase the number of candidate genes. The tradeoff is that the list could include false predictions. Nonetheless, the query can narrow down the list of genes that could be tested for experimental validation (see Note 9).
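The bracket notation of steps 6 and 7, and the way a query shorter than 9 bases matches several 9-mers (steps 4 and 5), can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's implementation: all function names are hypothetical, the `IUPAC` table covers only a few ambiguity codes, and `nine_mers_containing` enumerates every possible 9-mer that contains the query, whereas the database would return only those actually found in the analyzed promoters.

```python
import itertools
import re

# A few IUPAC ambiguity codes; this table is deliberately incomplete.
IUPAC = {"S": "GC", "W": "AT", "R": "AG", "Y": "CT", "N": "ATGC"}

# One token is either a bracketed set of bases or a single base.
TOKEN = re.compile(r"\[([ACGT]+)\]|([ACGT])")


def parse_query(query):
    """Split a bracket query into positions; a bracket counts as one base."""
    positions = []
    pos = 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        positions.append(match.group(1) or match.group(2))
        pos = match.end()
    return positions


def expand_query(query):
    """List every unambiguous sequence matched by a bracket query."""
    positions = parse_query(query)
    if len(positions) > 9:
        raise ValueError("query is longer than 9 bases")
    return ["".join(combo) for combo in itertools.product(*positions)]


def iupac_to_brackets(consensus):
    """Rewrite a consensus such as TTTSSCGC in bracket form."""
    return "".join(
        base if base in "ACGT" else "[" + IUPAC[base] + "]"
        for base in consensus
    )


def nine_mers_containing(short_seq):
    """All distinct 9-mers that contain short_seq (9 bases or fewer)."""
    pad = 9 - len(short_seq)
    found = set()
    for left in range(pad + 1):
        for flank in itertools.product("ACGT", repeat=pad):
            flank = "".join(flank)
            # Put `left` flanking bases before the query and the rest after.
            found.add(flank[:left] + short_seq + flank[left:])
    return sorted(found)


if __name__ == "__main__":
    print(expand_query("GGGG[CT]GGGG"))
    print(expand_query("TT[ATGC]AA"))
    print(iupac_to_brackets("TTTSSCGC"))
    print(len(nine_mers_containing("TTTCGCGC")))
```

Because each bracket counts as a single position, `TTT[GC][GC]CGC` has a length of 8 and expands to four 8-base sequences, each of which can sit inside several different 9-mers. This is one way to see why shorter or ambiguous queries return more hits than an exact 9-base query.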

4. Notes

1. The complete set (BINA_RF_sorted_master9) can be downloaded from http://www.

2. Referring to 9-mers as pairs allows them to be identified irrespective of their orientation in DNA. This scheme also eliminates problems arising from redundancy, because it treats the complementary pairs as the same sequence element in the genomic DNA.

3. For more details, see Chapter 11 in this volume.

4. The definition in a GenBank file may change when the file is updated. Therefore, for genes of interest, check GenBank for the updated definitions. To do so, go to the nucleotide database at NCBI. For batch retrieval, go to Batch Entrez and upload a text file that includes several accession numbers.

5. Other query boxes can be used to type a sequence containing fewer bases or a sequence that includes ambiguity codes. However, it will take a longer time to obtain the results.

6. The current database includes the promoter regions of nearly 4500 human protein-coding genes.

7. If the gene no longer exists in GenBank, you may get accessionnumber.promoter, with no associated definition.

8. Queries that are shorter than 9 bases and those that include ambiguity codes may take several minutes to produce an output.

9. To save the output in a text file, use the "Save As" option in the File menu in your browser.
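The pairing scheme mentioned in Note 2 can be sketched in a few lines of Python. This assumes that a "pair" means a 9-mer together with its reverse complement; the function names are hypothetical, and the sketch makes no claim about how the database assigns the r and f orientation labels.

```python
# Translation table that maps each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def complementary_pair(nine_mer):
    """Return the 9-mer and its reverse complement as an order-independent pair."""
    return tuple(sorted((nine_mer, reverse_complement(nine_mer))))


if __name__ == "__main__":
    print(reverse_complement("TTTCGCGCC"))
    print(complementary_pair("TTTCGCGCC") == complementary_pair("GGCGCGAAA"))
```

Sorting the two members gives the same key whichever strand is used as the starting point, so both orientations are counted as one sequence element.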


References

1. Collins, F. S., Green, E. D., Guttmacher, A. E., and Guyer, M. S. (2003) A vision for the future of genomics research. Nature 422, 835-847.

2. Bina, M., Wyss, P., Ren, W., et al. (2004) Exploring the characteristics of sequence elements in proximal promoters of human genes. Genomics 84, 929-940.

3. Baldi, P., Brunak, S., Chauvin, Y., and Pedersen, A. G. (1999) The biology of eukaryotic promoter prediction—a review. Comput. Chem. 23, 191-207.

4. Bina, M. and Crowley, E. (2001) Sequence patterns defining the 5' boundary of human genes. Biopolymers 59, 347-355.

5. Lemon, B. and Tjian, R. (2000) Orchestrated response: a symphony of transcription factors for gene control. Genes Dev. 14, 2551-2569.

6. Hutchinson, G. B. (1996) The prediction of vertebrate promoter regions using differential hexamer frequency analysis. Comput. Appl. Biosci. 12, 391-398.

7. Marino-Ramirez, L., Spouge, J. L., Kanga, G. C., and Landsman, D. (2004) Statistical analysis of over-represented words in human promoter sequences. Nucleic Acids Res. 32, 949-958.

8. FitzGerald, P. C., Shlyakhtenko, A., Mir, A. A., and Vinson, C. (2004) Clustering of DNA sequences in human promoters. Genome Res. 14, 1562-1574.

9. Trinklein, N. D., Aldred, S. J., Saldanha, A. J., and Myers, R. M. (2003) Identification and functional analysis of human transcriptional promoters. Genome Res. 13, 308-312.
