The information in human genomic DNA has been referred to as the book of life (1). Linear arrays of three-letter words (codons) specify the coding regions of genes and thus the amino acid sequence of proteins. The language metaphor can be extended to state that genomic DNA should include regulatory words (cis elements) that specify the signals controlling the expression of genes (2). cis-regulatory elements often specify functional transcription factor binding sites but may also include signals with currently unknown functions (2-5).

Control signals may correspond to sequence elements that occur frequently in the regulatory regions of genes. Based on this hypothesis, researchers have compiled several collections of sequence motifs derived from the promoter regions of protein-coding genes (2,6-8). It is thought that collections of this type could help with discovery of the sequence context of "words" that exert control over gene expression (2). Furthermore, discovery of lexical characteristics of specific sequence motifs in genomic DNA can help with classification of cis elements according to the genes that they control.

From: Methods in Molecular Biology, vol. 338: Gene Mapping, Discovery, and Expression: Methods and Protocols Edited by: M. Bina © Humana Press Inc., Totowa, NJ

To discover the lexical features of regulatory words, we previously created a database of 9-mers derived from the promoter regions of a subset of human protein-coding genes (2). In this report we describe how information can be extracted from that database through the web: http://bina-grid.chem.purdue.edu/genome/.
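To make the idea of a 9-mer collection concrete, the following minimal sketch counts every overlapping 9-nucleotide window in a set of sequences. It is only an illustration and not the procedure used to build the database described here: the function name `count_kmers`, the choice to skip windows containing ambiguous bases, and the toy sequences are all assumptions introduced for this example.

```python
from collections import Counter


def count_kmers(sequences, k=9):
    """Count every overlapping k-mer across a collection of DNA sequences.

    Hypothetical helper for illustration; not the database's own pipeline.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: windows with ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


# Made-up toy sequences, not real promoter data.
toy_promoters = ["TTGGCGCGAAAT", "ggcgcgaaatcc"]
kmer_counts = count_kmers(toy_promoters)

# List the 9-mers that occur most often across the toy set.
for kmer, n in kmer_counts.most_common(3):
    print(kmer, n)
```

A 9-mer that appears in the promoter regions of many genes would receive a high count in a table of this kind, which is the sort of frequency signal the hypothesis above relies on.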

