Cells have to react and adapt to various external constraints such as temperature, oxygen supply, or mechanical stress. Such external conditions require complex responses involving the coordinate expression of many genes. In contrast to bacteria, coordinately expressed genes in eukaryotes are generally not spatially linked. Nevertheless, it must be possible to activate genes from disjoint parts of the genome simultaneously. Britten and Davidson (1) proposed a model for coordinate expression of unlinked genes. Genes regulated in parallel with one another would contain common control elements. As specific signals have to be met by individual responses, cells require a set of freely combinable control elements. The product of an integrator gene would recognize a specific control element. This product would then activate all genes containing that particular control element.

From: Methods in Molecular Biology, vol. 338: Gene Mapping, Discovery, and Expression: Methods and Protocols Edited by: M. Bina © Humana Press Inc., Totowa, NJ

Today, we know that DNA sequence elements about 6 to 30 bp in length serve as regulatory elements (2). Such elements can be divided into two classes: general and specific. General elements such as the TATA box (consensus sequence TATAWAW) are constituents of many promoters, whereas combinations of specific binding sites make up the identity of a promoter region.
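To make the consensus notation concrete, the following minimal Python sketch expands an IUPAC consensus string such as TATAWAW (where W stands for A or T) into a regular expression and reports every position at which it occurs. The function names `iupac_to_regex` and `find_consensus` and the test sequence are illustrative assumptions; they are not part of CORG or of any database mentioned here.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


def find_consensus(sequence, consensus):
    """Return (start, matched_text) for every occurrence on the given strand.

    A lookahead is used so that overlapping occurrences are reported too.
    """
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical example sequence, chosen only for illustration.
print(find_consensus("GGTATAAATGC", "TATAWAW"))
```

A plain consensus match of this kind is all-or-nothing. It only searches the strand it is given, so the reverse complement would have to be scanned separately.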

Regulatory elements close to the start site of transcription constitute the promoter region. Incoming signals from many different sources can be integrated at this level. Activating factors either facilitate assembly of the transcription machinery or stimulate the activity of the already assembled complex. Activators have a multitude of targets through which to exert their function, and these multiple interactions are the reason for strong synergistic activation.

It has long been known that noncoding regions in general, and promoter regions in particular, show considerable sequence conservation between species. Sequence conservation within promoter regions often stems from transcription factor binding sites that are under selective pressure (3).

The CORG workbench (4) provides access to precompiled annotation of promoter regions. Two kinds of information are considered explicitly. First, biologically meaningful cross-species conservation is detected within upstream regions of homologous genes. An upstream region is defined by a sequence window of 15 kb upstream of the translation start site. If other data (validated transcription start sites or exon annotation) suggest a different extent for an upstream region, this is taken into account and the corresponding region is adjusted. Pairwise as well as multiple sequence alignments are computed employing motifs as alignment anchors. Second, binding site descriptions (position-weight matrices) are used to predict conserved regulatory elements with a novel approach. The binding site descriptions stem from the TRANSFAC database (5). Exon annotations and verified transcription start sites are incorporated to distinguish exonic from nontranscribed sequence. CORG is built on top of the EnsEMBL database (6) and utilizes protein homology and gene structure information from this resource.
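As a generic illustration of how a position-weight matrix can be used to score candidate binding sites, the sketch below converts a matrix of base counts into log-odds scores and slides it along a sequence. It is not the conservation-aware approach used by CORG. The toy count matrix, the uniform background of 0.25, the pseudocount, and the threshold are all assumptions made for illustration, and the matrix is not taken from TRANSFAC.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif (one dict per position).
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + len(BASES) * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan_sequence(sequence, matrix, threshold):
    """Return (start, site, score) for every window scoring at least `threshold`."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical sequence and threshold, chosen only for illustration.
for start, site, score in scan_sequence("GGTATAAATGC", log_odds_matrix(TOY_COUNTS), threshold=5.0):
    print(start, site, round(score, 2))
```

Unlike a consensus string, a matrix of this kind grades each candidate site, so the choice of threshold determines how many weak matches are reported.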

CORG is fitted with an intuitive interface that leads the user to the information of her choice. CORG contents are accessible as graphical and textual information. Various export functions exist to obtain and process any CORG data locally.

The subsequent sections will guide you step by step through the CORG protocol for upstream region analysis. They present an entire example session dealing with the analysis of an example gene, E2F2. The protein encoded by this gene is a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of the cell cycle and the action of tumor suppressor proteins and is also a target of the transforming proteins of small DNA tumor viruses (7).
