[1] Exploring Trafficking GTPase Function by mRNA Expression Profiling: Use of the SymAtlas Web-Application and the Membrome Datasets

By Cemal Gurkan, Hilmar Lapp, John B. Hogenesch, and William E. Balch

Abstract

Despite complete sequencing of the human and mouse genomes, functional annotation of novel genes remains a major challenge in mammalian biology. Emerging strategies to help elucidate unknown gene function include the analysis of tissue-specific patterns of mRNA expression. A recent study profiled the steady-state mRNA expression of the vast majority of protein-encoding human and mouse genes across a panel of 79 human and 61 mouse nonredundant tissues. The microarray data from this study constitute the Genomics Institute of the Novartis Research Foundation (GNF) Human and Mouse Gene Atlases and are publicly available for exploration through the SymAtlas web-application (http://symatlas.gnf.org). We have recently reported the use of these data and hierarchical clustering algorithms to generate a global overview of the distribution of Rabs, SNAREs, and coat machinery components, as well as their respective adaptors, effectors, and regulators. This systems biology approach led us to propose Rab-centric protein activity hubs as a framework for an integrated coding system, the membrome network, which orchestrates the dynamics of the specialized membrane architecture of differentiated cells. Here, we describe the use of the SymAtlas web-application and the Membrome datasets to help explore trafficking GTPase function. The human and mouse membrome datasets are available through the Membrome homepage (http://www.membrome.org/) and correspond to subsets of the SymAtlas content restricted to known membrane trafficking components. Given the fragmentary nature of current reductionist approaches to elucidating the functions of trafficking components, the membrome datasets provide a focused systems biology perspective that complements our current understanding of transport in complex tissues and offers an integrated view of Rab activity in controlling membrane architecture.

Introduction

Traffic along the eukaryotic secretory and endocytic pathways is characterized by the formation and maintenance of numerous subcellular compartments defined by the encapsulating lipid bilayer with varying phospholipid composition and unique sets of integral and peripheral membrane proteins. These subcellular compartments are dynamic structures that are in continuous and specific communication through carrier vesicles and tubules that mobilize cargo to specific destinations (Bonifacino and Glick, 2004; Kirchhausen, 2000; Pfeffer, 2003). By harnessing and regulating the fundamental processes of membrane fission and fusion through the action of protein complexes, the lipid bilayer can be exploited to produce a variety of distinct subcellular compartments that have unique chemical environments and play essential roles in cell and organ function. While it is clearly evident that the key to these dynamic processes is the systematic and reversible regulation of protein interactions, our understanding of the molecular basis for the global organization of the exocytic and endocytic trafficking systems still remains fragmentary.

Phylogenetic analysis of the proteins in a gene family is commonly used to identify potential functional relationships among family members, as for the Rab GTPases and the SNARE family of docking/fusion proteins that direct eukaryotic membrane traffic (Bock et al., 2001; Chen and Scheller, 2001; Pereira-Leal and Seabra, 2000, 2001; Ungar and Hughson, 2003). Computational approaches that apply hierarchical clustering algorithms to systematic tissue profiling can complement this annotation by providing insight into the physiological activity of close and distant family members, and of different gene families, in different cell types (Panda et al., 2002; Su et al., 2004; Walker et al., 2004). We have recently described the use of steady-state mRNA expression profiling and hierarchical clustering algorithms to generate a global overview of the distribution of Rabs, SNAREs, and coat machinery components, as well as their respective adaptors, effectors, and regulators, in 79 human and 61 mouse nonredundant tissues (Gurkan et al., 2005). This systems biology approach led us to propose that membrane trafficking events are largely orchestrated by Rab-regulated protein hubs that can be linked to biochemically characterized components of the coat, tethering, targeting, and fusion machineries (Gurkan et al., 2005). We refer to this collection of interacting components that define the specific membrane architectures of a given cell type as the membrome network. Here, we describe the use of the SymAtlas web-application to help explore trafficking GTPase function by browsing the Genomics Institute of the Novartis Research Foundation (GNF) Human and Mouse Gene Atlases, as well as their subsets, the human and mouse membrome datasets that we have compiled from the literature (Gurkan et al., 2005).

Materials

Microarray Data

Microarray data mentioned in this chapter are from the GNF Human and Mouse Gene Atlases (Version 2) (Su et al., 2004).

SymAtlas Web-Application

SymAtlas is a web-application (http://symatlas.gnf.org) for publishing experimental gene functionalization datasets (e.g., the GNF Human and Mouse Gene Atlases) integrated with a flexibly searchable, gene-centric database of public and proprietary annotations. It is publicly available and supports searching and visualization by keyword, accession number, gene symbol, genome interval, sequence, expression pattern, and coregulation. By default, the mRNA expression profile of each gene is visualized as a bar chart after the raw data have been condensed using either the Microarray Analysis Suite 5.0 (MAS 5.0) software (Affymetrix, Inc., Santa Clara, CA) or the GeneChip RMA (GCRMA) algorithm (Wu and Irizarry, 2004).

Human and Mouse Membrome Datasets

To help explore trafficking GTPase function, we have compiled human and mouse membrome datasets that currently comprise approximately 450 human/mouse proteins corresponding to known trafficking components of the cell (Gurkan et al., 2005). These datasets can be accessed through the Membrome homepage (http://www.membrome.org) for direct searching and visualization using the SymAtlas web-application as described above. They may also be accessed by selecting either the human or mouse "membrome" option from the dataset selection box (pull-down menu) available on any SymAtlas gene annotation or mRNA expression profile page.

Exploring Trafficking GTPase Function

Use of the SymAtlas Web-Application

The SymAtlas web-application (http://symatlas.gnf.org) can be accessed using an internet browser such as Microsoft Internet Explorer, Apple Safari, or Mozilla Firefox. Once at the SymAtlas homepage, the mRNA expression profile of a gene or protein of interest (e.g., Rab3A) can be searched using either of the two query forms (search fields) available. The smaller query form at the top of the homepage (and the larger query form immediately below it, when used with the default parameters) treats the user input as gene symbols, aliases, accession numbers, or identifiers from all available species. Through its pull-down menus, the larger query form also allows selection of additional search parameters, such as a keyword search that locates all related genes in SymAtlas (e.g., Rab3A, the Rab3A-interacting protein Rab3IP, etc.). Further selectable search parameters on the SymAtlas homepage include the reference author or title, sequence, protein domain and family, and genome interval, as well as expression pattern and coregulation.

Figure 1 is a snapshot of a typical SymAtlas search results page. In this case, the page was reached by entering "Rab3A" in the query form on the SymAtlas homepage and running the search with the default parameters. A typical search results page at SymAtlas is organized as follows.

A. A navigation panel on the left-hand side for exploring the search results (Fig. 1A). The result set shown in this panel may contain a single gene or multiple genes from one or more species, depending on the criteria chosen on the starting search page. Every gene in the result set is hyperlinked to its corresponding "bioentry" pages at SymAtlas in two ways. The bar chart icon to the left of each gene name links to the corresponding mRNA expression profile (profile view), whereas the gene name itself (shown in blue and underlined) links to the corresponding full gene annotation page (annotation view). If no bar chart icon appears to the left of a gene name, no mRNA expression profile data are available for that gene in SymAtlas.

B. A main panel in the middle displaying the mRNA expression profile of the gene of interest (Fig. 1B). Alternatively, depending on the selection made in the navigation panel, the main panel may show the full annotation view for a given gene or an annotation table for multiple genes. Although the mRNA expression profile is presented by default as a horizontal bar chart, alternative forms of data presentation can be selected through the "render" pull-down menu above the navigation panel. In the bar chart view, the microarray data for each tissue are presented along with three additional lines marking the median, three times the median, and 10 times the median (shown as black, blue, and red, respectively, in Fig. 1B). These lines allow the user to quickly assess how the peak expression values relate to the median expression of the gene of interest, and hence how tissue-specific its expression is. The median expression of each gene is calculated across all of its replicate-averaged expression levels. We find that the median value differs considerably between genes depending on whether the gene is a housekeeping gene or a tissue specialist, as in the case of the Rab GTPases (Gurkan et al., 2005). Expression values below 250 roughly correspond to 1-2 copies/cell and define the lower limit of confidence. Finally, the error bars shown for the majority of samples represent the standard deviation between the two or more replicates that have been averaged.

C. An annotation panel on the right-hand side providing partial annotation of the gene of interest (Fig. 1C). To reach the full annotation page (annotation view), click the name of the gene of interest in the navigation panel. In both cases, the annotation includes hyperlinks to relevant entries in various external databases.

Fig. 1. A snapshot of a typical search results page at SymAtlas. In this case, the results page corresponds to a simple search carried out at the SymAtlas homepage (http://symatlas.gnf.org) as described in the text, using the search term "Rab3A" and the default search parameters. The screen layout is divided into three sections: a navigation panel on the left-hand side for exploring the results list, a main panel in the middle for the graphic display of the selected gene's mRNA expression profile, and an annotation panel on the right-hand side for the selected gene. (See color insert; the three sections are labeled A, B, and C, respectively, in red.)
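The reference lines of the bar chart view can be reproduced offline to check how tissue-restricted a profile is. The following Python sketch assumes that one gene's replicate measurements are available as a dictionary keyed by tissue; the tissue names and values are invented for illustration and are not GNF Gene Atlas data, and the 250-unit confidence floor is the one quoted above. It averages the replicates, computes the error bar as the replicate standard deviation, takes the median across tissues, and classifies each tissue against the median, 3x, and 10x lines.

```python
from statistics import median, stdev

# Invented replicate values for one gene (arbitrary expression units).
# These are placeholders, not data from the GNF Gene Atlases.
replicates = {
    "brain": [4200.0, 3800.0],
    "retina": [2600.0, 2400.0],
    "heart": [310.0, 290.0],
    "liver": [180.0, 220.0],
    "kidney": [240.0, 260.0],
}

LOW_CONFIDENCE = 250.0  # values below ~250 (about 1-2 copies/cell) are near the detection limit


def reference_lines(replicates):
    """Return replicate-averaged values, error bars, and the median/3x/10x lines."""
    averaged = {t: sum(v) / len(v) for t, v in replicates.items()}
    # Error bar: standard deviation between replicates (requires at least two).
    errors = {t: stdev(v) if len(v) >= 2 else None for t, v in replicates.items()}
    # Median across this gene's replicate-averaged values.
    med = median(averaged.values())
    lines = {"median": med, "3x": 3 * med, "10x": 10 * med}
    return averaged, errors, lines


def band(value, lines):
    """Classify a tissue value relative to the three reference lines."""
    if value >= lines["10x"]:
        return ">=10x median"
    if value >= lines["3x"]:
        return ">=3x median"
    return "<3x median"


averaged, errors, lines = reference_lines(replicates)
for tissue, value in averaged.items():
    note = " (low confidence)" if value < LOW_CONFIDENCE else ""
    print(f"{tissue:8s} {value:7.1f}  {band(value, lines)}{note}")
```

In this toy profile the invented "brain" value clears the 10x line, the pattern one would expect for a tissue specialist, whereas a housekeeping gene would tend to cluster around its own median.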

Following an initial analysis as described above, the search can be extended to locate other genes/proteins that cocluster with, or exhibit mRNA expression profiles similar to, the gene of interest. This extension rests on two assumptions. The first is that the level of the mRNA signal reflects the corresponding protein activity. The second is that coclustering of two or more evolutionarily divergent components based on the similarity of their mRNA expression profiles may indicate a direct or indirect interaction between them and their contribution to a common cellular pathway (Gurkan et al., 2005). To locate such potential interacting partners, the SymAtlas web-application features a "profile neighborhood" search function that identifies genes whose expression profiles are correlated with a query profile at or above a user-specified Pearson correlation coefficient. To execute this search, first select the profile view of the gene of interest by clicking the corresponding bar chart icon in the navigation panel (Fig. 1A). Next, specify the cut-off value for the Pearson correlation coefficient between the selected profile and other profiles in the same dataset, using the query form located above each profile (Fig. 1, above B and C). Genes with one or more profiles correlated at or above the threshold value are returned in a new search results list in the navigation panel. Note that the lower the cut-off value, the longer the server takes to complete the profile neighborhood search and the more "hits" are likely to be returned.
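The logic of a profile neighborhood search can be sketched in a few lines of Python. This is not the SymAtlas implementation; it assumes each profile is a list of expression values over the same ordered set of tissues, and the gene names and values are placeholders. Genes whose Pearson correlation with the query profile is at or above the cut-off are returned, ranked from most to least correlated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def profile_neighborhood(query, profiles, cutoff):
    """Return (gene, r) pairs with r >= cutoff, highest correlation first."""
    target = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue  # do not report the query as its own neighbor
        r = pearson(target, profile)
        if r >= cutoff:
            hits.append((gene, r))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical profiles across five tissues (placeholder names and values).
profiles = {
    "GeneA": [10, 20, 30, 40, 50],
    "GeneB": [12, 22, 29, 41, 52],  # rises with GeneA
    "GeneC": [50, 40, 30, 20, 10],  # mirror image of GeneA
    "GeneD": [5, 50, 5, 50, 5],     # unrelated pattern
}

print(profile_neighborhood("GeneA", profiles, cutoff=0.8))
print(profile_neighborhood("GeneA", profiles, cutoff=-1.0))
```

Lowering `cutoff` from 0.8 to -1.0 in this toy example admits every gene, mirroring the observation above that lower cut-offs return more hits.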

An additional query form (available through the "search expression" tab at the top of each SymAtlas page) allows genes to be searched by fold-over-median expression. The user selects a dataset, a tissue of interest, and a threshold value. The search returns all genes with one or more profiles whose expression level in the chosen tissue is at least the threshold times the gene's median expression value.
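The fold-over-median criterion can be written as a simple filter. The Python sketch below is an illustration under stated assumptions, not SymAtlas code: each profile is a list of values over a fixed, ordered list of tissues, the keys stand for individual profiles (so a gene measured by several probe sets would appear under several keys), and the tissue names, gene names, and values are invented.

```python
from statistics import median

# Hypothetical tissue order and profiles (invented values).
TISSUES = ["brain", "heart", "liver", "lung", "kidney"]

profiles = {
    "GeneA": [900, 100, 80, 120, 100],
    "GeneB": [300, 250, 280, 260, 270],
    "GeneC": [600, 20, 30, 25, 40],
}


def fold_over_median_search(profiles, tissues, tissue, threshold):
    """Return profile IDs whose value in `tissue` is >= threshold x their own median."""
    idx = tissues.index(tissue)
    hits = []
    for gene, values in profiles.items():
        med = median(values)
        # Skip profiles with a non-positive median, for which a fold is undefined.
        if med > 0 and values[idx] >= threshold * med:
            hits.append(gene)
    return hits


print(fold_over_median_search(profiles, TISSUES, "brain", 5))
```

Because each profile is compared with its own median, a gene with modest absolute expression (such as the invented "GeneC") can still qualify, while a broadly expressed gene with a high median (such as "GeneB") does not.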

Use of the Human and Mouse Membrome Datasets

We have compiled human and mouse membrome datasets based on the current literature and made them available online for direct searching and visualization using the SymAtlas web-application as described above. These datasets can be accessed through the Membrome homepage (http://www.membrome.org), which also features direct links to relevant literature and supplementary material. They may also be accessed by selecting either the human or mouse "membrome" option from the dataset selection box available on the SymAtlas gene annotation or mRNA expression profile pages (e.g., above B in Fig. 1). Although necessarily restricted to currently known components of membrane trafficking, the membrome datasets may be better suited to identifying potential interacting partners, because they exclude genes from unrelated processes (e.g., intermediary metabolism, mitochondrial function) that happen to show similar expression patterns (Gurkan et al., 2005).
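The benefit of searching within a curated subset can be illustrated by running the same correlation search twice, once over all genes and once restricted to a membrome-like set. This Python sketch is hypothetical: the gene names, the subset, and the values are placeholders chosen so that an unrelated "metabolic" gene correlates strongly with the query, and the Pearson cut-off of 0.8 is arbitrary.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def neighborhood(query, profiles, cutoff, universe=None):
    """Rank genes correlated with `query` at or above `cutoff`.

    If `universe` is given, only genes in that set are considered (for example,
    a curated list of trafficking components); otherwise all genes are searched.
    """
    target = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query or (universe is not None and gene not in universe):
            continue
        r = pearson(target, profile)
        if r >= cutoff:
            hits.append((gene, r))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Placeholder profiles across five tissues.
profiles = {
    "QueryRab": [10, 20, 30, 40, 50],
    "TraffickingGene1": [11, 19, 31, 42, 49],
    "TraffickingGene2": [50, 10, 40, 20, 30],
    "MetabolicGene1": [9, 21, 30, 39, 51],
}
membrome_like = {"QueryRab", "TraffickingGene1", "TraffickingGene2"}

genome_wide = neighborhood("QueryRab", profiles, cutoff=0.8)
restricted = neighborhood("QueryRab", profiles, cutoff=0.8, universe=membrome_like)
print([g for g, _ in genome_wide])
print([g for g, _ in restricted])
```

In the genome-wide search the placeholder metabolic gene ranks highest; the restricted search removes it and leaves only the trafficking candidate.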

Further Notes on the Use of Tissue mRNA Profiling Data

Use of expression profiling as a systems biology tool for understanding membrane architecture and trafficking GTPase function requires several considerations.

1. Annotation is affected by the presence or absence on the microarray chip of the probe set(s) (reporter) corresponding to the gene of interest, and by the quality of the target (labeled DNA) prepared from each tissue (Su et al., 2004).

2. Even though the SymAtlas web-application indexes as many gene symbols and identifiers as annotated in several source databases, it should be noted that differences in nomenclature in the databases and the literature may still pose a problem in identifying a gene target of interest.

3. The functional significance of the microarray data used for computational clustering (i.e., the profile neighborhood search in the SymAtlas web-application) rests on the assumption that the level of mRNA reflects protein activity. This relationship can vary because of differences in protein half-life, the presence or absence of regulatory posttranslational modifications, and/or the presence of splice variants not discriminated by this technique. Nevertheless, elevated expression is expected to indicate the importance of a protein in the trafficking pathways of a given tissue.

4. The complexity of the samples being examined contributes significantly to the profile. For tissue samples, the profile reflects both the aggregate mRNA levels of all cell types in the tissue and the relative abundance of each cell type in that tissue. Profiling may therefore highlight dominant, cell-specific pathways. More information on the exact tissue samples and cell types analyzed (Su et al., 2004) can be found by following the relevant links provided through the "download data" tab at the top of all SymAtlas pages.

5. For some genes (e.g., human Rab1A), more than one mRNA expression profile may be available at SymAtlas. This is because the GNF Human and Mouse Gene Atlases in part use commercially available gene expression arrays (Su et al., 2004), on which multiple probe sets were sometimes designed against one gene, either to ensure that different splice variants are all interrogated or because no single probe set could be found that is specific to the target transcript and also meets melting temperature and other sequence composition criteria. Alternatively, the UniGene (NCBI) clusters used at the time of microarray design may subsequently have been merged during reannotation. In any case, significant differences in expression patterns between multiple probe sets annotated as targeting the same gene can arise for a variety of technical and scientific reasons. These include probe sets that, despite the initial in silico prediction, show poor hybridization efficiency or cross-hybridization, or fail to recognize their target because secondary structure in the mRNA or the probe sequence obstructs the target site. Because probes need to target the 3' untranslated region (UTR) of a transcript to optimize the likelihood of detection, the design process relies heavily on the correctness of the 3'UTR annotation of transcripts in public databases. If the 3'UTR of a less well-characterized gene was overpredicted, a number of probes in the probe set may fail to detect the transcript. Conversely, if the 3'UTR was substantially underpredicted, the reverse transcription step of the hybridization protocol may not yield cDNA fragments long enough to be detected by a sufficient number of probes in the probe set.

6. The hierarchical clustering methods described in our recent manuscript (Gurkan et al., 2005) and the profile neighborhood search function provided by the SymAtlas web-application examine all pathways within a given cell or tissue simultaneously. Such a systems biology approach highlights both constitutive and cell-specific pathways that may be linked to accomplish a particular exocytic and/or endocytic activity. As a consequence, not all relationships established using biochemical approaches will necessarily be highlighted in the clustering profile, reflecting, for instance, the relative level of activity of a particular Rab-regulated hub in a given cell type (Gurkan et al., 2005). Nevertheless, mRNA expression profiling provides a relative measure of possible relationships and can serve as a guide to identify potential protein interactions and/or the cell systems that define particular trafficking pathways. The physiological role of a particular membrome component, however, will still be best defined using reductionist approaches in the tissue or cell type in which the protein is normally expressed as a component of the appropriate Rab-regulated hub. This does not necessarily negate current studies in heterologous expression systems, but it suggests that components of a particular Rab hub that are important for understanding the mechanism of trafficking by that Rab may be missing from these artificial systems, given that the hallmark of true function is the specialized membrane architecture.
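When a gene is represented by several probe sets (note 5), one practical check is to correlate the probe-set profiles with each other before interpreting them. The Python sketch below is a hypothetical workflow, not a SymAtlas feature: the probe-set IDs, gene names, and values are invented, and the `min_r` threshold of 0.5 is an arbitrary choice. It groups probe sets by their annotated gene and flags pairs whose profiles correlate poorly, which would then warrant inspection for the technical causes listed above.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def discordant_probe_sets(probe_to_gene, profiles, min_r=0.5):
    """List (gene, probe1, probe2, r) for same-gene probe-set pairs with r < min_r."""
    by_gene = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        by_gene[gene].append(probe)
    flagged = []
    for gene, probes in by_gene.items():
        # Compare every pair of probe sets annotated to the same gene.
        for p, q in combinations(sorted(probes), 2):
            r = pearson(profiles[p], profiles[q])
            if r < min_r:
                flagged.append((gene, p, q, r))
    return flagged


# Hypothetical annotation and profiles across five tissues.
probe_to_gene = {
    "probeset_1": "GeneX",
    "probeset_2": "GeneX",
    "probeset_3": "GeneY",
    "probeset_4": "GeneY",
}
profiles = {
    "probeset_1": [10, 20, 30, 40, 50],
    "probeset_2": [12, 22, 29, 41, 52],
    "probeset_3": [10, 20, 30, 40, 50],
    "probeset_4": [5, 50, 5, 50, 5],
}

print(discordant_probe_sets(probe_to_gene, profiles))
```

A flagged pair does not by itself say which probe set is correct; it only marks a gene whose profiles should not be used for clustering without further checks.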

Acknowledgments

These studies are supported by grants from the National Institutes of Health (GM33301, GM42336, and EY11606) to W.E.B. C.G. is a Cystic Fibrosis Foundation Postdoctoral Research Fellowship recipient. This is TSRI Manuscript No. 17372-CB.

References

Bock, J. B., Matern, H. T., Peden, A. A., and Scheller, R. H. (2001). A genomic perspective on membrane compartment organization. Nature 409, 839-841.

Bonifacino, J. S., and Glick, B. S. (2004). The mechanisms of vesicle budding and fusion. Cell 116, 153-166.

Chen, Y. A., and Scheller, R. H. (2001). SNARE-mediated membrane fusion. Nat. Rev. Mol. Cell Biol. 2, 98-106.

Gurkan, C., Lapp, H., Alory, C., Su, A. I., Hogenesch, J. B., and Balch, W. E. (2005). Large scale profiling of Rab GTPase trafficking networks: The membrome. Mol. Biol. Cell 16, 3847-3864.

Kirchhausen, T. (2000). Three ways to make a vesicle. Nat. Rev. Mol. Cell Biol. 1, 187-198.

Panda, S., Antoch, M. P., Miller, B. H., Su, A. I., Schook, A. B., Straume, M., Schultz, P. G., Kay, S. A., Takahashi, J. S., and Hogenesch, J. B. (2002). Coordinated transcription of key pathways in the mouse by the circadian clock. Cell 109, 307-320.

Pereira-Leal, J. B., and Seabra, M. C. (2000). The mammalian Rab family of small GTPases: Definition of family and subfamily sequence motifs suggests a mechanism for functional specificity in the Ras superfamily. J. Mol. Biol. 301, 1077-1087.

Pereira-Leal, J. B., and Seabra, M. C. (2001). Evolution of the Rab family of small GTP-binding proteins. J. Mol. Biol. 313, 889-901.

Pfeffer, S. (2003). Membrane domains in the secretory and endocytic pathways. Cell 112, 507-517.

Su, A. I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K. A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., Cooke, M. P., Walker, J. R., and Hogenesch, J. B. (2004). A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062-6067.

Ungar, D., and Hughson, F. M. (2003). SNARE protein structure and function. Annu. Rev. Cell Dev. Biol. 19, 493-517.

Walker, J. R., Su, A. I., Self, D. W., Hogenesch, J. B., Lapp, H., Maier, R., Hoyer, D., and Bilbe, G. (2004). Applications of a rat multiple tissue gene expression data set. Genome

Wu, Z., and Irizarry, R. A. (2004). Preprocessing of oligonucleotide array data. Nat. Biotechnol. 22, 656-658.
