Fig. 1. Global view of Rab sequence space with 2D PCP analysis. The x and y axes represent the values of the second and third principal components, respectively. The analysis was performed on a database containing 560 individually checked, unique Rab sequences, including each Rab protein identified in S. cerevisiae. Automatic clustering with the clusterdata function in MatLab was performed to identify groupings in the data. The 10 major groups are color coded and named according to a representative mammalian member of the group. Dotted lines show the position of the cutoff values (0.02, −0.02) used to identify exocytic Rab sequences, shown in more detail in Fig. 2.
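
The selection and grouping steps described in this caption and in the Fig. 2 caption can be sketched in a few lines. The Python below is a minimal sketch under stated assumptions, not the MatLab procedure used for the figures. It assumes each sequence has already been reduced to (PC2, PC3) coordinates, takes its default cutoffs from the Fig. 2 caption (x > 0.02 and y > 0.02), and substitutes a plain single-linkage clustering at a fixed distance threshold for clusterdata, whose own cutoff criteria are described in the MatLab documentation. The function names, the threshold value, and the toy coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (x, y) = (PC2, PC3) coordinates of one sequence, as plotted in Fig. 1.
Point = Tuple[float, float]


def select_quadrant(points: Dict[str, Point], x_cut: float = 0.02,
                    y_cut: float = 0.02) -> Dict[str, Point]:
    """Keep the sequences whose coordinates exceed both cutoffs."""
    return {name: p for name, p in points.items()
            if p[0] > x_cut and p[1] > y_cut}


def single_linkage_groups(points: Dict[str, Point], threshold: float,
                          min_size: int = 4) -> Tuple[List[List[str]], List[str]]:
    """Flat single-linkage clustering at a fixed distance threshold.

    Two sequences share a cluster if they are connected by a chain of
    neighbours, each adjacent pair no farther apart than `threshold`.
    Clusters with fewer than `min_size` members are pooled into a
    miscellaneous list, mirroring the grouping rule in the Fig. 2 caption.
    """
    names = sorted(points)
    parent = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Merge every pair within the threshold distance; the loop is quadratic
    # in the number of points, acceptable for a few hundred sequences.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(points[a], points[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    groups = sorted(m for m in clusters.values() if len(m) >= min_size)
    misc = sorted(n for m in clusters.values() if len(m) < min_size for n in m)
    return groups, misc


# Illustrative usage with made-up coordinates (hypothetical sequence names).
toy = {
    "seqA": (0.050, 0.050), "seqB": (0.051, 0.052),
    "seqC": (0.052, 0.049), "seqD": (0.049, 0.051),
    "seqE": (0.300, 0.300),   # passes the cutoffs but has no close neighbours
    "seqF": (-0.100, 0.400),  # fails the x cutoff
}
exocytic = select_quadrant(toy)
groups, misc = single_linkage_groups(exocytic, threshold=0.01)
print(groups)  # [['seqA', 'seqB', 'seqC', 'seqD']]
print(misc)    # ['seqE']
```

Single linkage keeps the sketch short because cutting its tree at a fixed height is equivalent to taking connected components of the "within threshold" graph. It is also prone to chaining elongated groups together, so on real data the threshold would need tuning.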

lettering, lowercase lettering indicates residues that vary within each subfamily, and the positions of the nucleotide binding and hydrolysis motifs (universally shared among GTPases) are indicated with yellow highlighting. For reference, regions from a previous analysis predicted to define Rab family and subfamily function are indicated with red and pink highlighting, respectively (Pereira-Leal and Seabra, 2000). This analysis used HMM profiling and phylogenetic tree analysis to identify the major contiguous linear epitopes; each of these regions contains both residues that are conserved within the Ras and Rho families and residues that differ among subfamilies and subclasses. From evolutionary considerations, residues that discriminate among subfamilies should be conserved within a particular subfamily but divergent between subfamilies. In Fig. 3A, the residues conserved within each subfamily and shared with at least one other subfamily are highlighted with black shading. Such residues would not be predicted to define an individual subfamily. This is a reasonable assumption if the residue is conserved among all subfamilies, but it may be too simplistic in light of dual specificity.

In contrast, the residues highlighted in blue shading in Fig. 3B represent conserved residues that are divergent between subfamilies and would be predicted to contain the information that defines the individual subfamilies. Amino acids that define individual subclasses within a subfamily are expected to be divergent and therefore not conserved in the consensus sequence; such positions are highlighted with green shading in Fig. 3C. It is the variation of the particular residues highlighted in Fig. 3C that contributes to the subclass groupings displayed in Fig. 1. Figure 3 also shows a limitation of primary amino acid homology analysis, namely that it does not take the three-dimensional shape of the protein into account. In this representation, residues predicted to specify the function of a particular subfamily or subclass are, in general, distributed throughout the entire protein. However, one determinant of effector recognition for Rab proteins is the orientation of a set of three conserved residues (indicated with asterisks in Fig. 3A), which is dictated by the variation of residues in the hydrophobic core of the protein (Merithew et al., 2001). Four of the five core residues that participate in this variation (indicated with a § in Fig. 3C) are among those predicted by the phylogenetic homology algorithms to be potential subclass discriminant residues (green shaded positions in Fig. 3C). This example illustrates the predictive power of the phylogenetic homology study, while also showing that structural information is required to interpret the molecular features identified by the homology search algorithm.

Fig. 2. A closer look at the clustering of Rab subclasses with exocytic function. (A) Rab sequences from Fig. 1 with PCP values above the cutoff of x > 0.02 and y > 0.02 have been plotted separately. These Rab sequences, where known, represent exocytic Rab function. Automatic clustering with the clusterdata function in MatLab was performed to generate groups or subclasses containing four or more members. The groups are color coded and named according to a mammalian representative of the group. Rab sequences that do not cluster into groups are collectively defined as a miscellaneous group (gray filled circles). (B) This diagram identifies the Rab proteins located in the space occupied by exocytic Rab proteins, indicating the relative position and accession number of each Rab protein in the miscellaneous group.

Fig. 3. Comparison of the core domain of Ras superfamily sequences between the Rab, Ras, and Rho families. The core domain is aligned showing, in uppercase bold letters, those residues conserved at the 50% consensus level (i.e., 50% or more of the sequences show this residue at the position indicated). Bold is also used for positions conserved for positive (+; H, K, R) or negative charge (−; D, E). Lowercase letters show the consensus sequence at nonconserved positions, designated according to the amino acid class abbreviations: o (alcohol; S, T), l (aliphatic; I, L, V), a (aromatic; F, H, W, Y), c (charged; D, E, H, K, R), h (hydrophobic; A, C, F, G, H, I, K, L, M, R, T, V, W, Y), p (polar; C, D, E, H, K, N, Q, R, S, T), s (small; A, C, D, G, N, P, S, T, V), u (tiny; A, G, S), and t (turn-like; A, C, D, E, G, H, K, N, Q, R, S, T). The consensus sequence data were obtained from the SMART database (Schultz et al., 1998) and are derived from 339 Ras domains, 460 Rho domains, and 1120 Rab domains. The location of the Rho insert region is marked; this insert is not present in the Ras or Rab families. For clarity, the G protein-conserved sequence elements are highlighted in yellow. Numbering is arbitrary and intended as a descriptive guide. Highlighted in red are Rab family (RF) regions that have been proposed to uniquely distinguish the Rab subfamily of the Ras superfamily (Pereira-Leal and Seabra, 2000). Indicated below each row with an asterisk is a triad of conserved hydrophobic residues that provides structural plasticity in stabilization of the activated conformation of Rab3A and Rab5C (Merithew et al., 2001). (A) Conserved positions among Rab/Rho/Ras subfamilies. In this representation, all residues that are conserved at the 50% consensus level within a subfamily and shared with at least one other subfamily are shaded in black. An asterisk marks the positions of a triad of hydrophobic residues (positions 38, 54, and 70) that stabilizes the active conformation. (B) Signature motifs of Rab/Rho/Ras subfamilies. In this representation, all residues that are both conserved at the 50% consensus level within one of the subfamilies and unique to that subfamily are shaded in blue. (C) Subclass discriminant residues of Rab/Rho/Ras subfamilies. The alignment of core domains highlights nonconserved residue positions with green shading. Highlighted in pink are Rab subfamily (RSF) regions that have been proposed to identify subclasses of Rab proteins (Moore et al., 1995; Pereira-Leal and Seabra, 2000). Indicated with a § above each row are the positions of amino acids that form part of the hydrophobic core between the switch regions of the Rab3A and Rab5C GTPases (Merithew et al., 2001); these are predicted to be key specificity determinants because their packing in turn dictates the particular conformation of the invariant hydrophobic triad (see A).

Scope and Comments

PCA is, in general, a method for finding linear combinations of variables that can be grouped together and summarized by a single variable, or component; the first component captures the largest share of the variance in the data, and each subsequent component the largest share of what remains. One important point to note is that this method is very scalable, and many different variables can be included to make a multidimensional matrix. The variables can be theoretical, such as global physicochemical parameters of the protein sequence, or raw experimental measurements, such as microarray datasets (see Chapter 1). Combining homology data with experimentally measured data is expected to create a redundancy of information that is complementary and adds reliability to the analysis. Note that this representation of protein sequence homology data differs from the more common presentation as a dendrogram (Pereira-Leal and Seabra, 2001) and is complementary to such tree-building methods. It should also be stressed that the results obtained with search algorithms are not static and need to be reevaluated as datasets continue to expand through ongoing sequencing efforts.
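
To make the description of PCA concrete, the sketch below computes principal components with the Python standard library only, using power iteration with deflation on the sample covariance matrix. It is an illustration under assumptions, not the software used in this study: each row is assumed to be one sequence described by numeric variables, and `concat_features` is a hypothetical helper showing how homology-derived and experimentally measured variables for the same sequences could be joined into one multidimensional matrix. When variables with different units are combined, `standardize=True` scales each column to unit variance so that no single variable dominates the components.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def concat_features(*blocks: Sequence[Sequence[float]]) -> Matrix:
    """Join per-sequence feature lists column-wise (hypothetical helper)."""
    return [[x for row in rows for x in row] for rows in zip(*blocks)]


def prepare(data: Sequence[Sequence[float]], standardize: bool) -> Matrix:
    """Center every column; optionally scale it to unit sample variance."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / (s if standardize and s > 0 else 1.0)
             for x, m, s in zip(row, means, sds)] for row in data]


def pca(data: Sequence[Sequence[float]], k: int, standardize: bool = False,
        iters: int = 1000) -> Tuple[Matrix, Matrix]:
    """Return (components, scores) for the top k principal components.

    Components are unit vectors found by power iteration on the sample
    covariance matrix; after each one is found, its contribution is
    subtracted (deflation) so the next run finds the next component.
    The sign of each component is arbitrary, as in any PCA.
    """
    x = prepare(data, standardize)
    n, p = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)]
           for i in range(p)]
    comps: Matrix = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-uniform start
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        comps.append(v)
        # Deflation: remove the variance explained by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    # Scores: coordinates of each (centered) row along each component.
    scores = [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in x]
    return comps, scores


# Toy example: two variables that vary together along one direction.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
comps, scores = pca(toy, k=1)
print([round(c, 3) for c in comps[0]])   # [0.447, 0.894]
print([round(s[0], 3) for s in scores])  # [-3.354, -1.118, 1.118, 3.354]
```

On real data one might call `pca(concat_features(homology_rows, expression_rows), k=3, standardize=True)` and plot the second and third score columns, as in Fig. 1; `homology_rows` and `expression_rows` are hypothetical inputs. For large matrices a library routine (for example, one based on the singular value decomposition) would be preferable to these nested pure-Python loops.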


Work in the author's laboratory is supported by the U.S. National Science Foundation and U.S. National Institutes of Health. E. Williams is thanked for helpful discussions and I. Berke for critical reading of the manuscript. This article is dedicated to P. Salvodelli.


Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.

Bialek, W., and Botstein, D. (2004). Introductory science and mathematics education for 21st-century biologists. Science 303, 788-790.

Casari, G., Sander, C., and Valencia, A. (1995). A method to predict functional residues in proteins. Nat. Struct. Biol. 2, 171-178.

Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G., and Thompson, J. D. (2003). Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31, 3497-3500.

Merithew, E., Hatherly, S., Dumas, J. J., Lawe, D. C., Heller-Harrison, R., and Lambright, D. G. (2001). Structural plasticity of an invariant hydrophobic triad in the switch regions of Rab GTPases is a determinant of effector recognition. J. Biol. Chem. 276, 13982-13988.

Moore, I., Schell, J., and Palme, K. (1995). Subclass-specific sequence motifs identified in Rab GTPases. Trends Biochem. Sci. 20, 10-12.

Pereira-Leal, J. B., and Seabra, M. C. (2000). The mammalian Rab family of small GTPases: Definition of family and subfamily sequence motifs suggests a mechanism for functional specificity in the Ras superfamily. J. Mol. Biol. 301, 1077-1087.

Pereira-Leal, J. B., and Seabra, M. C. (2001). Evolution of the Rab family of small GTP-binding proteins. J. Mol. Biol. 313, 889-901.

Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998). SMART, a simple modular architecture research tool: Identification of signaling domains. Proc. Natl. Acad. Sci. USA 95, 5857-5864.

Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997). The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876-4882.
