
Fig. 3. (A) Alignment of the Loop2 region of Sec4p and Ypt1p. The amino acids differ at 7 of the 12 aligned positions. (B) Alignment of the Loop2 region of Sec4p and Ypt1p together with other members of the Sec4p and Ypt1p groups. Amino acids considered potentially significant for Sec4p function are shaded blue, and those considered significant for Ypt1p function are shaded red and marked with an asterisk; the algorithm subsequently ranks these positions according to conservation and divergence distances. (C) Qualitative overview of position ranking. The Loop2 region of Sec4p is aligned with Sec4p-related sequences, and the positions of potentially significant residues are shaded. Of these positions, F45 and I55 (shaded blue and indicated with arrows) are strictly conserved, whereas P47, F49, and T51 are less conserved. The algorithm ranks these residues, together with those across the rest of the Sec4p sequence, according to conservation, chemical identity, and divergence from the Ypt1p sequence.

4. Identifying potentially relevant residues reduces the complexity of the search for residues that define a specific function. However, even in the Loop2 example shown, the number of candidate positions drops only from 7 to 5. The algorithm therefore ranks the potentially relevant residues in order of predicted importance. The complete Clustal X (1.83) alignment file also includes Ypt1p- and Sec4p-related sequences whose ability to functionally complement Sec4p or Ypt1p is not known. These orthologous sequences are used to determine how far the residues classified as potentially significant are conserved within, or diverge from, the core groups. The alignment file is imported into the PHYLIP package, where the ProtDist program calculates a matrix of relatedness values between sequences (Retief, 2000). To calculate the conservation and divergence distances, we sort all known related proteins in the database (identified through BLAST searches) according to their sequence homology to the first core group, and sort them a second time according to their homology to the second core group. The conservation distance at each amino acid position is then found by identifying the first sequence in the first sorted list that, together with the two sequences below it, carries a divergent amino acid at that position; the distance is defined as the ProtDist value between this first divergent protein and any member of the first core group. Similarly, the divergence distance at each position is determined from the list sorted by relatedness to the second core group, by finding the first sequence that carries the conserved amino acid (again measured as a ProtDist value).
The algorithm then assigns a quantitative value to each position, assuming that a given position is likely to be relevant for a selective function (1) when both the conservation distance to the first core group and the divergence distance from the second core group are large, (2) when conserved residues occur at that position in both core groups rather than in only one, and (3) when the substitution between the two groups involves a change in charge or an aromatic amino acid. The weight given to each of these assumptions can be adjusted through input parameters assigned by the user. The ranking process is illustrated qualitatively in Fig. 3C. Residues identified as potentially significant for Sec4p function in Fig. 3B are inspected in the global alignment of Sec4p-related sequences to establish their conservation. Inspection of the alignment shows that Sec4p residues F45 and I55 are identical in all Sec4p-related Rab sequences, and that a different amino acid is identical at each of these positions in every Ypt1p homolog (not shown). Although the substitutions between Sec4p and Ypt1p at these positions would be considered conservative, the alignment of orthologs from different organisms, together with experimental knowledge of their function, suggests that these differences are potentially significant.
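The filtering, distance, and scoring steps above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' Java implementation. Sequences are pre-aligned strings of equal length, and each ortholog list is already sorted by its ProtDist distance to the relevant core group (closest first), with that distance stored alongside the sequence. The filter `potentially_significant` is our reading of how candidate positions are chosen (invariant in one core group and absent from the other, consistent with the Sec4p S48 case discussed under Comments and Cautions). The divergence distance is read as the distance of the first second-group-sorted sequence that carries the first core group's residue. The multiplicative `score` and its weights `w_both` and `w_chem` are hypothetical; the chapter does not give a formula.

```python
# Hypothetical sketch of step 4; not the authors' implementation.
# Assumptions: aligned sequences of equal length; each Hit list is pre-sorted
# by ProtDist distance to a core group (closest first); weights are invented.
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, float]  # (aligned sequence, ProtDist distance to a core group)
CHARGE: Dict[str, int] = {"D": -1, "E": -1, "K": 1, "R": 1}  # His treated as neutral
AROMATIC = set("FWY")


def conserved_residue(core: List[str], pos: int) -> Optional[str]:
    """Return the residue shared by every core sequence at pos, else None."""
    residues = {seq[pos] for seq in core}
    return residues.pop() if len(residues) == 1 else None


def potentially_significant(core_a: List[str], core_b: List[str], pos: int) -> bool:
    """Our reading of the filter: core group A is invariant at pos and its
    residue never occurs at pos in any core group B sequence."""
    aa = conserved_residue(core_a, pos)
    return aa is not None and all(seq[pos] != aa for seq in core_b)


def conservation_distance(hits: List[Hit], pos: int, core_aa: str, run: int = 3) -> float:
    """Distance of the first sequence that, together with the next run - 1
    sequences in the list, carries a residue other than core_aa at pos."""
    for k in range(len(hits) - run + 1):
        if all(hits[k + j][0][pos] != core_aa for j in range(run)):
            return hits[k][1]
    return hits[-1][1] if hits else 0.0  # assumption: no divergent run found


def divergence_distance(hits: List[Hit], pos: int, core_aa: str) -> float:
    """Distance (to core group B) of the first B-sorted sequence that carries
    core group A's residue core_aa at pos."""
    for seq, dist in hits:
        if seq[pos] == core_aa:
            return dist
    return hits[-1][1] if hits else 0.0  # assumption: residue never appears


def score(cons: float, div: float, conserved_in_both: bool, aa_a: str, aa_b: str,
          w_both: float = 1.5, w_chem: float = 1.5) -> float:
    """Hypothetical score reflecting assumptions (1)-(3) in the text."""
    s = cons * div  # (1) reward positions where both distances are large
    if conserved_in_both:  # (2) invariant in both core groups
        s *= w_both
    charge_change = CHARGE.get(aa_a, 0) != CHARGE.get(aa_b, 0)
    aromatic_change = (aa_a in AROMATIC) != (aa_b in AROMATIC)
    if charge_change or aromatic_change:  # (3) chemically marked substitution
        s *= w_chem
    return s
```

For example, with single-column core groups `["S", "S"]` (Sec4p-like) and `["D", "S"]` (Ypt1p-like), `potentially_significant` returns `False`, mirroring the exclusion of Sec4p S48. Multiplying the two distances favors positions where both are large, matching assumption (1); a sum would let one large distance compensate for a small one.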

Algorithm Implementation

Manual implementation of the algorithm is possible for any particular protein pair using applications such as MATLAB (The MathWorks Inc.). However, a more automated implementation is desirable to avoid tedious individual manipulations. We have automated the algorithm in Java (CodeWarrior, Metrowerks), with the ultimate goal of porting it as a stand-alone application to different platforms. Once coding, debugging, and testing are complete, we plan to share the application according to established guidelines (http://www.nap.edu/books/0309088593/html/4.html) for potential use in the study of other large protein families and domains.

Algorithm Output

Figure 4 shows the algorithm output for Sec4p and Ypt1p. Four of the 10 highest-scoring positions lie in Loop2, the effector region previously shown to be important for Sec4p function (Brennwald and Novick, 1993; Dunn et al., 1993). In addition, we did not find any residues with high specificity-determining values in Loop7, which experimental data suggest is not important for the specific function of Sec4p (M. Nussbaum and R. Collins, unpublished data). These results support the ability of the algorithm to predict functionally important residues. However, one implication of this method is that the functionally important residues are "transplantable"; that is, these residues could be swapped between protein pairs to generate "switch-of-function" proteins, which we have not found to be the case for Sec4p and Ypt1p. The evolutionary basis of the algorithm assumes that proteins have evolved to acquire specific functions; however, it is also likely that some proteins have diverged in order to lose or avoid functions, so transplantation to generate "switch-of-function" proteins may not work in all instances.

Comments and Cautions

Fig. 4. Algorithm output showing predicted values of the top-scoring specificity-determining amino acid positions, generated from sequence distances calculated with either ProtDist or percent sequence identity.

When evaluating output, it is important to bear in mind that any algorithm depends on the quality of its assumptions, the number of its parameters, and the quality of its input data. For example, are the Rab sequences in the database biased toward one member of a particular experimental pair? The sequences that make up the homology alignments of the two groups must be evenly matched. The algorithm is also extremely sensitive to the quality of the overall homology alignment. A structural analysis of Rab proteins reveals that conserved hydrophobic triad residues point their side chains at different angles relative to the strands of the core (Merithew et al., 2001). These angle shifts create very distinct surfaces in related GTPases. Thus, although Rabs share a similar overall shape and fold, differences in internal packing angles create distinct recognition surfaces that may not be easily predicted by methods based on primary sequence homology. The predictive power of methods such as the one described here will improve as three-dimensional structural information and other experimental data are incorporated. Structural snapshots also help to provide a molecular explanation of how amino acid changes contribute to unique functions.

Another source of variation is the method used to generate the matrix of relationship values for the protein orthologs. We currently use the ProtDist program from PHYLIP to calculate these values; however, implementing the algorithm with a different method of calculation (% identity) yields slightly different outputs (Fig. 4). The algorithm also ignores potential posttranslational modifications. For example, serine 48 of Sec4p was not classified as potentially significant even though the residue at the equivalent position in Ypt1p is aspartic acid, because other sequences with Ypt1p function also contain serine at this position. However, it is possible that the residue at this position is selectively phosphorylated in proteins with Ypt1p function, but not in those with Sec4p function, and therefore does contribute to specific Ypt1p function. Another assumption is that residues conserved at the same position in both core groups are more significant than residues conserved within only one group. However, different, nonoverlapping regions of the protein may be dominant for different functions, making this assumption perhaps too simplistic. In summary, experimental verification of an algorithm's output is the critical (and often rate-determining) step in determining the predictive power of a particular procedure. The algorithm we describe combines homology searches and experimental inputs to provide quantitative evaluations, and it offers a useful starting point for experimentation.
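Swapping ProtDist for percent identity changes the output slightly (Fig. 4). The following is a minimal sketch of a percent-identity distance that could stand in for the ProtDist matrix. It assumes pre-aligned sequences with `-` as the gap character and ignores gapped columns, a choice the chapter does not specify. The helper `distance_to_core` takes the closest core-group member, which is also our assumption. ProtDist, by contrast, applies model-based corrections to the observed differences (for example, the Dayhoff PAM or Jones-Taylor-Thornton models), which is one plausible reason the two outputs differ.

```python
# Hypothetical sketch: percent-identity distance as an alternative to ProtDist.
from typing import Dict, List, Tuple


def pidentity_distance(a: str, b: str, gap: str = "-") -> float:
    """Return 1 - (identical residues / columns where neither sequence has a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 1.0  # assumption: no comparable columns means maximally distant
    identical = sum(1 for x, y in columns if x == y)
    return 1.0 - identical / len(columns)


def distance_to_core(seq: str, core: List[str]) -> float:
    """Distance to the closest core-group member (our assumption)."""
    return min(pidentity_distance(seq, member) for member in core)


def sort_by_core(orthologs: Dict[str, str], core: List[str]) -> List[Tuple[str, float]]:
    """Sort ortholog names by distance to a core group, closest first."""
    pairs = [(name, distance_to_core(seq, core)) for name, seq in orthologs.items()]
    return sorted(pairs, key=lambda p: (p[1], p[0]))


if __name__ == "__main__":
    core = ["ACDE"]
    orthologs = {"x": "ACDF", "y": "ACDE", "z": "GGGG"}
    print(sort_by_core(orthologs, core))  # [('y', 0.0), ('x', 0.25), ('z', 1.0)]
```

Running `sort_by_core` once against each core group yields the two sorted lists that the conservation and divergence distances are read from.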


This work is supported by grants from the U.S. National Science Foundation and U.S. National Institutes of Health. M. Nussbaum acknowledges the support of the Research Apprenticeship in the Biological Sciences program for high school students.


Bauer, B., Mirey, G., Vetter, I. R., Garcia-Ranea, J. A., Valencia, A., Wittinghofer, A., Camonis, J. H., and Cool, R. H. (1999). Effector recognition by the small GTP-binding proteins Ras and Ral. J. Biol. Chem. 274, 17763-17770.

Brennwald, P., and Novick, P. (1993). Interactions of three domains distinguishing the Ras-related GTP-binding proteins Ypt1 and Sec4. Nature 362, 560-563.

Casari, G., Sander, C., and Valencia, A. (1995). A method to predict functional residues in proteins. Nat. Struct. Biol. 2, 171-178.

Clement, M., Fournier, H., de Repentigny, L., and Belhumeur, P. (1998). Isolation and characterization of the Candida albicans SEC4 gene. Yeast 14, 675-680.

del Sol Mesa, A., Pazos, F., and Valencia, A. (2003). Automatic methods for predicting functionally important residues. J. Mol. Biol. 326, 1289-1302.

Dietmaier, W., Fabry, S., Huber, H., and Schmitt, R. (1995). Analysis of a family of ypt genes and their products from Chlamydomonas reinhardtii. Gene 158, 41-50.

Dumas, B., Borel, C., Herbert, C., Maury, J., Jacquet, C., Balsse, R., and Esquerre-Tugaye, M. T. (2001). Molecular characterization of CLPT1, a SEC4-like Rab/GTPase of the phytopathogenic fungus Colletotrichum lindemuthianum which is regulated by the carbon source. Gene 272, 219-225.

Dunn, B., Stearns, T., and Botstein, D. (1993). Specificity domains distinguish the Ras-related GTPases Ypt1 and Sec4. Nature 362, 563-565.

Fabry, S., Jacobsen, A., Huber, H., Palme, K., and Schmitt, R. (1993). Structure, expression, and phylogenetic relationships of a family of ypt genes encoding small G-proteins in the green alga Volvox carteri. Curr. Genet. 24, 229-240.

Haubruck, H., Prange, R., Vorgias, C., and Gallwitz, D. (1989). The ras-related mouse ypt1 protein can functionally replace the YPT1 gene product in yeast. EMBO J. 8, 1427-1432.

Haubruck, H., Engelke, U., Mertins, P., and Gallwitz, D. (1990). Structural and functional analysis of ypt2, an essential ras-related gene in the fission yeast Schizosaccharomyces pombe encoding a Sec4 protein homologue. EMBO J. 9, 1957-1962.

Heo, W. D., and Meyer, T. (2003). Switch-of-function mutants based on morphology classification of Ras superfamily small GTPases. Cell 113, 315-328.

Merithew, E., Hatherly, S., Dumas, J. J., Lawe, D. C., Heller-Harrison, R., and Lambright, D. G. (2001). Structural plasticity of an invariant hydrophobic triad in the switch regions of Rab GTPases is a determinant of effector recognition. J. Biol. Chem. 276, 13982-13988.

Pertuiset, B., Beckerich, J. M., and Gaillardin, C. (1995). Molecular cloning of Rab-related genes in the yeast Yarrowia lipolytica. Analysis of RYL1, an essential gene encoding a SEC4 homologue. Curr. Genet. 27, 123-130.

Retief, J. D. (2000). Phylogenetic analysis using PHYLIP. Methods Mol. Biol. 132, 243-258.

Saloheimo, M., Wang, H., Valkonen, M., Vasara, T., Huuskonen, A., Riikonen, M., Pakula, T., Ward, M., and Penttila, M. (2004). Characterization of secretory genes ypt1/yptA and nsf1/nsfA from two filamentous fungi: Induction of secretory pathway genes of Trichoderma reesei under secretion stress conditions. Appl. Environ. Microbiol. 70, 459-467.

Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997). The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876-4882.
