B. Predictive Methods Based on Bioinformatics

Several groups have proposed to predict protein function by applying computational methods.

1. Annotation by sequence similarity to known proteins is based on the assumption that homology in structure results in conservation of activity [14]. At least three factors limit such an approach, however: sequence similarity has to be continuous over a sufficient stretch of DNA and has to be readily recognized; homologues with known function should already be available for comparison; and finally, the activity of the protein has to be correlated with the parts that are structurally conserved.

2. Prediction of protein interaction based on gene fusion events was proposed by Enright et al. [15] as a more sophisticated version of sequence comparison that focuses on homologies between DNA sequences that encode single proteins in one genome and apparently "fused" domains in another genome. The assumption is that fused domains necessarily interact with each other, and therefore that independent proteins homologous to these domains should interact with a high probability. Enright et al. documented their approach by "linking" 215 proteins in three different prokaryotic genomes, with very few estimated false positives [14]. This approach addresses one of the main criticisms of "random" sequence comparison but does not replace actual analysis of function.

3. Correlative functional linking through a combination of methods was described by Marcotte et al. [16]. These authors purport to replace biochemical analysis of protein function by integrating results from correlated evolution, correlated messenger RNA expression patterns, and patterns of domain fusion. Marcotte et al. reported the number of links proposed by each of the methods when applied to the yeast genome. Most types of links used by Marcotte et al. rely, of course, on a number of assumptions that will need to be verified experimentally [16]. The correlated-evolution method assumes that proteins sharing a similar phylogenetic profile across several genomes are functionally linked. Correlated mRNA expression patterns were obtained for the yeast genome by comparing 97 publicly available DNA chip data sets corresponding to changes in expression levels of mRNA (not proteins!) under a variety of physiological conditions. Here the assumptions are, first, that all mRNA levels accurately and similarly reflect protein levels, and second, that apparently correlated expression levels reflect functional linkage. The resulting proposed functions are still very broadly defined: proteins are described as being involved in "metabolism" or "transcription," labels that would probably fit a large proportion of randomly selected proteins anyway. Nevertheless, this kind of approach, as imprecise as it may be, does present the considerable advantage of providing information and hypotheses for at least the yeast genome. These hypotheses are easy to validate or invalidate by actually performing experiments such as genome-wide protein interaction mapping, as proposed by Fromont-Racine et al. [9], or systematic gene knockout analysis, as presently performed by the group of Ronald Davis at Stanford and independently by the European Yeast Genome Sequencing Consortium (EUROFAN).
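
The three kinds of evidence discussed above (shared phylogenetic profile, correlated mRNA expression, and domain fusion) can be illustrated with a minimal Python sketch. This is not the procedure of Marcotte et al. or Enright et al.: the scoring functions, the cutoffs, the field names, and the toy proteins P1 to P3 are all assumptions chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def profile_match(p, q):
    """Fraction of genomes in which two proteins are both present or both absent."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def pearson(x, y):
    """Pearson correlation of two equally long expression series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def fusion_link(hits_a, hits_b):
    """True if both proteins match the same composite protein in another
    genome at non-overlapping regions. Each hits dict maps a composite
    protein id to the (start, end) region matched on that composite."""
    for composite, (s1, e1) in hits_a.items():
        if composite in hits_b:
            s2, e2 = hits_b[composite]
            if e1 < s2 or e2 < s1:
                return True
    return False


def propose_links(proteins, profile_cutoff=1.0, expr_cutoff=0.9):
    """Return {(a, b): [evidence, ...]} for every pair with at least one link.
    Both cutoffs are arbitrary illustrative values, not published thresholds."""
    links = {}
    for a, b in combinations(sorted(proteins), 2):
        pa, pb = proteins[a], proteins[b]
        evidence = []
        if profile_match(pa["profile"], pb["profile"]) >= profile_cutoff:
            evidence.append("phylogenetic profile")
        if pearson(pa["expression"], pb["expression"]) >= expr_cutoff:
            evidence.append("mRNA co-expression")
        if fusion_link(pa["fusion_hits"], pb["fusion_hits"]):
            evidence.append("domain fusion")
        if evidence:
            links[(a, b)] = evidence
    return links


# Hypothetical toy data: presence/absence in four genomes, mRNA levels
# under four conditions, and hits to a made-up composite protein "F1".
PROTEINS = {
    "P1": {"profile": (1, 0, 1, 1), "expression": [1.0, 2.0, 3.0, 4.0],
           "fusion_hits": {"F1": (1, 120)}},
    "P2": {"profile": (1, 0, 1, 1), "expression": [2.0, 4.1, 6.0, 8.2],
           "fusion_hits": {"F1": (150, 300)}},
    "P3": {"profile": (0, 1, 1, 0), "expression": [4.0, 1.0, 3.5, 0.5],
           "fusion_hits": {}},
}

for pair, evidence in propose_links(PROTEINS).items():
    print(pair, evidence)
```

Note that a proposed link says nothing about what the proteins actually do. Each line of evidence rests on the assumptions listed above, so the output is best read as a list of hypotheses to be tested experimentally.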
