encodes the alpha subunit of hemoglobin and another homologous gene that encodes the beta subunit of hemoglobin. These two genes arose because an ancestral gene underwent duplication and the resulting two genes diverged through evolutionary time, giving rise to the alpha- and beta-subunit genes; these two genes are paralogs. Homologous genes (both orthologs and paralogs) often have the same or related functions; so, after a function has been assigned to a particular gene, it can provide a clue to the function of a homologous gene.
Databases containing genes and proteins found in a wide array of organisms are available for homology searches. Powerful computer programs have been developed for scanning these databases to look for particular sequences. A commonly used homology search program is BLAST (Basic Local Alignment Search Tool). Suppose a geneticist sequences a genome and locates a gene that encodes a protein of unknown function. A homology search conducted on databases containing the DNA or protein sequences of other organisms may identify one or more orthologous sequences. If a function is known for one of these sequences, that function may provide information about the function of the newly discovered protein.
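The core idea of a homology search, scoring how well a query matches each database sequence and ranking the results, can be sketched in a few lines. The sketch below is not BLAST, which relies on fast heuristics and statistical significance estimates; it is a plain Smith-Waterman local alignment score. The scoring values (match +2, mismatch -1, gap -2), the function names, and the sequences are all assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query, database):
    """Score the query against every database entry, best match first."""
    scored = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database: names and sequences are invented.
toy_database = {
    "annotated_protein_x": "MKVLAAGIVG",
    "annotated_protein_y": "PPPPPPPP",
}
print(rank_hits("KVLAAGI", toy_database))
```

A high-scoring hit to a sequence of known function would then be treated as a clue, not a proof, about the function of the query.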
In a similar way, computer programs can search a single genome for paralogs. Eukaryotic organisms often contain families of genes that have arisen by duplication of a single gene. If a paralog is found and its function has been previously assigned, this function can provide information about a possible function of the unknown gene. However, paralogs often evolve new functions; so information about their functions must be used cautiously. Of the genes newly identified through genomic-sequencing projects, 50% are significantly similar to orthologs and paralogs whose function has already been described. The 50% of newly identified genes that cannot be assigned a function on the basis of homology searches will undoubtedly decrease in number as functions are assigned to more and more genes and as more genomes are sequenced.
Other sequence comparisons
Complex proteins often contain regions, called protein domains, that have specific shapes or functions. For example, certain DNA-binding proteins attach to DNA in the same way; these proteins have in common a domain that provides the DNA-binding function. Each protein domain has an arrangement of amino acids common to that domain. There are probably a limited, though large, number of protein domains, which have mixed and matched through evolutionary time to yield the protein diversity seen in present-day organisms.
Many protein domains have been characterized, and their molecular functions have been determined. The sequence from a newly identified gene can be scanned against a database of known domains. If the gene sequence encodes one or more domains whose functions have been previously determined, the function of the domain can provide important information about a possible function of the new gene.
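Scanning a sequence against a collection of known domains might look like the following minimal sketch. It assumes each domain can be described by a simple amino acid pattern written as a regular expression; real domain databases use richer statistical models. The domain names, patterns, and protein sequence are hypothetical and were invented for this illustration.

```python
import re

# Hypothetical toy "domain database": names and patterns are invented.
TOY_DOMAINS = {
    "toy_DNA_binding": re.compile(r"C..C.{5,12}H..H"),
    "toy_ATP_binding": re.compile(r"G....GK[ST]"),
}


def scan_domains(protein_seq, domains=TOY_DOMAINS):
    """Return (domain name, start, end) for every pattern match, ordered by start."""
    hits = []
    for name, pattern in domains.items():
        for m in pattern.finditer(protein_seq):
            hits.append((name, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])


# Invented protein sequence that contains one copy of each toy domain.
new_protein = "MKGAAAAGKSLLCAACQQQQQQHAAH"
for name, start, end in scan_domains(new_protein):
    print(f"{name}: residues {start + 1}-{end}")
```

Each reported domain suggests a possible function for the region it covers, which can then be tested experimentally.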
Another computational method for predicting protein function is a phylogenetic profile. In this method, the presence-and-absence pattern of a particular protein is examined across a set of organisms whose genomes have been sequenced. If two proteins are either both present or both absent in all genomes surveyed, the two proteins may be functionally related. For example, the two proteins might function as consecutive steps in a biochemical pathway. The idea is that the two proteins depend on each other and will evolve together. One protein cannot function without the other, and they will either both be present or both be absent.
Consider the following proteins in four bacterial species (Figure 19.16a):
E. coli: protein 1, protein 2, protein 3, protein 4, protein 5, protein 6
Species A: protein 1, protein 2, protein 3, protein 6
Species B: protein 1, protein 3, protein 4, protein 6
Species C: protein 2, protein 4, protein 5
We can create a phylogenetic profile by constructing a table comparing the presence (+) or absence (−) of the proteins in the four bacterial species (Figure 19.16b). The phylogenetic profile reveals that proteins 1, 3, and 6 are either all present or all absent in all species; so these proteins might be functionally related.
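The profile for this example can be built and searched with a short program. The following is a minimal sketch: the protein lists are copied from the four species above, whereas the function names and the grouping approach (collecting proteins whose profiles are identical) are assumptions made for illustration.

```python
from collections import defaultdict

# Protein content of each species, copied from the example above.
genomes = {
    "E. coli":   {1, 2, 3, 4, 5, 6},
    "Species A": {1, 2, 3, 6},
    "Species B": {1, 3, 4, 6},
    "Species C": {2, 4, 5},
}


def phylogenetic_profiles(genomes):
    """Map each protein to a tuple of '+'/'-' flags, one flag per species."""
    species = list(genomes)
    proteins = sorted(set().union(*genomes.values()))
    return {
        p: tuple("+" if p in genomes[s] else "-" for s in species)
        for p in proteins
    }


def shared_profiles(profiles):
    """Group proteins with identical profiles; keep groups of two or more."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [members for members in groups.values() if len(members) > 1]


profiles = phylogenetic_profiles(genomes)
for protein, profile in profiles.items():
    print(f"protein {protein}: {' '.join(profile)}")
print(shared_profiles(profiles))  # prints [[1, 3, 6]]
```

With only four genomes, identical profiles can easily arise by chance; the inference becomes more convincing as the number of genomes surveyed increases.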
Examining fusion patterns among proteins is another method for predicting functional relations; this technique is sometimes called the Rosetta Stone method. Functionally related, separate proteins in one organism sometimes exist as a single, fused protein in another organism. Thus, the presence of a fused A + B protein in one species suggests that separate proteins A and B in another organism may be functionally related.
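The logic of the fusion search can be sketched as follows, under a strong simplifying assumption: a separate protein counts as "found" in a fused protein only if its exact sequence appears there, whereas a real analysis would use sequence alignment to detect similar, not identical, regions. All protein names and sequences are hypothetical.

```python
from itertools import combinations


def rosetta_stone_pairs(separate_proteins, other_proteome):
    """Find pairs of separate proteins that both occur, without overlapping,
    inside a single protein of another organism."""
    links = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(separate_proteins.items(), 2):
        for fused_name, fused_seq in other_proteome.items():
            pos_a = fused_seq.find(seq_a)  # first exact occurrence, or -1
            pos_b = fused_seq.find(seq_b)
            if pos_a == -1 or pos_b == -1:
                continue
            # Require the two matches to occupy non-overlapping regions.
            if pos_a + len(seq_a) <= pos_b or pos_b + len(seq_b) <= pos_a:
                links.append((name_a, name_b, fused_name))
    return links


# Invented sequences: organism 1 has separate proteins, organism 2 has a fusion.
organism_1 = {"A": "MKTAY", "B": "GLSDE", "C": "WWWPP"}
organism_2 = {"fused_AB": "MKTAYQQGLSDE", "solo_C": "WWWPP"}
print(rosetta_stone_pairs(organism_1, organism_2))
```

Here proteins A and B are linked through the fused protein, so they become candidates for a functional relation, whereas C is not linked to either.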
Yet another method for determining the function of an unknown gene is gene neighbor analysis (Figure 19.17). Genes that encode functionally related proteins are often closely linked in bacteria. Thus, if two genes are consistently linked in the genomes of several bacteria, they might be functionally related. Functionally related genes are sometimes also linked in eukaryotes; examples are the Hox genes, which play an important role in embryonic development (Chapter 21).
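A simple version of gene neighbor analysis counts how often two genes lie side by side across several genomes. The sketch below assumes that each genome is given as an ordered list of gene names and that orthologous genes share a name in every genome; the gene names, gene orders, and threshold are hypothetical.

```python
from collections import Counter


def conserved_neighbors(genomes, min_genomes):
    """Return gene pairs that are adjacent in at least min_genomes genomes."""
    counts = Counter()
    for gene_order in genomes.values():
        # A set ensures each pair is counted at most once per genome;
        # frozenset ignores the orientation of the pair (a-b equals b-a).
        pairs = {
            frozenset((left, right))
            for left, right in zip(gene_order, gene_order[1:])
            if left != right
        }
        counts.update(pairs)
    return sorted(
        tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes
    )


# Invented gene orders for three bacterial genomes.
toy_genomes = {
    "genome_1": ["geneA", "geneB", "geneC", "geneD"],
    "genome_2": ["geneD", "geneB", "geneA", "geneC"],
    "genome_3": ["geneC", "geneX", "geneA", "geneB"],
}
print(conserved_neighbors(toy_genomes, min_genomes=3))
```

In this toy data, only geneA and geneB are neighbors in all three genomes, which makes them candidates for a functional relation.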
It is important to recognize that functions suggested by computational methods such as homology searches, phylogenetic profiling, fusion proteins, and neighbor analysis do not define a protein's function; rather, these computational methods provide hints about possible functions that can be pursued through detailed analyses of the biochemistry and cellular location of the protein. Nevertheless, these computational methods and others like them have proved to be invaluable in determining the functions of genes revealed in genomic studies.
Figure 19.16 Phylogenetic profiling can be used to infer protein function. (Micrographs from: top, CNRI/SPL/Photo Researchers; middle left and center, Gary Gaugler/Visuals Unlimited; middle right, M. Abbey/Visuals Unlimited.)