Intein Motifs and Conserved Residues

No set convention has been adopted for numbering of residues. One method (Fig. 1) that simplifies comparison of inteins in different host proteins (Noren et al. 2000) consists of separately numbering each part of the precursor: (1) intein residues are numbered sequentially from amino to carboxy terminus of the intein, beginning with 1; (2) N-extein residues are negative numbers starting with -1 at the carboxy terminus of the N-extein, counting towards the precursor amino terminus; (3) C-extein residues include a plus sign, beginning with +1 at the amino terminus of the C-extein and counting towards the precursor carboxy terminus.
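The numbering scheme above can be sketched in a few lines of Python. This is only an illustration of the convention: the function name `number_precursor` and the toy sequences in the usage note are hypothetical and do not come from any published tool or real intein.

```python
def number_precursor(n_extein: str, intein: str, c_extein: str) -> list[tuple[str, str]]:
    """Label each residue of a precursor using the separate-numbering convention.

    Hypothetical helper: returns (residue, label) pairs in precursor order.
    """
    labels = []
    # N-extein: negative numbers, with -1 at its carboxy terminus (next to the intein).
    for i, aa in enumerate(n_extein):
        labels.append((aa, str(i - len(n_extein))))
    # Intein: 1, 2, 3, ... from its amino terminus.
    for i, aa in enumerate(intein, start=1):
        labels.append((aa, str(i)))
    # C-extein: +1, +2, ... from its amino terminus, written with an explicit plus sign.
    for i, aa in enumerate(c_extein, start=1):
        labels.append((aa, f"+{i}"))
    return labels
```

For a made-up precursor such as `number_precursor("MKG", "CLSHN", "TDE")`, the last N-extein residue is labelled `-1`, the first intein residue `1`, and the first C-extein residue `+1`, so the same labels apply regardless of where the intein sits in its host protein.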

Analysis of even the first few inteins indicated that sequences surrounding the splice sites were conserved. Three inteins were flanked by cysteines, making it unclear whether the intein began or ended with cysteine (Hirata et al. 1990; Kane et al. 1990; Davis et al. 1991, 1992). The Thermococcus litoralis DNA polymerase second intein (Tli Pol-2) was predicted to begin with serine and have a threonine at the +1 position based on the conserved DNA polymerase motif into which the intein was inserted (Perler et al. 1992). This was confirmed by sequencing the excised Tli Pol-2 intein (Perler et al. 1992). Similar results were obtained with the Sce VMA intein, which began with cysteine (Cooper et al. 1993). Although threonine has not been observed at the intein amino terminus, substitution of serine by threonine in the Tli Pol-2 intein yielded a functional intein (Hodges et al. 1992). If the intein began with cysteine or serine, then it must end with the conserved asparagine, since comparison to intein-less homologues required one of the cysteine, serine or threonine residues flanking the intein to be part of the extein.

Fig. 1. Organization of a protein-splicing precursor. Intein motifs are depicted above and conserved amino acids are depicted below, as are amino acid numbers. In this precursor, a LAGLIDADG family homing endonuclease is shown with its standard motifs (C, D, E, and H). The amino-terminal splicing region (motifs A, N2, B, and N4) associates with the carboxy-terminal splicing region (motifs F and G) to form the intein-splicing domain. The conserved nucleophiles and assisting groups are in uppercase, while residues present in non-canonical inteins are in lowercase.

Intein motifs contain groups of similar amino acids at specific positions interspersed with non-conserved positions, making it more difficult to identify inteins by simple sequence comparison. Sophisticated sequence comparison systems including a hidden Markov model have been devised to find new inteins (Dalgaard et al. 1997b; Pietrokovski 1998a). Two motif nomenclatures are widely used. The earlier system used blocks A-H (Fig. 1), where only blocks A, B, F, and G are in the splicing domain (Pietrokovski 1994; Perler et al. 1997a). A second method uses "N" for N-extein motifs, "EN" for homing endonuclease motifs, and "C" for C-extein motifs (Pietrokovski 1998a). Equivalent motifs are: A=N1, B=N3, C=EN1, D=EN2, E=EN3, H=EN4, F=C2, and G=C1. Motifs N2 and N4 are characterized by an acidic residue (Pietrokovski 1998a). Motif sequences are listed in InBase. Block A begins with the intein amino terminus. Block B is usually 60-90 aa from the intein amino terminus and often contains a Thr-x-x-His motif. This histidine is the most conserved intein residue and is only absent from one putative intein. A similar motif is also found in proteases. Block F directly precedes block G, which contains the intein carboxy-terminal dipeptide His-Asn and the C-extein +1 nucleophile.
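The correspondence between the two nomenclatures, and a naive scan for the Thr-x-x-His pattern of block B, can be sketched as follows. This is a minimal illustration and not the hidden Markov model approach cited above: the names `BLOCK_TO_MOTIF`, `MOTIF_TO_BLOCK` and `find_txxh` are hypothetical, and treating "60-90 aa from the intein amino terminus" as a hard position window for the threonine is an assumption made only for this example.

```python
import re

# Block names (earlier system) mapped to the equivalent motif names listed in the text.
BLOCK_TO_MOTIF = {
    "A": "N1", "B": "N3", "C": "EN1", "D": "EN2",
    "E": "EN3", "H": "EN4", "F": "C2", "G": "C1",
}
MOTIF_TO_BLOCK = {motif: block for block, motif in BLOCK_TO_MOTIF.items()}

# Lookahead so that overlapping T-x-x-H occurrences are all reported.
TXXH = re.compile(r"(?=T..H)")


def find_txxh(intein: str, lo: int = 60, hi: int = 90) -> list[int]:
    """Return 1-based intein positions of the Thr in each Thr-x-x-His match.

    Assumption: only matches whose Thr lies between positions lo and hi
    (inclusive) are kept, mirroring the usual location of block B.
    """
    positions = [m.start() + 1 for m in TXXH.finditer(intein)]
    return [p for p in positions if lo <= p <= hi]
```

A plain pattern match like this will produce false positives and miss divergent inteins, which is why motif-aware methods are used in practice; it is shown only to make the position-based description concrete.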

The serine, cysteine and threonine at the carboxy-terminal side of both splice junctions are nucleophiles in the protein-splicing reaction. The third nucleophile is the intein carboxy-terminal asparagine. The two conserved histidines assist these nucleophiles. The remaining conserved residues either directly assist the chemical reactions or are important for folding of the intein. A pattern of residues with similar chemical functionalities instead of conservation of a particular amino acid is a hallmark of inteins. This may reflect the fact that many different amino acids can increase nucleophilicity and electrophilicity. However, each intein appears to have evolved a network of residues to facilitate the nucleophilic displacements mediated by the specific combination of nucleophiles present in that intein, since splicing is often reduced or blocked by conservative substitution of any of these three nucleophiles.
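As a rough illustration, the canonical splice-junction residues described in this section can be checked with a small helper. The function `check_junctions` and its result keys are hypothetical; the checks encode only the canonical pattern stated in the text (intein beginning with Cys or Ser, ending in His-Asn, and a Cys, Ser or Thr at +1), so a non-canonical intein would simply fail one or more of them.

```python
def check_junctions(intein: str, c_extein: str) -> dict[str, bool]:
    """Report which canonical junction residues are present.

    Hypothetical helper; assumes the intein has at least two residues and
    the C-extein at least one.
    """
    return {
        # Residue 1 of the intein: Cys or Ser (Thr has not been observed here).
        "intein_1_cys_or_ser": intein[0] in {"C", "S"},
        # Penultimate intein residue: the assisting His of the His-Asn dipeptide.
        "penultimate_his": intein[-2] == "H",
        # Last intein residue: the conserved Asn.
        "c_terminal_asn": intein[-1] == "N",
        # Residue +1 of the C-extein: Cys, Ser or Thr.
        "plus1_cys_ser_or_thr": c_extein[0] in {"C", "S", "T"},
    }
```

With the made-up sequences `check_junctions("CLSHN", "TDE")`, every entry is `True`; replacing the terminal Asn or the +1 residue flips the corresponding entry to `False`.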
