I-TevI as the Model GIY-YIG Enzyme: Structure and Function

I-TevI is the most extensively studied GIY-YIG endonuclease. It is encoded by a group I intron found in the thymidylate synthase (td) gene of bacteriophage T4. As described in other chapters in this volume, homing endonucleases are site-specific enzymes that recognize fairly lengthy DNA targets with some degree of sequence tolerance. Even by these standards, I-TevI is highly unusual: the 28-kDa protein recognizes a 37-bp target site as a monomer (Mueller et al. 1995), and it can tolerate mutations at any base pair in its substrate and still effect cleavage (Bryk et al. 1993).

The protein interacts with its td homing site by binding to the DNA at two sites that are separated by a 5-bp segment. The primary binding site is approximately 20 bp long and spans the intron insertion site (IS). The second site is about 12 bp long and includes the cleavage site (CS), which is 23-25 bp upstream of the IS (Fig. 1B). In addition to tolerating nucleotide substitutions, the enzyme can cope with larger perturbations in its recognition site. It can bind and cleave substrates that have large insertions or deletions between the IS and CS, in many cases reaching out or pulling back to cleave at the natural CS (Bryk et al. 1995).

This flexibility in recognition of a two-domain DNA target is achieved through the flexible organization of the two-domain protein. Limited proteolysis and mutagenesis experiments defined I-TevI as having two stable functional domains, an N-terminal catalytic domain and a C-terminal DNA-binding domain, separated by a proteolytically sensitive linker (Derbyshire et al. 1997; Fig. 1B). The isolated C-terminal domain binds to the homing site with the same affinity as does the full-length protein, suggesting that most, if not all, binding energy originates from this region of the protein. Although the isolated N-terminal domain is unable to bind to the homing site, the full-length enzyme has preferred cleavage sites and the catalytic domain must play a role in selecting these sites. In particular, the enzyme prefers sites with a G-C bp at position -23 relative to the IS and a C-G at position -27 (Bryk et al. 1995; Edgell et al. 2004a,b). This selectivity has important implications for I-TevI function (see Sect. 3.4).
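The stated cleavage-site preference can be written as a simple positional check. The following is a minimal sketch, not a published algorithm: the function name, the 0-based `is_index` convention, the absence of a position 0, and the reading of "G-C" and "C-G" as top-strand base first are all assumptions made for illustration.

```python
def has_preferred_context(top_strand: str, is_index: int) -> bool:
    """Check for C at -27 and G at -23 relative to the insertion site (IS).

    Assumptions (not from the source): `is_index` is the 0-based index of
    the first base downstream of the IS, position -1 is the base immediately
    upstream of the IS, and base pairs are read top-strand base first.
    """
    pos_27 = is_index - 27  # index of position -27
    pos_23 = is_index - 23  # index of position -23
    if pos_27 < 0 or is_index > len(top_strand):
        return False  # not enough upstream sequence to evaluate
    seq = top_strand.upper()
    return seq[pos_27] == "C" and seq[pos_23] == "G"


# Hypothetical 30-nt upstream region with C at -27 and G at -23.
bases = list("A" * 30)
bases[3] = "C"   # 30 - 27 = 3
bases[7] = "G"   # 30 - 23 = 7
example = "".join(bases)
print(has_preferred_context(example, 30))
```

A check of this kind captures only the two positions named in the text; it says nothing about the binding affinity contributed by the C-terminal domain.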

While the three-dimensional structure of full-length I-TevI bound to its substrate has not yet been reported, crystal structures of the catalytic domain (Van Roey et al. 2002) and of the DNA-binding domain in complex with the primary binding site (Van Roey et al. 2001) allow interpretation of the biochemical data to give a detailed image of the recognition properties of the enzyme.

3.1 Catalytic Domain

The crystal structures of two derivatives of the N-terminal catalytic domain of I-TevI, corresponding to the first 97 residues of the protein, have been determined (Van Roey et al. 2002). The two distinct catalytic mutants (R27A and E75A) adopt the same molecular conformation, and their structures allowed the construction of a composite wild-type molecule. The first 92 amino acids, corresponding precisely to the GIY-YIG module (Fig. 1A), form a globular domain with a unique fold that consists of a twisted, three-stranded, antiparallel β-sheet, flanked by two α-helices on one side, and by the third α-helix on the other (Fig. 2A). The GIY and YIG sequences are located on the first and second β-strands, respectively. The role of the GIY sequence appears to be strictly structural, as the segment is found at the core of the molecule, whereas the YIG segment (YVG, residues 17-19, in the case of I-TevI) is both structurally and functionally important. The side chain of the hydrophobic residue Val 18 is part of the hydrophobic core of the molecule, while Tyr 17 and Gly 19 are external and centrally located in a shallow concave surface of the molecule that is formed primarily by β-strand 2 and α-helices 1 and 3 (Fig. 2A). This surface also includes sections of all four of the other conserved motifs (B, C, D, and E) as well as the most important conserved residues Tyr 17, Gly 19, Arg 27, Glu 75, and Asn 90. Accordingly, this surface was proposed to be the site of substrate interaction. α-Helix 1, which forms one edge of this surface, is highly positively charged. The side chains of Arg 30, Arg 27, and His 31 stack together above the surface while the side chains of Lys 22, Lys 26, Lys 29, Lys 33, and Lys 37 align along the outer rim (Fig. 2A, B). This pattern of positively charged residues suggests a role for α-helix 1 in interaction with the DNA through phosphate backbone contacts.

Fig. 2. Structure of the catalytic domain of I-TevI. a Ribbon diagram showing secondary structure elements, conserved residues (red), and positively charged residues of α-helix 1, which is proposed to interact with the DNA substrate. b Electrostatic potential map of the substrate-interaction surface. α-Helix 1 is on the right with catalytic residues Glu 75 and Arg 27 in the center of the shallow concave surface. c Overlap of key catalytic residues of I-TevI (green) and I-PpoI (orange). The labels refer to the I-TevI residue numbers. The figure illustrates the coincidence of the divalent cation binding sites in the two enzymes (I-TevI, magenta; I-PpoI, cyan)

The importance of Arg 27 and Glu 75 in the catalytic mechanism has been established through mutagenesis studies (Derbyshire et al. 1997; Kowalski et al. 1999). Furthermore, the role of Glu 75 in a metal-ion-based catalytic mechanism was supported by structural data, from crystals soaked in MnCl2, that revealed the presence of the cation bound to this residue (Van Roey et al. 2002). Additionally, the location of the divalent cation, 3.8 Å from the Cα of Gly 19, clarifies the necessity of the Gly at this position, since any side chain would sterically hinder cation binding.

Comparisons with other nucleases revealed a level of structural correspondence between key residues of I-TevI and those of I-PpoI, an unrelated homing endonuclease in the His-Cys box family (Galburt et al. 1999). Like I-TevI, I-PpoI contains a single residue (Asn 119) that binds the divalent cation responsible for DNA cleavage and an arginine (Arg 61) that is important in the mechanism. A third residue of I-PpoI (His 98) is required for activation of a water molecule that serves as the nucleophile in the reaction mechanism. Superposition of Asn 119, Arg 61, and His 98 of I-PpoI onto Glu 75, Arg 27, and Tyr 17 of I-TevI, respectively, reveals strong similarity in the three-dimensional arrangement of these residues (Fig. 2C). In addition, Glu 75 of I-TevI and Asn 119 of I-PpoI are similarly located on helices at one edge of the molecular surface, and the Mn2+ site of I-TevI is nearly identical to the site of the metal ion in I-PpoI ternary complexes. This analysis suggests similar functions for the matched residues, especially Glu 75/Asn 119 and Tyr 17/His 98, in a related catalytic mechanism. However, it should be noted that I-PpoI effects double-stranded DNA cleavage as a dimer, with the two active sites each being responsible for cleavage of one strand. At this time it is unclear how I-TevI, as a monomer, is able to cleave both strands. Also, while there is coincidence of key catalytic residues between I-PpoI and I-TevI, the local fold of I-TevI does not correspond to the ββα-Me structural motif that is shared by the His-Cys box and HNH homing endonucleases. This divergence is consistent with the idea that the GIY-YIG enzymes form an independent family.

3.2 DNA-Binding Domain

The structure of the C-terminal DNA-binding domain of I-TevI, in complex with a 20-bp duplex DNA that contains the primary binding region of the homing site, has been determined (Van Roey et al. 2001). The protein used for these crystallization experiments consisted of amino acids 130-245, based on limited proteolysis experiments that defined this as a stable domain (Derbyshire et al. 1997). However, only the residues beyond position 149 were visible in the structure (Fig. 3A). This domain adopts an extended conformation, winding along the full length of the substrate and contacting the phosphate backbone throughout. Unlike the single-domain structure of the catalytic domain, the DNA-binding domain is composed of three subdomains that are connected by segments that lack defined secondary structure. The subdomains are a Zn-finger, a minor groove-binding α-helix, and an unusually small helix-turn-helix (H-T-H) domain. With the exception of the H-T-H domain, which places a helix in the major groove, the protein occupies the minor groove and does not greatly perturb the DNA conformation (Fig. 3B). The extended nature of the molecule helps to account for how a small enzyme recognizes such a long substrate.

Fig. 3. Structure of the DNA-binding domain of I-TevI in complex with its primary binding site. a Ribbon diagram showing arrangement of subdomains. b As in a, but viewed along the DNA axis. c Summary of protein-DNA contacts along the DNA. P Phosphate-backbone contacts; H hydrophobic contacts; B hydrogen-bonding contacts to the bases; IS intron insertion site

The H-T-H subdomain sits in the major groove but forms no hydrogen bonds with the bases. Instead, the domain makes extensive contacts with the phosphate backbone, and its second helix is inserted into the major groove, where it makes hydrophobic contacts with a series of thymine bases, thereby imparting considerable specificity to this sequence-tolerant enzyme (Fig. 3C). The α-helix inserted in the minor groove also has a hydrophobic surface contacting the bases, and it only forms hydrogen bonds with the phosphate backbone. Contacts between the Zn-finger and the DNA are limited to two hydrogen bonds with the phosphate backbone. The few base contacts that are actually made are with residues in the joining segments between the subdomains (Fig. 3C). These hydrogen-bonding contacts provide the only, albeit limited, specificity of the interaction between the protein and DNA, apart from the specificity resulting from the selectivity for an AT-rich region by the H-T-H domain. This is entirely consistent with the observed sequence tolerance of I-TevI. Clearly, many base substitutions can be tolerated because so few bases in the homing site are actually contacted directly. In addition, it appears that many contacts are redundant. Thus, the loss of one or two does not significantly affect the enzyme's ability to bind to its target.

It is perhaps not surprising that I-TevI contacts the DNA largely in the minor groove, given that T4 phage DNA is highly modified, containing glucosylated 5-hydroxymethylcytosine residues, with the bulky adduct occupying the major groove. Logically, the region of the DNA that is contacted by the H-T-H subdomain in the major groove is devoid of cytosines.

3.3 Flexible Linker and Distance Determination

The crystal structures define I-TevI as comprising a catalytic domain and a tripartite DNA-binding domain connected by a long linker. These data are also consistent with evidence that a large region in the center of the enzyme is sensitive to protease digestion and that deletions of two to five amino acids can be tolerated by the enzyme. This suggests that this portion of the molecule acts as a flexible spacer between the two functional domains (Kowalski et al. 1999). However, it is now clear that the Zn-finger and parts of the linker actually play a more complex role in I-TevI cleavage activity (Dean et al. 2002 and unpubl. results).

To determine the contribution of each subdomain of the DNA-binding domain to binding affinity, a number of derivatives were made by systematically removing segments of the protein (Dean et al. 2002). Deletion of the H-T-H domain at the C-terminus was immediately deleterious, but removal of residues from the N-terminus was well tolerated. Most remarkably, mutants missing the Zn-finger were able to bind to the homing site as well as (if not better than) the full domain, leading to the conclusion that the Zn-finger does not contribute significantly to DNA binding.

The Zn-finger of I-TevI is highly unusual in that it is extremely small and has a non-canonical CXC(X)10CX2C sequence. To date, no other GIY-YIG protein has been identified that contains a Zn-finger motif. Similar sequences have only been found in the β'-subunits of bacterial RNA polymerases, but those Zn-fingers are as yet structurally and functionally uncharacterized (Campbell et al. 2001). Interestingly, Zn-finger mutants in the context of full-length I-TevI were only minimally compromised in their ability to bind and cleave homing-site substrates (Dean et al. 2002). However, experiments carried out to map the exact CS for each protein on wild-type and mutant substrates, with insertions or deletions between the IS and CS, highlighted the function of the Zn-finger (Fig. 4A). The wild-type protein cleaves a wild-type substrate at the CS, 23-25 bp upstream of the IS. For substrates having modest insertions and deletions it can reach forward or pull back to find the natural cleavage sequence, but for larger insertions and deletions it prefers to cleave at the natural distance, rather than at the natural sequence. For the Zn-finger mutants, the picture is quite different. These proteins prefer to cleave all forms of the substrate at the natural sequence. Even a derivative in which the full Zn-finger (18 amino acids) is deleted reaches out to find the natural cleavage sequence in DNA substrates containing insertions.
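The non-canonical CXC(X)10CX2C pattern translates directly into a regular expression, which is one way such a motif might be searched for in other protein sequences. This is a minimal sketch: the function name and the example sequence are hypothetical, and the pattern treats X as any single residue.

```python
import re

# CXC(X)10CX2C: Cys, any residue, Cys, ten residues, Cys, two residues, Cys.
ZN_FINGER = re.compile(r"C.C.{10}C.{2}C")


def find_zn_finger_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ZN_FINGER.finditer(protein.upper())]


# Hypothetical sequence built to contain one match; it is not the I-TevI sequence.
example = "MK" + "CAC" + "G" * 10 + "C" + "AA" + "C" + "KL"
print(find_zn_finger_motifs(example))
```

The match spans 17 residues, from the first to the last cysteine of the pattern. A regular expression of this kind only flags candidate cysteine spacings and cannot establish that a segment binds zinc.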

These results indicate that the loss of the Zn-finger correlates with the loss of a distance determinant for cleavage. Two distinct models for this role of the Zn-finger, both based on protein-protein interactions with another part of the enzyme, have been proposed (Dean et al. 2002). In the first, the Zn-finger acts as a "catalytic clamp" that locks down the catalytic domain and reduces its ability to stray too far. In the second, it acts as an "organizer" for the extremely long linker so that, once again, the catalytic domain is constrained and thus cleaves at the appropriate distance. More recent data suggest that parts of the linker also contribute to distance determination. Consequently, a combination of the two models may be correct (Fig. 4B; Liu et al., unpubl.).

Fig. 4. The Zn-finger as a distance determinant. a Cartoon representation of I-TevI showing cleavage activity of wild-type and Zn-finger mutants (I-TevIΔZn) on wild-type and mutant homing-site substrates. b Model for Zn-finger function, showing putative protein-protein interactions among the Zn-finger, the linker, and the catalytic domain

3.4 I-TevI Endonuclease Is Bifunctional, Also Serving As a Transcriptional Autorepressor

Expression of I-TevI in phage is tightly regulated. I-TevI's host gene, thymidylate synthase, is expressed from a T4 middle promoter. I-TevI itself, however, is not expressed from that transcript, because its ribosome-binding site and start codon are sequestered in a stem-loop structure (Gott et al. 1988). This results in tight translational control, presumably to protect the host and phage from this somewhat promiscuous enzyme. I-TevI is expressed later in infection from a T4 late promoter without translational repression, because the stem-loop cannot be formed in that transcript.

There is sequence similarity between a region that overlaps the I-TevI promoter and the homing site of the enzyme (Fig. 5A). Notably, in vivo experiments have shown that I-TevI can repress its own expression, representing a second level of control, now at the transcriptional level (Edgell et al. 2004a). In vitro experiments subsequently showed that the DNA-binding domain of I-TevI binds to this operator sequence with the same affinity as to the homing site, even though there are six base-pair substitutions in the 20-bp primary recognition sequence. In contrast, I-TevI cleaves the operator site very poorly (approximately 100-fold less efficiently), because there is no sequence similarity upstream of the primary binding site. In particular, there is no sequence that corresponds to the CS. The low-level cleavage that does occur is at a site 14 bp upstream from the sequence analogous to the intron IS, where a preferred cleavage context is fortuitously located.

To determine how the protein can bind such a variant sequence, the crystal structure of the I-TevI DNA-binding domain bound to a 20-bp duplex corresponding to the operator sequence was solved (Fig. 5B). The overall structure of this complex is very similar to that of the homing-site complex, with a root-mean-square deviation (RMSD) for the Cα atoms of the protein of 0.67 Å. The DNA structure is largely unchanged, except for small differences in the regions where the DNA interacts with the extended regions between the subdomains.
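For readers unfamiliar with the RMSD figure quoted above, the quantity is the square root of the mean squared distance between matched atoms. The following minimal sketch assumes the two coordinate sets are already superimposed and listed in the same atom order; the function name and the toy coordinates are illustrative, and this is not the procedure used to obtain the published value.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, in input units.

    Assumes the structures are already superimposed and atoms are paired
    by list position (for example, one C-alpha atom per residue).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Toy coordinates: every atom is displaced by 1 unit along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))
```

In practice an optimal superposition step precedes this calculation, which is omitted here for brevity.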

Fig. 5. a Cartoon of I-TevI (red) binding to its homing site and to its promoter site. Sequence identities between the two are shaded in gray. b Superposition of I-TevI DNA-binding domain complexes with homing-site (red) and operator-site (green) DNA duplexes

In general, nucleotide substitutions are accommodated by a small number of changes in the conformation of amino acid side chains. These alter a few of the contacts to the bases in the unstructured regions and yield alternate hydrogen-bond interactions.

I-TevI has a great deal of flexibility in its ability to interact with DNA. This flexibility likely contributes to the enzyme's ability to home to new sites. In addition, because of its defined sequence preferences at the cleavage site, I-TevI can function as a repressor, binding to its operator site to control expression, but with a 100-fold reduction in cleavage. This endows the enzyme with a second distinct biological function (Fig. 5A).
