Mechanisms of DNA Target Site Recognition and Specificity

As catalysts of the genetic mobility of introns and inteins, LAGLIDADG homing endonucleases (as well as other enzymes with the same biological function) must balance two somewhat contradictory requirements. They need to be highly sequence-specific in order to promote precise intron transfer in their host genomes, which are most often chloroplast or mitochondrial genomes, and yet they must retain sufficient flexibility in site recognition to allow successful lateral transfer in the face of sequence variation in genetically divergent hosts. LAGLIDADG homing endonucleases appear to reconcile these requirements by using a flexible homing site recognition strategy in which a well-defined, but limited, number of individual polymorphisms are tolerated by the enzyme without significant loss of binding affinity or cleavage efficiency. The biochemical basis of this strategy is the formation of phased, undersaturating DNA-protein contacts across long DNA target sites (Moure et al. 2002, 2003; Chevalier et al. 2003). The length of the interface provides overall high specificity, while the broadly distributed set of phased, subsaturating contacts across the interface facilitates the recognition and accommodation of specific polymorphisms at individual target site positions (Fig. 3). The overall specificity of the LAGLIDADG endonucleases is not well established, but is generally thought to range from 1 in 10^8 to 1 in 10^9 random sequences for an average site length of 20-22 base pairs (Chevalier et al. 2003).
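To put the quoted specificity range in perspective, it can be converted into bits and compared with the ceiling for a fully specified site. The following is a minimal sketch, not a calculation taken from the cited work: it assumes equiprobable bases (2 bits per fully specified base pair), and the function names are hypothetical.

```python
import math


def specificity_bits(one_in_n):
    """Bits needed to single out one site among `one_in_n` random sequences."""
    return math.log2(one_in_n)


def max_bits(site_length_bp):
    """Upper bound, assuming equiprobable bases: 2 bits per specified base pair."""
    return 2.0 * site_length_bp


# Specificity range quoted in the text (1 in 10^8 to 1 in 10^9).
low = specificity_bits(1e8)
high = specificity_bits(1e9)

# Ceilings for the quoted site lengths of 20 and 22 base pairs.
ceiling_20 = max_bits(20)
ceiling_22 = max_bits(22)

print(f"quoted specificity: {low:.1f} to {high:.1f} bits")
print(f"fully specified site: {ceiling_20:.0f} to {ceiling_22:.0f} bits")
```

Under these assumptions the quoted range corresponds to roughly 27-30 bits, well below the 40-44 bits of a fully specified 20-22 base pair site, which is the pattern expected if recognition is undersaturating.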

In the protein-DNA interfaces visualized at high resolution (2.5-1.9 Å) for the LAGLIDADG family (I-CreI, I-MsoI, I-SceI and H-DreI), a set of four antiparallel β-strands in each enzyme domain provides direct and water-mediated contacts between residue side chains and nucleotide atoms in the major groove of each DNA half-site (Fig. 3). These contacts extend from base pairs ±3 to base pairs ±11; the central four base pairs from -2 to +2, which are flanked by the scissile phosphate groups, are not in contact with the protein. Typically, strands β1 and β2 extend the entire length of this interface in each half-site, while strands β3 and β4 provide additional contacts to base pairs ±3, ±4 and ±5 in each complex. The LAGLIDADG endonucleases typically contact approximately 65-75% of the possible hydrogen-bond donors and acceptors of the base pairs in the major groove, make few or no additional contacts in the minor groove, and also contact approximately one-third of the backbone phosphate groups across the homing site sequence. These contacts are split evenly between direct and water-mediated interactions. A schematic of the contacts formed by I-CreI with its pseudo-palindromic target site is shown in Fig. 3.

In the structures listed above, the DNA target is gradually bent around the endonuclease binding surface, giving an overall curvature across the entire length of the site of approximately 45°. In the homodimeric enzyme-DNA complexes with I-CreI and I-MsoI, the DNA is locally overwound between bases -3 and +3 (twist rising to approximately 50°), with a corresponding deformation in the base pair propeller twist and buckle angles for those same bases, leading to a narrowing of the minor groove at the site of DNA cleavage. The bending of the DNA is symmetric (Jurica et al. 1998). In the DNA complexes with the monomeric enzymes, the central four base pairs of the cleavage sites generally display negative roll values, which translate into a similar narrowing of the minor groove. As a result, in all of these structures, the scissile phosphates are positioned approximately 5-8 Å apart and are located near bound metal ions in the active sites.

Fig. 3. Structural mechanism of DNA recognition by LAGLIDADG enzymes. A Structure of the β-sheet from a subunit of I-CreI in complex with its corresponding DNA target half-site. Note that every other side chain from a β-strand is pointed into the DNA major groove, and that the residues from adjacent β-strands are staggered in their positions to permit contact to several sequential bases. B Schematic of all observed contacts (both direct and water-mediated) between the I-CreI subunits and both DNA target half-sites, which differ in sequence at several positions (the full-length site is a pseudo-palindrome). The blue circles represent ordered water molecules. Indentations on bases represent H-bond acceptor groups; bulges on the bases represent H-bond donors. Red lines are direct contacts, blue lines are water-mediated, and green lines are contacts to backbone atoms of the DNA. Dashed lines represent 'double indirect' contacts to bases via two sequential bridging water molecules

The distributions of related target site sequences that are recognized and cleaved by individual LAGLIDADG enzymes have been described using a variety of site preference screens (Argast et al. 1998; Gimble et al. 2003). In those experiments, target site variants that are recognized by the native enzyme are recovered from a randomized homing site library and sequenced. Using these data, the information content (specificity) at each base pair of the target site can be calculated using a computational method that accounts for the probability of each possible base being found at each position across the site (Schneider et al. 1986). The determination of crystallographic structures of the corresponding enzyme-DNA complexes, with the explicit visualization of direct and water-mediated contacts, allows the number and type of intermolecular contacts made to each base pair to be correlated with the information content at that position. Three general conclusions from these analyses are: (i) the specificity of base pair recognition in structurally unperturbed DNA sequence is proportional to the number of H-bond contacts to each base pair; (ii) the degree of specificity is not significantly attenuated by the use of solvent molecules as chemical bridges between nucleotide atoms and protein side chains; and (iii) information content is increased at individual base pairs, particularly near the center of the cleavage site, by indirect recognition of DNA conformational preferences.
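The per-position information content mentioned above can be illustrated with a short calculation. The sketch below follows the general Shannon-entropy form used for binding-site analysis (2 bits minus the observed entropy at each position) and is not the exact procedure of the cited screens. It assumes equiprobable background bases, the small-sample term is a common approximation, and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def information_content(sites, small_sample_correction=True):
    """Return per-position information content (bits) for aligned DNA sites.

    Assumes equiprobable background bases, so the maximum is 2 bits.
    """
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    n = len(sites)
    # Approximate small-sample correction: (4 - 1) / (2 * ln 2 * n).
    e_n = 3.0 / (2.0 * math.log(2) * n) if small_sample_correction else 0.0

    bits = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sites)
        # Shannon entropy of the observed base frequencies at this position.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - (entropy + e_n))
    return bits


# Hypothetical toy alignment of recovered sites (not real screen data).
toy_sites = ["ACGT", "ACGA", "ACCT", "ACTG"]
per_position = information_content(toy_sites, small_sample_correction=False)
print(per_position)
```

In this toy alignment the two invariant positions score the full 2 bits, while positions that tolerate several bases score lower. This is the kind of per-position value that can then be compared with the number of contacts observed in a crystal structure.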
