Databases: Genome, mRNA, and Protein Sequences
Monitoring the avalanche of primary sequence data entering archives such as GenBank and EMBL is a daunting task; however, the biological community is increasingly well prepared with secondary databases that extract and annotate subsets from the primary data. The RefSeq resource maintains a nonredundant set of human mRNAs, which are mapped onto genomic sequence via LocusLink or the UCSC genome browser (Table I). A nonredundant catalog of human proteins is maintained in the SwissProt/TrEMBL protein database (Table I, Swall) or in the RefSeq database of mRNA translation products. In addition, protein families and conserved domains (including amino acid sequence alignments and consensus motifs) are found in the InterPro database (Table I), which provides automated annotation of all SwissProt/TrEMBL proteins, including the human proteome (i.e., the gene product inventory of the human genome) [10,11]. Although the human proteome set is some way from completion, the Ensembl project (Table I) provides a source of homology-based predicted gene products based on the current human genome assembly. On the European Bioinformatics Institute's website (Table I, EBI), the current predicted protein index stood at 29,076 sequences as of May 2002. In contrast, the nonredundant mRNA total in RefSeq now stands at 15,846. Under a simple one-gene, one-mRNA, one-protein assumption, this means that approximately half (15,846 of 29,076, about 54%) of the human genes are already represented as full-length coding sequences.
Establishment of a nonredundant collection of amino acid sequences is a key first step for a comprehensive bioinformatics analysis of any protein family, and a simple text search (Table I, SRS) in the above protein databases retrieves the majority of human PTPs (e.g., the query "tyrosine" and "phosphatase" retrieves 28 human PTPs, including links to their cDNAs). The conserved catalytic domains of these well-characterized cDNAs were next used in a BLAST search to retrieve the entire set of PTP-like sequences deposited in GenBank. Initially, more than 3500 database hits were discovered. Exclusion of expressed sequence tags (ESTs) and high-throughput genome (HTG) sequences reduced this number to 254 sequences, and, after further exclusion of PTP splice variants, partially overlapping clones, and duplicate entries, a total of 113 distinct vertebrate PTP catalytic domains were identified, 37 of which were human. A database of full-length proteins corresponding to this larger set of unique human PTP cDNAs forms the basis for our bioinformatics analyses (Fig. 2). In addition, because most PTPs deposited in GenBank have been identified through low-stringency hybridization and PCR-based cloning techniques, the above review of all PTP accession numbers allowed assignment of synonyms to PTP sequences that were characterized in independent studies and consequently given different names in the literature.
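The filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, its field names, the division labels, and the accession values are all hypothetical, and the sketch removes only exact duplicates. Splice variants and partially overlapping clones need sequence comparison and manual review, which are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str    # placeholder accession, not a real GenBank record
    division: str     # assumed labels: "EST", "HTG", or anything else
    domain_seq: str   # catalytic-domain amino acid sequence of the hit


def filter_hits(hits):
    """Drop EST/HTG records, then keep one hit per distinct domain sequence."""
    kept = []
    seen = set()
    for hit in hits:
        if hit.division in {"EST", "HTG"}:
            continue  # exclude ESTs and high-throughput genome sequences
        if hit.domain_seq in seen:
            continue  # exact duplicate of a domain that was already kept
        seen.add(hit.domain_seq)
        kept.append(hit)
    return kept


# Toy input: five hypothetical hits carrying two distinct domain sequences.
hits = [
    Hit("X0001", "PRI", "VHCSAGIGRSG"),
    Hit("X0002", "EST", "VHCSAGIGRSG"),
    Hit("X0003", "PRI", "VHCSAGIGRSG"),  # duplicate entry
    Hit("X0004", "HTG", "VHCSAGVGRTG"),
    Hit("X0005", "PRI", "VHCSAGVGRTG"),
]
unique = filter_hits(hits)
print([h.accession for h in unique])
```

Keeping the first occurrence of each sequence makes the result depend on input order, so a real pipeline would probably sort hits by annotation quality first.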
Primary Sequence Alignment: Classification of PTPs and Phylogenetic Trees
Multiple sequence alignments (generated with ClustalW) allow assessment of the degree of residue conservation among homologous proteins and hence provide a powerful basis for their classification. For the classic PTPs, a phylogenetic analysis of 113 aligned vertebrate PTP-domain amino acid sequences (available at http://science.novonordisk.com/ptp/ or http://ptp.cshl.edu) reveals that this group of enzymes can be divided into 17 subtypes: 9 nontransmembrane subtypes (NT-PTPs or intracellular PTPs) and 8 receptor-like PTP (RPTP) subtypes (Fig. 1).
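As a rough illustration of how residue conservation can be read from a multiple sequence alignment, the sketch below scores each column by the fraction of sequences that share its most common residue. The scoring scheme and the toy alignment are assumptions made for this example; they are not the method or the data behind the published classification.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per column, the fraction of sequences sharing the top residue.

    Gap characters ('-') are never counted as the shared residue, but they
    still count toward the number of sequences in the denominator.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment of four made-up five-residue fragments.
toy = ["HCSAG", "HCSAG", "HCS-G", "HCTAG"]
scores = column_conservation(toy)
print(scores)
```

Columns scoring 1.0 are invariant in the toy set; lower scores mark positions with substitutions or gaps.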
The nine intracellular PTP subtypes identified by the phylogenetic analysis correlate well with a classification based on regulatory or targeting domains residing outside the catalytic domains (Fig. 1): NT1: PTP1B and TC-PTP; NT2: SHP-1 and SHP-2; NT3: MEG2; NT4: PEST, LyPTP, BDP1; NT5: MEG1 and PTPH1; NT6: PTPD1 and PTPD2; and NT7: PTPBAS. Four of these subtypes each consist of only one enzyme: NT3: MEG2; NT7: PTPBAS; NT8: PTPTyp; and NT9: HDPTP.
For tandem-domain RPTPs, the membrane-proximal PTP domains (D1 sequences) cluster into one major trunk of the phylogenetic tree, whereas the single-domain RPTPs define three distinct subtypes (R3, R7, and R8). When including 49 membrane-distal PTP domains (D2 sequences) from vertebrate tandem-domain PTPs in the analysis, these sequences define a separate subfamily that is phylogenetically distinct from the subfamilies defined by the PTP catalytic D1 domains, thus indicating structural and perhaps functional conservation among the D2 sequences.
Of note, one RPTP subtype contains both receptor-like and nontransmembrane PTPs (R7), and two RPTP genes also encode cytoplasmic variants as a result of alternative splicing (GLEPP1; mouse RPTPφ) or alternative promoter usage (PTPε). Except for these discrepancies, the phylogenetic classification based on PTP domain sequence homology is in overall accordance with previous topological classifications based on the presence or absence of an extracellular region.
Conserved Regions and Residues: A Three-Dimensional Comparison
Already during the first years of PTP research, when only a few different PTP cDNAs had been isolated, it became apparent that certain motifs were conserved. At that time, neither the structural nor functional significance of these motifs was known. However, they served as useful priming sites for identification of novel PTPs using degenerate oligonucleotide primers and helped significantly in advancing the field in a timely fashion. Equally impressive has been the speed of determining the structures of different PTPs. Thus, following the seminal study on the structure of PTP1B, X-ray protein crystallographic structures are currently available for nine different PTP catalytic domains: PTP1B, TC-PTP, SHP-1, SHP-2, PTPμ, PTPα, PTP-LAR, PTP-SL, and Yersinia PTP (Table I, Entrez). Moreover, a remarkable number of studies have reported PTPs in complex with various peptide substrates and inhibitors (see below). We and others have superimposed these PTP domains and found a striking conservation of the tertiary structure. This structural conservation allows the combination of primary sequence analysis and low-resolution homology modeling and thus identification of conserved regions and residues at the three-dimensional level. The catalytic domains of PTPs consist of about 280 residues arranged as a three-layer α-β-α core domain with a central β-sheet sandwiched between α-helices (see Chapter 108).
Primary sequence alignment of the catalytic domains of PTPs reveals 10 discrete, highly conserved motifs (M1-10, detailed at http://science.novonordisk.com/ptp/) and 7 single conserved residues (Glu19, Glu115, Arg156, Arg169, Leu192, Arg254, and Arg257; hPTP1B numbering, which is used throughout this chapter). Several of these motifs play critical roles in maintaining the stability of the PTP domain (e.g., extensive hydrophobic packing is observed for motifs M3-M7), while other motifs and conserved single residues are essential for catalysis. The most highly conserved area within the PTP tertiary structure is defined by the PTP signature motif, HisCysSerXxxGlyXxxGlyArg[Thr/Ser]Gly (M9), and the structural motif [Phe/Tyr]IleAlaXxxGlnGlyPro (M4).
While the molecular mechanisms underlying PTP-mediated catalysis are treated in detail elsewhere (see Chapter 108), the primary sequence and three-dimensional analyses allow a first glimpse into the intricate catalytic machinery. The PTP signature motif ValHisCysSerXxxGlyXxxGlyArg[Thr/Ser]Gly (residues 213-223 in PTP1B) defines the PTP family and represents one of Nature's elegant designs of a highly efficient binding pocket for phosphate. The PTP motif, also called the P-loop or PTP-loop, forms a half-circle with the main chain nitrogens pointing toward the cysteine (Cys215 in PTP1B), which is positioned almost in the center. At physiological pH, this cysteine residue is deprotonated and acts as a nucleophile accepting phosphate transiently during catalysis. Two of the highly conserved single residues help stabilize the PTP-loop (Glu115 and Arg257), which is positioned at the bottom of an approximately 9-Å-deep pocket (i.e., corresponding to the length of the side chain of phosphotyrosine, pTyr, but not the shorter side chains of phosphoserine and phosphothreonine). The phenyl ring of the pTyr substrate interacts with the aromatic Phe182 and Tyr46 residues and the hydrophobic residues Val49, Ala217, and Ile219. Upon binding of pTyr substrates, a major conformational change takes place that moves the WPD loop to close the active site pocket, literally trapping the substrate [18-20]. The WPD loop closure brings Asp181 close to the scissile oxygen of pTyr, where it is in a favorable position to function as a general acid during the first step of catalysis. In the second step, the highly conserved Gln262 positions a catalytic water molecule (for hydrolysis), thereby releasing phosphate from Cys215.
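The 11-residue signature motif quoted above translates directly into a regular expression in one-letter amino acid code, with each Xxx position matching any residue. The sketch below is a minimal illustration: the helper name `find_signature`, the `offset` parameter, and the test fragment with its placeholder flanking residues are assumptions made for this example, not part of the original analysis.

```python
import re

# Val-His-Cys-Ser-Xxx-Gly-Xxx-Gly-Arg-[Thr/Ser]-Gly in one-letter code.
PTP_SIGNATURE = re.compile(r"VHCS.G.GR[TS]G")


def find_signature(seq, offset=1):
    """Return (start, end, matched_text) for each non-overlapping motif hit.

    Positions are inclusive; `offset` is the residue number assigned to the
    first character of `seq`, so a fragment can keep full-length numbering.
    """
    return [
        (m.start() + offset, m.end() + offset - 1, m.group())
        for m in PTP_SIGNATURE.finditer(seq)
    ]


# Illustrative fragment: three placeholder residues, an 11-residue stretch
# that fits the motif, and three more placeholders. Numbering the first
# character as residue 210 puts the match at positions 213-223.
fragment = "AAAVHCSAGIGRSGTTT"
hits = find_signature(fragment, offset=210)
print(hits)
```

With this numbering the third residue of the match falls at position 215, the position of the catalytic cysteine in the hPTP1B numbering used in this chapter.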
Conserved Surface-Exposed Areas in Tandem PTPs: The D1/D2 Interface
The invariant residues in domain D1, which show considerable substitution in the D2 domains, are positioned close to the active site. In some RPTPs (e.g., PTPα, PTPε, LAR) only two point mutations in D2 domains are required to restore robust catalytic activity against conventional PTP substrates, whereas critical substitutions present in D2 domains of other RPTPs (e.g., CD45, PTPζ, PTPγ) indicate that these domains are truly inactive. While low-resolution homology modeling (so-called Cα-regiovariation score analysis) of the catalytic domains of intracellular PTPs and the membrane-proximal D1 domains of RPTPs shows that the conserved residues converge around the active site, much greater variation is observed in the vicinity of the putative active sites in the D2 domains of receptor-type PTPs. Thus, this analysis supports the notion that most of the catalytic activity is found in proximal D1 domains in tandem RPTPs [21,22], whereas the membrane-distal D2 domains, at least for some RPTPs, seem to play regulatory roles, as has been demonstrated for CD45. The D2 domains could act as phosphotyrosine recognition units, similar to Src homology 2 (SH2) and phosphotyrosine binding (PTB) domains.
So far, the crystal structure of tandem PTP domains has only been reported for PTP-LAR. The relative orientation of the LAR D1 and D2 domains, constrained by a short linker, is stabilized by extensive interdomain interactions. In the present context, it seems significant that the Cα-regiovariation analysis has identified conserved areas on both the D1 and D2 domains that correspond to the interaction area in LAR. In addition, the sequences corresponding to the linker sequence in LAR were found to be conserved in the D2 domains, but not in the D1 domain. Thus, it seems likely that the overall structure of receptor-like PTPs is well represented by the X-ray structure of PTP-LAR.
Nonconserved Residues in the Vicinity of the Active Site: Implications for a Bioinformatics Approach to Structure-Based Drug Design
At this point, the prototype PTP1B has in particular attracted the attention of the pharmaceutical industry. Mice in which the PTP1B gene has been removed (i.e., knocked out) show increased insulin sensitivity and resistance to diet-induced obesity [26,27]. Hence, inhibitors of PTP1B could potentially be useful for the treatment of type 2 diabetes and obesity.
The highly conserved structure of the PTPs and the consequential apparent difficulties related to developing selective active site-directed inhibitors initially discouraged the pharmaceutical industry from considering this group of enzymes as useful drug targets. A similar myth kept the industry away from the kinase field as it was considered impossible to develop kinase inhibitors with the requisite specificity. However, basic research on protein kinases was leading the way for the PTP field, and applied pharmaceutical research has shown that even subtle differences or combinations of differences can be utilized in structure-based design of highly selective inhibitors that bind to the conserved ATP binding pockets in kinases. The PTP inhibitor field is now rapidly catching up, and both academic and industrial laboratories have convincingly demonstrated that selective, active site-directed PTP inhibitors can indeed be made [29-31]. In our laboratory, we used the above Cα-regiovariation score analysis to identify residues or combinations of residues in the vicinity of the active site that would uniquely identify a particular PTP and thus could potentially be used for structure-based design of selective inhibitors. Because the intention is to develop orally active inhibitors, a number of compound characteristics must be taken into account. In particular, poor absorption and cell permeation are often observed when the "rule of five" is violated, for example by exceeding a molecular weight of 500. Therefore, to allow design of low-molecular-weight, active site-directed inhibitors, it is a requirement that selectivity be achieved by addressing residues in the vicinity of the active site.
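The "rule of five" constraint can be expressed as a simple property check. The text above names only the molecular weight limit of 500; the other three thresholds in this sketch (at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, logP not above 5) come from the commonly cited general formulation of the rule and are an addition here. The `Compound` record and the two example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(compound):
    """List which rule-of-five criteria a compound exceeds."""
    violations = []
    if compound.mol_weight > 500:
        violations.append("molecular weight > 500")
    if compound.log_p > 5:
        violations.append("logP > 5")
    if compound.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if compound.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


# Two made-up compounds, used only to exercise the check.
small = Compound("hypothetical_A", 350.0, 2.1, 2, 5)
large = Compound("hypothetical_B", 620.0, 5.6, 3, 11)
print(rule_of_five_violations(small))
print(rule_of_five_violations(large))
```

A molecular weight budget of 500 leaves little room for groups that reach far from the catalytic pocket, which is why selectivity has to come from residues close to the active site.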
The above low-resolution homology modeling approach revealed that four residues (47, 48, 258, and 259) were especially important for the design of selective PTP inhibitors. None of these residues is unique for one specific PTP, but the combination of these four residues constitutes a selectivity-determining region, a signature motif. Even closely related members within one PTP subfamily often differ in this region (e.g., PTPα and PTPε). We and others have used this selectivity-determining region for structure-based design of selective PTP1B inhibitors [33-36] (see Chapter 111 for further details).
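One way to read such a four-residue signature from a multiple sequence alignment is to map reference residue numbers onto alignment columns and then pick those columns out of every aligned sequence. The sketch below assumes a gapped reference row numbered like hPTP1B; the function names and the toy alignment are hypothetical, and because the toy sequences are only eight residues long, positions 2, 3, 6, and 7 stand in for residues 47, 48, 258, and 259.

```python
def reference_columns(aligned_ref, positions, first_residue=1):
    """Map reference residue numbers to 0-based alignment column indices."""
    wanted = set(positions)
    columns = {}
    number = first_residue - 1
    for col, residue in enumerate(aligned_ref):
        if residue == "-":
            continue  # gaps in the reference do not consume a residue number
        number += 1
        if number in wanted:
            columns[number] = col
    missing = wanted - set(columns)
    if missing:
        raise KeyError(f"positions not covered by reference: {sorted(missing)}")
    return columns


def signature(aligned_seq, columns, positions):
    """Concatenate the residues found at the requested reference positions."""
    return "".join(aligned_seq[columns[p]] for p in positions)


# Toy alignment with made-up sequences; names are placeholders.
alignment = {
    "reference": "ARD-LLMGK",
    "enzyme_1":  "ARDQLLMGK",
    "enzyme_2":  "AVN-LLCQK",
}
positions = (2, 3, 6, 7)  # stand-ins for residues 47, 48, 258, and 259
cols = reference_columns(alignment["reference"], positions)
signatures = {name: signature(seq, cols, positions) for name, seq in alignment.items()}
print(signatures)
```

Sequences that share the reference signature are the ones a selective inhibitor would have most trouble distinguishing, so they are natural candidates for closer inspection.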
From a drug discovery point of view, bioinformatics analyses can contribute significantly to avoiding problems due to lack of specificity which otherwise might show up late in a development phase as adverse or toxicological effects. Thus, when developing selective PTP inhibitors it is essential to have complete structural knowledge of all members of this enzyme family. As indicated, the conserved three-dimensional fold of PTPs allows relatively accurate structural comparisons of catalytic domains, even of PTPs for which no X-ray structures have yet been obtained. As an example, we have used Asp48 as an important interaction point (for salt-bridge formation) to develop selective PTP1B inhibitors. Using combined structural and genomic analyses, which we have termed structural bioinformatics, we have identified all PTPs with an aspartic acid in the equivalent position. By introducing these PTPs as counter screens at an early stage in preclinical development of PTP1B inhibitors, we expect to avoid selectivity problems within the PTP family. In other words, complete mapping and analyses of all PTPs (and all other potential drug target families) are a must in the postgenomic era.