Identification of the Genomic Complement of PTPs

A. Chromosomal Localization of PTP Genes

With access to the human genome assembly (currently "Build" 28 at the National Institutes of Health), it is straightforward to identify the chromosomal localization of the 37 known human PTPs using LocusLink or the BLAST search platform in the UCSC genome browser (Table II). Importantly, the chromosomal localization could be mapped for all published PTPs, consistent with the estimated 95% coverage of the human genome. This chromosomal assignment is important for resolving whether homologous proteins are splice variants derived from the same gene, recently duplicated genes, or pseudogenes. In addition, the current refinement of PTP chromosomal localization allows for disease association studies.

Mapping of Intron/Exon Structures: Alignment of mRNA with Genomic Sequences

To visualize PTPs in a genomic context, mRNA or protein sequences were fine-mapped (i.e., unequivocally aligned at the exon level) onto the assembled genome via LocusLink, Ensembl, or the UCSC genome browser (Fig. 2). For cases where PTP genomic sequences are not present in the genome assembly but are found only in the raw sequences from the sequencing centers, NCBI provides a convenient tool for aligning mRNA or EST sequences to a genomic sequence (Table I, Spidey).

To distinguish between potentially novel PTPs and PTP pseudogenes in the human genome, knowledge of the exon/intron structures for the known PTPs is essential. In addition, the genomic organization of a particular PTP is important when analyzing alternative splicing events or promoter elements. Finally, our mapping of PTP exon/intron structure revealed that the above classification of the PTP protein family into 17 PTP subtypes, based exclusively on catalytic domain amino acid sequence homology, is consistent with a classification based on PTP gene structure.

Table II Genomic Annotation of PTPs [a]

| PTP name | Chromosome | Gene symbol | OMIM ID |
|---|---|---|---|
| LyPTP | 1p13.2 | PTPN22 | 606986 |
| LAR | 1p34.2 | PTPRF | 179590 |
| PTPlambda | 1p35.2 | PTPRU | 602454 |
| CD45 | 1q31.3 | PTPRC | 151460 |
| HePTP | 1q32.1 | PTPN7 | 176889 |
| OST-PTP | 1q32.1 | N.A. | N.A. |
| PTPD2 | 1q32.3 | PTPN14 | 603155 |
| MEG1 | 2q14.2 | PTPN4 | 176878 |
| BDP1 | 2p21.1 | PTPN18 | 606587 |
| IA2 | 2q35 | PTPRN | 601773 |
| PTPgamma | 3p14.2 | PTPRG | 176886 |
| HDPTP | 3p21.31 | PTPN23 | 606584 |
| PTPBAS | 4q21.3 | PTPN13 | 600267 |
| PTPkappa | 6q22.33 | PTPRK | 602545 |
| PEST | 7q11.23 | PTPN12 | 600079 |
| PTPzeta | 7q31.31 | PTPRZ1 | 176891 |
| IA2beta | 7q36.3 | PTPRN2 | 601698 |
| PTPdelta | 9p24.1 | PTPRD | 601598 |
| PTPH1 | 9q31.3 | PTPN3 | 176877 |
| PTPTyp | 10q11.22 | PTPN20 [b] | N.A. |
| PTPepsilon | 10q26.2 | PTPRE | 600926 |
| DEP1 | 11p11.2 | PTPRJ | 600925 |
| STEP | 11p15.1 | PTPN5 | 176879 |
| GLEPP1 | 12p12.3 | PTPRO | 600579 |
| SHP1 | 12p13.31 | PTPN6 | 176883 |
| PCPTP1 | 12q15 | PTPRR | 602853 |
| PTPbeta | 12q15 | PTPRB | 176882 |
| PTPS31 | 12q21.31 | PTPGMC1 [b] | 603317 |
| SHP2 | 12q24.13 | PTPN11 | 176876 |
| PTPD1 | 14q31.3-q32.11 | PTPN21 | 603271 |
| MEG2 | 15q24.2 | PTPN9 | 600768 |
| TCPTP | 18p11.21 | PTPN2 | 176887 |
| PTPmu | 18p11.23 | PTPRM | 176888 |
| PTPsigma | 19p13.3 | PTPRS | 601576 |
| SAP1 | 19q13.42 | PTPRH | 602510 |
| PTPalpha | 20p13 | PTPRA | 176884 |
| PTPrho | 20q12-q13.11 | PTPRT | N.A. |
| PTP1B | 20q13.13 | PTPN1 | 176885 |

[a] This table is based on "Build" 31 and thus contains the most recent information available.

[b] Not a HUGO-approved name.

Searching the Human Genome for PTP-Like Sequences

The collection of unique full-length human PTP protein sequences formed the basis for mining the human genome for novel PTPs and pseudogenes. A strategy for a BLAST-based homology search [37] with PTP D1 and D2 domain sequences as the origin is shown schematically in Fig. 2. To improve specificity and reduce noise, the conserved PTP domain sequences, rather than the full-length sequences, were used as the origin in this search. Using the tblastn algorithm (part of the BLAST package) with default settings, D1 and D2 sequences from each of the 37 human protein sequences were searched against the six-frame translation of the human genome (raw sequences from the sequencing centers; see Table I, Human Genome Centers). Each search produced a results file with a substantial number of sometimes overlapping hits to various entries in the human genome database (HGraw). A nonredundant list of 295 genome entries was compiled from these files, and each entry was examined by hand for PTP catalytic domains using a number of iterative searches and predictions, including searches against the full-length protein sequences, hswall, and the EST database. In addition, selected genome sequences were analyzed with NetGene2 to predict splice sites, and putative PTP motifs were identified with the fuzztran program from the EMBOSS package.

To allow quick navigation in the genome entries and the associated search results files, everything was organized in a web environment (see the Novo Nordisk website, NN-PTP; Table I). To visualize the genome entries and their associated features (e.g., homology to full-length protein sequences, areas with EST hits, motifs), the results files were parsed to produce feature annotation in GFF format. Artemis provides a user-friendly graphical viewer of genome entries that can be loaded with the associated GFF files.
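The parsing step described above (collapsing overlapping hits into a nonredundant list of genome entries and writing feature annotation in GFF format) might look roughly like the following sketch. It assumes the tblastn hits were exported in the standard 12-column tabular BLAST layout; the accession numbers, query names, the `protein_match` feature label, and the `query=` attribute are hypothetical and are not taken from the original analysis.

```python
import csv
import io

# Column order of tabular BLAST output (an assumption about how hits were exported).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Yield one dict per tabular hit line, skipping blank and comment lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        hit["sstart"] = int(hit["sstart"])
        hit["send"] = int(hit["send"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def nonredundant_entries(hits):
    """Return genome entry IDs in first-seen order, each listed once."""
    return list(dict.fromkeys(hit["sseqid"] for hit in hits))


def to_gff(hit, source="tblastn", feature="protein_match"):
    """Format one hit as a nine-column, tab-separated GFF line."""
    # tblastn reports sstart > send for hits on the reverse strand,
    # whereas GFF requires start <= end plus an explicit strand column.
    start, end = sorted((hit["sstart"], hit["send"]))
    strand = "+" if hit["sstart"] <= hit["send"] else "-"
    return "\t".join([hit["sseqid"], source, feature, str(start), str(end),
                      f"{hit['bitscore']:g}", strand, ".",
                      f"query={hit['qseqid']}"])


# Hypothetical hits: two queries overlap on one genome entry, one hit is on the minus strand.
example = (
    "PTP1B_D1\tAL000001\t98.5\t200\t3\t0\t1\t200\t1500\t2099\t1e-100\t410\n"
    "TCPTP_D1\tAL000001\t72.0\t190\t50\t2\t5\t194\t1510\t2079\t1e-60\t250\n"
    "SHP2_D1\tAL000002\t95.0\t180\t9\t0\t1\t180\t9540\t9001\t1e-90\t380\n"
)
hits = list(read_hits(io.StringIO(example)))
entries = nonredundant_entries(hits)
gff_lines = [to_gff(hit) for hit in hits]
```

Here `entries` holds each genome entry once, however many domain queries hit it, and `gff_lines` can be written to a file and loaded alongside the corresponding sequence in a viewer such as Artemis.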

Using the web environment and the Artemis tool, each genome entry was carefully inspected by hand. This procedure reduced the 295 overlapping hits to: (1) the 37 published PTPs, (2) 9 intronless PTP pseudogenes, (3) at least 4 potentially novel PTPs with exon/intron structure, (4) 11 dsPTPs, and (5) 14 false-positive entries (low-complexity hits). Although definitive classification of these potentially novel PTPs awaits a combined analysis of the finished genome sequence and experimental verification of cDNAs, it seems clear that the total number is far below earlier expectations for this protein family [38].

To complement the tedious manual data analyses with brute-force SQL queries, all hits between the origin protein sequences and the genome entries were uploaded to a relational database (MySQL), which also assists in (1) keeping track of hits, (2) generating statistics, and (3) more in-depth database mining in the future.
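As an illustration of the kind of query such a hit database supports, the sketch below uses Python's built-in `sqlite3` module in place of MySQL so that it runs without a server. The table layout, column names, and example rows are hypothetical and are not the schema used in the original work.

```python
import sqlite3


def build_hit_db(rows):
    """Create an in-memory database holding one row per protein-to-genome hit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE hit (
            query_ptp    TEXT,     -- origin protein, e.g. 'PTP1B'
            domain       TEXT,     -- 'D1' or 'D2'
            genome_entry TEXT,     -- accession of the raw genome sequence
            s_start      INTEGER,
            s_end        INTEGER,
            bitscore     REAL
        )
        """
    )
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def hits_per_entry(conn):
    """Summarize each genome entry: hit count, distinct queries, best score."""
    return conn.execute(
        """
        SELECT genome_entry,
               COUNT(*)                  AS n_hits,
               COUNT(DISTINCT query_ptp) AS n_queries,
               MAX(bitscore)             AS best_score
        FROM hit
        GROUP BY genome_entry
        ORDER BY n_hits DESC, genome_entry
        """
    ).fetchall()


# Hypothetical rows for demonstration only.
rows = [
    ("PTP1B", "D1", "AL000001", 1500, 2099, 410.0),
    ("TCPTP", "D1", "AL000001", 1510, 2079, 250.0),
    ("SHP2", "D1", "AL000002", 9001, 9540, 380.0),
    ("PTPalpha", "D2", "AL000003", 100, 700, 120.0),
    ("PTPalpha", "D1", "AL000003", 900, 1500, 300.0),
]
conn = build_hit_db(rows)
per_entry = hits_per_entry(conn)
```

A summary of this kind makes it easy to spot genome entries hit by many different origin proteins (likely genuine PTP domains) as opposed to entries hit only once with a low score.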

PTP Pseudogenes

As indicated previously, the human genome contains at least nine intronless PTP-like sequences that are closely related (>94% nucleotide identity) to the mRNA of TCPTP (two sequences), SHP2 (five), MEG1 (one), or RPTPalpha (one). Closer inspection of these sequences revealed multiple in-frame stop codons caused by insertions and deletions. Only two of the nine sequences had perfect matches with ESTs.

As an example, in addition to the TCPTP gene on chromosome 18 (Table II), two TCPTP-like sequences were identified on chromosomes 1 and 13 (TCPTP-P1 and TCPTP-P13), which share 94 to 95% nucleotide identity over 1440 bp with the TCPTP cDNA. The lack of introns and the presence of polyadenylated tails indicate that these sequences are pseudogenes that arose by retrotransposition. Both TCPTP-like sequences harbor frameshift mutations and premature stop codons. If transcribed and translated, they would yield short, nonfunctional polypeptides of 41 and 149 amino acids, respectively. Of note, TCPTP-P13 is associated with an EST sequence (aw401979) and thus may be expressed, although the function of such an mRNA or truncated protein is unknown.
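The check for premature stop codons described above can be sketched as follows: translate a candidate sequence in a given reading frame with the standard genetic code, list any stop codons that occur before the final codon, and report how many residues would be made before the first stop. The two short sequences are invented toy examples, not TCPTP or pseudogene sequence.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate complete codons from `frame`; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def premature_stops(dna, frame=0):
    """Return 0-based codon indices of stop codons before the final codon."""
    protein = translate(dna, frame)
    return [i for i, residue in enumerate(protein[:-1]) if residue == "*"]


def truncated_length(dna, frame=0):
    """Number of residues encoded before the first stop codon."""
    return len(translate(dna, frame).split("*", 1)[0])


# Toy example: a one-base insertion after the start codon shifts the frame
# and brings a TGA stop codon into frame.
intact = "ATGGCTGAAGCATGGTAA"
shifted = "ATGTGCTGAAGCATGGTAA"

intact_protein = translate(intact)
shifted_protein = translate(shifted)
```

For a real pseudogene candidate, the frame would be chosen from its alignment to the parent cDNA, and the truncated length would then be compared with the length of the parent protein.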

When reviewing the in situ hybridization data published in the pregenomic era, we discovered that both the TCPTP and SHP2 pseudogenes identified in silico in this study had also been detected experimentally [39-41]. In the case of TCPTP, Johnson and coworkers [39] compared the specificity of genomic and cDNA probes and demonstrated that, under identical conditions, the genomic TCPTP probes (containing both exon and intron sequence) readily identified a single specific site of hybridization (18p11.3-p11.2), whereas the TCPTP cDNA probe identified the sites of both the gene and its pseudogenes (1q22-q24 and 13q12-q13). Likewise, Jirik and coworkers [40], using a SHP2 cDNA probe, found hybridization signals over 4q21 and 5p14 and, to a lesser degree, over chromosomes 3q1-3q13.2, 6q23-q24, and 8q12, in addition to 12q24.1 (the SHP2 gene; see Table II). In light of today's genome assembly, we conclude that these signals correspond to the exact localization of the five intronless SHP2 pseudogenes.

Novel PTPs

In addition to the intronless PTP pseudogenes, our analysis revealed a few novel PTP-like sequences and fragments that have an exon/intron structure consistent with the genomic organization of the PTP gene family. Some of these sequences are only fragments, and the final verification of these putative novel PTP genes awaits completion of the human genome sequence or experimental demonstration of their expression. However, one novel human PTP could be assigned to chromosome 1q32 (Fig. 3). The PTP gene located here has approximately 80% homology to both rat osteotesticular PTP (OST-PTP) and mouse embryonic stem cell phosphatase (PTP-ESP). Together with fluorescence in situ hybridization studies, which map OST-PTP to mouse chromosome 1 (region F-G, a region syntenic to human chromosome 1q32-q33), the similarity in gene structure suggests that this novel PTP is the human ortholog of OST-PTP. Because the human OST-PTP has not yet been cloned, the identification of its genomic sequence will facilitate future characterization of this PTP. As a first step, the human genome browser (Table I, UCSC) allowed us to retrieve a hypothetical amino acid sequence for this PTP, as predicted by the Ensembl project. Only two EST sequences map to this region of chromosome 1 (Fig. 3), suggesting a highly regulated and specific expression pattern similar to that observed for its mouse and rat counterparts [42].

Figure 3 Identification of a novel human PTP (OST-PTP) and its genomic context in a selected browser (UCSC, Table I). From top to bottom, the features include the nucleotide base position, which refers to the coordinates in the NIH genome assembly, with the cytogenetic band immediately underneath, and a graphic view of clone coverage (gaps and overlap) in the assembly and accession numbers of the raw sequence. The identifier 'YourSeq' is the exon-intron structure we have predicted for human OST-PTP, and underneath are homology-supported gene models (Acembly, Ensembl, Fgenesh, and GenScan). Below the automated gene predictions are mRNA and EST sequences from human (black) and rodents (grey) aligned to the genome. Finally, additional display options can be selected, e.g., location of repeats, SNPs, sequence-tagged sites (STS), genetic markers, and microarray expression data.
