Introduction to Bioinformatics

Removal of phosphate from phosphotyrosine (pTyr) residues on cellular proteins plays a key role in many signaling pathways and is catalyzed by three classes of enzymes [1-6]: (1) classic protein tyrosine phosphatases (PTPs), (2) dual-specificity phosphatases (dsPTPs), and (3) low-molecular-weight phosphatases (LMW-PTPs). The classic (tyrosine-specific) PTPs, which are the focus of this bioinformatics analysis, have traditionally been classified into receptor-like and intracellular PTPs based on the presence or absence of a transmembrane region (Fig. 1).

Bioinformatics is a relatively new scientific discipline that combines several areas of research [7]. Because the field is developing rapidly, there is currently no general agreement even on the definition of the word bioinformatics, which nonetheless has gained huge popularity as a buzzword intimately connected with the assembly and analysis of the human genome [8,9]. The definition of bioinformatics depends on the context in which the word is used. As a consequence, the English language has recently been enriched with a number of new terms (e.g., structural genomics, toxicogenomics, oncogenomics, metabolomics, proteomics, pharmacogenomics, chemogenomics; see Table 1). A unifying feature of bioinformatics, however, is the collection and analysis of large biological datasets, which most often depends heavily on powerful computers and the development of software tools. Despite tremendous progress in the collection and annotation of sequence information at a few central sites, it is still a major challenge to review and compile large datasets from various sources that are updated at different rates.

In this chapter, we present a template approach to database mining and bioinformatics analyses of classic PTPs that includes the following elements: (1) compilation of a comprehensive and nonredundant database of PTP cDNA and protein sequences; (2) utilization of this database to create a homology-based classification of PTP proteins based on amino acid sequence alignments and phylogenetic analysis (neighbor-joining trees); (3) low-resolution homology modeling to identify conserved regions (PTP structure-function) and nonconserved selectivity-determining regions (substrate specificity and inhibitor design); (4) identification of the genomic complement of the PTP protein family by mapping all PTP-like sequences in the human genome; (5) determination of the chromosomal location and genomic structure of each PTP and use of this information to classify novel PTP-like sequences as either pseudogenes or true novel PTPs; (6) establishment of an initial framework for future disease association studies and studies of the genetic elements controlling PTP expression and regulation.
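Element (1), the compilation of a nonredundant sequence set, can be illustrated with a minimal Python sketch. It is not the in-house software used for this chapter; it only assumes that the sequences collected from different sources are available in FASTA format, and it merges exact duplicates only. The function names, identifiers, and toy sequences are hypothetical and serve purely as illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def nonredundant(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct sequence to the list of headers that carry it.

    Only exact duplicates (after upper-casing) are merged. Near-identical
    entries, such as splice variants or entries with sequencing errors,
    are kept apart and would need alignment-based comparison.
    """
    merged: Dict[str, List[str]] = {}
    for header, seq in records:
        merged.setdefault(seq.upper(), []).append(header)
    return merged


# Hypothetical toy input: identifiers and sequences are made up.
EXAMPLE = """\
>db1|PTP_A
MEMEKEFEQ
IDKSGSW
>db2|PTP_A_copy
memekefeqidksgsw
>db1|PTP_B
MSRSLDSAR
"""

nr = nonredundant(parse_fasta(EXAMPLE.splitlines()))
for seq, headers in nr.items():
    print(len(seq), headers)
```

Keeping all source headers for each distinct sequence preserves the cross-references to the original databases, which is useful when those databases are updated at different rates.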

It is our aim to introduce the reader to the most important bioinformatics databases and analytical tools as we delineate the structural and evolutionary relationships among PTP domains, analyze the PTP family in a genomic context, and finally provide some initial tools for functional analyses of PTPs in health and disease. Although in-house-developed software tools have been employed, we believe that the approach is generally applicable and that it (with some patience) can be utilized for bioinformatics analyses of other protein families.

Figure 1 Schematic representation of the PTP family based on sequence similarity among catalytic domains. NT1-NT9: nontransmembrane or intracellular PTPs; R1-R8: receptor-like PTPs. Domains shown in the figure legend: PTP domain, Ig-like domain, FN III-like domain, MAM domain, heavily glycosylated region, carbonic anhydrase-like domain, RDGS motif, PEST-like, SH2 domain, PDZ domains, cellular retinaldehyde binding protein-like, FERM domain, cadherin-like, His domain, BRO-1 homology.

Table 1 Bioinformatics Links

Databases

EMBL-EBI
  http://www.ebi.ac.uk/Databases/
  http://www.ebi.ac.uk/embl/index.html
GenBank: http://www.ncbi.nlm.nih.gov/
HGraw: ftp://ftp.ncbi.nih.gov/genbank/genomes/H_sapiens/ (Genome Project sequences, regardless of chromosome, that have been extracted from GenBank)
hswall: ftp://ftp.expasy.org/databases/sp_tr_nrdb/ (human entries from Swall and TrEMBL)
Human genome NCBI: ftp://ncbi.nlm.nih.gov/genomes/H_sapiens
InterPro: http://www.ebi.ac.uk/interpro/
Mouse genome: ftp://genome.cse.ucsc.edu/goldenPath/mmFeb2002
Pfam: http://www.sanger.ac.uk/Software/Pfam/index.shtml
RefSeq: http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html
Swall: ftp://ftp.expasy.org/database/sp_tr_nrdb/ (a nonredundant concatenation of SwissProt and TrEMBL)
SwissProt: http://www.ebi.ac.uk/swissprot/

Genome Browsers and Disease/Phenotype Databases

Ensembl: http://www.ensembl.org
FlyBase: http://www.flybase.org
Gene Expression Atlas (GNF): http://expression.gnf.org/cgi-bin/index.cgi
Human Genetic Disease: http://life2.tau.ac.il/GeneDis/
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/
Mouse knockouts: http://research.bmn.com/mkmd
OMIM: http://www.ncbi.nlm.nih.gov/Omim/
Rat Genome Database: http://www.rgd.mcw.edu/
UCSC (human and mouse): http://www.genome.ucsc.edu/
WormBase: http://www.wormbase.org

Tools and Software

Alignments
  ClustalW: http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
  Spidey: http://www.ncbi.nlm.nih.gov/IEB/Research/Ostell/Spidey/
  Gene2EST BLAST: http://woody.embl-heidelberg.de/gene2est/
Artemis: http://www.sanger.ac.uk/Software/Artemis/ (a DNA sequence viewer and annotation tool)
BLAST: http://www.ncbi.nlm.nih.gov/BLAST/ (Basic Local Alignment Search Tool: a set of similarity search programs designed to explore sequence databases regardless of whether the query is protein or DNA)
EMBOSS: http://www.emboss.org/ (a package of high-quality, free, open-source software for sequence analysis)
Entrez: http://www.ncbi.nlm.nih.gov/Entrez/ (a retrieval system for searching several linked databases)
GFF: http://www.sanger.ac.uk/Software/formats/GFF/ (an exchange format for feature description)
Intron/exon predictions
  Metagene: http://www.rgd.mcw.edu/METAGENE/
  NetGene2: http://www.cbs.dtu.dk/services/NetGene2/
  HmmGene: http://www.cbs.dtu.dk/services/HMMgene/
  Genie: http://www.fruitfly.org/seq_tools/genie.html
Alternative splicing: http://www.bit.uq.edu.au/altExtron
MySQL: http://www.mysql.org (relational database)
SRS: http://srs.ebi.ac.uk/ (a powerful data integration platform, providing rapid and user-friendly access to the large volumes of diverse and heterogeneous life science data stored in more than 400 internal and public domain databases)

Drug Discovery and Structural Genomics

Drug discovery
  Molecular recognition: http://www.cgen.com/science/armc-2001.htm
  Binding database: http://www.bindingdb.org/bind/index.jsp
  Biomolecule interaction: http://www.bind.ca/
  Interacting proteins: http://dip.doe-mbi.ucla.edu/
Structural genomics
  CATH: http://www.biochem.ucl.ac.uk/bsm/cath_new/ (protein structure classification)
  FSSP: http://www2.ebi.ac.uk/dali/fssp/ (fold classification based on structure-structure alignment of proteins)
  Nature Structural Genomics: http://www.nature.com/nsb/structural_genomics/
  SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/index.html (structural classification of proteins)
  StrucGen: http://www.rcsb.org/pdb/strucgen.html#Resources (structural genomics overview; worldwide project list)
  Target: http://targetdb.pdb.org/ (targetDB is a target registration database for structural genomics)

Other Sites and Links

Celera
Collection of biolinks
  http://123genomics.homestead.com/files/home.html
  http://www.expasy.org/alinks.html#Proteins
EBI: http://www.ebi.ac.uk/
Genomics Institute: http://web.gnf.org/
Incyte: http://www.incyte.com
NCBI site map: http://www.ncbi.nlm.nih.gov/Sitemap/index.html (important overview with brief description of all NCBI resources)
Nomenclature
  http://www.genomicglossaries.com/content/omes.asp
  http://www.gene.ucl.ac.uk/nomenclature/
Transgenic/mutation/gene knockouts
  http://tbase.jax.org/
  http://www.bioscience.org/knockout/knochome.htm
  http://www.informatics.jax.org/
Tyrosine kinases: http://pkr.sdsc.edu/html/index.shtml
Tyrosine phosphatases: http://science.novonordisk.com/ptp/
