Extending sequencing far from the primers

We routinely use the sequence-extending mixture (USB) to obtain sequences beyond 600 to 800 bases from the primer with high resolution. Using multiple loadings, more than 1500 to 2000 bases from the primer can be extended with ease. The protocols of extending sequencing are quite similar to those described in the previous protocols except for the following steps.

1. Use the labeling mixture undiluted or diluted twofold, instead of the usual fivefold dilution.

2. Increase the amount of biotinylated nucleotide or [35S]dATP (1000 to 1500 Ci/mmol) from the regular volume of 2 to 4 µl (biotinylated nucleotide) or 1 to 2 µl (isotopic nucleotide).

3. Extend the labeling reaction from the regular time of 2 to 5 min to 4 to 8 min at room temperature.

4. In the termination reaction, instead of adding 2.5 µl of ddNTPs to the appropriate tubes (A, G, T, C), add 2.5 µl of a mixture of the appropriate sequence-extending mixture (USB) and the appropriate termination mixture (USB) at a ratio of 2:0.5, 2:1.0, or 1.5:1 (v/v), depending on the particular DNA sequence.

5. Utilize a relatively long gel (80 to 100 cm) and extend electrophoresis for 12 to 16 h using a high quality sequencing apparatus.
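The ratios in step 4 can be turned into pipetting volumes with a little arithmetic. The following is a minimal sketch, assuming the combined volume added to each termination tube stays at 2.5 µl; the function name `split_by_ratio` is hypothetical and is used only for illustration.

```python
def split_by_ratio(total_ul, extending_parts, termination_parts):
    """Split a total volume (in microliters) between the sequence-extending
    mixture and the termination mixture according to a v/v ratio."""
    parts = extending_parts + termination_parts
    extending_ul = total_ul * extending_parts / parts
    termination_ul = total_ul * termination_parts / parts
    return round(extending_ul, 2), round(termination_ul, 2)


# Ratios quoted in step 4 (extending mixture : termination mixture).
RATIOS = [(2.0, 0.5), (2.0, 1.0), (1.5, 1.0)]

if __name__ == "__main__":
    for ext, term in RATIOS:
        e_ul, t_ul = split_by_ratio(2.5, ext, term)  # 2.5 ul total is an assumption
        print(f"{ext}:{term} -> {e_ul} ul extending + {t_ul} ul termination")
```

The 2:0.5 and 1.5:1 ratios already sum to 2.5 parts, so they map directly onto microliters; the 2:1.0 ratio has to be scaled down to fit the same total.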


Primer walking refers to stepwise sequencing starting from the 5' end of the target DNA using a series of primers. The first primer is usually a universal primer that is built into the plasmid vector, such as the T3, T7 or SP6 primer. Following each round of DNA sequencing, a new primer is designed based on the very last stretch of sequence obtained. For instance, if the first sequencing has revealed 450 bases with the 3'-end sequence 5'-GCATGTAGACGATAGGATACAG-3', then GCATGTAGACGATAGGATACAG can be chosen as the walking primer for the next round of sequencing. In the same way, new walking primers can be selected for subsequent sequencing until the entire target DNA has been completely sequenced.
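Picking the next walking primer from the end of a read can be sketched in a few lines. This is only an illustration, assuming the primer is simply the last 22 bases of the read, as in the example above; the function names are hypothetical, and real primer design would also weigh melting temperature, secondary structure and read quality near the end of the gel.

```python
def next_walking_primer(read, primer_length=22):
    """Return the last `primer_length` bases of a read (5' to 3')."""
    read = "".join(read.split()).upper()  # drop whitespace and normalize case
    if len(read) < primer_length:
        raise ValueError("read is shorter than the requested primer length")
    return read[-primer_length:]


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Made-up read that ends with the primer sequence quoted in the text.
    read = "ACGT" * 10 + "GCATGTAGACGATAGGATACAG"
    primer = next_walking_primer(read)
    print(primer, f"{gc_percent(primer):.1f}% GC")
```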


Sequencing large DNA molecules by regular stepwise or primer-walking methods takes a long time. However, using exonuclease III, the gene or large DNA fragment can be unidirectionally deleted to generate a series of shorter fragments with overlapping ends. These progressively deleted fragments can then be sequenced simultaneously in a short time. The general principles and procedures are outlined in Figure 5.3.

1. Recombinant plasmid, phagemid, or bacteriophage M13 replicative-form DNA that contains the cloned DNA of interest is first linearized with two appropriate restriction enzymes. Both enzymes should cut the DNA between one end of the target DNA and the binding site for the universal sequencing primer. One enzyme should cleave near the target DNA and generate either a recessed 3' end (that is, a 5' protruding end) or a blunt end. The other enzyme should cleave near the binding site for the universal sequencing primer and must produce a 4-base protruding 3' end, or an end that is filled in with α-phosphorothioate dNTPs.

2. The linearized DNA is then progressively deleted with exonuclease III, which digests the DNA only from the blunt or 5' protruding terminus, leaving the 3' protruding (overhang) or α-phosphorothioate-filled end intact. The digestion proceeds unidirectionally from the site of cleavage into the target DNA sequence. The digestion is terminated by removing an appropriate volume of the sample at appropriate time intervals, thus generating a series of progressively deleted, shorter DNA fragments.

3. The exposed single-stranded DNAs are then cleaved by nuclease S1 or mung-bean nuclease. This produces blunt termini at both ends of the DNA fragments to be sequenced.

4. The shortened DNA is then recircularized using T4 DNA ligase and transformed into an appropriate bacterial host. Transformants can be selected with appropriate antibiotics in the culture medium and the recombinant plasmids will be purified and subjected to DNA sequencing.

Plasmid-based vectors for subcloning the DNA to be sequenced are commercially available. These include pGEM-5Zf or pGEM-7Zf (Promega Incorporation), whose multiple cloning sites contain two unique restriction sites lying between the end of the DNA insert to be deleted and the binding site for the universal sequencing primer. One of the two unique restriction sites should be near the end of the insert and must generate a blunt or 5'-overhang end that is necessary for exonuclease III deletion. The other enzyme should cut near the sequencing primer binding site and must produce a 3'-overhang end to protect the end from exonuclease III deletion. Thus, the exonuclease digestion will be unidirectional and proceed into the insert DNA sequence. The unique restriction enzymes that can be used are listed below.

FIGURE 5.3 Diagram of progressive deletions of DNA insert for sequencing. (Steps shown: plasmid vector with priming site and DNA insert to be sequenced; restriction enzyme digestion; exonuclease III digestion from the 5' overhanging or blunt end; S1 nuclease treatment of aliquots of deletions; T4 DNA ligase; transformation of E. coli.)

Unique Enzymes for Generating 5'-Protruding or Blunt Ends

Restriction enzyme    Recognition sequence

Not I                 5'..GC*GGCC GC..3'
                      3'..CG CCGG*CG..5'

Sma I                 5'..CCC*GGG..3'
                      3'..GGG*CCC..5'

Xba I                 5'..T*CTAG A..3'
                      3'..A GATC*T..5'

Xho I                 5'..C*TCGA G..3'
                      3'..G AGCT*C..5'

Sal I                 5'..G*TCGA C..3'
                      3'..C AGCT*G..5'

Note: * indicates the cutting site. Although Hind III, EcoR I and BamH I also generate 5' overhangs, they are not usually unique because most DNA inserts contain recognition sites for these enzymes.

Unique Enzymes That Produce Exonuclease III-Resistant 3'-Protruding Termini

Restriction enzyme    Recognition sequence

Sph I                 5'..G CATG*C..3'
                      3'..C*GTAC G..5'

Pvu I                 5'..CG AT*CG..3'
                      3'..GC*TA GC..5'

Sac I                 5'..G AGCT*C..3'
                      3'..C*TCGA G..5'

Kpn I                 5'..G GTAC*C..3'
                      3'..C*CATG G..5'

Aat II                5'..G ACGT*C..3'
                      3'..C*TGCA G..5'

Bgl I                 5'..GCCN NNN*NGGC..3'
                      3'..CGGN*NNN NCCG..5'

Note: * indicates the restriction enzyme cutting site. Plasmid DNA purification and restriction enzyme digestion are described in appropriate chapters of this book.
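Whether a listed enzyme is usable for a given clone depends on its site being absent from the insert. The sketch below screens an insert for the recognition sequences of the enzymes named in the two lists; it is a minimal illustration, not a substitute for a full restriction map. The Sma I site (CCCGGG) is supplied from general knowledge rather than from the table, the helper names are hypothetical, and the check ignores the vector backbone, which must also lack the site.

```python
import re

# Recognition sequences (5' to 3'); N stands for any base.
FIVE_PRIME_OR_BLUNT = {
    "Not I": "GCGGCCGC",
    "Sma I": "CCCGGG",  # assumption: taken from general knowledge of Sma I
    "Xba I": "TCTAGA",
    "Xho I": "CTCGAG",
    "Sal I": "GTCGAC",
}
THREE_PRIME_OVERHANG = {
    "Sph I": "GCATGC",
    "Pvu I": "CGATCG",
    "Sac I": "GAGCTC",
    "Kpn I": "GGTACC",
    "Aat II": "GACGTC",
    "Bgl I": "GCCNNNNNGGC",
}


def cuts(site, dna):
    """True if the recognition site occurs in the DNA sequence.

    All sites listed here are palindromic, so scanning one strand is enough.
    """
    pattern = site.replace("N", "[ACGT]")
    return re.search(pattern, dna.upper()) is not None


def enzymes_not_cutting(insert, enzymes):
    """Names of enzymes whose sites are absent from the insert."""
    return sorted(name for name, site in enzymes.items() if not cuts(site, insert))


if __name__ == "__main__":
    insert = "AAGTCGACTTGGTACCAA"  # made-up insert with Sal I and Kpn I sites
    print("5'/blunt candidates:", enzymes_not_cutting(insert, FIVE_PRIME_OR_BLUNT))
    print("3' overhang candidates:", enzymes_not_cutting(insert, THREE_PRIME_OVERHANG))
```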


1. Label 15 to 30 microcentrifuge tubes (0.5 ml), depending on the size of the insert to be deleted, and add 7.5 to 8.0 µl of nuclease S1 mixture to each tube. Store on ice prior to use.

2. Dissolve or dilute 6 to 8 µg of linearized DNA in a total of 60 µl of 1X exonuclease III buffer.

3. Warm the sample to an appropriate temperature and start digestion by adding 250 to 550 units of exonuclease III to the sample. Mix as quickly as possible. Transfer 2.5 to 3.0 µl of the reaction mixture at 0.5-min intervals to the labeled tubes containing nuclease S1 prepared at step 1. Quickly mix by pipetting up and down several times and place on ice until use.

Notes: (1) The digestion rate depends on the reaction temperature. The digestion rates (bp/min) tabulated for increasing reaction temperatures (°C) are 25 to 30, 80 to 85, 90 to 100, 200 to 220, 455 to 465, and 600 to 630.

(2) It is not necessary to change the buffer between exonuclease III and nuclease S1 because the S1 buffer contains zinc cations and has a low pH, which can inactivate the exonuclease III. However, some exonuclease III buffer will not inhibit the activity of nuclease S1.

4. After all of the samples have been taken, place all the tubes at room temperature and carry out the nuclease S1 digestion of single strands by incubating at room temperature for 30 min.

5. Terminate the reaction by adding 1 µl of S1 stopping buffer and heating at 70°C for 10 min.

6. Check the efficiency of digestion by running 2 µl of sample from each time point on a 1% agarose gel.

7. Blunt the termini by adding 1 µl of Klenow mixture to each tube and incubating at 37°C for 4 min, then adding 1 µl of dNTP mixture to each tube and incubating at 37°C for 5 to 6 min.

8. Recircularize the DNA by adding 40 µl of ligase mixture to each tube, mixing and incubating at 16°C overnight. Carry out transformation of the plasmid DNA into an appropriate bacterial host as described in the appropriate chapter in this book. The series of plasmids with progressively reduced sizes can now be sequenced simultaneously.
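The sampling scheme in step 3 determines how many tubes are needed and how much DNA each time point should have lost. The sketch below works this out under simple assumptions: a constant digestion rate, samples taken every 0.5 min, and an illustrative rate of 460 bp/min taken from within the 455 to 465 bp/min range listed in the notes. The function names are hypothetical, and actual rates should be confirmed on the agarose gel in step 6.

```python
import math


def deletion_schedule(rate_bp_per_min, interval_min=0.5, n_timepoints=20):
    """Expected deletion (bp) at each sampling time, assuming a constant rate.

    Returns a list of (time in minutes, deleted bases) tuples.
    """
    return [
        (i * interval_min, round(i * interval_min * rate_bp_per_min))
        for i in range(1, n_timepoints + 1)
    ]


def timepoints_needed(insert_bp, rate_bp_per_min, interval_min=0.5):
    """Number of samples needed before the whole insert has been deleted."""
    return math.ceil(insert_bp / (rate_bp_per_min * interval_min))


if __name__ == "__main__":
    rate = 460  # bp/min, illustrative value (assumption)
    n = timepoints_needed(3000, rate)  # hypothetical 3-kb insert
    print(f"{n} time points needed")
    for minutes, deleted in deletion_schedule(rate, n_timepoints=n):
        print(f"{minutes:4.1f} min  ~{deleted} bp deleted")
```

With these numbers each 0.5-min sample removes about 230 bp more than the previous one, which is comfortably shorter than a single sequencing read, so adjacent deletion clones should overlap.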

Reagents Needed

10X Exonuclease III Buffer
0.66 M Tris-HCl, pH 8.0
6.6 mM MgCl2

S1 Buffer
2.5 M NaCl
300 mM Potassium acetate, pH 4.6
10 mM ZnSO4
50% Glycerol

Nuclease S1 Mixture (Fresh)
54 µl S1 buffer
0.344 ml dd.H2O
120 units Nuclease S1

S1 Stopping Buffer
300 mM Tris base
50 mM EDTA, pH 8.0

Klenow Buffer
20 mM Tris-HCl, pH 8.0
100 mM MgCl2

Klenow Mixture
6 to 12 units Klenow DNA polymerase
60 µl Klenow buffer

dNTP Mixture
0.13 mM dATP
0.13 mM dGTP
0.13 mM dCTP
0.13 mM dTTP

10X T4 DNA Ligase Buffer
0.5 M Tris-HCl, pH 7.6
0.1 M MgCl2
10 mM ATP

T4 DNA Ligase Mixture
0.1 ml 10X Ligase buffer
0.79 ml dd.H2O
0.1 ml 50% PEG
10 µl of 0.1 M DTT
5 units of T4 DNA ligase


DNA molecules or fragments or RT-PCR products (single-stranded, double-stranded and plasmid DNA) can be directly sequenced by coupling the polymerase chain reaction (PCR) technology and the dideoxynucleotide chain termination method. This is a particularly powerful technique when one wants to simultaneously amplify and sequence a DNA fragment of interest.

Preparation of PCR and Sequencing Reactions

1. Label four 0.5-ml microcentrifuge tubes as A, G, T, or C for each set of sequencing reactions for each PCR primer. A, G, T, and C represent ddATP, ddGTP, ddTTP and ddCTP, respectively.

2. Transfer 0.5 µl of 2X stock mixture of dNTPs/ddATP, dNTPs/ddGTP, dNTPs/ddTTP and dNTPs/ddCTP to the labeled tubes A, G, T and C, respectively. To each tube, add 0.5 µl dd.H2O, generating a 1X working mixture. Cap the tubes and store on ice prior to use.

3. Prepare the following mixture for each set of four sequencing reactions for each primer (forward or reverse primer) in a 0.5-ml microcentrifuge tube on ice.

Forward or reverse primer, 2 to 5 pmol (15 to 27 ng), depending on the size of the primer (15- to 27-mer, 10 to 30 ng/µl)
DNA template, 100 to 1000 ng, depending on the size of the template (0.4 to 7 kb, 10 to 100 ng/µl)
[α-35S]dATP, 1 to 1.2 µl (>1000 Ci/mmol), or [α-32P]dATP, 0.5 µl (800 Ci/mmol)
5X Sequencing buffer, 4.0 µl
Add dd.H2O to a final volume of 17 µl.

4. Add 1 µl (5 units/µl) of Taq DNA polymerase to the mixture prepared at step 3. Gently mix and transfer 4 µl of the primer-template-enzyme mixture to each tube containing 1 µl of dNTPs and the appropriate ddNTP prepared at step 2.

5. Overlay the mixture in each tube with approximately 20 µl of mineral oil to prevent evaporation of the samples during the PCR amplification. Place the tubes in a thermal cycler and perform PCR as follows:


Profile            Template <4 kb; primer <24 bases     Template >4 kb; primer >24-mer or
                   or with <40% G-C content             <24 bases with >50% G-C content

Predenaturation    94°C, 3 min                          95°C, 3 min
Denaturation       94°C, 1 min                          95°C, 1 min
Annealing          52°C, 1 min                          60°C, 1 min
Extension          70°C, 1.5 min                        72°C, 2 min
Last               4°C                                  4°C

6. When the PCR cycling is complete, carefully remove the mineral oil from each tube using pipette tips (optional) and add 3.5 µl of stop solution to inactivate the enzyme. Proceed to DNA gel sequencing as described earlier in this chapter.
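The choice between the two cycling profiles in step 5 can be written as a small decision function. This is a sketch of one possible reading of the table: the hotter profile is chosen whenever the template exceeds 4 kb, the primer is longer than 24 bases, or the primer G-C content exceeds 50%. The table does not say how to resolve mixed cases, so that rule is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProfile:
    """Each step is (temperature in deg C, time in minutes)."""
    predenaturation: tuple
    denaturation: tuple
    annealing: tuple
    extension: tuple
    hold_c: int


# Values copied from the two rows of the profile table.
SHORT_TEMPLATE = CyclingProfile((94, 3.0), (94, 1.0), (52, 1.0), (70, 1.5), 4)
LONG_TEMPLATE = CyclingProfile((95, 3.0), (95, 1.0), (60, 1.0), (72, 2.0), 4)


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def choose_profile(template_bp, primer):
    """Pick a profile; the tie-breaking rule is an assumption, not from the table."""
    if template_bp > 4000 or len(primer) > 24 or gc_fraction(primer) > 0.50:
        return LONG_TEMPLATE
    return SHORT_TEMPLATE


if __name__ == "__main__":
    profile = choose_profile(2000, "ATATATATATGCATATATAT")  # made-up primer
    print("Annealing at", profile.annealing[0], "deg C")
```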

Reagents Needed

2X dNTPs/ddATP Mixture
dATP, 80 µM
dCTP, 80 µM
dTTP, 80 µM
7-Deaza dGTP, 80 µM
ddATP, 1.4 mM

2X dNTPs/ddTTP Mixture
dATP, 80 µM
dCTP, 80 µM
dTTP, 80 µM
7-Deaza dGTP, 80 µM
ddTTP, 2.4 mM

2X dNTPs/ddGTP Mixture
dTTP, 80 µM
dATP, 80 µM
dCTP, 80 µM
7-Deaza dGTP, 80 µM
ddGTP, 120 µM

2X dNTPs/ddCTP Mixture
dTTP, 80 µM
dCTP, 80 µM
dATP, 80 µM
7-Deaza dGTP, 80 µM
ddCTP, 800 µM

Amount of Primer per pmol
15-mer or 15 bases, 5 ng

5X Sequencing Buffer
0.25 M Tris-HCl, pH 9.0 at room temperature
10 mM MgCl2

Stop Solution
10 mM NaOH
95% Formamide
0.05% Bromophenol blue
0.05% Xylene cyanol
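The "5 ng per pmol for a 15-mer" figure in the reagent list follows from the average mass of a nucleotide. The sketch below generalizes it to other primer lengths, assuming an approximate average of 330 g/mol per nucleotide of single-stranded DNA; that constant and the function names are assumptions for illustration, and the exact mass depends on base composition.

```python
AVG_NT_MASS_G_PER_MOL = 330.0  # approximate average per nucleotide (assumption)


def ng_per_pmol(primer_length):
    """Approximate mass (ng) of 1 pmol of a single-stranded primer."""
    # g/mol is numerically equal to pg/pmol; divide by 1000 to get ng/pmol.
    return primer_length * AVG_NT_MASS_G_PER_MOL / 1000.0


def primer_ng(pmol, primer_length):
    """Approximate mass (ng) of a given number of pmol of primer."""
    return pmol * ng_per_pmol(primer_length)


if __name__ == "__main__":
    for length in (15, 20, 27):
        print(f"{length}-mer: about {ng_per_pmol(length):.1f} ng per pmol")
```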


Like DNA, RNA can also be sequenced by nonisotopic or isotopic methods. Avian myeloblastosis virus (AMV) reverse transcriptase is a DNA polymerase that catalyzes the polymerization of nucleotides using an RNA or DNA template. In the labeling reaction, a biotinylated or isotopic nucleotide is incorporated into the cDNA strands transcribed from a specific mRNA. The power of this technique is that it allows one to search for the mRNA of interest, or to examine its expression, in a specific cell or tissue type. Specific primers can be designed based on DNA sequences or amino acid sequences of interest. Total RNA can be used directly in specific primer-mRNA template annealing reactions.


1. For each sequencing reaction (one set of four lanes), add the following components to a 0.5-ml microcentrifuge tube.

Primer, 12 pmol
Total RNA template, 5 µg, or mRNA, 100 ng
Add DEPC-dd.H2O to a final volume of 10 µl.

2. Cap the tube and denature RNA secondary structures at 90°C for 3 min and allow slow cooling to 30°C over a period of 30 min. Briefly spin down and place the tube on ice prior to use.


1. While the annealing reaction is cooling, label four microcentrifuge tubes for each template-primer mixture, A, G, T and C, that respectively represent ddATP, ddGTP, ddTTP and ddCTP.

2. Transfer 4 µl of the termination mixture of ddATP, ddGTP, ddTTP and ddCTP to the labeled tubes A, G, T, and C, respectively. Cap the tubes and keep at room temperature until use.

3. Dilute the labeling mixture fivefold to the working concentration and store on ice prior to use. For example, dilute 2 µl of labeling mixture to a total of 10 µl with dd.H2O.

4. Add the following components to a 0.5-ml microcentrifuge tube for each annealing reaction.

5X AMV RT buffer, 5 µl
Diluted (1:5) labeling mixture, 2.5 µl
Biotinylated dCTP or dATP, 1 µl, or [α-32P]dATP or dCTP, 1 µl
Add DEPC-dd.H2O to a final volume of 10 µl.

5. When the annealing is complete, combine the annealing reaction mixture with the labeling reaction. Then, add 1 µl of AMV reverse transcriptase to the combined mixtures.

6. Carry out reverse transcription by incubating the tube at 42°C for 8 min.


1. Carefully and quickly transfer 4 µl of the labeling mixture to each of the termination tubes (A, G, T, C) prewarmed at 42°C for 1 min. Mix and continue to incubate the reaction at 42°C in a heating block for 10 min.

2. Add 4 µl of stopping solution to each tube, mix and cap the tubes. Store at -20°C until electrophoresis is carried out. Proceed to gel electrophoresis and detection or autoradiography as described earlier in this chapter.

Reagents Needed

AMV Reverse Transcriptase

5X AMV Reverse Transcription Buffer
0.25 M Tris-HCl, pH 8.3
40 mM MgCl2
0.25 M NaCl
5 mM DTT

DTT Stock Solution
100 mM Dithiothreitol

5X Labeling Mixture for dGTP (Nonisotopic)
7.5 mM dGTP
7.5 mM dTTP
7.5 mM dCTP or dATP
Biotinylated dATP or dCTP

5X Labeling Mixture for dITP (Nonisotopic)
7.5 mM dITP
7.5 mM dTTP
7.5 mM dCTP or dATP
Biotinylated dATP or dCTP

ddATP Termination Mixture for dGTP
0.16 M dATP
0.2 M dGTP
0.2 M dCTP
0.2 M dTTP
0.04 M ddATP

ddGTP Termination Mixture for dGTP
0.2 M dATP
0.16 M dGTP
0.2 M dCTP
0.2 M dTTP
0.04 M ddGTP

ddCTP Termination Mixture for dGTP
0.2 M dATP
0.2 M dGTP
0.16 M dCTP
0.2 M dTTP
0.04 M ddCTP

ddTTP Termination Mixture for dGTP
0.2 M dATP
0.2 M dGTP
0.2 M dCTP
0.16 M dTTP
0.04 M ddTTP

Sequence Extending Mixture for dGTP
0.4 M dATP
0.4 M dGTP
0.4 M dCTP
0.4 M dTTP

Enzyme Dilution Buffer

Stop Solution
20 mM EDTA, pH 8.0
95% (v/v) Formamide
0.05% Bromophenol blue
0.05% Xylene cyanol FF


1. Gel melts away from the comb during electrophoresis.

Cause: Too much heat is built up.

Solution: Be sure to set the power supply at a constant power or a constant current. Do not set the voltage at some level; otherwise, the current undergoes changes during electrophoresis, producing high temperatures that melt the surface of the gel from the top toward the bottom. When that occurs, multiple loadings of the samples are out of the question.

2. No bands appear at all on the developed x-ray film.

Possible causes: (a) The quality of primer is poor and cannot be annealed with DNA or RNA template. (b) Double-stranded DNA is not well denatured, so that the primer fails to anneal to the template. (c) Some components are missed during the labeling reaction. (d) Sequenase version 2.0 T7 DNA polymerase has lost its activity.

Solution: Make sure to denature the DNA template completely and add all components mandated for the labeling reaction. Try to use control template and primer provided.

3. Bands are fuzzy.

Possible causes: (a) Urea is not washed away from the wells prior to loading the samples. (b) Labeled samples are overheated during denaturation. (c) It takes too long to finish loading all the samples, allowing some reannealing of the DNA.

Solutions: (a) Make sure to rinse the surface of the gel before the prerun, and wash the wells again with a pipette after the prerun, just before loading the samples. (b) Limit the time for denaturing the labeled samples to 2 to 3 min and load the samples into the wells immediately. When there are many samples, load quickly so that all of them are loaded within 2 min.

4. No clear bands appear except for a smear in each lane.

Possible causes: (a) Preparation of DNA template is poor. (b) Labeled DNA samples are not well denatured at 75°C prior to being loaded into the gel. (c) Gel polymerizes too rapidly (10 to 15 min) due to excess 10% APS added. (d) The gel is electrophoresed at too cold or too hot a temperature.

Solutions: (a) Make sure the DNA template is very pure without any nicks. (b) Use 0.5 ml of freshly prepared APS per 100 ml of gel mixture and make sure the gel mixture is cooled to room temperature prior to pouring into the glass sandwich. (c) Keep the labeling reaction time to 2 to 5 min for regular sequencing and 4 to 7 min for extending sequencing. (d) Make sure to denature the labeled samples at 75 to 80°C for 2 to 3 min before loading into the gel. (e) Dry the gel at 75 to 80°C under vacuum but not above 80°C.

5. All the bands are weak.

Possible causes: (a) Primer concentration is too low or the annealing of primer and template does not work well. (b) Double-stranded linear DNA and double-stranded plasmid DNA are too large due to the presence of a large DNA insert, resulting in difficulty in denaturation. (c) Biotinylated or isotopic nucleotide has lost its activity. (d) Labeled DNA samples are not completely denatured before loading into the gel.

Solutions: (a) Heat the primer and double-stranded DNA template at 65°C for 3 to 4 min and slowly cool to room temperature over 20 to 35 min. (b) Use the alkaline-denaturing method to denature large-size DNA template. If this still does not work well, try to fragment the DNA insert to be sequenced and subclone for further sequencing. (c) Make sure the labeling reaction is carried out properly and denature the labeled sample at 75 to 80°C for 3 to 4 min prior to loading into the gel. (d) Try to use fresh biotinylated or isotopic nucleotide with high activity.

6. Bands occur across all four lanes in some areas that are called compressions.

Possible cause: The target DNA sequence has strong secondary structure or is G-C rich.

Solution: Use an appropriate amount of dITP to replace dGTP and an appropriate amount of pyrophosphatase in the labeling reaction, or try formamide gel sequencing.

7. Bands are faint near the primer.

Possible cause: Insufficient DNA template or insufficient primer.

Solution: (a) Use 1 to 1.5 µg of single-stranded M13 DNA or 3 to 5 µg of plasmid DNA per reaction. (b) Increase the molar ratio of primer:DNA template from 1:1 to 1:4 or 1:5. (c) Use 1 µl of Mn buffer per regular labeling reaction.

8. Bands are faint or blank in one or two lanes.

Possible cause: Some components may have been improperly added or missed in the samples loaded in the affected lanes.

Solution: Be sure that all the components are added properly.

9. No bands are observed in PCR-directed DNA sequencing, including the positive control.

Possible cause: Taq DNA polymerase has lost polymerization activity or primer is missing.

Solution: Try fresh Taq DNA polymerase and ensure that the primer is included in the annealing reactions.

10. No bands in RNA sequencing, including the positive RNA control lane.

Cause: AMV reverse transcriptase is missing from the labeling reaction or has lost its activity.

Solution: Make sure that the reverse transcriptase is functional.

11. No bands occur in sample RNA sequencing but visible bands are shown in the positive RNA control lane.

Cause: The RNA template is degraded or the primer is missing.

Solution: Make sure that the RNA is of good quality and that the primer is added in the annealing reaction.

12. High background occurs or no detection at all is obtained in nonisotopic sequencing.

Cause: (a) Blocking is not efficient. (b) Detection reagents have lost their activity.

Solution: Make sure that the nonspecific binding sites are efficiently blocked and try fresh detection reagents.


1. Cullmann, G., Hubscher, U., and Berchtold, M.W., A reliable protocol for dsDNA and PCR product sequencing, BioTechniques, 15, 578, 1993.

2. Sanger, F., Nicklen, S., and Coulson, A.R., DNA sequencing with chain termination inhibitors, Proc. Natl. Acad. Sci., USA, 74, 5463, 1977.

3. Church, G.M. and Gilbert, W., Genomic sequencing, Proc. Natl. Acad. Sci., USA, 81, 1991, 1984.

4. Bishop, M.J. and Rawlings, C.J., Nucleic Acid and Protein Sequence Analysis: A Practical Approach, IRL Press, Oxford, 1987.

5. Wiemann, H.V., Grothues, D., Sensen, C., Zimmermann, C.S., Stegemann, H.E., Rupp, T., and Ansorge, W., Automated low-redundancy large-scale DNA sequencing by primer walking, BioTechniques, 15, 714, 1993.

6. Reynolds, T.R., Uliana, S.R.B., Floeter-Winter, L.M., and Buck, G.A., Optimization of coupled PCR amplification and cycle sequencing of cloned and genomic DNA, BioTechniques, 15, 462, 1993.

7. Wu, (W.)L., Song, I., Karuppiah, R., and Kaufman, P.B., Kinetic induction of oat shoot Pulvinus invertase mRNA by gravistimulation and partial cDNA cloning by the polymerase chain reaction, Plant Molecular Biol., 21(6), 1175, 1993.

8. Wu, W., DNA sequencing, in Handbook of Molecular and Cellular Methods in Biology and Medicine, Kaufman, P.B., Wu, W., Kim, D., and Cseke, L., pp. 211-242, CRC Press, Boca Raton, FL, 1995.

9. Wu, (W.)L., Mitchell, J.P., Cohn, N.S., and Kaufman, P.B., Gibberellin (GA3) enhances cell wall invertase activity and mRNA levels in elongating dwarf pea (Pisum sativum) shoots, Int. J. Plant Sci., 154(2), 278, 1993.

6 Information Superhighway and Computer Databases of Nucleic Acids and Proteins



Part A. Communication with GenBank via the Internet
  Submission of a Sequence to GenBank
  Sequence Similarity Searching Using the BLAST Programs
    BLASTN
    BLASTX
    BLASTP

Part B. Computer Analysis of DNA Sequences by the GCG Program
  Entry and Editing of a Sequence Using GCG
    Sequence Entry
    Sequence Editing or Modification
    Review of Sequence Output
  Combination or Assembly of Multiple Fragments into a Single Sequence
    Generating a New Project File Using the GelStart Program
    Enter Sequences to be Assembled into the Project File Generated in A (e.g., FRAGMENT) Using the GelEnter Program
    Compare and Identify Overlap Points of Entered Fragments Using the GelMerge or GelOverlap Program
    Assemble and Review the Combined Sequence by Using the GelAssemble Program
  Identification of Restriction Enzyme Digestion Sites, Fragment Sizes, and Potential Protein Translations of a DNA Sequence
    Exhibition of Restriction Enzymes above Both Strands of a DNA Sequence and Possible Protein Translation below the Sequence Using the Map Program
    Identification of Specific Restriction Enzyme Cutting Sites, and Sizes of Fragments by Using the MapSort Program
  Comparison of Similarity between Two Sequences
  Translation of Nucleic Acid Sequences into Amino Acid Sequences or an Amino Acid Sequence into a Nucleic Acid Sequence
    Translate
    BackTranslate (Using the Sequencel.pep as an Example)
  Identification of Enzyme Digestion Sites within a Peptide or Protein
  Obtaining Nucleotide and Amino Acid Sequences from GenBank

References


We are living in an era of information explosion. To many molecular biologists, the flood of information is particularly overwhelming because the data from gene cloning, gene mapping, the human genome projects, and DNA, RNA and protein sequencing are growing so rapidly. The fundamental questions are how such an enormous volume of information can be systematically organized and how the desired information can be found in a sea of data. Obviously, the development of powerful computers and smart computer programs plays a major role in informatics. The birth of the Internet, in particular, has brought together scientists worldwide, creating the "information superhighway."

The present chapter provides investigators with an introduction to GenBank, the "headquarters of sequence information."1-5 With the help of the Internet, it is easy to send or receive sequences and look for information inside headquarters without leaving the office or laboratory. The topics covered in this chapter include the use of NCBI programs to deposit a sequence in the database, retrieve sequences from the database and carry out database searches in GenBank.3 The second part of the chapter provides an introduction to computer analysis of DNA or protein sequences using a widely used program: the Genetics Computer Group, or GCG, package, originally developed by the Department of Genetics at the University of Wisconsin.4



Nucleic acid and protein sequences can be submitted to GenBank, the EMBL Data Library, or the DNA Database of Japan (DDBJ), regardless of whether they have been published. In fact, most journals require that the sequence to be published first be submitted to whichever of the three libraries is the most convenient. GenBank, the EMBL and DDBJ are international database partners. Data submitted to one site will be exchanged with another site on a daily basis. Data to be submitted can be saved on a floppy disk or printed out and sent to:

National Center for Biotechnology Information (NCBI)

National Library of Medicine

National Institutes of Health

Building 38A, Room 8N-803

8600 Rockville Pike

The fastest and easiest way of submitting DNA, RNA and protein sequences to the three database sites is via e-mail using the following addresses.

GenBank: [email protected]

EMBL: [email protected]

DDBJ: [email protected]

Special forms for sequence submission are available from these sites by mail or e-mail or from appropriate journals. In general, the questions on the forms are straightforward, including the name and features of the sequence. The common questions are:

1. What is the name of the sequence? (The name is given by the authors, e.g., Rat HSP27 cDNA.)

2. What organism is the source of the DNA?

3. Is it an mRNA (cDNA) or genomic DNA sequence?

4. If it is a genomic DNA, are the intron sequences determined?

5. Has the sequence been published? What is the title of the publication? What is the journal? etc.

A sequence to be submitted should be prepared according to the detailed instructions on the forms. The newest way to prepare and submit a sequence is to obtain access to the World Wide Web. Here is how it is done:

1. Access the World Wide Web via Netscape.

2. Type the address of the NCBI World Wide Web home page, http://www.ncbi.nlm.nih.gov, and press the Enter key. You are now on the NCBI home page. Several programs are displayed on the screen, such as Entrez, BLAST, BankIt, OMIM, Taxonomy and Structure.

3. Click BankIt. The following information will be displayed on the screen, including "To prepare a New GenBank submission, enter the size in nucleotides of your DNA sequence here [ ] and click New," and "To update an existing GenBank submission, press Update," depending on a specific sequence.

4. Click Continue. The electronic submission forms will show up on the screen.

5. Follow the instructions and fill in the appropriate answers in the blanks. Submit the sequence.

For each sequence submitted, a unique accession number will be given, e.g., M86389. The accession number is very important because it is considered to be the name of the sequence in the database. After submission, authors are encouraged to update the sequence, including corrections and publication.


Searches for sequence similarity or homology are performed hundreds or thousands of times every day worldwide. They provide a powerful tool for scientists to determine whether a newly isolated gene, DNA or protein is novel. If it is a known DNA or protein sequence, what is the percentage of similarity or homology compared to other species? How large is the gene family? These interesting questions can be answered by searching the GenBank database. Fortunately, the search does not cost anything as long as a computer and access to the World Wide Web are available. Based on our experience, we will introduce BLAST, which is by no means the most powerful or fastest program. BLAST stands for basic local alignment search tool and represents a family of programs for database searching. This section primarily focuses on three programs: BLASTN, BLASTX and BLASTP.


In this program, the sequence submitted for search is called the query sequence; it can be submitted in a single strand of nucleotides or in both strands. The database will search for any similarity among nucleotide sequences and display similar alignments on the computer screen. If there is not too much of an "information traffic jam," the searching speed is unbelievably fast. The entire search and exhibition take a few seconds or less than 1 min.
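A query is entered as a name line beginning with ">" followed by the sequence, and searching both strands amounts to also considering the reverse complement of the query. The sketch below shows both ideas in plain Python; it is an illustration only, it does not contact NCBI, and the function names and the 60-character line width are assumptions.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (the other strand, 5' to 3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_fasta(name, seq, width=60):
    """Format a sequence as FASTA text: a '>' name line, then wrapped sequence lines."""
    seq = "".join(seq.split()).upper()  # remove spaces and line breaks
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    query = "gcatgtagac gataggatac ag"  # made-up fragment for illustration
    print(to_fasta("DNA X", query))
    print(to_fasta("DNA X (minus strand)", reverse_complement("".join(query.split()))))
```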

Let us take an example of how to carry out a database search. Assume that we have a nucleotide sequence named DNA X and that we wish to know whether any nucleotide sequences in the database are similar to DNA X. The search can be performed using the following procedures.

1. Access the World Wide Web system via Netscape. In the location box, type http://www.ncbi.nlm.nih.gov/ and press the Enter key. Several programs are displayed on the computer screen, including BLAST and NCBI Services.

2. Find NCBI Services and click »Blast Sequence Similarity Searching.

3. Click »Basic Blast search or »Advanced Blast search (advanced is recommended). We are now in the NCBI BLAST program. From the top to the bottom, there are a number of blank boxes to be filled in or prechosen as default.

4. Choose blastn for the program and nr or GenBank for the database. A relatively large box in the middle of the screen is for entering the sequence to be submitted. The nucleotide sequence should be in the FASTA format or written in a Courier font.

5. In the sequence box, type the name of the sequence on the first line. The > must be included immediately before the first letter. Otherwise, the database will not recognize the name and treat it as an unknown sequence. For example, >DNA X is the name here. Starting from the second line, type the sequence or paste a sequence cut from another file. An example is given below (only part of the sequence is shown here for brevity):






6. Edit the sequence as desired. Choose an appropriate option as illustrated below (bold letters):

Advanced options for the Blast service: Expect default; Cut off default; Matrix default; Strand both; Filter default; Description default; Alignments default

7. Check In HTML format to display the search results on the screen, or check send reply to have them sent to an e-mail address. However, results received by e-mail may be displayed with distorted formatting. We recommend checking In HTML format for a clean display.

8. Click Submit Query. A warning sign of a possible review of the sequence by a third party will show up and ask whether to continue or to cancel.

9. Click Continue and the search is carried out. Once it is complete, the sequence alignments will be exhibited, two by two, on the screen.

10. The results can be copied and then pasted into a regular Microsoft Word™ file. The Courier font should be used so that the alignment columns stay lined up as in the search output. The sequences shown next are examples. The submitted sequence is called Query and the aligned sequence from the database is Sbjct. For brevity, only a part of the sequences is shown here. The first part of the results is a summary of different sequences in the database that show similarity to the Query sequence. The second part is the similarity alignments of individual sequences. If the name at the beginning of each sequence is clicked, detailed features of the retrieved sequence appear, including name, features, journal and title and full length of DNA, cDNA/mRNA and protein sequences. One can readily copy the sequences into an MS Word file or other files for further analysis. The last part of the results is mainly statistical analysis.


(787 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences

243,087 sequences; 337,947,647 total letters.


High Probability

Sequences producing High-scoring Segment Pairs Score P(N) N

gb|M86389|RATHSP27A Rat heat shock protein (Hsp27) mRNA,... 3935 0.0 1

gb|S67755|S67755 hsp 27=heat shock protein 27 [rats,... 1440 4.9e-301 4

gb|L11610|MUSHSP25PS Mus musculus heat shock 25 (HSP25)p... 2665 1.3e-266 4

emb|X51747|CLSHSP Cricetulus longicaudatus mRNA for sm... 1711 9.3e-264 4

emb|X14686|MURSPH Murine mRNA(pP25a) for 25-kDa mammal... 1361 5.0e-250 4

gb|M86389|RATHSP27A Rat heat shock protein (Hsp27) mRNA, complete cds. Length = 787

Plus Strand HSPs:

Identities = 787/787 (100%), Positives = 787/787 (100%), Strand = Plus / Plus
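When many hits are returned, the summary lines can be pulled into a table for sorting or filtering. The sketch below parses lines in the layout shown above (database tag, accession, locus name, truncated description, score, P(N) and N). The regular expression and the `parse_hit` helper are hypothetical, written for this older text layout only; they are not part of any NCBI tool, and current BLAST output is formatted differently.

```python
import re

# db|accession|locus description... score P(N) N
HIT_LINE = re.compile(
    r"^(?P<db>\w+)\|(?P<accession>\w+)\|(?P<locus>\S+)\s+"
    r"(?P<description>.*?)\.{3}\s*"
    r"(?P<score>\d+)\s+(?P<p>\S+)\s+(?P<n>\d+)$"
)


def parse_hit(line):
    """Parse one summary line into a dict, or return None if it does not match."""
    match = HIT_LINE.match(line.strip())
    if match is None:
        return None
    hit = match.groupdict()
    hit["score"] = int(hit["score"])
    hit["p"] = float(hit["p"])
    hit["n"] = int(hit["n"])
    return hit


if __name__ == "__main__":
    lines = [
        "gb|M86389|RATHSP27A Rat heat shock protein (Hsp27) mRNA,...3935 0.0 1",
        "gb|S67755|S67755 hsp 27=heat shock protein 27 [rats,... 1440 4.9e-301 4",
        "emb|X51747|CLSHSP Cricetulus longicaudatus mRNA for sm...1711 9.3e-264 4",
    ]
    hits = [h for h in map(parse_hit, lines) if h is not None]
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        print(hit["accession"], hit["score"], hit["p"])
```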

