Worked Problems

1. A molecule of double-stranded DNA that is 5 million base pairs long has a base composition that is 62% G + C. How many times, on average, are the following restriction sites likely to be present in this DNA molecule?

(a) BamHI (recognition sequence = GGATCC)

(b) HindIII (recognition sequence = AAGCTT)

(c) HpaII (recognition sequence = CCGG)

The percentages of G and C are equal in double-stranded DNA; so, if G + C = 62%, then %G = %C = 62%/2 = 31%. The percentage of A + T = 100% - (G + C) = 38%, and %A = %T = 38%/2 = 19%. To determine the probability of finding a particular base sequence, we use the multiplication rule, multiplying together the probabilities of finding each base at a particular site.

(a) The probability of finding the sequence GGATCC = 0.31 X 0.31 X 0.19 X 0.19 X 0.31 X 0.31 = 0.0003334. To determine the average number of recognition sequences in a 5-million-base-pair piece of DNA, we multiply 5,000,000 bp X 0.0003334 = approximately 1667 recognition sequences.

(b) The number of AAGCTT recognition sequences is 0.19 X 0.19 X 0.31 X 0.31 X 0.19 X 0.19 X 5,000,000 = approximately 626 recognition sequences.

(c) The number of CCGG recognition sequences is 0.31 X 0.31 X 0.31 X 0.31 X 5,000,000 = 46,176 recognition sequences.
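The calculation for Problem 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original solution: the function name `expected_sites` and its parameters are made up for this example, and it assumes, as the worked solution does, that bases occur independently at each position and that the number of possible start positions is approximately the length of the molecule.

```python
def expected_sites(site, gc_fraction, length_bp):
    """Expected number of times `site` occurs in random DNA.

    Assumes each base is drawn independently, with P(G) = P(C) and
    P(A) = P(T), and treats every base pair as a possible start
    position (edge effects are ignored).
    """
    p_gc = gc_fraction / 2        # probability of G, and of C
    p_at = (1 - gc_fraction) / 2  # probability of A, and of T
    base_prob = {"G": p_gc, "C": p_gc, "A": p_at, "T": p_at}

    # Multiplication rule: multiply the per-base probabilities together.
    prob = 1.0
    for base in site:
        prob *= base_prob[base]
    return prob * length_bp


for enzyme, site in [("BamHI", "GGATCC"), ("HindIII", "AAGCTT"), ("HpaII", "CCGG")]:
    print(enzyme, site, round(expected_sites(site, 0.62, 5_000_000)))
```

For HpaII this reproduces the figure of about 46,176 sites given in part (c); the four-base site is far more frequent than either six-base site because two fewer probabilities are multiplied together.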

2. A protein has the following amino acid sequence:

Met-Leu-Arg-Ser-Arg-Met-Tyr-Trp-Asp-His-Glu-Thr

You wish to make a set of probes to screen a cDNA library for the sequence that encodes this protein. Your probes should be at least 18 nucleotides in length.

(a) Which amino acids in the protein should be used so that the smallest number of probes is required? (Consult the genetic code in Figure 15.12.)

(b) How many different sequences must be synthesized to be certain that you will find the correct cDNA sequence that specifies the protein?

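We first write out all the codons that can specify all the amino acids in the protein, using the genetic code in Figure 15.12 (see table below).

Position  Amino acid  Possible codons
1         Met         AUG
2         Leu         UUA, UUG, CUU, CUC, CUA, CUG
3         Arg         CGU, CGC, CGA, CGG, AGA, AGG
4         Ser         UCU, UCC, UCA, UCG, AGU, AGC
5         Arg         CGU, CGC, CGA, CGG, AGA, AGG
6         Met         AUG
7         Tyr         UAU, UAC
8         Trp         UGG
9         Asp         GAU, GAC
10        His         CAU, CAC
11        Glu         GAA, GAG
12        Thr         ACU, ACC, ACA, ACG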

(a) The 18-nucleotide region encoding amino acids 6 through 11 (Met-Tyr-Trp-Asp-His-Glu) should be used, because this stretch of six amino acids has the fewest possible codon combinations.

(b) For amino acids 6 through 11, there is one possible codon for Met, two for Tyr, one for Trp, two for Asp, two for His, and two for Glu. Thus 1 X 2 X 1 X 2 X 2 X 2 = 16 possible sequences must be synthesized to locate the gene.
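The choice of region in Problem 2 can be checked by scanning every window of six consecutive amino acids (18 nucleotides) and multiplying the number of codons for each residue. The sketch below is illustrative only: the names `CODON_COUNT`, `protein`, and `best_probe_window` are invented for this example, and the codon counts are those of the standard genetic code for the amino acids that appear in this protein.

```python
from math import prod

# Number of codons that specify each amino acid in the standard genetic
# code (only the amino acids found in this protein are listed).
CODON_COUNT = {
    "Met": 1, "Trp": 1,
    "Tyr": 2, "Asp": 2, "His": 2, "Glu": 2,
    "Thr": 4,
    "Leu": 6, "Arg": 6, "Ser": 6,
}

protein = "Met-Leu-Arg-Ser-Arg-Met-Tyr-Trp-Asp-His-Glu-Thr".split("-")


def best_probe_window(residues, n_residues=6):
    """Return (start_position, n_probes) for the window needing the fewest probes.

    Positions are 1-based. The number of probes for a window is the
    product of the codon counts of its amino acids.
    """
    best = None
    for start in range(len(residues) - n_residues + 1):
        window = residues[start:start + n_residues]
        n_probes = prod(CODON_COUNT[aa] for aa in window)
        if best is None or n_probes < best[1]:
            best = (start + 1, n_probes)
    return best


start, n_probes = best_probe_window(protein)
print(f"amino acids {start} through {start + 5}: {n_probes} probes")
```

The scan selects amino acids 6 through 11 with 16 probes, in agreement with parts (a) and (b). Windows that include Leu, Arg, or Ser are much more expensive, because each of those amino acids multiplies the probe count by six.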
