

PCR AND DNA SEQUENCING MBG-487 Işık G. Yuluğ.




1 PCR AND DNA SEQUENCING MBG-487 Işık G. Yuluğ

2 Polymerase Chain Reaction (PCR)
Each cycle has three steps: DNA melting, primer annealing and DNA elongation. Kary Mullis invented PCR in 1983 and received the Nobel Prize in Chemistry in 1993, at age 48.

3 Exponential nature of PCR amplification

4 PCR Every cycle doubles the number of DNA strands present. After the first few cycles, most of the product strands have the same length as the distance between the primers. The result is a dramatic amplification of the DNA that lies between the primers. The amount of amplification is $2^n$, where n is the number of cycles performed. After 20 cycles this gives approximately a 1 million-fold amplification ($2^{20} \approx 1.05 \times 10^6$); after 40 cycles the amplification would be about $1 \times 10^{12}$.
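The doubling arithmetic can be checked with a short Python sketch. It assumes every cycle is perfectly efficient (the $2^n$ model above); the `efficiency` parameter is a hypothetical addition for illustration and is not defined on the slide.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical copy number after `cycles` rounds of PCR.

    efficiency = 1.0 is perfect doubling every cycle (the 2**n model);
    lower values are a hypothetical per-cycle efficiency, for illustration only.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (20, 30, 40):
    # fold amplification starting from a single template molecule
    print(f"{n} cycles: {pcr_copies(1, n):.3g}-fold")
```

Real reactions fall short of perfect doubling, especially in late cycles, so the 1.0 default is an upper bound.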

5

6 Try for equal Tm for both primers

7 Avoid primer dimer formation
Marginally problematic primer

8 Use software to avoid such problems

9 Typical PCR gel (every PCR should be gel-verified)

10 PCR can be very tricky Optimizing PCR protocols
While PCR is a very powerful technique, it is often not possible to achieve optimum results without optimizing the protocol. Critical PCR parameters:
- Concentration of DNA template, nucleotides, divalent cations (especially Mg2+) and polymerase
- Error rate of the polymerase (Taq, Vent exo, Pfu)
- Primer design

11 Primer design General notes on primer design in PCR
Perhaps the most critical parameter for successful PCR is the design of primers.
Primer selection: the critical variables are
- primer length
- melting temperature (Tm)
- specificity
- complementary primer sequences
- G/C content
- 3'-end sequence
Primer length
- specificity and the annealing temperature are at least partly dependent on primer length
- oligonucleotides between 20 and 30 (up to 50) bases are highly sequence specific
- annealing efficiency decreases as primer length increases: in general, the longer the primer, the less efficient the annealing
- the primers should not be too short, as specificity decreases

12 Primer design Specificity
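Primer specificity is at least partly dependent on primer length: there are many more unique 24-base oligos than there are 15-base oligos. The probability that a sequence of length n occurs randomly in a sequence of length m is approximately
$P_n = (m - n + 1) \times (1/4)^n$
(strictly, this is the expected number of chance occurrences, assuming equal base frequencies; it approximates the probability when it is small).
Example: the mtDNA genome has about 20,000 bases, so the probability $P_n$ of randomly finding a given sequence of length n in it falls rapidly as n increases (example values range from the order of $10^{-2}$ to the order of $10^{-5}$).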
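The formula can be evaluated directly to see how fast chance matches become unlikely as the primer gets longer. This Python sketch uses the round figure of 20,000 bases for mtDNA from the example; the list of lengths is an arbitrary choice.

```python
def chance_occurrences(n: int, m: int) -> float:
    """(m - n + 1) * (1/4)**n: expected chance occurrences of one specific n-base
    sequence in m bases of random DNA with equal base frequencies."""
    return (m - n + 1) * 0.25 ** n


MTDNA_LENGTH = 20_000  # round figure used in the example

for n in (8, 10, 12, 15, 20):
    print(f"n = {n:2d}: P = {chance_occurrences(n, MTDNA_LENGTH):.1e}")
```

Each extra base divides the value by about 4, which is why a few additional nucleotides buy a large gain in specificity.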

13 Primer design Complementary primer sequences
- primers need to be designed with absolutely no intra-primer homology beyond 3 base pairs. If a primer has such a region of self-homology, “snap back” (hairpin formation) can occur
- another related danger is inter-primer homology: partial homology in the middle regions of two primers can interfere with hybridization. If the homology occurs at the 3' end of either primer, primer-dimer formation will occur
G/C content
- ideally a primer should have a near-random mix of nucleotides and a 50% GC content
- there should be no poly-G or poly-C stretches, which can promote non-specific annealing
3'-end sequence
- the 3'-terminal position in PCR primers is essential for controlling mis-priming
- including a G or C residue at the 3' end of primers helps to ensure correct binding (stronger hydrogen bonding of G/C residues)
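These rules can be expressed as simple string tests. The Python sketch below is illustrative only: the function names are hypothetical, k = 4 encodes "beyond 3 base pairs", and it ignores the loop geometry and thermodynamics that real primer-design software takes into account.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(primer: str) -> float:
    """Fraction of G and C bases (about 0.5 is the ideal)."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def ends_in_g_or_c(primer: str) -> bool:
    """True if the 3'-terminal base (last character of a 5'->3' primer) is G or C."""
    return primer.upper()[-1] in "GC"


def longest_run(primer: str, base: str) -> int:
    """Length of the longest homopolymer stretch of `base` (e.g. poly-G)."""
    best = run = 0
    for b in primer.upper():
        run = run + 1 if b == base else 0
        best = max(best, run)
    return best


def complementary_kmers(a: str, b: str, k: int = 4) -> set[str]:
    """k-mers of `a` that can base-pair (antiparallel) with some stretch of `b`.

    Pass the same primer twice to look for self-homology ("snap back" risk).
    """
    a, b = a.upper(), b.upper()
    return {a[i:i + k] for i in range(len(a) - k + 1) if reverse_complement(a[i:i + k]) in b}


def three_prime_dimer(a: str, b: str, k: int = 4) -> bool:
    """True if the last k bases at the 3' end of `a` can pair with `b` (primer-dimer risk)."""
    return reverse_complement(a[-k:]) in b.upper()


fwd = "AGCGTACCTGAAGTCGCATG"  # hypothetical primer
print(gc_content(fwd), ends_in_g_or_c(fwd), longest_run(fwd, "G"), complementary_kmers(fwd, fwd))
```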

14 Primer design Melting temperature (Tm)
- the goal should be to design a primer with an annealing temperature of at least 50°C
- the relationship between annealing temperature and melting temperature is one of the “Black Boxes” of PCR
- a general rule of thumb is to use an annealing temperature 5°C lower than the melting temperature
- the melting temperatures of oligos are most accurately calculated using nearest-neighbor thermodynamic calculations with the formula
$T_m = \dfrac{\Delta H}{\Delta S + R \ln(c/4)} - 273.15 + 16.6 \log_{10}[\mathrm{K}^+]$ (°C)
where ΔH is the enthalpy and ΔS the entropy for helix formation, R is the molar gas constant and c is the primer concentration
- a good working approximation can be calculated using the Wallace formula: $T_m = 4 \times (\#G + \#C) + 2 \times (\#A + \#T)$ °C
- both primers should be designed to have similar melting temperatures. If the primers are mismatched in Tm, amplification will be less efficient or may not work: the primer with the higher Tm will mis-prime at lower temperatures, and the primer with the lower Tm may not work at higher temperatures.
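Both formulas are easy to evaluate in code. In this Python sketch the nearest-neighbor function takes ΔH and ΔS as inputs (no parameter table is bundled, and the values in the usage line are arbitrary), and applying the 5 °C rule to the lower Tm of the pair is an assumption about how to combine two primers.

```python
import math

R = 1.987  # molar gas constant, cal/(K*mol)


def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm = 4*(#G + #C) + 2*(#A + #T) in °C (rough guide for short oligos)."""
    p = primer.upper()
    return 4 * (p.count("G") + p.count("C")) + 2 * (p.count("A") + p.count("T"))


def nearest_neighbor_tm(delta_h_kcal: float, delta_s_cal: float,
                        primer_conc_m: float, k_conc_m: float) -> float:
    """Tm in °C from the nearest-neighbor formula.

    delta_h_kcal (kcal/mol) and delta_s_cal (cal/(K*mol)) must be supplied by the
    caller, summed from a nearest-neighbor parameter table; none is included here.
    """
    tm_kelvin = (delta_h_kcal * 1000.0) / (delta_s_cal + R * math.log(primer_conc_m / 4.0))
    return tm_kelvin - 273.15 + 16.6 * math.log10(k_conc_m)


def annealing_temperature(fwd: str, rev: str, offset: float = 5.0) -> float:
    """Rule of thumb: anneal `offset` °C below the lower of the two Wallace Tm values."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset


fwd, rev = "GGGGCCCCAAAATTTTGCGC", "ATGCATGCATGCATGCATGC"  # hypothetical primers
print(wallace_tm(fwd), wallace_tm(rev), annealing_temperature(fwd, rev))
print(round(nearest_neighbor_tm(-150.0, -400.0, 2.5e-7, 0.05), 1))  # arbitrary ΔH/ΔS
```

The 4 °C difference between these two hypothetical primers is the kind of Tm mismatch the last bullet warns about.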

15 Fidelity of PCR is often an issue

16 Proof-reading activity enzymes

17

18 If complete copies are amplified

19

20

21 LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
PRIMER ANNEALING TEMPERATURE: Increase in temperature: Increases specificity of primer annealing by destabilizing base pair mismatches. Decrease in temperature: Increases the sensitivity (and yield) of the reaction by stabilizing base pairing.

22 LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
DNA POLYMERASE: Enzyme concentration: enzyme concentration affects sensitivity and specificity; too little enzyme produces insufficient product and too much enzyme decreases specificity. Type of DNA polymerase: Taq is the most efficient enzyme but also has the highest error rate; in contrast, Pfu has a lower error rate but synthesizes the least amount of product.

23 LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
MAGNESIUM CONCENTRATION: Varying [MgCl2]: low MgCl2 increases specificity; high MgCl2 stabilizes primer annealing and increases sensitivity, but can also decrease primer specificity.

24 LIST OF PCR REACTION CONDITIONS THAT MUST BE OPTIMIZED
CYCLE PARAMETERS: Denaturation temperature: an elevated denaturation temperature can increase sensitivity by allowing complete template denaturation, especially of G+C-rich targets; however, Taq polymerase activity decreases rapidly above 93 °C. Duration of primer extension: longer primer extension times increase sensitivity in long-distance PCR. Cycle number: assay sensitivity is determined by both the efficiency of the enzyme reaction and the initial number of DNA target molecules; it may be necessary to increase the cycle number beyond 35 if the reaction contains fewer than 10^3 initial target molecules.

25

26

27 Non-specific PCR and how to improve it
[Gel figure; lane labels: +GLY, MARKER, increase in Mg concentration, 5% DMSO, just PCR]

28 PCR enzymes: Taq DNA polymerase
Taq DNA polymerase, the first enzyme used for PCR, is still the most popular.
- high processivity; the least expensive choice
- half-life at 95 °C is 1.6 hours
- generates PCR products with single A overhangs on the 3´-ends (suitable for TOPO cloning, the TOPO cloning system from Invitrogen)

29 The technology behind TOPO Cloning
The key to TOPO Cloning is the enzyme DNA topoisomerase I, which functions both as a restriction enzyme and as a ligase. Its biological role is to cleave and rejoin DNA during replication. Vaccinia virus topoisomerase I specifically recognizes the pentameric sequence 5’-(C/T)CCTT-3’ and forms a covalent bond with the phosphate group of the 3’ thymidine. It cleaves one DNA strand, enabling the DNA to unwind. The enzyme then religates the ends of the cleaved strand and releases itself from the DNA. To harness the religating activity of topoisomerase, TOPO vectors are provided linearized with topoisomerase I covalently bound to each 3’ phosphate. This enables the vectors to readily ligate DNA sequences with compatible ends. In only 5 minutes at room temperature, the ligation is complete and ready for transformation into E. coli.
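Finding the vaccinia topoisomerase I recognition sequence 5'-(C/T)CCTT-3' is a simple pattern search. This Python sketch scans only the strand it is given (written 5'->3'); the function name is hypothetical.

```python
import re

# 5'-(C/T)CCTT-3'; the zero-width lookahead lets overlapping sites be reported
TOPO_SITE = re.compile(r"(?=[CT]CCTT)")


def find_topo_sites(seq: str) -> list[int]:
    """0-based start positions of 5'-(C/T)CCTT-3' on the given strand."""
    return [m.start() for m in TOPO_SITE.finditer(seq.upper())]


print(find_topo_sites("AACCCTTGG"))
```

To search the other strand, run the same function on the reverse complement of the sequence.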

30 From Invitrogen

31 Tth polymerase (Thermus thermophilus strain HB8)
- RNA-dependent DNA polymerase activity in the presence of Mn2+ ions
- DNA-dependent DNA polymerase activity in the presence of Mg2+ ions
- the fragment should ideally be smaller than 1 kb

32 Pfu polymerase (from Pyrococcus furiosus)
Proofreading, or high-fidelity, DNA polymerase.
- copies approx. 2,000,000 nucleotides before making an error (about 1 error per 2 × 10^6 nucleotides); in comparison, Taq DNA polymerase makes approx. 1 error per 10,000 nucleotides
- can tolerate temperatures exceeding 95°C, enabling it to PCR-amplify GC-rich targets
- more expensive
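The practical effect of these error rates can be estimated with a common simplified model: a product strand descends through about d copying events, each copying L bases, so the error-free fraction is roughly $(1 - f)^{Ld}$. This Python sketch uses the per-base rates quoted above; the 1 kb amplicon and 30 doublings are hypothetical inputs, and the model ignores PCR efficiency and error types.

```python
import math


def error_free_fraction(error_rate: float, amplicon_len: int, doublings: int) -> float:
    """Approximate fraction of final PCR products with no polymerase errors.

    Simplified model (an assumption): each product strand has gone through
    `doublings` copying events of `amplicon_len` bases with independent errors.
    """
    return (1.0 - error_rate) ** (amplicon_len * doublings)


for name, rate in (("Taq", 1 / 10_000), ("Pfu", 1 / 2_000_000)):
    frac = error_free_fraction(rate, amplicon_len=1_000, doublings=30)
    print(f"{name}: {frac:.1%} of 1 kb products error-free after 30 doublings")
```

With these inputs the model gives roughly 5% error-free products for Taq against roughly 98.5% for Pfu, which is why proofreading enzymes are preferred when products are cloned or sequenced.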

33 Vent (From Thermococcus litoralis)
also known as Tli polymerase
- very thermostable: half-life at 95 °C is approximately 7 hours
- has 3'->5' exonuclease (proofreading) activity
- Vent error rate is intermediate between Taq and Pfu: 2-5 × 10^-5 errors/bp
Other polymerases:
- Deep Vent (Pyrococcus species GB-D) (New England Biolabs): New England Biolabs claims its fidelity is equal to or greater than that of Vent
- Replinase (Thermus flavus): 1.03 × 10^-4 errors/base

34 Long-Range PCR Use of two polymerases:
A non-proofreading polymerase, Taq, is the main polymerase in the reaction, and a proofreading polymerase (3'->5' exo), Pwo (Pyrococcus woesei), is present at a lower concentration. PCR products of 22-24 kb have been achieved with the Qiagen and Eppendorf PCR mixes (Taq + Pwo). Pwo is very stable: 2 hours at 100 °C.

35 DNA SEQUENCING

36 DNA sequencing: Importance
- Basic blueprint for life
- Gene and protein: function, structure, evolution
- Genome-based diseases: “inborn errors of metabolism”, genetic disorders, genetic predispositions to infection
- Diagnostics
- Therapies

37 DNA sequencing methodologies: 1977!
Maxam-Gilbert
- base modification by general and specific chemicals
- depurination or depyrimidination
- single-strand excision
- not amenable to automation
Sanger
- DNA replication
- substitution of substrate with a chain-terminator chemical
- more efficient
- automation

38 Maxam-Gilbert ‘chemical’ method

39 “bio” based methods Sanger dideoxynucleotides

40 DNA chemistry

41 DNA biochemistry: replication fork

42 SEQUENCING: (Sanger method)
Frederick Sanger (Nobel Prize in Chemistry 1980, shared with Paul Berg and Walter Gilbert)

43 DNA replication: biochemistry
[Figure: deoxynucleotide structure; a nucleotide with a 5′ triphosphate and a purine or pyrimidine base is added to a strand with a free 3′-OH]

44 Dideoxynucleotide blocks chain elongation

45 DNA sequencing: Sanger-II
[Figure: chain termination method; the dideoxynucleotide carries H instead of OH at the 3′ position, so no further nucleotide can be added]

46 Sanger method

47 Methods of sequence visualization:
1. Labeled primer
2. Labeled DNA chain (randomly)
3. Labeled terminators

48 Labelled nucleotide (radioactively)

49 Fluorescent DNA labeling with BigDye

50 Applied Biosystems Inc
Applied Biosystems Inc. designed an automated method that combines PCR with the sequencing reaction itself.

51 DNA sequencing: chemistry

52 DNA sequencing: in practice
Four reactions, each containing template + primer + polymerase + dCTP, dTTP, dGTP, dATP, plus one dideoxynucleotide:
1. ddATP
2. ddGTP
3. ddTTP
4. ddCTP
After extension, the products of each reaction are separated by electrophoresis.
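The logic of reading four ddNTP lanes can be simulated in a few lines. The Python sketch below is idealized (every termination product appears, no band compressions), and the convention that the primer pairs with the 3'-most bases of the template is an assumption made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def sanger_lanes(template: str, primer_len: int) -> dict[str, list[int]]:
    """Idealized band positions (fragment lengths) in the ddATP/ddCTP/ddGTP/ddTTP lanes.

    `template` is written 5'->3'. Assumption: the primer pairs with its 3'-most
    `primer_len` bases, so the new strand is the template's reverse complement.
    A fragment ending in a given base is terminated by the matching ddNTP.
    """
    new_strand = reverse_complement(template)
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for pos in range(primer_len, len(new_strand)):
        lanes[new_strand[pos]].append(pos + 1)  # length includes the primer
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest: the new strand 5'->3' after the primer."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


template = "GGTCCATGAC"  # hypothetical template
read = read_gel(sanger_lanes(template, primer_len=3))
print(read, reverse_complement(read))  # new strand, and the template region it copies
```

The same reading step applies to the single-reaction dye-terminator variant: the lane a band came from is simply replaced by the color of its terminator.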

53 DNA sequencing: upgrade, second iteration, terminator-label
Disadvantages of primer labels: four separate reactions are tedious, and sequencing is limited to certain regions (requiring custom oligos) or to cloned inserts behind ‘universal’ priming sites. Solution: fluorescent dye terminators.

54 DNA sequencing: chemistry
A single reaction: template + primer + polymerase + dCTP, dTTP, dGTP, dATP + ddATP, ddGTP, ddTTP, ddCTP (each terminator carrying its own fluorescent dye), followed by extension and electrophoresis.

55 DNA sequencing: photochemistry

56 DNA sequencing: Computation

57 DNA sequencing: Computation

58 Nucleotides for Sequencing
- Standard nucleotides (A, T, C, G)
- Modified versions of these nucleotides:
  - labeled so they fluoresce
  - structurally different so that they stop DNA synthesis when they are added to a strand

59 Reaction Mixture
- Copies of DNA to be sequenced
- Primer
- DNA polymerase
- Standard nucleotides
- Modified nucleotides

60 Reactions Proceed
Nucleotides are assembled to create complementary strands. When a modified nucleotide is included, synthesis stops. The result is millions of tagged copies of varying length.

61 Recording the Sequence
DNA is placed on the gel. Fragments move off the gel in size order and pass through a laser beam; the color each fragment fluoresces is recorded on the printout.
[Figure: fragments T, TC, TCC, … of the sequence TCCATGGACCA migrating through the gel, one fragment passing through the laser beam]

62 DNA Sequencing Goal: Find the complete sequence of A, C, G, T’s in DNA
Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence ~500 letters at a time

63 DNA sequencing – vectors
DNA is sheared (“shaken”) into fragments, which are inserted at a known location (restriction site) into a vector: a circular genome (bacterium, plasmid).

64 Different types of vectors
Size of insert (bases) by vector type:
- Plasmid: 2,000-10,000 (can control the size)
- Cosmid: 40,000
- BAC (Bacterial Artificial Chromosome): 70, ,000
- YAC (Yeast Artificial Chromosome): > 300,000 (not used much recently)

65 DNA sequencing – gel electrophoresis
- Start at the primer (restriction site)
- Grow the DNA chain
- Include dideoxynucleotides (modified A, C, G, T)
- These stop the reaction at all possible points
- Separate the products by length using gel electrophoresis

66 Electrophoresis diagrams

67 Challenging to read answer

68 Challenging to read answer

69 Challenging to read answer

70 Reading an electropherogram
- Filtering
- Smoothing
- Correction for length compressions
- A method for calling the letters: PHRED
PHRED (PHil’s Read EDitor, by Phil Green)
- Based on dynamic programming
- Several better methods exist, but labs are reluctant to change

71 Output of PHRED: a read. A read: 500-700 nucleotides
A C G A A T C A G … A … with a quality score for each base (e.g. 21)
Quality scores: $Q = -10 \log_{10} \mathrm{Prob}(\mathrm{Error})$
- Reads can be obtained from the leftmost or rightmost ends of the insert
- Double-barreled sequencing: both leftmost and rightmost ends are sequenced
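The quality score definition converts directly between scores and error probabilities. In this Python sketch, summing per-base error probabilities to estimate the expected number of errors in a read is an added illustration rather than something the slide describes.

```python
import math


def phred_quality(p_error: float) -> float:
    """Phred quality Q = -10 * log10(P(error))."""
    return -10.0 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Inverse mapping: P(error) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def expected_errors(qualities: list[float]) -> float:
    """Expected number of miscalled bases in a read (sum of per-base error probabilities)."""
    return sum(error_probability(q) for q in qualities)


print(phred_quality(0.01), error_probability(21))
```

A score of 21, for example, corresponds to an error probability of about 0.008 ($10^{-2.1}$), and every 10 points reduces the error probability tenfold.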

72 Method to sequence longer regions
The genomic segment is cut many times at random (shotgun); one or two reads (~500 bp each) are obtained from each fragment.

73 Reconstructing the Sequence (Fragment Assembly)
Cover the region with reads at ~7-fold redundancy (7X), then overlap the reads and extend them to reconstruct the original genomic region.
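One simple way to "overlap reads and extend" is greedy merging: repeatedly join the two reads with the longest suffix-prefix overlap. The Python sketch below assumes error-free reads from one strand and a hypothetical minimum overlap of 3 bases; it is a toy, not the algorithm of any particular assembler, and repeats can make it merge the wrong reads.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest k >= min_len such that the last k bases of a equal the first k of b (else 0)."""
    best = 0
    for k in range(min_len, min(len(a), len(b)) + 1):
        if a[-k:] == b[:k]:
            best = k
    return best


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)] + [merged]
    return contigs


print(greedy_assemble(["ACGTTGCA", "TGCAGGTA", "GGTACCTT"]))  # hypothetical reads
```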

74 Definition of Coverage
Length of genomic segment: L; number of reads: n; length of each read: l.
Definition: coverage $C = nl/L$.
How much coverage is enough? Lander-Waterman model: assuming a uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides.
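Under the Lander-Waterman assumptions the expected number of gaps is roughly $n e^{-C}$. The Python sketch below assumes any overlap between reads can be detected (no minimum-overlap correction); with L = 1,000,000 and l = 500 (the read length used on these slides), C = 10 gives about 0.9 gaps, consistent with the figure quoted above.

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """C = n * l / L."""
    return n_reads * read_len / genome_len


def expected_gaps(c: float, genome_len: int, read_len: int) -> float:
    """Lander-Waterman estimate of the number of gaps, n * exp(-C),
    assuming uniformly placed reads and that any overlap is detectable."""
    n_reads = c * genome_len / read_len
    return n_reads * math.exp(-c)


def uncovered_fraction(c: float) -> float:
    """Expected fraction of bases covered by no read: exp(-C)."""
    return math.exp(-c)


for c in (5, 7, 10):
    print(c, round(expected_gaps(c, 1_000_000, 500), 2), f"{uncovered_fraction(c):.1e}")
```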

75 Challenges with Fragment Assembly
- Sequencing errors: ~1-2% of bases are wrong
- Repeats: false overlaps due to repeats

76 Repeats
Bacterial genomes: 5%; mammals: 50%
Repeat types:
- Low-complexity DNA (e.g. ATATATATACATA…)
- Microsatellite repeats (a1…ak)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)
- Transposons
  - SINE (Short Interspersed Nuclear Elements), e.g. ALU: ~300 bp long, 10^6 copies
  - LINE (Long Interspersed Nuclear Elements): ~500-5,000 bp long, 200,000 copies
  - LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV
- Gene families: genes duplicate and then diverge (paralogs)
- Recent duplications: ~100,000 bp long, very similar copies

77 Strategies for whole-genome sequencing
1. Hierarchical (clone-by-clone): yeast, worm, human
- Break the genome into many long fragments
- Map each long fragment onto the genome
- Sequence each fragment with shotgun
2. Online version of (1), walking: rice genome
- Start sequencing each fragment with shotgun
- Construct the map as you go
3. Whole-genome shotgun: fly, human, mouse, rat, fugu
- One large shotgun pass on the whole genome

78 Hierarchical Sequencing

79 Hierarchical Sequencing Strategy
1. Obtain a large collection of BAC clones
2. Map them onto the genome (physical mapping)
3. Select a minimum tiling path
4. Sequence each clone in the path with shotgun
5. Assemble
6. Put everything together
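Once clones are placed on the map, choosing a minimum tiling path is an interval-covering problem. The Python sketch below uses the standard greedy algorithm (always take the clone that extends coverage furthest); it assumes clone coordinates are already known and treats clones that merely touch as contiguous, whereas a real tiling path needs overlapping clones.

```python
def minimum_tiling_path(clones: list[tuple[int, int]], genome_len: int) -> list[tuple[int, int]]:
    """Fewest clones (start, end), end exclusive, covering [0, genome_len).

    Raises ValueError if the clone map leaves a gap.
    """
    clones = sorted(clones)
    path: list[tuple[int, int]] = []
    covered = 0
    i = 0
    while covered < genome_len:
        best = None
        # among clones starting inside the covered region, keep the one reaching furthest
        while i < len(clones) and clones[i][0] <= covered:
            if best is None or clones[i][1] > best[1]:
                best = clones[i]
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[1]
    return path


bacs = [(0, 150), (20, 200), (100, 300), (180, 260), (250, 400), (290, 390)]  # hypothetical map
print(minimum_tiling_path(bacs, 400))
```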

80 Methods of physical mapping
Goal: map the clones relative to one another, and use the map to select a minimal tiling set of clones to sequence.
Methods:
1. Hybridization
2. Digestion

81 1. Hybridization
Short words, the probes, attach to complementary words.
- Construct many probes p1, p2, …, pn
- Treat each clone Ci with all probes
- Record all attachments (Ci, pj)
- If the same words attach to clones X and Y, then X and Y overlap
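The overlap test can be sketched in Python by treating each probe as a short word. Because clones are double-stranded, a probe is counted as attaching if the word or its reverse complement occurs in the clone; exact matching and the example sequences are simplifications for illustration.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_attachments(clones: dict[str, str], probes: list[str]) -> dict[str, set[str]]:
    """Record the (Ci, pj) attachments as a set of probes per clone."""
    return {
        name: {p for p in probes if p in seq or reverse_complement(p) in seq}
        for name, seq in clones.items()
    }


def overlapping_clones(clones: dict[str, str], probes: list[str]) -> dict[tuple[str, str], set[str]]:
    """Pairs of clones sharing at least one probe, with the shared probes."""
    hits = probe_attachments(clones, probes)
    pairs = {}
    for x, y in combinations(sorted(clones), 2):
        shared = hits[x] & hits[y]
        if shared:
            pairs[(x, y)] = shared
    return pairs


clones = {"C1": "AAAACCGGTTGCATT", "C2": "GCATTACGTAC", "C3": "TTTTCCCCTT"}  # hypothetical
print(overlapping_clones(clones, ["GCAT", "ACGT", "CCGG", "GGGG"]))
```

A shared probe is only evidence of overlap; a probe that falls in a repeat can attach to clones from unrelated parts of the genome.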

82 2. Digestion
Restriction enzymes cut DNA where specific words appear.
- Cut each clone separately with an enzyme
- Run the fragments on a gel and measure their lengths
- If clones Ca and Cb share fragments of lengths { li, lj, lk }, they overlap
- Double digestion: cut with enzyme A, with enzyme B, then with enzymes A + B
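Restriction fingerprints can be compared the same way in code. In this Python sketch EcoRI's site GAATTC serves only as an example, each cut is placed at the start of the recognition site (a simplification; real enzymes cut at a fixed position within it), and clones are compared by the fragment lengths they share. Passing two sites gives a double digest.

```python
from collections import Counter


def digest(seq: str, sites: tuple[str, ...]) -> list[int]:
    """Sorted fragment lengths after cutting at every occurrence of any site."""
    seq = seq.upper()
    cuts = sorted({i for s in sites for i in range(len(seq) - len(s) + 1) if seq.startswith(s, i)})
    bounds = [0, *cuts, len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def shared_fragments(a: str, b: str, sites: tuple[str, ...]) -> list[int]:
    """Fragment lengths present in both digests (a hint that the clones overlap)."""
    common = Counter(digest(a, sites)) & Counter(digest(b, sites))
    return sorted(common.elements())


clone_a = "TTTGAATTCAAAAAAGAATTCGG"  # hypothetical clones
clone_b = "GAATTCAAAAAAGAATTCCCCC"
print(digest(clone_a, ("GAATTC",)), digest(clone_b, ("GAATTC",)))
print(shared_fragments(clone_a, clone_b, ("GAATTC",)))
```

Equal lengths do not prove identical fragments, which is one reason double digests add useful information.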

83 Online Clone-by-clone The Walking Method

84 The Walking Method
1. Build a very redundant library of BACs with sequenced clone ends (cheap to build)
2. Sequence some “seed” clones
3. “Walk” from the seeds, using clone ends to pick library clones that extend left and right

85 Walking: An Example

86 Advantages & Disadvantages of Hierarchical Sequencing
Hierarchical sequencing: ADV. easy assembly; DIS. must build a library and physical map, and involves redundant sequencing
Whole Genome Shotgun (WGS): ADV. no mapping, no redundant sequencing; DIS. difficult to assemble and to resolve repeats
The Walking method: motivation
- Sequence the genome clone by clone without a physical map
- The only costs involved are a library of end-sequenced clones (cheap) and sequencing

87 Walking off a Single Seed
- Little redundant sequencing
- Many sequential steps

88 Walking off a single clone is impractical
Cycle time to process one clone: 1-2 months
1. Grow clone
2. Prepare and shear DNA
3. Prepare shotgun library and perform shotgun sequencing
4. Assemble in a computer
5. Close remaining gaps
A mammalian genome would need 15,000 walking steps!

89 Walking off several seeds in parallel
Efficient: few sequential steps. Inefficient: additional redundant sequencing.
In general, a genome can be sequenced in ~5 walking steps, with <20% redundant sequencing.

90 Using Two Libraries
Most inefficiency comes from closing a small gap (“ocean”) with a much larger clone. Solution: use a second library of small clones.

91 Whole-Genome Shotgun Sequencing

92 Whole Genome Shotgun Sequencing
The genome is cut many times at random into plasmid (2-10 kbp) and cosmid (40 kbp) inserts; forward-reverse paired reads (~500 bp each) are sequenced from both ends of each insert, a known distance apart.




