Download presentation
Presentation is loading. Please wait.
1
DNA Sequencing
2
DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…
3
Which representative of the species? Which human? Answer one: Answer two: it doesn’t matter Polymorphism rate: number of letter changes between two different members of a species Humans: ~1/1,000 Other organisms have much higher polymorphism rates Population size!
4
Why humans are so similar A small population that interbred reduced the genetic variation Out of Africa ~ 40,000 years ago Out of Africa Heterozygosity: H H = 4Nu/(1 + 4Nu) u ~ 10 -8, N ~ 10 4 H ~ 4 10 -4 N
5
DNA Sequencing Goal: Find the complete sequence of A, C, G, T’s in DNA Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence ~900 letters at a time
6
DNA Sequencing – vectors + = DNA Shake DNA fragments Vector Circular genome (bacterium, plasmid) Known location (restriction site)
7
Different types of vectors VECTORSize of insert Plasmid 2,000-10,000 Can control the size Cosmid40,000 BAC (Bacterial Artificial Chromosome) 70,000-300,000 YAC (Yeast Artificial Chromosome) > 300,000 Not used much recently
8
DNA Sequencing – gel electrophoresis 1.Start at primer(restriction site) 2.Grow DNA chain 3.Include dideoxynucleoside (modified a, c, g, t) 4.Stops reaction at all possible points 5.Separate products with length, using gel electrophoresis
9
Current Sanger Sequencing Capacity 3730xl: 690 Kbp/day, 900bp reads … 1550 Kbp/day, 700bp reads … 2100 Kbp/day, 550bp reads
10
Reconstructing the Sequence (Fragment Assembly) Cover region with high redundancy Overlap & extend reads to reconstruct the original genomic region reads
11
Repeats Bacterial genomes:5% Mammals:50% Repeat types: Low-Complexity DNA (e.g. ATATATATACATA…) Microsatellite repeats (a 1 …a k ) N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) Transposons SINE (Short Interspersed Nuclear Elements) e.g., ALU: ~300-long, 10 6 copies LINE (Long Interspersed Nuclear Elements) ~4000-long, 200,000 copies LTR retroposons (Long Terminal Repeats (~700 bp) at each end) cousins of HIV Gene Families genes duplicate & then diverge (paralogs) Recent duplications ~100,000-long, very similar copies
12
Sequencing and Fragment Assembly AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT 3x10 9 nucleotides 50% of human DNA is composed of repeats Error! Glued together two distant regions
13
What can we do about repeats? Two main approaches: Cluster the reads Link the reads
14
What can we do about repeats? Two main approaches: Cluster the reads Link the reads
15
What can we do about repeats? Two main approaches: Cluster the reads Link the reads
16
Pyrosequencing on a chip $10K 400,000 reads/day 200-300 bp/read => ~100Mbp/day $10K 1M reads/day 500 bp/read => ~500Mbp/day
17
Single Molecule Array for Genotyping—Solexa ~$10K ~1.5 Mbp, 36bp reads 3 days
18
Polony Sequencing
19
Nanopore Sequencing http://www.mcb.harvard.edu/branton/index.htm
20
Molecular Inversion Probes
21
Illumina Genotype Arrays
22
Summary – Sequencing in 2008 Mb /day Read Length Paired?Cost Sanger 1.5900-? Pyrosequencing 100 (400) 200 (400) ½ length$10K Solexa/ABI 50036Yes$3K Polony 200025?No$10K Genotyping Array 1M SNP /chip N.A. $300
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.