1
DNA Sequencing
2
DNA sequencing
How we obtain the sequence of nucleotides of a species
…ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…
3
Human Genome Project
1990: Start
2000: Bill Clinton: “most important scientific discovery in the 20th century”
2001: Draft
2003: Finished
3 billion basepairs
$3 billion
now what?
4
Which representative of the species?
Which human?
Answer one:
Answer two: it doesn’t matter
Polymorphism rate: number of letter changes between two different members of a species
Humans: ~1/1,000
Other organisms have much higher polymorphism rates
Population size!

Just beneath the surface of the ocean floats a tiny egg. Within its hull of follicle cells the embryo of Ciona, a sea squirt or Ascidian, is beginning to develop. The shape of the egg does not give any clue of the creature that it is going to be. Now about a third of a millimeter, it is waiting to become a larva, and this larva is a rather surprising creature.

The larva is developing within a few weeks. A head and a tail are evolving. When you look carefully you can see a rod that stiffens the tail. We know such a device as a notochord. A trained zoologist would identify the animal as belonging to the Phylum Chordata, and that is the same group we humans belong to. (See footnote). Also the internal organs are beginning to show. But the image is a bit deceiving. What seems to be an eye is in fact a device for equilibrium called the 'Otolith'. Above it we find the light sense organ, the 'Ocellus'. So there is something suspicious about these so-called sea squirts.

Freed from its shell, a familiar form has appeared. Clearly the resemblance with a tadpole larva is seen. The little creature swims for a few hours to find a good spot somewhere on a solid surface. Then a surprising thing happens. This larva is not going to evolve into a fish, amphibian or anything like that. With the front of its head it attaches itself to a surface. Within minutes resorption of the larva tail commences, and the sea squirt will stay on that same spot for all its life.
5
Why humans are so similar
Out of Africa ~40,000 years ago
A small population that interbred reduced the genetic variation
Heterozygosity: $H = 4Nu / (1 + 4Nu)$
$u \sim 10^{-8}$, $N \sim 10^{4}$
$H \sim 4 \times 10^{-4}$
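As a quick check of the heterozygosity estimate on this slide, the formula can be evaluated directly. This is a minimal sketch; the function name and argument names are illustrative and not part of the original slides.

```python
def heterozygosity(pop_size: float, mutation_rate: float) -> float:
    """Expected heterozygosity H = 4Nu / (1 + 4Nu)."""
    theta = 4 * pop_size * mutation_rate
    return theta / (1 + theta)


# Values quoted on the slide: N ~ 10^4, u ~ 10^-8 per base.
h = heterozygosity(pop_size=1e4, mutation_rate=1e-8)
print(f"H = {h:.2e}")
```

Because 4Nu is much smaller than 1 here, H is very close to 4Nu itself, which is where the rounded value of about 4 × 10^-4 comes from.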
6
There is never “enough” sequencing
Somatic mutations (e.g., HIV, cancer)
100 million species
Sequencing is a functional assay
7 billion individuals
7
Sequencing Growth
Cost of one human genome
2004: $30,000,000
2008: $100,000
2010: $10,000
2014: “$1,000” (???)
???: $300
How much would you pay for a smartphone?
8
Ancient sequencing technology – Sanger Vectors
[Figure: DNA is sheared (“Shake”) into DNA fragments. Vector: circular genome (bacterium, plasmid) with a known location (restriction site). Fragment + vector = …]
9
Ancient sequencing technology – Sanger Gel Electrophoresis
Start at primer (restriction site)
Grow DNA chain
Include dideoxynucleotides (modified A, C, G, T)
Stops reaction at all possible points
Separate products by length, using gel electrophoresis
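The readout logic of these steps can be illustrated with a toy model. The sketch below is idealized: it assumes exactly one terminated product for every possible length and ignores dyes, lanes, and noise. The strand `synthesized` and both function names are hypothetical.

```python
def sanger_fragments(strand: str) -> list[str]:
    """Chain-terminated products: one prefix of the grown strand per stop point."""
    return [strand[:i] for i in range(1, len(strand) + 1)]


def read_gel(fragments: list[str]) -> str:
    """Order products by length (the gel's job) and read each terminal base."""
    by_length = sorted(set(fragments), key=len)
    return "".join(fragment[-1] for fragment in by_length)


synthesized = "ACGTGACTGAGG"  # hypothetical strand grown from the primer
products = sanger_fragments(synthesized)

# Reading terminal bases from shortest to longest product recovers the strand,
# regardless of the order in which the products were generated.
recovered = read_gel(list(reversed(products)))
print(recovered)
```

The point of the sketch is that sorting by length alone is enough: the last base of the product of length i is base i of the sequence.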
10
Fluorescent Sanger sequencing trace
Lane signal (Real fluorescent signals from a lane/capillary are much uglier than this).
A bunch of magic to boost signal/noise, correct for dye-effects, mobility differences, etc., generates the ‘final’ trace (for each capillary of the run)
Trace
Recombinant DNA: Genes and Genomes. 3rd Edition (Dec06). WH Freeman Press.
Slide Credit: Arend Sidow
11
Making a Library (present)
Shear to ~500 bases
Put on linkers
Left handle: amplification, sequencing
“Insert”
Right handle: amplification, sequencing
Eventual forward and reverse sequence
PCR to obtain preparative quantities
Size selection on preparative gel
Final library (~600 bp incl. linkers) after size selection
Slide Credit: Arend Sidow
12
Library
Library is a massively complex mix of (initially) individual, unique fragments
Library amplification mildly amplifies each fragment to retain the complexity of the mix while obtaining preparative amounts (how many-fold do 10 cycles of PCR amplify the sample?)
Slide Credit: Arend Sidow
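The question about 10 cycles of PCR can be answered with a one-line model, assuming each cycle ideally doubles every fragment. The `efficiency` parameter below is an illustrative addition (1.0 means perfect doubling); real reactions fall short of this.

```python
def pcr_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Fold amplification after `cycles` rounds; each round multiplies by (1 + efficiency)."""
    return (1 + efficiency) ** cycles


print(pcr_fold(10))  # ideal doubling: 2**10 = 1024.0
```

Under the ideal-doubling assumption, 10 cycles give roughly a thousand-fold amplification.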
13
Fragment vs Mate pair (‘jumping’)
(Illumina has new kits/methods with which mate pair libraries can be built with less material) Slide Credit: Arend Sidow
14
Illumina cluster concept
Slide Credit: Arend Sidow
15
Cluster generation (‘bridge amplification’)
Slide Credit: Arend Sidow
16
Clonally Amplified Molecules on Flow Cell
Slide Credit: Arend Sidow
17
Illumina Sequencing: Reversible Terminators
[Figure: reversible-terminator chemistry cycle. Incorporate (nucleotide carries a fluorophore on a cleavage site; 3’ OH is blocked), Detection, Deblock and Cleave off Dye (free 3’ end), Ready for Next Cycle.]
Slide Credit: Arend Sidow
18
Sequencing by Synthesis, One Base at a Time
3’- …-5’
G T C T A G A G A C T T C T G
Cycle 1: Add sequencing reagents; first base incorporated; remove unincorporated bases; detect signal
Cycle 2-n: Add sequencing reagents and repeat
Slide Credit: Arend Sidow
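The cycle structure can be mimicked in a few lines. This is a simplified sketch that assumes one error-free incorporation per cycle; the template string is hypothetical (it is not taken from the slide's figure), and the function name is illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template_3_to_5: str, cycles: int) -> str:
    """Return the read built up one base per cycle, in 5'->3' order."""
    read = []
    for template_base in template_3_to_5[:cycles]:
        # Incorporate: exactly one blocked, dye-labeled complementary base is added.
        incorporated = COMPLEMENT[template_base]
        # Detect: the dye identifies the base; it is then cleaved and the 3' end deblocked.
        read.append(incorporated)
    return "".join(read)


template = "CAGATCTC"  # hypothetical template strand, written 3'->5'
print(sequence_by_synthesis(template, cycles=5))  # GTCTA
```

The read length is set by the number of cycles run, not by the fragment length, which is why the loop stops after `cycles` bases.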
19
Read Mapping
Slide Credit: Arend Sidow
20
Variation Discovery
Slide Credit: Arend Sidow
21
Amount of variation – types of lesions
Mutation Types
1000 Genomes consortium pilot paper, Nature, 2010
3,000,000
“we’re heterozygous in every thousandth base of our genome”
Slide Credit: Arend Sidow
22
Method to sequence longer regions
Genomic segment
Cut many times at random (Shotgun)
Get one or two reads from each segment (~900 bp each)
23
Two main assembly problems
De Novo Assembly
Resequencing
24
Reconstructing the Sequence (De Novo Assembly)
Reads
Cover region with high redundancy
Overlap & extend reads to reconstruct the original genomic region
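The overlap-and-extend idea can be sketched as a greedy merge of the best suffix/prefix overlap. This is a toy illustration under strong assumptions (error-free reads, a single strand, no repeats, no read contained in another), not a production assembler. The genome string, the read layout, and the function names are all made up for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


genome = "ATGCGTACCTGAAGTCCGAT"  # hypothetical repeat-free region
# Reads of length 8 starting every 4 bases, given in scrambled order.
reads = [genome[8:16], genome[0:8], genome[12:20], genome[4:12]]
contigs = greedy_assemble(reads)
print(contigs)
```

The example genome was chosen so that no 3-mer occurs twice; once repeats are present, the same greedy merge can glue together regions that are not adjacent in the genome.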
25
Definition of Coverage
Length of genomic segment: G
Number of reads: N
Length of each read: L
Definition: Coverage $C = NL / G$
How much coverage is enough?
Lander-Waterman model: $\Pr[\text{bp not covered}] = e^{-C}$
Assuming uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides
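The coverage numbers can be checked numerically. The sketch below uses the slide's definitions plus the standard Lander-Waterman estimate that the expected number of gaps is about N·e^(-C). The read length of 500 bp is an assumption chosen for illustration, and the function names are not from the slides.

```python
import math


def coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Coverage C = N * L / G."""
    return num_reads * read_len / genome_len


def prob_uncovered(c: float) -> float:
    """Probability that a given base is covered by no read: e^(-C)."""
    return math.exp(-c)


def expected_gaps(num_reads: int, c: float) -> float:
    """Lander-Waterman estimate of the number of gaps: N * e^(-C)."""
    return num_reads * math.exp(-c)


G = 1_000_000   # genomic segment length in bp
L = 500         # assumed read length in bp
N = 20_000      # number of reads, chosen so that C = 10

c = coverage(N, L, G)
print(f"C = {c:.1f}")
print(f"P[bp not covered] = {prob_uncovered(c):.2e}")
print(f"expected gaps per {G:,} bp = {expected_gaps(N, c):.2f}")
```

With these assumed values the expected number of gaps comes out just under one per million bases, in line with the slide's figure; longer reads at the same coverage would give somewhat fewer gaps.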
26
Repeats
Bacterial genomes: 5%
Mammals: 50%
Repeat types:
Low-Complexity DNA (e.g. ATATATATACATA…)
Microsatellite repeats $(a_1 \ldots a_k)^N$ where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)
Transposons
  SINE (Short Interspersed Nuclear Elements), e.g., ALU: ~300-long, $10^6$ copies
  LINE (Long Interspersed Nuclear Elements): ~4000-long, 200,000 copies
  LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV
Gene Families: genes duplicate & then diverge (paralogs)
Recent duplications: ~100,000-long, very similar copies
27
Sequencing and Fragment Assembly
AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT
$3 \times 10^{9}$ nucleotides
50% of human DNA is composed of repeats
Error! Glued together two distant regions
28
What can we do about repeats?
Two main approaches:
Cluster the reads
Link the reads