Sequencing Technologies: 2nd Generation (“NextGen”) Sequencing Technologies. “Fantastic”: bizarre or exotic; seeming more appropriate to a fairy tale than to reality or practical use.
Read Length is Not As Important For Resequencing (Jay Shendure)
Paired End Reads are Important! Read 1 and Read 2 are separated by a known distance. A single read from repetitive DNA maps to multiple positions; a paired read whose mate falls in unique DNA maps uniquely.
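The paired-end argument can be sketched with a toy reference. This is only an illustration, not an aligner: the reference string, the two reads, and the helper names `find_all` and `place_pair` are invented for the example, matching is exact, and both reads are assumed to come from the same strand.

```python
def find_all(reference, read):
    """Return every 0-based start position where read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def place_pair(reference, read1, read2, distance, tolerance=0):
    """Return (pos1, pos2) placements whose start-to-start spacing fits the known distance."""
    placements = []
    for p1 in find_all(reference, read1):
        for p2 in find_all(reference, read2):
            if abs((p2 - p1) - distance) <= tolerance:
                placements.append((p1, p2))
    return placements


# Hypothetical reference: the repeat occurs twice, separated by unique DNA.
repeat = "ACACACGT"
reference = "TTG" + repeat + "GGATCCTAGC" + repeat + "CATTGAGCAA"

read1 = repeat        # read 1 falls in repetitive DNA
read2 = "CATTGAGC"    # read 2 falls in unique DNA

single_hits = find_all(reference, read1)                      # ambiguous on its own
pair_hits = place_pair(reference, read1, read2, distance=8)   # resolved by the mate
print(single_hits, pair_hits)
```

On its own, read 1 has two candidate positions; requiring read 2 at the known distance leaves a single placement.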
Emulsion PCR (emPCR). Margulies M et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380
Roche 454. Margulies M et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380
[Figure sequence: pyrosequencing chemistry. A nucleotide triphosphate is added to the 3′-OH of the growing strand, releasing pyrophosphate; the pyrophosphate yields ATP, and ATP + luciferin gives a flash of light. The steps repeat for each nucleotide added.] EE Slawson Tempel, © WUSTL
Brightness of flash is proportional to the number of nucleotides added: 1-mer, 2-mer, 3-mer, 4-mer. Beyond that the flash is too bright. Example: TCACTTCAAGGGT…
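A noise-free sketch of how flash brightness encodes homopolymer length. The flow order `"TACG"`, the function names, and the choice to treat the slide's example TCACTTCAAGGGT as the synthesized strand are assumptions made for illustration; real signals are noisy, which is why long runs are hard to call.

```python
def flowgram(sequence, flow_order="TACG", max_flows=200):
    """Simulate ideal pyrosequencing: each nucleotide flow extends through
    a whole homopolymer run, so the signal equals the run length."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(sequence) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))  # run == 0 means no flash for this flow
        flow += 1
    return signals


def call_bases(signals):
    """Turn (base, brightness) pairs back into a sequence."""
    return "".join(base * run for base, run in signals)


synthesized = "TCACTTCAAGGGT"
signals = flowgram(synthesized)
print(signals)
print(call_bases(signals))
```

In this idealized model the GGG run produces one flash three times as bright as a single G, rather than three separate flashes.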
Roche 454 (A, T, G, C): 200 cycles; read length 350-400 bp; ~0.5 Gb/run
Illumina: nebulizer, ~400 bp
Flow cell: 8 channels (“lanes”). Surface of flow cell coated with a lawn of oligo pairs.
Each piece has a unique sequence
“Bridge PCR”
Thousands of strands per cluster; each cluster (“polony”) has a unique sequence
[Figure: reversible terminator nucleotide carrying a blocking group (“STOP”).]
Metzker ML (2010) Nature Reviews Genetics 11: 31-46
[Figure sequence: sequencing by synthesis with reversible terminators. Each cycle adds one blocked (“STOP”) nucleotide to the growing strand; the block is then removed to restore the 3′-OH, and the next cycle begins.]
The read grows by one base per cycle: G…, GC…, GCT…, GCTG…, GCTGA… (© Illumina, EEST, © WUSTL)
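The cycle-by-cycle read-out above can be sketched as a toy base caller that picks the brightest fluor in each cycle. The intensity values, the channel order, and the name `call_cluster` are invented for illustration; production base callers also correct for dye cross-talk and phasing, which this sketch ignores.

```python
CHANNELS = ("A", "C", "G", "T")  # assumed channel order for this example


def call_cluster(intensities):
    """intensities: one 4-tuple per cycle, ordered as CHANNELS.
    Returns the base with the highest intensity in each cycle."""
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Invented intensities for five cycles of a single cluster.
cluster = [
    (0.1, 0.2, 0.9, 0.1),
    (0.1, 0.8, 0.2, 0.1),
    (0.0, 0.1, 0.1, 0.9),
    (0.2, 0.1, 0.7, 0.1),
    (0.9, 0.1, 0.1, 0.2),
]

read = call_cluster(cluster)
for n in range(1, len(read) + 1):
    print(read[:n] + "…")
```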
100+ million clusters per flow cell (scale bar: 100 microns)
Camera time is the limiting step! The flow cell has 8 lanes. For picture taking, each lane is broken up into 100 tiles and each fluor is imaged separately: 2400 pictures taken per cycle.
Chemistry problem 1: terminator is retained, leaving the strand out of phase
Chemistry problem 2: fluor is retained
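A back-of-the-envelope model of why these chemistry problems limit read length. It assumes, purely for illustration, that a constant fraction `p_loss` of the strands in a cluster falls out of phase at every cycle and never recovers; the value 0.01 and the function names are hypothetical, not measured figures.

```python
def in_phase_fraction(p_loss, cycles):
    """Fraction of strands still in phase after the given number of cycles."""
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    return (1.0 - p_loss) ** cycles


def max_cycles(p_loss, min_fraction=0.5):
    """Largest cycle count that keeps at least min_fraction of strands in phase."""
    if not 0.0 < p_loss < 1.0:
        raise ValueError("p_loss must be strictly between 0 and 1")
    cycles = 0
    while in_phase_fraction(p_loss, cycles + 1) >= min_fraction:
        cycles += 1
    return cycles


for n in (25, 50, 100):
    print(n, round(in_phase_fraction(0.01, n), 3))
print(max_cycles(0.01))
```

Under this assumption the clean signal decays geometrically, so even a small per-cycle failure rate caps the usable number of cycles.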
Illumina HiSeq: >100 Gb/run. 90 × 10^6 reads/lane × 102 bp/read ≈ 9 × 10^9 bp/lane; × 16 lanes/run = 144 Gb/run. Illumina GAII: ~3-30 Gb/run. Read length: 30-120 bp.
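The throughput arithmetic can be checked directly. The inputs are the slide's figures; the variable names are just labels for this example. The slide rounds the per-lane yield to 9 × 10^9 before multiplying, which is why it reports 144 Gb rather than the unrounded product.

```python
reads_per_lane = 90_000_000   # 90 x 10^6 reads per lane
bp_per_read = 102
lanes_per_run = 16

bp_per_lane = reads_per_lane * bp_per_read           # unrounded per-lane yield
gb_per_run = bp_per_lane * lanes_per_run / 1e9       # unrounded run yield in Gb

rounded_bp_per_lane = 9_000_000_000                  # the slide's rounded value
rounded_gb_per_run = rounded_bp_per_lane * lanes_per_run / 1e9

print(bp_per_lane, gb_per_run, rounded_gb_per_run)
```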
ABI SOLiD: Supported Oligonucleotide Ligation and Detection; uses emPCR
ABI SOLiD. Mardis ER. (2008) Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 9:387-402.
Ion Torrent
Nature 475:348 (2011): ~100 bp reads; 30 Mb/run
Ion Torrent read quality
454, 7.4X, 24.5 Gb; cost < $1M; 3.3 million SNPs; 10,654 cause aa substitution (7,648 different from Venter); 222,718 indels (2 bp to 40 kb); 18 CNVs (26 kb to 1.6 Mb); carrier of 10 highly penetrant disease alleles
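Fold coverage is total sequenced bases divided by genome size, so the two figures on this slide imply a genome size. A small sketch using the slide's numbers; the function names are hypothetical and the calculation ignores unmapped or low-quality reads.

```python
def fold_coverage(total_gb, genome_size_gb):
    """Average number of times each base of the genome is sequenced."""
    return total_gb / genome_size_gb


def implied_genome_size_gb(total_gb, coverage):
    """Genome size implied by a total yield and a reported fold coverage."""
    return total_gb / coverage


size_gb = implied_genome_size_gb(24.5, 7.4)   # 454 genome from the slide
print(round(size_gb, 2))
print(round(fold_coverage(24.5, size_gb), 1))
```

The result, a little over 3 Gb, is in line with the size of a human genome.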
Illumina, 73X, 173 Gb; contig N50 = 40 kb; scaffold N50 = 1.3 Mb (PMID: 20010809)
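N50 is the length such that contigs (or scaffolds) of at least that length contain half or more of the assembled bases. A minimal sketch of the calculation; the contig lengths below are invented and are not taken from the cited assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("need at least one length")


contigs_kb = [80, 70, 50, 40, 30, 20]   # hypothetical contig lengths in kb
print(n50(contigs_kb))
```

Here the two longest contigs (80 + 70 = 150 kb) already cover more than half of the 290 kb total, so the N50 is the length of the second one.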
ABI SOLiD, 30X coverage (PLoS Genetics 6: e1000832, 2010): 107.5 Gb of raw data; 55.51 Gb mapped to genome; 35 interchromosomal translocations; 1,315 structural variations (>100 bp); 191,743 small (<21 bp) indels; 2,384,470 SNVs; 512 genes homozygously mutated
A recessive EMS-induced mutation affecting egg shell morphology (Genetics 2009 182: 25–32). Illumina, 8X coverage; 103 SNP differences between mutant and wt; 9 nonsynonymous; 2 nonsense, one of them in encore, an obvious candidate
Illumina: 5.1 Gb of sequence; 76 bp reads; 40X coverage; 4 affected individuals (volume 42, number 1, January 2010, p. 30)
RNA-Seq Pepke S, Wold B & Mortazavi A. (2009) Nature Methods 6:S22
ChIP-Seq Lefrançois P et al. (2009) Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing. BMC Genomics 10:37
Plant Physiology, July 2009, Vol. 150, pp. 1541–1555
3rd Generation (“Next2Gen”) Sequencing Technologies. “Fabulous”: having no basis in reality; mythical
Helicos Single-molecule sequencing
Gupta PK. (2008) Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol. 26:602-11
Metzker ML (2010) Nature Reviews Genetics 11: 31-46
Helicos: 105 to 140 megabases per hour; ~35 bp average read length
Helicos, 28X coverage, 84 Gb: 2.8M SNPs; 752 CNVs ((2009) Volume 27: 847)
Ion Torrent Single-molecule sequencing
Single-molecule sequencing. Gupta PK. (2008) Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol. 26:602-11
Nanopore sequencing. Gupta PK. (2008) Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol. 26:602-11
Pacific Biosciences: single-molecule sequencing (Eid et al. 2009)
ZMW (zero-mode waveguide): a hole, tens of nanometers in diameter, fabricated in a 100 nm metal film deposited on a silicon dioxide substrate. Detection volume: 20 zeptoliters (20 × 10^-21 liters). Figure labels: excitation, emission. PacBio technology backgrounder: http://www.pacificbiosciences.com/index.php?q=technology-introduction
When the DNA polymerase encounters the nucleotide complementary to the next base in the template, it is incorporated into the growing DNA chain. During incorporation, the enzyme holds the nucleotide in the ZMW's detection volume for tens of milliseconds, orders of magnitude longer than the average diffusing nucleotide. The system detects this as a flash of bright light because the background is very low. The polymerase advances to the next base and the process repeats. (PacBio technology backgrounder: http://www.pacificbiosciences.com/index.php?q=technology-introduction)
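The idea that an incorporation event stands out because it lasts much longer than a diffusing nucleotide can be sketched as a duration filter on an intensity trace. Everything here is an assumption for illustration: a single channel sampled once per millisecond, an invented trace, and arbitrary values for `threshold` and `min_samples`. It is not PacBio's signal-processing method.

```python
def find_pulses(trace, threshold, min_samples):
    """Return (start, end) index pairs for runs of samples above threshold
    that last at least min_samples; shorter blips are discarded."""
    pulses = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                pulses.append((start, i))
            start = None
    # Close a pulse that is still open when the trace ends.
    if start is not None and len(trace) - start >= min_samples:
        pulses.append((start, len(trace)))
    return pulses


# Invented trace: two brief blips (diffusion) and two long dwells (incorporation).
trace = [0, 0, 9, 0, 0] + [8] * 20 + [0, 0, 7, 0] + [9] * 15
pulses = find_pulses(trace, threshold=5, min_samples=10)
print(pulses)
```

Only the two long dwells survive the filter; the one-sample blips are treated as background.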
multiple reads of the same molecule PacBio technology backgrounder: http://www.pacificbiosciences.com/index.php?q=technology-introduction
Eid J et al. (2009) Real-Time DNA Sequencing from Single Polymerase Molecules. Science 323, 133. PMID: 19023044
Does it work? 150 bp circular template: ~93% raw accuracy; with 15x coverage, 99.3% accuracy (Eid et al., 2009)
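Why repeated passes over the same circular template improve accuracy can be illustrated with a per-position majority vote. This is a toy model under stated assumptions: the passes are already aligned, have equal length, and contain only substitution errors. The sequences are invented and the sketch does not reproduce the 93% and 99.3% figures above.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority vote at each position over equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must all have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def accuracy(called, truth):
    """Fraction of positions where the called base matches the true base."""
    return sum(a == b for a, b in zip(called, truth)) / len(truth)


truth = "GATTACAGATTACA"
passes = [
    "GATTACAGATTACA",
    "GATAACAGATTACA",  # one substitution error
    "GATTACAGCTTACA",  # one substitution error
    "GATTACAGATTACT",  # one substitution error
    "CATTACAGATTACA",  # one substitution error
]

called = consensus(passes)
print(accuracy(passes[1], truth), accuracy(called, truth))
```

Because the errors fall at different positions in different passes, the vote recovers the true base at every position even though most individual passes contain an error.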
~2-5 bp/sec. PacBio claims that, by 2013, the technology will be able to give a ‘raw’ human genome sequence in less than 3 min, and a complete high-quality sequence in 15 min (http://www.bio-itworld.com/BioIT_Content.aspx?id=71746&terms=Feb+12+2008+Pacific+Biosciences). Gupta PK. (2008) Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol. 26:602-11
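A rough feasibility check on the 3-minute claim: at 2-5 bp/sec per polymerase, how many polymerases would have to run in parallel? The 3 × 10^9 bp genome size, 1X coverage, and uninterrupted synthesis for the full 3 minutes are assumptions for this estimate, and the function name is hypothetical.

```python
import math

GENOME_BP = 3_000_000_000  # assumed approximate human genome size


def polymerases_needed(genome_bp, rate_bp_per_s, seconds, fold_coverage=1):
    """Number of polymerases that must run in parallel to read
    genome_bp * fold_coverage bases within the given time."""
    bases_per_polymerase = rate_bp_per_s * seconds
    return math.ceil(genome_bp * fold_coverage / bases_per_polymerase)


for rate in (2, 5):
    print(rate, polymerases_needed(GENOME_BP, rate, seconds=3 * 60))
```

Under these assumptions the claim requires millions of polymerases reading simultaneously, before allowing for any redundancy.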
F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A