The past, present, and future of DNA sequencing Dan Russell.



The past, present, and future of DNA sequencing*
Dan Russell
*DNA sequencing: Determining the number and order of nucleotides that make up a given molecule of DNA.

(Relevant) Trivia
How many base pairs (bp) are there in a human genome? ~3 billion (haploid)
How much did it cost to sequence the first human genome? ~$2.7 billion
How long did it take to sequence the first human genome? ~13 years
When was the first human genome sequence complete?
Whose genome was it? Several people’s, but actually mostly a dude from Buffalo

Overview
Prologue: Assembly
The Past: Sanger
The Present: Next-Gen (454, Illumina, …)
The Future: ? (Nanopore, MinION, Single-molecule)

Prologue: Assembly

Method        Read Length
Sanger        … bp
454           … bp
Illumina      ~100 bp
Ion Torrent   ~200 bp
But…
Phage Genome: 30,000 to 500,000 bp
Bacteria: Several million bp
Human: 3 billion bp
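The gap between read length and genome size can be put into rough numbers. The sketch below is illustrative only: the function name `reads_needed`, the 5,000,000 bp stand-in for "several million", and the 10x coverage target are assumptions, not figures from the slides.

```python
# Rough estimate of how many reads are needed to cover a genome.
# Read lengths and genome sizes come from the slide; the 5,000,000 bp
# bacterial genome and the 10x coverage target are illustrative assumptions.

READ_LENGTH_BP = {"Illumina": 100, "Ion Torrent": 200}

GENOME_BP = {
    "small phage": 30_000,
    "large phage": 500_000,
    "bacterium (assumed 5 Mbp)": 5_000_000,
    "human": 3_000_000_000,
}


def reads_needed(genome_bp: int, read_bp: int, coverage: int) -> int:
    """Smallest number of reads whose total length is coverage x genome size."""
    total_bases = genome_bp * coverage
    return -(-total_bases // read_bp)  # ceiling division on integers


if __name__ == "__main__":
    for method, read_bp in READ_LENGTH_BP.items():
        for genome, genome_bp in GENOME_BP.items():
            n = reads_needed(genome_bp, read_bp, coverage=10)
            print(f"{method:12s} {genome:28s} {n:>13,d} reads")
```

Even a small phage needs thousands of short reads at this depth, which is why the fragments have to be sequenced in bulk and then assembled.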

Shotgun Genome Sequencing
Complete genome copies → Fragmented genome chunks
NOT REALLY DONE BY DUCK HUNTERS
Fragmentation methods: hydroshearing, sonication, enzymatic shearing
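Random fragmentation of many genome copies can be mimicked in a few lines. This is a toy sketch, not a model of any real shearing method: the names `shear` and `shotgun`, the uniform choice of cut sites, and the random 1,000 bp test genome are all assumptions made for illustration.

```python
import random


def shear(genome, mean_fragment_bp, rng):
    """Cut one genome copy at random positions into contiguous fragments."""
    n_cuts = max(0, len(genome) // mean_fragment_bp - 1)
    cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]


def shotgun(genome, copies, mean_fragment_bp, seed=0):
    """Shear several copies independently and pool the fragments unordered."""
    rng = random.Random(seed)
    pool = []
    for _ in range(copies):
        pool.extend(shear(genome, mean_fragment_bp, rng))
    rng.shuffle(pool)  # the sequencer sees fragments in no particular order
    return pool


if __name__ == "__main__":
    toy_genome = "".join(random.Random(42).choices("ACGT", k=1000))
    fragments = shotgun(toy_genome, copies=6, mean_fragment_bp=100)
    print(len(fragments), "fragments,", sum(map(len, fragments)), "bases in total")
```

Because each copy is cut at different places, fragments from different copies overlap, and those overlaps are what assembly relies on.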

Assembly, aka “All the King’s horses and all the King’s men…”
Reads (about 17 bp each):
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
TTTCCGACCGAAATGGC
TTGTTCCCACAGACCGTG
AGCTCGATGCCGGCGAAG
ATGCCGGCGAAGCATTGT
TAATGCGACCTCGATGCC
ACAGACCGTGTTTCCCGA
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
CCGACCGAAATGGCTCC
TGCCGGCGAAGCCTTGT
Consensus (66 bp):
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
Coverage: # of reads underlying the consensus
Examples from different positions in this assembly:
6x coverage, 100% identity
5x coverage, 80% identity
2x coverage, 50% identity
1x coverage
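Coverage and identity can be computed directly from the reads and the consensus on these slides. The sketch below is a simplification and not how a real assembler works: it assumes the consensus is already known, places each read without gaps at the offset with the fewest mismatches, and uses hypothetical helper names (`best_offset`, `pileup`, `identity`). The reads are the slide's reads with the stray spaces removed.

```python
CONSENSUS = "TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC"

READS = [
    "ATTGTTCCCACAGACCG", "CGGCGAAGCATTGTTCC", "ACCGTGTTTTCCGACCG",
    "TTTCCGACCGAAATGGC", "TTGTTCCCACAGACCGTG", "AGCTCGATGCCGGCGAAG",
    "ATGCCGGCGAAGCATTGT", "TAATGCGACCTCGATGCC", "ACAGACCGTGTTTCCCGA",
    "AAGCATTGTTCCCACAG", "TGTTTTCCGACCGAAAT", "CCGACCGAAATGGCTCC",
    "TGCCGGCGAAGCCTTGT",
]


def best_offset(read, reference):
    """Ungapped placement of the read with the fewest mismatches."""
    def mismatches(offset):
        window = reference[offset:offset + len(read)]
        return sum(r != c for r, c in zip(read, window))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def pileup(reads, reference):
    """Per-position read depth and number of reads agreeing with the reference."""
    coverage = [0] * len(reference)
    matches = [0] * len(reference)
    for read in reads:
        start = best_offset(read, reference)
        for i, base in enumerate(read):
            coverage[start + i] += 1
            matches[start + i] += base == reference[start + i]
    return coverage, matches


coverage, matches = pileup(READS, CONSENSUS)


def identity(position):
    """Fraction of reads at a 0-based position that agree with the consensus."""
    return matches[position] / coverage[position]


if __name__ == "__main__":
    for pos in (0, 8, 26, 29):  # 0-based positions
        print(f"position {pos + 1}: {coverage[pos]}x coverage, "
              f"{identity(pos):.0%} identity")
```

On this toy data the sketch finds 6 reads and 100% identity at position 30, 5 reads and 80% identity at position 27, and 2 reads and 50% identity at position 9 (counting from 1), which lines up with the examples on the slides.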

Assembly

The Past: Sanger

Fragments were cloned:

x millions

Sanger Sequencing Reactions
For given template DNA, it’s like PCR except:
Uses only a single primer and polymerase to make new ssDNA pieces.
Includes regular nucleotides (A, C, G, T) for extension, but also includes dideoxy nucleotides.
Regular Nucleotides: A, C, G, T
Dideoxy Nucleotides: A, C, G, T, which are (1) labeled and (2) terminators

Sanger Sequencing
Template with an unknown region: 3’ ACGCGCCGGGT ??????????????? 5’
Primer: 5’ TGCGCGGCCCA
Actual template: 3’ ACGCGCCGGGT CAGAACCCGATCGCG 5’
Terminated extension products (primer + new bases):
5’ TGCGCGGCCCA GTCTTGGGCT (21 bp)
5’ TGCGCGGCCCA GTCTTGGGCTAGCGC (26 bp)
5’ TGCGCGGCCCA GTCTTGGGCTA (22 bp)
5’ TGCGCGGCCCA G (12 bp)
5’ TGCGCGGCCCA GTCTTGGGC (20 bp)
5’ TGCGCGGCCCA GTCTT (16 bp)

Sanger Sequencing
Template: 3’ ACGCGCCGGGT ??????????????? 5’
All we can observe for each product is its length and its labeled final base:
5’ TGCGCGGCCCA ?????????T (21 bp)
5’ TGCGCGGCCCA ??????????????C (26 bp)
5’ TGCGCGGCCCA ??????????A (22 bp)
5’ TGCGCGGCCCA G (12 bp)
5’ TGCGCGGCCCA ????????C (20 bp)
5’ TGCGCGGCCCA ????T (16 bp)

Sanger Sequencing
Laser Reader (products in order of length):
5’ TGCGCGGCCCA G (12 bp)
5’ TGCGCGGCCCA GT (13 bp)
5’ TGCGCGGCCCA GTC (14 bp)
5’ TGCGCGGCCCA GTCT (15 bp)
5’ TGCGCGGCCCA GTCTT (16 bp)
5’ TGCGCGGCCCA GTCTTG (17 bp)
5’ TGCGCGGCCCA GTCTTGG (18 bp)
5’ TGCGCGGCCCA GTCTTGGG (19 bp)
5’ TGCGCGGCCCA GTCTTGGGC (20 bp)
5’ TGCGCGGCCCA GTCTTGGGCT (21 bp)
5’ TGCGCGGCCCA GTCTTGGGCTA (22 bp)
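The logic of reading a sequence from terminated fragments can be sketched in code. This is an idealized illustration using the template and primer from the slides: it assumes exactly one product of every possible length and perfect base labels, and the function names `terminated_products` and `read_ladder` are made up for this example.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

TEMPLATE_3_TO_5 = "ACGCGCCGGGTCAGAACCCGATCGCG"  # written 3' to 5', as on the slide
PRIMER = "TGCGCGGCCCA"                          # written 5' to 3'


def terminated_products(template_3_to_5, primer):
    """Every primer extension that stops at a dideoxy nucleotide (idealized)."""
    full_product = "".join(COMPLEMENT[base] for base in template_3_to_5)
    if not full_product.startswith(primer):
        raise ValueError("primer does not anneal at the 3' end of the template")
    return [full_product[:n] for n in range(len(primer) + 1, len(full_product) + 1)]


def read_ladder(products):
    """Sort by length (the gel), then read each labeled final base (the laser)."""
    return "".join(product[-1] for product in sorted(products, key=len))


products = terminated_products(TEMPLATE_3_TO_5, PRIMER)
random.Random(0).shuffle(products)  # the reaction tube holds an unordered mix

if __name__ == "__main__":
    print(read_ladder(products))  # GTCTTGGGCTAGCGC
```

Only the length and the last base of each product are used, which is exactly the information the slides say is available.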

Sanger Sequencing Output
Each sequencing reaction gives us a chromatogram, usually ~… bp:

Sanger Throughput Limitations
Must have 1 colony picked for every 2 reactions
Must do 1 DNA prep for every 2 reactions
Must have 1 PCR tube for each reaction
Must have 1 gel lane for each reaction
from The Economist
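These per-reaction requirements add up quickly. The sketch below turns them into a rough bench workload; the ~750 bp read length is the figure quoted for Sanger later in this deck, while the 8x coverage target, the 50,000 bp example genome, and the name `sanger_workload` are assumptions for illustration.

```python
def sanger_workload(genome_bp, coverage, read_bp=750):
    """Rough count of the manual steps needed to shotgun-sequence a genome."""
    reactions = -(-genome_bp * coverage // read_bp)  # ceiling division
    per_two_reactions = -(-reactions // 2)           # 1 per every 2 reactions
    return {
        "reactions": reactions,
        "colonies_picked": per_two_reactions,
        "dna_preps": per_two_reactions,
        "pcr_tubes": reactions,
        "gel_lanes": reactions,
    }


if __name__ == "__main__":
    # Hypothetical 50,000 bp phage genome at an assumed 8x coverage.
    for step, count in sanger_workload(50_000, coverage=8).items():
        print(f"{step:16s} {count:,d}")
```

Under these assumptions a single phage genome already needs several hundred reactions, each with its own tube and gel lane.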

The Present: Next-Gen (454, Illumina, …)

Shotgun sequencing by Ion Torrent Personal Genome Machine and 454

Shotgun sequencing by PGM/454
Genomic Fragment, Adapters

Genomic Fragment Barcode


Bead/ISP (Ion Sphere Particle)
Adapter Complement Sequences
The idea is that each bead should end up covered all over with amplified copies of a SINGLE library fragment.

Shotgun sequencing by PGM/454 Problem: How do I do PCR to amplify the fragments without having to use 1 tube for each reaction?


~3.5 µm for Ion Torrent, ~30 µm for 454

Shotgun sequencing by PGM/454
Template: 3’ ACGCGCCGGGTCAGAACCCGATCGCG 5’
Primer: 5’ TGCGCGGCCCA
Only give polymerase one nucleotide at a time. If that nucleotide is incorporated, enzymes turn by-products into light.
Nucleotide flow order: T C A G, T C A G, T C A G
Flows that match the next template position extend the new strand (G, then T, then C, …); flows that do not match produce no light.
The real power of this method is that it can take place in millions of tiny wells in a single plate at once.
Raw 454 data
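The one-nucleotide-at-a-time idea can be sketched as a simulated flowgram. This is an idealized model, not the vendors' signal processing: it assumes the signal for a flow equals the number of identical bases incorporated in a row (so a run such as GGG gives a signal of 3), it ignores noise entirely, and `flowgram` and `base_call` are hypothetical names. The strand used is the one that gets built on the slide's template.

```python
from itertools import cycle

NEW_STRAND = "GTCTTGGGCTAGCGC"  # bases added after the primer in the slide example


def flowgram(new_strand, flow_order="TCAG"):
    """Idealized signal per nucleotide flow: (base flowed, bases incorporated)."""
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    position = 0
    flows = cycle(flow_order)
    while position < len(new_strand):
        base = next(flows)
        run = 0
        # Polymerase keeps adding this base while the strand calls for it.
        while position < len(new_strand) and new_strand[position] == base:
            run += 1
            position += 1
        signals.append((base, run))
    return signals


def base_call(signals):
    """Turn flow signals back into a sequence."""
    return "".join(base * run for base, run in signals)


if __name__ == "__main__":
    signals = flowgram(NEW_STRAND)
    print(" ".join(f"{base}{run}" for base, run in signals))
    print(base_call(signals))
```

The first three flows (T, C, A) give no signal because the strand needs a G first, which mirrors the walk-through on the slides.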

Ion Torrent Sequencing

Illumina Sequencing

Next-Gen Sequencing
Take home message: Massively Parallel
1,000 monkeys at 1,000 typewriters is nothing
We’re talking 100,000 to 100 million concurrent reads

The Future: ? (Nanopore, MinION, Single-molecule)

Largely because of PHIRE and SEA-PHAGES…

DNA Sequencing over Time from The Economist

Single Molecule Sequencing

MinION
“The MinION has been used to successfully read the genome of a lambda bacteriophage, which has 48,500-ish base pairs, twice during one pass. That's impressive, because reading 100,000 base pairs during a single DNA capture has never been managed before using traditional sequencing techniques. The operational life of the MinION is only about six hours, but during that time it can read more than 150 million base pairs. That's somewhat short of the larger human chromosomes (which contain up to 250 million base pairs), but Oxford Nanopore has also introduced GridION -- a platform where multiple cartridges can be clustered together. The company reckon that a 20-node GridION setup can sequence a complete human genome in just 15 minutes.”
Source: Wired

(Relevant) Trivia
How many base pairs (bp) are there in a human genome? ~3 billion (haploid)
How much did it cost to sequence the first human genome? ~$2.7 billion
How long did it take to sequence the first human genome? ~13 years
When was the first human genome sequence complete?
Whose genome was it? Several people’s, but actually mostly a dude from Buffalo

Final Thoughts
DNA sequencing is becoming vastly faster and more affordable
Generating data is no longer the bottleneck, understanding it is
Bioinformatics types should be in high demand in the near future

Epilogue So should we really still be sequencing more mycobacteriophage genomes? We have 250+…

At the DNA level…
Chimps vs. Humans: > 95% similar
Cluster A vs. Cluster B Mycobacteriophages: < 50% similar
…but that’s just one pair of clusters, how many are there?


Comparing Different Technologies: Sanger Sequencing
Advantages: Lowest error rate; Long read length (~750 bp); Can target a primer
Disadvantages: High cost per base; Long time to generate data; Need for cloning; Amount of data per run

Comparing Different Technologies: 454 Sequencing
Advantages: Low error rate; Medium read length (~… bp)
Disadvantages: Relatively high cost per base; Must run at large scale; Medium/high startup costs

Comparing Different Technologies: Ion Torrent Sequencing
Advantages: Low startup costs; Scalable (10 – 1000 Mb of data per run); Medium/low cost per base; Low error rate; Fast runs (<3 hours)
Disadvantages: New, developing technology; Cost not as low as Illumina; Read lengths only ~… bp so far

Comparing Different Technologies: Illumina Sequencing
Advantages: Low error rate; Lowest cost per base; Tons of data
Disadvantages: Must run at very large scale; Short read length (50-75 bp); Runs take multiple days; High startup costs; De novo assembly difficult

Comparing Different Technologies: PacBio Sequencing
Advantages: Can use single molecule as template; Potential for very long reads (several kb+)
Disadvantages: High error rate (~10-15%); Medium/high cost per base; High startup costs