The past, present, and future of DNA sequencing

The past, present, and future of DNA sequencing Dan Russell

Overview. Prologue: Assembly and Finishing. The Past: Sanger. The Present: Next-Gen (454, Illumina, …). The Future: ? (Nanopore, MinION, Single-molecule).

Method: Read Length
Sanger: 600-1000 bp
454: 300-500 bp
Illumina: ~100 bp
Ion Torrent: ~200 bp
But…
Phage Genome: 30,000 to 500,000 bp
Bacteria: Several million bp
Human: 3 billion bp
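
To make the mismatch between read length and genome size concrete, here is a minimal sketch that estimates how many reads are needed to cover a genome. The read lengths are rough midpoints of the ranges above; the 10x coverage target, the 50,000 bp example genome, and the function name `reads_needed` are assumptions for illustration, not figures from the slides.

```python
# Typical read lengths in bp, using rough midpoints of the ranges on the slide.
READ_LENGTH_BP = {"Sanger": 800, "454": 400, "Illumina": 100, "Ion Torrent": 200}


def reads_needed(genome_bp: int, read_bp: int, coverage: int = 10) -> int:
    """Reads required so that reads x read length >= coverage x genome size."""
    total_bases = genome_bp * coverage
    return (total_bases + read_bp - 1) // read_bp  # integer ceiling division


if __name__ == "__main__":
    phage_bp = 50_000  # hypothetical phage genome, inside the 30,000-500,000 bp range
    for method, read_bp in READ_LENGTH_BP.items():
        print(f"{method}: {reads_needed(phage_bp, read_bp)} reads for 10x coverage")
```

Even a small phage genome needs hundreds to thousands of reads, which is why the reads have to be assembled afterwards.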

Shotgun Genome Sequencing: complete genome copies are broken into fragmented genome chunks. Fragment sizes differ for different seq platforms. NOT REALLY DONE BY DUCK HUNTERS: hydroshearing, sonication, enzymatic shearing.
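
The fragmentation step can be pictured with a toy simulation: several copies of the same genome are each cut at different random positions, so fragments from different copies overlap. This is only a sketch; the function name `shotgun_fragments`, the uniform random cut model, and the parameters are assumptions, and real shearing methods have their own size distributions.

```python
import random
from typing import List


def shotgun_fragments(genome: str, copies: int, mean_len: int,
                      rng: random.Random) -> List[str]:
    """Cut each genome copy at random positions (toy model of shearing)."""
    if len(genome) < 2:
        raise ValueError("genome must be at least 2 bases long")
    # Number of cuts per copy so fragments average roughly mean_len bases.
    n_cuts = max(1, min(len(genome) - 1, len(genome) // mean_len - 1))
    fragments = []
    for _ in range(copies):
        cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
        bounds = [0] + cuts + [len(genome)]
        fragments.extend(genome[a:b] for a, b in zip(bounds, bounds[1:]))
    return fragments


if __name__ == "__main__":
    toy_genome = "ATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTCCGAC"  # made-up sequence
    for frag in shotgun_fragments(toy_genome, copies=3, mean_len=10,
                                  rng=random.Random(0)):
        print(frag)
```

Because each copy is cut in different places, the pieces from one copy span the cut sites of another, which is what makes reassembly possible.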

Assembly, aka All the King’s horses and all the King’s men…
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG
TTGTTCCCACAGACCGTG
TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
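
As a rough illustration of what an assembler does, the sketch below merges reads by repeatedly joining the pair with the longest exact suffix-to-prefix overlap. This is a toy greedy approach, not the algorithm used by phredPhrap, Newbler, or velvet, and it assumes error-free reads; the example genome, the reads, and the function names are made up for illustration. Real reads, like the ones on the slide, contain errors that exact matching would not tolerate.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and reads fully contained in another read."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != o and s in o for o in unique)]


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining contigs do not overlap: these are gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


# Hypothetical error-free example: three overlapping reads from a 30 bp genome.
GENOME = "ATGCCGGCGAAGCATTGTTCCCACAGACCG"
READS = [GENOME[16:30], GENOME[0:14], GENOME[8:22]]

if __name__ == "__main__":
    print(greedy_assemble(READS))
```

If the loop stops with more than one contig left, the assembly has gaps that need further work.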

Dan’s recommended assemblers
Your Sequencing Technology: Recommended Assembler
Sanger: phredPhrap
Ion Torrent/454: Newbler
Illumina: velvet
REGARDLESS OF ASSEMBLY PROGRAM, I’D RECOMMEND USING CONSED FOR FINISHING!

Special use of the word “finish”. Some words have special meanings in scientific context: THEORY, FINISH. Before annotation, phage genomes should be sequenced AND finished.

What is finishing?

What is finishing? When we put all the reads back together this time: GAP! But now we at least know the sequence on each side, so we can design primers to run a sequencing reaction towards the gap, and hopefully connect our contigs.
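
The primer idea can be sketched as follows: take a stretch near the end of the left contig as a forward primer, and the reverse complement of a stretch near the start of the right contig as a reverse primer, so both reactions read towards the gap. The function `gap_primers`, the 20-base primer length, and the `offset` parameter are assumptions for illustration; real primer design also checks melting temperature, GC content, and uniqueness, none of which is modelled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gap_primers(left_contig: str, right_contig: str,
                primer_len: int = 20, offset: int = 0):
    """Pick one primer on each side of a gap, both pointing into the gap.

    offset is how far back from the contig end the primer sits
    (an assumed knob, useful because the start of a read is often poor).
    """
    needed = primer_len + offset
    if len(left_contig) < needed or len(right_contig) < needed:
        raise ValueError("contig too short for requested primer")
    end = len(left_contig) - offset
    forward = left_contig[end - primer_len:end]
    reverse = reverse_complement(right_contig[offset:offset + primer_len])
    return forward, reverse


if __name__ == "__main__":
    left = "AAAACCCCGGGGTTTTACGT"   # made-up contig ending at the gap
    right = "TTGACCAGTAGGCATCCATG"  # made-up contig starting after the gap
    print(gap_primers(left, right, primer_len=5))
```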

What is finishing?

What is finishing? A combination of computer and wet-bench work to ensure that the entire genome sequence is present and that all bases are high quality.

From DNA to Annotatable Sequence
Shotgun sequencing to generate reads
Assembly of reads
Identification of weak areas
Targeted sequencing runs to fix
Verification of finished sequence
Generation of final fasta file
Slide labels: Done for all phages sequenced at Pitt. Done by most independent seq facilities. NOT DONE by most seq facilities = “FINISHING”.
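
One way to picture "identification of weak areas" is to scan per-base coverage and report stretches that fall below a threshold. This is a sketch under assumptions: the slides do not say how weak areas are defined (it could equally be low base quality), and the function name `weak_regions` and the threshold are hypothetical.

```python
def weak_regions(coverage, min_depth):
    """Return half-open (start, end) intervals where depth < min_depth."""
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth < min_depth:
            if start is None:
                start = pos  # a weak stretch begins here
        elif start is not None:
            regions.append((start, pos))  # the weak stretch ended at pos
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # weak stretch runs to the end
    return regions


if __name__ == "__main__":
    depth_per_base = [12, 11, 9, 2, 1, 0, 3, 10, 12, 4]  # made-up coverage values
    print(weak_regions(depth_per_base, min_depth=5))
```

Each reported interval is a candidate for a targeted sequencing run.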

Fragments were cloned:

One tube per 2 sequences with Sanger and cloning. Not so bad if you only want 100 sequences. What if you want 1 million?

Sanger Sequencing Reactions. For given template DNA, it’s like PCR except: Uses only a single primer and polymerase to make new ssDNA pieces. Includes regular nucleotides (A, C, G, T) for extension, but also includes dideoxy nucleotides. [Figure labels: Regular Nucleotides (A, G, T, C); Dideoxy Nucleotides (A, T, C, G), Labeled Terminators.]
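
A toy simulation may help show why mixing regular and dideoxy nucleotides yields fragments of every length. Each simulated molecule extends from the primer and, at every base, has some chance of picking up a terminator and stopping. The primer and template are the ones used on the following slides; the function name `sanger_fragments`, the termination probability `dd_fraction`, and the molecule count are assumptions for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

PRIMER = "TGCGCGGCCCA"                        # written 5' to 3'
TEMPLATE_3TO5 = "ACGCGCCGGGTCAGAACCCGATCGCG"  # written 3' to 5', pairs with the primer


def sanger_fragments(primer, template_3to5, n_molecules, dd_fraction, rng):
    """Return the set of terminated strands produced by n_molecules extensions."""
    to_copy = template_3to5[len(primer):]  # template bases beyond the primer site
    fragments = set()
    for _ in range(n_molecules):
        strand = primer
        for base in to_copy:
            strand += COMPLEMENT[base]
            # With probability dd_fraction the added base was a dideoxy
            # terminator, so this molecule stops growing here.
            if rng.random() < dd_fraction:
                fragments.add(strand)
                break
    return fragments


if __name__ == "__main__":
    frags = sanger_fragments(PRIMER, TEMPLATE_3TO5, 500, 0.2, random.Random(0))
    for frag in sorted(frags, key=len):
        print(len(frag), frag)
```

With enough molecules, a terminated fragment appears for nearly every position beyond the primer.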

Sanger Sequencing
Primer: 5’ TGCGCGGCCCA (11 bp)
Template: 3’ ACGCGCCGGGTCAGAACCCGATCGCG 5’
Each new strand is extended from the primer until a labeled terminator is incorporated, so one reaction leaves fragments of many different lengths:
5’ TGCGCGGCCCAG 12 bp
5’ TGCGCGGCCCAGTCTT 16 bp
5’ TGCGCGGCCCAGTCTTGGGC 20 bp
5’ TGCGCGGCCCAGTCTTGGGCT 21 bp
5’ TGCGCGGCCCAGTCTTGGGCTA 22 bp
5’ TGCGCGGCCCAGTCTTGGGCTAGCGC 26 bp
In a real reaction the template beyond the primer site is unknown (3’ ACGCGCCGGGT??????????????? 5’); the fragments are what reveal it.
Has to be done in a single tube per rxn.

Sanger Sequencing: Laser Reader
Primer: 5’ TGCGCGGCCCA
Fragments, listed from shortest to longest:
5’ TGCGCGGCCCAG 12 bp
5’ TGCGCGGCCCAGT 13 bp
5’ TGCGCGGCCCAGTC 14 bp
5’ TGCGCGGCCCAGTCT 15 bp
5’ TGCGCGGCCCAGTCTT 16 bp
5’ TGCGCGGCCCAGTCTTG 17 bp
5’ TGCGCGGCCCAGTCTTGG 18 bp
5’ TGCGCGGCCCAGTCTTGGG 19 bp
5’ TGCGCGGCCCAGTCTTGGGC 20 bp
5’ TGCGCGGCCCAGTCTTGGGCT 21 bp
5’ TGCGCGGCCCAGTCTTGGGCTA 22 bp
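
The read-out step reduces to sorting: the reader only needs each fragment's length and the label on its final base. The sketch below rebuilds the slide's fragments in the jumbled order shown, then recovers the sequence by sorting on length. The function name `read_sequence` and the (length, label) representation are assumptions for illustration.

```python
# Full extension product from the earlier slides (primer + copied bases).
PRODUCT = "TGCGCGGCCCAGTCTTGGGCTAGCGC"

# Fragment lengths in the jumbled order they appear on the slide.
SLIDE_LENGTHS = [14, 15, 18, 16, 17, 13, 22, 21, 20, 12, 19]


def read_sequence(observations):
    """observations: (fragment length, label of terminal base) pairs.

    Sorting by length and reading off the terminal labels gives the
    sequence beyond the primer, one base per fragment.
    """
    return "".join(label for _, label in sorted(observations))


if __name__ == "__main__":
    fragments = [PRODUCT[:n] for n in SLIDE_LENGTHS]
    observations = [(len(f), f[-1]) for f in fragments]
    print(read_sequence(observations))  # prints GTCTTGGGCTA
```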

Sanger Sequencing Output Each sequencing reaction gives us a chromatogram, usually ~600-1000 bp:

Sanger Throughput Limitations: Must have 1 colony picked for every 2 reactions. Must have 1 PCR tube for each reaction. Must have 1 capillary for each reaction. [Figure from The Economist, annotated: improvements in cost from making Sanger higher throughput; improvements in cost from Next-Gen sequencing technologies.]
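
The scaling problem follows directly from those three ratios. A small sketch, with the hypothetical function name `sanger_resources`, tallies what a given number of Sanger reactions would require:

```python
def sanger_resources(n_reactions: int) -> dict:
    """Physical items needed, using the per-reaction ratios from the slide."""
    return {
        "colonies": (n_reactions + 1) // 2,  # 1 colony per 2 reactions, rounded up
        "pcr_tubes": n_reactions,            # 1 tube per reaction
        "capillaries": n_reactions,          # 1 capillary per reaction
    }


if __name__ == "__main__":
    for n in (100, 1_000_000):
        print(n, sanger_resources(n))
```

A hundred reactions means 50 colonies to pick; a million means half a million.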

Shotgun sequencing by Ion Torrent Personal Genome Machine and 454

Shotgun sequencing by PGM/454 Genomic Fragment Adapters

Shotgun sequencing by PGM/454 Genomic Fragment Barcode

Shotgun sequencing by PGM/454. [Figure labels: Bead/ISP; Adapter Complement Sequences.] The idea is that each bead should be amplified all over with a SINGLE library fragment.

Shotgun sequencing by PGM/454 ~3.5 µm for Ion Torrent, ~30 µm for 454

Shotgun sequencing by PGM/454
Only give polymerase one nucleotide at a time. If that nucleotide is incorporated, enzymes turn by-products into light.
Primer: 5’ TGCGCGGCCCA
Template: 3’ ACGCGCCGGGTCAGAACCCGATCGCG 5’
Flow 1 (T): nothing incorporated
Flow 2 (C): nothing incorporated
Flow 3 (A): nothing incorporated
Flow 4 (G): G incorporated; new strand so far: G
Flow 5 (T): T incorporated; new strand so far: GT
Flow 6 (C): C incorporated; new strand so far: GTC
Flow 7 (A): nothing incorporated
Flow 8 (G): nothing incorporated
Flow 9 (T): TT incorporated; new strand so far: GTCTT
Flow 10 (C): nothing incorporated
Flow 11 (A): nothing incorporated
Flow 12 (G): GGG incorporated; new strand so far: GTCTTGGG
[Plot: light signal for each flow, T C A G T C A G T C A G; scale 1 to 5.]
The real power of this method is that it can take place in millions of tiny wells in a single plate at once.
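
The flow-by-flow logic is simple enough to simulate. In this sketch each flow offers one nucleotide, and the signal is the number of bases incorporated in that flow (so a run of identical bases gives one larger signal). The template, primer length, and T-C-A-G flow order come from the slide; the function names and the idealized, noise-free signal are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

TEMPLATE_3TO5 = "ACGCGCCGGGTCAGAACCCGATCGCG"  # from the slide, written 3' to 5'
PRIMER_LEN = 11                               # length of primer TGCGCGGCCCA


def flowgram(template_3to5, primer_len, flow_order="TCAG", n_flows=12):
    """Return (nucleotide, bases incorporated) for each flow."""
    to_copy = template_3to5[primer_len:]
    pos = 0
    signals = []
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        count = 0
        # Polymerase keeps adding this nucleotide while it matches the template.
        while pos < len(to_copy) and COMPLEMENT[to_copy[pos]] == nuc:
            count += 1
            pos += 1
        signals.append((nuc, count))
    return signals


def call_bases(signals):
    """Turn per-flow signals back into a base sequence."""
    return "".join(nuc * count for nuc, count in signals)


if __name__ == "__main__":
    signals = flowgram(TEMPLATE_3TO5, PRIMER_LEN)
    print(signals)
    print(call_bases(signals))  # prints GTCTTGGG
```

In practice the signal is analog, so telling a run of 5 identical bases from a run of 6 is harder than this integer model suggests.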

Raw 454 data

Ion Torrent Sequencing

Illumina Sequencing

Next-Gen Sequencing. Take home message: Massively Parallel. 1,000 monkeys at 1,000 typewriters is nothing. We’re talking 100,000 to 100 million concurrent reads.

Largely because of PHIRE and SEA-PHAGES…

DNA Sequencing over Time. Amazing growth in info and concurrent drop in price. Story about 1 base thesis. Now 1/1000 cent per base. [Figure from The Economist.]
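
To put "1/1000 cent per base" in perspective, this sketch converts it to dollars for genomes of the sizes mentioned earlier. The per-base price is the slide's figure; the example genome sizes, the `coverage` parameter, and the function name are assumptions, and the result covers raw bases only, not library prep or labor.

```python
from fractions import Fraction

CENTS_PER_BASE = Fraction(1, 1000)  # "1/1000 cent per base" from the slide


def sequencing_cost_dollars(n_bases: int, coverage: int = 1) -> Fraction:
    """Raw per-base cost in dollars for n_bases sequenced coverage times."""
    return n_bases * coverage * CENTS_PER_BASE / 100


if __name__ == "__main__":
    examples = {"50 kb phage (hypothetical)": 50_000, "human (3 billion bp)": 3_000_000_000}
    for name, size in examples.items():
        print(f"{name}: ${float(sequencing_cost_dollars(size)):,.2f} per 1x pass")
```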

Single Molecule Sequencing

“The MinION has been used to successfully read the genome of a lambda bacteriophage, which has 48,500-ish base pairs, twice during one pass. That's impressive, because reading 100,000 base pairs during a single DNA capture has never been managed before using traditional sequencing techniques. The operational life of the MinION is only about six hours, but during that time it can read more than 150 million base pairs. That's somewhat short of the larger human chromosomes (which contain up to 250 million base pairs), but Oxford Nanopore has also introduced GridION -- a platform where multiple cartridges can be clustered together. The company reckon that a 20-node GridION setup can sequence a complete human genome in just 15 minutes.” —Wired

Epilogue So should we really still be sequencing more mycobacteriophage genomes? We have 250+…

Cluster A vs. Cluster B Mycobacteriophages. At the DNA level… Chimps vs. Humans: > 95% similar. Cluster A vs. Cluster B Mycobacteriophages: < 50% similar. …but that’s just one pair of clusters, how many are there?

Comparing Different Technologies: Sanger Sequencing
Advantages: Lowest error rate; Long read length (~750 bp); Can target a primer
Disadvantages: High cost per base; Long time to generate data; Need for cloning; Amount of data per run

Comparing Different Technologies: 454 Sequencing
Advantages: Low error rate; Medium read length (~400-600 bp)
Disadvantages: Relatively high cost per base; Must run at large scale; Medium/high startup costs

Comparing Different Technologies: Ion Torrent Sequencing
Advantages: Low startup costs; Scalable (10 – 1000 Mb of data per run); Medium/low cost per base; Low error rate; Fast runs (<3 hours)
Disadvantages: New, developing technology; Cost not as low as Illumina; Read lengths only ~100-200 bp so far

Comparing Different Technologies: Illumina Sequencing
Advantages: Low error rate; Lowest cost per base; Tons of data
Disadvantages: Must run at very large scale; Short read length (50-75 bp); Runs take multiple days; High startup costs; De novo assembly difficult

Comparing Different Technologies: PacBio Sequencing
Advantages: Can use single molecule as template; Potential for very long reads (several kb+)
Disadvantages: High error rate (~10-15%); Medium/high cost per base; High startup costs