High Throughput Sequencing Methods and Concepts

High Throughput Sequencing Methods and Concepts
Cedric Notredame, adapted from S.M. Brown

DNA Sequencing
The final essential tool in the molecular biology toolkit is the ability to read the base sequence of DNA molecules. Fred Sanger developed an elegant method to sequence DNA using the enzyme DNA polymerase (for which he was awarded the Nobel Prize in 1980). The Sanger method copies a piece of cloned DNA, but some of the copies are halted at each base along the sequence.

Sanger Method
DNA polymerase adds free nucleotides to a primer that is complementary to the DNA template. Sanger used modified nucleotides (dideoxynucleotides, or terminators) that stop replication when they are incorporated into the growing DNA chain. This produces a set of partial copies of the original template sequence, each one stopping at a different base. Sanger ran 4 separate reactions, each containing terminators for only one of the bases. When the partial copies are sorted by size using electrophoresis, all fragments of a given size are terminated with the same base.
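The chain-termination idea can be sketched in a few lines of Python. This is a toy model, not a simulation of real chemistry: it ignores the primer, assumes every possible stop position is represented in the reaction, and the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template):
    """Strand built by the polymerase: the reverse complement of the template."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragment_lengths(new_strand):
    """For each of the 4 reactions, list the fragment lengths that can occur.

    A reaction with terminators for base X yields fragments that end
    wherever the new strand carries X.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Sort all fragments by size and read off which lane each one sits in."""
    lane_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(lane_by_length[n] for n in sorted(lane_by_length))


if __name__ == "__main__":
    new_strand = synthesized_strand("GATTACA")
    lanes = terminated_fragment_lengths(new_strand)
    print(lanes)
    print(read_gel(lanes))
```

Reading the four lanes from the shortest fragment to the longest recovers the newly synthesized strand, which is the complement of the template.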

Automated Sequencing
Sequencing technology was improved in the late 1980s by Leroy Hood, who developed fluorescent color labels for the 4 terminator nucleotides. This allowed all 4 bases to be sequenced in a single reaction and sorted in a single gel lane. Hood also pioneered direct data collection by computer. Minor improvements in this technology now enable the sequencing of billion-base genomes in a year or less.

Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors, so they can read all 4 bases at once.

DNA sequencing capability has grown exponentially
[Figure: DNA sequences in GenBank over time; doubling time = 18 months]
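To get a feel for what an 18-month doubling time means, the growth factor after a given number of years is 2 raised to (elapsed months / 18). A minimal sketch, assuming growth stays perfectly exponential (the function name is illustrative):

```python
def growth_factor(years, doubling_time_months=18.0):
    """Fold increase after `years`, assuming a constant doubling time."""
    return 2.0 ** (years * 12.0 / doubling_time_months)


if __name__ == "__main__":
    for years in (1.5, 3, 15):
        print(f"{years} years -> x{growth_factor(years):,.0f}")
```

Under that assumption the database doubles in 1.5 years, quadruples in 3 years, and grows about a thousandfold in 15 years.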

Next Generation Sequencing
454 Life Sciences/Roche Genome Sequencer FLX: currently produces 400-600 million bases per day per machine
  Published 1 million bases of Neanderthal DNA in 2006
  May 2007: published complete genome of James Watson (3.2 billion bases, ~20x coverage)
Solexa/Illumina: 10 GB per machine/week
  May 2008: published complete genomes for 3 HapMap subjects (14x coverage)
ABI SOLiD: 20 GB per machine/week

“Paradigm Shift”
Standard ABI “Sanger” sequencing
  96 samples/day
  Read length ~650 bp
  Total = 450,000 bases of sequence data
454 was the game changer!
  ~400,000 different templates (reads)/day
  Read length ~250 bp
  Total = 100,000,000 bases of sequence data!!!
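The totals on this slide are simply reads multiplied by read length, which is easy to check. In the sketch below the helper name is illustrative. Note that 96 reads of 650 bp give 62,400 bases, so the slide's 450,000-base Sanger total would correspond to roughly seven such 96-sample batches; that is an inference from the arithmetic, not something the slide states.

```python
def total_bases(reads, read_length_bp):
    """Raw sequence yield: number of reads times read length."""
    return reads * read_length_bp


if __name__ == "__main__":
    sanger_batch = total_bases(96, 650)        # one batch of 96 samples
    sanger_slide_total = 450_000               # figure quoted on the slide
    roche_454 = total_bases(400_000, 250)

    print("Sanger, 96 reads:", sanger_batch)
    print("Batches implied by slide total:", sanger_slide_total / sanger_batch)
    print("454 per day:", roche_454)
    print("454 vs Sanger slide total:", roche_454 / sanger_slide_total)
```

Either way, the 454 figure is more than two hundred times the quoted Sanger total.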

Solexa ups the Game
Solexa (Illumina GA)
  60,000,000 different sequence templates (yes, that is an insane 60 million reads)
  36 bp read length
  4 billion bases of DNA per run (3 days)

Nanotechnology
Each system works differently, but they are all based on similar principles:
  Shear target DNA into small pieces
  Bind individual DNA molecules to a solid surface and amplify each molecule into a cluster
  Copy one base at a time and detect different signals for the A, C, T, and G bases
  Requires very precise high-resolution imaging of tiny features (Solexa has 800 images @ 4 megapixels each)

One (of 800) tiles on Solexa Sequencer

Huge Amount of Image Data
The raw image data is truly huge: 1-2 TB for the Solexa, more for ABI-SOLID, less for 454.
The images are immediately processed into intensity data (spots with location and brightness).
Intensity data is then processed into basecalls (A, C, T, or G, plus a quality score for each).
Basecall data is on the order of 5-10 GB per run (or a week of runs for 454).
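The step from intensity data to basecalls can be illustrated with a toy base caller. The quality score uses the standard Phred definition, Q = -10 log10(P_error). The error estimate itself (the share of signal outside the brightest channel) is an assumption made for this sketch and is not how any vendor pipeline computes it.

```python
import math


def call_base(intensities, min_error=1e-4):
    """Call one base from per-channel brightness for one cluster and cycle.

    `intensities` maps each base (A, C, G, T) to a non-negative brightness.
    Returns (base, phred_quality).
    """
    total = sum(intensities.values())
    if total <= 0:
        return "N", 0  # no signal at all: uncalled base, lowest quality

    base, brightest = max(intensities.items(), key=lambda item: item[1])
    # Toy error model: fraction of the signal that is NOT in the top channel.
    p_error = max(1.0 - brightest / total, min_error)
    quality = round(-10 * math.log10(p_error))
    return base, quality


if __name__ == "__main__":
    print(call_base({"A": 900, "C": 50, "G": 30, "T": 20}))
    print(call_base({"A": 0, "C": 0, "G": 1000, "T": 0}))
    print(call_base({"A": 0, "C": 0, "G": 0, "T": 0}))
```

A spot with 90% of its signal in one channel gets Q10 here, while a perfectly clean spot is capped at Q40 by the `min_error` floor.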

454
First high-throughput DNA sequencer, commercially available in 2004
Now (10/08) produces ~500 MB of sequence in reads of 500 bp
A run of 8 samples takes 10 hours, so multiple runs per week are possible
Uses pyrosequencing, beads, and a microtiter plate
Low error rate, but insertion/deletion problems with homopolymers (stretches of a single base)
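The homopolymer problem follows from how pyrosequencing reads a run of identical bases: one nucleotide is flowed at a time, and the light signal scales with how many bases were incorporated in that flow. The sketch below assumes a repeating T, A, C, G flow order and treats the input as the strand being synthesized; both are simplifications, and the function names are illustrative.

```python
def flowgram(sequence, flow_order="TACG"):
    """Return (nucleotide, run_length) for each flow until the read is complete."""
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains bases missing from the flow order")
    signals = []
    pos = 0
    while pos < len(sequence):
        for nucleotide in flow_order:
            run = 0
            # Every consecutive matching base is incorporated in the same flow.
            while pos < len(sequence) and sequence[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
            if pos == len(sequence):
                break
    return signals


def call_run_length(signal):
    """Convert a (noisy) light signal back into a whole number of bases."""
    return int(round(signal))


if __name__ == "__main__":
    print(flowgram("TTTAG"))
    # The same 10% signal loss is harmless for 1 base but not for 6.
    print(call_run_length(1 * 0.9), call_run_length(6 * 0.9))
```

With the same relative noise, a single base is still called correctly, but a run of six is called as five, which shows up in the read as a deletion.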

Illumina Genome Analyzer
Originally developed by Solexa, now a subsidiary of Illumina
Commercially available in 2006
Now produces 8-12 million reads of 36 bp per sample = 10 GB/week
A run takes 3 days for 7 samples
Low error rate; errors are mostly base changes, with few indels

Illumina sequencing technology in 12 steps
Source: http://www.illumina.com/downloads/SS_DNAsequencing.pdf

1. Prepare genomic DNA
Randomly fragment genomic DNA and ligate adapters to both ends of the fragments. (Figure labels: DNA, adapters)

2. Attach DNA to surface
Bind single-stranded fragments randomly to the inside surface of the flow cell channels. (Figure labels: adapter, DNA fragment, dense lawn of primers)

3. Bridge amplification
Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.

4. Fragments become double stranded
The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. (Figure labels: attached terminus, free terminus)

5. Denature the double-stranded molecules
Denaturation leaves single-stranded templates anchored to the substrate. (Figure label: attached)

6. Complete amplification
Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell. (Figure label: clusters)

7. Determine first base
The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase. (Figure label: laser)

8. Image first base
After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.

9. Determine second base
The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase. (Figure label: laser)

10. Image second chemistry cycle
After laser excitation, the image is captured as before, and the identity of the second base is recorded.

11. Sequencing over multiple chemistry cycles
The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.

12. Align data
The data are aligned and compared to a reference, and sequencing differences are identified. (Figure labels: reference sequence, unknown variant identified and called, known SNP called)
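The final step, aligning a read to a reference and reporting differences, can be sketched with a brute-force ungapped alignment. Real aligners use indexes and handle gaps. The function names, the toy reference, and the `known_snps` set of positions are all hypothetical and only serve to illustrate the known-versus-unknown distinction.

```python
def best_ungapped_alignment(read, reference):
    """Slide the read along the reference; return (position, mismatches) of the best fit."""
    best_pos, best_mismatches = None, None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if best_mismatches is None or mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def call_differences(read, reference, known_snps=frozenset()):
    """List (position, reference_base, read_base, label) for each mismatch."""
    pos, _ = best_ungapped_alignment(read, reference)
    calls = []
    window = reference[pos:pos + len(read)]
    for offset, (read_base, ref_base) in enumerate(zip(read, window)):
        if read_base != ref_base:
            site = pos + offset
            label = "known SNP" if site in known_snps else "unknown variant"
            calls.append((site, ref_base, read_base, label))
    return calls


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    read = "CGTTATCCGATA"  # matches reference[5:17] except at position 10
    print(best_ungapped_alignment(read, reference))
    print(call_differences(read, reference, known_snps={10}))
    print(call_differences(read, reference))
```

A single read cannot separate a true variant from a sequencing error, which is one reason many-fold coverage is needed before a difference is called.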

Illumina Genome Analyzer Richard K. Wilson

Paired-End Sequencing Nature Methods 5, May 2008

[Figure: paired-end workflow. Panel labels: Sequencing First Read; Denaturation and De-Protection; Resynthesis of P5 Strand (15 cycles); P7 Linearization; Block with ddNTPs; Denaturation and Hybridization; Sequencing Second Read]
The steps up to and including the first read are essentially the same as for a single read; the first read is where the single-read protocol would stop. The paired-end (PE) protocol continues by deprotecting the P5 primer using the deprotection enzyme. Resynthesis of the P5 strand occurs over 15 cycles. P7 linearization uses Linearization 2 Enzyme. Blocking again occurs with ddNTPs and Blocking Enzyme 1 and 2. Sequencing of read 2 uses the Read 2 PE Sequencing Primer.
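From the data side, the outcome of a paired-end run is two reads per fragment: one from each end, on opposite strands. A minimal sketch, assuming an idealized error-free fragment (the function names and example fragment are illustrative):

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Sequence of the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def paired_end_reads(fragment, read_length):
    """Return (read1, read2): the two ends of a fragment, each read inward."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


if __name__ == "__main__":
    fragment = "AAACCCGGGTTTAACC"
    read1, read2 = paired_end_reads(fragment, read_length=4)
    unsequenced_gap = len(fragment) - 2 * 4
    print(read1, read2, unsequenced_gap)
```

Knowing the approximate fragment length places the two reads a known distance apart on the genome, which helps when mapping across repeats.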

ABI-SOLID
First commercially available in late 2007
Currently capable of producing 20 GB of data per run (week)
Most users generate 6 GB/run
Reads ~30 bp long
Uses a unique sequencing-by-ligation method that produces “color-space” data
Very low error rate
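Color-space data encodes each pair of adjacent bases as one of four colors rather than reporting bases directly. The sketch below uses the usual two-base encoding, in which the color is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3). The function names are illustrative, and decoding assumes the first base is known.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_color_space(sequence):
    """One color (0-3) per adjacent base pair."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode_color_space(first_base, colors):
    """Rebuild the bases from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    original = encode_color_space("ATGGC")
    with_snp = encode_color_space("ATCGC")  # one base changed
    print(original, with_snp)
    print(decode_color_space("A", original))
```

A real single-base change alters two adjacent colors, whereas an isolated color change is more likely a measurement error. That property helps explain the low error rate after decoding.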

Short Reads
Short reads from Next-Gen machines are a challenge (Solexa = 36 bp)
Very hard to assemble whole genomes
Difficult to get any information on repeat regions
Requires many-fold coverage
New algorithms needed for many traditional bioinformatics operations
Reads are getting longer: another moving target
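"Many-fold coverage" can be made concrete with a back-of-the-envelope calculation: mean coverage is (number of reads × read length) / genome size. The fraction of bases missed entirely is approximated below with a Poisson model, e^(-coverage). This is an idealized assumption (uniform random reads, no repeats), and the function names are illustrative.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage):
    """Poisson estimate of the fraction of bases hit by no read at all."""
    return math.exp(-coverage)


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    genome = 3_200_000_000  # human genome size quoted earlier in the slides
    one_run = mean_coverage(60_000_000, 36, genome)
    print(f"60 million 36 bp reads: {one_run:.3f}x")
    print(f"uncovered at that depth: {fraction_uncovered(one_run):.1%}")
    print("36 bp reads for 14x:", reads_needed(14, 36, genome))
```

Sixty million 36 bp reads give less than 1x mean coverage of a human-sized genome, so reaching the 14x quoted for the HapMap genomes takes over a billion such reads.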

PacBio High throughput Single Molecule Real Time (SMRT) Sequencing

PacBio www.pacificbiosciences.com/

Applications
“If you build it, they will come.”
An explosion of scientific innovation! Every new technology enables new applications that were not directly foreseen by its original developers. Cheap access to high-volume sequencing becomes a data collection method for many different types of experimental applications.

“When all you have is a hammer, all problems look like nails.” (attributed to Mark Twain)

Applications