NGS technologies
First generation: Maxam-Gilbert, Sanger
Sanger sequencing
Automation: first generation
  Made possible by:
    the replacement of phosphorus or tritium radiolabelling with fluorometric detection (allowing the reaction to occur in one vessel instead of four)
    improved detection through capillary-based electrophoresis
  Read length: slightly less than 1 kb
  Shotgun sequencing: clone overlapping fragments and sequence them.
  Enhancements:
    PCR (in vitro cloning)
    Recombinant DNA technologies
    More efficient DNA polymerases
    Newer sequencers allowed simultaneous sequencing of multiple samples
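The shotgun idea (sequence overlapping fragments, then stitch them together by their overlaps) can be illustrated with a toy greedy assembler. This is only a sketch under strong assumptions: reads are error-free, come from one strand, and no read is fully contained in another. The function names and the `min_len` parameter are illustrative, not taken from any real assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; return the remaining contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


print(greedy_assemble(["ATGCGTAC", "GTACGGA", "CGGATTT"]))
```

With the three reads shown, the overlaps GTAC and CGGA are enough to rebuild a single contig. Real genomes contain repeats and sequencing errors, which is why practical assemblers use more elaborate graph-based methods.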
The second generation: pyrosequencing (detection of pyrophosphate released during DNA synthesis)
  Advantages:
    Natural nucleotides are used (instead of heavily modified dNTPs)
    Bases can be read in real time, without electrophoresis
  Enhancements:
    Attaching DNA to paramagnetic beads
    Removing unincorporated dNTPs enzymatically (without the need to wash)
  Challenge: determining how many identical nucleotides were incorporated in a row (homopolymers, e.g. the GG and CC cases)
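The homopolymer challenge is easier to see with numbers. In the sketch below, one nucleotide is flowed at a time and the light signal is assumed to be roughly proportional to the number of bases incorporated, so the run length is obtained by rounding. The flow order `TACG`, the function name, and the signal values are illustrative assumptions.

```python
def call_bases(flow_order: str, signals: list[float]) -> str:
    """Translate normalized flow signals into a base sequence.

    Each signal is assumed to be roughly proportional to the number of
    identical nucleotides incorporated in that flow, so rounding gives the
    homopolymer length (0 means the nucleotide was not incorporated).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = max(0, round(signal))
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Two flow cycles over T, A, C, G; the signals near 2 are read as "GG".
print(call_bases("TACG", [1.02, 0.05, 0.97, 2.1, 0.0, 0.9, 0.1, 1.9]))
```

A signal of 2.1 is easy to call, but the relative gap between n and n+1 incorporations shrinks as runs get longer, so noise of the same relative size makes long homopolymers increasingly ambiguous.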
454 (Roche)
  Pyrosequencing was later licensed to 454 Life Sciences, a biotechnology company founded by Jonathan Rothberg, where it evolved into the first major successful commercial ‘next-generation sequencing’ (NGS) technology. 454 was later purchased by Roche.
  Paradigm shift: mass parallelisation of sequencing reactions through microfabrication and high-resolution imaging
454 workflow
  DNA nebulization => fragments of length 50-900 bp
  Fragments are blunt-ended and phosphorylated enzymatically, with polymerases and a polynucleotide kinase (polishing reaction)
  Adapter ligation: 44-base adapters containing, in 5’-3’ direction:
    20-base PCR primer
    20-base sequencing primer
    4-base AGTC sequencing key
  Two classes of adapters, A and B, differing in sequence and in the presence of a 5’ biotin tag in the B adapter
  Size selection: gel electrophoresis, followed by cutting out the 250-500 bp region
  Single-stranded AB-adapted library prepared
  Attachment to beads, emulsion PCR
  Double-stranded amplified DNA melted to single strands
  Enrichment primer added to filter against null beads
  Sequencing primer added
  Incubation with DNA polymerase, SSB protein and apyrase on a fibre-optic slide
  Luciferase preparation
  Signal normalization (accounts for the number of beads per well and the number of template sequences per bead):
    (i) raw signals are first normalized by reference to the pre- and post-sequencing run PPi standard flows
    (ii) these signals are further normalized by reference to the signals measured during incorporation of the first three bases of the known “key” sequence included in each template
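Step (ii) of the signal normalization can be sketched as follows. Because the key sequence is known, the flows in which its first three bases incorporate should each correspond to exactly one base, so their mean signal defines what "one incorporation" looks like for that well. The function name and the `key_flow_indices` parameter are hypothetical, and step (i), the PPi standard flows, is not modelled here.

```python
def normalize_by_key(raw: list[float], key_flow_indices: list[int]) -> list[float]:
    """Scale raw signals so that a single-base incorporation reads about 1.0.

    key_flow_indices lists the flows in which the first three bases of the
    known key are incorporated (an assumption supplied by the caller).
    """
    unit = sum(raw[i] for i in key_flow_indices) / len(key_flow_indices)
    if unit <= 0:
        raise ValueError("key flows carry no signal; the well is probably empty")
    return [value / unit for value in raw]


# Flows 0, 2 and 3 are assumed to hold the first three key bases.
print(normalize_by_key([200.0, 10.0, 220.0, 180.0, 400.0], [0, 2, 3]))
```

After scaling, the last flow reads about 2.0, i.e. two identical bases in a row, regardless of how bright this particular well happens to be.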
The Solexa-Illumina system
  Shankar Balasubramanian, David Klenerman
  Very early experiments included the observation of single molecules of DNA polymerase binding to substrate DNA (template plus primer) in solution, using a highly sensitive confocal single-molecule detection system
  (Cleavable) dye-labelled dNTPs
1997: Solexa sequencing concept
  Basic => applied science transition
  Chemical blocking of dNTPs
  Mutagenesis of DNA polymerase
1997: Solexa sequencing concept
  Basic => applied science transition
  2. Solid phase sequencing
  Goal: the theoretical limit was estimated to be one sequenceable DNA fragment per diffraction-limited site (less than 1 square micrometre) => the aim was to sequence one billion bases (the order of magnitude of a human genome)
  Fragmentation-driven parallelization was possible with a view to later aligning the reads to a ‘master’ genome.
Advancements to the Solexa sequencing concept: solid phase DNA amplification
  a stronger signal
  a less expensive imaging system
  reduction of stochastic single-molecule errors
An image of the surface taken during a cycle of an early Solexa sequencing experiment. Each of the spots is a cluster of identical DNA sample fragments and the colour indicates which of the four bases has been incorporated at that particular cycle.
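Reading a cluster's colour at each cycle amounts to picking the brightest of four channels. The sketch below shows only that core idea; the channel order in `CHANNELS` and the intensity values are assumptions, and real base callers additionally correct for effects such as spectral cross-talk between dyes, which are ignored here.

```python
CHANNELS = "ACGT"  # assumed channel order; real instruments define their own


def call_cluster(intensities_per_cycle: list[tuple[float, float, float, float]]) -> str:
    """Call one base per cycle as the channel with the highest intensity."""
    read = []
    for cycle in intensities_per_cycle:
        brightest = max(range(4), key=lambda ch: cycle[ch])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Three cycles for one cluster, one intensity per channel.
print(call_cluster([(0.9, 0.1, 0.0, 0.1), (0.1, 0.1, 0.8, 0.2), (0.0, 0.1, 0.1, 0.7)]))
```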
Advancements to the Solexa sequencing concept: paired-end sequencing
  sequencing of one end of the original DNA fragment, followed by the other end
  facilitates de novo sequencing of genomes
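What "one end, followed by the other end" means for the resulting reads can be shown on a single fragment. In this minimal sketch (function names and the example fragment are illustrative), read 1 is the start of the forward strand and read 2 is the start of the opposite strand, so it appears as the reverse complement of the fragment's far end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Return (read 1, read 2) for one fragment.

    Read 1 is the start of the forward strand; read 2 is the start of the
    reverse strand, i.e. the reverse complement of the fragment's far end.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


print(paired_end_reads("ATGCCGTTAGCA", 4))
```

Because both reads come from the same fragment, their expected distance and orientation are known, which is the extra information that helps when placing reads during de novo assembly.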
Timeline
  2006: Solexa-Illumina Genome Analyzer, 1 billion bases (1 Gb) per run (2.5 days to generate reads of 36 bases)
  The first human African, Asian and cancer genomes, plus the first giant panda genome, were sequenced on the Genome Analyzer.
  2010: Illumina HiSeq 2000, 200 billion bases (200 Gb) per run (2 human genomes at 30x coverage)
  Five-fold increase in sequencing capacity annually (more than twice the rate of Moore’s law)
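The "2 human genomes" figure follows from a simple coverage calculation: average depth is the number of sequenced bases divided by the genome size. The sketch below assumes a haploid human genome of about 3.1 billion bases; that constant and the function names are assumptions introduced for illustration.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed haploid human genome size in bases


def coverage(total_bases: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Average depth = sequenced bases / genome size."""
    return total_bases / genome_size


def genomes_per_run(total_bases: float, target_depth: float,
                    genome_size: float = HUMAN_GENOME_BP) -> int:
    """Whole genomes that fit into one run at the target depth."""
    return int(total_bases // (target_depth * genome_size))


print(round(coverage(1e9), 2))      # depth from a 1-billion-base run
print(genomes_per_run(200e9, 30))   # genomes at 30x from a 200-billion-base run
```

Under this assumption a 1-billion-base run covers only about a third of a human genome once, while a 200-billion-base run is enough for two genomes at 30x.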
Illumina sequencing technologies: details
General workflow
Fragmentation
  Nebulisation (used for the Genome Analyzer; in 30-60% glycerol at 30-35 psi)
    fragments range from 0 to 1200 bp, with a peak around 500-600 bp
    reproducible and cheap technique
    but uneconomical:
      200 bp +/- 20 bp fragments represent only ∼10% of the total DNA
      half of the DNA vaporises during nebulisation
      thus only 5% of the original DNA is used for subsequent library generation
  Acoustic shearing
    Acoustic energy is controllably focused into the aqueous DNA sample, resulting in cavitation events. The collapse of bubbles in the suspension creates multiple, intense, localized jets of water, which disrupt the DNA molecules.
    Range = 100-5000 bp
    200 bp fragments comprise 17% of the total fractionated DNA
    very little DNA is lost
    the narrow size distribution allows the size-selection step to be skipped in some applications
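The 5% figure for nebulisation is the product of two losses, and the same arithmetic gives a rough comparison with acoustic shearing. In this sketch the recovered fraction for acoustic shearing is set to 1.0 as an assumption standing in for "very little DNA is lost"; the function name is illustrative.

```python
def usable_fraction(in_size_window: float, recovered: float) -> float:
    """Fraction of input DNA that ends up in the selected size window."""
    return in_size_window * recovered


# Nebulisation: ~10% of fragments in the 200 bp window, half of the DNA lost.
nebulisation = usable_fraction(in_size_window=0.10, recovered=0.5)

# Acoustic shearing: 17% in the window; recovery of 1.0 is an assumption.
acoustic = usable_fraction(in_size_window=0.17, recovered=1.0)

print(f"nebulisation: {nebulisation:.0%}, acoustic shearing: {acoustic:.0%}")
```

Under these assumptions acoustic shearing delivers roughly three times more usable DNA from the same input.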
Size selection
  1. Size detection
  2. Slicing
  3. Gel melting and DNA extraction: A/T bias
Fragment processing, adapter ligation
  Illumina:
    Fragments are blunted and 5’ phosphorylated with enzymes (polymerases and kinases)
    The 3′ ends are A-tailed
    Adapters with T-overhangs are ligated (sticky-end ligation)
    Adapters can be barcoded for different samples
  Ion Torrent:
    Fragments are blunted
    Adapters are blunt-end ligated
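Barcoded adapters let several samples share one run, with reads sorted back to their samples afterwards. The sketch below shows one simple way this sorting (demultiplexing) could work, allowing a small number of mismatches in the barcode read. The function name, the barcode sequences, and the `max_mismatches` default are illustrative assumptions, not a description of any vendor's software.

```python
from collections import defaultdict


def demultiplex(reads: list[tuple[str, str]], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Assign (index_read, insert_read) pairs to samples by barcode."""
    def mismatches(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

    by_sample: dict[str, list[str]] = defaultdict(list)
    for index_read, insert_read in reads:
        hits = [sample for barcode, sample in barcodes.items()
                if mismatches(index_read, barcode) <= max_mismatches]
        # Keep the read only when exactly one barcode matches.
        sample = hits[0] if len(hits) == 1 else "undetermined"
        by_sample[sample].append(insert_read)
    return dict(by_sample)


barcodes = {"ACGTAC": "sample1", "TGCATG": "sample2"}
reads = [("ACGTAC", "AAAA"), ("TGCATT", "CCCC"), ("GGGGGG", "TTTT")]
print(demultiplex(reads, barcodes))
```

Tolerating one mismatch rescues reads with a single sequencing error in the barcode, which only works safely when the barcodes in a pool differ from each other at several positions.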
Size selection, PCR amplification
  Size selection filters against:
    adapter dimers
    chimeric fragment ligations
  PCR amplification:
    Increases depth or coverage
    May lead to over-duplication (PCR duplicates)
    A high amount of initial template DNA may lead to single-stranded fragments
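PCR duplicates are usually dealt with after alignment: copies of the same original fragment map to the same position and strand, so only one representative is kept. The sketch below illustrates that idea on simplified records; the dictionary keys (`chrom`, `start`, `strand`, `mean_quality`) are hypothetical, and dedicated tools use more information, such as the positions of both mates of a pair.

```python
def remove_duplicates(alignments: list[dict]) -> list[dict]:
    """Keep the best-quality alignment for each (chrom, start, strand) key."""
    best: dict[tuple, dict] = {}
    for aln in alignments:
        key = (aln["chrom"], aln["start"], aln["strand"])
        if key not in best or aln["mean_quality"] > best[key]["mean_quality"]:
            best[key] = aln
    return list(best.values())


alignments = [
    {"name": "r1", "chrom": "chr1", "start": 100, "strand": "+", "mean_quality": 30},
    {"name": "r2", "chrom": "chr1", "start": 100, "strand": "+", "mean_quality": 35},
    {"name": "r3", "chrom": "chr1", "start": 250, "strand": "-", "mean_quality": 32},
]
print([aln["name"] for aln in remove_duplicates(alignments)])
```

Here r1 and r2 share position and strand, so only the higher-quality r2 survives alongside r3.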
Bridge amplification
Sequencing by synthesis
Paired-end sequencing: long (d) and short (c) inserts
Illumina machine comparison
Illumina advantages and disadvantages
  Advantages:
    High throughput per unit cost
    Wide range of applications
  Disadvantages:
    High substitution error rates
    Sequence quality deteriorates towards the end of the read
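Quality deterioration towards the read end is commonly handled by trimming low-quality bases from the 3' end. The sketch below decodes a FASTQ quality string using the Phred+33 convention and trims with a simple threshold; the threshold of 20 and the function names are illustrative assumptions, and real trimmers often use sliding windows instead.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence: str, quality_string: str, min_quality: int = 20) -> str:
    """Cut low-quality bases from the 3' end until one passes the threshold."""
    scores = phred_scores(quality_string)
    end = len(sequence)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end]


# 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0.
print(trim_3prime("ACGTACGT", "IIIIII#!"))
```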
Summary: SOLiD has one of the lowest error rates (~0.01) due to 2-base encoding. However, it is still limited by short read lengths (35 bp / 85 bp for paired-end, PE).
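Two-base encoding can be sketched in a few lines. Each colour describes a pair of adjacent bases, and with the bases coded as A=0, C=1, G=2, T=3 the colour is the XOR of the two codes, which reproduces the usual four-colour scheme (identical bases give colour 0). The function names are illustrative; decoding assumes the first base is known, as it is when it comes from the adapter.

```python
BASES = "ACGT"


def encode_colors(sequence: str) -> list[int]:
    """Colour for each adjacent base pair: XOR of the 2-bit base codes."""
    codes = [BASES.index(b) for b in sequence]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Rebuild the sequence from a known first base and the colour calls."""
    codes = [BASES.index(first_base)]
    for color in colors:
        codes.append(codes[-1] ^ color)
    return "".join(BASES[c] for c in codes)


print(encode_colors("ATGGC"))   # reference
print(encode_colors("ATCGC"))   # same sequence with one real base change
```

Every base is covered by two colours, so a real single-base change alters two adjacent colours, whereas a single isolated colour change is more likely a measurement error. That redundancy is the basis of the low error rate.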
Comparison of sequencing technologies
454
  Pros: long reads
  Cons: low throughput, high reagent cost, high error rates in homopolymers
Illumina/Solexa
  Pros: market leader => most library prep protocols are for Illumina; highest throughput and lowest cost per base; reads up to 300 bp
  Cons: before the HiSeq X, random scattering of fragments across the flow cell made cluster density dependent on library concentration; requires high library complexity (?)
SOLiD
  Pros: highest throughput; lower error rates
  Cons: short reads (75 nt); less widely used
Ion Torrent
  Pros: semiconductor-based, so no need for optical scanning or fluorescent nucleotides; fast run times
  Cons: high error rates in homopolymers
PacBio
  Pros: extremely long reads (up to 20 kb); fast run times
  Cons: high cost; high error rate; low throughput