Next Generation Sequencing


1 Next Generation Sequencing
The past, present, and future of DNA sequencing
Alex V. Postma, PhD, Department of Anatomy, Embryology & Physiology, Academic Medical Center
*DNA sequencing: determining the number and order of nucleotides that make up a given molecule of DNA.

2 (Relevant) Trivia
How many base pairs (bp) are there in a human genome?
How much did it cost to sequence the first human genome?
How long did it take to sequence the first human genome?
When was the first human genome sequence complete?
Whose genome was it?

3 (Relevant) Trivia
How many base pairs (bp) are there in a human genome? ~3 billion (haploid)
How much did it cost to sequence the first human genome? ~$2.7 billion
How long did it take to sequence the first human genome? ~13 years
When was the first human genome sequence complete?

4 Genome Sequencing
Goal: figuring out the order of nucleotides across a genome
Problem: current DNA sequencing methods can handle only short stretches of DNA at once (<1-2 kbp)
Solution: sequence short pieces, then use computers to assemble them

5 Genome Sequencing
[Figure: a genome is broken into short fragments of DNA, the fragments are read as short DNA sequences, and the overlapping reads are assembled into the sequenced genome.]
Example fragments: ACGTGGTAA CGTATACAC TAGGCCATA GTAATGGCG CACCCTTAG TGGCGTATA
Assembled sequence: ACGTGGTAATGGCGTATACACCCTTAGGCCATA
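The "sequence, then assemble" idea can be illustrated with a toy greedy overlap merger. This is only a sketch under simplifying assumptions (error-free reads, no repeats, a single strand); the function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, and real assemblers are far more elaborate. The reads are the six short fragments shown on the slide.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two reads that share the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; the pieces stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)  # report the longest contig


fragments = ["ACGTGGTAA", "CGTATACAC", "TAGGCCATA",
             "GTAATGGCG", "CACCCTTAG", "TGGCGTATA"]
print(greedy_assemble(fragments))
```

Greedy merging is chosen here only because it is short to write; it can misassemble repetitive sequence, which is one reason short reads make de novo assembly hard.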

6 Sanger Sequencing
Mix DNA with dNTPs and ddNTPs
Amplify
Run in gel
Fragments migrate a distance that depends on their size: shorter fragments travel farther
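A small simulation may help make the chain-termination idea concrete. It is a sketch under stated assumptions: each growing copy stops with a fixed probability `p_stop` at every position (standing in for ddNTP incorporation), every fragment carries a label for its terminal base, and the gel separates fragments perfectly by length. The names `sanger_fragments` and `read_gel` and all parameter values are hypothetical.

```python
import random


def sanger_fragments(strand: str, n_molecules: int = 5000,
                     p_stop: float = 0.05, seed: int = 0):
    """Simulate chain termination; return (length, terminal_base) pairs."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for pos, base in enumerate(strand, start=1):
            if rng.random() < p_stop:  # a ddNTP was incorporated here
                fragments.append((pos, base))
                break
    return fragments


def read_gel(fragments) -> str:
    """Order fragments by size and read the terminal base of each size class."""
    terminal_base = {}
    for length, base in fragments:
        terminal_base[length] = base
    # Shortest fragments migrate farthest, so reading from the far end of the
    # gel towards the wells corresponds to increasing fragment length.
    return "".join(terminal_base[n] for n in sorted(terminal_base))


strand = "ACGTGGTAATGGCGTATACA"
print(read_gel(sanger_fragments(strand)))
```

If too few molecules are simulated, some lengths are never produced and the read silently skips bases, which loosely mirrors why signal fades for long fragments.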

7 Sanger Sequencing

8 Sanger Sequencing
Advantages: long reads (~900 bp); suitable for small projects
Disadvantages: low throughput; expensive

9 Sanger Sequencing
[Timeline, 1980 to 2000s]
1982: lambda virus, DNA stretches up to Kbp (Sanger et al.)
1995: H. influenzae, 1.8 Mbp (Fleischmann et al.)
2001: H. sapiens, D. melanogaster, 3 Gbp (Venter et al.)
2007: Global Ocean Sampling Expedition, ~3,000 organisms, 7 Gbp (Venter et al.)

10 Next Generation Sequencing: Why Now?
Motivation: HGP and its derivatives, personalized medicine
Short-read applications: (re-)sequencing, other methods (e.g. gene expression)
Advancements in technology
NGS is a general term referring to all post-Sanger sequencing technologies that enable massive sequencing at low cost. NGS may be further divided into polony-based technologies, which require amplification of the DNA prior to sequencing, and single-molecule sequencing, which does not. The motivation for new technologies stems not only from potential commercial uses such as personalised medicine, but also from government-supported projects such as the HGP or the 1000 Genomes Project, which aims to sequence the genomes of 1000 individuals around the world, with the price tag for sequencing a single genome set at $50,000. Besides de novo sequencing, potential applications include re-sequencing and gene expression analysis; both can make use of the short reads offered by all current technologies. So despite the read-length barrier of the new technologies, the sequencers still became commercial. And of course, advancements in chemistry, microscopy and other related technologies enabled the new sequencing technologies.

11 High Parallelism is Achieved in Polony Sequencing
[Figure: Sanger vs. polony]
Polony sequencing refers to all commercial technologies except for Helicos. It takes place on an array of polonies, in which all amplicons of the same DNA fragment are clustered together in the same region of the array. These groups of amplicons were termed polonies, short for polymerase colonies. The degree of parallelism that can be achieved through Sanger sequencing is only a fraction of what can be achieved in polony sequencing.

12 Generation of Polony array: DNA Beads (454, SOLiD)
DNA beads are generated using emulsion PCR
The polony array is generated as follows. The process begins by mixing the DNA fragments, ligated to adapters, with beads, PCR components and primers in water. The components are then mixed with oil to create "microreactors": droplets of water containing all the components necessary for PCR. Next, PCR is performed, with the new copies in each microreactor attaching to the bead. Finally, the emulsion and empty beads are removed, leaving only DNA-containing beads.

13 Generation of Polony array: DNA Beads (454, SOLiD)
DNA beads are placed in wells
The beads are loaded onto an array containing picoliter-scale wells. The DNA beads are placed into the wells together with smaller beads carrying the enzymes required for the reactions.

14 Generation of Polony array: Bridge-PCR (Solexa)
DNA fragments are attached to the array and used as PCR templates
Create DNA library
Place on array
Perform bridge PCR (primers are attached to the array)
Result: ~1M colonies with ~1K sequences in each

15 Single Molecule Sequencing: HeliScope
Direct sequencing of DNA molecules: no amplification stage
DNA fragments are attached to an array, as in Illumina
Potential benefits: higher throughput, fewer errors
Sequencing is asynchronous, using a highly sensitive fluorescence detection system
Based on work from Stephen Quake's group (Stanford)
In work published by Quake's lab, a human genome was fully sequenced at a cost of about $40K.

16 [Images: Genome Sequencer 20 (454), Genome Analyzer (Solexa), Ion Torrent, MinION]

17 Technology Summary
Technology  Read length  Sequencing technology  Throughput (per run)  Cost (1 Mbp)*
Sanger      ~800 bp      -                      400 kbp               $500
454         ~400 bp      Polony                 500 Mbp               $60
Solexa      75 bp        Polony                 20 Gbp                $2
SOLiD       -            Polony                 60 Gbp                -
Helicos     30-35 bp     Single molecule        25 Gbp                $1
(- = value not present in the transcript)
Instrument cost should be taken into account: the instrument cost of 454, Solexa and ABI is ~40% of that of HeliScope.
454 Life Sciences: FLX Titanium series. Run = 10 hours; a cluster of computers is required (only a single processor for the standard FLX).
ABI SOLiD 3 (
*Source: Shendure & Ji, Nat Biotech, 2008
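To get a feel for what the table implies, the per-Mbp cost and per-run throughput can be scaled up to a whole human genome. The sketch below uses only the figures listed in the table plus the ~3 billion bp haploid genome size quoted earlier in the deck; the function names and the `coverage` parameter are assumptions for illustration, and instrument, labour and analysis costs are ignored. SOLiD is left out because its cost is not given.

```python
GENOME_MBP = 3_000  # ~3 billion bp haploid human genome

# (throughput per run in kbp, cost per Mbp in USD), taken from the table.
PLATFORMS = {
    "Sanger": (400, 500),
    "454": (500_000, 60),
    "Solexa": (20_000_000, 2),
    "Helicos": (25_000_000, 1),
}


def sequencing_cost(platform: str, coverage: int = 1) -> int:
    """Cost in USD of sequencing the genome `coverage` times over."""
    _, cost_per_mbp = PLATFORMS[platform]
    return cost_per_mbp * GENOME_MBP * coverage


def runs_needed(platform: str, coverage: int = 1) -> int:
    """Number of instrument runs needed, rounded up."""
    throughput_kbp, _ = PLATFORMS[platform]
    total_kbp = GENOME_MBP * 1_000 * coverage
    return -(-total_kbp // throughput_kbp)  # ceiling division on integers


for name in PLATFORMS:
    print(name, sequencing_cost(name), runs_needed(name))
```

Real projects need many-fold coverage, so passing a larger `coverage` value gives a more realistic, though still rough, estimate.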

18 Comparing Different Technologies
Sanger Sequencing
Advantages: lowest error rate; long read length (~750 bp); can target a primer
Disadvantages: high cost per base; long time to generate data; need for cloning; low amount of data per run

19 Comparing Different Technologies
454 Sequencing
Advantages: low error rate; medium read length (~ bp)
Disadvantages: relatively high cost per base; must run at large scale; medium/high startup costs

20 Comparing Different Technologies
Ion Torrent Sequencing
Advantages: low startup costs; scalable (10 to 1000 Mb of data per run); medium/low cost per base; low error rate; fast runs (<3 hours)
Disadvantages: new, developing technology; cost not as low as Illumina; read lengths only ~ bp so far

21 Comparing Different Technologies
Illumina Sequencing
Advantages: low error rate; lowest cost per base; tons of data
Disadvantages: must run at very large scale; short read length (50-75 bp); runs take multiple days; high startup costs; de novo assembly difficult

22 Comparing Different Technologies
PacBio Sequencing
Advantages: can use a single molecule as template; potential for very long reads (several kb+)
Disadvantages: high error rate (~10-15%); medium/high cost per base; high startup costs

23 NGS Platforms Overview
Differ in design and chemistries
Fundamentally related: sequencing of thousands to millions of clonally amplified molecules in a massively parallel manner
Orders of magnitude more information; will continue to evolve
Attractive for clinical applications, where individual sequencing assays are costly and laborious (serial "gene by gene" analysis)
Pacific Biosciences, Helicos Biosciences, NABsys, VisiGen Biotechnologies, Complete Genomics, Oxford Nanopore Technologies

24 What, When and Why
Sanger: small projects (less than 1 Mbp)
454: de novo sequencing, metagenomics
Solexa, SOLiD, HeliScope: gene expression, protein-DNA interactions, resequencing

25 Sequencing the Human Genome
[Figure: log10(price) of sequencing a human genome plotted against year, 2000 to 2010 and beyond]
2001: Human Genome Project, 2.7G$, 11 years
2001: Celera, 100M$, 3 years
2007: 454, 1M$, 3 months
2008: ABI SOLiD, 60K$, 2 weeks
2009: Illumina, Helicos, 40-50K$
2010: 5K$, a few days?
2012: 100$, <24 hrs?
I would like to begin with an overview of the history of human genome sequencing. Despite significant improvements … it was clear that Sanger sequencing would not make massive DNA sequencing at low cost and high speed feasible. Several technologies were developed at the time, of which the 454 Life Sciences sequencer was the first to become commercial in … years later it was used for … Whether …, but the direction is clear: a few years from now, very fast and cheap sequencing technologies will be available for commercial and research purposes.
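The figure's vertical axis is log10(price), so each tick corresponds to a tenfold change in cost. As a rough illustration, the sketch below converts the dated estimates listed on the slide to that scale and computes the fold reduction between two of them; the dictionary keys and function names are hypothetical, and the speculative 2010 and 2012 projections are left out.

```python
import math

# Cost estimates in US dollars as listed on the slide.
genome_costs = {
    "2001 Human Genome Project": 2.7e9,
    "2001 Celera": 100e6,
    "2007 454": 1e6,
    "2008 ABI SOLiD": 60e3,
}


def log10_price(cost_usd: float) -> float:
    """Position of a cost on the figure's log10(price) axis."""
    return math.log10(cost_usd)


def fold_reduction(earlier_usd: float, later_usd: float) -> float:
    """How many times cheaper the later estimate is."""
    return earlier_usd / later_usd


for label, cost in genome_costs.items():
    print(f"{label}: log10(price) = {log10_price(cost):.2f}")
print(fold_reduction(genome_costs["2001 Human Genome Project"],
                     genome_costs["2008 ABI SOLiD"]))
```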

26 Sequencing costs have fallen

27 Next Generation Sequencing Applications
Mutation detection
Foreign DNA detection
Non-invasive diagnosis of aneuploidy
Population characterization
Cancer genetics
Ancient DNA (Neanderthal)
Expression analysis
Transcription factor binding
Chromosomal interaction
Etc.

28 Chromosomal aneuploidy: an abnormal number of chromosomes
In this work the authors were able to detect abnormalities in the number of chromosomes using massive sequencing of plasma extracted from a blood sample collected from the mother.
Chromosomal aneuploidy: an abnormal number of chromosomes
Amniocentesis: amniotic fluid
Chorionic villus sampling: placental villi
Cell-free fetal DNA
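The slide does not say how the read data are turned into a call. One common way to frame it, shown here purely as an assumption, is to compare the fraction of reads mapping to a chromosome in the test sample against the same fraction in a set of reference pregnancies, and to flag large z-scores. All counts, the reference values and the cutoff of 3 below are made up for illustration, and the function names are hypothetical.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts: dict[str, int], chrom: str) -> float:
    """Fraction of all mapped reads that fall on `chrom`."""
    return read_counts[chrom] / sum(read_counts.values())


def z_score(sample_fraction: float, reference_fractions: list[float]) -> float:
    """Standard deviations between the sample and the reference mean."""
    return (sample_fraction - mean(reference_fractions)) / stdev(reference_fractions)


# Hypothetical chr21 read fractions from reference (euploid) pregnancies.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

# Hypothetical read counts for a test sample.
sample_counts = {"chr21": 137, "other": 9_863}

z = z_score(chromosome_fraction(sample_counts, "chr21"), reference)
print(f"z = {z:.2f}", "-> possible trisomy 21" if z > 3 else "-> no excess detected")
```

Because only a small share of the cell-free DNA in maternal plasma is fetal, the expected shift is tiny, which is why very large read counts are needed.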

29 Exome Sequencing Identifies a Tibetan Adaptation
Yi et al. Science 2010
The widespread mutation in Tibetans is near a gene called EPAS1, a so-called "super athlete gene" identified several years ago and so named because some variants of the gene are associated with improved athletic performance. The gene codes for a protein involved in sensing oxygen levels and perhaps balancing aerobic and anaerobic metabolism.

30 Ancient Genomes Resurrected
Degraded state of the sample → mtDNA sequencing
Nuclear genomes of ancient remains: cave bear, mammoth, Neanderthal (10^6 bp)
Problems: contamination with modern human DNA and co-isolation of bacterial DNA

31 NGS Application Examples- Inherited Conditions
Discovery tool
Single-gene disorders, e.g. autosomal dominant (AD) Kabuki syndrome (MLL2)
Causative mutations for multigenic diseases: superior to the "one by one" approach of traditional sequencing
Diagnostic advancements for diseases with overlapping symptoms or multiple possible syndromes/genes

32 Variant detection through next generation sequencing
Meyerson et al. NRG 2010

33 Inherited Conditions- Challenges and Opportunities
Example: monogenic disorders
Novel missense mutations
Structural aberrations
Germline mosaicism
Imprinting effects
Epigenetic factors
Opportunities
Example: multifactorial disease
Risk loci more often in non-coding or intergenic regions
Pathogenicity of variants often unclear; less testing vs. monogenic disease
Reference human genome cataloguing of variants = more test offerings

34 Sequencing of a Single Individual with Family Data
Lupski et al. NEJM 2010

35 The First 8 Human Genomes

36 SNP Distribution in Proband

37 Nonsynonymous SNPs in Known Disease Genes

38 NGS Application Examples- Neoplastic Conditions
Cancer susceptibility genes
Risk assessment
Risk management
Tumor sub-typing
Micro-RNAs
Prognosis
Alterations in gene expression
Molecular profiling
Patient stratification
Predictions of therapeutic response
Personalized treatment
Therapeutic monitoring
Somatic/driver mutations
Methylation
Epigenetic changes

39 Exome Sequencing in Prostate Cancer
Barbieri et al. Nature Genetics 2012

40 Exome Sequencing in Prostate Cancer
Barbieri et al. Nature Genetics 2012

41 Nonsynonymous Somatic Mutations in Neuroblastoma
Molenaar et al. Nature 2012

42 Mutation count associated with age, stage, and survival
Molenaar et al. Nature 2012

43 Next Generation Sequencing
NGS diagnostics: emphasis has shifted towards data analysis rather than the technical component
NGS infrastructures must include appropriate expertise and computational hardware
Unprecedented amounts of medical data and various processing algorithms necessitate adequate tools for:
Data management (alignment and assembly)
QC of image processing, base calling, filtering, alignment, and SNP finding/application steps
Archiving
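As one small example of the QC and filtering steps listed above, reads are commonly screened on their base-call quality. The sketch below assumes FASTQ-style quality strings in the Phred+33 encoding, where a score Q corresponds to an error probability of 10^(-Q/10); the function names and the mean-quality cutoff of 20 are hypothetical choices, not a recommendation from the slides.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string (Phred+33 by default) into scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string: str, min_mean_q: float = 20.0) -> bool:
    """Keep a read only if its mean quality reaches the chosen cutoff."""
    scores = decode_qualities(quality_string)
    return sum(scores) / len(scores) >= min_mean_q


print(decode_qualities("I5!"))
print(passes_filter("IIII5555"), passes_filter("!!!!5555"))
```

Mean quality is a blunt criterion; production pipelines usually also trim low-quality ends and apply further filters at the alignment and variant-calling stages.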

44 Considerations
Evaluation of the variant positions "called" involves queries of all known relevant databases
The lack of databases curated to clinical standards is likely the most significant challenge in managing and reporting genome sequencing data
EHR considerations: test ordering, archiving of NGS reports, patient consent, data (reinterpretation?)

45 NGS-Post-Analytical Considerations
Expert interpretation and guidance: correlation of age, gender, clinical presentation, family history
Team approach is ideal: pathologists, geneticists, other providers
Proficiency testing and alternative assessment are challenging
Proficiency testing schemes based on NGS methods rather than specific genes are likely

46 Professional Considerations-Reimbursement and Gene Patents
Challenging reimbursement issues
Genome sequencing may involve numerous patented gene sequences
Development of an affordable system of common access to genes?
What about mutations in known disease genes that are not evident from the patient's phenotype?

