Next Generation Sequencing


Next Generation Sequencing: The past, present, and future of DNA sequencing. *DNA sequencing: determining the number and order of nucleotides that make up a given molecule of DNA. Alex V. Postma, PhD, Department of Anatomy, Embryology & Physiology, Academic Medical Center

(Relevant) Trivia How many base pairs (bp) are there in a human genome? How much did it cost to sequence the first human genome? How long did it take to sequence the first human genome? When was the first human genome sequence complete? Whose genome was it?

(Relevant) Trivia How many base pairs (bp) are there in a human genome? ~3 billion (haploid). How much did it cost to sequence the first human genome? ~$2.7 billion. How long did it take to sequence the first human genome? ~13 years. When was the first human genome sequence complete? 2000-2003.

Genome Sequencing Goal: figuring out the order of nucleotides across a genome. Problem: current DNA sequencing methods can handle only short stretches of DNA at once (<1-2 kbp). Solution: sequence short pieces and then use computers to assemble them.

Genome Sequencing [Figure: the genome is broken into short fragments of DNA, the fragments are read as short DNA sequences (AC..GC, TT..TC, CG..CA, TG..GT, ...), and the overlapping sequences are assembled into the sequenced genome.] Short DNA sequences: ACGTGGTAA CGTATACAC TAGGCCATA GTAATGGCG CACCCTTAG TGGCGTATA CATA… Assembled: ACGTGGTAATGGCGTATACACCCTTAGGCCATA Sequenced genome: ACGTGACCGGTACTGGTAACGTACA CCTACGTGACCGGTACTGGTAACGT ACGCCTACGTGACCGGTACTGGTAA CGTATACACGTGACCGGTACTGGTA ACGTACACCTACGTGACCGGTACTG GTAACGTACGCCTACGTGACCGGTA CTGGTAACGTATACCTCT...
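The "sequence short pieces, then assemble by computer" idea can be sketched with a greedy overlap merge. This is only a minimal illustration, not how production assemblers work: the function names, the minimum overlap of 3 bases, and the assumption of error-free reads from a single strand are all choices made here for the example. The input is the six nine-base fragments shown on the slide.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of contigs with the largest suffix/prefix overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Fragments taken from the slide
reads = ["ACGTGGTAA", "CGTATACAC", "TAGGCCATA", "GTAATGGCG", "CACCCTTAG", "TGGCGTATA"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can be misled by repeats; real genomes need far more careful handling of repeated sequence, sequencing errors and both strands.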

Sanger Sequencing Mix DNA with dNTPs and ddNTPs. Amplify. Run in gel. Fragments migrate a distance that is inversely related to their size: shorter fragments travel farther.
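A toy model may help show why separating chain-terminated fragments by size reveals the sequence. The sketch below is idealized and rests on assumptions made here: every possible termination product is observed, there is one lane per ddNTP, and the primer is ignored. The function names are hypothetical.

```python
def run_lanes(synthesized):
    """For each ddNTP lane, list the lengths of fragments that terminate with that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest (migrated farthest) to longest to recover the sequence."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = run_lanes("ACGTGGTAA")
print(lanes)            # which fragment lengths appear in each lane
print(read_gel(lanes))  # the sequence read off the gel
```

Each band's lane gives the terminal base and its position on the gel gives the fragment length, so ordering the bands by length spells out the synthesized strand.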

Sanger Sequencing

Sanger Sequencing Advantages: long reads (~900 bp); suitable for small projects. Disadvantages: low throughput; expensive.

Sanger Sequencing [Timeline, 1980-2000s] 1982: lambda virus, DNA stretches up to 30-40 kbp (Sanger et al.). 1995: H. influenzae, 1.8 Mbp (Fleischmann et al.). 2001: H. sapiens, D. melanogaster, 3 Gbp (Venter et al.). 2007: Global Ocean Sampling Expedition, ~3,000 organisms, 7 Gbp (Venter et al.).

Next Generation Sequencing: Why Now? Motivation: HGP and its derivatives, personalized medicine. Short-read applications: (re-)sequencing, other methods (e.g. gene expression). Advancements in technology. NGS is a general term referring to all post-Sanger sequencing technologies that enable massive sequencing at low cost. NGS may be further divided into polony-sequencing-based technologies, which require amplification of the DNA prior to sequencing, and single-molecule sequencing, which does not. The motivation for new technologies derives not only from potential commercial uses such as personalised medicine, but also from government-supported projects such as the HGP and the 1000 Genomes Project, which aims to sequence the genomes of 1000 individuals around the world, with the price tag for sequencing a single genome set at $50,000. Beyond de novo sequencing, potential applications include re-sequencing and gene expression analysis, both of which can make use of the short reads offered by all current technologies. So despite the read-length barrier of the new technologies, the sequencers still became commercial products. And of course, advancements in chemistry, microscopy and other related technologies enabled the new sequencing technologies.

High Parallelism is Achieved in Polony Sequencing [Figure: Sanger vs. polony] Polony sequencing refers to all commercial technologies except for Helicos. Polony sequencing takes place on an array of polonies, in which all amplicons of the same DNA fragment are clustered together in the same region of the array. These groups of amplicons were termed polonies, short for polymerase colonies. The degree of parallelism that can be achieved through Sanger sequencing is only a fraction of what can be achieved in polony sequencing.

Generation of Polony array: DNA Beads (454, SOLiD) DNA beads are generated using emulsion PCR. The polony array is generated as follows. The process begins by mixing the DNA fragments, ligated to connectors, with beads, PCR components and primers in water. The components are then mixed with oil to create "microreactors": droplets of water containing all the components necessary for PCR. Next, PCR is performed, and the new copies in each microreactor become attached to the bead. Finally, the emulsion and the empty beads are removed, leaving only DNA-containing beads.

Generation of Polony array: DNA Beads (454, SOLiD) DNA beads are placed in wells. The beads are loaded onto an array containing picoliter-scale wells. The DNA beads are placed into the wells together with small beads containing the enzymes required for the reactions.

Generation of Polony array: Bridge-PCR (Solexa) DNA fragments are attached to the array and used as PCR templates. Create DNA library. Place on array. Perform bridge-PCR (primers are attached to the array). Result: ~1M colonies with ~1K sequences in each.

Single Molecule Sequencing: HeliScope Direct sequencing of DNA molecules: no amplification stage. DNA fragments are attached to an array, as in Illumina. Potential benefits: higher throughput, fewer errors. Sequencing is asynchronous, using a highly sensitive fluorescence detection system. Based on work from Stephen Quake's group (Stanford). In work published by Quake's lab, a human genome was fully sequenced at a cost of $40K.

[Instrument photos] Genome Sequencer 20 (454); Genome Analyzer (Solexa); Ion Torrent; MinION

Technology Summary

Technology   Read length   Sequencing technology   Throughput (per run)   Cost (1 Mbp)*
Sanger       ~800 bp                               400 kbp                $500
454          ~400 bp       Polony                  500 Mbp                $60
Solexa       75 bp         Polony                  20 Gbp                 $2
SOLiD                      Polony                  60 Gbp
Helicos      30-35 bp      Single molecule         25 Gbp                 $1

*Source: Shendure & Ji, Nat Biotech, 2008

Instrument cost should be taken into account: the cost of the 454, Solexa and ABI instruments is ~40% of that of the HeliScope. 454 Life Sciences: FLX Titanium series. Run = 10 hours; a cluster of computers is required (only a single processor for the standard FLX). http://www.454.com/products-solutions/system-features.asp#titanium ABI SOLiD 3 (http://www3.appliedbiosystems.com/AB_Home/applicationstechnologies/SOLiDSystemSequencing/overviewofsolidsystem/index.htm)
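To put the summary figures in perspective, the sketch below works out what the quoted cost per Mbp and throughput per run would mean for a ~3 Gbp human genome. It is back-of-the-envelope arithmetic only: it assumes a single pass (1x coverage) with no redundancy, ignores instrument and labor costs, and leaves out SOLiD because its cost entry is not legible in the table. The function and variable names are made up for this example.

```python
GENOME_MBP = 3_000  # ~3 billion bp (haploid human genome), expressed in Mbp

# platform: (cost per Mbp in USD, throughput per run in Mbp), as quoted in the table
platforms = {
    "Sanger": (500, 0.4),
    "454": (60, 500),
    "Solexa": (2, 20_000),
    "Helicos": (1, 25_000),
}


def cost_per_1x_genome(cost_per_mbp):
    """Reagent cost of reading every base of the genome once."""
    return cost_per_mbp * GENOME_MBP


def coverage_per_run(throughput_mbp):
    """How many times one run's output covers the genome on average."""
    return throughput_mbp / GENOME_MBP


for name, (cost, throughput) in platforms.items():
    print(f"{name:8s} 1x genome: ${cost_per_1x_genome(cost):>9,}  "
          f"coverage per run: {coverage_per_run(throughput):.4f}x")
```

Real projects need many-fold coverage, so the actual cost is a multiple of the 1x figure; the point is the relative gap between platforms.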

Comparing Different Technologies: Sanger Sequencing. Advantages: lowest error rate; long read length (~750 bp); can target a primer. Disadvantages: high cost per base; long time to generate data; need for cloning; small amount of data per run.

Comparing Different Technologies: 454 Sequencing. Advantages: low error rate; medium read length (~400-600 bp). Disadvantages: relatively high cost per base; must run at large scale; medium/high startup costs.

Comparing Different Technologies: Ion Torrent Sequencing. Advantages: low startup costs; scalable (10-1000 Mb of data per run); medium/low cost per base; low error rate; fast runs (<3 hours). Disadvantages: new, developing technology; cost not as low as Illumina; read lengths only ~100-200 bp so far.

Comparing Different Technologies: Illumina Sequencing. Advantages: low error rate; lowest cost per base; tons of data. Disadvantages: must run at very large scale; short read length (50-75 bp); runs take multiple days; high startup costs; de novo assembly difficult.

Comparing Different Technologies: PacBio Sequencing. Advantages: can use a single molecule as template; potential for very long reads (several kb+). Disadvantages: high error rate (~10-15%); medium/high cost per base; high startup costs.

NGS Platforms Overview Platforms differ in design and chemistries. Fundamentally related: sequencing of thousands to millions of clonally amplified molecules in a massively parallel manner. Orders of magnitude more information; will continue to evolve. Attractive for clinical applications: individual sequencing assays are costly and laborious (serial "gene by gene" analysis). Companies: Pacific Biosciences, Helicos Biosciences, NABsys, VisiGen Biotechnologies, Complete Genomics, Oxford Nanopore Technologies.

What, When and Why Sanger: small projects (less than 1 Mbp). 454: de novo sequencing, metagenomics. Solexa, SOLiD, HeliScope: gene expression, protein-DNA interactions, resequencing.

Sequencing the Human Genome [Plot: Log10(price) versus year, 2000-2010] 2001: Human Genome Project, 2.7G$, 11 years. 2001: Celera, 100M$, 3 years. 2007: 454, 1M$, 3 months. 2008: ABI SOLiD, 60K$, 2 weeks. 2009: Illumina, Helicos, 40-50K$. 2010: 5K$, a few days? 2012: 100$, <24 hrs? I would like to begin with an overview of the history of human genome sequencing. Despite significant improvements … it was clear that Sanger sequencing would not make massive DNA sequencing at a low cost and high speed feasible. Several technologies were developed at the time, of which the 454 Life Sciences sequencer was the first to become commercial in 2005. 2 years later it was used for … Whether …, but the direction is clear: in a few years from now very fast and cheap sequencing technologies will be available for commercial and research purposes.
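The plot's vertical axis is Log10(price), so the listed prices can be converted to axis values directly. The short sketch below does that conversion and computes the fold reduction between two points. The 2010 and 2012 entries carry question marks on the slide and are treated here as projections; the 2009 range (40-50K$) is left out because it is not a single value. The names used are illustrative.

```python
import math

# (year, label, price in USD) as listed on the slide
price_points = [
    (2001, "Human Genome Project", 2_700_000_000),
    (2001, "Celera", 100_000_000),
    (2007, "454", 1_000_000),
    (2008, "ABI SOLiD", 60_000),
    (2010, "projected", 5_000),
    (2012, "projected", 100),
]


def log10_price(price):
    """Position of a price on the plot's Log10(price) axis."""
    return math.log10(price)


def fold_reduction(old_price, new_price):
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


for year, label, price in price_points:
    print(f"{year}  {label:22s} log10(price) = {log10_price(price):.2f}")

print(f"HGP to SOLiD: {fold_reduction(2_700_000_000, 60_000):,.0f}-fold cheaper")
```

On a log scale each unit step is a tenfold price drop, which is why a fall from billions to tens of thousands of dollars spans only about five axis units.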

Sequencing costs have fallen

Next Generation Sequencing Applications Mutation detection; foreign DNA detection; non-invasive diagnosis of aneuploidy; population characterization; cancer genetics; ancient DNA (Neanderthal); expression analysis; transcription factor binding; chromosomal interaction; etc.

Chromosomal aneuploidy: an abnormal number of chromosomes. In this work the authors were able to detect abnormalities in the number of chromosomes using massive sequencing of cell-free DNA in plasma extracted from a blood sample collected from the mother. Amniocentesis: sampling of amniotic fluid. Chorionic villus sampling: sampling of placental villi. Cell-free fetal DNA.
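One way to picture how read counts from maternal plasma could reveal an extra chromosome is a simple counting test: compute the fraction of reads mapping to the chromosome of interest and compare it with the same fraction in reference samples. The sketch below is an illustration of that general idea, not necessarily the method used in the work described. All counts and reference fractions are invented, and the z-score cutoff of 3 is an assumption chosen for the example.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts, chrom):
    """Fraction of all mapped reads that fall on the given chromosome."""
    return read_counts[chrom] / sum(read_counts.values())


def z_score(sample_fraction, reference_fractions):
    """How many standard deviations the sample lies from the reference mean."""
    return (sample_fraction - mean(reference_fractions)) / stdev(reference_fractions)


# Hypothetical chr21 read fractions from reference (euploid) pregnancies
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]

# Hypothetical read counts for two test samples
suspect = {"chr21": 13_700, "other": 986_300}
typical = {"chr21": 13_050, "other": 986_950}

for name, counts in (("suspect", suspect), ("typical", typical)):
    z = z_score(chromosome_fraction(counts, "chr21"), reference)
    flag = "over-represented" if z > 3 else "within reference range"
    print(f"{name}: z = {z:.1f} ({flag})")
```

Because fetal DNA is only a small part of the cell-free DNA in plasma, the expected shift is small, which is why very large numbers of reads are needed for the counts to be precise enough.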

Exome Sequencing Identifies a Tibetan Adaptation Yi et al. Science 2010 The widespread mutation in Tibetans is near a gene called EPAS1, a so-called “super athlete gene” identified several years ago and named because some variants of the gene are associated with improved athletic performance. The gene codes for a protein involved in sensing oxygen levels and perhaps balancing aerobic and anaerobic metabolism.

Ancient Genomes Resurrected Degraded state of the sample → mtDNA sequencing. Nuclear genomes of ancient remains: cave bear, mammoth, Neanderthal (10^6 bp). Problems: contamination with modern human DNA and co-isolation of bacterial DNA.

NGS Application Examples: Inherited Conditions Discovery tool: single gene disorders, e.g. AD: Kabuki syndrome (MLL2). Causative mutations for multigenic diseases: superior to the "one by one" approach of traditional sequencing. Diagnostic advancements for diseases with overlapping symptoms and multiple possible syndromes/genes.

Variant detection through next generation sequencing Meyerson et al. NRG 2010

Inherited Conditions: Challenges and Opportunities Example: monogenic disorders. Novel missense mutations; structural aberrations; germ line mosaicism; imprinting effects; epigenetic factors. Opportunities. Example: multifactorial disease. Risk loci more often in non-coding or intergenic regions. Pathogenicity of variants often unclear: less testing vs. monogenic disease. Reference human genome cataloguing of variants = more test offerings.

Sequencing of a Single Individual with Family Data Lupski et al. NEJM 2010

The First 8 Human Genomes

SNP Distribution in Proband

Nonsynonymous SNPs in Known Disease Genes

NGS Application Examples: Neoplastic Conditions Cancer susceptibility genes; risk assessment; risk management; tumor sub-typing; micro-RNAs; prognosis; alterations in gene expression; molecular profiling; patient stratification; predictions of therapeutic response; personalized treatment; therapeutic monitoring; somatic/driver mutations; methylation; epigenetic changes.

Exome Sequencing in Prostate Cancer Barbieri et al. Nature Genetics 2012

Exome Sequencing in Prostate Cancer Barbieri et al. Nature Genetics 2012

Nonsynonymous Somatic Mutations in Neuroblastoma Molenaar et al. Nature 2012

Mutation count associated with age, stage, and survival Molenaar et al. Nature 2012

Next Generation Sequencing NGS diagnostics have shifted towards data analysis rather than the technical component. NGS infrastructures must consist of appropriate expertise and computational hardware. Unprecedented amounts of medical data and various processing algorithms necessitate adequate tools for: data management (alignment and assembly); QC of image processing, base calling, filtering, alignment, and SNP finding/application steps; archiving.
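As a rough picture of the "SNP finding" step that follows alignment, the sketch below piles up already-aligned reads over a reference and reports positions where a well-supported base differs from it. It is a deliberately naive illustration: it assumes ungapped alignments given as (start, sequence) pairs, ignores base qualities and heterozygosity, and the thresholds `min_depth` and `min_fraction` are arbitrary values picked for the example. The reference and reads are made up.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Return (position, reference base, observed base, depth) for candidate SNPs."""
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything at this position
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


# Hypothetical reference and reads; every read carries a C where the reference has G at index 5
reference = "ACGTGGTAATGGCG"
reads = [(0, "ACGTGCTAA"), (2, "GTGCTAATG"), (3, "TGCTAAT"), (4, "GCTAATGGCG")]
print(call_snps(reference, reads))
```

The depth and fraction filters stand in for the QC concerns listed above: a mismatch seen in one read is more likely a sequencing error than a variant.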

Considerations Evaluation of the variant positions "called" involves queries of all known relevant databases. The lack of databases curated to accepted clinical standards is likely the most significant challenge in managing and reporting genome sequencing data. EHR considerations: test ordering, archiving of NGS reports, patient consent, data (reinterpretation?).

NGS: Post-Analytical Considerations Expert interpretation and guidance: correlation of age, gender, clinical presentation, family history. Team approach ideal: pathologists, geneticists, other providers. Proficiency testing and alternative assessment are challenging. Proficiency testing schemes based on NGS methods vs. specific genes are likely.

Professional Considerations: Reimbursement and Gene Patents Challenging reimbursement issues. Genome sequencing may potentially involve numerous patented gene sequences. Development of an affordable system of common access to genes? What about mutations in known disease genes that are not evident in the patient's phenotype?