© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Advancing metagenomics with Illumina sequencing technology
Anthony J. Cox
Computational Biology Group, Illumina Cambridge Ltd.
14th April 2014

2 Contents
Challenge: achieving a seamless end-to-end workflow for metagenomics
Case study: Eagle Creek Reservoir
– 16S workflow on MiSeq
– Shotgun metagenomics on NextSeq
Challenge: efficient storage and access for metagenomic data

3 Expanded sequencing portfolio
Decreasing price per Gb, increasing system output
System | Output per run | Reads per run | Read length
MiSeq | 15 Gb | 25M | 2x300
NextSeq | 120 Gb | 400M | 2x150
HiSeq | 1,000 Gb | 4B | 2x125
HiSeq X Ten | … Gb | 6B | 2x150

4 Integration: streamlined end-to-end solution
Sample prep: suite of DNA, RNA & targeted solutions
Sequencing: industry's leading NGS instruments
Analysis: storage, processing, analysis & collaboration

5 Case study: Eagle Creek reservoir, Indiana
Assessing seasonal blooms of Cyanobacteria (blue-green algae) that can impact drinking water quality
Collaboration with the Center for Earth and Environmental Science, IUPUI
49 reservoir samples collected in different months, at discrete depths
Study combines 16S analysis on MiSeq with shotgun metagenomics on NextSeq
By courtesy of: Nicolas Clercin (IUPUI), Rob Schmieder, Brian Steffy, Clotilde Teiling, Kameran Wong (Illumina)

6 MiSeq – continuous performance improvements
Delivering on the promise of 15Gb+ and 2x300 bp reads
[Figure: MiSeq output (Gb) by quarter, 2Q11–4Q13, annotated with faster chemistry, dual surface imaging, and new v3 reagent kits (150 & 600-cycle)]
Since launch: 10x increase in output, 7x decrease in price per data point
*Prices reflect US list only

7 Workflow overview
Genomic DNA extraction → Sample prep (V3–V4 region amplification) → Library prep → MiSeq & primary analysis → Secondary analysis (phylogenetic classification, comparative genomics)
16S rRNA sequencing was done on 27 of the samples.
Primer pair sequences for the V3 and V4 regions create a single ~460 bp amplicon.
Nextera XT indexing kit for 96 samples in parallel: 100,000 reads per sample if using all 96 indexes.
The Meta-G-Nome™ DNA Isolation Kit is used to isolate inhibitor-free, fosmid cloning-ready DNA from unculturable or difficult-to-culture microbial species present in environmental water, soil, or compost samples.
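With 2x300 bp reads on a ~460 bp V3–V4 amplicon, the two reads of a pair should share about 2 × 300 − 460 = 140 bases. That overlap is what lets a pipeline stitch each pair into one full-length amplicon sequence. The sketch below shows a naive overlap merge under stated assumptions. It is not the MiSeq Reporter or BaseSpace implementation, and the function names, minimum overlap, and mismatch tolerance are hypothetical choices.

```python
from typing import Optional

# Complement table for DNA bases; N (ambiguous) maps to itself.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def expected_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases shared by read 1 and read 2 when both reads span the same amplicon."""
    return 2 * read_len - amplicon_len


def merge_pair(read1: str, read2: str, min_overlap: int = 20,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Stitch a read pair into one sequence via the overlap of their 3' ends.

    read2 is reverse-complemented onto read1's strand. Overlap lengths are tried
    from longest to shortest, and the first one within the mismatch tolerance is
    accepted. At mismatching positions this naive sketch keeps read1's base (real
    mergers weigh the Q-scores). It assumes the amplicon is longer than each read
    (no read-through into adapters) and returns None if no overlap qualifies.
    """
    r2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(r2)), max(min_overlap, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], r2[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + r2[olen:]
    return None
```

For example, `expected_overlap(460, 300)` returns 140. Shorter reads leave less overlap, so 2x300 chemistry is what makes this amplicon design mergeable.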

8 16S metagenomics on BaseSpace

9 Taxonomic classification
Can run on-instrument using MiSeq Reporter or in the cloud with BaseSpace.
Both analysis pipelines use the same classification algorithm and taxonomic database:
– The classification algorithm is a high-performance implementation of the published RDP Naïve Bayesian Classifier (…).
– The database is an Illumina-curated version of the Greengenes Consortium 16S rRNA database. Redundant sequences and entries with missing or partial labels are removed.
Provides fast, high-accuracy species-level taxonomic classifications
Uses the full length of Illumina paired-end reads
Outputs: PDF reports, raw data (CSV), interactive visualizations
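The RDP Naïve Bayesian Classifier scores a query against each genus using the overlapping 8-base "words" it contains. In the published formulation (Wang et al., 2007), the word prior is P(w) = (n(w) + 0.5)/(N + 1) and the genus likelihood is P(w | G) = (m(w) + P(w))/(M + 1). Here n(w) counts training sequences containing w, N is the number of training sequences, m(w) counts sequences in genus G containing w, and M is the number of sequences in G. Confidence comes from bootstrap trials on random one-eighth subsets of the query's words. The toy re-implementation below follows that formulation. It is not Illumina's high-performance version, and the class name and training-data format are assumptions.

```python
import math
import random
from collections import defaultdict


def words(seq: str, k: int) -> list:
    """Sorted list of the distinct overlapping k-mers ("words") in seq."""
    return sorted({seq[i:i + k] for i in range(len(seq) - k + 1)})


class NaiveBayes16S:
    """Toy RDP-style genus classifier over k-mer words (hypothetical API)."""

    def __init__(self, training: dict, k: int = 8):
        # training: genus name -> list of reference sequences for that genus
        self.k = k
        self.n_seqs = sum(len(seqs) for seqs in training.values())  # N
        self.seq_freq = defaultdict(int)  # n(w): training sequences containing w
        self.genus_freq = {}              # m(w) per genus
        self.genus_size = {}              # M per genus
        for genus, seqs in training.items():
            counts = defaultdict(int)
            for seq in seqs:
                for w in words(seq, k):
                    counts[w] += 1
                    self.seq_freq[w] += 1
            self.genus_freq[genus] = counts
            self.genus_size[genus] = len(seqs)

    def _prior(self, w: str) -> float:
        # P(w) = (n(w) + 0.5) / (N + 1); unseen words get n(w) = 0
        return (self.seq_freq.get(w, 0) + 0.5) / (self.n_seqs + 1)

    def _log_likelihood(self, query_words: list, genus: str) -> float:
        # Sum over query words of log P(w | G) = log((m(w) + P(w)) / (M + 1))
        counts, size = self.genus_freq[genus], self.genus_size[genus]
        return sum(math.log((counts.get(w, 0) + self._prior(w)) / (size + 1))
                   for w in query_words)

    def _best(self, query_words: list) -> str:
        return max(self.genus_freq, key=lambda g: self._log_likelihood(query_words, g))

    def classify(self, seq: str, n_boot: int = 100, seed: int = 0):
        """Return (genus, bootstrap confidence) for one query sequence."""
        qw = words(seq, self.k)
        if not qw:
            raise ValueError("query is shorter than the word size k")
        genus = self._best(qw)
        rng = random.Random(seed)
        subset = max(1, len(qw) // 8)  # each trial draws 1/8 of the words, with replacement
        hits = sum(self._best([rng.choice(qw) for _ in range(subset)]) == genus
                   for _ in range(n_boot))
        return genus, hits / n_boot
```

Sorting the word list keeps the seeded bootstrap reproducible, because Python's string hashing makes set iteration order vary between runs.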

10 Examples of 16S workflow output
[Figures: PCA plot of normalized relative abundance of samples; clustering dendrogram]
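Both plots start from per-sample taxon counts normalized to relative abundance. The slide does not say which distance or linkage the dendrogram uses. The sketch below assumes Bray-Curtis dissimilarity and average linkage (UPGMA), two common choices for community profiles, and the function names are hypothetical.

```python
from itertools import combinations


def relative_abundance(counts: dict) -> dict:
    """Normalize each sample's taxon counts so that they sum to 1."""
    return {sample: [c / sum(row) for c in row] for sample, row in counts.items()}


def bray_curtis(u: list, v: list) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def upgma(profiles: dict) -> list:
    """Average-linkage (UPGMA) clustering of samples.

    Returns the merge steps, in order, as (cluster_a, cluster_b, distance)
    tuples; distance is the linkage value at which the two clusters join.
    """
    dist = {frozenset(pair): bray_curtis(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def linkage(c1: tuple, c2: tuple) -> float:
        # Mean of all sample-to-sample distances between the two clusters
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(sample,) for sample in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

For instance, with hypothetical counts `{"S1": [50, 30, 20], "S2": [48, 32, 20], "S3": [5, 15, 80]}`, S1 and S2 join first at a distance of 0.02, and S3 joins them at 0.6.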

11 NextSeq innovations
Consumables
– Load-and-go flowcell: high or medium output; ships dry
– All-in-one reagent tray: RFID-tagged; ships frozen
– All-in-one buffer tray: ships at room temperature
Chemistry
– 2-dye sequencing chemistry: quality comparable to 4-dye
– Isothermal amplification: no chiller on instrument
– Optimized reagent consumption
Optics
– Solid-state optics: leverages advances in consumer products; no alignment needed
Fluidics
– Eliminated fluidic tubes: less dead volume, waste, contamination
– Automatic post-run wash protocols: bleach step eliminates carry-over
– Simultaneous chemistry & imaging: chemistry in one lane while imaging the other pair
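With 2-dye chemistry, four bases are distinguished using only two image channels. The encoding commonly described for NextSeq is A = both channels, C = red only, T = green only, and G = no signal. The sketch below calls bases from normalized two-channel intensities. The fixed 0.5 threshold and the function names are illustrative assumptions; a real base caller models cluster intensities far more carefully.

```python
def call_base(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from normalized two-channel intensities.

    Assumed two-dye encoding: A = signal in both channels, C = red only,
    T = green only, G = no signal ("dark").
    """
    in_red, in_green = red >= threshold, green >= threshold
    if in_red and in_green:
        return "A"
    if in_red:
        return "C"
    if in_green:
        return "T"
    return "G"


def call_read(cycles: list, threshold: float = 0.5) -> str:
    """Call a read from a list of (red, green) intensity pairs, one per cycle."""
    return "".join(call_base(r, g, threshold) for r, g in cycles)
```

Because G is the "dark" base, a cluster that stops emitting signal is read as a run of Gs. This is one reason poly-G tails are a known artifact of two-colour data.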

12 Shotgun metagenomics on NextSeq: workflow overview
Sample extraction → Library prep → NextSeq sequencing → Analysis
11 samples sequenced in 1 NextSeq run
400 million 2×150 bp read pairs generated in 29 hours
78.8% of bases exceeded Q30
Analysis done with MG-RAST
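Two numbers on this slide can be checked with a little arithmetic. First, 400 million pairs × 2 reads × 150 bases = 120 Gb, the NextSeq output on the portfolio slide. Second, Q30 means an estimated base-call error probability of 10^(-30/10) = 1/1,000, from Q = -10 log10(p). The sketch below assumes Phred+33 FASTQ quality encoding, and the function names are hypothetical.

```python
def phred_scores(qual: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Q-scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(p): Q30 corresponds to a 1-in-1,000 base-call error."""
    return 10 ** (-q / 10)


def fraction_at_least(quals: list, q_min: int = 30) -> float:
    """Fraction of all bases, across the given quality strings, with Q >= q_min."""
    scores = [q for qual in quals for q in phred_scores(qual)]
    return sum(q >= q_min for q in scores) / len(scores)


def paired_end_yield_gb(read_pairs: int, read_len: int) -> float:
    """Total sequenced bases for a paired-end run, in gigabases (1 Gb = 1e9 bases)."""
    return read_pairs * 2 * read_len / 1e9
```

In Phred+33, the quality string "II5?" decodes to Q40, Q40, Q20, Q30, so `fraction_at_least(["II5?"])` returns 0.75.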

13 Seasonal variation in composition at bottom of lake
23rd May: Actinobacteria = 76%
25th July: Actinobacteria = 33%
23rd October: Actinobacteria = 79%
Ongoing challenge: what should our data analysis pipeline for shotgun metagenomic data be, e.g. on BaseSpace?
– Several standalone apps for taxonomic classification
– There seem to be fewer options for functional classification

14 HiSeq 1 terabase run (R&D data), 2 x 125 cycles
Per run you can do up to*:
– 10 genomes
– 150 exomes
– 80 WT RNA samples
*Assumes 100 Gb per 30x genome; Nextera Rapid Capture Exome; 50M reads per RNA sample
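The footnote's assumptions reproduce two of the per-run figures by simple division. 1,000 Gb / 100 Gb per genome gives 10 genomes, and 4B reads (the HiSeq entry on the portfolio slide) / 50M reads per sample gives 80 RNA samples. The exome figure depends on a per-exome yield the slide does not state, so it is left out. The constant and function names below are hypothetical.

```python
# Per-run capacity for the 2 x 125 cycle HiSeq run on the slide.
RUN_BASES_GB = 1_000          # 1 terabase of output
RUN_READS = 4_000_000_000     # 4B reads, taken from the portfolio slide's HiSeq entry


def samples_per_run(run_capacity: int, per_sample: int) -> int:
    """How many whole samples fit into one run (both arguments in the same unit)."""
    return run_capacity // per_sample


genomes = samples_per_run(RUN_BASES_GB, 100)            # 100 Gb per 30x genome
rna_samples = samples_per_run(RUN_READS, 50_000_000)    # 50M reads per RNA sample
```

Integer division is used because a partially sequenced sample does not count toward the per-run total.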

15 Challenge: efficient storage and access for shotgun metagenomic data
Resequencing data (Human genome build ~160 Gbp, ~400 Gbyte FASTQ):
FASTQ (gzipped) | 150 Gbyte
BAM (40 Q-scores) | 120 Gbyte
BAM (8 Q-scores) | 82 Gbyte
BAM (consensus compressed) | 60 Gbyte
CRAM (consensus compressed) | 27 Gbyte
Relies heavily on a known high-quality reference sequence.
Resequencing data (Human genome build 145Gbp, ~160 Gbp, ~400 Gbyte FASTQ):
FASTQ (gzipped, 8 Q-scores) | 89 Gbyte
BWT compression (now) | 37 Gbyte
BWT compression (likely achievable) | 23 Gbyte
89 → 37 Gbyte: BWT/PPM for reads, simple binning of Q-scores (lossless)
Sort reads for better compression: save 4 Gbyte (Cox et al., 2012)
Discard uninformative Q-scores (reference free): save 10 Gbyte (Janin et al., 2012)
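One reason the 8-Q-score rows are smaller is that binning shrinks the alphabet a compressor has to encode. The sketch below bins Phred+33 quality strings with a hypothetical 8-level table and compares DEFLATE-compressed sizes, using zlib as a stand-in for gzip, which uses the same algorithm. The bin bounds and representative values are illustrative assumptions, not necessarily the scheme behind the slide. Binning itself discards information, and only the compression step is lossless.

```python
import zlib

# Hypothetical binning table: (lower_bound, representative). Every Q-score from a
# lower bound up to the next bound maps onto that bin's representative value.
# Together with a single no-call level for Q0-Q1 this gives 8 levels in total.
BINS = [(2, 6), (10, 15), (20, 22), (25, 27), (30, 33), (35, 37), (40, 40)]


def bin_q(q: int) -> int:
    """Map one Q-score onto its bin's representative value."""
    if q < 2:
        return 0          # collapse Q0-Q1 into a single no-call level
    value = q
    for lower, rep in BINS:
        if q >= lower:
            value = rep   # keep the representative of the highest bin reached
    return value


def bin_quality_string(qual: str, offset: int = 33) -> str:
    """Apply bin_q to every character of a Phred+offset quality string."""
    return "".join(chr(bin_q(ord(ch) - offset) + offset) for ch in qual)


def deflate_size(text: str) -> int:
    """Size in bytes after DEFLATE compression (the algorithm gzip uses)."""
    return len(zlib.compress(text.encode("ascii"), 9))
```

On a long string of uniformly random Q2–Q41 scores, the binned version compresses to well under the size of the unbinned one. Real quality strings are far from uniform, so the actual savings depend on the data.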

16 Trading compression for searchability
Resequencing data (Human genome build ~165 Gbp):
FASTQ (gzipped) | 152 Gbyte
BWT (searchable) | 105 Gbyte, comprising:
– Reads (BWT) | 26 Gbyte
– Q-scores (razip) | 64 Gbyte
– Read names (razip) | 15 Gbyte
NB: 40 Q-scores; both FASTQ and BWT would be smaller for 8 Q-scores.
For a query sequence q, returns:
– Full FASTQ record (sequence, Q-scores, read names) for all reads containing q
– … and full FASTQ record of their read pairs
Pipe search output directly to your favourite tool, e.g. Velvet.
Applications:
– "In silico pull-down"
– Assembling breakpoints
– Genotyping complex variants by tracking k-mers
Further info: beetl.github.io/BEETL/, Janin et al. (2014, submitted)
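Searching a BWT for q rests on the FM-index idea. The rows whose suffixes start with q can be found by "backward search", one character of q at a time, and each hit can be mapped back to the read it came from. The toy sketch below builds a BWT of a few reads with a naive suffix sort and implements counting and an in silico pull-down. The class and method names are hypothetical. BEETL's own construction algorithms and file formats are not modelled, nor is its retrieval of Q-scores, read names and mate pairs.

```python
from bisect import bisect_right


class ReadIndex:
    """Toy FM-index over a read collection (hypothetical API, not BEETL's)."""

    def __init__(self, reads: list):
        self.reads = list(reads)
        # Concatenate reads, each terminated by '$' (sorts before A, C, G, T).
        self.text = "".join(r + "$" for r in self.reads)
        self.starts, pos = [], 0      # offset of each read within text
        for r in self.reads:
            self.starts.append(pos)
            pos += len(r) + 1
        # Naive suffix array: fine for a sketch, far too slow for real data.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])
        # BWT[row] = character preceding that row's suffix (wraps around at 0).
        self.bwt = "".join(self.text[i - 1] for i in self.sa)
        # C[c] = number of characters in text that sort before c.
        self.C, total = {}, 0
        for c in sorted(set(self.text)):
            self.C[c] = total
            total += self.text.count(c)

    def _occ(self, c: str, i: int) -> int:
        """Occurrences of c in bwt[:i] (naive; real indexes use sampled rank tables)."""
        return self.bwt.count(c, 0, i)

    def sa_range(self, q: str) -> tuple:
        """Backward search: half-open row range of suffixes that start with q."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(q):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self._occ(c, lo)
            hi = self.C[c] + self._occ(c, hi)
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, q: str) -> int:
        """Number of occurrences of q across all reads."""
        lo, hi = self.sa_range(q)
        return hi - lo

    def reads_containing(self, q: str) -> list:
        """'In silico pull-down': indices of reads containing q at least once."""
        lo, hi = self.sa_range(q)
        return sorted({bisect_right(self.starts, self.sa[row]) - 1
                       for row in range(lo, hi)})
```

Because q never contains the '$' terminator, a match cannot span two reads. A real index would also store a mapping from read to mate so that pairs can be returned together.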

17 Thank you!

18 Extra slides

19 Moleculo technology enables synthetic long reads up to 10 kb from Illumina short reads
Synthetic long reads 8–10 kb
– Enables fully phased genomes
– Accurate de novo assembly of large, complex genomes
Available: Illumina services 2H13; kit format early 2014

20 BaseSpace: plug-and-play genomic cloud solution
All you need is an internet connection

21 How Is BaseSpace Being Used Worldwide?
Users & growth of the bioinformatics cloud computing service:
October 2011: Illumina begins streaming MiSeq data to the cloud
December 2011: Illumina begins data sharing in the cloud
November 2012: Illumina begins streaming HiSeq data to the cloud
December 2012: over 20,000 instrument runs streamed to BaseSpace
April 2013: over 40,000 instrument runs streamed to BaseSpace
May 2013: BaseSpace commercial (supported) release
July 2013: general availability of BaseSpace to all HiSeq instruments
September 2013: over 60,000 instrument runs streamed to BaseSpace, and over 10,000 apps run