Metagenomics: From Bench to Data Analysis 19-23rd September 2016 16S rRNA-based surveys for Community Analysis: How Quantitative are they? Dr.

Presentation transcript:

Metagenomics: From Bench to Data Analysis 19-23rd September 2016 16S rRNA-based surveys for Community Analysis: How Quantitative are they? Dr Mark Alston Computational Biologist Organisms and Ecosystems Group mark.alston@earlham.ac.uk

Outline: Compare sequencing platforms and 16S rRNA regions; amplicon choice: amplicons vs. full-length rRNA sequencing; bias and quantification; comparison to WGS approaches.

16S Microbial Community Profiling. [Figure: 16S rRNA gene sequence showing conserved (green) and hypervariable (blue) regions.] The 16S rRNA gene is the most common phylogenetic marker and the 'gold standard' in molecular surveys of bacterial and archaeal diversity. Pros: ubiquitous, highly conserved, evolutionarily stable. Cons: often present in multiple copies; little resolution at or below species level.

Comparing Different Platforms and Target Regions. 'A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling', DOI: 10.1186/s12864-015-2194-9. Compared sequencing platforms: MiSeq (Illumina), RSII (Pacific Biosciences), 454 GS-FLX/+ (Roche) and Ion Torrent (Life Technologies). Compared target regions. Performance was assessed with synthetic microbial communities: gDNA mixed from 49 bacterial and 10 archaeal species, in even and uneven distributions. [Table: summary of primers and platforms used.]

Ability of Different Platforms and Regions to Reconstruct the Synthetic Community. Even synthetic community: the platform had a significant effect, and species' frequencies were highly unbalanced. Possible causes: primer mismatches, rRNA copy number, amplification bias (associated with target length). [Figure axes: bacterial species vs. target region.]
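rRNA copy number is one of the possible causes listed above. A frequently used (partial) adjustment divides each taxon's read count by its 16S copy number before renormalising. The sketch below assumes the copy numbers are known; the taxa, counts and copy numbers are hypothetical, and the adjustment does nothing about primer mismatches or length-related amplification bias.

```python
def copy_number_corrected(counts, copies):
    """Divide each taxon's read count by its assumed 16S rRNA copy number,
    then renormalise so the relative abundances sum to 1."""
    adjusted = {taxon: counts[taxon] / copies[taxon] for taxon in counts}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}

# Hypothetical read counts and copy numbers (illustrative values only)
counts = {"taxon_A": 700, "taxon_B": 100, "taxon_C": 200}
copies = {"taxon_A": 7, "taxon_B": 1, "taxon_C": 2}

corrected = copy_number_corrected(counts, copies)
print(corrected)
```

Here taxon_A dominates the raw reads only because it carries more copies; after correction the three taxa come out equally abundant.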

How do Different rRNA Regions Reflect Composition? 'Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities', DOI: 10.1111/1462-2920.12086. [Figure: heat map for the synthetic bacterial community; colour represents the accuracy ratio, where perfect agreement has a value of 1, with the scale running from underestimated to overestimated abundance.] Regions suffer from substantial bias.
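A minimal sketch of an accuracy ratio, assuming it is defined as observed relative abundance divided by expected relative abundance (so that perfect agreement is 1, values below 1 mean underestimation and values above 1 overestimation). The slide does not give the exact formula, and the four-species community and read counts below are hypothetical.

```python
def accuracy_ratios(observed_counts, expected_fractions):
    """Observed relative abundance / expected relative abundance per taxon.
    1.0 = perfect agreement; < 1 = underestimated; > 1 = overestimated."""
    total = sum(observed_counts.values())
    return {
        taxon: (observed_counts.get(taxon, 0) / total) / expected
        for taxon, expected in expected_fractions.items()
    }

# Hypothetical even community of four taxa (expected 25% each)
expected = {taxon: 0.25 for taxon in ["sp1", "sp2", "sp3", "sp4"]}
observed = {"sp1": 400, "sp2": 300, "sp3": 200, "sp4": 100}

ratios = accuracy_ratios(observed, expected)
for taxon, ratio in ratios.items():
    print(f"{taxon}: {ratio:.2f}")
```

Computing one such row per target region gives the kind of matrix a heat map like the one above would display.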

Which Region Should I Choose? [Figure: 16S rRNA gene sequence showing conserved (green) and hypervariable (blue) regions.] The most common approach uses V4, V3–V4 or V4–V5 primers on Illumina platforms, with ~250–430 bp read length; e.g. 16S for V4 on MiSeq: http://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/appnote_miseq_16S.pdf

Full-length vs. Amplicon 16S Sequencing. Factors affecting taxon abundance estimates and tree placement: sequencing platform, primer choice, read length, environmental source, reference database, assignment method (or a combination of these). New technologies produce short reads that sequence only ~15-30% of the full 16S rRNA gene: more quantitative information, but reduced taxonomic resolution. Species-level assignment can be elusive, with implications for inferring metabolic traits in various ecosystems. Use full-length 16S rRNA sequencing?
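The ~15-30% figure follows from simple arithmetic, assuming a full-length bacterial 16S rRNA gene of roughly 1,500 bp (an approximation; the exact length varies between taxa) and the ~250–430 bp read lengths quoted for the common V4, V3–V4 and V4–V5 amplicons.

```python
# Assumption: the bacterial 16S rRNA gene is roughly 1,500 bp long.
APPROX_16S_LENGTH_BP = 1500

def fraction_covered(read_bp, gene_bp=APPROX_16S_LENGTH_BP):
    """Fraction of the full-length gene spanned by a single amplicon read."""
    return read_bp / gene_bp

for read_bp in (250, 430):
    print(f"{read_bp} bp read: ~{fraction_covered(read_bp):.0%} of the 16S gene")
```

This prints roughly 17% for a 250 bp read and 29% for a 430 bp read.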

Full-length 16S rRNA Sequencing. PacBio: long-read, single-molecule real-time (SMRT) technology; average read lengths > 8 kb at ~87% read accuracy; has only been used for a few environmental surveys ('High-resolution phylogenetic microbial community profiling', DOI: 10.1038/ismej.2015.24). MinION™: USB stick-sized device; per-base sequencing accuracy ~85% for 2D reads; the additional read length helps resolve 16S rRNA to species level ('Species level resolution of 16S rRNA gene amplicons sequenced through MinION™ portable nanopore sequencer', DOI: 10.1186/s13742-016-0111-z).

Full-length 16S rRNA Sequencing and Gene Variability. Mutations are distributed non-homogeneously along the gene, and this distribution varies across phylogenetic groups, which leads to both over- and underestimation of community diversity. Example: two Salmonella spp. are 97.4% identical across the whole gene but 100% identical across the V4 region, so V4 sequencing underestimates community diversity.

Conversely, where mutations have accumulated in the V4 region, V4 sequencing overestimates community diversity.
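Both effects can be illustrated with percent identity computed over the whole gene versus over a single window. In this sketch the sequences are short, hypothetical and already aligned (no indels), and positions 10–30 merely stand in for the V4 region; they are not real V4 coordinates.

```python
def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent identity between two pre-aligned, equal-length sequences,
    optionally restricted to the half-open window [start, end)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    end = len(seq_a) if end is None else end
    matches = sum(a == b for a, b in zip(seq_a[start:end], seq_b[start:end]))
    return 100.0 * matches / (end - start)

WINDOW = (10, 30)  # hypothetical stand-in for the V4 region

seq_a = "ACGTACGTAC" + "GGGGCCCCTTTTAAAAGGGG" + "ACGTACGTAC"
# Differences only outside the window (like the Salmonella example)
seq_b = "ACGTTCGTAC" + "GGGGCCCCTTTTAAAAGGGG" + "ACGAACGTAC"
# Differences only inside the window (mutations accumulated in "V4")
seq_c = "ACGTACGTAC" + "TGGGCCCCTTATAAAAGGGG" + "ACGTACGTAC"

for name, other in (("b", seq_b), ("c", seq_c)):
    full = percent_identity(seq_a, other)
    window = percent_identity(seq_a, other, *WINDOW)
    print(f"a vs {name}: full {full:.1f}%, window {window:.1f}%")
```

Both pairs are 95% identical over the full length, but the window hides the differences between a and b (100%) and exaggerates those between a and c (90%).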

Compare FL vs. V4 [Sakinaw Lake samples]. [Figure: community composition profile at genus level; colour pairs denote samples of the same depth; bubble sizes indicate read abundance.]

Compare FL vs. V4 [Sakinaw Lake samples]. But it looks possible to draw the same conclusions from both, because the profiles have a lot in common! FL vs. V4 discrepancies are highlighted by boxes, e.g. Bacillus is greatly underrepresented by V4 compared with PacBio (PB) [50 m samples]. 'High-resolution phylogenetic microbial community profiling', DOI: 10.1038/ismej.2015.24

Platforms and Regions Suffer from Substantial Bias. The observed relative frequencies do not reflect the true species frequencies in the community. But the observed differences between samples could still reflect true differences. Can we have a quantitative method despite the bias?
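One way to see why between-sample differences can survive a strong bias is a simple multiplicative-bias model. This model is an assumption introduced here for illustration, not taken from the slides: species i has a fixed detection efficiency b_i that is the same in every sample, p_is is its true relative abundance in sample s, and the hat denotes the observed relative abundance.

```latex
\hat{p}_{is} = \frac{b_i \, p_{is}}{\sum_j b_j \, p_{js}}
\qquad\Longrightarrow\qquad
\frac{\hat{p}_{iA}}{\hat{p}_{iB}}
  = \frac{b_i \, p_{iA}}{b_i \, p_{iB}} \cdot \frac{\sum_j b_j \, p_{jB}}{\sum_j b_j \, p_{jA}}
  = \frac{p_{iA}}{p_{iB}} \cdot C_{AB},
\qquad
C_{AB} = \frac{\sum_j b_j \, p_{jB}}{\sum_j b_j \, p_{jA}}
```

Because C_AB is the same for every species i, the species-specific bias b_i cancels, and log(observed ratio) = log(true ratio) + log(C_AB). Under this model, a plot of observed against true ratios on log scales has a slope of 1, even though each individual relative abundance is wrong.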

Can 16S rRNA Sequencing be Quantitative? 'A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling', DOI: 10.1186/s12864-015-2194-9. Two synthetic communities were assembled, one with an even distribution and one uneven; pairs of samples were taken and sequenced on the MiSeq and PacBio platforms.

Can 16S rRNA Sequencing be Quantitative? For each species, compare the true ratio of frequencies (known mixtures) with the observed ratio of frequencies. There is a highly significant correlation between the two ratios [blue line] and a slope of 1 [red line]. This implies that 16S rRNA sequencing is strongly quantitative despite being biased, and that MiSeq is more quantitative than PacBio. [Figure: MiSeq and PacBio panels; axes: ratio of observed frequencies vs. ratio of true frequencies.]
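A noise-free sketch of this comparison, under the assumption of a fixed, species-specific multiplicative bias (the proportions and bias factors below are hypothetical). It fits an ordinary least-squares slope of log(observed ratio) on log(true ratio). Real data add sequencing and sampling noise, which is why the study reports a significant correlation rather than an exact line.

```python
import math

def observe(true_props, bias):
    """Apply a hypothetical species-specific multiplicative bias, then renormalise."""
    weighted = [b * p for b, p in zip(bias, true_props)]
    total = sum(weighted)
    return [w / total for w in weighted]

def log_log_slope(true_ratios, observed_ratios):
    """Ordinary least-squares slope of log(observed ratio) on log(true ratio)."""
    xs = [math.log(r) for r in true_ratios]
    ys = [math.log(r) for r in observed_ratios]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical true proportions for two known mixtures, and a strong fixed bias
sample_a = [0.40, 0.30, 0.20, 0.10]
sample_b = [0.10, 0.20, 0.30, 0.40]
bias = [5.0, 0.5, 2.0, 0.2]

obs_a, obs_b = observe(sample_a, bias), observe(sample_b, bias)
true_ratios = [a / b for a, b in zip(sample_a, sample_b)]
observed_ratios = [a / b for a, b in zip(obs_a, obs_b)]

print("observed proportions in A:", [round(p, 3) for p in obs_a])
print("slope:", round(log_log_slope(true_ratios, observed_ratios), 6))  # prints 1.0
```

The observed proportions in sample A are far from the true ones (the first species jumps from 40% to about 78%), yet the between-sample ratios line up with a slope of 1.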

MiSeq is more quantitative than PacBio. Which species are responsible for this difference, i.e. which are more accurately quantified on one platform relative to the other?

MiSeq vs. PacBio: species with significantly different quantification accuracies. MiSeq is the better platform, except for strain resolution, where full-length 16S rRNA sequencing is of benefit (e.g. Shewanella baltica OS223 and Shewanella baltica OS185).

16S Microbial Community Profiling. [Figure: 16S rRNA gene sequence showing conserved (green) and hypervariable (blue) regions.] The most common approach uses V4, V3–V4 or V4–V5 primers on Illumina platforms, with ~250–430 bp read length. Economy of scale: a single MiSeq run yields > 10 million reads. High base-calling accuracy. E.g. 16S for V4 on MiSeq: http://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/appnote_miseq_16S.pdf

Compare Error Rates Across Platforms. Even synthetic community: the platform had a significant effect, and MiSeq has the most accurate sequence reads.
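Because the sequences in a synthetic community are known in advance, errors can be counted directly against the reference. This is a deliberately simplified sketch: reads are assumed to be pre-aligned to their reference with no insertions or deletions (a real comparison would need an aligner to handle indels), and the sequences are hypothetical.

```python
def per_base_error_rate(reads, reference):
    """Fraction of mismatched bases when each read is compared position by
    position with the known reference it came from (no indels assumed)."""
    mismatches = 0
    compared = 0
    for read in reads:
        for read_base, ref_base in zip(read, reference):
            compared += 1
            mismatches += read_base != ref_base
    return mismatches / compared if compared else 0.0

# Hypothetical reference and three reads derived from it
reference = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAA"]  # 2 mismatches in 30 bases

print(f"error rate: {per_base_error_rate(reads, reference):.3%}")
```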

Impact of Overlapping Reads on MiSeq V4 Error Rates. Even synthetic community: overlapping forward and reverse reads greatly reduce errors. [Figure legend: MiSeq; dual index barcode; Illumina barcodes on both reads; 'stitched' reads.]
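Why overlap helps can be sketched with a simplified read-merging ('stitching') step: the reverse read is reverse-complemented, and wherever the two reads cover the same position, the base with the higher Phred quality is kept. This is an illustration only, not any particular tool's algorithm; it assumes the overlap length is already known and that there are no indels, whereas dedicated merging tools also search for the best overlap and recompute quality scores. The amplicon, reads and quality values are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(fwd, fwd_qual, rev, rev_qual, overlap):
    """Merge a read pair whose ends overlap by `overlap` bases.

    `rev` and `rev_qual` are given as sequenced (reverse strand). In the
    overlap, the base with the higher Phred quality is kept (ties go to the
    forward read). Assumes a known overlap length and no indels.
    """
    if not 0 < overlap <= min(len(fwd), len(rev)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    rev_rc = reverse_complement(rev)
    rev_rc_qual = rev_qual[::-1]  # qualities follow the reversed orientation
    start = len(fwd) - overlap    # first forward-read position inside the overlap
    merged = list(fwd[:start])
    for i in range(overlap):
        f_base, f_q = fwd[start + i], fwd_qual[start + i]
        r_base, r_q = rev_rc[i], rev_rc_qual[i]
        merged.append(f_base if f_q >= r_q else r_base)
    merged.extend(rev_rc[overlap:])  # reverse-read bases beyond the overlap
    return "".join(merged)

# Hypothetical 13 bp amplicon read from both ends with 10 bp reads (7 bp overlap)
amplicon = "ACGTACGTTTGCA"
fwd = "ACGTACGTAT"   # sequencing error at position 8 (T read as A)
fwd_qual = [30] * 10
fwd_qual[8] = 5      # the erroneous base has a low quality score
rev = reverse_complement(amplicon[3:])  # error-free reverse read
rev_qual = [30] * 10

merged = merge_pair(fwd, fwd_qual, rev, rev_qual, overlap=7)
print(merged, merged == amplicon)
```

The low-quality error in the forward read is outvoted by the reverse read, so the stitched read matches the amplicon.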

Shotgun Metagenomics vs. Amplicon Sequencing. 'Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities', DOI: 10.1111/1462-2920.12086. Compared amplicon sequencing with Illumina (HiSeq) and 454 metagenomic sequencing. Metagenomic data tends to outperform amplicon sequencing.

Shotgun Metagenomics vs. Amplicon Sequencing. 'A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling', DOI: 10.1186/s12864-015-2194-9. [Figure legend: MiSeq; MG sample; expected.] The metagenome (MG) sample benchmark should be relatively unbiased, as library construction involves fewer PCR amplification steps. WGS gives the most accurate species estimations.
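One way to put a number on 'most accurate' is a dissimilarity between each observed profile and the expected composition; Bray-Curtis dissimilarity (0 = identical profiles) is used in this sketch. The slides do not say which metric the study used, and both observed profiles below are hypothetical.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    taxa = set(profile_a) | set(profile_b)
    shared = sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.values()) + sum(profile_b.values())
    return 1.0 - 2.0 * shared / total

# Hypothetical even mock community and two observed profiles
expected = {"sp1": 0.25, "sp2": 0.25, "sp3": 0.25, "sp4": 0.25}
amplicon_profile = {"sp1": 0.40, "sp2": 0.30, "sp3": 0.20, "sp4": 0.10}
wgs_profile = {"sp1": 0.28, "sp2": 0.26, "sp3": 0.24, "sp4": 0.22}

for name, profile in (("amplicon", amplicon_profile), ("WGS", wgs_profile)):
    print(f"{name}: Bray-Curtis vs expected = {bray_curtis(profile, expected):.2f}")
```

With these illustrative numbers the WGS-like profile sits much closer to the expected composition (about 0.04) than the amplicon-like one (about 0.20).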

Is 16S "Metagenomics"? Many papers talk about "metagenomics analysis based on microbial 16S rRNA gene sequencing", "16S metagenomic studies", etc. But rRNA surveys focus on a single gene, not genomes. Is this due to a fear of not getting funded if you don't include a word containing 'Meta*omics'? "Referring to 16S surveys as metagenomics is misleading and annoying #badomics #OmicMimicry" http://phylogenomics.blogspot.co.uk/2012/08/referring-to-16s-surveys-as.html

In Summary. There are many sources of bias when we sequence 16S rRNA (e.g. platform, region), but it can still be quantitative. MiSeq V4 is a good 'all round bet', although prior knowledge of taxa may suggest otherwise. Combinations of primers? Full-length sequencing for strain resolution. Whole genome shotgun gives better estimations of species abundances.

Metagenomics: From Bench to Data Analysis 19-23rd September 2016 Thank You for Listening Dr Mark Alston Computational Biologist Organisms and Ecosystems Group mark.alston@earlham.ac.uk