Canadian Bioinformatics Workshops


Canadian Bioinformatics Workshops www.bioinformatics.ca


Module 4: Metagenomic Taxonomic Composition
Morgan Langille
Analysis of Metagenomic Data, June 22-24, 2016

Learning Objectives of Module
- Contrast 16S and metagenomic sequencing
- Be able to describe several approaches for determining the taxonomic composition of a metagenomics sample
- Be able to run MetaPhlAn2 on one or more samples
- Be able to determine statistically significant differences in taxonomic abundance across sample groups using STAMP

16S vs Metagenomics
16S is targeted sequencing of a single gene which acts as a marker for identification
- Pros
  - Well established
  - Sequencing is relatively cheap (~50,000 reads/sample)
  - Only amplifies what you want (no host contamination)
- Cons
  - Primer choice can bias results towards certain organisms
  - Usually not enough resolution to identify to the strain level
  - Usually needs different primers for archaea and eukaryotes (18S)
  - Doesn’t identify viruses

16S vs Metagenomics
Metagenomics: sequencing all the DNA in a sample
- Pros
  - No primer bias
  - Can identify all microbes (euks, viruses, etc.)
  - Provides functional information (“What are they doing?”)
- Cons
  - More expensive (millions of sequences needed)
  - Host/site contamination can be significant
  - May not be able to sequence “rare” microbes
  - Complex bioinformatics

Who is there? Taxonomic Profiles

Metagenomics: Who is there?
- Goal: Identify the relative abundance of different microbes in a given sample using metagenomics
- Problems:
  - Reads are all mixed together
  - Reads can be short (~100bp)
  - Lateral gene transfer
- Two broad approaches
  - Binning Based
  - Marker Based

Binning Based
Attempts to group or “bin” reads into the genome from which they originated
- Composition-based
  - Uses sequence composition such as GC%, k-mers (e.g. Naïve Bayes Classifier)
  - Fast
- Sequence-based
  - Compare reads to large reference database using BLAST (or some other similarity search method)
  - Reads are assigned based on “Best-hit” or “Lowest Common Ancestor” approach
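To make the composition-based idea concrete, here is a minimal sketch of the two kinds of features mentioned above: GC fraction and a k-mer frequency profile for a single read. The function names and the example read are illustrative assumptions, and a real binner would feed such features into a classifier (for example a Naïve Bayes model) rather than print them.

```python
from collections import Counter

def gc_fraction(read: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a read."""
    read = read.upper()
    counted = sum(read.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (read.count("G") + read.count("C")) / counted

def kmer_profile(read: str, k: int = 4) -> dict:
    """Relative frequency of each k-mer, skipping k-mers with ambiguous bases."""
    read = read.upper()
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

# Hypothetical read, used only to show the shape of the output.
example_read = "ATGCGCGCTTAGGCCA"
print(gc_fraction(example_read))
print(kmer_profile(example_read, k=2))
```

Reads from the same genome tend to have similar composition profiles, which is why these features can be computed quickly without any database search.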

LCA: Lowest Common Ancestor
Use all BLAST hits above a threshold and assign taxonomy at the lowest level in the tree which covers these taxa.
Notable Examples:
- MEGAN: http://ab.inf.uni-tuebingen.de/software/megan5/
  - One of the first metagenomic tools
  - Does functional profiling too!
- MG-RAST: https://metagenomics.anl.gov/
  - Web-based pipeline (might need to wait a while for results)
- Kraken: https://ccb.jhu.edu/software/kraken/
  - Fastest binning approach to date and very accurate
  - Large computing requirements (e.g. >128GB RAM)
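The LCA rule described above can be sketched in a few lines. This is a simplified illustration, not how MEGAN or Kraken implement it: each hit is assumed to arrive as a root-to-leaf lineage plus a bit score, and the threshold value, function names, and example hits are all hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of several root-to-leaf lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break  # lineages diverge here; stop at the last shared level
    return shared

def assign_read(hits, min_bitscore=50.0):
    """Assign a read to the LCA of all hits at or above the score threshold.

    hits: list of (lineage, bitscore) pairs. An empty result means unassigned.
    """
    kept = [lineage for lineage, score in hits if score >= min_bitscore]
    return lowest_common_ancestor(kept)

# Hypothetical hits for one read: two strong hits in the same family
# and one weak hit that falls below the threshold.
base = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
        "Enterobacterales", "Enterobacteriaceae"]
hits = [
    (base + ["Escherichia", "Escherichia coli"], 180.0),
    (base + ["Shigella", "Shigella flexneri"], 175.0),
    (["Bacteria", "Firmicutes", "Bacilli"], 30.0),
]
print(assign_read(hits))
```

With these example hits the read is assigned at the family level, because the two retained hits agree only down to Enterobacteriaceae. A best-hit approach would instead have reported Escherichia coli.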

Marker Based
- Single Gene
  - Identify and extract reads hitting a single marker gene (e.g. 16S, cpn60, or other “universal” genes)
  - Use existing bioinformatics pipeline (e.g. QIIME, etc.)
- Multiple Gene
  - Several universal genes
    - PhyloSift (Darling et al., 2014)
    - Uses 37 universal single-copy genes
  - Clade specific markers
    - MetaPhlAn2 (Truong et al., 2015)

Marker or Binning?
- Binning approaches
  - Similarity search is computationally intensive
  - Varying genome sizes and LGT can bias results
- Marker approaches
  - Don’t allow functions to be linked directly to organisms
  - Genome reconstruction/assembly is not possible
  - Dependent on choice of markers

Why MetaPhlAn?
- Fast (marker database is considerably smaller)
- Markers for bacteria, archaea, eukaryotes, and viruses (since MetaPhlAn2 was released)
- Being continuously updated and supported
- Used by the Human Microbiome Project
- Generally accepted as a robust method for taxonomy assignment
- Main Disadvantage: not all reads are assigned a taxonomic label

MetaPhlAn
- Uses “clade-specific” gene markers
- A clade represents a set of genomes that can be as broad as a phylum or as specific as a species
- Uses ~1 million markers derived from 17,000 genomes
  - ~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic
- Can identify down to the species level (and possibly even strain level)
- Can handle millions of reads on a standard computer within a few minutes

MetaPhlAn Marker Selection


Using MetaPhlAn
- MetaPhlAn uses Bowtie2 for sequence similarity searching (nucleotide sequences vs. nucleotide database)
- Paired-end data can be used directly (but are treated as independent reads)
- Each sample is processed individually and then multiple samples can be combined at the last step
- Output is relative abundances at different taxonomic levels
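The "process each sample, then combine" step can be illustrated with a small sketch that merges per-sample profiles into one table and pulls out a single taxonomic level. It assumes each profile is a two-column, tab-separated file with a `|`-delimited clade path using rank prefixes such as `k__` and `p__`, and an abundance in percent; the header line, sample names, and values below are made up for illustration, and this is not a replacement for the merging utility that ships with MetaPhlAn.

```python
import io

def read_profile(handle):
    """Parse one profile: clade path <TAB> relative abundance (percent)."""
    profile = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        clade, abundance = line.split("\t")[:2]
        profile[clade] = float(abundance)
    return profile

def merge_profiles(profiles):
    """Combine {sample: {clade: abundance}} into a clade-by-sample table.

    Clades missing from a sample are filled with 0.0.
    """
    samples = sorted(profiles)
    clades = sorted({clade for p in profiles.values() for clade in p})
    table = {c: [profiles[s].get(c, 0.0) for s in samples] for c in clades}
    return samples, table

def at_rank(table, prefix):
    """Keep only rows whose deepest rank carries the given prefix (e.g. 'p__')."""
    return {c: row for c, row in table.items()
            if c.split("|")[-1].startswith(prefix)}

# Hypothetical per-sample outputs, held in memory instead of on disk.
sample_a = io.StringIO(
    "#clade\tabundance\n"
    "k__Bacteria\t100.0\n"
    "k__Bacteria|p__Firmicutes\t60.0\n"
    "k__Bacteria|p__Proteobacteria\t40.0\n"
)
sample_b = io.StringIO(
    "k__Bacteria\t100.0\n"
    "k__Bacteria|p__Firmicutes\t100.0\n"
)
profiles = {"A": read_profile(sample_a), "B": read_profile(sample_b)}
samples, table = merge_profiles(profiles)
phyla = at_rank(table, "p__")
for clade, row in phyla.items():
    print(clade, dict(zip(samples, row)))
```

Because every level of the hierarchy is reported as its own row, filtering by the last rank prefix is enough to get a phylum-level (or species-level) table without re-summing anything.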

Absolute vs. Relative Abundance
- Absolute abundance: Numbers represent real abundance of thing being measured (e.g. the actual quantity of a particular gene or organism)
- Relative abundance: Numbers represent proportion of thing being measured within sample
- In almost all cases microbiome studies are measuring relative abundance
  - This is due to DNA amplification during sequencing library preparation not being quantitative

Relative Abundance Use Case
- Sample A:
  - Has 10^8 bacterial cells (but we don’t know this from sequencing)
  - 25% of the microbiome from this sample is classified as Shigella
- Sample B:
  - Has 10^6 bacterial cells (but we don’t know this from sequencing)
  - 50% of the microbiome from this sample is classified as Shigella
- “Sample B contains twice as much Shigella as Sample A”: WRONG! (If we quantified it, we would find that Sample A has more Shigella.)
- “Sample B contains a greater proportion of Shigella compared to Sample A”: Correct!
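The arithmetic behind this use case is worth spelling out. The sketch below takes the total cell counts as 10^8 for Sample A and 10^6 for Sample B, which is information sequencing alone would not provide, and multiplies them by the reported Shigella proportions. Variable names are illustrative.

```python
# Total cell counts are NOT observable from sequencing; they are used here
# only to show why relative abundances cannot be compared as quantities.
samples = {
    "A": {"total_cells": 1e8, "shigella_fraction": 0.25},
    "B": {"total_cells": 1e6, "shigella_fraction": 0.50},
}

def absolute_count(sample):
    """Absolute number of Shigella cells = total cells x relative abundance."""
    return sample["total_cells"] * sample["shigella_fraction"]

abs_a = absolute_count(samples["A"])  # 0.25 * 10^8 = 2.5 * 10^7 cells
abs_b = absolute_count(samples["B"])  # 0.50 * 10^6 = 5.0 * 10^5 cells

print(f"Sample A: {abs_a:.1e} Shigella cells")
print(f"Sample B: {abs_b:.1e} Shigella cells")
print(f"A has {abs_a / abs_b:.0f}x more Shigella than B in absolute terms")
```

Sample B has the larger proportion (50% vs 25%), yet under these assumed totals Sample A holds 50 times more Shigella cells, which is why only the statement about proportions is safe.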

Visualization and Statistics: What is important?

Visualization and Statistics
Various tools are available to determine statistically significant taxonomic differences across groups of samples
- Excel
- SigmaPlot
- R
- MeV (MultiExperiment Viewer)
- Python (matplotlib)
- LEfSe & GraPhlAn (Huttenhower Group)
- STAMP

Visualization and Statistics
Various tools are available to determine statistically significant taxonomic differences across groups of samples
- Excel
- SigmaPlot
- Past
- R (many libraries)
- Python (matplotlib)
- STAMP

STAMP

STAMP Plots

STAMP
- Input
  - “Profile file”: Table of features (samples by OTUs, samples by functions, etc.)
    - Features can form a hierarchy (e.g. Phylum, Class, Order, etc.) to allow data to be collapsed within the program
  - “Group file”: Contains different metadata for grouping samples
    - Can be two groups (e.g. Healthy vs Sick) or multiple groups (e.g. water depth at 2 m, 4 m, and 6 m)
- Output
  - PCA, heatmap, box, and bar plots
  - Tables of significantly different features
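As a rough illustration of the two inputs, the sketch below builds a tiny profile table and group table as tab-separated text, then collapses the profile to a chosen level of the hierarchy by summing rows. The column names, sample IDs, and values are invented, and the exact layout STAMP expects should be checked against its documentation; this only mirrors the idea of hierarchical features plus sample metadata.

```python
import csv
import io
from collections import defaultdict

# Hypothetical profile: hierarchy columns first, then one column per sample.
profile_tsv = (
    "Phylum\tGenus\tS1\tS2\tS3\tS4\n"
    "Firmicutes\tLactobacillus\t30\t25\t5\t10\n"
    "Firmicutes\tClostridium\t20\t15\t5\t5\n"
    "Proteobacteria\tShigella\t10\t20\t60\t55\n"
)

# Hypothetical group file: sample ID plus one metadata column.
group_tsv = (
    "Sample\tStatus\n"
    "S1\tHealthy\n"
    "S2\tHealthy\n"
    "S3\tSick\n"
    "S4\tSick\n"
)

def collapse(profile_text, level, hierarchy):
    """Sum feature rows that share the same value at the given hierarchy level."""
    reader = csv.DictReader(io.StringIO(profile_text), delimiter="\t")
    samples = [col for col in reader.fieldnames if col not in hierarchy]
    totals = defaultdict(lambda: [0.0] * len(samples))
    for row in reader:
        for i, sample in enumerate(samples):
            totals[row[level]][i] += float(row[sample])
    return samples, dict(totals)

def read_groups(group_text, column):
    """Map each sample ID to its value in the chosen metadata column."""
    reader = csv.DictReader(io.StringIO(group_text), delimiter="\t")
    return {row["Sample"]: row[column] for row in reader}

sample_ids, by_phylum = collapse(profile_tsv, "Phylum", ["Phylum", "Genus"])
groups = read_groups(group_tsv, "Status")
for phylum, values in by_phylum.items():
    print(phylum, {s: (groups[s], v) for s, v in zip(sample_ids, values)})
```

Collapsing at the phylum level merges the two Firmicutes genera into one row, which is the same operation the hierarchy in the profile file makes possible inside the program.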

Metagenomics Workflow: Putting it all together

Microbiome Helper
- Microbiome Helper is an open resource aimed at helping researchers process and analyze their microbiome data
- Combines bioinformatic methods from different groups and is updated as newer and better methods are released
- Scripts to wrap and integrate existing tools
- Available as an Ubuntu VirtualBox image
- Tutorials/Walkthroughs
- https://github.com/mlangill/microbiome_helper/wiki

Microbiome Helper Wiki

Microbiome Helper Vbox

Integrated Microbiome Resource: Sequencing and Bioinformatics Support

IMR: Sequencing and bioinformatics service for microbiome projects http://cgeb-imr.ca

Microbiome Amplicon Sequencing Workflow (16S / 18S amplicons on the Illumina MiSeq)
1. DNA extraction (Time = 1-3 d)
   - Method/kit appropriate to specific samples (ex: stool, urine, etc.)
2. 16S (V6-V8) or 18S (V4) PCR (Time = 0.5 d)
   - Duplicate with template dilutions
   - Multiplexing to 384 samples/run
   - Only 1 PCR w/fusion primers (P5 adapter, i5 index, F primer, 16S/18S sequence, R primer, i7 index, P7 adapter)
3. Gel verification (Time = 1 h)
   - Invitrogen E-gel 96-well high-throughput method
4. PCR clean-up & library normalization (Time = 1.5 h)
   - Invitrogen SequalPrep 96-well high-throughput method
5. Illumina MiSeq sequencing (Time = ~3 d)
   - 300+300 bp paired-end reads
   - ~25 M reads = ~15 Gb
   - ~65 k reads/sample (for 384)
QC (quality-control check/step) is applied at several points in the workflow
Total Time = 1 week
CGEB-IMR.ca • DalhousieU • March 2015
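The sequencing yield figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes that "~25 M reads" counts read pairs, so each contributes 300 + 300 bp, and that reads are spread evenly across all 384 multiplexed samples; both are assumptions, and real runs will be uneven.

```python
read_pairs = 25_000_000        # ~25 M reads per run (assumed to be read pairs)
bases_per_pair = 300 + 300     # 300+300 bp paired-end reads
samples_per_run = 384          # maximum multiplexing stated for one run

total_bases = read_pairs * bases_per_pair
reads_per_sample = read_pairs / samples_per_run

print(f"Total yield: {total_bases / 1e9:.0f} Gb")
print(f"Reads per sample: {reads_per_sample:,.0f}")
```

Under these assumptions the run yields 15 Gb and roughly 65,000 reads per sample, which matches the figures quoted for the workflow.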

IMR Achievements


IMR Collaborators & Clients

Pricing

Questions?