Toward Accurate and Quantitative Comparative Metagenomics

Stephen Nayfach, Katherine S. Pollard
Cell, Volume 166, Issue 5, Pages 1103-1116 (August 2016)
DOI: 10.1016/j.cell.2016.08.007
Copyright © 2016 Elsevier Inc.

Figure 1. Challenges Associated with Estimating the Composition of a Microbial Community from Shotgun DNA Sequencing
(A) A sample from a microbial community composed of four different microbial species. Colored cells (blue, red, green) indicate "known" species that have at least one genome sequence in reference databases. The green cell indicates a species that is rare within the community. DNA contamination includes DNA from the host, the laboratory environment, or experimental reagents.
(B) DNA is extracted from the microbial cells in the sample. Extraction efficiency varies across taxa, depending on the experimental protocol. The amount of DNA extracted per cell depends on growth rate: actively dividing cells yield more genomic DNA, with extra copies accumulating near the origin of replication.
(C) Extracted DNA is broken into fragments by mechanical or enzymatic methods. Certain sequences are more likely than others to be breakpoints.
(D) A library is prepared from the DNA fragments and sequenced. Fragments with high or low GC content are under-represented in the sequencing reads. Typically, millions of short (e.g., 150 bp) reads are generated per sample.
(E) Bioinformatic quality-control steps may be performed to eliminate duplicate reads, trim low-quality bases from read ends, and remove reads that derive from contamination sources or have low quality scores.
(F) To infer the composition of the microbial community, high-quality reads are either compared to reference sequences or assembled de novo. Reference-based classification cannot account for unknown species and overestimates the abundances of known species. Metagenomic assembly may fail to detect rare species and overestimates the abundance of abundant species.
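The overestimation described in panels (B) and (F) can be illustrated with a toy calculation. This is a minimal sketch, not the authors' method: it assumes that reads are produced in proportion to cell count times a per-species DNA yield, and that a reference-based classifier simply discards reads from species without a reference genome. The function name, the species labels, and the `efficiency` weights are all hypothetical.

```python
from typing import Dict, Optional, Set


def reference_based_profile(
    true_cells: Dict[str, int],
    known: Set[str],
    efficiency: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Toy model of reference-based profiling (hypothetical, for illustration).

    true_cells: species -> number of cells in the sample
    known:      species that have a reference genome
    efficiency: species -> relative DNA yield per cell (defaults to 1.0)
    Returns relative abundances estimated from classified reads only.
    """
    efficiency = efficiency or {}
    # Assumption: reads are proportional to cells x per-cell DNA yield.
    reads = {s: n * efficiency.get(s, 1.0) for s, n in true_cells.items()}
    # Reads from species without a reference genome are never classified.
    classified = {s: r for s, r in reads.items() if s in known}
    total = sum(classified.values())
    return {s: r / total for s, r in classified.items()}


community = {"blue": 40, "red": 30, "green": 10, "unknown": 20}
estimate = reference_based_profile(community, known={"blue", "red", "green"})
print(estimate)  # {'blue': 0.5, 'red': 0.375, 'green': 0.125}
```

The true cell fractions of the three known species are 0.4, 0.3, and 0.1, so ignoring the 20% of cells from the unknown species inflates each known species by a factor of 1.25. Passing unequal `efficiency` weights shows how extraction bias in panel (B) distorts the estimate further.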

Figure 2. Parameters Used for Taxonomic and Functional Profiling
When computing the abundance of taxa and genes, it is important to consider which parameter of the underlying community one wishes to quantify.
(A) A community of ten cells composed of three taxa carrying different subsets of four gene families (colored arrows). Two cellular abundance parameters and four gene abundance parameters are defined and illustrated with examples.
(B) A comparison of gene relative abundance, average genomic copy number, and absolute abundance across three communities (top, middle, and bottom). The red gene is present at one copy per cell and has constant absolute abundance in all communities, but its relative abundance decreases with increasing genome size. The copy number of the blue gene increases with genome size, but its relative abundance is constant.
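The contrast in panel (B) follows directly from how the parameters are normalized. The sketch below assumes that relative abundance means the fraction of all community DNA made up by a gene and that average genomic copy number means gene copies divided by the number of cells; these definitions, the 1 kb gene length, and the genome sizes are illustrative assumptions rather than values taken from the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cell:
    genome_bp: int  # genome length in base pairs
    genes: Dict[str, int] = field(default_factory=dict)  # gene family -> copies


def gene_parameters(cells: List[Cell], gene: str, gene_bp: int = 1000) -> Tuple[int, float, float]:
    """Return (absolute abundance, relative abundance, average genomic copy number).

    absolute:    total copies of the gene in the community
    relative:    fraction of all community DNA (bp) made up by the gene
    copy number: copies per cell, averaged over every cell
    """
    absolute = sum(c.genes.get(gene, 0) for c in cells)
    total_bp = sum(c.genome_bp for c in cells)
    relative = absolute * gene_bp / total_bp
    copy_number = absolute / len(cells)
    return absolute, relative, copy_number


# Three hypothetical ten-cell communities with increasing genome size.
# The red gene stays at one copy per cell; blue copies scale with genome size.
communities = [
    [Cell(size, {"red": 1, "blue": size // 2_000_000}) for _ in range(10)]
    for size in (2_000_000, 4_000_000, 8_000_000)
]
for cells in communities:
    print(gene_parameters(cells, "red"), gene_parameters(cells, "blue"))
```

In this toy setup the red gene keeps an absolute abundance of 10 and a copy number of 1.0 while its relative abundance halves each time genome size doubles; the blue gene's copy number rises from 1.0 to 4.0 while its relative abundance stays at 0.0005, matching the pattern described for panel (B).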

Figure 3. Differences in Functional Profiles due to Read Length, Library Size, and Quality Control Are Small Compared to Biological Variation
Publicly available metagenomes often differ in library size, read length, and quality-control measures, which raises the question of how comparable metagenomes from different studies are. Twenty-six human gut metagenomes of varying quality were processed using different quality-control methods, and the resulting reads were used to estimate the relative abundance of KEGG Orthology Groups (KOs). We compared the variation introduced by these factors (top) with the variation observed among a large set of technical replicate (N = 1,474), biological replicate (N = 144), and non-replicate (N = 179) gut metagenomes from the Human Microbiome Project (Human Microbiome Project Consortium, 2012) that contained at least one million reads (bottom).
Trimming reads from their 5′ ends was done to simulate libraries of different read lengths; downsampling metagenomes by 95% was done to simulate libraries of different sizes; and fastq-mcf (Aronesty, 2011) was used for de-duplication and quality filtering. To estimate the average genomic copy number of functional groups, reads were mapped to the integrated catalog of reference genes in the human gut microbiome (Li et al., 2014a, 2014b) using bowtie2 (Langmead and Salzberg, 2012), and coverage was normalized by the median coverage of 30 universal single-copy genes (Wu et al., 2013). The percent variation between two metagenomes was computed by (1) summing the absolute deviations across KOs, (2) dividing this sum by the total abundance of KOs in both metagenomes, and (3) multiplying by 100.
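The two calculations in this legend can be sketched in a few lines. Written out, the percent variation between profiles a and b is 100 × Σ_k |a_k − b_k| / Σ_k (a_k + b_k), which is 100 times the Bray-Curtis dissimilarity. The sketch assumes that read coverage has already been computed (for example, from bowtie2 alignments) and summed per KO; the marker-gene names and all coverage values are hypothetical, only three markers are used instead of 30, and the KO identifiers are placeholders.

```python
from statistics import median
from typing import Dict


def copy_number(ko_coverage: Dict[str, float], marker_coverage: Dict[str, float]) -> Dict[str, float]:
    """Average genomic copy number: KO read coverage divided by the median
    coverage of universal single-copy marker genes (one copy per genome)."""
    per_genome = median(marker_coverage.values())
    return {ko: cov / per_genome for ko, cov in ko_coverage.items()}


def percent_variation(a: Dict[str, float], b: Dict[str, float]) -> float:
    """(1) Sum absolute deviations across KOs, (2) divide by the total KO
    abundance in both metagenomes, (3) multiply by 100."""
    kos = set(a) | set(b)
    deviation = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in kos)
    total = sum(a.values()) + sum(b.values())
    return 100.0 * deviation / total


markers = {"marker1": 9.0, "marker2": 10.0, "marker3": 11.0}  # hypothetical
sample_1 = copy_number({"K00001": 20.0, "K00002": 5.0}, markers)
sample_2 = {"K00001": 1.5, "K00002": 1.0}
print(sample_1)                               # {'K00001': 2.0, 'K00002': 0.5}
print(percent_variation(sample_1, sample_2))  # 20.0
```

Using the median rather than the mean of marker coverage keeps a single outlier marker (for example, one that recruits off-target reads) from shifting every copy-number estimate. Identical profiles give 0%, and profiles with no KOs in common give 100%.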

Figure 4. The Presence of Duplicated Reads Is Largely a Function of Library Size and Microbial Diversity
FASTQC was used to estimate the percentage of duplicated reads in each of 181 human gut metagenomes from the Human Microbiome Project, and this percentage was compared with (A) library size and (B) species-level alpha diversity, measured with the Shannon diversity index (Keylock, 2005). Species abundances of bacteria and archaea were estimated with mOTU (Sunagawa et al., 2013). Together, library size and Shannon diversity explain 63% of the variation in sequence duplication rates.
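The two quantities on the axes of this figure can be approximated as follows. This sketch computes the Shannon index H = −Σ p_i ln p_i from species counts (the natural logarithm is an assumption here; another base only rescales H) and a simplified duplicate fraction, defined as the share of reads whose exact sequence already appeared earlier in the library. FastQC's duplication module uses its own procedure, which this does not reproduce, and the regression behind the 63% figure is not modeled.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def duplicate_fraction(reads: Iterable[str]) -> float:
    """Fraction of reads whose exact sequence occurred earlier in the library
    (a simplified stand-in for a duplication-rate estimate)."""
    counts = Counter(reads)
    n = sum(counts.values())
    return (n - len(counts)) / n


print(shannon_index([50, 50]))                               # ln 2, about 0.693
print(duplicate_fraction(["ACGT", "ACGT", "TTGA", "GGCA"]))  # 0.25
```

Intuitively, a low-diversity community sampled deeply leaves fewer distinct fragments to draw from, so exact repeats become more common as library size grows and diversity falls.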

Figure 5. Growth of Shotgun Metagenome Data in the NCBI Sequence Read Archive
Cumulative size, in terabases, of publicly available shotgun metagenomic data in the NCBI Sequence Read Archive (SRA). Sequencing runs were identified in the SRAdb database (Zhu et al., 2013) using the following filters: library_source = "METAGENOMIC", study_type = "Metagenomics", and library_strategy = "WGS".
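The selection and accumulation steps behind this figure can be sketched with an in-memory SQLite table. The three WHERE conditions mirror the filters listed above, but the table name `runs`, its `year` and `bases` columns, and the example rows are hypothetical; the real SRAdb schema may differ.

```python
import sqlite3
from itertools import accumulate
from typing import List, Tuple

Row = Tuple[int, int, str, str, str]  # year, bases, library_source, study_type, library_strategy


def cumulative_terabases(rows: List[Row]) -> List[Tuple[int, float]]:
    """Return (year, cumulative terabases) for runs matching the Figure 5 filters.
    The table layout is a hypothetical stand-in for SRAdb metadata."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE runs (year INTEGER, bases INTEGER, library_source TEXT, "
        "study_type TEXT, library_strategy TEXT)"
    )
    con.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", rows)
    per_year = con.execute(
        """SELECT year, SUM(bases) FROM runs
           WHERE library_source = 'METAGENOMIC'
             AND study_type = 'Metagenomics'
             AND library_strategy = 'WGS'
           GROUP BY year ORDER BY year"""
    ).fetchall()
    con.close()
    years = [year for year, _ in per_year]
    # Convert bases to terabases and take a running total across years.
    totals = accumulate(bases / 1e12 for _, bases in per_year)
    return list(zip(years, totals))


example = [
    (2010, 2 * 10**12, "METAGENOMIC", "Metagenomics", "WGS"),
    (2010, 5 * 10**12, "GENOMIC", "Whole Genome Sequencing", "WGS"),  # excluded
    (2011, 3 * 10**12, "METAGENOMIC", "Metagenomics", "WGS"),
    (2011, 4 * 10**12, "METAGENOMIC", "Metagenomics", "AMPLICON"),    # excluded
]
print(cumulative_terabases(example))  # [(2010, 2.0), (2011, 5.0)]
```

Requiring all three fields to match excludes both non-metagenomic genome runs and amplicon (e.g., 16S rRNA) surveys, so the curve tracks shotgun metagenomic data only.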