Compositionality and Sparseness in 16S rRNA Data
Anthony Fodor, Associate Professor, Bioinformatics and Genomics, UNC Charlotte

Can we fairly compare high- and low-biomass samples?

Low-abundance samples are inherently challenging to survey
[Figure label: less abundant]
Source: Hanna et al. - Comparison of culture and molecular techniques for microbial community characterization in infected necrotizing pancreatitis - J. Surgical Research

Clearly, the sequencing of negative controls should be part of all of our pipelines.

Can we fairly compare samples with different numbers of sequences?

16S rRNA experiments are always compositional and often sparse
Compositional – because different samples have different numbers of sequences
Sparse – because there are many zeros in the spreadsheet
[Figure: spreadsheet of samples by OTUs]
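
Both properties can be read directly off a count table. The sketch below is illustrative only: the table, sample names, and helper functions are hypothetical and are not taken from the talk. It computes the number of sequences per sample, converts counts to relative abundances, and reports the fraction of zero cells.

```python
# Hypothetical OTU count table: one row per sample, one column per OTU.
counts = {
    "sample_1": [120, 0, 30, 0, 850],
    "sample_2": [12, 3, 0, 0, 85],
    "sample_3": [0, 0, 4500, 10, 490],
}


def sequencing_depth(row):
    """Total number of sequences observed in one sample."""
    return sum(row)


def relative_abundance(row):
    """Convert raw counts to proportions that sum to 1 within a sample."""
    total = sum(row)
    return [count / total for count in row]


def zero_fraction(table):
    """Fraction of cells in the whole table that are zero (sparsity)."""
    cells = [count for row in table.values() for count in row]
    return sum(1 for count in cells if count == 0) / len(cells)


depths = {name: sequencing_depth(row) for name, row in counts.items()}
proportions = {name: relative_abundance(row) for name, row in counts.items()}
sparsity = zero_fraction(counts)

print(depths)
print(sparsity)
```

In this made-up table the three samples differ fifty-fold in depth, so only the proportions are comparable across samples, and those proportions are constrained to sum to 1.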

Compositionality is a well-studied problem in statistics, but remains challenging

Compositionality can introduce subtle artifacts into our dataset
[Figure: relative abundance of taxa A, B and C]
Problems include:
Inference may report a change in A and B even though, biologically, A and B have not changed.
The estimates of A and B depend on C. If C is a contaminant (or rRNA in an RNA-seq experiment), the values of A and B might not be appropriate.
A and B will appear correlated, but this is a statistical artifact.
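
A small numerical sketch of the first two problems, using hypothetical counts that are not from the talk: the absolute counts of A and B are held fixed while only C grows, yet the relative abundances of A and B both drop.

```python
def relative_abundance(counts):
    """Convert a dict of taxon counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute counts: A and B are identical in both conditions,
# only taxon C (for example, a contaminant) increases.
before = {"A": 100, "B": 200, "C": 100}
after = {"A": 100, "B": 200, "C": 700}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

for taxon in ("A", "B"):
    print(taxon, rel_before[taxon], "->", rel_after[taxon])
```

Because A and B fall together whenever C rises, repeating this over many samples would make A and B look positively correlated even though neither has changed.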

The correlation issue has been considered by multiple groups…

The compositional nature of 16S rRNA data has led to controversies over analysis pipelines…

Notice that in all of the above examples, the ratio B/A is always 2, irrespective of what happens with taxon C.
Normalization schemes can take advantage of working in ratio space.
[Figure: relative abundance of taxa A, B and C]
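
The invariance of the ratio can be checked numerically. This is a minimal sketch with made-up counts (the numbers on the original slide are not recoverable from the transcript): A and B are fixed so that B/A = 2, and C is varied over a wide range.

```python
import math


def relative_abundance(counts):
    """Convert a dict of taxon counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical counts: A and B are fixed, C varies widely.
ratios = []
log_ratios = []
for c_count in (0, 5, 85, 985):
    rel = relative_abundance({"A": 5, "B": 10, "C": c_count})
    ratio = rel["B"] / rel["A"]  # the shared denominator cancels out
    ratios.append(ratio)
    log_ratios.append(math.log2(ratio))

print(ratios)
print(log_ratios)
```

The relative abundances of A and B change with every value of C, but their ratio (and hence its logarithm) stays the same, which is the property that ratio-based normalization schemes rely on.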

Cells in the spreadsheet with few counts are largely structured by sequencing depth
Source: Gevers et al. - The Treatment-Naive Microbiome in New-Onset Crohn’s Disease - Cell Host Microbe 2014

Ordination without normalization leads to a dependency on sequencing depth…
[Figure axes: log10(number of sequences); Bray-Curtis distance]
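
To see why raw counts carry a depth signal into a distance such as Bray-Curtis, consider an idealized sketch (hypothetical counts, no sampling noise, helper names are assumptions): the same community profile observed at two different depths.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def to_proportions(row):
    """Scale a vector of counts so that it sums to 1."""
    total = sum(row)
    return [value / total for value in row]


# Hypothetical samples with identical composition but a ten-fold
# difference in the number of sequences.
shallow = [500, 300, 150, 50]      # 1,000 sequences
deep = [5000, 3000, 1500, 500]     # 10,000 sequences

raw_distance = bray_curtis(shallow, deep)
proportion_distance = bray_curtis(to_proportions(shallow), to_proportions(deep))

print(raw_distance)
print(proportion_distance)
```

On raw counts the two samples look very different purely because of depth. Converting to proportions removes the effect only in this noiseless toy case; with real sampling noise and many low-count cells, some dependency on depth remains.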

No normalization scheme eliminates the dependency on sequencing depth

No normalization scheme eliminates compositional dependencies
Bioinformatics pipelines for 16S rRNA might consider explicitly tracking the number of sequences per sample as a potential confounder…

Sequencing depth can be correlated with input variables of interest…
[Figure axes: log10(number of sequences); NMDS 1; Theta YC distance; difference in number of sequences]
Source: Baxter et al. - Structure of the gut microbiome following colonization with human feces determines colonic tumor burden - Microbiome 2014
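
One simple way to check for this kind of confounding is to correlate log10 sequencing depth with the variable of interest before interpreting any ordination. The sketch below uses invented depths and an invented per-sample variable (`tumor_count` is a hypothetical name, not data from Baxter et al.) with a hand-written Pearson correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical per-sample values, for illustration only.
depths = [1000, 3000, 10000, 30000, 100000]   # sequences per sample
tumor_count = [2, 5, 9, 14, 20]               # variable of interest

log_depth = [math.log10(d) for d in depths]
r = pearson(log_depth, tumor_count)

print(round(r, 3))
```

A strong correlation like the one in this toy data would mean that any association between community structure and the variable of interest could partly reflect sequencing depth, so depth should be carried along as a covariate.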

Different normalization schemes can have very different consequences for inference.
[Figure axes: log10(number of sequences); Theta YC distance; NMDS 1; difference in number of sequences]

Conclusions
No normalization scheme eliminates compositional dependencies (although some do better than others!).
Bioinformatics pipelines for 16S rRNA should explicitly track the number of sequences per sample as a potential confounding variable.
Just as no one statistical test is appropriate for all inference, there is likely no one normalization scheme that will be appropriate for all datasets.

Raad Z. Gharaibeh (We thank Dirk Gevers for providing a parsable OTU table for the Risk data)

In any experiment, confounding variables can complicate inference.