curatedMetagenomicData: curated taxonomic and functional profiles for thousands of human-associated microbiomes. Microbiome working group seminar, Dec 1, 2016

Presentation transcript:

curatedMetagenomicData: curated taxonomic and functional profiles for thousands of human-associated microbiomes Microbiome working group seminar Dec 1, 2016 Levi Waldron

Motivation
Metagenomic sequencing data are publicly available but hard to use:
- fastq files from NCBI, EBI, ...
- bioinformatic expertise
- computational resources
- manual curation
We wanted to make the data easy to use for epidemiologists, biostatisticians, biologists, ...

Sequencing as a Tool for Microbial Community Analysis

Whole-metagenome shotgun (WMS) sequencing
Pros:
- taxonomy to species and even strain
- viruses and fungi
- gene variants, e.g. ABX resistance
- use of many marker genes is less susceptible to biases
- more direct and precise functional inference
Cons:
- expensive (probably no multiplexing)
- contamination from human DNA
- big data (before processing)

16S rRNA sequencing
Pros:
- cheap (multiplex hundreds of samples)
- relatively small data
- provides genus-level taxonomy and inferred metabolic function for bacteria and archaea
Cons:
- taxonomy reliable only to genus level
- indirect inference of metabolic function
- use of a single marker gene is susceptible to biases

Taxonomy for WMS: MetaPhlAn2
[Figure: sequencing reads (e.g. GATTACATAG) are converted into a samples × microbes table of relative abundances]
More than 100x speedup over other accurate methods for WMS taxonomic assignment.
References:
- Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C: Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 2012, 9:811–814.
- Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, Tett A, Huttenhower C, Segata N: MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 2015, 12:902–903.
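To make the idea of a relative-abundance profile concrete, here is a minimal Python sketch that normalizes per-taxon counts within one sample to percentages. It is only an illustration of what "relative abundance" means, not MetaPhlAn2's marker-based estimation; the function name, taxa, and counts are hypothetical.

```python
def relative_abundances(counts):
    """Convert per-taxon counts for one sample into percentages.

    `counts` maps a taxon name to a non-negative number (hypothetical
    marker-read counts). This is plain normalization, not the
    MetaPhlAn2 algorithm.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts to normalize")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical counts for a single sample.
sample = {
    "Bacteroides_vulgatus": 60,
    "Escherichia_coli": 30,
    "Prevotella_copri": 10,
}
profile = relative_abundances(sample)
for taxon, pct in profile.items():
    print(f"{taxon}\t{pct:.1f}")
```

Each sample's percentages sum to 100, which is what allows profiles from samples sequenced at different depths to be compared.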

Metabolic function for WMS: HUMAnN2
Community functional profiling:
- databases of genomes, genes, and pathways
- UniRef database provides gene family definitions
- MetaCyc pathway definitions by gene family
- MinPath to identify the set of minimum pathways
- DNA and translated searches
Reference:
- Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, Rodriguez-Mueller B, Zucker J, Thiagarajan M, Henrissat B, White O, Kelley ST, Methé B, Schloss PD, Gevers D, Mitreva M, Huttenhower C: Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput. Biol. 2012, 8:e1002358.

curatedMetagenomicData pipeline

Input:
- raw fastq files: 13 datasets, 2,875 samples
- download (~25TB)
- study metadata: age, body site, disease, etc.

Offline high computing pipeline (> 500 kH CPU, 75TB disk requirements):
- uniform processing with MetaPhlAn2 and HUMAnN2
- manual curation

Data products:
- standardized metadata
- species abundance
- marker presence
- marker abundance
- gene family abundance
- metabolic pathway abundance
- metabolic pathway presence

Integration (integrated Bioconductor ExpressionSet objects):
- per-patient microbiome data
- per-patient metadata
- experiment-wide metadata

ExperimentHub product:
- Amazon S3 cloud distribution
- tag-based searching
- dataset snapshot dates
- automatic local caching
- automatic documentation

User experience:
- convenience download functions
- megabytes-sized datasets
- differential abundance
- diversity metrics
- clustering
- machine learning

Image sources:
https://upload.wikimedia.org/wikipedia/commons/thumb/7/7e/Funnel_Mech.svg/667px-Funnel_Mech.svg.png
https://pixabay.com/en/cheering-happy-jumping-people-297419/
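As an illustration of the "diversity metrics" that the processed species-abundance tables support, the following Python sketch computes the Shannon index, H = -Σ p_i log2 p_i, for one sample. The package itself is a Bioconductor (R) resource; this standalone example, its function name, and the abundance values are assumptions made only for illustration.

```python
import math


def shannon_diversity(abundances, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) for one sample.

    `abundances` holds non-negative abundances (counts or percentages);
    they are renormalized to proportions, and zero entries are skipped
    because they contribute nothing to the sum.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no abundance to analyze")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p, base)
    return h


# Hypothetical species abundances (percent) for two samples.
even_sample = [25.0, 25.0, 25.0, 25.0]
skewed_sample = [97.0, 1.0, 1.0, 1.0]
print(shannon_diversity(even_sample))
print(shannon_diversity(skewed_sample))
```

The evenly distributed sample scores higher than the sample dominated by one species, which is the behavior a diversity comparison across body sites or disease groups relies on.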

Automatic documentation Link to manual

curated*Data Bioconductor packages
- curatedMetagenomicData
- curatedOvarianData: 30 datasets, > 3K unique samples; most annotated for OS, surgical debulking, histology...
- curatedCRCData: 34 datasets, ~4K unique samples; many annotated for MSS, gender, stage, age, N, M
- curatedBladderData: 12 datasets, ~1,200 unique samples; many annotated for stage, grade, OS

The Cancer Genome Atlas
- 36 diseases
- 50 platforms
- 19 data types
Figure credit: Marcel Ramos

MultiAssayExperiment
Integrative multi-omics data representation and management for Bioconductor
https://bioconductor.org/packages/MultiAssayExperiment
Provide pre-packaged objects for all of TCGA: http://tinyurl.com/MAEOurls

Thank you
Lab (www.waldronlab.org / www.waldronlab.github.io): Lucas Schiffer, Marcel Ramos, Lavanya Kannan, Hanish Kodali, Rimsha Azar, Carmen Rodriguez, Audrey Renson
Collaborators: Nicola Segata, Edoardo Pasolli (University of Trento, Italy); Valerie Obenchain, Martin Morgan (Bioconductor core team); CUNY High-performance Computing Center
Statistical Learning Book Club: join us remotely, Fridays at 10am. Currently reading “Data Analysis for the Life Sciences” by Irizarry and Love. http://tinyurl.com/huw8cb5
met Jin Xu from East China Normal University, Shanghai

Datasets
Total of 13 datasets with 2,875 samples

Dataset | Samples | Citation
HMP_2012 | 749 | Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
KarlssonFH_2013 | 145 | Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).
LeChatelierE_2013 | 292 | Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).
LomanNJ_2013_Hi | 44 | Loman, N. J. et al. A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4. JAMA 309, 1502–1510 (2013).
LomanNJ_2013_Mi | 9 | (citation shared with LomanNJ_2013_Hi)
NielsenHB_2014 | 396 | Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
Obregon_TitoAJ_2015 | 58 | Obregon-Tito, A. J. et al. Subsistence strategies in traditional societies distinguish gut microbiomes. Nat. Commun. 6, 6505 (2015).
OhJ_2014 | 291 | Oh, J. et al. Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014).
QinJ_2012 | 363 | Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
QinN_2014 | 237 | Qin, N. et al. Alterations of the human gut microbiome in liver cirrhosis. Nature 513, 59–64 (2014).
RampelliS_2015 | 38 | Rampelli, S. et al. Metagenome Sequencing of the Hadza Hunter-Gatherer Gut Microbiota. Curr. Biol. 25, 1682–1693 (2015).
TettAJ_2016 | 97 | Ferretti, P. et al. Experimental metagenomics and ribosomal profiling of the human skin microbiome. Exp. Dermatol. (2016). doi:10.1111/exd.13210
ZellerG_2014 | 156 | Zeller, G. et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 10, 766 (2014).
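As a quick consistency check on the slide's totals, this short Python sketch adds up the per-dataset sample counts listed above. The dictionary is transcribed by hand from the slide and is not produced by the package.

```python
# Sample counts per dataset, copied from the slide's table.
datasets = {
    "HMP_2012": 749,
    "KarlssonFH_2013": 145,
    "LeChatelierE_2013": 292,
    "LomanNJ_2013_Hi": 44,
    "LomanNJ_2013_Mi": 9,
    "NielsenHB_2014": 396,
    "Obregon_TitoAJ_2015": 58,
    "OhJ_2014": 291,
    "QinJ_2012": 363,
    "QinN_2014": 237,
    "RampelliS_2015": 38,
    "TettAJ_2016": 97,
    "ZellerG_2014": 156,
}

total_samples = sum(datasets.values())
print(f"{len(datasets)} datasets, {total_samples:,} samples")
# prints "13 datasets, 2,875 samples"
```

The counts reproduce the stated total of 13 datasets and 2,875 samples.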