Scalable metabolic reconstruction for metagenomic data and the human microbiome Sahar Abubucker, Nicola Segata, Johannes Goll, Alyxandria Schubert, Beltran.

Slides:



Advertisements
Similar presentations
Journal Club Jenny Gu October 24, Introduction Defining the subset of Superfamilies in LUCA Examine adaptability and expansion of particular superfamilies.
Advertisements

The Human Microbiome in Health and Disease Curtis Huttenhower Harvard School of Public Health Department of Biostatistics.
Use of the genomic data o Reconstruction of metabolic properties o Nature’s Microbiome o NGS in Population Genetics.
Computational Analysis of the Taxanomical Classification of Short 16S rRNA Sequences Christel Chehoud Mentor: Brian Haas.
Computational Methodology for Microbial and Metagenomic Characterization using Large Scale Functional Genomic Data Integration Curtis Huttenhower
Scalable data mining for functional genomics and metagenomics
Scalable data mining for functional genomics and metagenomics Curtis Huttenhower Harvard School of Public Health Department of Biostatistics.
Annotating Metagenomes Using the NMPDR Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division,
Large scale functional data mining: What can we find in the data we have? Curtis Huttenhower Harvard School of Public Health Department of Biostatistics.
Sahar Abubucker, Nicola Segata,
The NIH Human Microbiome Project
Computational metagenomics and the human microbiome Curtis Huttenhower Harvard School of Public Health Department of Biostatistics.
The Sorcerer II Global ocean sampling expedition Katrine Lekang Global Ocean Sampling project (GOS) Global Ocean Sampling project (GOS) CAMERA CAMERA METAREP.
Subsystem Approach to Genome Annotation National Microbial Pathogen Data Resource Claudia Reich NCSA, University of Illinois, Urbana.
The Microbiome and Metagenomics
Introduction to metagenomics Agnieszka S. Juncker Center for Biological Sequence Analysis Technical University of Denmark.
Unit 1: The Language of Science  communicate and apply scientific information extracted from various sources (3.B)  evaluate models according to their.
Georgia Wiesner, MD CREC June 20, GATACAATGCATCATATG TATCAGATGCAATATATC ATTGTATCATGTATCATG TATCATGTATCATGTATC ATGTATCATGTCTCCAGA TGCTATGGATCTTATGTA.
Metagenomic Analysis Using MEGAN4
Microbial taxonomy and phylogeny
Amino acids are the building blocks of what macromolecule?
Answering biological questions using large genomic data collections Curtis Huttenhower Harvard School of Public Health Department of Biostatistics.
“Mapping the Human Gut Microbiome in Health and Disease Using Sequencing, Supercomputing, and Data Analysis” Invited Talk Delivered by Mehrdad Yazdani,
Discovery of new biomarkers as indicators of watershed health and water quality Anamaria Crisan & Mike Peabody.
H = -Σp i log 2 p i. SCOPI Each one of the many microbial communities has its own structure and ecosystem, depending on the body environment it exists.
A brief Introduction to Bioinformatics Y. SINGH NELSON R. MANDELA SCHOOL OF MEDICINE DEPARTMENT OF TELEHEALTH Content licensed under.
What must DNA do? 1.Replicate to be passed on to the next generation 2.Store information 3.Undergo mutations to provide genetic diversity.
The Human Microbiome.
Human Microbiome Conference
Charting the function of microbes and microbial communities Curtis Huttenhower Harvard School of Public Health Department of Biostatistics.
The NIH Roadmap and the Human Microbiome Project Francis S. Collins, M.D., Ph.D. National Human Genome Research Institute April 22, 2007.
713 Lecture 15 Host metagenomics. Progression of techniques Culture based –Use phenotypes and genotypes to ID Non-culture based, focused on 16S rDNA –Clone.
Large scale genomic data integration for functional genomics and metagenomics Curtis Huttenhower Harvard School of Public Health Department of.
Meta’omic functional profiling with ShortBRED Galeb Abu-Ali Curtis Huttenhower Harvard T.H. Chan School of Public Health Department of Biostatistics.
Metagenomic Analysis Using MEGAN4 Peter R. Hoyt Director, OSU Bioinformatics Graduate Certificate Program Matthew Vaughn iPlant, University of Texas Super.
Meta’omic Analysis with MetaPhlAn, HUMAnN, and LEfSe Curtis Huttenhower Harvard School of Public Health Department of Biostatistics.
“Observing the Dynamics of the Human Immune System Coupled to the Microbiome in Health and Disease” CASIS Workshop on Biomedical Research Aboard the ISS.
2009 IADR, MIAMI, FL, USA Hands-on Experience for using the Human Oral Microbiome Database (HOMD) 2009 IADR Workshop, Miami, FL, USA Tsute (George) Chen.
Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.
Meta’omic functional profiling with ShortBRED Curtis Huttenhower Harvard School of Public Health Department of Biostatistics U. Oregon.
Inflammatory Bowel Diseases November 19, 2007 NCDD Meeting Chair: Daniel K. Podolsky, MD Vice Chair: Eugene B. Chang, MD.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
tracking microbes at the strain level
Chapter 4, 5, 6. Chapter 4: Microscopy, Staining, and Classification.
Bacterial metabolism Assist. Prof. Emrah Ruh NEU Faculty of Medicine
An Introduction to Meta’omic Analyses Curtis Huttenhower Galeb Abu-Ali Eric Franzosa Harvard T.H. Chan School of Public Health Department of Biostatistics.
Functional profiling with HUMAnN2
CuratedMetagenomicData: curated taxonomic and functional profiles for thousands of human-associated microbiomes Microbiome working group seminar Dec 1,
Metagenomic Species Diversity.
The Human Microbiome Project
Strain profiling with StrainPhlAn and PanPhlAn
Omolola C. Betiku1,2. , Carl J. Yeoman2, T. Gibson Gaylord1, Suzanne L
Functional profiling with HUMAnN2
The African Soil Microbiology project
Taxonomic profiling with MetaPhlAn2
Identifying personal microbiomes using metagenomic codes
Systematic Characterization and Analysis of the Taxonomic Drivers of Functional Shifts in the Human Microbiome  Ohad Manor, Elhanan Borenstein  Cell Host.
Taxonomic profiling with MetaPhlAn2
Genomic Data Manipulation
Strain profiling with StrainPhlAn
Fundamental Concepts for Genetics
Volume 10, Issue 3, Pages (September 2011)
Human Gut Microbiome: Function Matters
H = -Σpi log2 pi.
Microbiome studies for microbial disease pathogenesis research
Overview of Genetics.
The Human Microbiome Project in 2011 and Beyond
A typical current computational meta'omic pipeline to analyze and contrast microbial communities. A typical current computational meta'omic pipeline to.
Fundamental Concepts for Genetics
Presentation transcript:

Scalable metabolic reconstruction for metagenomic data and the human microbiome Sahar Abubucker, Nicola Segata, Johannes Goll, Alyxandria Schubert, Beltran Rodriguez-Mueller, Jeremy Zucker, the Human Microbiome Project Metabolic Reconstruction team, the Human Microbiome Consortium, Patrick D. Schloss, Dirk Gevers, Makedonka Mitreva, Curtis Huttenhower Harvard School of Public Health Department of Biostatistics

What’s metagenomics? 2 Total collection of microorganisms within a community Also microbial community or microbiota Total genomic potential of a microbial community Total biomolecular repertoire of a microbial community Study of uncultured microorganisms from the environment, which can include humans or other living hosts

The Human Microbiome Project for a normal population 300 People/ 15(18) Body Sites Multifaceted analyses 2 clin. centers, 4 seq. centers, data generation, technology development, computational tools, ethics… Multifaceted data Human population Microbial population Novel organisms Biotypes Viruses Metabolism >12,000 samples >50M 16S seqs. 4.6Tbp unique metagenomic sequence >1,900 reference genomes Full clinical metadata

15+ Demonstration Projects for microbial communities in disease Gastrointestinal All include additional subjects and technology development Skin Psoriasis Acne Atopic dermatitis Obesity Crohn’s disease Ulcerative colitis Autoimmunity Cancer Necrotizing enterocolitis Urogenital Bacterial vaginosis STDs Reproductive health

What to do with your metagenome? 5 (x10 10 ) Diagnostic or prognostic biomarker for host disease Public health tool monitoring population health and interactions Comprehensive snapshot of microbial ecology and evolution Reservoir of gene and protein functional information Who’s there? What are they doing? What do functional genomic data tell us about microbiomes? What can our microbiomes tell us about us?

Gene expression SNP genotypes Metabolic/Functional Reconstruction: The Goal 6 Healthy/IBD BMI Diet Taxon abundances Enzyme family abundances Pathway abundances Batch effects? Population structure? Niches & Phylogeny Test for correlates Multiple hypothesis correction Feature selection p >> n Confounds/ stratification/ environment Cross- validate Biological story? Independent sample Intervention/ perturbation

HMP: Metabolic reconstruction 7 WGS reads Pathways/ modules Genes (KOs) Pathways (KEGGs) Functional seq. KEGG + MetaCYC CAZy, TCDB, VFDB, MEROPS… BLAST → Genes Genes → Pathways MinPath (Ye 2009) Smoothing Witten-Bell Gap filling c(g) = max( c(g), median ) 300 subjects 1-3 visits/subject ~6 body sites/visit M reads/sample 100bp reads BLAST ? Taxonomic limitation Rem. paths in taxa < ave. Xipe Distinguish zero/low (Rodriguez-Mueller in review) HUMAnN: HMP Unified Metabolic Analysis Network

HMP: Metabolic reconstruction 8 Pathway coveragePathway abundance

HUMAnN: Validating gene and pathway abundances on synthetic data 9 Validated on individual gene families, module coverage, and abundance 4 synthetic communities: Low (20 org.) and high (100 org.) complexity Even and lognormal abundances Few false negatives: short genes (<100bp), taxonomically rare pathways Few false positives: large and multicopy (not many in bacteria) Individual gene families ρ=0.86

Functional modules in 741 HMP samples 10 Coverage Abundance ANO(BM)PFO(SP)SRCRCO(TD) ← Samples → ← Pathways→ Zero microbes (of ~1,000) are core among body sites Zero microbes are core among individuals 19 (of ~220) pathways are present in every sample 53 pathways are present in 90%+ samples Only 31 (of 1,110) pathways are present/absent from exactly one body site 263 pathways are differentially abundant in exactly one body site

A portrait of the human microbiome: Who’s there? 11 With Jacques Izard, Susan Haake, Katherine Lemon

Pathway coverage A portrait of the human microbiome: What are they doing? 12

HMP: How do microbes vary within each body site across the population? 13

HMP: How do body sites compare between individuals across the population? 14

HMP: Penetrance of species (OTUs) across the population 15 Data from Pat Schloss

HMP: Penetrance of genera (phylotypes) across the population 16 Data from Pat Schloss

HMP: Penetrance of pathways across the population 17 KEGG Metabolic modules M00001: Glycolysis (Embden-Meyerhof) M00002: Glycolysis, core module M00003: Gluconeogenesis M00004: Pentose phosphate cycle M00007: Pentose phosphate (non-oxidative) M00049: Adenine biosynthesis M00050: Guanine biosynthesis M00052: Pyrimidine biosynthesis M00053: Pyrimidine deoxyribonuleotide biosynthesis M00120: Coenzyme A biosynthesis M00126: Tetrahydrofolate biosynthesis M00157: F-type ATPase, bacteria M00164: ATP synthase M00178: Ribosome, bacteria M00183: RNA polymerase, bacteria M00260: DNA polymerase III complex, bacteria M00335: Sec (secretion) system M00359: Aminoacyl-tRNA biosynthesis M00360: Aminoacyl-tRNA biosynthesis, prokaryotes M00362: Nucleotide sugar biosynthesis, prokaryotes M00006: Pentose phosphate (oxidative) M00051: Uridine monophosphate biosynthesis M00125: Riboflavin biosynthesis M00008: Entner-Doudoroff pathway M00239: Peptides/nickel transport system M00018: Threonine biosynthesis M00168: CAM (Crassulacean acid metabolism), dark M00167: Reductive pentose phosphate cycle Human microbiome functional structure dictated primarily by microbial niche, not host (in health) Huge variation among hosts in who’s there; small variation in what they’re doing Note: definitely variation in how these functions are implemented Does not yet speak in detail to host environment (diet!), genetics, or disease

Population summary statistics  population biology 18 Posterior fornix, ref. genomes ← Individuals → ← Species → ← Pathways → Lactobacillus iners Lactobacillus crispatus Gardnerella vaginalis Lactobacillus jensenii Lactobacillus gasseri Posterior fornix, functional modules Basic biology, sugar transport Essential amino acids Urea cycle, amines, aromatic AAs

LEfSe: Metagenomic class comparison and explanation 19 LEfSe Nicola Segata LDA + Effect Size

LEfSe: Evaluation on synthetic data 20 Their FP rate Our FP rate

Microbes characteristic of the oral and gut microbiota 21

Aerobic, microaerobic and anaerobic communities High oxygen:skin, nasal Mid oxygen:vaginal, oral Low oxygen:gut

LEfSe: The TRUC murine colitis microbiota 23 With Wendy Garrett

Microbial biomolecular function and biomarkers in the human microbiome: the story so far? Who’s there changes –What they’re doing doesn’t (as much) –How they’re doing it does The data so far only scratch the surface –Only 1/3 to 2/3 of the reads/sample map to cataloged gene families –Only 1/3 to 2/3 of these gene families have cataloged functions –Very much in line with MetaHIT study of gut microbiota –Job security! Looking forward to functional reconstruction… –In environmental communities –With respect to host environment + genetics –With respect to host disease 24

Thanks! 25 Jacques Izard Wendy Garrett Pinaki SarderNicola Segata Levi WaldronLarisa Miropolsky Interested? We’re recruiting graduate students and postdocs! Human Microbiome Project HMP Metabolic Reconstruction George Weinstock Jennifer Wortman Owen White Makedonka Mitreva Erica Sodergren Mihai Pop Vivien Bonazzi Jane Peterson Lita Proctor Sahar Abubucker Yuzhen Ye Beltran Rodriguez-Mueller Jeremy Zucker Qiandong Zeng Mathangi Thiagarajan Brandi Cantarel Maria Rivera Barbara Methe Bill Klimke Daniel Haft Ramnik XavierDirk Gevers Bruce BirrenMark Daly Doyle WardEric Alm Ashlee EarlLisa Cosimi