Taxonomic profiling with MetaPhlAn2


Taxonomic profiling with MetaPhlAn2. Curtis Huttenhower, Galeb Abu-Ali, Eric Franzosa. 08-12-16. Harvard T.H. Chan School of Public Health, Department of Biostatistics.

The two big questions of microbial community analysis... What are they doing? Who is there?

Taxonomic profiling: who’s there http://huttenhower.sph.harvard.edu/metaphlan2

Efficient assembly-free meta’omics by leveraging isolates. NCBI isolate genomes: Archaea 300, Bacteria 12,926, Viruses 3,565, Eukaryota 112. Open reading frames: 49.0 million total genes. Species pan-genomes: 7,677, containing 18.6 million gene clusters. Core genes. Marker genes. RepoPhlAn, ChocoPhlAn (Nicola Segata). http://www.metaref.org

MetaPhlAn2: metagenomic taxonomic profiling. http://huttenhower.sph.harvard.edu/metaphlan2 Gene X is a unique marker gene for clade Y. ~1M most representative markers used for identification. 184±45 markers per species (target 200). ~7,100 species (excludes incomplete annotations, spp., etc.). False positive/false negative rates of ~1 in 10^6. Profiles all domains of life: bacteria, viruses, euks, archaea. Strain-level profiling using marker barcodes and SNPs. Quasi-markers used to resolve ambiguity in postprocessing.

Nicola’s taken advantage of this catalog for several computational methods, but the one I’d like to talk about today relies on identifying high-quality, taxonomically unique marker sequences guaranteed to arise from exactly one microbial clade. By organizing the gene catalog of IMG into groups of gene families (not orthologous families, but highly nucleotide-similar sequences), we can identify gene families that are core to one or more clades. This means that the gene’s conserved throughout the clade, although it may appear elsewhere due to conservation or horizontal gene transfer. Core genes are thus a superset of unique marker genes, which are both core to a clade and unique there: they never appear elsewhere, even by horizontal transfer. Nicola’s developed a system called ChocoPhlAn, which I’m pretty sure is an acronym for something, that identifies all genes core or unique for any clade within IMG. This results in a high-quality set of about two million unique markers, with uniqueness verified by whole-genome BLAST against the entire database. About 400 thousand of these proved sufficient to uniquely identify all 1200 species in the database, plus several hundred higher-level clades, with several hundred markers for most organisms.
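The distinction drawn above between core genes and unique marker genes can be illustrated with a small set-based sketch. This is not the ChocoPhlAn implementation: the function name, the toy genomes, and the strict "present in every genome" and "absent from every other genome" rules are assumptions made for illustration, and the sketch skips the sequence clustering and BLAST verification described in the notes.

```python
from collections import defaultdict


def core_and_unique_markers(genomes, clade_of):
    """Hypothetical helper, not part of MetaPhlAn2 or ChocoPhlAn.

    genomes:  dict mapping genome id -> set of gene-family ids
    clade_of: dict mapping genome id -> clade label
    Returns (core, markers), each a dict mapping clade -> set of families.
    """
    members = defaultdict(list)
    for genome_id, clade in clade_of.items():
        members[clade].append(genome_id)

    # Core: families present in every genome assigned to the clade.
    core = {
        clade: set.intersection(*(genomes[g] for g in ids))
        for clade, ids in members.items()
    }

    # Unique markers: core families never seen in a genome outside the clade.
    markers = {}
    for clade, families in core.items():
        outside = set()
        for genome_id, other in clade_of.items():
            if other != clade:
                outside |= genomes[genome_id]
        markers[clade] = families - outside
    return core, markers


# Toy data (invented): famB is core to both species, so it marks neither.
GENOMES = {
    "g1": {"famA", "famB", "famC"},
    "g2": {"famA", "famB", "famD"},
    "g3": {"famB", "famE"},
    "g4": {"famB", "famE", "famF"},
}
CLADE_OF = {
    "g1": "species_1",
    "g2": "species_1",
    "g3": "species_2",
    "g4": "species_2",
}

CORE, MARKERS = core_and_unique_markers(GENOMES, CLADE_OF)
print(CORE)
print(MARKERS)
```

In the toy data, famB is conserved in both species and so is core to each but a marker for neither, which is the "superset" relationship the notes describe.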

Per-species abundance by robust averaging. Figure: coverage plotted over abundance-sorted pan-gene families, with three labeled regions: multi-copy genes, a plateau of genes from one metagenome’s strain, and absent genes.
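One way to read "robust averaging" is as a truncated mean: sort the per-gene coverage values, discard the extremes (multi-copy genes at the high end, absent genes at the low end), and average the plateau that remains. The sketch below assumes that interpretation. The slides do not state the exact statistic or trimming fraction MetaPhlAn2 uses, so the function names, the `trim` parameter, and the coverage values are all hypothetical.

```python
def robust_average(coverages, trim=0.1):
    """Truncated mean: drop the `trim` fraction of values from each end.

    Assumed stand-in for the slide's "robust averaging"; the real
    statistic and trimming fraction are not given in the source.
    """
    ordered = sorted(coverages)
    if not ordered:
        return 0.0
    k = int(len(ordered) * trim)
    # Fall back to the plain mean if trimming would remove everything.
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(kept) / len(kept)


def relative_abundance(species_coverage, trim=0.1):
    """Normalize per-species robust averages so they sum to 1."""
    estimates = {
        species: robust_average(values, trim)
        for species, values in species_coverage.items()
    }
    total = sum(estimates.values())
    return {
        species: (value / total if total else 0.0)
        for species, value in estimates.items()
    }


# Invented per-gene coverages: zeros mimic absent genes, the large
# values mimic multi-copy genes, and the repeated value is the plateau.
COVERAGE = {
    "species_a": [0, 0, 10, 10, 10, 10, 10, 10, 40, 40],
    "species_b": [0, 30, 30, 30, 90],
}

ABUNDANCE = relative_abundance(COVERAGE, trim=0.2)
print(ABUNDANCE)
```

With a plain mean, the absent and multi-copy genes would pull each species estimate away from its plateau; trimming both tails keeps the estimate at the plateau value.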

Meta-analysis of metagenomic taxonomic profiles. Waldron and Segata: meta-analysis of >2,400 gut metagenomes. Available as an R package. Allows systematic tests of phenotypes across datasets, or health vs. disease.