MEGAN analysis of metagenomic data Daniel H. Huson, Alexander F. Auch, Ji Qi, et al. Genome Res. 2007.


Early metagenomics
- Known phylogenetic markers and subsequent sequencing of clones
- Analysis of paired-end reads
- Complete sequences of environmental fosmid and BAC clones
- Rough annotation of the metabolic capacity
- Environmental assemblies
- Distinguish between discrete species and populations of closely related biotypes
- Problem of using proven phylogenetic markers (ribosomal genes, coding sequences)
- Slow-evolving genes: distinguishing between species at large evolutionary distances

What is MEGAN?
- Metagenome Analyzer (MEGAN)
- Free software
- Deviates from the analytical pattern of previous approaches
- Built on statistical analysis: compares random sequence intervals with unspecified phylogenetic properties against databases
- Depends on the related sequences in the databases
- Provides filters to adjust the level of stringency later to an appropriate level
- Laptop analysis
- Comparison result (BLAST) -> laptop (MEGAN)
- Graphical and statistical output

Pipeline
- Compare against databases: BLAST
- Compute and explore taxonomical content: NCBI taxonomy
- Lowest common ancestor (LCA) algorithm
- Data sets (Sargasso Sea, mammoth bone, short E. coli K12 & B. bacteriovorus HD100 reads)
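
The LCA step can be illustrated with a minimal sketch: a read that hits several taxa is placed at the deepest taxonomy node that covers all of them. The `PARENT` map below is a hypothetical toy taxonomy, not the NCBI taxonomy, and the function names are assumptions made for illustration rather than MEGAN's own code.

```python
# Toy parent map: child taxon -> parent taxon (hypothetical, not the real NCBI taxonomy).
PARENT = {
    "Escherichia coli K12": "Escherichia coli",
    "Escherichia coli": "Enterobacteriaceae",
    "Salmonella enterica": "Enterobacteriaceae",
    "Enterobacteriaceae": "Gammaproteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Bdellovibrio bacteriovorus": "Deltaproteobacteria",
    "Deltaproteobacteria": "Proteobacteria",
    "Proteobacteria": "Bacteria",
    "Bacteria": "root",
}


def path_to_root(taxon):
    """Return the list of taxa from `taxon` up to the root, inclusive."""
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest taxon that is an ancestor of (or equal to) every taxon in `taxa`."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("need at least one taxon")
    common = set(path_to_root(taxa[0]))
    for t in taxa[1:]:
        common &= set(path_to_root(t))
    # Walk up from the first taxon: the first node shared by all paths is the LCA.
    for node in path_to_root(taxa[0]):
        if node in common:
            return node
    return "root"  # fallback for taxa missing from the toy map


# A read hitting both E. coli K12 and Salmonella is assigned at family level.
assigned = lowest_common_ancestor(["Escherichia coli K12", "Salmonella enterica"])
```

A read whose hits all fall in one species stays at species level, while a read with hits spread across distant groups moves up toward the root, which is why conserved genes end up assigned to high-level taxa.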

What we can do with MEGAN
- Species and strain identification through species-specific genes
- Searching for species or taxa with the Find tool
- Distribution of strains of a species
- Underlying sequence alignments

Experiment 1
- Sargasso Sea
- Data set
  - Sanger sequencing
  - Samples 1-4 from DDBJ/EMBL/GenBank
  - Reads from Sample 1
  - Randomly selected pooled set of reads from Samples 2-4
- BLASTX -> NCBI-NR
  - 1% no hits from Sample 1, <3% no hits from Samples 2-4
- Filters
  - Min-score: bit-score threshold of 100
  - Top-percent: keep hits whose bit scores lie within 5% of the best score
  - Min-support: isolated assignments (hit by only one read) discarded
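
The three filters can be sketched as follows, using the thresholds from this slide as defaults. This is a simplified illustration under stated assumptions: hits are taken to be `(taxon, bit_score)` pairs, the function names are hypothetical, and min-support is applied per taxon exactly as the slide words it (discarding), which may differ from how MEGAN itself handles low-support taxa.

```python
from collections import Counter


def filter_hits(hits, min_score=100.0, top_percent=5.0):
    """Keep (taxon, bit_score) hits that pass min-score and lie within
    top_percent of the best remaining bit score."""
    kept = [(taxon, score) for taxon, score in hits if score >= min_score]
    if not kept:
        return []
    best = max(score for _, score in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [(taxon, score) for taxon, score in kept if score >= cutoff]


def apply_min_support(assignments, min_support=2):
    """Drop reads whose assigned taxon received fewer than min_support reads in total."""
    counts = Counter(assignments.values())
    return {read: taxon for read, taxon in assignments.items()
            if counts[taxon] >= min_support}


# Hypothetical hits for one read: only A and B survive both per-read filters.
hits = [("A", 200.0), ("B", 195.0), ("C", 150.0), ("D", 90.0)]
surviving = filter_hits(hits)

# Hypothetical read -> taxon assignments: taxon B is hit by a single read and is dropped.
supported = apply_min_support({"r1": "A", "r2": "A", "r3": "B"})
```

The per-read filters decide which hits feed the taxonomic assignment; min-support runs afterwards, across all reads.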

Analysis: Sargasso Sea data
- 1.66M reads, avg. 818 bp, by Sanger sequencing
- Species profile of 16 taxonomical groups
- Environmental assemblies
- By analyzing six specific phylogenetic markers
  - rRNA, RecA/RadA, HSP70, RpoB, EF-Tu, and EF-G

Result
- Sample 1
  - ~83% of reads were assigned to taxa more specific than the kingdom level
  - Majority (8298) were assigned to bacterial groups
- Samples 2-4
  - ~59% of reads were assigned to taxa more specific than the kingdom level
  - Majority (5709) were assigned to bacterial groups
- Alphaproteobacteria and Gammaproteobacteria predominate by a factor of 2-4 over the remaining 14 taxonomic groups
- Eukaryotes & viruses: size filtering
- Archaea: possibly because there is 10 times as much bacterial sequence information in the public databases
- MEGAN vs. previous analysis (Venter et al. 2004)
  - Specific assignment information: LCA

Result-cont.
- Averaged weighted percentage of the six phylogenetic markers for each of the 16 taxonomic groups
- Easily detect sampling bias between Sample 1 and pooled Samples 2-4

Experiment 2
- Mammoth bone
- Data set
  - Roche GS20 sequencing (sequencing-by-synthesis)
  - Sample from 1 g of mammoth bone, years
  - ~300,000 reads, 95 bp
- BLASTZ against genome sequences (elephant, human, dog)
  - 45.4% of the reads are mammoth DNA; the others are from environmental organisms (bacteria, fungi, amoeba, nematodes)
- BLASTX -> NCBI-NR for environmental sequences
- Filters: bit-score threshold 30, discard isolated assignments (filtered 2086 reads)

Result
- reads to Eukaryota, of which 7969 to Gnathostomata
- : Bacteria, 761: Archaea, 152: Viruses

Experiment 3
- Identifying species from various read lengths
- Short E. coli K12 & B. bacteriovorus HD100 simulation
- 5000 random shotgun reads
- BLASTX -> NCBI-NR
- Filters
  - Bit-score threshold 35
  - Within 20% of the best hit
  - Discarded isolated assignments
- Result: no false-positive assignments; short reads can be used for metagenomic analysis, albeit at the cost of a high rate of under-prediction
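
Drawing random shotgun reads of a chosen length from a known genome, as in this simulation, can be sketched as below. The helper name `sample_reads`, the fixed seed, and the random toy sequence standing in for a real genome are all assumptions for illustration; the slides do not say how the reads were actually generated.

```python
import random


def sample_reads(genome, n_reads, read_length, seed=0):
    """Return n_reads substrings of read_length taken from uniformly random start positions."""
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_length] for s in starts]


# Hypothetical stand-in for a real genome sequence (random bases, fixed seed).
toy_rng = random.Random(42)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(10_000))

# Same genome, two read lengths, to compare assignments across read lengths.
short_reads = sample_reads(toy_genome, n_reads=2000, read_length=100)
long_reads = sample_reads(toy_genome, n_reads=2000, read_length=800)
```

Because the source organism of every simulated read is known, any assignment outside its lineage counts as a false positive, and reads left unassigned or placed at a high-level taxon count toward under-prediction.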

Experiment 3-cont.
- Roche GS20 sequencing
- Data set
  - 2000 reads from random positions in E. coli K12
  - ~100 bp
- BLASTX -> NCBI-NR
- Filters
  - Bit-score threshold 35
  - Within 20% of the best hit
  - Discarded isolated assignments
- Result

Experiment 3-cont.
- Roche GS20 sequencing
- Data set
  - 2000 reads from random positions in B. bacteriovorus HD100
  - ~100 bp
- BLASTX -> NCBI-NR: A in figure
- BLASTX -> NCBI-NR without B. bacteriovorus HD100: B in figure
- Filters
  - Bit-score threshold 35
  - Within 20% of the best hit
  - Discarded isolated assignments
- Result

MEGAN 3 (June 2009)
- Suitable for very large datasets
- Advances in the throughput and cost-efficiency of sequencing technology
- Interests changed
  - From 'Which species are present?' to 'What's different?'
- Features
  - Visualization technique for multiple databases
  - New statistical method for highlighting the differences in a pairwise comparison

MEGAN 3-cont.
- Comparing 6 mouse gut samples with human gut
- Clickable, collapsible