Metagenomic Analysis Using MEGAN4

Peter R. Hoyt, Director, OSU Bioinformatics Graduate Certificate Program
Matthew Vaughn, iPlant, University of Texas Super Computing Center

Introduction
In metagenomics, the aim is to understand the composition and operation of complex microbial consortia in environmental samples through sequencing and analysis of their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples. Technological advances in next-generation sequencing methods are fueling a rapid increase in the number and scope of environmental sequencing projects. Consequently, the volume of sequence data to be analyzed is increasing dramatically.
Source: http://ab.inf.uni-tuebingen.de/software/megan/welcome.html

The Importance of Metagenomics Is Driven by Sequencing Costs
The $100 Human Genome

Basic Computational Metagenomics
The first three basic computational tasks for such data are:
1. taxonomic analysis (“who is out there?”)
2. functional analysis (“what are they doing?”)
3. comparative analysis (“how do different samples compare?”)
This is an immense conceptual and computational challenge that MEGAN is designed to address.

MEGAN4 Uses
[Diagram: uses of MEGAN4: taxonomic analysis; metagenomic, metatranscriptomic and metaproteomic data; 16S rRNA sequences; function/gene ontology (SEED); metabolomics/pathway analyses (KEGG); comparative genomics]

Getting started
To prepare a dataset for use with MEGAN:
1. First, compare the reads against a database of reference sequences, e.g. with a BLASTX search against the NCBI-NR database.
2. Import the reads file and the resulting BLAST file directly into MEGAN, which then performs automatic taxonomic or functional classification using the SEED or KEGG classification, or both.
3. Open multiple datasets simultaneously for comparative views.
[Diagram: metagenomic sample → raw digital data (DNA, RNA or protein reads, e.g. aatacgaacatt, tgccatggacgc, tggccattgac) → BLAST against reference databases (nr, nt, RefSeq, pdb, rdb) → MEGAN4 → comparative data]
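Step 1 produces a list of BLAST hits for each read. As a rough illustration of what that output contains, the Python sketch below groups tabular BLAST hits (the 12-column `-outfmt 6` layout of BLAST+) by read and ranks them by bit score. It is only an illustration: MEGAN reads BLAST files itself, the tabular format is an assumption (the slides do not say which BLAST output format was used), and the read and reference IDs are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str     # reference sequence ID (e.g. an NR accession)
    evalue: float
    bitscore: float


def parse_blast_tab(lines):
    """Group 12-column tabular BLAST hits by read ID.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    Returns {read_id: [Hit, ...]} with hits sorted by descending bit score.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits[fields[0]].append(Hit(fields[1], float(fields[10]), float(fields[11])))
    for read_hits in hits.values():
        read_hits.sort(key=lambda h: h.bitscore, reverse=True)
    return dict(hits)


# Toy input with made-up read and reference IDs.
example = [
    "read1\tref_A\t98.0\t100\t2\t0\t1\t300\t1\t100\t1e-50\t200.0",
    "read1\tref_B\t99.0\t100\t1\t0\t1\t300\t1\t100\t1e-60\t250.0",
    "read2\tref_C\t85.0\t80\t12\t0\t1\t240\t1\t80\t1e-20\t90.5",
]
parsed = parse_blast_tab(example)
print({read: [h.subject for h in hs] for read, hs in parsed.items()})
# → {'read1': ['ref_B', 'ref_A'], 'read2': ['ref_C']}
```

Sorting each read's hits by bit score up front makes "best hit" lookups, which both the SEED and KEGG assignments rely on, a matter of taking the first element.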

Taxonomic analysis
MEGAN can be used to explore the dataset interactively. The figure shows the assignment of reads to the NCBI taxonomy. Each node is labeled by a taxon and the number of reads assigned to that taxon, and the size of a node is scaled logarithmically to represent the number of assigned reads. Tree display options allow you to drill down interactively to the individual BLAST hits and to export all reads. One can also select a set of taxa and then use MEGAN to generate different types of charts.
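The slide does not say how a read ends up at a particular node. MEGAN's published approach is a lowest common ancestor (LCA) rule: a read is placed on the deepest taxon that contains all of its significant hits, so reads whose hits span several genera land higher in the tree. The Python below is a toy sketch of that idea, not MEGAN's code. The taxonomy is a hypothetical fragment, and `min_score` and `top_percent` are illustrative filters; MEGAN exposes similar settings, but its defaults and extra filters (such as minimum support) are not reproduced here.

```python
# Hypothetical taxonomy fragment: child -> parent.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Vibrio": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon):
    """Path from taxon up to the root, inclusive."""
    path = [taxon]
    while taxon != "root":
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(taxa):
    """Lowest common ancestor of a non-empty list of taxa."""
    common = set(lineage(taxa[0]))
    for t in taxa[1:]:
        common &= set(lineage(t))
    # Walking up from any taxon, the first shared node is the deepest one.
    for node in lineage(taxa[0]):
        if node in common:
            return node
    return "root"


def assign_read(hits, min_score=35.0, top_percent=10.0):
    """Assign a read from its (taxon, bitscore) hits, or return None.

    Illustrative filters: drop hits scoring below min_score, keep only hits
    within top_percent of the best score, then take the LCA of what remains.
    """
    hits = [(t, s) for t, s in hits if s >= min_score]
    if not hits:
        return None  # read stays unassigned
    best = max(s for _, s in hits)
    kept = [t for t, s in hits if s >= best * (1 - top_percent / 100)]
    return lca(kept)


def count_assignments(reads):
    """Number of reads assigned to each taxon (unassigned reads are skipped)."""
    counts = {}
    for hits in reads.values():
        taxon = assign_read(hits)
        if taxon is not None:
            counts[taxon] = counts.get(taxon, 0) + 1
    return counts


reads = {
    "r1": [("Escherichia", 200.0), ("Vibrio", 195.0)],   # two genera -> shared class
    "r2": [("Escherichia", 200.0), ("Bacillus", 100.0)],  # weak hit filtered out
    "r3": [("Escherichia", 200.0), ("Bacillus", 190.0)],  # spans phyla -> Bacteria
    "r4": [("Vibrio", 20.0)],                             # below min_score
}
print(count_assignments(reads))
# → {'Gammaproteobacteria': 1, 'Escherichia': 1, 'Bacteria': 1}
```

The design trade-off is specificity versus safety: reads with ambiguous hits are deliberately assigned to a less specific taxon rather than guessed down to a single species.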

Multiple Chart Options are Available

Functional analysis using the SEED classification
SEED [1] is a comparative genomics environment of curated genomic data. The figure shows part of the SEED analysis of a marine metagenome sample. MEGAN attempts to map each read to a SEED functional role using the highest-scoring BLAST protein match that has a known functional role. The rooted SEED tree is “multi-labeled”, because different leaves may represent the same functional role (if that role occurs in different types of subsystems). The current complete SEED tree has about 13,000 nodes.
[1] http://www.theseed.org/wiki/Main_Page
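The best-hit rule described above can be sketched in a few lines of Python. The reference IDs and the ID-to-role table are hypothetical stand-ins for SEED annotations; the point is only that hits without a known role are skipped, and the highest-scoring annotated hit decides the read's role.

```python
# Hypothetical lookup: reference protein ID -> SEED functional role.
ROLE_OF = {
    "ref_A": "Citrate synthase",
    "ref_C": "Isocitrate dehydrogenase",
}


def assign_role(hits, role_of=ROLE_OF):
    """Return the functional role of the highest-scoring annotated hit.

    hits: iterable of (reference_id, bitscore) pairs for one read.
    Returns None when no hit has a known role (the read stays unassigned).
    """
    for ref_id, _score in sorted(hits, key=lambda h: h[1], reverse=True):
        role = role_of.get(ref_id)
        if role is not None:
            return role
    return None


# ref_B scores best but has no known role, so ref_A decides the assignment.
print(assign_role([("ref_B", 250.0), ("ref_A", 200.0)]))  # → Citrate synthase
print(assign_role([("ref_B", 250.0)]))                     # → None
```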

Functional analysis using the KEGG classification
To perform a KEGG analysis, MEGAN attempts to match each read to a KEGG Orthology (KO) accession number using the best hit to a reference sequence. Reads are then assigned to enzymes and pathways. The KEGG classification is represented by a rooted tree whose leaves represent pathways (see http://www.kegg.jp/kegg/pathway.html). Each pathway can also be inspected visually, for example the citric acid cycle (shown); these views support inferences about the cellular activities of a sample. KEGG displays the participating enzymes as numbered rectangles, and MEGAN shades each rectangle to indicate the number of reads assigned to the corresponding enzyme.
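Once each read has a KO accession, the counts can be rolled up to pathways and turned into a shading level for each enzyme rectangle. The Python sketch below assumes a tiny, truncated KO-to-pathway table (real KOs often belong to several pathways) and uses a logarithmic 0-to-1 shading scale as an illustrative choice; MEGAN's actual color scale may differ.

```python
import math
from collections import Counter

# Illustrative, truncated KO -> pathway table (real KOs map to more pathways).
PATHWAYS_OF_KO = {
    "K01647": ["Citrate cycle (TCA cycle)"],     # citrate synthase
    "K01681": ["Citrate cycle (TCA cycle)"],     # aconitate hydratase
    "K00844": ["Glycolysis / Gluconeogenesis"],  # hexokinase
}


def ko_counts(read_to_ko):
    """Count reads per KO; reads mapped to None (no KO) are ignored."""
    return Counter(ko for ko in read_to_ko.values() if ko is not None)


def pathway_counts(kos):
    """Add each KO's read count to every pathway that KO belongs to."""
    totals = Counter()
    for ko, n in kos.items():
        for pathway in PATHWAYS_OF_KO.get(ko, []):
            totals[pathway] += n
    return totals


def shade(count, max_count):
    """Map a read count to a 0..1 shading level on a log scale."""
    if count <= 0:
        return 0.0
    return math.log1p(count) / math.log1p(max_count)


read_to_ko = {"r1": "K01647", "r2": "K01647", "r3": "K01681", "r4": None, "r5": "K00844"}
kos = ko_counts(read_to_ko)
print(dict(pathway_counts(kos)))
# → {'Citrate cycle (TCA cycle)': 3, 'Glycolysis / Gluconeogenesis': 1}
top = max(kos.values())
print({ko: round(shade(n, top), 2) for ko, n in kos.items()})
# → {'K01647': 1.0, 'K01681': 0.63, 'K00844': 0.63}
```

A log scale keeps a handful of very abundant enzymes from washing out the shading of everything else in the pathway map.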

KEGG Pathways and Examples
KEGG (Kyoto Encyclopedia of Genes and Genomes) “is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.” KEGG is used to observe patterns in metabolic pathways, functional hierarchies, diseases, ortholog groups, and genes and genomes. It is heavily used by the metabolism community and for comparative transcriptomics. Here are some examples of KEGG results from our metabolic samples. Do they suggest anything to you?

Comparative analysis using the SEED classification
MEGAN also supports the simultaneous analysis and comparison of the SEED functional content of multiple metagenomes, or of multiple time points/samples (shown). A comparative view of assignments to a KEGG pathway is also possible.

Computational comparison of metagenomes
MEGAN's analysis window compares multiple datasets, which makes it possible to compute distance matrices for a collection of datasets using different ecological indices. MEGAN supports a number of different methods for calculating a distance matrix, and the result can be visualized either as a split network calculated with the neighbor-net algorithm or as a multi-dimensional scaling (MDS) plot. Neighbor-net [1] is an algorithm that computes unrooted phylogenetic networks from molecular sequence data. The figure shows a comparison of eight marine datasets based on their taxonomic content, computed using Goodall's index.
[1] Bryant D, Moulton V. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution 21 (2004).
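To make the idea of a distance matrix concrete, the Python sketch below computes pairwise distances between samples from their per-taxon read counts. It deliberately swaps in the Bray-Curtis dissimilarity, a simpler ecological index, in place of the Goodall's index used for the figure; the sample names and counts are made up. A matrix like this is the kind of input the neighbor-net and MDS visualizations work from.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: read_count} profiles.

    0.0 means identical composition; 1.0 means no taxa in common.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 0.0 if total == 0 else 1.0 - 2.0 * shared / total


def distance_matrix(samples):
    """Return (sorted sample names, square matrix of pairwise distances)."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[x], samples[y]) for y in names] for x in names]
    return names, matrix


# Made-up taxon counts for three samples.
samples = {
    "site1": {"Proteobacteria": 10, "Firmicutes": 10},
    "site2": {"Proteobacteria": 10},
    "site3": {"Cyanobacteria": 5},
}
names, matrix = distance_matrix(samples)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

site1 and site2 share half of site1's reads and come out at about 0.333 apart, while site3 shares no taxa with either and sits at the maximum distance of 1.0.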

Comparative Taxonomic Visualization
MEGAN provides a comparison view based on a tree in which each node shows the number of reads assigned to it for each of the datasets, displayed as a pie chart, a bar chart or a heat map. Once the datasets have been opened individually, MEGAN provides a “compare” dialog. The figure shows the taxonomic comparison of all eight marine datasets. Here, each node in the NCBI taxonomy is shown as a bar chart indicating the number of reads (normalized, if desired) from each dataset assigned to the node.
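The per-dataset counts shown at each node can optionally be normalized. One simple scheme, which we take to be the idea behind MEGAN's normalized comparison (treat this as an assumption), rescales every dataset so that its total equals that of the smallest dataset. This makes bar heights comparable across samples sequenced to different depths. A Python sketch with made-up counts:

```python
def normalize_to_smallest(samples):
    """Rescale {sample: {taxon: count}} so every sample's total equals the smallest total.

    Counts become floats; proportions within each sample are unchanged.
    """
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    target = min(totals.values())
    return {
        name: {taxon: n * target / totals[name] for taxon, n in counts.items()}
        for name, counts in samples.items()
    }


# Made-up read counts for two datasets of different sizes.
samples = {
    "sample1": {"Proteobacteria": 50, "Firmicutes": 50},    # 100 reads
    "sample2": {"Proteobacteria": 300, "Firmicutes": 100},  # 400 reads
}
print(normalize_to_smallest(samples))
# → {'sample1': {'Proteobacteria': 50.0, 'Firmicutes': 50.0}, 'sample2': {'Proteobacteria': 75.0, 'Firmicutes': 25.0}}
```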