Computational Characterization of Short Environmental DNA Fragments Jens Stoye 1, Lutz Krause 1, Robert A. Edwards 2, Forest Rohwer 2, Naryttza N. Diaz.

Computational Characterization of Short Environmental DNA Fragments Jens Stoye 1, Lutz Krause 1, Robert A. Edwards 2, Forest Rohwer 2, Naryttza N. Diaz 1, Alexander Goesmann 1, Scott Kelley 2, Alfred Pühler 1 1 Bielefeld University 2 San Diego State University

CeBiTec, Bielefeld University, Jens Stoye
Metagenomics: 454 Pyrosequencing

CARMA: A Pipeline for Characterizing Short-Read Metagenomes
Quantitative analysis of metagenomes: Which microbes live in an environment? What are they doing? What are the differences between communities from different environments?

(A) Functional Analysis
Reads are analyzed directly, without prior assembly. Protein family fragments found in the reads are used as Environmental Gene Tags (EGTs) for a quantitative analysis of gene content. GO-term profiles characterize the genetic diversity and potential metabolism of the underlying communities.
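Here is a minimal sketch of turning EGT annotations into a GO-term profile. It assumes each EGT has already been mapped to a set of GO terms, for example through its matching Pfam family. The input format and the function name `go_term_profile` are hypothetical. Normalizing by the total number of term assignments is one choice among several; normalizing by the number of EGTs is another.

```python
from collections import Counter

def go_term_profile(egt_go_terms):
    """Relative frequency of each GO term across all EGT annotations.

    egt_go_terms: dict mapping a (hypothetical) EGT id to an iterable of
    GO term ids. Each term is counted at most once per EGT.
    """
    counts = Counter()
    for terms in egt_go_terms.values():
        counts.update(set(terms))
    total = sum(counts.values())
    # Frequencies sum to 1; an empty input yields an empty profile
    return {term: n / total for term, n in counts.items()} if total else {}

# Example with made-up annotations (GO:0006412 translation, GO:0005840 ribosome)
profile = go_term_profile({
    "egt1": ["GO:0006412", "GO:0005840"],
    "egt2": ["GO:0006412"],
    "egt3": [],
})
```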

Heat Map Comparing GO-Term Frequencies
Comparative analysis reveals genetic and metabolic trends. Significantly overrepresented GO terms are identified with a G-test.
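The G-test can be illustrated for a single GO term compared between two samples. The comparison is set up as a 2x2 contingency table: EGTs with and without the term, in each sample. This is a minimal sketch with a hypothetical function name. It takes the p-value from the chi-square distribution with one degree of freedom. It applies no multiple-testing correction, which a comparison of many GO terms at once would need.

```python
import math

def g_test_2x2(term_a, total_a, term_b, total_b):
    """G-test of independence for one GO term in two samples.

    term_x: number of EGTs annotated with the term in sample x
    total_x: number of all EGTs in sample x
    Returns (G statistic, p-value with 1 degree of freedom).
    """
    observed = [[term_a, total_a - term_a],
                [term_b, total_b - term_b]]
    n = total_a + total_b
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # the term 0 * ln(0) is taken as 0
                expected = row[i] * col[j] / n
                g += o * math.log(o / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding errors
    # Survival function of chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2))
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p
```

For example, `g_test_2x2(50, 100, 10, 100)` compares a term found in 50 of 100 EGTs in one sample with 10 of 100 in the other. The test flags the difference as highly significant.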

(B) Analyzing the Community Structure
EGTs are assigned to taxonomic groups based on a phylogenetic analysis. Taxonomic profiles characterize the composition of the underlying communities.
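A taxonomic profile at one rank can be sketched as the fraction of classified EGTs falling into each taxon. The lineage format (one dict per EGT, mapping rank names to taxon names) and the function name are hypothetical. EGTs without an assignment at the requested rank are treated as unclassified and left out of the profile.

```python
from collections import Counter

def taxonomic_profile(egt_lineages, rank):
    """Fraction of classified EGTs per taxon at the given rank.

    egt_lineages: list of dicts, e.g. {"phylum": ..., "genus": ...}
    (hypothetical format). EGTs lacking `rank` are skipped.
    """
    counts = Counter(lin[rank] for lin in egt_lineages if lin.get(rank))
    classified = sum(counts.values())
    return {taxon: n / classified for taxon, n in counts.items()} if classified else {}

lineages = [
    {"phylum": "Cyanobacteria", "genus": "Prochlorococcus"},
    {"phylum": "Cyanobacteria", "genus": "Synechococcus"},
    {"phylum": "Proteobacteria"},  # classified only down to phylum
]
by_phylum = taxonomic_profile(lineages, "phylum")
by_genus = taxonomic_profile(lineages, "genus")
```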

Taxonomic Classification of Short Environmental Gene Tags (Krause et al., submitted)
A phylogenetic tree is reconstructed for each matching Pfam family, starting from the multiple alignment of the known family members (downloaded from the Pfam web site).

Taxonomic Classification of Short Environmental Gene Tags (Krause et al., submitted)
A phylogenetic tree is reconstructed for each matching Pfam family: the identified EGTs matching the family are added to the full multiple alignment.
Figure legend: PF1-PF7, known family members; EGT1-EGT3, Environmental Gene Tags matching the family.

Taxonomic Classification of Short Environmental Gene Tags
A phylogenetic tree is reconstructed for each matching Pfam family. The multiple alignment is used to calculate a distance matrix, in which the pairwise distance is based on sequence identity in the aligned region. Missing values, such as those for short fragments that do not overlap in the alignment, are determined with additive estimation (Landry et al., 1996).
Example distance matrix:
        PF1   PF2   …   EGT3
PF1     0     0.3   …   0.6
PF2     0.3   0     …   0.2
…       …     …     …   …
EGT3    …     …     …   0
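A sketch of the distance computation follows. It assumes the distance is 1 minus the fraction of identical residues, counted over the columns where both sequences have a residue. The exact conversion from identity to distance used by the authors is not stated here. Pairs with no shared column get `None`. These are the missing values that additive estimation (Landry et al., 1996) would fill in; that step is not implemented in this sketch. Function names are hypothetical.

```python
def pairwise_distance(seq_a, seq_b, gap="-"):
    """1 - fraction of identical residues over the columns where both
    aligned sequences have a residue. Returns None if they do not overlap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same multiple alignment")
    shared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not shared:
        return None  # missing value: no aligned region in common
    identical = sum(1 for a, b in shared if a.upper() == b.upper())
    return 1.0 - identical / len(shared)

def distance_matrix(alignment):
    """alignment: dict name -> aligned sequence. Returns (names, matrix)."""
    names = list(alignment)
    matrix = [[0.0 if i == j else pairwise_distance(alignment[a], alignment[b])
               for j, b in enumerate(names)]
              for i, a in enumerate(names)]
    return names, matrix

# Toy alignment: two full-length family members and two short EGTs
aln = {"PF1": "MKVLAT", "PF2": "MKILAT", "EGT1": "---LAS", "EGT2": "MK----"}
names, m = distance_matrix(aln)
```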

Taxonomic Classification of Short Environmental Gene Tags (Krause et al., submitted)
The distance matrix is used to reconstruct a phylogenetic tree with Neighbor Joining. EGTs are then classified based on their location in the tree.
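Neighbor Joining is a standard algorithm (Saitou and Nei, 1987). The sketch below is a plain textbook implementation. It runs on a complete matrix (missing values already estimated) and returns an unrooted tree as a Newick string. It does not show how EGTs are then classified from their position in the tree. On non-additive data it can produce negative branch lengths, which real tools usually handle explicitly.

```python
def neighbor_joining(names, matrix):
    """Neighbor Joining on a complete, symmetric distance matrix.

    Returns the unrooted tree as a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    labels = list(names)
    d = [list(row) for row in matrix]
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair (i, j) minimising Q(i, j) = (n - 2) d_ij - r_i - r_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({labels[i]}:{li:.3f},{labels[j]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining node k
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[x][y] for y in keep] + [du[pos]] for pos, x in enumerate(keep)]
        d.append(du + [0.0])
        labels = [labels[k] for k in keep] + [joined]
    # Connect the last three nodes at a single internal node
    a, b, c = labels
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"

# Additive toy matrix from the tree A-u:1, B-u:2, u-v:1, C-v:1, D-v:3
tree = neighbor_joining(
    ["A", "B", "C", "D"],
    [[0, 3, 3, 5],
     [3, 0, 4, 6],
     [3, 4, 0, 4],
     [5, 6, 4, 0]],
)
```

On this additive matrix the sketch recovers the generating tree, including every branch length.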

Performance Evaluation: Creating a Standard of Truth
Test set: 77 complete genomes spanning 2 superkingdoms (Archaea and Bacteria), 10 phyla, 29 classes, 62 genera and 77 species. The test set was excluded from the reference set: Pfam members from any of the 77 species were omitted from the full multiple alignments.

Performance Evaluation: Creating a Standard of Truth
The 77 genomes were fragmented with ReadSim (Schmid et al., submitted), which simulates 454 pyrosequencing. Fragments were sampled randomly at 2x coverage with a mean fragment length of 100 bp. Sequencing errors were simulated at homopolymers.
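The sampling scheme can be imitated in a few lines. This is not ReadSim and does not reproduce its error model. The following are all illustrative assumptions: the normal length distribution, its standard deviation `sd_len`, the error `rate`, and the crude one-base homopolymer insertion/deletion. The function names are hypothetical.

```python
import random

def sample_fragments(genome, coverage=2.0, mean_len=100, sd_len=10, seed=0):
    """Sample random fragments until their total length reaches
    coverage * len(genome). Lengths are drawn from a normal distribution
    (an assumption) and clipped to [1, len(genome)]."""
    rng = random.Random(seed)
    target = coverage * len(genome)
    fragments, total = [], 0
    while total < target:
        length = max(1, min(len(genome), round(rng.gauss(mean_len, sd_len))))
        start = rng.randrange(len(genome) - length + 1)
        fragments.append(genome[start:start + length])
        total += length
    return fragments

def add_homopolymer_errors(read, rate=0.05, rng=None):
    """With probability `rate`, lengthen or shorten each homopolymer run
    (two or more identical bases) by one base."""
    rng = rng or random.Random(0)
    out, i = [], 0
    while i < len(read):
        j = i
        while j < len(read) and read[j] == read[i]:
            j += 1
        run = read[i:j]
        if len(run) >= 2 and rng.random() < rate:
            run = run + run[0] if rng.random() < 0.5 else run[:-1]
        out.append(run)
        i = j
    return "".join(out)
```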

Classification Accuracy for Short Environmental Gene Tags
Sens: sensitivity, the fraction of EGTs that are correctly classified.
Spec: specificity, the reliability of the predictions (the fraction of assigned EGTs whose assignment is correct).
FNrate: false negative rate, the proportion of EGTs that are wrongly classified.
Urate: unknown rate, the proportion of EGTs not assigned to any taxonomic group.
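The four measures can be computed from predicted and true assignments at one taxonomic rank. This is a minimal sketch with hypothetical names. It assumes that Sens, FNrate and Urate are taken over all EGTs, so that the three sum to 1. Spec is taken over the assigned EGTs only.

```python
def classification_metrics(predictions, truth):
    """predictions: dict EGT id -> predicted taxon, or None if unassigned.
    truth: dict EGT id -> true taxon at the same rank.
    EGTs missing from `predictions` count as unassigned."""
    total = len(truth)
    assigned = {e: predictions.get(e) for e in truth if predictions.get(e) is not None}
    correct = sum(1 for e, taxon in assigned.items() if truth[e] == taxon)
    wrong = len(assigned) - correct
    unknown = total - len(assigned)
    return {
        "sens": correct / total,
        "spec": correct / len(assigned) if assigned else 0.0,
        "fn_rate": wrong / total,
        "u_rate": unknown / total,
    }

truth = {"e1": "A", "e2": "A", "e3": "B", "e4": "B"}
pred = {"e1": "A", "e2": "B", "e3": "B"}  # e4 left unassigned
metrics = classification_metrics(pred, truth)
```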

Application Example: Comparative Analysis of Four Microbial Coral Reef Communities
In cooperation with Rob Edwards and Forest Rohwer (San Diego State University, California); Dinsdale et al., submitted.

Influence of Human Activities on Coral Reef Microbial Communities
Map of four sites in the Northern Line Islands (Kiritimati, Kingman, Palmyra and Tabuaeran), spanning a gradient of human disturbance from little through intermediate to high.

GO-Term Profiles Indicate a Transition in Metabolic Activities
Heat map: color indicates the abundance of GO terms in each sample. Legend: significantly different (p < 0.01).

Community Structure

Taxonomic Profiles Indicate a Transition from Prochlorococcus to Synechococcus (the most abundant marine Cyanobacteria)

Application Example: Comparative Analysis of Three Aquatic Microbial Communities
L. Krause, N. N. Diaz, A. Goesmann, F. Rohwer, S. Kelley, R. A. Edwards and J. Stoye. Taxonomic classification of short environmental DNA fragments. Submitted.

Sampling Locations
Rios Mesquites stromatolites, Mexico; San Diego solar salterns, USA; Kingman coral reef, Northern Line Islands. Sample data provided by Forest Rohwer and Robert Edwards.

Community Structure
pEGTs: prokaryotic fraction of EGTs

Community Structure (Genus Level)
pEGTs: prokaryotic fraction of EGTs

Taxonomic Diversity
H': diversity, accounting for both richness and evenness (Shannon index).
J: evenness, the relative commonness and rarity of organisms.
Table: H' and J at the phylum, class, order and genus levels for the coral reef, stromatolite and solar saltern samples.
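The Shannon index and evenness can be computed from the number of EGTs per taxon at a given rank. This sketch assumes the natural logarithm and Pielou's evenness J = H' / ln S, where S is the number of taxa observed. The log base cancels in J but not in H'. With a single taxon J is undefined; the sketch returns 0 by convention. The function name is hypothetical.

```python
import math

def shannon_diversity(taxon_counts):
    """Return (H', J) from a dict taxon -> number of EGTs.

    H' = -sum_i p_i ln p_i, J = H' / ln S with S = number of taxa observed.
    """
    total = sum(taxon_counts.values())
    props = [n / total for n in taxon_counts.values() if n > 0]
    h = -sum(p * math.log(p) for p in props)
    s = len(props)
    j = h / math.log(s) if s > 1 else 0.0
    return h, j

h_even, j_even = shannon_diversity({"A": 10, "B": 10})
h_skew, j_skew = shannon_diversity({"A": 30, "B": 10})
```

An even community reaches J = 1, and a skewed one with the same number of taxa scores lower.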

Further Applications of CARMA
Diversity of coral reef viruses (in cooperation with Stuart Sandin, Scripps Institution of Oceanography, San Diego, USA).
Wastewater treatment plant plasmid sample (in cooperation with Andreas Schlüter, Bielefeld University).

Conclusions
Gene fragments are identified using Pfam profile hidden Markov models.
Fragments can be assigned to a functional role and a taxonomic origin.
Profiling allows the detection of trends in species composition, metabolism and genetic potential.
Pyrosequencing combined with profiling techniques enables a rapid and cost-effective assay of microbial communities.

Acknowledgements
Co-authors: Lutz Krause 1, Robert A. Edwards 2, Forest Rohwer 2, Naryttza N. Diaz 1, Alexander Goesmann 1, Scott Kelley 2 and Alfred Pühler 1.
Also many thanks to: Andreas Schlüter 1, Elisabeth Dinsdale 2, Scott Kelley 2, Beltran Rodriguez-Brito 2 and Christelle Desnues 2.
1 Bielefeld University, 2 San Diego State University

Thank you for your attention!

Taxonomic Diversity
Diversity (Shannon index): $H' = -\sum_i p_i \log p_i$
Evenness: $J = H' / H_{\max}$, with $H_{\max} = \log S$
Here $p_i$ is the proportion of EGTs classified into the i-th taxonomic group, and S is the total number of taxa found.
Table: H' and J at the phylum, class, order and genus levels for the coral reef, stromatolite and solar saltern samples.