Phylogenetic Tree Based Taxonomic Classification
Dongying Wu*1,2, Jonathan A. Eisen2
1. DOE Joint Genome Institute, Walnut Creek, California 94598, USA
2. University of California, Davis, Davis, California 95616, USA

Abstract
Our current understanding of the taxonomic and phylogenetic diversity of cellular organisms, especially bacteria and archaea, is mostly based on studies of ribosomal RNA gene sequences, especially those of the small-subunit rRNA (ssu-rRNA). The current taxonomic classification of bacteria and archaea is also heavily based on ssu-rRNA. Despite the historical and current power of ssu-rRNA analysis, it has drawbacks, including copy-number variation among organisms and complications introduced by horizontal gene transfer, convergent evolution and variation in evolutionary rates. Fortunately, genome and metagenome sequencing are providing a wealth of information about other genes in the genomes of bacteria and archaea. By analyzing complete genome sequences in the IMG database, we have identified 40 protein-coding genes with strong potential as broad phylogenetic markers across bacteria and archaea (e.g., they are highly universal, vary little in copy number and have relatively congruent phylogenetic trees). We report here the development of methods that use these 40 marker genes for operational taxonomic unit (OTU) assignment and taxonomic classification of bacteria and archaea. Our method places an organism into a specific taxonomic group at various taxonomic levels while accounting for differences in evolutionary rates between taxa and between genes. We compare the OTUs and taxonomic classifications from these protein-coding marker genes with OTUs and classifications based on ssu-rRNA phylogenetic trees and on sequence clustering (non-phylogenetic) methods. Our analysis demonstrates that, at the species level, phylogenetic tree-based methods applied to these 40 protein-coding genes identify OTUs comparable to ssu-rRNA sequence-similarity-based OTUs. Our phylogenetic tree-based taxonomic classifications of IMG genomes at the genus, family, order, class and phylum levels will be discussed.

Methods

1. Measurement of the position of a node in a rooted phylogenetic tree
The normalized distance of a node to the leaves of its sub-tree is used to measure the position of a node in a rooted phylogenetic tree (PN, position of a node). PN is defined by equation (1), where $R_n$ is the distance of the node to the tree root and $D_n$ is the distance of the node to the leaves of its sub-tree, so PN ranges from 0 at a leaf to 1 at the root:

$$\mathrm{PN}_n = \frac{D_n}{R_n + D_n} \qquad (1)$$

$D_n$ is defined by equation (2), where $D_i$ is the distance between leaf $i$ and the node, and $P_i$ is the phylogenetic contribution of leaf $i$ to the sub-tree defined by the node:

$$D_n = \frac{\sum_i P_i D_i}{\sum_i P_i} \qquad (2)$$

$D_i$ and $P_i$ are defined by equations (3) and (4), where $L_i$ is the length of the edge connecting leaf $i$ to its parent node, $m$ is a node between leaf $i$ and the node measured by equation (1), $V_m$ is the length of the edge connecting node $m$ to its parent, and $C_m$ is the number of leaves in the sub-tree defined by node $m$:

$$D_i = L_i + \sum_m V_m \qquad (3)$$

$$P_i = L_i + \sum_m \frac{V_m}{C_m} \qquad (4)$$

Figure 1. An example of PN (position of a node) calculation. The tree has leaves A, B and C, internal nodes N1 and N2, and a root; PN is calculated for node N2. Leaves A and B share one internal edge, whose length is divided between its 2 leaves in $P_A$ and $P_B$: $D_A = 2.2$, $P_A = 1.6$; $D_B = 3.2$, $P_B = 2.6$; $D_C = P_C = 3.0$. Then

$$D_{N2} = \frac{P_A D_A + P_B D_B + P_C D_C}{P_A + P_B + P_C} = 2.9, \qquad \mathrm{PN}_{N2} = \frac{2.9}{R_{N2} + D_{N2}} = 0.62$$

2. Identify OTUs based on a phylogenetic tree
The PN values of the nodes in a tree are calculated, and an edge is cut if the PN value of the node closer to the root is larger than a cutoff (a value from 0 to 1). The leaves under such an edge define one OTU. The order in which nodes and edges are evaluated is illustrated in Figure 2:
- Mark all leaves "CURRENT".
- Calculate the PN values of the parent nodes of the nodes/leaves marked "CURRENT".
- If the PN value of a parent node is larger than the input cutoff, cut the edge below the parent node; the leaves under that edge form one OTU.
- Remove all nodes that define no sub-trees, and change the label of the nodes/leaves from "CURRENT" to "PROCESSED".
- Identify nodes whose sub-tree nodes and leaves are all marked "PROCESSED", mark them "CURRENT", and repeat from the PN calculation step.

Figure 2. OTU (operational taxonomic unit) identification based on the PN of the nodes in a phylogenetic tree.

Figure 4. Comparison of IMG taxonomic annotation with OTUs generated from the IMG genome tree at different PN (position of the node) cutoffs. The IMG genome tree was built with FastTree from the concatenated alignments of 38 phylogenetic markers. Different PN cutoffs for tree-based OTU generation correspond to different levels of IMG's current taxonomic classification.

Figure note: 99,000 random samplings of the 74,789,356 pairs.

3. Compare two sets of OTUs
Adjusted mutual information (AMI) is used to compare two sets of OTUs, X and Y. The AMI between X and Y is calculated by equation (1). H(X), H(Y) and H(X,Y) are the entropies of X, Y and their joint clustering, calculated by equation (2), where $p(x)$ is the fraction of items in OTU $x$ and $p(x,y)$ is the fraction of items in both $x$ and $y$:

$$H(X) = -\sum_x p(x)\log p(x), \qquad H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y) \qquad (2)$$

I(X;Y) is the mutual information between X and Y, defined by equation (3):

$$I(X;Y) = H(X) + H(Y) - H(X,Y) \qquad (3)$$

E is the average mutual information of 100 comparisons between randomized X and Y under the "permutation randomization model"; AMI corrects I(X;Y) for this chance level.

4. Phylogenetic tree building
Peptide sequences of the 40 phylogenetic marker genes were retrieved from the bacterial and archaeal genomes in the IMG database.
The 40 genes include: ribosomal proteins S2, S10, L1, L22, L4, L2, S9, L3, L14, S5, S19, S7, L16, S13, L15, L25/L23, L6, L11, L5, S12/S23, L29, S3, S11, L10, S8, L18, S15, S17, L13 and L24; translation elongation factor EF-2; translation initiation factor IF-2; metalloendopeptidase; Ffh signal recognition particle protein; phenylalanyl-tRNA synthetase beta and alpha subunits; tRNA pseudouridine synthase B; porphobilinogen deaminase; phosphoribosylformylglycinamidine cyclo-ligase; and ribonuclease HII. Alignments were built with MUSCLE, and phylogenetic trees were built with FastTree. The alignments of 38 markers (excluding porphobilinogen deaminase and phosphoribosylformylglycinamidine cyclo-ligase) were concatenated, and a tree was built from them with FastTree. Small-subunit rRNA sequences from the IMG database were aligned with the SINA server. Alignments and a RAxML tree of ssu-rRNA were retrieved from the "All-Species Living Tree Project" at the SILVA database.

Figure 3. Adjusted mutual information (AMI) between OTUs generated by mothur at a cutoff of 0.03 and OTUs generated from the RAxML 16S tree at different PN (position of the node) cutoffs. (Plot axes: AMI compared to the mothur OTUs at cutoff 0.03; PN cutoffs for OTU identification from the SILVA RAxML tree.) The distances for mothur OTU classification were based on the same alignments on which the phylogenetic tree was built; both were retrieved from the "All-Species Living Tree Project" at the SILVA database. The PN cutoff of 0.04 defines species in this tree.

Results and Discussion

(Figure 4 plot axes: AMI compared to IMG taxonomic grouping; PN cutoffs for OTU identification from the IMG concatenated 38-marker tree.)

Figure 5. Comparison of OTUs generated from the IMG genome tree, the IMG ssu-rRNA tree and sequence-similarity-based OTUs (mothur) at different cutoffs. (Plot series: concatenated 38 markers, ssu-rRNA tree, ssu-rRNA mothur; y-axis: AMI.) The IMG genome tree yields OTUs comparable to those built from the ssu-rRNA tree and by mothur.

Figure 6. Comparison of OTUs generated from the IMG genome tree and from the ribosomal protein S2, S10 and L1 trees. (Plot series: concatenated 38 markers, ribosomal proteins S2, S10 and L1.) Our results indicate that it is feasible to compare OTUs built from phylogenetic trees of different marker genes.

Figure 7. Comparison of OTUs generated from the IMG genome tree and from trees of the flagellar protein FliL and the vitamin B12 synthesis proteins CobS and CobW. (Plot series: concatenated 38 markers, FliL, CobS, CobW; y-axis: AMI.) Only single-copy FliL, CobS and CobW genes were included in the analysis. Our study demonstrates that FliL and CobS have co-evolved with phylogenetic marker genes such as the ribosomal protein genes and ssu-rRNA, while the evolutionary history of CobW is less clear.
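The concatenation step of Methods section 4 (joining the per-marker alignments into one aligned row per genome before a single genome tree is built) can be sketched in Python. This is a hypothetical illustration, not the pipeline used for the poster: the function names and the input format (one dictionary per marker, mapping genome ID to aligned sequence) are assumptions, and padding genomes that lack a marker with gap characters is an assumed policy that the poster does not describe.

```python
"""Sketch: concatenate per-marker protein alignments into one supermatrix
row per genome (hypothetical helper names; gap padding is an assumption)."""


def concatenate_alignments(alignments):
    """alignments: list of dicts, one per marker, mapping genome -> aligned
    sequence. Genomes missing a marker are padded with '-' (assumed policy)."""
    genomes = sorted({g for aln in alignments for g in aln})
    rows = {g: [] for g in genomes}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for g in genomes:
            # Keep the marker blocks in the same column order for every genome.
            rows[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in rows.items()}


def to_fasta(matrix):
    """Render the concatenated alignment as FASTA text."""
    return "".join(f">{g}\n{seq}\n" for g, seq in matrix.items())
```

The FASTA text could then be handed to a tree builder such as FastTree; no command-line options are shown because the poster does not give them.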
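The position-of-a-node measure and the OTU-cutting rule from Methods sections 1 and 2 can be sketched in Python on the Figure 1 tree. This is a minimal illustration, not the authors' implementation: the parent-pointer tree representation and all function names are hypothetical, equation (1) is taken as $\mathrm{PN}_n = D_n/(R_n + D_n)$, and the branch lengths (L_A = 1.0, L_B = 2.0, a shared internal edge of 1.2, L_C = 3.0 and a root edge of 1.8) are assumed values chosen to be consistent with the Figure 1 numbers. The OTU step applies the stated rule (cut every edge whose upper node has PN above the cutoff) recursively from the leaves up, rather than reproducing the CURRENT/PROCESSED bookkeeping of Figure 2.

```python
"""Sketch: position of a node (PN) and PN-based OTU cutting on a toy tree.

Hypothetical representation: every non-root node maps to (parent, edge length).
Branch lengths are illustrative values chosen to reproduce Figure 1.
"""
from collections import defaultdict

# child -> (parent, length of the edge from the child up to its parent)
EDGES = {
    "N2": ("ROOT", 1.8),  # assumed R_N2 (not given on the poster)
    "N1": ("N2", 1.2),    # internal edge shared by leaves A and B
    "C": ("N2", 3.0),     # L_C
    "A": ("N1", 1.0),     # L_A
    "B": ("N1", 2.0),     # L_B
}

CHILDREN = defaultdict(list)
for child, (parent, _) in EDGES.items():
    CHILDREN[parent].append(child)


def is_leaf(node):
    return not CHILDREN.get(node)


def leaves_under(node):
    """Leaves of the sub-tree defined by node (a leaf is its own sub-tree)."""
    if is_leaf(node):
        return [node]
    return [leaf for kid in CHILDREN[node] for leaf in leaves_under(kid)]


def nodes_between(leaf, node):
    """Internal nodes m strictly between a leaf and one of its ancestors."""
    path, current = [], EDGES[leaf][0]
    while current != node:
        path.append(current)
        current = EDGES[current][0]
    return path


def distance_to_leaves(node):
    """D_n, equation (2): P_i-weighted mean of D_i over the node's leaves."""
    if is_leaf(node):
        return 0.0
    num = den = 0.0
    for leaf in leaves_under(node):
        between = nodes_between(leaf, node)
        l_i = EDGES[leaf][1]
        d_i = l_i + sum(EDGES[m][1] for m in between)  # equation (3)
        p_i = l_i + sum(EDGES[m][1] / len(leaves_under(m)) for m in between)  # (4)
        num += p_i * d_i
        den += p_i
    return num / den


def distance_to_root(node):
    """R_n: total edge length from the node up to the root."""
    total = 0.0
    while node in EDGES:
        node, length = EDGES[node]
        total += length
    return total


def pn(node):
    """Equation (1), assumed form: PN = D_n / (R_n + D_n)."""
    d_n, r_n = distance_to_leaves(node), distance_to_root(node)
    return d_n / (r_n + d_n) if r_n + d_n > 0 else 0.0


def cut_otus(root, cutoff):
    """Cut every edge whose upper node has PN > cutoff, bottom-up, so each
    leaf joins the OTU of the lowest cut edge above it."""
    otus = []

    def attached(node):
        # Returns the leaves still connected to node after cuts below it.
        if is_leaf(node):
            return [node]
        cut_here = pn(node) > cutoff
        kept = []
        for kid in CHILDREN[node]:
            below = attached(kid)
            if cut_here and below:
                otus.append(sorted(below))
            elif not cut_here:
                kept.extend(below)
        return kept

    leftover = attached(root)
    if leftover:  # only when the root itself is not above the cutoff
        otus.append(sorted(leftover))
    return otus
```

With these assumed branch lengths, D_N2 is about 2.89 and PN_N2 about 0.62, matching Figure 1 after rounding, and a cutoff of 0.5 splits the leaves into the OTUs {A, B} and {C}. A real tree of thousands of genomes would be read from a Newick file, with sub-tree leaf counts cached instead of recomputed.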
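The OTU comparison in Methods section 3 can be sketched the same way. Assumptions: each OTU assignment is an equal-length list of labels over the same genomes; entropies use natural logarithms; E is estimated, as described, by averaging the mutual information of 100 comparisons in which one assignment is randomly permuted; and because the poster's AMI equation (1) is not recoverable, the normalization by max(H(X), H(Y)) used below is one common AMI form chosen here as an assumption. The function names are hypothetical.

```python
"""Sketch: adjusted mutual information (AMI) between two OTU assignments,
with the expected mutual information E estimated by random permutation."""
import math
import random
from collections import Counter


def entropy(labels):
    """Equation (2): H = -sum p log p, where p is the fraction of items per OTU."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Equation (3): I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def adjusted_mutual_information(x, y, n_permutations=100, seed=0):
    """AMI = (I - E) / (max(H(X), H(Y)) - E).

    The max(H(X), H(Y)) normalization is an assumption, since the poster's
    equation (1) is not recoverable from the transcript.
    """
    if len(x) != len(y):
        raise ValueError("x and y must assign OTUs to the same items")
    rng = random.Random(seed)
    shuffled = list(y)
    total = 0.0
    for _ in range(n_permutations):
        # Permutation model: OTU sizes are kept, memberships are shuffled.
        rng.shuffle(shuffled)
        total += mutual_information(x, shuffled)
    expected = total / n_permutations
    denom = max(entropy(x), entropy(y)) - expected
    if math.isclose(denom, 0.0, abs_tol=1e-12):
        return 1.0  # degenerate case, e.g. both assignments are a single OTU
    return (mutual_information(x, y) - expected) / denom
```

A value near 1 means the two assignments group genomes the same way; a value near 0 means their agreement is no better than chance. As a cross-check, scikit-learn's `adjusted_mutual_info_score` (with `average_method="max"`) computes an AMI whose expectation is derived analytically rather than by sampling, so its values will be close to, but not identical with, this permutation estimate.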