The Tree of Life (TOL) in the age of Genomics or a journey through the Phylogenetic Forest Eugene Koonin, NCBI / NLM / NIH RECOM BE, San Diego, May 23,


The Tree of Life (TOL) in the age of Genomics or a journey through the Phylogenetic Forest Eugene Koonin, NCBI / NLM / NIH RECOM BE, San Diego, May 23, 2010

A brief history of TOL. Thinking of the history of life in terms of phylogenetic trees is as old as scientific biology (if not older). Charles Darwin (1859) Origin of Species [one and only illustration]: "descent with modification". Ernst Haeckel (1879) The Evolution of Man.

A brief history of TOL: ribosomal RNA phylogeny. The advent of molecular phylogenetics: trees can be made from alignments of homologous genes, raising expectations of an objectively reconstructed, complete Tree of Life. Woese et al. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. PNAS 87, [Figure 1, modified]

Enter genomics… and things get complicated. Fleischmann et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science Jul 28;269(5223):

Massive horizontal gene transfer (HGT) from archaea to a hyperthermophilic bacterium

Significant HGT from eukaryotes to a (facultative) symbiotic bacterium

Phylogenetic trees for aaRSs, 20 key enzymes of protein biosynthesis: diversity of HGT signals. Wolf Y I et al. Genome Res. 1999;9: ©1999 by Cold Spring Harbor Laboratory Press


History of TOL. Troubled times: the "uprooting" of the TOL for prokaryotes. Horizontal gene transfer is rampant, and no gene is exempt; histories of individual genes are not coherent with each other; the tree-like signal is completely lost (or never existed at all); there are no species (or other taxa) in prokaryotes; the consistent tree-like signal we observe is created by biases in HGT. [Figure: "Standard Model" vs "Net of Life", each relating Bacteria, Archaea and Eukaryotes. Doolittle WF. (2000) Uprooting the tree of life. Sci. Am. 282, [modified]]

So – in the light of HGT - is there a Tree of Life? Or was the history of life more like a net? To address this question, explore the Forest of Life – trees for ALL conserved genes

Exploration of the Forest of Life (FOL): bioinformatic pipeline. Reconstruction of the FOL: 1. selection of orthologous genes; 2. multiple alignment; 3. construction of ML phylogenetic trees (Tree 1 … Tree N). Analysis of the FOL: 4. tree comparison methods (matrix of distances between trees, tree comparison networks, cluster analysis). NUTs: the FOL trees that include > 90% of the species.

Forest of Life (FOL) and Nearly Universal Trees (NUTs). The FOL consists of 6901 trees, most of which contain relatively few species; 102 of them are "nearly universal" trees (90+ species), the NUTs.

Archaea and Bacteria in NUTs. How well are archaea and bacteria separated? 56% of NUTs show perfect separation between archaea and bacteria; 92% of NUTs show partial, non-random separation. [Figure categories: archaea and bacteria separated; archaea and bacteria mixed; archaea invade bacterial subtree; bacteria invade archaeal subtree]
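
One way to make "perfect separation" concrete, as a minimal sketch: treat a tree as a set of bipartitions (splits) of its leaves and call it perfectly separated when some split has exactly the archaea on one side and the bacteria on the other. The split representation, the function name `is_perfectly_separated` and the toy species labels are assumptions made for illustration, not the procedure used in the talk; the graded "partial, non-random" separation is not covered here.

```python
def is_perfectly_separated(leaves, splits, archaea):
    """Return True if one edge of the tree separates archaea from bacteria.

    leaves  -- set of all species in the tree
    splits  -- iterable of sets, one side of each internal edge
    archaea -- set of archaeal species (all other leaves count as bacteria)
    """
    leaves = frozenset(leaves)
    arch = frozenset(archaea) & leaves
    bact = leaves - arch
    if not arch or not bact:
        return False  # only one domain present: separation is undefined
    if len(arch) == 1 or len(bact) == 1:
        return True   # a single leaf is always cut off by its own edge
    for side in splits:
        side = frozenset(side)
        # A split may be stored from either side, so test both orientations.
        if side == arch or side == bact:
            return True
    return False


# Hypothetical five-species example: a1, a2 are archaea; b1..b3 are bacteria.
leaves = {"a1", "a2", "b1", "b2", "b3"}
archaea = {"a1", "a2"}
tree_separated = [{"a1", "a2"}, {"b1", "b2"}]   # ((a1,a2),(b1,b2),b3)
tree_mixed = [{"a1", "b1"}, {"b2", "b3"}]       # ((a1,b1),a2,(b2,b3))
print(is_perfectly_separated(leaves, tree_separated, archaea))
print(is_perfectly_separated(leaves, tree_mixed, archaea))
```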

NUTs vs Random Trees. The Inconsistency Score (IS) compares a tree to a set of trees. The score is based on the frequency of bipartitions derived from the given tree among the whole set of trees. Range ~[0..1]. NUTs are much more consistent with one another than random trees are. [Plot: IS for random trees and for NUTs]
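
The slide gives only the idea behind the score, not its formula. The sketch below is therefore a deliberately simplified stand-in, not the published IS: it scores a tree as 1 minus the mean frequency of its bipartitions in the set, which also lies in [0, 1]. It assumes all trees share one leaf set; the helper names (`canonical`, `make_tree`, `toy_inconsistency`) and the five-leaf example are hypothetical.

```python
def canonical(side, leaves):
    """Store a split as the side that excludes the smallest leaf name,
    so the same bipartition always compares equal."""
    side, leaves = frozenset(side), frozenset(leaves)
    return leaves - side if min(leaves) in side else side


def make_tree(sides, leaves):
    """Represent a tree as the set of its canonical internal splits."""
    return {canonical(s, leaves) for s in sides}


def split_frequency(split, forest):
    """Fraction of trees in the forest that contain the split."""
    return sum(split in tree for tree in forest) / len(forest)


def toy_inconsistency(tree, forest):
    """Simplified score: 1 - mean frequency of the tree's splits.
    0 means every split is found in every tree; 1 means none is shared."""
    if not tree:
        return 0.0
    mean_freq = sum(split_frequency(s, forest) for s in tree) / len(tree)
    return 1.0 - mean_freq


leaves = "ABCDE"
t1 = make_tree(["AB", "DE"], leaves)   # ((A,B),C,(D,E))
t2 = make_tree(["AB", "DE"], leaves)   # same topology as t1
t3 = make_tree(["AC", "DE"], leaves)   # ((A,C),B,(D,E))
forest = [t1, t2, t3]
for name, tree in (("t1", t1), ("t3", t3)):
    print(name, round(toy_inconsistency(tree, forest), 3))
```

In this toy forest the minority topology `t3` scores higher (less consistent) than `t1`, which is the qualitative behaviour the slide describes for random trees versus NUTs.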

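NUTs Pattern of Similarity. An edge connects two vertices (trees) if the distance between them is below the threshold (equivalently, if their similarity is above the cut-off). A single connected cluster appears and gradually grows to encompass all NUTs as the similarity cut-off is lowered, that is, as the distance threshold is relaxed.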
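
The threshold network can be sketched with a union-find pass over a distance matrix: two trees are joined whenever their distance is below the threshold, and the number of connected components is counted. The matrix values and the function name `count_components` are made up for illustration; the talk does not say how its network was computed.

```python
def count_components(dist, threshold):
    """Number of connected components when trees i and j are linked
    if dist[i][j] < threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


# Hypothetical symmetric distances between four trees.
toy_dist = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.8],
    [0.5, 0.4, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
for t in (0.3, 0.45, 0.75):
    print(t, count_components(toy_dist, t))
```

Relaxing the distance threshold can only add edges, so the component count never increases; a single component absorbing every tree is the pattern reported for the NUTs.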

Are NUTs Clustered? The 102x102 matrix of distances between NUTs is projected into a lower-dimensional space using classical multidimensional scaling (CMDS). Analysis using the gap statistic approach (Tibshirani et al. 2001) shows a lack of distinct, separate clusters in the tree topology space.
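
For readers unfamiliar with CMDS, here is a small dependency-free sketch of the projection step: square the distances, double-centre them, and take the leading eigenvectors scaled by the square roots of their eigenvalues. Power iteration with deflation stands in for a proper eigen-solver and is an assumption of this sketch (it is only appropriate when the leading eigenvalues are positive, as for Euclidean-like distances); the function name `classical_mds` and the 3-4-5 triangle are illustrative, and the gap statistic itself is not shown.

```python
import math


def _matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def classical_mds(dist, dims=2, iters=200):
    """Embed n items in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J (D is symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        eig = 0.0
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]
            eig = sum(x * y for x, y in zip(v, _matvec(b, v)))
        if eig <= 1e-9:
            break  # remaining axes carry no positive variance
        scale = math.sqrt(eig)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eig * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy input: pairwise distances of a 3-4-5 right triangle.
triangle = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
for point in classical_mds(triangle):
    print([round(c, 3) for c in point])
```

The recovered coordinates are defined only up to rotation and reflection, so what should be compared with the input is the set of pairwise distances, not the coordinates themselves.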

NUTs vs FOL. Similarity between NUTs and the rest of the FOL: the NUTs are connected to 2505 trees (36%) from the FOL at a 0.8 similarity cut-off. The mean similarity between the NUTs and the FOL is ~0.5 (only ~0.1 for random trees).

NUTs vs FOL. The matrix of distances between all the trees in the FOL is projected into a lower-dimensional space using CMDS. FOL/COG trees form 7 distinct clusters in tree topology space. Clusters differ largely by phyletic patterns. NUTs form a tight group within one of the clusters and are approximately equidistant to all clusters. [Figure: NUTs vs FOL clusters]

Interim conclusions. Analysis of the full Forest of Life in comparison to NUTs shows that: a considerable fraction of FOL trees are very similar to NUTs, and the average FOL-NUTs similarity is dramatically above the random level; unlike NUTs, topologies of the FOL trees show distinct clustering, largely determined by the phyletic patterns (i.e. the set of species present); in the tree topology space, NUTs form a comparatively tight, centrally located group; compared to NUTs, FOL trees show a much wider diversity of topologies, but the "central" tree-like signal still exists for a large part of the FOL; a "consensus" tree makes sense at least as a crude representation of the common trend in the FOL (especially so for the NUTs).

Distinguishing Tree and Net signals in the FOL using Quartets of species. The FOL shows a great diversity of phyletic and phylogenetic patterns. We employ quartet analysis to measure the vertical and horizontal evolutionary signal in different areas of the Forest. There are $\binom{100}{4} \approx 4 \times 10^{6}$ species quartets from the set of 100 species; each quartet of species can resolve into 3 different tree topologies ($\sim 12 \times 10^{6}$ combinations in total); any tree containing all 4 species from a given quartet resolves them into one of these 3 topologies; for any quartet one can compute the support for all 3 topologies within a set of trees, i.e. the relative frequencies of the topologies, with $f_1 + f_2 + f_3 = 1$ ($\sim 8 \times 10^{10}$ comparisons for the whole FOL, or fewer for any subset of trees).
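
The counts on this slide can be checked directly, and the quartet support is easy to illustrate. The sketch below assumes each tree is given as its leaf set plus one side of every internal split; a tree is skipped if it lacks any of the four species or leaves the quartet unresolved. The function names and the four toy trees are hypothetical, not the talk's data or code.

```python
from collections import Counter
from math import comb

n_quartets = comb(100, 4)           # 3921225, about 4 x 10^6
n_topologies = 3 * n_quartets       # 11763675, about 12 x 10^6
n_comparisons = n_topologies * 6901  # about 8 x 10^10 for the whole FOL
print(n_quartets, n_topologies, n_comparisons)


def quartet_topology(leaves, splits, quartet):
    """Return the pairing a tree induces on four species, e.g. {AB | CD},
    or None if a species is missing or the quartet is unresolved."""
    q = frozenset(quartet)
    if not q <= frozenset(leaves):
        return None
    for side in splits:
        inside = q & frozenset(side)
        if len(inside) == 2:  # this edge separates the quartet 2 vs 2
            return frozenset([inside, q - inside])
    return None


def topology_support(trees, quartet):
    """Relative frequencies of the (up to 3) topologies; they sum to 1."""
    counts = Counter()
    for leaves, splits in trees:
        topo = quartet_topology(leaves, splits, quartet)
        if topo is not None:
            counts[topo] += 1
    total = sum(counts.values())
    return {topo: c / total for topo, c in counts.items()}


toy_trees = [
    (set("ABCDE"), [set("AB"), set("ABC")]),  # ((A,B),C,(D,E))
    (set("ABCDE"), [set("AB"), set("DE")]),   # same quartet pairing
    (set("ABCD"), [set("AC")]),               # ((A,C),(B,D))
    (set("ABC"), []),                         # lacks D: skipped
]
support = topology_support(toy_trees, "ABCD")
for topo, f in support.items():
    print(sorted("".join(sorted(p)) for p in topo), round(f, 3))
```

Comparing how often pairs of species end up on the same side across all quartets is one way to arrive at a species matrix of the kind shown on the following slides.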

Looking for tree-like and net-like evolution in the FOL. Species Matrix, NUTs: mostly tree-like signal. [Figure: species matrix for the NUTs; labels: Archaea, Bacteria, Crenarchaeota, Euryarchaeota, Cyanobacteria, Proteobacteria; scale from "often together" to "rarely together"]

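Species Matrix, rest of the FOL (FOL minus NUTs): much stronger net-like (horizontal) signal. [Figure: species matrix for the FOL without the NUTs; labels: Archaea, Bacteria, Crenarchaeota, Euryarchaeota, Cyanobacteria, Proteobacteria; scale from "often together" to "rarely together"]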

Species Matrix vs Function. [Figure: species matrices by functional category; panels labelled J, U, K, L, F, H, I, N, O, S, D, M, E, C, G, R, Q, P, T, V]

Conclusions – 3. Quartet-based analysis of species position in the Forest of Life shows that: relationships between species in NUTs roughly follow the conventional microbial taxonomy, probably as a consequence of a significant contribution of tree-like evolution; the “TOL” signal declines with a decreasing number of species in the tree; different functional classes of genes show a substantially different balance between tree-like and net-like modes of gene transfer, and possibly different preferred routes of HGT.

The Take-Home Message. There is no single "Tree of Life" describing the evolutionary history of all or even the majority of the prokaryote genomes. Yet, there is a central tree-like trend of evolution compatible with a common history of descent of prokaryotic groups. This trend is more evident at shallow phylogenetic depths, in more ubiquitous genes and among some functional categories of genes (e.g., translation). Observations are compatible with the ancient divergence between the bacterial and archaeal clades, followed by rapid diversification of major phyla, followed by HGT that distorted but did not destroy the tree-like signal. Altogether, HGT might dominate evolution, but the tree-like signal is stronger than the signal from any particular route of HGT.
Puigbo P, Wolf YI, Koonin EV. (2009) Search for a 'Tree of Life' in the thicket of the phylogenetic forest. J. Biol. 8, 59
Koonin EV, Wolf YI, Puigbo P. (2009) The Phylogenetic Forest and the Quest for the Elusive Tree of Life. Cold Spring Harb Symp Quant Biol. [advanced epub]
Koonin EV, Wolf YI. (2009) The fundamental units, processes and patterns of evolution, and the tree of life conundrum. Biol Direct. 4, 33
Puigbo P, Wolf YI, Koonin EV. The tree and net signals in the evolution of prokaryotes. In preparation

The Take-Home Message: to see the forest for the trees… but also to see the trees for the forest.

A quick redux on bioinformatics. Bioinformatics is a toolbox for computational analysis of biological problems. The toolbox is already quite large and versatile, so in many cases combining available tools into pipelines allows researchers to address new questions. Of course, new tools often have to be developed and added to the toolbox. Biologists must understand the capacities but also the limitations of the tools. Can bioinformatics, using the vast amounts of data produced by genomics and systems biology, help transform biology “from stamp collection to physics”?

Acknowledgments. Fellow explorers of the Forest of Life: Pere Puigbo, Yuri Wolf