New Tools for Visualizing Genome Evolution Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island J. Peter Gogarten Dept. of Molecular.

Slides:



Advertisements
Similar presentations
Chapter 15 Table of Contents Section 1 History of Evolutionary Thought
Advertisements

1 Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene Paralogs: Two or more genes, often thought of.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
Maria Poptsova University of Connecticut Dept. of Molecular and Cell Biology August 18, 2006, Stanford University, CA AUTOMATED ASSEMBLY OF GENE FAMILIES.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
GENE TREES Abhita Chugh. Phylogenetic tree Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor.
Ontology annotation: mapping genomic regions biological function Paul D Thomas, Huaiyu Mi and Suzanna Lewis.
Phylogenetic Trees Understand the history and diversity of life. Systematics. –Study of biological diversity in evolutionary context. –Phylogeny is evolutionary.
Plant Molecular Systematics (Phylogenetics). Systematics classifies species based on similarity of traits and possible mechanisms of evolution, a change.
Summer Bioinformatics Workshop 2008 Comparative Genomics and Phylogenetics Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State.
Phylogenetic reconstruction
Comparative genomics Joachim Bargsten February 2012.
CHAPTER 25 PHYLOGENY AND SYSTEMATICS. Phylogeny- the evolutionary history of a species or group of related species. The Fossil Record and Geological Time.
A Web Interface to analyse SOM of Bipartitions of Gene Phylogenies - A Walk Through J. Peter Gogarten, Maria Poptsova Dept. of Molecular and Cell Biology.
What are microbes? helminths yeast viruses cyanobacteria protozoa
Molecular Evolution Revised 29/12/06
© Wiley Publishing All Rights Reserved. Phylogeny.
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
The Cobweb of life revealed by Genome-Scale estimates of Horizontal Gene Transfer Fan Ge, Li-San Wang, Junhyong Kim Mourya Vardhan.
Sequence alignment: Removing ambiguous positions: Generation of pseudosamples: Calculating and evaluating phylogenies: Comparing phylogenies: Comparing.
Bioinformatics and Phylogenetic Analysis
"Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky.
Isolation and Diversion in Allopatry Colonization events are more common on islands. When a physical barrier separates a population, a vicariance event.
Example of bipartition analysis for five genomes of photosynthetic bacteria (188 gene families) total 10 bipartitions R: Rhodobacter capsulatus, H: Heliobacillus.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
CHAPTER 25 TRACING PHYLOGENY. I. PHYLOGENY AND SYSTEMATICS A.TAXONOMY EMPLOYS A HIERARCHICAL SYSTEM OF CLASSIFICATION  SYSTEMATICS, THE STUDY OF BIOLOGICAL.
Trees as a Tool to Visualize Evolutionary History
MCB 372 #12: Tree, Quartets and Supermatrix Approaches Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre.
MCB 372 #14: Student Presentations, Discussion, Clustering Genes Based on Phylogenetic Information J. Peter Gogarten University of Connecticut Dept. of.
Topic : Phylogenetic Reconstruction I. Systematics = Science of biological diversity. Systematics uses taxonomy to reflect phylogeny (evolutionary history).
Phylogenetic trees Sushmita Roy BMI/CS 576
Scientific FieldsScientific Fields  Different fields of science have contributed evidence for the theory of evolution  Anatomy  Embryology  Biochemistry.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Metagenomic Analysis Using MEGAN4
Chapter 15 Table of Contents Section 1 History of Evolutionary Thought
Chapter 25: Tracing Phylogeny. Phylogeny Phylon = tribe, geny = genesis or origin The evolutionary history of a species or a group of related species.
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
Classification and Systematics Tracing phylogeny is one of the main goals of systematics, the study of biological diversity in an evolutionary context.
Coalescence and the Cenancestor J. Peter Gogarten University of Connecticut Department of Molecular and Cell Biology.
Chapter 26: Phylogeny and the Tree of Life Objectives 1.Identify how phylogenies show evolutionary relationships. 2.Phylogenies are inferred based homologies.
Gene Regulatory Network Inference. Progress in Disease Treatment  Personalized medicine is becoming more prevalent for several kinds of cancer treatment.
1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.
MCB 3421 class 25. student evaluations Please follow this link to the on-line surveys that are open for you this semester.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Bacterial Classification- A. Need for a classification system B
Systematics and the Phylogenetic Revolution Chapter 23.
Chapter 8 Molecular Phylogenetics: Measuring Evolution.
Introduction to Phylogenetics
Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal gene transfer events Olga Zhaxybayeva, J. Peter Gogarten, Robert L. Charlebois,
PHYLOGENY AND SYSTEMATICS Chapter 25. Sedimentary rocks are the richest source of fossils  Fossils are the preserved remnants or impressions left by.
Part I and Chapter 1 Biology Sixth Edition Raven/Johnson (c) The McGraw-Hill Companies, Inc.
The theory of evolution is supported with the following evidence 1. Fossil record- using relative dating and carbon-14 dating to determine age of extinct.
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
Introduction to Phylogenetic trees Colin Dewey BMI/CS 576 Fall 2015.
MCB 3421 class 26.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Chapter 26 Phylogeny and Systematics. Tree of Life Phylogeny – evolutionary history of a species or group - draw information from fossil record - organisms.
Molecular Clocks and Continued Research
Taxonomy & Phylogeny. B-5.6 Summarize ways that scientists use data from a variety of sources to investigate and critically analyze aspects of evolutionary.
Introduction to Bioinformatics Resources for DNA Barcoding
MCB 3421 class 26.
Evidence of Evolution Bio Explain how fossil, biochemical, and anatomical evidence support the theory of evolution.
Office hours aplenty...
Methods of molecular phylogeny
Evolution Notes.
Comments on bipartitions, quartets and supertrees
Chapter 19 Molecular Phylogenetics
Unit Genomic sequencing
Phylogeny and the Tree of Life
Presentation transcript:

New Tools for Visualizing Genome Evolution Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island J. Peter Gogarten Dept. of Molecular and Cell Biology University of Connecticut

Motivation  Early life on Earth has left a variety of traces that can be utilized to reconstruct the history of life the fossil and geological records information retained in living organisms  Our research focuses on how information can be gained from the molecular record: information about the history of life that is retained in the structure and sequences of macromolecules found in extant organisms  The analyses of the mosaic nature of genomes using phylogenetics will be a key ingredient to unravel the life's early history.

Relevance to NASA  The analyses are relevant in the context of NASA's Origin theme: Understand the origin and evolution of life on Earth.  We address questions that are central to NASA's Astrobiology program: Understand how past life on Earth interacted with its changing planetary and Solar System environment. Understand the evolutionary mechanisms and environmental limits of life.

Phylogenetics Phylogenetics (Greek: phylon = race and genetic = birth) is the taxonomical classification of organisms based on how closely they are related in terms of evolutionary differences.

Phylogenetic progression as envisioned by Darwin From C. Darwin Origin of Species

Euglena Trypanosoma Zea Paramecium Dictyostelium Entamoeba Naegleria Coprinus Porphyra Physarum Homo Tritrichomonas Sulfolobus Thermofilum Thermoproteus pJP 27 pJP 78 pSL 22 pSL 4 pSL 50 pSL 12 E.coli Agrobacterium Epulopiscium Aquifex Thermotoga Deinococcus Synechococcus Bacillus Chlorobium Vairimorpha Cytophaga Hexamita Giardia mitochondria chloroplast Haloferax Methanospirillum Methanosarcina Methanobacterium Thermococcus Methanopyrus Methanococcus A RCHAEA B ACTERIA E UCARYA Encephalitozoon Thermus EM 17 Marine group 1 Riftia Chromatium ORIGIN Treponema Fig. modified from Norman Pace SSU-rRNA Tree of Life

Phylogenetics: Classic View  All genes are inherited from ancestor.  Branching reflects speciation events.  Evolutionary tree follows very closely the SSU-rRNA tree.

However… Science, 280 p.672ff (1998) Aquifex is assigned to different branches of the tree…

Horizontal Gene Transfer (HGT) leads to Mosaic Genomes, where different parts of the genome have different histories. (a) concordant genes, (b) according to 16S (and other conserved genes) (c) according to phylogenetically discordant genes Gophna, U., Doolittle, W.F. & Charlebois, R.L.: Weighted genome trees: refinements and applications. J. Bacteriol. (in press)

A Revised Tree of Life

Evolutionary Processes Analogous to the Ones Proposed to Occur in the Microbial World + = + = Cartoons from Science Made Stupid, T. Weller, from

Visualizing Phylogenies  Visualize the relation of four organisms at a time  three unrooted trees  plot the support of various genes for each of the tree topologies in an equilateral triangle Orthologous Gene Families

Visualizing Phylogenies Synechocystis sp. (cyanobact.) Chlorobium tepidum (GSB) Rhodobacter capsulatus (  -prot) Rhodopseudomonas palustris (  -prot)

Constructing the Visualization Download four genomes (genome quartet) [a.a.sequences ] Download four genomes (genome quartet) [a.a.sequences ] “BLAST” every genome against every other genome “BLAST” every genome against every other genome Select top hit of every BLAST search Select top hit of every BLAST search Detect quartets of orthologs Detect quartets of orthologs Align quartets of orthologues using ClustalW Align quartets of orthologues using ClustalW Calculate maximum-likelihood values and posterior probabilities for all three tree topologies Calculate maximum-likelihood values and posterior probabilities for all three tree topologies Convert probabilities (barycentric coordinates) into Cartesian coordinates Convert probabilities (barycentric coordinates) into Cartesian coordinates Plot all points onto equilateral triangle Plot all points onto equilateral triangle

Visualizing Five Genomes  Five genomes => fifteen unrooted trees  Rather than triangle - dekapentagon A: Archaeoglobus S: Sulfolobus Y: Yeast R: Rhodobacter B: Bacillus A: Archaeoglobus S: Sulfolobus Y: Yeast R: Rhodobacter B: Bacillus Zhaxybayeva O, Hamel L, Raymond J, Gogarten JP: Visualization of Phylogenetic Content of Five Genomes with Dekapentagonal Maps. Genome Biology 2004, 5:R20

Visualizing Multiple Genomes Number of GenomesNumber of Unrooted Trees E E E E E E E+137 For comparison the universe contains only about protons and has an age of about 5*10 17 seconds or 5*10 29 picoseconds.  Given this explosion, plotting all possible relationships as unrooted trees is impossible.

Visualizing Multiple Genomes: SOMs  SOM  Self-Organizing Map  An artificial neural network approach to clustering we are looking for clusters of genes which favor certain tree topologies  Advantages over other clustering approaches: No a priori knowledge of how many clusters to expect Explicit summary of commonalities and differences between clusters Cluster membership is not exclusive – a gene can indicate membership in multiple clusters at the same time Visually appealing representation T. Kohonen, Self-organizing maps, 3rd ed. Berlin ; New York: Springer, 2001.

Visualization with SOMs /

Training a SOM Tree #1 (T 1 )…Tree #k (T k ) Support value vector for a set #1 of orthologous genes P 11 …P 1k Support value vector for a set #2 of orthologous genes P 21 …P 2k … ……… Support value vector for a set #n of orthologous genes P n1 …P nk x y k SOM Regression Equations:  - learning rate  - neighborhood distance mimi Data Set SOM Neural Elements SOM Visualization

Anatomy of a Trained SOM In our case k = 15 for the fifteen tree topologies

Visualizing a Larger Number of Genomes 13 gamma-proteobacterial genomes (258 putative orthologs): E.coli Buchnera Haemophilus Pasteurella Salmonella Yersinia pestis (2 strains) Vibrio Xanthomonas (2 sp.) Pseudomonas Wigglesworhtia Vibrio Cholerae There are 13,749,310,575 possible unrooted tree topologies for 13 genomes  switch to bipartitions

Bipartition of a Phylogenetic Tree Bipartition – a division of a phylogenetic tree into two parts that are connected by a single branch. It divides a dataset into two groups, but it does not consider the relationships within each of the two groups. 95 Here 95 represents the bootstrap support for the internal branch. The number of bipartitions for N genomes is equal to 2 (N-1) -N-1.

Bipartitions: Lento Plot & SOM Strongly supported bipartitions in SOM Strongly supported bipartitions

Conclusions & Future Work  Self-Organizing Maps seem to be an effective way to visualize mosaic genome evolution.  Corroborate findings with other methodologies  Scalable  In the Future: Larger data sets Locally Linear Embedding (LLE)

Acknowledgements  Olga A. Zhaxybayeva Dept. of Biochemistry and Molecular Biology Dalhousie University  Maria Poptsova Dept. of Molecular and Cell Biology University of Connecticut