Presentation is loading. Please wait.

Presentation is loading. Please wait.

New Tools for Visualizing Genome Evolution Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island J. Peter Gogarten Dept. of Molecular.

Similar presentations


Presentation on theme: "New Tools for Visualizing Genome Evolution Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island J. Peter Gogarten Dept. of Molecular."— Presentation transcript:

1 New Tools for Visualizing Genome Evolution Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island J. Peter Gogarten Dept. of Molecular and Cell Biology University of Connecticut

2 Motivation  Early life on Earth has left a variety of traces that can be utilized to reconstruct the history of life the fossil and geological records information retained in living organisms  Our research focuses on how information can be gained from the molecular record: information about the history of life that is retained in the structure and sequences of macromolecules found in extant organisms  The analyses of the mosaic nature of genomes using phylogenetics will be a key ingredient to unravel the life's early history.

3 Relevance to NASA  The analyses are relevant in the context of NASA's Origin theme: Understand the origin and evolution of life on Earth.  We address questions that are central to NASA's Astrobiology program: Understand how past life on Earth interacted with its changing planetary and Solar System environment. Understand the evolutionary mechanisms and environmental limits of life.

4 Phylogenetics Phylogenetics (Greek: phylon = race and genetic = birth) is the taxonomical classification of organisms based on how closely they are related in terms of evolutionary differences.

5 Phylogenetic progression as envisioned by Darwin From C. Darwin Origin of Species

6 Euglena Trypanosoma Zea Paramecium Dictyostelium Entamoeba Naegleria Coprinus Porphyra Physarum Homo Tritrichomonas Sulfolobus Thermofilum Thermoproteus pJP 27 pJP 78 pSL 22 pSL 4 pSL 50 pSL 12 E.coli Agrobacterium Epulopiscium Aquifex Thermotoga Deinococcus Synechococcus Bacillus Chlorobium Vairimorpha Cytophaga Hexamita Giardia mitochondria chloroplast Haloferax Methanospirillum Methanosarcina Methanobacterium Thermococcus Methanopyrus Methanococcus A RCHAEA B ACTERIA E UCARYA Encephalitozoon Thermus EM 17 Marine group 1 Riftia Chromatium ORIGIN Treponema Fig. modified from Norman Pace SSU-rRNA Tree of Life

7 Phylogenetics: Classic View  All genes are inherited from ancestor.  Branching reflects speciation events.  Evolutionary tree follows very closely the SSU-rRNA tree.

8 However… Science, 280 p.672ff (1998) Aquifex is assigned to different branches of the tree…

9 Horizontal Gene Transfer (HGT) leads to Mosaic Genomes, where different parts of the genome have different histories. (a) concordant genes, (b) according to 16S (and other conserved genes) (c) according to phylogenetically discordant genes Gophna, U., Doolittle, W.F. & Charlebois, R.L.: Weighted genome trees: refinements and applications. J. Bacteriol. (in press)

10 A Revised Tree of Life

11 Evolutionary Processes Analogous to the Ones Proposed to Occur in the Microbial World + = + = Cartoons from Science Made Stupid, T. Weller, 1986. from http://www.besse.at/sms/

12 Visualizing Phylogenies  Visualize the relation of four organisms at a time  three unrooted trees  plot the support of various genes for each of the tree topologies in an equilateral triangle Orthologous Gene Families

13 Visualizing Phylogenies Synechocystis sp. (cyanobact.) Chlorobium tepidum (GSB) Rhodobacter capsulatus (  -prot) Rhodopseudomonas palustris (  -prot)

14 Constructing the Visualization Download four genomes (genome quartet) [a.a.sequences ] Download four genomes (genome quartet) [a.a.sequences ] “BLAST” every genome against every other genome “BLAST” every genome against every other genome Select top hit of every BLAST search Select top hit of every BLAST search Detect quartets of orthologs Detect quartets of orthologs Align quartets of orthologues using ClustalW Align quartets of orthologues using ClustalW Calculate maximum-likelihood values and posterior probabilities for all three tree topologies Calculate maximum-likelihood values and posterior probabilities for all three tree topologies Convert probabilities (barycentric coordinates) into Cartesian coordinates Convert probabilities (barycentric coordinates) into Cartesian coordinates Plot all points onto equilateral triangle Plot all points onto equilateral triangle

15 Visualizing Five Genomes  Five genomes => fifteen unrooted trees  Rather than triangle - dekapentagon A: Archaeoglobus S: Sulfolobus Y: Yeast R: Rhodobacter B: Bacillus A: Archaeoglobus S: Sulfolobus Y: Yeast R: Rhodobacter B: Bacillus Zhaxybayeva O, Hamel L, Raymond J, Gogarten JP: Visualization of Phylogenetic Content of Five Genomes with Dekapentagonal Maps. Genome Biology 2004, 5:R20

16 Visualizing Multiple Genomes Number of GenomesNumber of Unrooted Trees 31 43 515 6105 7945 810395 9135135 102027025 202.22E+020 308.69E+036 401.31E+055 502.84E+074 605.01E+094 705.00E+115 802.18E+137 For comparison the universe contains only about 10 89 protons and has an age of about 5*10 17 seconds or 5*10 29 picoseconds.  Given this explosion, plotting all possible relationships as unrooted trees is impossible.

17 Visualizing Multiple Genomes: SOMs  SOM  Self-Organizing Map  An artificial neural network approach to clustering we are looking for clusters of genes which favor certain tree topologies  Advantages over other clustering approaches: No a priori knowledge of how many clusters to expect Explicit summary of commonalities and differences between clusters Cluster membership is not exclusive – a gene can indicate membership in multiple clusters at the same time Visually appealing representation T. Kohonen, Self-organizing maps, 3rd ed. Berlin ; New York: Springer, 2001.

18 Visualization with SOMs 12 11 11/4 4 15 9

19 Training a SOM Tree #1 (T 1 )…Tree #k (T k ) Support value vector for a set #1 of orthologous genes P 11 …P 1k Support value vector for a set #2 of orthologous genes P 21 …P 2k … ……… Support value vector for a set #n of orthologous genes P n1 …P nk x y k SOM Regression Equations:  - learning rate  - neighborhood distance mimi Data Set SOM Neural Elements SOM Visualization

20 Anatomy of a Trained SOM In our case k = 15 for the fifteen tree topologies. 1 - 5 6 - 10 11 - 15

21 Visualizing a Larger Number of Genomes 13 gamma-proteobacterial genomes (258 putative orthologs): E.coli Buchnera Haemophilus Pasteurella Salmonella Yersinia pestis (2 strains) Vibrio Xanthomonas (2 sp.) Pseudomonas Wigglesworhtia Vibrio Cholerae There are 13,749,310,575 possible unrooted tree topologies for 13 genomes  switch to bipartitions

22 Bipartition of a Phylogenetic Tree Bipartition – a division of a phylogenetic tree into two parts that are connected by a single branch. It divides a dataset into two groups, but it does not consider the relationships within each of the two groups. 95 Here 95 represents the bootstrap support for the internal branch. The number of bipartitions for N genomes is equal to 2 (N-1) -N-1.

23 Bipartitions: Lento Plot & SOM Strongly supported bipartitions in SOM Strongly supported bipartitions

24 Conclusions & Future Work  Self-Organizing Maps seem to be an effective way to visualize mosaic genome evolution.  Corroborate findings with other methodologies  Scalable  In the Future: Larger data sets Locally Linear Embedding (LLE)

25 Acknowledgements  Olga A. Zhaxybayeva Dept. of Biochemistry and Molecular Biology Dalhousie University  Maria Poptsova Dept. of Molecular and Cell Biology University of Connecticut


Download ppt "New Tools for Visualizing Genome Evolution Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island J. Peter Gogarten Dept. of Molecular."

Similar presentations


Ads by Google