TreeBASE and Phyloinformatics
Roderic Page
University of Glasgow
At the core of a ToL effort must be a “phyloinformatic infrastructure”
Tools for:
- data and tree storage
- analysis (supertrees, supermatrices)
- collaboration
- meta-analysis
It’s a scandal
- We cannot answer even the most basic question: “what is the phylogeny for group x?”
- GenBank is currently the best phylogenetic database(!)
- We can't even say how many species are in a given group
- We have little idea of who is doing what
Tree of Life (tolweb.org)
- Provides text and images
- Relies on extensive manual effort (e.g., writing text)
- Can’t do any computations with it
- Limited research value
TreeBASE
- Relational database
- Query by author, taxon, study number
- Compute supertrees
- Submit NEXUS data files
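To make "query by author, taxon, study number" concrete, here is a minimal sketch of a taxon query against a relational store. The table and column names (`study`, `tree`, `tree_taxon`) and the demo rows are hypothetical, invented for illustration; they are not TreeBASE's actual schema or data.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE study (study_id TEXT PRIMARY KEY, author TEXT);
        CREATE TABLE tree (tree_id TEXT PRIMARY KEY,
                           study_id TEXT REFERENCES study(study_id));
        CREATE TABLE tree_taxon (tree_id TEXT REFERENCES tree(tree_id),
                                 taxon TEXT);
    """)
    # Placeholder rows, not real studies.
    conn.executemany("INSERT INTO study VALUES (?, ?)",
                     [("S1", "Author A"), ("S2", "Author B")])
    conn.executemany("INSERT INTO tree VALUES (?, ?)",
                     [("T1", "S1"), ("T2", "S2")])
    conn.executemany("INSERT INTO tree_taxon VALUES (?, ?)",
                     [("T1", "Homo sapiens"), ("T1", "Pan troglodytes"),
                      ("T2", "Homo sapiens"), ("T2", "Mus musculus")])
    return conn

def trees_for_taxon(conn, taxon):
    """Return (tree_id, study_id, author) for every tree containing `taxon`."""
    rows = conn.execute("""
        SELECT t.tree_id, s.study_id, s.author
        FROM tree_taxon AS tt
        JOIN tree AS t ON t.tree_id = tt.tree_id
        JOIN study AS s ON s.study_id = t.study_id
        WHERE tt.taxon = ?
        ORDER BY t.tree_id
    """, (taxon,))
    return rows.fetchall()
```

Note that the lookup is an exact string match on the taxon label, so it only finds a tree if every study spells the name the same way.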
TreeBASE and mincut supertrees
- User selects two or more trees
- Clicking a button runs a script on darwin.zoology.gla.ac.uk that creates the supertree
- Result can be viewed as PS, PDF, a treefile, or in a Java applet (ATV)
Dependencies amongst studies (Gatesy et al.)
What’s wrong with TreeBASE?
- No consistency of taxon names (e.g., Human, Homo sapiens, Homo sapiens X)
- No consistency of data names (e.g., gene names, morphological characters, etc.)
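The taxon-name problem can be illustrated with a minimal sketch that maps the three labels from the example onto one canonical name. The `VERNACULAR` table and the binomial pattern are assumptions made for illustration; real name reconciliation would need a curated taxonomic authority, and this is not how TreeBASE handles names.

```python
import re

# Hypothetical lookup of common names; a real system would use a curated source.
VERNACULAR = {"human": "Homo sapiens"}

# Assumed pattern: capitalised genus followed by a lowercase species epithet.
BINOMIAL = re.compile(r"^([A-Z][a-z]+)\s+([a-z]+)\b")

def canonical_name(label):
    """Map a free-text taxon label to 'Genus species', or None if unrecognised."""
    # Treat underscores as spaces and collapse runs of whitespace.
    cleaned = " ".join(label.replace("_", " ").split())
    if cleaned.lower() in VERNACULAR:
        return VERNACULAR[cleaned.lower()]
    match = BINOMIAL.match(cleaned)
    if match:
        # Keep only genus and species; trailing qualifiers such as "X" are dropped.
        return f"{match.group(1)} {match.group(2)}"
    return None

labels = ["Human", "Homo sapiens", "Homo sapiens X"]
print({label: canonical_name(label) for label in labels})
```

Returning `None` for unrecognised labels, rather than guessing, leaves those names to be resolved by hand.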
What needs to be done to TreeBASE?
- Consistency of taxon names
- Consistency of data names (e.g., gene names)
General issues
- Develop tools for rapid construction of supertrees and supermatrices
- Visualisation of trees (and other graphs)
- Queries to highlight areas of uncertainty
- Easy submission of rigorously annotated data
- Resolve centralisation versus distribution (one database or many?)
The single most important thing we could do is to create a phyloinformatic infrastructure to support ToL studies (IMHO)