Christian M Zmasek, PhD Burnham Institute for Medical Research Bioinformatics and Systems Biology www.phylosoft.org www.phyloxml.org.



Phylogenomics. Original definition: the application of phylogenetic information to gene function analysis (Eisen, 1998). Recent usage: the study of species evolution based on whole-genome analyses (for example, Dunn et al., 2008), as well as various types of studies at the intersection of genomics and phylogenetics.

The application of phylogenetic information for gene function analysis. [Figure: gene trees with leaves labeled RAT, MOUSE, HUMAN, and CIONA and with labels X, Y, and Z; the legend marks the query sequence, sequences orthologous to the query, the sequence most similar to the query, and gene duplications.]

What information do we need for a phylogenomic analysis (sequence function analysis type)? In phylogenomic analyses, tree nodes might be annotated with a sequence name, a species name, and a duplication flag (true/false). Branches might be annotated with branch lengths and support values (bootstrap, probability, …).
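The annotations listed above can be pictured as fields on a tree node. The following is only a minimal sketch of such a data structure; the class name `TreeNode`, its field names, and the sequence names are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    """One node of a gene tree, carrying the annotations named on the slide."""
    sequence_name: Optional[str] = None       # e.g. a gene or protein name
    species_name: Optional[str] = None        # species the sequence comes from
    is_duplication: Optional[bool] = None     # True/False; None if not inferred
    branch_length: Optional[float] = None     # length of the branch to the parent
    support: Dict[str, float] = field(default_factory=dict)  # e.g. {"bootstrap": 90.0}
    children: List["TreeNode"] = field(default_factory=list)


def leaf_species(node: TreeNode) -> List[str]:
    """Collect the species names of all leaves below (and including) a node."""
    if not node.children:
        return [node.species_name] if node.species_name else []
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_species(child))
    return names


# Hypothetical three-sequence gene tree: ((rat, mouse), human).
rat = TreeNode(sequence_name="seq_rat", species_name="RAT", branch_length=0.10)
mouse = TreeNode(sequence_name="seq_mouse", species_name="MOUSE", branch_length=0.12)
rodents = TreeNode(is_duplication=False, branch_length=0.05,
                   support={"bootstrap": 90.0}, children=[rat, mouse])
human = TreeNode(sequence_name="seq_human", species_name="HUMAN", branch_length=0.20)
root = TreeNode(is_duplication=False, children=[rodents, human])

species = leaf_species(root)
print(species)
```

Keeping support values in a dictionary keyed by type leaves room for more than one kind of support per branch.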

What information might we need for other types of phylogenomic analyses? Support values (possibly multiple); taxonomy information (possibly detailed); geographic information; host/parasite data (relations between tree nodes); gene expression values; genomic location; mutations, variation, disease; …

How is this information processed and stored? Tree topologies are described by nested parentheses, for example ((A,B),C). Unique tree node labels are mapped to text files, spreadsheets, or databases. Text files are processed manually with text editors, or with macros, shell scripts, and Perl scripts. The New Hampshire eXtended (NHX) format adds tags for different fields, such as species (S=) and bootstrap support (B=). Example: ADH2:0.1[&&NHX:S=human:B=90]
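To make the NHX example concrete, here is a small sketch that splits a single NHX node string into its label, branch length, and tags. It handles only the simple form shown on the slide (no quoting, no nested structure), and the function name `parse_nhx_node` is hypothetical.

```python
import re

# label, optional ":length", optional "[&&NHX:key=value:key=value]"
NHX_NODE = re.compile(
    r"^(?P<name>[^:\[\]]*)"
    r"(?::(?P<length>[^\[\]]+))?"
    r"(?:\[&&NHX:(?P<tags>[^\]]*)\])?$"
)


def parse_nhx_node(text):
    """Parse one NHX-annotated node such as 'ADH2:0.1[&&NHX:S=human:B=90]'."""
    match = NHX_NODE.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple NHX node: {text!r}")
    tags = {}
    if match.group("tags"):
        # Tags are separated by ':' and written as key=value.
        for item in match.group("tags").split(":"):
            key, _, value = item.partition("=")
            tags[key] = value
    length = match.group("length")
    return {
        "name": match.group("name"),
        "branch_length": float(length) if length else None,
        "tags": tags,
    }


node = parse_nhx_node("ADH2:0.1[&&NHX:S=human:B=90]")
print(node)
```

Every tag value comes back as a string, so the caller has to know that `B` is a number and `S` is a species name. That is part of the limited expressiveness of tag-based annotation.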

How is this information published? Mostly as images of phylogenetic trees in journals, which are not suitable as input for further studies. Submission to (publicly accessible) databases is rare.

Problems with this approach: it is tedious and error-prone; published images are difficult to use as input for further studies; meta-analyses are hard; different, and incompatible, “dialects” of NHX have appeared; and expressiveness is limited.

phyloXML by example: example from Prof. Joe Felsenstein's book "Inferring Phylogenies". [phyloXML listing for a tree with leaves A, B, and C and branch lengths 0.06, 0.23, and 0.4; the XML markup is not preserved.]
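A phyloXML document for such a three-leaf tree might look like the string embedded in the sketch below. This is a reconstruction under assumptions, not the original listing: the topology ((A,B),C), the assignment of 0.06 to the internal branch, and the `rooted="true"` attribute are assumed, and A is given no branch length because none is preserved here. The sketch parses the document with Python's standard library and prints the tree in Newick form.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}

PHYLOXML = """
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <name>example from Prof. Joe Felsenstein's book "Inferring Phylogenies"</name>
    <clade>
      <clade>
        <branch_length>0.06</branch_length>
        <clade>
          <name>A</name>
        </clade>
        <clade>
          <name>B</name>
          <branch_length>0.23</branch_length>
        </clade>
      </clade>
      <clade>
        <name>C</name>
        <branch_length>0.4</branch_length>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
""".strip()


def to_newick(clade):
    """Render a <clade> element (and its nested clades) as Newick text."""
    children = clade.findall("phy:clade", NS)
    label = clade.findtext("phy:name", default="", namespaces=NS)
    length = clade.findtext("phy:branch_length", default=None, namespaces=NS)
    text = label
    if children:
        text = "(" + ",".join(to_newick(child) for child in children) + ")" + label
    if length is not None:
        text += ":" + length.strip()
    return text


root = ET.fromstring(PHYLOXML)
top_clade = root.find("phy:phylogeny/phy:clade", NS)
newick = to_newick(top_clade) + ";"
print(newick)
```

Because the tree is ordinary XML, a general-purpose XML parser is enough to walk it; no format-specific tokenizer is needed.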

phyloXML. Important elements: Taxonomy, Sequence, Confidence, Events (duplication, speciation), Property (“custom data”), and typed relations (between clades, between sequences). XSD schema, examples, description, applications: www.phyloxml.org. Current version: 1.0.
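As an illustration of these elements, the sketch below carries the information from the NHX example (sequence ADH2, species human, bootstrap 90) in a single annotated clade and reads it back. The child element names and attributes follow the phyloXML 1.x schema as best understood here and should be checked against the XSD. The duplication count and the `example:note` property are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}

ANNOTATED_CLADE = """
<clade xmlns="http://www.phyloxml.org">
  <name>ADH2</name>
  <branch_length>0.1</branch_length>
  <confidence type="bootstrap">90</confidence>
  <taxonomy>
    <common_name>human</common_name>
  </taxonomy>
  <sequence>
    <name>ADH2</name>
  </sequence>
  <events>
    <duplications>1</duplications>
  </events>
  <property ref="example:note" datatype="xsd:string" applies_to="clade">custom data</property>
</clade>
""".strip()


def summarize(clade):
    """Pull the clade-level annotations into a plain dictionary."""
    confidences = {
        conf.get("type"): float(conf.text)
        for conf in clade.findall("phy:confidence", NS)
    }
    duplications = clade.findtext(
        "phy:events/phy:duplications", default="0", namespaces=NS
    )
    return {
        "name": clade.findtext("phy:name", namespaces=NS),
        "species": clade.findtext("phy:taxonomy/phy:common_name", namespaces=NS),
        "sequence": clade.findtext("phy:sequence/phy:name", namespaces=NS),
        "confidence": confidences,
        "is_duplication": int(duplications) > 0,
        "properties": {
            prop.get("ref"): prop.text for prop in clade.findall("phy:property", NS)
        },
    }


summary = summarize(ET.fromstring(ANNOTATED_CLADE))
print(summary)
```

Unlike the NHX tags, each value sits in a named, typed element, so a confidence value carries its own type and several of them can coexist on one clade.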

Important clade level elements

phyloXML applications/implementations (examples). BioPerl: parser and writer. ATV (A Tree Viewer): Java-based tree display tool suitable for large (>10 000) and highly decorated phylogenetic/taxonomic trees. phyloxml_converter: command-line tool to convert Newick (NH), NHX, and Nexus formatted trees to phyloXML.
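To give a feel for what a Newick-to-phyloXML conversion involves, here is a minimal sketch in Python. It is not the implementation of phyloxml_converter and does not reflect its options. The function name `newick_to_phyloxml` is hypothetical, only plain Newick without quotes or comments is handled, and the output is always marked `rooted="true"`, which is an assumption because Newick itself does not record rootedness.

```python
import xml.etree.ElementTree as ET


def newick_to_phyloxml(newick):
    """Convert a simple Newick string into a phyloXML document (as text)."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_clade():
        nonlocal pos
        clade = ET.Element("clade")
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_clade())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_clade())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
        # Optional label, then optional ":branch_length".
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = text[start:pos]
        if label:
            ET.SubElement(clade, "name").text = label
        if length:
            ET.SubElement(clade, "branch_length").text = length
        clade.extend(children)  # nested clades follow the annotations
        return clade

    root = ET.Element("phyloxml", {"xmlns": "http://www.phyloxml.org"})
    phylogeny = ET.SubElement(root, "phylogeny", {"rooted": "true"})
    phylogeny.append(parse_clade())
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return ET.tostring(root, encoding="unicode")


xml_text = newick_to_phyloxml("((A,B:0.23):0.06,C:0.4);")
print(xml_text)
```

A real converter also has to deal with NHX tags, Nexus blocks, quoted labels, and schema validation, none of which this sketch attempts.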