PhyloPat: phylogenetic pattern analysis of eukaryotic genes. Tim Hulsen, 2006-10-17, BeNeLux BioInformatics Conference 2006.

Introduction (1)
Phylogenetic patterns show the presence or absence of genes across a given set of species (the slide shows an example for 10 species).
Very useful for all kinds of evolutionary analyses:
– Origin of certain genes
– Deletion of certain genes
– Clustering of genes with similar patterns: these are likely to have a similar function or be in the same pathway

Introduction (2)
Earlier phylogenetic pattern initiatives:
– Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
– Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
– Pattern search incorporated into OrthoMCL-DB (Chen et al., 2006)
All were applied to proteins, not to genes!
→ PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Method
– Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms); less redundant
– Basis: Ensembl (EnsMart) database, 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), from S. cer. to H. sap.
– Uses the accurate Ensembl orthology pipeline (a combination of BLAST, SW, MUSCLE and PHYML)
– Single-linkage clustering algorithm: creates orthologous groups containing ALL genes in Ensembl
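The single-linkage step can be illustrated with a short sketch. It assumes the orthology relations are available as pairs of gene IDs; the function name and the toy gene IDs are hypothetical and are not taken from PhyloPat's own code. Any two genes connected by a chain of orthology pairs end up in the same group, and genes without orthologs form singleton groups, so every gene is covered.

```python
from collections import defaultdict


def single_linkage_groups(genes, orthologies):
    """Return sorted groups; genes linked by any chain of pairs share a group."""
    # Every gene starts as its own group (union-find parent table).
    parent = {g: g for g in genes}
    for a, b in orthologies:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(g):
        # Walk up to the group root, shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Merge the two groups joined by each orthology pair.
    for a, b in orthologies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for g in parent:
        groups[find(g)].append(g)
    return sorted(sorted(members) for members in groups.values())


# Toy input (hypothetical IDs): h1-m1 and m1-r1 are orthologous pairs.
toy_genes = ["h1", "m1", "r1", "h2", "y1"]
toy_pairs = [("h1", "m1"), ("m1", "r1")]
toy_groups = single_linkage_groups(toy_genes, toy_pairs)
```

Single linkage is permissive: one orthology pair is enough to merge two groups, which keeps every gene in exactly one group but can chain distant genes together.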

Results
– 446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species
– Species ordered from ‘low’ ( ) to ‘high’ ( ), i.e. by approximate distance to human
– Can be queried in several ways
– Output in HTML, Excel or plain text format

Web interface

Pattern/ID Search
– Binary string: 0=absent, 1=present, *=absent or present; e.g. ‘00000******** ’ → must be absent in non-chordata, must be present in all mammals
– MySQL regular expression: e.g. ‘^0*1{10}0*$’ → gives all genes that occur in exactly ten consecutive species and in no others
– Input list of Ensembl/EMBL IDs (PhyloPat contains an EMBL-to-Ensembl mapping)
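The two query styles can be sketched in a few lines of Python. This is an illustration only: it assumes each group's pattern is stored as a string of 0s and 1s in species order, uses Python's `re` module in place of MySQL's regular expressions, and the function names and toy five-species patterns are hypothetical.

```python
import re


def wildcard_to_regex(query):
    """Translate a 0/1/* query into a regex; * accepts either state."""
    mapping = {"0": "0", "1": "1", "*": "[01]"}
    try:
        return re.compile("".join(mapping[c] for c in query))
    except KeyError as exc:
        raise ValueError(f"unsupported character in pattern: {exc.args[0]!r}")


def search_wildcard(patterns, query):
    """Return IDs whose full pattern matches the wildcard query."""
    rx = wildcard_to_regex(query)
    return [gid for gid, p in patterns.items() if rx.fullmatch(p)]


def search_regex(patterns, expression):
    """Return IDs whose pattern matches a user-supplied regular expression."""
    rx = re.compile(expression)
    return [gid for gid, p in patterns.items() if rx.search(p)]


# Toy data for five species (hypothetical IDs and patterns).
toy_patterns = {"G1": "00111", "G2": "01111", "G3": "00011", "G4": "11111"}
wildcard_hits = search_wildcard(toy_patterns, "00*11")
regex_hits = search_regex(toy_patterns, "^0*1{3}0*$")
```

The wildcard form is anchored to the whole pattern, so a query must have one character per species; the regular-expression form leaves anchoring to the user, as in the slide's ‘^...$’ example.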

Output

Phylogenetic Tree

Oligo-/Polypresent Genes
Oligopresent: present in only one or two species (oligo=few), e.g. ‘ ’
These two species should be closely related:
1. C. sav / C. int: 1737, div. 100 Mya (Boffelli et al., 2004)
2. T. nig / T. rub: 1572, div. 85 Mya (Yamanoue et al., 2006)
3. A. gam / A. aeg: 1058, div. 140 Mya (Service, 1993)
4. P. tro / H. sap: 887, div. 6 Mya (Glazko & Nei, 2003)
5. R. nor / M. mus: 713, div. 20 Mya (Springer et al., 2003)
Polypresent: present in all species except one or two (poly=many), e.g. ‘ ’
These two species should be related too; a similar analysis is possible

Omnipresent genes
– Omnipresent: present in all 21 species (omni=all): ‘111111111111111111111’
– Currently 1001 omnipresent groups
– Tend to have very general/important functions, mostly involved in transcription/translation
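The oligo-, poly- and omnipresent classes follow directly from counting the 1s in a pattern. The sketch below assumes patterns are 0/1 strings in species order; the function names and the short species list in the example are hypothetical, and the thresholds simply restate the definitions on the slides.

```python
from collections import Counter


def presence_class(pattern):
    """Classify a 0/1 pattern by how many species contain the gene."""
    present = pattern.count("1")
    total = len(pattern)
    if present == total:
        return "omnipresent"   # present in all species
    if 1 <= present <= 2:
        return "oligopresent"  # present in only one or two species
    if present > 0 and total - present <= 2:
        return "polypresent"   # absent from only one or two species
    return "other"


def count_oligopresent_pairs(patterns, species):
    """Count, per species pair, the patterns present in exactly those two."""
    pairs = Counter()
    for pattern in patterns:
        hits = [species[i] for i, state in enumerate(pattern) if state == "1"]
        if len(hits) == 2:
            pairs[tuple(hits)] += 1
    return pairs


# Toy example with four hypothetical species labels.
toy_species = ["A", "B", "C", "D"]
toy_counts = count_oligopresent_pairs(
    ["1100", "1100", "0011", "1111", "1000"], toy_species
)
```

Ranking the pair counts from high to low gives the kind of top list shown for oligopresent genes, where closely related species share the most pair-specific groups.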

FatiGO analysis
– FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)
– Analysis of all human genes in the output with a single mouse click
– Example on the slide: omnipresent genes

Other possibilities
– Anti-correlating patterns: e.g. ‘ ’ and ‘ ’ → the genes could be completely different, or very similar (analogous)!
– Easy homology-inferred functional annotation (using information from other genes in the same lineage)
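One strict reading of "anti-correlating" is that two patterns are exact complements: wherever one group is present, the other is absent. The sketch below uses that reading as an assumption; the slides do not state how anti-correlation would be scored, and the function names and toy group IDs are hypothetical.

```python
from collections import defaultdict

# Translation table that swaps present (1) and absent (0).
_FLIP = str.maketrans("01", "10")


def complement(pattern):
    """Return the pattern with every 0 and 1 swapped."""
    return pattern.translate(_FLIP)


def anticorrelated_pairs(patterns):
    """Return sorted ID pairs whose patterns are exact complements."""
    ids_by_pattern = defaultdict(list)
    for gid, pattern in patterns.items():
        ids_by_pattern[pattern].append(gid)

    pairs = set()
    for gid, pattern in patterns.items():
        for other in ids_by_pattern.get(complement(pattern), []):
            # Store each pair once, in alphabetical order.
            pairs.add(tuple(sorted((gid, other))))
    return sorted(pairs)


# Toy data for five species (hypothetical IDs and patterns).
toy = {"PPa": "00111", "PPb": "11000", "PPc": "00111", "PPd": "10101"}
toy_pairs = anticorrelated_pairs(toy)
```

A looser variant could allow a few mismatching positions, for example by thresholding the Hamming distance to the complement, at the cost of a pairwise comparison.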

Case study: Hox genes (1)
– Hox genes determine where limbs and other body segments will grow in a developing embryo
– Should exist mostly in vertebrates
– Expansion in teleost fish species (, 8-11); seven Hox clusters instead of the mammalian four
– Search the Ensembl database for human genes with the term ‘hox’ in their annotation
– 44 genes found → entered in PhyloPat → 32 groups found (PP######)

Case study: Hox genes (2)
Columns: PPID | # genes per species | phylogenetic pattern | gene name(s)
PP MSX1, MSX2
PP HOXC4
PP TLX1, TLX2, TLX3
PP HOXB8, HOXC8, HOXD8
PP HOXD11
PP HOXA10
PP HOXC13, HOXD13
PP HOXA1, HOXB1
PP HOXB4
PP HOXA5
PP HOXB2
PP HOXD3
PP HOXA9
PP HOXA3
PP HOXC12
PP HOXD4
PP HOXC11
PP HOXA13
PP HOXB5
PP HOXB3
PP HOXD10
PP HOXA2
PP HOXA6, HOXB6, HOXC6
PP HOXA4
PP HOXB9, HOXC9, HOXD9
PP HOXA11
PP HOXA7, HOXB7
PP HOXC5
PP HOXC10
PP HOXD1
PP HOXD12
PP HOXB13

Case study: Hox genes (3)
PPID(s) | name | cl.A cl.B cl.C cl.D | first sp. | position
PP053829, | HOX1 | HOXA1 HOXB1 HOXD1 | T. nigrov. | anterior
PP053847, | HOX2 | HOXA2 HOXB2 | T. nigrov. | anterior
PP053836,053845, | HOX3 | HOXA3 HOXB3 HOXD3 | T. nigrov. | PG3
PP053832,053844, | HOX5 | HOXA5 HOXB5 HOXC5 | T. nigrov. | central
PP | HOX6 | HOXA6 HOXB6 HOXC6 | T. nigrov. | central
PP053835, | HOX9 | HOXA9 HOXB9 HOXC9 HOXD9 | T. nigrov. | posterior
PP053827,084287, | HOX10 | HOXA10 HOXC10 HOXD10 | T. nigrov. | posterior
PP053858,053840, | HOX11 | HOXA11 HOXC11 HOXD11 | T. nigrov. | posterior
PP053838, | HOX12 | HOXC12 HOXD12 | T. nigrov. | posterior
PP053842,089685, | HOX13 | HOXA13 HOXB13 HOXC13 HOXD13 | T. nigrov. | posterior
PP053853,053830,024984, | HOX4 | HOXA4 HOXB4 HOXC4 HOXD4 | A. gamb. | central
PP | TLX | TLX1 TLX2 TLX3 | A. gamb. |
PP | HOX7 | HOXA7 HOXB7 | G. acul. | central
PP | HOX8 | HOXB8 HOXC8 HOXD8 | C. intest. | central
PP | MSX | MSX1 MSX2 | C. eleg. |
Labels: ‘First’ vertebrate; Non-vertebrate; Non-vertebrate; Non-vertebrate; Vertebrate

Conclusions
– PhyloPat: a quick and easy tool for phylogenetic pattern searches on the complete Ensembl database
– Also usable for studying lineage-specific expansions of genes
– Just updated to Ensembl v41 (released last Thursday); 5 new species: D. nov, E. tel, L. afr, O. cun, O. lat

Acknowledgements
– Supervision: Peter Groenen (supervisor), Jacob de Vlieg (head of group)
– Fruitful discussions (suggestions): Wilco Fleuren, Erik Franck, Nanning de Jong, Arnold Kuzniar

Where to find
– Web interface: (accessible through and
– Publication: Hulsen T., Groenen P.M.A., de Vlieg J. BMC Bioinformatics 2006, 7:
– Powered by Ensembl:
– Poster P-20