PhyloPat: phylogenetic pattern analysis of eukaryotic genes
Tim Hulsen
BeNeLux BioInformatics Conference 2006
Introduction (1)
Phylogenetic patterns show the presence/absence of genes over a certain set of species, e.g. for 10 species: one presence/absence flag per species
Very useful for all kinds of evolutionary analyses:
–Origin of certain genes
–Deletion of certain genes
–Clustering of genes with similar patterns: likely to have similar function / be in same pathway
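A minimal sketch of how such a pattern could be built from a list of species in which a gene occurs. The function name `build_pattern` and the four-species ordering are illustrative assumptions, not part of PhyloPat itself.

```python
def build_pattern(species_order, present_in):
    """Return a 0/1 string with one character per species, in the given order."""
    present = set(present_in)
    return "".join("1" if sp in present else "0" for sp in species_order)


# Hypothetical 4-species ordering, for illustration only.
SPECIES = ["S. cer", "C. ele", "M. mus", "H. sap"]

pattern = build_pattern(SPECIES, {"M. mus", "H. sap"})
print(pattern)  # 0011
```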
Introduction (2)
Earlier phylogenetic pattern initiatives:
–Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
–Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
–Phylogenetic pattern search incorporated into OrthoMCL-DB (Chen et al., 2006)
All applied to proteins, not to genes!
PhyloPat: phylogenetic pattern analysis of eukaryotic genes
Method
Genes rather than proteins: less redundant, and lineage-specific expansions are easier to check (no alternative transcripts or splice forms)
Basis: Ensembl (EnsMart) database, 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), from S. cer. to H. sap.
Uses the accurate Ensembl orthology pipeline (combination of BLAST, SW, MUSCLE and PHYML)
Single-linkage clustering algorithm: creates orthologous groups containing ALL genes in Ensembl
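Single-linkage clustering over pairwise orthologies can be sketched with a union-find structure: any two genes connected by a chain of orthology pairs end up in the same group, and genes without orthologs remain as singleton groups. This is a generic illustration under that assumption, not the PhyloPat implementation; the gene IDs are made up.

```python
from collections import defaultdict


def single_linkage(genes, orthologies):
    """Group genes so that every orthology pair ends up in the same group."""
    parent = {g: g for g in genes}

    def find(g):
        # Walk up to the root, shortening the path on the way (path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in orthologies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for g in genes:
        groups[find(g)].add(g)
    return list(groups.values())


# Hypothetical gene IDs; every gene in a pair must also be listed in `genes`.
genes = ["hsa1", "mmu1", "rno1", "hsa2", "sce1"]
orthologies = [("hsa1", "mmu1"), ("mmu1", "rno1")]
groups = single_linkage(genes, orthologies)
print(sorted(sorted(g) for g in groups))
```

Note that hsa1 and rno1 share a group although no direct orthology between them is listed; that transitivity is what distinguishes single linkage from stricter clustering schemes.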
Results
446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species
Species ordered from ‘low’ to ‘high’, i.e. by approximate distance to human
Can be queried in several ways
Output in HTML, Excel or plain text format
Web interface
Pattern/ID Search
Binary string: 0=absent, 1=present, *=absent or present, e.g. ‘00000******** ’: must be absent in non-chordata, must be present in all mammals
MySQL regular expression: e.g. ‘^0*1{10}0*$’ gives all genes that occur only in ten consecutive species
Input list of Ensembl/EMBL IDs (PhyloPat contains an EMBL to Ensembl mapping)
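Both query styles can be sketched in a few lines. The snippet below uses Python's `re` module rather than MySQL, on the assumption that the constructs involved (`^`, `$`, `*`, `{n}`, character classes) behave the same way in both; the group IDs, the six-species patterns and the helper names are hypothetical.

```python
import re


def wildcard_to_regex(query):
    """Translate a 0/1/* query into an anchored regex ('*' matches 0 or 1)."""
    if not set(query) <= set("01*"):
        raise ValueError("query may only contain 0, 1 and *")
    return re.compile("^" + query.replace("*", "[01]") + "$")


def search(patterns, regex):
    """Return the IDs of all groups whose pattern matches the regex."""
    return [gid for gid, pat in patterns.items() if regex.search(pat)]


# Hypothetical group IDs with 6-species patterns.
patterns = {"G1": "000111", "G2": "001111", "G3": "010011", "G4": "011100"}

# Wildcard query: absent in the first two species, present in the last two.
print(search(patterns, wildcard_to_regex("00**11")))  # ['G1', 'G2']

# Regex query: present in exactly three consecutive species, absent elsewhere.
print(search(patterns, re.compile(r"^0*1{3}0*$")))  # ['G1', 'G4']
```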
Output
Phylogenetic Tree
Oligo-/Polypresent Genes
Oligopresent: present in only one/two species (oligo=few)
Two species sharing such genes should be closely related:
1. C. sav, C. int: 1737; div. 100 Mya (Boffelli et al., 2004)
2. T. nig, T. rub: 1572; div. 85 Mya (Yamanoue et al., 2006)
3. A. gam, A. aeg: 1058; div. 140 Mya (Service, 1993)
4. P. tro, H. sap: 887; div. 6 Mya (Glazko & Nei, 2003)
5. R. nor, M. mus: 713; div. 20 Mya (Springer et al., 2003)
Polypresent: present in all species, except for one/two (poly=many)
These two species should be related too; similar analysis possible
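A ranking of species pairs like the one above could be produced by tallying, for every pattern with exactly two 1s, which two species those are. The sketch below assumes patterns are plain 0/1 strings; the four-species ordering and the example patterns are made up for illustration.

```python
from collections import Counter


def oligopresent_pairs(patterns, species_order):
    """Count, per species pair, the patterns present in exactly those two species."""
    counts = Counter()
    for pat in patterns:
        hits = [sp for sp, flag in zip(species_order, pat) if flag == "1"]
        if len(hits) == 2:
            counts[tuple(hits)] += 1
    return counts


# Illustrative subset of species and made-up patterns.
SPECIES = ["C. sav", "C. int", "R. nor", "M. mus"]
patterns = ["1100", "1100", "0011", "1111", "1000", "0110"]

counts = oligopresent_pairs(patterns, SPECIES)
for pair, n in counts.most_common():
    print(pair, n)
```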
Omnipresent genes
Omnipresent: present in all 21 species (omni=all): ‘111111111111111111111’
Currently 1001 omnipresent groups
Tend to have very general/important functions, mostly involved in transcription/translation
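The three categories (oligo-, poly- and omnipresent) follow directly from the number of 1s in a pattern. A minimal sketch, assuming 0/1 strings long enough that "at most two present" and "at most two absent" cannot both hold; the function name and the "other" label are illustrative.

```python
def classify(pattern):
    """Label a 0/1 pattern as omni-, oligo- or polypresent, else 'other'."""
    n_present = pattern.count("1")
    n_absent = len(pattern) - n_present
    if n_absent == 0:
        return "omnipresent"   # present in all species
    if 1 <= n_present <= 2:
        return "oligopresent"  # present in only one or two species
    if 1 <= n_absent <= 2:
        return "polypresent"   # absent from only one or two species
    return "other"


print(classify("1" * 21))             # omnipresent
print(classify("1" + "0" * 20))       # oligopresent
print(classify("1" * 19 + "00"))      # polypresent
print(classify("0" * 10 + "1" * 11))  # other
```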
FatiGO analysis
FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)
Analysis of all human genes in the output with just one mouse click, e.g. for the omnipresent genes
Other possibilities
Anti-correlating patterns: the genes behind two such patterns could be completely different, or very similar (analogous)!
Easy homology-inferred functional annotation (using information from other genes in the same lineage)
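One way to look for anti-correlating patterns is to search for exact complements, i.e. pairs of groups where one is present precisely in the species where the other is absent. Treating "anti-correlating" as "exact complement" is an assumption made for this sketch; the group IDs and patterns are hypothetical.

```python
def complement(pattern):
    """Flip every 0 to 1 and every 1 to 0."""
    return pattern.translate(str.maketrans("01", "10"))


def anticorrelated_pairs(patterns):
    """Return (id_a, id_b) pairs whose patterns are exact complements."""
    by_pattern = {}
    for gid, pat in patterns.items():
        by_pattern.setdefault(pat, []).append(gid)

    pairs = []
    for pat, gids in by_pattern.items():
        comp = complement(pat)
        # 'pat < comp' makes sure each complementary pair is reported once.
        if pat < comp and comp in by_pattern:
            pairs.extend((a, b) for a in gids for b in by_pattern[comp])
    return pairs


# Hypothetical group IDs with 4-species patterns.
patterns = {"G1": "0011", "G2": "1100", "G3": "0011", "G4": "0101"}
print(anticorrelated_pairs(patterns))  # [('G1', 'G2'), ('G3', 'G2')]
```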
Case study: Hox genes (1)
Hox genes determine where limbs and other body segments will grow in a developing embryo
Should exist mostly in vertebrates
Expansion in teleost fish species (8-11): seven Hox clusters instead of the mammalian four
Search Ensembl database for human genes with term ‘hox’ in annotation
44 genes found -> enter in PhyloPat -> 32 groups found (PP######)
Case study: Hox genes (2)
PPID   # genes per species   phylogenetic pattern   gene name(s)
PP  MSX1, MSX2
PP  HOXC4
PP  TLX1, TLX2, TLX3
PP  HOXB8, HOXC8, HOXD8
PP  HOXD11
PP  HOXA10
PP  HOXC13, HOXD13
PP  HOXA1, HOXB1
PP  HOXB4
PP  HOXA5
PP  HOXB2
PP  HOXD3
PP  HOXA9
PP  HOXA3
PP  HOXC12
PP  HOXD4
PP  HOXC11
PP  HOXA13
PP  HOXB5
PP  HOXB3
PP  HOXD10
PP  HOXA2
PP  HOXA6, HOXB6, HOXC6
PP  HOXA4
PP  HOXB9, HOXC9, HOXD9
PP  HOXA11
PP  HOXA7, HOXB7
PP  HOXC5
PP  HOXC10
PP  HOXD1
PP  HOXD12
PP  HOXB13
Case study: Hox genes (3)
PPID(s)                   name   cl.A    cl.B    cl.C    cl.D    first sp.   position
PP053829,                 HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  anterior
PP053847,                 HOX2   HOXA2   HOXB2   -       -       T. nigrov.  anterior
PP053836,053845,          HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  PG3
PP053832,053844,          HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  central
PP                        HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  central
PP053835,                 HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  posterior
PP053827,084287,          HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  posterior
PP053858,053840,          HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  posterior
PP053838,                 HOX12  -       -       HOXC12  HOXD12  T. nigrov.  posterior
PP053842,089685,          HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  posterior
PP053853,053830,024984,   HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    central
PP                        TLX    TLX1    TLX2    TLX3    -       A. gamb.
PP                        HOX7   HOXA7   HOXB7   -       -       G. acul.    central
PP                        HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  central
PP                        MSX    MSX1    MSX2    -       -       C. eleg.
First species: T. nigrov. = ‘first’ vertebrate; A. gamb., C. intest., C. eleg. = non-vertebrate; G. acul. = vertebrate
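The "first sp." column corresponds to the position of the first 1 in a group's phylogenetic pattern. A minimal sketch of that lookup; the five-species ordering below is an illustrative assumption and not the actual 21-species ordering used by PhyloPat.

```python
def first_species(pattern, species_order):
    """Return the first species (in the given order) whose flag is 1, or None."""
    if len(pattern) != len(species_order):
        raise ValueError("pattern length must match the number of species")
    idx = pattern.find("1")
    return species_order[idx] if idx != -1 else None


# Illustrative ordering only.
SPECIES = ["C. eleg.", "A. gamb.", "C. intest.", "T. nigrov.", "G. acul."]

print(first_species("00011", SPECIES))  # T. nigrov.
print(first_species("01111", SPECIES))  # A. gamb.
```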
Conclusions
PhyloPat: quick and easy tool for phylogenetic pattern search on the complete Ensembl database
Also usable for the study of lineage-specific expansions of genes
Just updated to Ensembl v41 (released last Thursday); 5 new species: D. nov, E. tel, L. afr, O. cun, O. lat
Acknowledgements
Supervision: Peter Groenen (supervisor), Jacob de Vlieg (head of group)
Fruitful discussions and suggestions: Wilco Fleuren, Erik Franck, Nanning de Jong, Arnold Kuzniar
Where to find
Web interface:
Publication: Hulsen T., Groenen P.M.A., de Vlieg J. BMC Bioinformatics 2006, 7:
Powered by Ensembl:
Poster P-20