Biorange Meeting
PhyloPat: phylogenetic pattern analysis of eukaryotic genes & Immunophyle: its application to the evolution of the immune system
Tim Hulsen
Introduction (1)
Phylogenetic patterns show the presence/absence of genes over a certain set of species, e.g. for 10 species:
Very useful for all kinds of evolutionary analyses:
–Origin of certain genes
–Deletion of certain genes
–Clustering of genes with similar patterns: likely to have a similar function / be in the same pathway
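A minimal sketch of how such a presence/absence pattern can be derived for one orthologous group. The species list, gene IDs and the function name `phylogenetic_pattern` are illustrative assumptions, not PhyloPat data or code:

```python
def phylogenetic_pattern(group_genes, species_order):
    """Return a presence/absence string: '1' if the species has at least one gene in the group."""
    return "".join("1" if group_genes.get(sp) else "0" for sp in species_order)

# Hypothetical species order ('low' to 'high') and group membership.
species_order = ["S.cer", "C.ele", "D.mel", "D.rer", "M.mus", "H.sap"]
group = {"D.rer": ["geneA"], "M.mus": ["geneB", "geneC"], "H.sap": ["geneD"]}

pattern = phylogenetic_pattern(group, species_order)
print(pattern)  # 000111
```

The per-species gene counts (here 1, 2 and 1) are discarded in the binary pattern; keeping them alongside the pattern is what allows lineage-specific expansions to be studied.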
Introduction (2)
Earlier phylogenetic pattern initiatives:
–Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
–Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
–Incorporated into OrthoMCL-DB (Chen et al., 2006)
All applied to proteins, not to genes!
PhyloPat: phylogenetic pattern analysis of eukaryotic genes
Method
Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms); less redundant
Originally performed on Ensembl (EnsMart) database v40: 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), from S. cer. to H. sap.
Makes use of the accurate Ensembl orthology pipeline (combination of BLAST, SW, MUSCLE and PHYML)
Single linkage clustering algorithm: creates orthologous groups containing ALL genes in Ensembl
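Single linkage clustering over pairwise orthologies can be sketched with a union-find structure: any two genes connected by a chain of orthology relations end up in the same group, and genes without orthologs form singleton groups. This is a sketch under assumptions, not the PhyloPat implementation; the gene IDs and the function name `single_linkage_groups` are hypothetical.

```python
from collections import defaultdict

def single_linkage_groups(genes, orthologies):
    """Cluster genes so that genes linked by a chain of orthologies share one group."""
    parent = {g: g for g in genes}

    def find(g):
        # Walk up to the root, halving the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in orthologies:
        # Genes that only appear in an orthology pair are registered here.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for g in parent:
        groups[find(g)].add(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical gene IDs and orthology pairs.
genes = ["hs1", "mm1", "dr1", "hs2", "mm2", "sc1"]
orthologies = [("hs1", "mm1"), ("mm1", "dr1"), ("hs2", "mm2")]
groups = single_linkage_groups(genes, orthologies)
print(groups)
```

Note that single linkage never splits a chain: `hs1` and `dr1` share a group here although only `mm1` links them, which is why the quality of the input orthologies matters.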
Results
446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species
Species ordered from ‘low’ ( ) to ‘high’ ( ), i.e. by approximate distance to human:
Can be queried in several ways
Output in HTML, Excel or plain text format
Web interface
Pattern/ID Search
Binary string: 0=absent, 1=present, *=absent/present
e.g. ‘00000******** ’: must be absent in non-chordata, must be present in all mammals
MySQL regular expression: e.g. ‘^0*1{10}0*$’ gives all genes that occur only in ten consecutive species
Input list of Ensembl/EMBL IDs (PhyloPat contains an EMBL to Ensembl mapping)
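Both query styles can be illustrated with ordinary regular expressions. The sketch below uses Python's `re` module rather than MySQL, a hypothetical set of 12-species patterns, and hypothetical helper names (`wildcard_to_regex`, `search`); only the expression `^0*1{10}0*$` is taken from the slide.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a 0/1/* pattern into an anchored regex ('*' matches either state)."""
    if not pattern or set(pattern) - set("01*"):
        raise ValueError("pattern may only contain 0, 1 and *")
    return re.compile("^" + pattern.replace("*", "[01]") + "$")

def search(patterns, regex):
    """Return the IDs of all groups whose pattern matches the regex."""
    return sorted(gid for gid, p in patterns.items() if regex.search(p))

# Hypothetical group IDs with patterns over 12 species.
patterns = {
    "G1": "00" + "1" * 10,          # absent in 2 species, present in the next 10
    "G2": "0000" + "1" * 6 + "00",  # present in 6 consecutive species
    "G3": "1" * 12,                 # present everywhere
    "G4": "0" * 8 + "1" * 4,        # present in the last 4 species only
}

# Wildcard query: absent in the first two species, present in the last two.
wildcard_hits = search(patterns, wildcard_to_regex("00" + "*" * 8 + "11"))
# Regex query from the slide: exactly ten consecutive species, nothing else.
run_hits = search(patterns, re.compile(r"^0*1{10}0*$"))

print(wildcard_hits)  # ['G1', 'G4']
print(run_hits)       # ['G1']
```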
Output
Phylogenetic Tree
Oligo-/Polypresent Genes
Oligopresent: present in only one/two species (oligo=few), e.g. ‘ ’
These two species should be closely related:
1. C. sav / C. int: 1737, div. 100 Mya (Boffelli et al., 2004)
2. T. nig / T. rub: 1572, div. 85 Mya (Yamanoue et al., 2006)
3. A. gam / A. aeg: 1058, div. 140 Mya (Service, 1993)
4. P. tro / H. sap: 887, div. 6 Mya (Glazko & Nei, 2003)
5. R. nor / M. mus: 713, div. 20 Mya (Springer et al., 2003)
Polypresent: present in all species, except for one/two (poly=many), e.g. ‘ ’
These two species should be related too; similar analysis possible
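A ranking of species pairs like the one above can be obtained by tallying, over all patterns, which two species are the only ones present. The following is a sketch with hypothetical patterns over four species; the function name `oligopresent_pairs` is an assumption.

```python
from collections import Counter

def oligopresent_pairs(patterns, species_order):
    """Count how often each species pair is the only pair in which a group is present."""
    counts = Counter()
    for pattern in patterns:
        present = [sp for sp, bit in zip(species_order, pattern) if bit == "1"]
        if len(present) == 2:
            counts[tuple(present)] += 1
    return counts

# Hypothetical species order and patterns.
species_order = ["C.sav", "C.int", "T.nig", "T.rub"]
patterns = ["1100", "1100", "0011", "1000", "1111", "0110"]

counts = oligopresent_pairs(patterns, species_order)
for pair, n in counts.most_common():
    print(pair, n)
```

The polypresent case can be handled by the same function after inverting every pattern, so that the one or two missing species become the ones counted.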
Omnipresent genes
Omnipresent: present in all 21 species (omni=all): ‘ ’
Currently 1001 omnipresent groups
Tend to have very general/important functions, mostly involved in transcription/translation
FatiGO analysis
FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)
Analysis of all human genes in the output with just one mouse click, e.g. omnipresent genes:
Other possibilities
Anti-correlating patterns: e.g. ‘ ’ and ‘ ’; the corresponding genes could be completely different, or very similar (analogous)!
Easy homology-inferred functional annotation (using information from other genes in the same lineage)
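Anti-correlating patterns are bitwise complements of each other, so candidate pairs can be found by inverting each pattern and looking it up. A minimal sketch with hypothetical group IDs and patterns (`complement` and `anticorrelating_pairs` are illustrative names):

```python
def complement(pattern):
    """Swap presence and absence in a 0/1 pattern."""
    return pattern.translate(str.maketrans("01", "10"))

def anticorrelating_pairs(patterns):
    """Return all pairs of group IDs whose patterns are exact complements."""
    by_pattern = {}
    for gid, p in patterns.items():
        by_pattern.setdefault(p, []).append(gid)
    pairs = []
    for p, ids in by_pattern.items():
        comp = complement(p)
        # 'p < comp' makes sure every complementary pair is reported only once.
        if p < comp and comp in by_pattern:
            pairs.extend((a, b) for a in ids for b in by_pattern[comp])
    return sorted(pairs)

# Hypothetical groups.
patterns = {"G1": "110010", "G2": "001101", "G3": "111000"}
pairs = anticorrelating_pairs(patterns)
print(pairs)  # [('G2', 'G1')]
```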
Case study: Hox genes (1)
Hox genes determine where limbs and other body segments will grow in a developing embryo
Should exist mostly in vertebrates
Expansion in teleost fish species (, 8-11); seven Hox clusters instead of the mammalian four
Search Ensembl database for human genes with term ‘hox’ in annotation
44 genes found -> enter in PhyloPat -> 32 groups found (PP######)
Case study: Hox genes (2)
Columns: PPID, # genes per species, phylogenetic pattern, gene name(s)
PP MSX1, MSX2
PP HOXC4
PP TLX1, TLX2, TLX3
PP HOXB8, HOXC8, HOXD8
PP HOXD11
PP HOXA10
PP HOXC13, HOXD13
PP HOXA1, HOXB1
PP HOXB4
PP HOXA5
PP HOXB2
PP HOXD3
PP HOXA9
PP HOXA3
PP HOXC12
PP HOXD4
PP HOXC11
PP HOXA13
PP HOXB5
PP HOXB3
PP HOXD10
PP HOXA2
PP HOXA6, HOXB6, HOXC6
PP HOXA4
PP HOXB9, HOXC9, HOXD9
PP HOXA11
PP HOXA7, HOXB7
PP HOXC5
PP HOXC10
PP HOXD1
PP HOXD12
PP HOXB13
Case study: Hox genes (3)
PPID(s) | name | cl.A | cl.B | cl.C | cl.D | first sp. | position
PP053829, | HOX1 | HOXA1 | HOXB1 | | HOXD1 | T. nigrov. | anterior
PP053847, | HOX2 | HOXA2 | HOXB2 | | | T. nigrov. | anterior
PP053836,053845, | HOX3 | HOXA3 | HOXB3 | | HOXD3 | T. nigrov. | PG3
PP053832,053844, | HOX5 | HOXA5 | HOXB5 | HOXC5 | | T. nigrov. | central
PP | HOX6 | HOXA6 | HOXB6 | HOXC6 | | T. nigrov. | central
PP053835, | HOX9 | HOXA9 | HOXB9 | HOXC9 | HOXD9 | T. nigrov. | posterior
PP053827,084287, | HOX10 | HOXA10 | | HOXC10 | HOXD10 | T. nigrov. | posterior
PP053858,053840, | HOX11 | HOXA11 | | HOXC11 | HOXD11 | T. nigrov. | posterior
PP053838, | HOX12 | | | HOXC12 | HOXD12 | T. nigrov. | posterior
PP053842,089685, | HOX13 | HOXA13 | HOXB13 | HOXC13 | HOXD13 | T. nigrov. | posterior
PP053853,053830,024984, | HOX4 | HOXA4 | HOXB4 | HOXC4 | HOXD4 | A. gamb. | central
PP | TLX | TLX1 | TLX2 | TLX3 | | A. gamb. |
PP | HOX7 | HOXA7 | HOXB7 | | | G. acul. | central
PP | HOX8 | | HOXB8 | HOXC8 | HOXD8 | C. intest. | central
PP | MSX | MSX1 | MSX2 | | | C. eleg. |
Labels (in slide order): ‘First’ vertebrate, Non-vertebrate, Non-vertebrate, Non-vertebrate, Vertebrate
Conclusions
PhyloPat: quick and easy tool for phylogenetic pattern search on complete Ensembl database
Also usable for study of lineage-specific expansions of genes
Updated immediately with each new Ensembl version:
–v41: 5 new species:
–v42: 1 new species:
–v43: 4 new species: + extra option: gene neighborhood
Gene neighborhood Equal color = belonging to same orthologous group Conservation of gene order = functionally related
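One simple way to look for conserved gene order is to compare which orthologous groups sit next to each other on a chromosome in two species. This is only a sketch of that idea, not the PhyloPat gene neighborhood option itself; the group IDs and function names are hypothetical.

```python
def adjacent_group_pairs(group_order):
    """Unordered pairs of different orthologous groups that are direct neighbours."""
    return {
        frozenset(pair)
        for pair in zip(group_order, group_order[1:])
        if pair[0] != pair[1]
    }

def conserved_neighbours(order_a, order_b):
    """Neighbouring group pairs found in both species, regardless of orientation."""
    return adjacent_group_pairs(order_a) & adjacent_group_pairs(order_b)

# Hypothetical orthologous group IDs, listed in chromosomal order for two species.
species_a = ["G1", "G2", "G3", "G4"]
species_b = ["G9", "G3", "G2", "G1"]

conserved = conserved_neighbours(species_a, species_b)
print(sorted(sorted(pair) for pair in conserved))  # [['G1', 'G2'], ['G2', 'G3']]
```

Using unordered pairs means an inverted segment (as in `species_b`) still counts as conserved neighbourhood.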
Where to find PhyloPat
Web interface: (accessible through and
Publication: Hulsen T., Groenen P.M.A., de Vlieg J. BMC Bioinformatics 2006, 7:
Powered by Ensembl:
Chicken-human immunogenomics project (part of Biorange SP3.2.2)
Goals:
- study evolution of genes/proteins involved in immune system, from chicken to human
- check for expansions and deletions in families
- zoom in to interesting families
In collaboration with Martien Groenen, Hinri Kerstens (Animal Sciences Group, Wageningen UR)
Proteins -> Genes
Earlier initiatives: based on proteins (Protein World, IPI, ParAlign, MCL)
Disadvantages:
–Large-scale computations needed for orthology determination
–Difficult to study lineage-specific expansions because of alternative transcripts, isoforms
–Difficult to connect to WUR synteny data
--> Genes: connect to PhyloPat tool
PhyloPat (dis)advantages
Advantages:
–Uses the accurate orthology determination of Ensembl (BLAST/SW, MUSCLE, PHYML); single linkage clustering done by ourselves
–No alternative transcripts, isoforms
–Easy to connect to WUR synteny data
–26 species, from S.cer. to H.sap.
Disadvantage:
–Genome information sometimes incomplete (but Pre-versions and low coverage genomes are not included)
Immunophyle
Application to immune system: parse through PhyloPat set using IRIS database
Take all HUGO IDs from IRIS database, input in PhyloPat (v41) -> 585 immunologic lineages containing 18,933 genes from 26 species
Divided into 22 immunologic categories from IRIS database (adaptive immunity, innate immunity, inflammation, chemotaxis, etc.)
Connected to GO, InterPro, KEGG, etc. by FatiGO
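The selection step can be sketched as a filter: keep every orthologous group that contains at least one gene whose HUGO symbol is on the immune gene list. The group IDs, gene IDs, symbol mapping and function name below are hypothetical stand-ins for the PhyloPat set and the IRIS list.

```python
def immunologic_lineages(groups, hugo_of_gene, immune_hugo_ids):
    """Keep groups holding at least one gene with a HUGO symbol on the immune list."""
    selected = {}
    for gid, genes in groups.items():
        hits = sorted(
            {hugo_of_gene[g] for g in genes if hugo_of_gene.get(g) in immune_hugo_ids}
        )
        if hits:
            selected[gid] = hits
    return selected

# Hypothetical data: group -> genes, human gene -> HUGO symbol, immune symbol list.
groups = {"G1": ["hs_a", "mm_a"], "G2": ["hs_b"], "G3": ["hs_c", "hs_d", "bt_c"]}
hugo_of_gene = {"hs_a": "TLR2", "hs_b": "ACTB", "hs_c": "IFNA10", "hs_d": "IFNA13"}
immune_hugo_ids = {"TLR2", "IFNA10", "IFNA13"}

lineages = immunologic_lineages(groups, hugo_of_gene, immune_hugo_ids)
print(lineages)  # {'G1': ['TLR2'], 'G3': ['IFNA10', 'IFNA13']}
```

Each selected lineage keeps all of its genes from all species, which is why a few hundred lineages can contain many thousands of genes.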
Immunophyle
Immunologic categories
Columns: Nr., Abbrev., Description, # HUGO IDs, # ImmunoPhyle lineages, # Genes
1 | InImm | Innate Immunity
2 | Inflm | Inflammation
3 | Chmtx | Chemotaxis
4 | Phago | Phagocytosis
5 | Compl | Complement
6 | Cy_Ch | Cytokines and Chemokines
7 | AdImm | Adaptive Immunity
8 | ClRsp | Cellular Response
9 | HmRsp | Humoral Response
10 | BMImm | Barrier and Mucosal Immunity
11 | Devlp | Development of Immune System
12 | AgPrc | Antigen Processing
13 | PtSig | Immune Pathway or Signalling
14 | Recpt | Receptor
15 | IndIm | Induced by Immunomodulator
16 | ImDef | Involved in Immunodeficiency
17 | AutIm | Involved in Autoimmunity
18 | ExpIT | Expressed Primarily in Immune Tissues
19 | Other | Other
20 | InKil | Innate NK Killing
21 | RlDis | Related to Disease
22 | Coagl | Coagulation
All | All immunologic lineages
Categories & species
Columns: Cat :: Category (# lineages, # genes), Lin, Sc, Ce, Ag, Aa, Dm, Cs, Ci, Tn, Tr, Ol, Ga, Dr, Xt, Gg, Md, Dn, Bt, Cf, Et, La, Rn, Mm, Oc, Mm, Pt, Hs, Total
All :: All immunologic lineages (585, 18933)
InImm :: Innate Immunity (272, 8640)
Inflm :: Inflammation (117, 4568)
Chmtx :: Chemotaxis (54, 2374)
Phago :: Phagocytosis (17, 890)
Compl :: Complement (33, 958)
Cy_Ch :: Cytokines and Chemokines (109, 2947)
AdImm :: Adaptive Immunity (140, 4983)
ClRsp :: Cellular Response (63, 2358)
HmRsp :: Humoral Response (34, 1087)
BMImm :: Barrier and Mucosal Immunity (18, 713)
Devlp :: Development of Immune System (50, 2044)
AgPrc :: Antigen Processing (31, 830)
PtSig :: Immune Pathway or Signalling (224, 8245)
Recpt :: Receptor (118, 3506)
IndIm :: Induced by Immunomodulator (86, 3487)
ImDef :: Involved in Immunodeficiency (30, 1013)
AutIm :: Involved in Autoimmunity (19, 530)
ExpIT :: Expressed Primarily in Immune Tissues (134, 3970)
Other :: Other (43, 1843)
InKil :: Innate NK Killing (33, 1015)
RlDis :: Related to Disease (91, 3141)
Coagl :: Coagulation (51, 2624)
Largest expansion(s) for each species
Nr. | Species | # | IP | HUGO
1 | S.cer. | 4 | IP017 | HSPA1A, HSPA1B, HSPA1L, HSPA8
2 | C.ele. | 7 | IP008 | CPB2
 | | | IP090 | NR3C1
3 | A.gam. | 10 | IP008 | CPB2
4 | A.aeg. | 21 | IP008 | CPB2
5 | D.mel. | 14 | IP008 | CPB2
6 | C.sav. | 9 | IP069 | TRAF3, TRAF4, TRAF5
7 | C.int. | 10 | IP069 | TRAF3, TRAF4, TRAF5
 | | | IP162 | C6
8 | T.nig. | 15 | IP033 | MAPK8, MAPK10, MAPK11, MAPK12, MAPK13, MAPK14
9 | T.rub. | 12 | IP035 | SLC4A1
10 | O.lat. | 13 | IP035 | SLC4A1
11 | G.acu. | 14 | IP061 | ADORA1, ADORA2A, ADORA3, NCR2, PIGR, TREM1
12 | D.rer. | 19 | IP047 | A2M
13 | X.tro. | 16 | IP229 | SIGLEC5, SIGLEC6, SIGLEC7, SIGLEC8, SIGLEC9, SIGLEC10, SIGLEC11
14 | G.gal. | 9 | IP035 | SLC4A1
 | | | IP047 | A2M
15 | M.dom. | 54 | IP463 | CEACAM1, CEACAM5, CEACAM6, CEACAM8
16 | D.nov. | 10 | IP061 | ADORA1, ADORA2A, ADORA3, NCR2, PIGR, TREM1
 | | | IP116 | LYZ
 | | | IP129 | ANKRD15
 | | | IP229 | SIGLEC5, SIGLEC6, SIGLEC7, SIGLEC8, SIGLEC9, SIGLEC10, SIGLEC11
17 | B.tau. | 46 | IP377 | IFNA10, IFNA13, IFNA14, IFNA16, IFNA17, IFNA21
18 | C.fam. | 14 | IP377 | IFNA10, IFNA13, IFNA14, IFNA16, IFNA17, IFNA21
19 | E.tel. | 12 | IP061 | ADORA1, ADORA2A, ADORA3, NCR2, PIGR, TREM1
20 | L.afr. | 10 | IP035 | SLC4A1
 | | | IP061 | ADORA1, ADORA2A, ADORA3, NCR2, PIGR, TREM1
 | | | IP147 | MS4A12, MS4A4A, MS4A6A, MS4A6E, MS4A8B
21 | R.nor. | 17 | IP530 | KLRA1
22 | M.mus. | 23 | IP061 | ADORA1, ADORA2A, ADORA3, NCR2, PIGR, TREM1
23 | O.cun. | 18 | IP291 | HMGB2
24 | M.mul. | 16 | IP377 | IFNA10, IFNA13, IFNA14, IFNA16, IFNA17, IFNA21
25 | P.tro. | 17 | IP377 | IFNA10, IFNA13, IFNA14, IFNA16, IFNA17, IFNA21
26 | H.sap. | 16 | IP061 | ADORA1, ADORA2A, ADORA3, NCR2, PIGR, TREM1
 | | | IP377 | IFNA10, IFNA13, IFNA14, IFNA16, IFNA17, IFNA21
 | | | IP463 | CEACAM1, CEACAM5, CEACAM6, CEACAM8
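A per-species overview of this kind can be computed from the gene counts per lineage: for each species, take the highest count and report every lineage that reaches it, so ties yield several lineages. The sketch assumes "largest expansion" simply means the highest gene count; the lineage names, species and counts are hypothetical.

```python
def largest_expansions(counts, species_order):
    """For each species, return (highest gene count, lineages reaching that count)."""
    result = {}
    for sp in species_order:
        best = max((per_species.get(sp, 0) for per_species in counts.values()), default=0)
        if best == 0:
            result[sp] = (0, [])
            continue
        lineages = sorted(
            lin for lin, per_species in counts.items() if per_species.get(sp, 0) == best
        )
        result[sp] = (best, lineages)
    return result

# Hypothetical lineage -> {species: number of genes}.
counts = {
    "L1": {"A": 4, "B": 1},
    "L2": {"A": 2, "B": 3},
    "L3": {"A": 4},
}
expansions = largest_expansions(counts, ["A", "B", "C"])
print(expansions)
```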
Interleukin evolution: receptors & ligands
Columns: IPID, Sc, Ce, Ag, Aa, Dm, Cs, Ci, Tn, Tr, Ol, Ga, Dr, Xt, Gg, Md, Dn, Bt, Cf, Et, La, Rn, Mm, Oc, Mm, Pt, Hs, HUGO
IP IL6R
IP IL13RA2
IP IL23R
IP IL21R
IP IL12RB2
IP IL7R
IP IL28RA
IP IL20
IP IL8RA IL8RB
IP IL8
IP ILF3
IP IL20RA IL22RA2
IP IL1R2
IP IL4R
IP IL6
IP IL13RA1
IP IL22RA1
IP IL9
IP IL5RA
IP IL21
IP IL2RA
IP IL22 IL26
IP IL12RB1
IP IL5
IP IL1F6 IL1F9
IP IL27
IP IL24
IP IL2
IP IL7
IP IL1F8
IP IL23A
IP IL27RA
IP IL3
IP IL4
Example pathway: Toll-like receptors
GeneGo MetaCore, canonical pathway
Interspecies differences can possibly be explained by looking at the number of orthologs for each gene in the pathway
Example pathway: Toll-like receptors
Check ImmunoPhyle for each gene involved in the TLR pathway:
Green: ‘first’ occurrence
Red: deletion
Columns: Lineage, Sc, Ce, Ag, Aa, Dm, Cs, Ci, Tn, Tr, Ol, Ga, Dr, Xt, Gg, Md, Dn, Bt, Cf, Et, La, Rn, Mm, Oc, Mm, Pt, Hs, HUGO
IP TLR1/6/10
IP TLR2
IP TLR3
IP TLR4
IP TLR5
IP TLR7/8
IP TLR9
IP IRAK1
IP IRAK2
IP IRAK3/IRAK-M
IP IRAK4
IP IL4
IP IL6
IP IL8
IP LBP
IP LTA
IP TOLLIP
IP NFKB1, NFKB2, NFKBIA
IP TRAF6
IP JUN/JUNB/JUND
IP MAP3K7/TAK1
IP MAP3K7IP2
IP MAP3K14
IP MAP4K4
IP MAP2K3
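The green/red marking can be reproduced from the per-species gene counts of a lineage: the first species with a non-zero count is the ‘first’ occurrence, and any zero after that point is a candidate deletion. This is a sketch with a hypothetical count vector and function name; a zero may also reflect incomplete genome information rather than a real loss.

```python
def annotate_presence(counts):
    """Label each species position as 'first', 'deletion' or '' (no mark)."""
    labels = []
    seen = False
    for n in counts:
        if n > 0 and not seen:
            labels.append("first")
            seen = True
        elif n == 0 and seen:
            labels.append("deletion")
        else:
            labels.append("")
    return labels

# Hypothetical gene counts for one lineage, species ordered from 'low' to 'high'.
counts = [0, 0, 1, 2, 0, 1]
labels = annotate_presence(counts)
print(labels)  # ['', '', 'first', '', 'deletion', '']
```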
Current/future directions
Differences in immune system between model organism and man cannot be explained only by looking at numbers of orthologs -> connect to literature, expression data, protein interaction data, structural data
Zoom in to families with help of immunologists
Acknowledgements Peter Groenen Wilco Fleuren … and others (Martien Groenen, Hinri Kerstens, Erik Franck, Arnold Kuzniar, etc.) for suggestions!