Biorange Meeting 2007-04-03 PhyloPat phylogenetic pattern analysis of eukaryotic genes & Immunophyle its application on the evolution of the immune system.


1 Biorange Meeting 2007-04-03
PhyloPat: phylogenetic pattern analysis of eukaryotic genes & Immunophyle: its application on the evolution of the immune system
Tim Hulsen

2 Introduction (1)
Phylogenetic patterns show the presence/absence of genes over a certain set of species, e.g. for 10 species: 0011101011
Very useful for all kinds of evolutionary analyses:
–Origin of certain genes
–Deletion of certain genes
–Clustering of genes with similar patterns: likely to have similar function / be in the same pathway
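A minimal sketch of how such a pattern string can be built, assuming a fixed species ordering and a set of species in which the gene was found. The function name and the `sp1`..`sp10` species labels are hypothetical placeholders, not part of PhyloPat.

```python
def phylogenetic_pattern(species_order, present_in):
    """Return a 0/1 string with one character per species, in the given order."""
    present = set(present_in)
    return "".join("1" if sp in present else "0" for sp in species_order)


# Hypothetical 10-species ordering; the names are placeholders, not Ensembl species.
species = [f"sp{i}" for i in range(1, 11)]
gene_found_in = {"sp3", "sp4", "sp5", "sp7", "sp9", "sp10"}

print(phylogenetic_pattern(species, gene_found_in))  # 0011101011
```

The species order is what gives the string its meaning, so every pattern in a collection has to be built against the same ordering.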

3 Introduction (2)
Earlier phylogenetic pattern initiatives:
–Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
–Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
–Incorporated into OrthoMCL-DB (Chen et al., 2006)
All applied to proteins, not to genes!
-> PhyloPat: phylogenetic pattern analysis of eukaryotic genes

4 Method
Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms); less redundant
Originally performed on Ensembl (EnsMart) database v40: 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), from S. cer. to H. sap.
Makes use of the accurate Ensembl orthology pipeline (combination of BLAST, SW, MUSCLE and PHYML)
Single linkage clustering algorithm: creates orthologous groups containing ALL genes in Ensembl
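The single linkage step can be sketched as a union-find over pairwise orthology relations: any two genes connected by a chain of orthologies end up in the same group, and genes without orthologs form singleton groups. This is an illustrative sketch under that assumption, not the PhyloPat implementation; the gene IDs are invented.

```python
from collections import defaultdict


def single_linkage_groups(genes, ortholog_pairs):
    """Group genes so that any chain of orthology links shares one group."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in ortholog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for g in genes:
        groups[find(g)].add(g)
    return sorted(sorted(members) for members in groups.values())


# Invented gene IDs: hs1-mm1 and mm1-dr1 are orthologous, hs2 and sc1 have no orthologs.
genes = ["hs1", "mm1", "dr1", "hs2", "sc1"]
pairs = [("hs1", "mm1"), ("mm1", "dr1")]
print(single_linkage_groups(genes, pairs))
# [['dr1', 'hs1', 'mm1'], ['hs2'], ['sc1']]
```

Single linkage is permissive: one spurious orthology link is enough to merge two otherwise separate groups, which is why the quality of the underlying orthology calls matters.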

5 Results
446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species
Species ordered from ‘low’ to ‘high’, i.e. by approximate distance to human
Can be queried in several ways
Output in HTML, Excel or plain text format

6 Web interface http://www.cmbi.ru.nl/phylopat

7 Pattern/ID Search
Binary string: 0=absent, 1=present, *=absent/present
e.g. ‘00000********11111111’ -> must be absent in non-chordata, must be present in all mammals
MySQL regular expression:
e.g. ‘^0*1{10}0*$’ -> gives all genes that occur only in ten consecutive species
Input list of Ensembl/EMBL IDs (PhyloPat contains EMBL to Ensembl mapping)
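Both query styles on this slide can be tried out locally with a small sketch. It assumes that `*` simply means "either 0 or 1", and it uses Python's `re` module rather than MySQL; for an expression as simple as the one on the slide the two syntaxes coincide. The group IDs and patterns below are made up.

```python
import re


def wildcard_to_regex(query):
    """Translate a 0/1/* query into an anchored regular expression."""
    if not query or set(query) - set("01*"):
        raise ValueError("query may contain only '0', '1' and '*'")
    return re.compile("^" + query.replace("*", "[01]") + "$")


def search(patterns, regex):
    """Return the IDs whose pattern matches the compiled regex."""
    return [pid for pid, pattern in patterns.items() if regex.search(pattern)]


# Toy 21-species patterns with made-up group IDs.
patterns = {
    "G1": "0" * 7 + "1" * 14,            # absent in the first 7 species, present in the rest
    "G2": "0" + "1" * 20,                # present from the second species on
    "G3": "0" * 5 + "1" * 10 + "0" * 6,  # present in ten consecutive species only
}

print(search(patterns, wildcard_to_regex("00000********11111111")))  # ['G1']
print(search(patterns, re.compile(r"^0*1{10}0*$")))                  # ['G3']
```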

8 Output

9 Phylogenetic Tree

10 Oligo-/Polypresent Genes
Oligopresent: present in only one/two species (oligo=few), e.g. ‘000000010000000000100’
These two species should be highly related:
1. C. sav, C. int: 1737, div. 100 Mya (Boffelli et al., 2004)
2. T. nig, T. rub: 1572, div. 85 Mya (Yamanoue et al., 2006)
3. A. gam, A. aeg: 1058, div. 140 Mya (Service, 1993)
4. P. tro, H. sap: 887, div. 6 Mya (Glazko & Nei, 2003)
5. R. nor, M. mus: 713, div. 20 Mya (Springer et al., 2003)
Polypresent: present in all species, except for one/two (poly=many), e.g. ‘111110111110111111111’
These two species should be related too; similar analysis possible

11 Omnipresent genes
Omnipresent: present in all 21 species (omni=all): ‘111111111111111111111’
Currently 1001 omnipresent groups
Tend to have very general/important functions, mostly involved in transcription/translation
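The oligopresent, polypresent and omnipresent classes follow directly from counting the 1s in a pattern. A minimal sketch, assuming the definitions given on the slides (one or two species present, one or two species absent, all species present); the function name and the "other" label are illustrative choices.

```python
def classify(pattern):
    """Label a presence/absence pattern by how many species carry the gene."""
    n = len(pattern)
    present = pattern.count("1")
    if present == n:
        return "omnipresent"
    if present in (1, 2):
        return "oligopresent"
    if present in (n - 1, n - 2):
        return "polypresent"
    return "other"


print(classify("000000010000000000100"))  # oligopresent
print(classify("111110111110111111111"))  # polypresent
print(classify("1" * 21))                 # omnipresent
```

For very short patterns (four species or fewer) the oligopresent and polypresent ranges overlap; the order of the checks above then decides, which is harmless for the 21-species set used here.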

12 FatiGO analysis
FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)
Analysis of all human genes in the output with just one mouse click
e.g. omnipresent genes:

13 Other possibilities
Anti-correlating patterns: e.g. ‘001111100011000000000’ and ‘110000011100111111111’ -> could be completely different, or very similar (analogous)!
Easy homology-inferred functional annotation (using information from other genes in the same lineage)
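Anti-correlation between two patterns can be checked by comparing one pattern with the complement of the other. The sketch below also reports a simple score (the fraction of species in which exactly one of the two groups is present); that score is an illustrative choice and is not taken from PhyloPat.

```python
def complement(pattern):
    """Swap 0 and 1 in a presence/absence pattern."""
    return pattern.translate(str.maketrans("01", "10"))


def anti_correlation(a, b):
    """Fraction of species in which exactly one of the two groups is present."""
    if len(a) != len(b):
        raise ValueError("patterns must cover the same species")
    return sum(x != y for x, y in zip(a, b)) / len(a)


a = "001111100011000000000"
b = "110000011100111111111"
print(complement(a) == b)      # True
print(anti_correlation(a, b))  # 1.0
```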

14 Case study: Hox genes (1)
Hox genes determine where limbs and other body segments will grow in a developing embryo
Should exist mostly in vertebrates
Expansion in teleost fish species (species 8-11); seven Hox clusters instead of the mammalian four
Search Ensembl database for human genes with term ‘hox’ in annotation
44 genes found -> enter in PhyloPat -> 32 groups found (PP######)

15 Case study: Hox genes (2)
PPID      # genes per species    phylogenetic pattern   gene name(s)
PP022041  011111136562233233222  011111111111111111111  MSX1, MSX2
PP024984  001000011111001111111  001000011111001111111  HOXC4
PP027791  001110023343233333333  001110011111111111111  TLX1, TLX2, TLX3
PP049478  000000221153112322223  000000111111111111111  HOXB8, HOXC8, HOXD8
PP053824  000000011120010101011  000000011110010101011  HOXD11
PP053827  000000022211111111111  000000011111111111111  HOXA10
PP053828  000000021111212122222  000000011111111111111  HOXC13, HOXD13
PP053829  000000063341122222222  000000011111111111111  HOXA1, HOXB1
PP053830  000000011110010111111  000000011110010111111  HOXB4
PP053832  000000021111011111111  000000011111011111111  HOXA5
PP053833  000000021110111111011  000000011110111111011  HOXB2
PP053834  000000031101011111111  000000011101011111111  HOXD3
PP053835  000000021110111111101  000000011110111111101  HOXA9
PP053836  000000021111111111111  000000011111111111111  HOXA3
PP053838  000000021110101111111  000000011110101111111  HOXC12
PP053839  000000011111111110111  000000011111111110111  HOXD4
PP053840  000000021111201011101  000000011111101011101  HOXC11
PP053842  000000043221111111111  000000011111111111111  HOXA13
PP053844  000000032231011111111  000000011111011111111  HOXB5
PP053845  000000021111111111011  000000011111111111011  HOXB3
PP053846  000000021121111111111  000000011111111111111  HOXD10
PP053847  000000022211111111111  000000011111111111111  HOXA2
PP053849  000000034151132333323  000000011111111111111  HOXA6, HOXB6, HOXC6
PP053853  000000011101111111011  000000011101111111011  HOXA4
PP053854  000000032252223133213  000000011111111111111  HOXB9, HOXC9, HOXD9
PP053858  000000011120011111111  000000011110011111111  HOXA11
PP070659  000000000121212222222  000000000111111111111  HOXA7, HOXB7
PP075622  000000000010001111111  000000000010001111111  HOXC5
PP084287  000000000001101111111  000000000001101111111  HOXC10
PP085049  000000000001011011111  000000000001011011111  HOXD1
PP087941  000000000000111011111  000000000000111011111  HOXD12
PP089685  000000000000111111111  000000000000111111111  HOXB13
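The two pattern columns in this table are related in a simple way: the phylogenetic pattern is the per-species gene-count string with every non-zero count collapsed to 1. A short sketch, assuming one digit per species as shown in the table; the helper names and the `threshold` parameter are hypothetical.

```python
def counts_to_pattern(counts):
    """Collapse a per-species gene-count string to a presence/absence pattern."""
    return "".join("0" if c == "0" else "1" for c in counts)


def first_present(pattern):
    """1-based index of the first species in which the group occurs, or None."""
    idx = pattern.find("1")
    return idx + 1 if idx >= 0 else None


def expanded_positions(counts, threshold=2):
    """1-based positions of species holding at least `threshold` copies."""
    return [i + 1 for i, c in enumerate(counts) if int(c) >= threshold]


counts = "000000221153112322223"  # PP049478 (HOXB8, HOXC8, HOXD8) from the table
pattern = counts_to_pattern(counts)
print(pattern)
print(first_present(pattern))
print(expanded_positions(counts, threshold=3))
```

The count string keeps the information needed to spot lineage-specific expansions, while the collapsed pattern is what the pattern search operates on.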

16 Case study: Hox genes (3)
PPID(s)                        name   cl.A    cl.B    cl.C    cl.D    first sp.   position
PP053829,085049                HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  anterior
PP053847,053833                HOX2   HOXA2   HOXB2   -       -       T. nigrov.  anterior
PP053836,053845,053834         HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  PG3
PP053832,053844,075622         HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  central
PP053849                       HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  central
PP053835,053854                HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  posterior
PP053827,084287,053846         HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  posterior
PP053858,053840,053824         HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  posterior
PP053838,087941                HOX12  -       -       HOXC12  HOXD12  T. nigrov.  posterior
PP053842,089685,053828         HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  posterior
PP053853,053830,024984,053839  HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    central
PP027791                       TLX    TLX1    TLX2    TLX3    -       A. gamb.    -
PP070659                       HOX7   HOXA7   HOXB7   -       -       G. acul.    central
PP049478                       HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  central
PP022041                       MSX    MSX1    MSX2    -       -       C. eleg.    -
First species: T. nigrov. = ‘first’ vertebrate; G. acul. = vertebrate; A. gamb., C. intest., C. eleg. = non-vertebrate

17 Conclusions
PhyloPat: quick and easy tool for phylogenetic pattern search on the complete Ensembl database
Also usable for the study of lineage-specific expansions of genes
Updated immediately with each new Ensembl version:
–v41: 5 new species
–v42: 1 new species
–v43: 4 new species
+ extra option: gene neighborhood

18 Gene neighborhood
Equal color = belonging to same orthologous group
Conservation of gene order = functionally related

19 Where to find PhyloPat
Web interface: http://www.cmbi.ru.nl/phylopat (accessible through www.cmbi.ru.nl and www.nbic.nl)
Publication: Hulsen T., Groenen P.M.A., de Vlieg J. BMC Bioinformatics 2006, 7: 398 http://www.biomedcentral.com/1471-2105/7/398
Powered by Ensembl: http://www.ensembl.org/info/about/ensembl_powered.html

20 Chicken-human immunogenomics project (part of Biorange SP3.2.2)
Goals:
- study evolution of genes/proteins involved in immune system, from chicken to human
- check for expansions and deletions in families
- zoom in to interesting families
In collaboration with Martien Groenen, Hinri Kerstens (Animal Sciences Group, Wageningen UR)

21 Proteins -> Genes
Earlier initiatives: based on proteins (Protein World, IPI, ParAlign, MCL)
Disadvantages:
–Large-scale computations needed for orthology determination
–Difficult to study lineage-specific expansions because of alternative transcripts, isoforms
–Difficult to connect to WUR synteny data
-> Genes: connect to PhyloPat tool

22 PhyloPat (dis)advantages
Advantages:
–Uses the accurate orthology determination of Ensembl (BLAST/SW, MUSCLE, PHYML), with single linkage clustering done by ourselves
–No alternative transcripts, isoforms
–Easy to connect to WUR synteny data
–26 species, from S.cer. to H.sap.
Disadvantage:
–Genome information sometimes incomplete (but Pre-versions and low coverage genomes are not included)

23 Immunophyle
Application to immune system: parse through PhyloPat set using IRIS database
Take all HUGO IDs from IRIS database, input in PhyloPat (v41) -> 585 immunologic lineages containing 18,933 genes from 26 species
Divided into 22 immunologic categories from IRIS database (adaptive immunity, innate immunity, inflammation, chemotaxis, etc.)
Connected to GO, InterPro, KEGG, etc. by FatiGO

24 Immunophyle http://www.cmbi.ru.nl/immunophyle

25 Immunologic categories
Nr.  Abbrev.  Description                            # HUGO IDs  # ImmunoPhyle lineages  # Genes
1    InImm    Innate Immunity                        638         272                     8640
2    Inflm    Inflammation                           314         117                     4568
3    Chmtx    Chemotaxis                             192         54                      2374
4    Phago    Phagocytosis                           37          17                      890
5    Compl    Complement                             62          33                      958
6    Cy_Ch    Cytokines and Chemokines               261         109                     2947
7    AdImm    Adaptive Immunity                      422         140                     4983
8    ClRsp    Cellular Response                      145         63                      2358
9    HmRsp    Humoral Response                       98          34                      1087
10   BMImm    Barrier and Mucosal Immunity           45          18                      713
11   Devlp    Development of Immune System           130         50                      2044
12   AgPrc    Antigen Processing                     148         31                      830
13   PtSig    Immune Pathway or Signalling           470         224                     8245
15   Recpt    Receptor                               246         118                     3506
16   IndIm    Induced by Immunomodulator             200         86                      3487
20   ImDef    Involved in Immunodeficiency           71          30                      1013
21   AutIm    Involved in Autoimmunity               44          19                      530
22   ExpIT    Expressed Primarily in Immune Tissues  332         134                     3970
23   Other    Other                                  107         43                      1843
25   InKil    Innate NK Killing                      82          33                      1015
26   RlDis    Related to Disease                     172         91                      3141
27   Coagl    Coagulation                            111         51                      2624
0    All      All immunologic lineages               1542        585                     18933

26 Categories & species
Cat :: Category (# lineages, # genes)  Lin Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mm Oc Mm Pt Hs Total
All :: All immunologic lineages (585, 18933)  58554156193211214219239876830824855101568668596974011219488708021070116311318181087115718933
InImm :: Innate Immunity (272, 8640)  2721751778981 933513553393514203042954663555664354163845175715393845355688640
Inflm :: Inflammation (117, 4568)  117133445574353552022001941972371791502272002672212151972633022711942652874568
Chmtx :: Chemotaxis (54, 2374)  544121824182228107118112122125966915790135121112103124147132861411512374
Phago :: Phagocytosis (17, 890)  1714910 8 46434147503231423451454642495844335153890
Compl :: Complement (33, 958)  330313771119454143 543731503658454834606255435459958
Cy_Ch :: Cytokines and Chemokines (109, 2947)  109211142018 122119124119148921201441062191731431331751871951431901942947
AdImm :: Adaptive Immunity (140, 4983)  140174437404859622122072042192531581702461883302602252232763153242443033194983
ClRsp :: Cellular Response (63, 2358)  636262023223641106101102100119789611693138112105104124148 1111371462358
HmRsp :: Humoral Response (34, 1087)  343988989484648454737404943605855506165766865721087
BMImm :: Barrier and Mucosal Immunity (18, 713)  1801109152420251727241811333068423832485847255257713
Devlp :: Development of Immune System (50, 2044)  50518232523222910990899612464721067410910810386114122116921081172044
AgPrc :: Antigen Processing (31, 830)  3138911 101234313638562225394049353736394063355457830
PtSig :: Immune Pathway or Signalling (224, 8245)  2241363708781931024003813823904803012964463024544153713444595084893374805018245
Recpt :: Receptor (118, 3506)  118218162018 241481511501581871251241651412051911701542272402261562312413506
IndIm :: Induced by Immunomodulator (86, 3487)  867232825402932172163159175200129122171130224184154 1982181971511932093487
ImDef :: Involved in Immunodeficiency (30, 1013)  304815129182844 413864354248346145 435456585256591013
AutIm :: Involved in Autoimmunity (19, 530)  190112615323242623321820251929302822293031293133530
ExpIT :: Expressed Primarily in Immune Tissues (134, 3970)  134112236273132421571601531731861371182491552382011791702572742601582612833970
Other :: Other (43, 1843)  4392832313723259982809010586788478988275749392967394991843
InKil :: Innate NK Killing (33, 1015)  331499668313728323544233849704952507488723880821015
RlDis :: Related to Disease (91, 3141)  916143228322534151143133144176115128159131190159153133170184 1451821903141
Coagl :: Coagulation (51, 2624)  515253644333133132123122124154114811241081411231211091451621391061381512624

27 Largest expansion(s) for each species
Nr.  Species  #   IP     HUGO
1    S.cer.   4   IP017  HSPA1A,HSPA1B,HSPA1L,HSPA8
2    C.ele.   7   IP008  CPB2
                  IP090  NR3C1
3    A.gam.   10  IP008  CPB2
4    A.aeg.   21  IP008  CPB2
5    D.mel.   14  IP008  CPB2
6    C.sav.   9   IP069  TRAF3,TRAF4,TRAF5
7    C.int.   10  IP069  TRAF3,TRAF4,TRAF5
                  IP162  C6
8    T.nig.   15  IP033  MAPK8,MAPK10,MAPK11,MAPK12,MAPK13,MAPK14
9    T.rub.   12  IP035  SLC4A1
10   O.lat.   13  IP035  SLC4A1
11   G.acu.   14  IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
12   D.rer.   19  IP047  A2M
13   X.tro.   16  IP229  SIGLEC5,SIGLEC6,SIGLEC7,SIGLEC8,SIGLEC9,SIGLEC10,SIGLEC11
14   G.gal.   9   IP035  SLC4A1
                  IP047  A2M
15   M.dom.   54  IP463  CEACAM1,CEACAM5,CEACAM6,CEACAM8
16   D.nov.   10  IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
                  IP116  LYZ
                  IP129  ANKRD15
                  IP229  SIGLEC5,SIGLEC6,SIGLEC7,SIGLEC8,SIGLEC9,SIGLEC10,SIGLEC11
17   B.tau.   46  IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
18   C.fam.   14  IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
19   E.tel.   12  IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
20   L.afr.   10  IP035  SLC4A1
                  IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
                  IP147  MS4A12,MS4A4A,MS4A6A,MS4A6E,MS4A8B
21   R.nor.   17  IP530  KLRA1
22   M.mus.   23  IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
23   O.cun.   18  IP291  HMGB2
24   M.mul.   16  IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
25   P.tro.   17  IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
26   H.sap.   16  IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
                  IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
                  IP463  CEACAM1,CEACAM5,CEACAM6,CEACAM8
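A table like this can be derived from per-lineage gene counts by taking, for each species, the lineage(s) with the highest count and keeping ties. The sketch below uses invented lineage IDs, species labels and counts; it only illustrates the selection step and is not the ImmunoPhyle code.

```python
def largest_expansions(lineage_counts, species):
    """For each species, return (max gene count, lineages reaching that count)."""
    result = {}
    for i, sp in enumerate(species):
        best = max(counts[i] for counts in lineage_counts.values())
        winners = sorted(
            lineage for lineage, counts in lineage_counts.items() if counts[i] == best
        )
        result[sp] = (best, winners)
    return result


# Invented example: three lineages, three species, one count per species.
species = ["spA", "spB", "spC"]
lineage_counts = {
    "L1": [4, 1, 0],
    "L2": [1, 7, 2],
    "L3": [0, 7, 2],
}

for sp, (count, lineages) in largest_expansions(lineage_counts, species).items():
    print(sp, count, lineages)
```

Keeping ties explains why some species in the table list more than one lineage for the same count.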

28 Interleukin evolution: receptors & ligands
Species order (one digit per species): Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mm Oc Mm Pt Hs
IPID   # genes per species         HUGO
IP184  00000001111111112111111111  IL6R
IP187  00000001111111111111211111  IL13RA2
IP188  00000001010101111111211111  IL23R
IP190  00000001111012111111111111  IL21R
IP192  00000001011111111111111111  IL12RB2
IP195  00000001101001111111111111  IL7R
IP214  00000001223101111100111111  IL28RA
IP232  00000001221212333332333333  IL20
IP254  00000003344320313431340444  IL8RA IL8RB
IP294  00000001111210111111001111  IL8
IP348  00000001111310111110211111  ILF3
IP350  00000001222202212212221222  IL20RA IL22RA2
IP401  00000000000111111111111111  IL1R2
IP405  00000000000100101111111111  IL4R
IP421  00000000000011111111110111  IL6
IP425  00000000000011111111211111  IL13RA1
IP428  00000000000001101101111111  IL22RA1
IP433  00000000000001011111111111  IL9
IP438  00000000000001111111111111  IL5RA
IP442  00000000000001101111111111  IL21
IP447  00000000000001011111112111  IL2RA
IP448  00000000000001122225122222  IL22 IL26
IP481  00000000000000101111110111  IL12RB1
IP482  00000000000000111111111111  IL5
IP495  00000000000000302120222222  IL1F6 IL1F9
IP496  00000000000000111100111011  IL27
IP499  00000000000000101101111111  IL24
IP515  00000000000000011111111111  IL2
IP522  00000000000000011111111111  IL7
IP523  00000000000000011100010101  IL1F8
IP531  00000000000000001111111111  IL23A
IP534  00000000000000001111111111  IL27RA
IP536  00000000000000001111001111  IL3
IP539  00000000000000001111111111  IL4

29 Example pathway: Toll-like receptors
GeneGo MetaCore, canonical pathway
-> Interspecies differences can possibly be explained by looking at the number of orthologs for each gene in the pathway

30 Example pathway: Toll-like receptors
Species order (one digit per species): Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mm Oc Mm Pt Hs
Lineage  # genes per species         HUGO
IP406    00000000000021122201223233  TLR1/6/10
IP308    00000001111102101100111111  TLR2
IP197    00000001111111111111111111  TLR3
IP430    00000000000001111011111111  TLR4
IP289    00000002223111101110101001  TLR5
IP359    00000001112411110122122122  TLR7/8
IP550    00000000000000001000011011  TLR9
IP458    00000000000000101111111011  IRAK1
IP475    00000000000000111111011111  IRAK2
IP397    00000000001300111111111111  IRAK3/IRAK-M
IP321    00000001111101111111111111  IRAK4
IP539    00000000000000001111111111  IL4
IP421    00000000000011111111111011  IL6
IP294    00000001111210111111001111  IL8
IP078    05000123334134534434444444  LBP
IP484    00000000000000101111111111  LTA
IP057    01100111110111101111111011  TOLLIP
IP045    01111224434433314433444244  NFKB1,NFKB2,NFKBIA
IP132    00101101111111122111112111  TRAF6
IP059    01111117756440303220232023  JUN/JUNB/JUND
IP145    00010111112111111111111111  MAP3K7/TAK1
IP158    00000112222322222221222222  MAP3K7IP2
IP222    00000001111111111111111101  MAP3K14
IP101    01111005455413433333333333  MAP4K4
IP434    00000000000001101101111011  MAP2K3
Check ImmunoPhyle for each gene involved in the TLR pathway:
Green: ‘first’ occurrence
Red: deletion
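The green/red marking described on this slide can be approximated from a count string: the first non-zero position is the ‘first’ occurrence, and any zero after it is a candidate deletion. This is a sketch under that assumption; a zero may equally reflect an incomplete genome annotation, so the result is a list of candidates only. The species labels and the count string are invented.

```python
def first_and_losses(counts, species):
    """Return (first species with the gene, later species lacking it)."""
    if len(counts) != len(species):
        raise ValueError("need exactly one count per species")
    present = [c != "0" for c in counts]
    if not any(present):
        return None, []
    first = present.index(True)
    losses = [species[i] for i in range(first + 1, len(species)) if not present[i]]
    return species[first], losses


# Invented six-species example, ordered from 'low' to 'high'.
species = ["sp1", "sp2", "sp3", "sp4", "sp5", "sp6"]
print(first_and_losses("001201", species))  # ('sp3', ['sp5'])
```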

31 Current/future directions
Differences in immune system between model organism and man cannot be explained only by looking at numbers of orthologs -> connect to literature, expression data, protein interaction data, structural data
Zoom in to families with help of immunologists

32 Acknowledgements
Peter Groenen
Wilco Fleuren
… and others (Martien Groenen, Hinri Kerstens, Erik Franck, Arnold Kuzniar, etc.) for suggestions!

