Biorange Meeting 2007-04-03 PhyloPat phylogenetic pattern analysis of eukaryotic genes & Immunophyle its application on the evolution of the immune system.


Biorange Meeting PhyloPat phylogenetic pattern analysis of eukaryotic genes & Immunophyle its application on the evolution of the immune system Tim Hulsen

Introduction (1)
Phylogenetic patterns show the presence/absence of genes over a certain set of species, e.g. for 10 species: [example pattern not preserved]
Very useful for all kinds of evolutionary analyses:
- Origin of certain genes
- Deletion of certain genes
- Clustering of genes with similar patterns: likely to have similar function / be in same pathway
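As a minimal sketch of the idea on this slide, a phylogenetic pattern can be built as a binary string with one position per species. The species names and the gene's occurrences below are placeholders invented for illustration; they are not taken from PhyloPat.

```python
# Hypothetical, ordered species list (10 placeholder species).
SPECIES = [f"sp{i}" for i in range(1, 11)]


def phylogenetic_pattern(present_in, species=SPECIES):
    """Return a string with '1' where the gene is present and '0' where absent."""
    present = set(present_in)
    return "".join("1" if name in present else "0" for name in species)


# A gene found only in sp4, sp5 and sp10 (made-up example).
pattern = phylogenetic_pattern({"sp4", "sp5", "sp10"})
print(pattern)
```

Keeping the species in a fixed order is what makes patterns from different genes comparable position by position.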

Introduction (2)
Earlier phylogenetic pattern initiatives:
- Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
- Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
- Incorporated into OrthoMCL-DB (Chen et al., 2006)
All applied to proteins, not to genes!
→ PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Method
- Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms); less redundant
- Originally performed on the Ensembl (EnsMart) database v40: 21 fully available genomes (i.e. no Pre! versions or low-coverage genomes), from S. cer. to H. sap.
- Makes use of the accurate Ensembl orthology pipeline (a combination of BLAST, SW, MUSCLE and PHYML)
- Single linkage clustering algorithm: creates orthologous groups containing ALL genes in Ensembl
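The single linkage step can be sketched with a union-find structure: any two genes connected by a chain of orthology relations end up in the same group, and genes without orthologs remain as singleton groups. This is only an illustrative sketch, not the PhyloPat implementation; the gene identifiers and orthology pairs are invented.

```python
def single_linkage_groups(genes, orthologies):
    """Group genes so that every orthology pair links its two genes together."""
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Walk up to the root, compressing the path along the way.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in orthologies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for gene in genes:
        groups.setdefault(find(gene), set()).add(gene)
    # Sort for a stable, reproducible order of groups.
    return sorted(groups.values(), key=sorted)


# Hypothetical genes and pairwise orthologies.
genes = ["hsa1", "mmu1", "gga1", "hsa2", "dme2", "sce3"]
orthologies = [("hsa1", "mmu1"), ("mmu1", "gga1"), ("hsa2", "dme2")]
print(single_linkage_groups(genes, orthologies))
```

Single linkage is permissive: one orthology relation is enough to merge two groups, so every gene is assigned to exactly one group.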

Results
- 446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species
- Species ordered from ‘low’ ( ) to ‘high’ ( ), i.e. approximate distance to human:
- Can be queried in several ways
- Output in HTML, Excel or plain text format

Web interface

Pattern/ID Search
- Binary string: 0=absent, 1=present, *=absent/present, e.g. ‘00000******** ’: → must be absent in non-chordata, must be present in all mammals
- MySQL regular expression, e.g. ‘^0*1{10}0*$’ → gives all genes that occur only in ten consecutive species
- Input list of Ensembl/EMBL IDs (PhyloPat contains an EMBL to Ensembl mapping)
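The two query styles above can be sketched in Python, assuming patterns are stored as plain strings of 0s and 1s. The helper names are invented for illustration. The regular expression from the slide uses only syntax (`^`, `*`, `{10}`, `$`) that Python's `re` module interprets the same way as MySQL.

```python
import re


def wildcard_to_regex(query):
    """Translate a 0/1/* query into a compiled regex ('*' = absent or present)."""
    translation = {"0": "0", "1": "1", "*": "[01]"}
    return re.compile("".join(translation[char] for char in query))


def matches(query, pattern):
    """True if the stored pattern satisfies the wildcard query at every position."""
    return wildcard_to_regex(query).fullmatch(pattern) is not None


# The regular expression quoted on the slide: exactly ten consecutive species.
TEN_CONSECUTIVE = re.compile(r"^0*1{10}0*$")

# Made-up 21-species pattern: absent in 5, present in 10, absent in 6.
example = "0" * 5 + "1" * 10 + "0" * 6
print(matches("00**1", "00011"), bool(TEN_CONSECUTIVE.match(example)))
```

A query containing any character other than `0`, `1` or `*` raises a `KeyError` in this sketch, which is a deliberate way to reject malformed input early.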

Output

Phylogenetic Tree

Oligo-/Polypresent Genes
Oligopresent: present in only one/two species (oligo=few), e.g. ‘ ’
These two species should be highly related:
Nr.  Species pair     Count  Divergence  Reference
1.   C. sav / C. int  1737   100 Mya     Boffelli et al., 2004
2.   T. nig / T. rub  1572   85 Mya      Yamanoue et al., 2006
3.   A. gam / A. aeg  1058   140 Mya     Service, 1993
4.   P. tro / H. sap  887    6 Mya       Glazko & Nei, 2003
5.   R. nor / M. mus  713    20 Mya      Springer et al., 2003
Polypresent: present in all species, except for one/two (poly=many), e.g. ‘ ’
These two species should be related too; similar analysis possible

Omnipresent genes
- Omnipresent: present in all 21 species (omni=all): ‘ ’
- Currently 1001 omnipresent groups
- Tend to have very general/important functions, mostly involved in transcription/translation
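The oligopresent, polypresent and omnipresent definitions can be expressed as a small classifier over pattern strings. The function name and the example patterns are illustrative assumptions; the thresholds (one or two species) follow the definitions given on the slides.

```python
def classify_pattern(pattern):
    """Classify a 0/1 pattern as omni-, oligo- or polypresent, or 'other'."""
    n_present = pattern.count("1")
    n_absent = len(pattern) - n_present
    if n_absent == 0:
        return "omnipresent"    # present in all species
    if 1 <= n_present <= 2:
        return "oligopresent"   # present in only one or two species
    if 1 <= n_absent <= 2:
        return "polypresent"    # absent in only one or two species
    return "other"


# Made-up 21-species patterns.
for example in ("1" * 21, "0" * 19 + "11", "1" * 20 + "0", "0" * 10 + "1" * 11):
    print(example, classify_pattern(example))
```

For very short patterns the oligopresent and polypresent ranges overlap; the sketch resolves this by checking oligopresent first, which does not matter for 21 or more species.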

FatiGO analysis
- FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)
- Analysis of all human genes in the output with just one mouse click, e.g. omnipresent genes:

Other possibilities
- Anti-correlating patterns, e.g. ‘ ’ and ‘ ’ → could be completely different, or very similar (analogous)!
- Easy homology-inferred functional annotation (using information from other genes in the same lineage)
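One way to look for anti-correlating patterns, sketched here under the assumption that patterns are equal-length 0/1 strings, is to compare a pattern with the complement of another. The scoring function is an illustrative choice, not something described on the slides.

```python
def complement(pattern):
    """Swap presence and absence at every position."""
    return pattern.translate(str.maketrans("01", "10"))


def anti_correlation(a, b):
    """Fraction of species in which exactly one of the two genes is present."""
    if len(a) != len(b):
        raise ValueError("patterns must cover the same species")
    return sum(x != y for x, y in zip(a, b)) / len(a)


print(complement("0011"), anti_correlation("0011", "1100"))
```

A score of 1.0 means the two patterns are exact complements; a score near 0.0 means they are nearly identical.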

Case study: Hox genes (1)
- Hox genes determine where limbs and other body segments will grow in a developing embryo
- Should exist mostly in vertebrates
- Expansion in teleost fish species (, 8-11); seven Hox clusters instead of the mammalian four
- Search Ensembl database for human genes with term ‘hox’ in annotation
- 44 genes found -> enter in PhyloPat -> 32 groups found (PP######)

Case study: Hox genes (2)
Columns: PPID; # genes per species; phylogenetic pattern; gene name(s) [only the gene names are preserved]
PP MSX1, MSX2
PP HOXC4
PP TLX1, TLX2, TLX3
PP HOXB8, HOXC8, HOXD8
PP HOXD11
PP HOXA10
PP HOXC13, HOXD13
PP HOXA1, HOXB1
PP HOXB4
PP HOXA5
PP HOXB2
PP HOXD3
PP HOXA9
PP HOXA3
PP HOXC12
PP HOXD4
PP HOXC11
PP HOXA13
PP HOXB5
PP HOXB3
PP HOXD10
PP HOXA2
PP HOXA6, HOXB6, HOXC6
PP HOXA4
PP HOXB9, HOXC9, HOXD9
PP HOXA11
PP HOXA7, HOXB7
PP HOXC5
PP HOXC10
PP HOXD1
PP HOXD12
PP HOXB13

Case study: Hox genes (3)
PPID(s)                  name   cl.A    cl.B    cl.C    cl.D    first sp.   position
PP053829,                HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  anterior
PP053847,                HOX2   HOXA2   HOXB2   -       -       T. nigrov.  anterior
PP053836,053845,         HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  PG3
PP053832,053844,         HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  central
PP                       HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  central
PP053835,                HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  posterior
PP053827,084287,         HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  posterior
PP053858,053840,         HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  posterior
PP053838,                HOX12  -       -       HOXC12  HOXD12  T. nigrov.  posterior
PP053842,089685,         HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  posterior
PP053853,053830,024984,  HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    central
PP                       TLX    TLX1, TLX2, TLX3                A. gamb.    -
PP                       HOX7   HOXA7   HOXB7   -       -       G. acul.    central
PP                       HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  central
PP                       MSX    MSX1, MSX2                      C. eleg.    -
First species annotated on the slide as: ‘first’ vertebrate (T. nigrov.), non-vertebrate (A. gamb., C. intest., C. eleg.), vertebrate (G. acul.)

Conclusions
- PhyloPat: quick and easy tool for phylogenetic pattern search on the complete Ensembl database
- Also usable for study of lineage-specific expansions of genes
- Updated immediately with each new Ensembl version:
  - v41: 5 new species:
  - v42: 1 new species:
  - v43: 4 new species: + extra option: gene neighborhood

Gene neighborhood
- Equal color = belonging to same orthologous group
- Conservation of gene order = functionally related

Where to find PhyloPat
- Web interface: (accessible through and
- Publication: Hulsen T., Groenen P.M.A., de Vlieg J. BMC Bioinformatics 2006, 7:
- Powered by Ensembl:

Chicken-human immunogenomics project (part of Biorange SP3.2.2)
Goals:
- study evolution of genes/proteins involved in immune system, from chicken to human
- check for expansions and deletions in families
- zoom in to interesting families
In collaboration with Martien Groenen, Hinri Kerstens (Animal Sciences Group, Wageningen UR)

Proteins -> Genes
Earlier initiatives: based on proteins (Protein World, IPI, ParAlign, MCL)
Disadvantages:
- Large-scale computations needed for orthology determination
- Difficult to study lineage-specific expansions because of alternative transcripts, isoforms
- Difficult to connect to WUR synteny data
--> Genes: connect to PhyloPat tool

PhyloPat (dis)advantages
Advantages:
- Uses the accurate orthology determination of Ensembl (BLAST/SW, MUSCLE, PHYML), followed by our own single linkage clustering
- No alternative transcripts, isoforms
- Easy to connect to WUR synteny data
- 26 species, from S.cer. to H.sap.
Disadvantage:
- Genome information sometimes incomplete (but Pre-versions and low coverage genomes are not included)

Immunophyle
- Application to immune system: parse through PhyloPat set using IRIS database
- Take all HUGO IDs from IRIS database, input in PhyloPat (v41) -> 585 immunologic lineages containing 18,933 genes from 26 species
- Divided into 22 immunologic categories from IRIS database (adaptive immunity, innate immunity, inflammation, chemotaxis, etc.)
- Connected to GO, InterPro, KEGG, etc. by FatiGO
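The selection step, going from a list of HUGO symbols to the lineages that contain them, can be sketched as a dictionary lookup. The function and the mapping variable are hypothetical; the three example assignments (IFNA10 and IFNA13 in IP377, A2M in IP047) are taken from the expansion table later in this presentation.

```python
def lineages_for_symbols(symbols, gene_to_lineage):
    """Group the input symbols by the lineage they belong to; skip unknown symbols."""
    found = {}
    for symbol in symbols:
        lineage = gene_to_lineage.get(symbol)
        if lineage is not None:
            found.setdefault(lineage, []).append(symbol)
    return found


# Tiny excerpt of a symbol-to-lineage mapping; a real mapping would be far larger.
gene_to_lineage = {"IFNA10": "IP377", "IFNA13": "IP377", "A2M": "IP047"}

# "NOT_A_GENE" is a deliberately unknown placeholder symbol.
print(lineages_for_symbols(["IFNA10", "A2M", "IFNA13", "NOT_A_GENE"], gene_to_lineage))
```

Several input symbols collapsing into one lineage is expected, which is why the number of lineages (585) is smaller than the number of input identifiers would suggest.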

Immunophyle

Immunologic categories
Columns: Nr.; Abbrev.; Description; # HUGO IDs; # ImmunoPhyle lineages; # Genes [counts not preserved]
1   InImm  Innate Immunity
2   Inflm  Inflammation
3   Chmtx  Chemotaxis
4   Phago  Phagocytosis
5   Compl  Complement
6   Cy_Ch  Cytokines and Chemokines
7   AdImm  Adaptive Immunity
8   ClRsp  Cellular Response
9   HmRsp  Humoral Response
10  BMImm  Barrier and Mucosal Immunity
11  Devlp  Development of Immune System
12  AgPrc  Antigen Processing
13  PtSig  Immune Pathway or Signalling
14  Recpt  Receptor
15  IndIm  Induced by Immunomodulator
16  ImDef  Involved in Immunodeficiency
17  AutIm  Involved in Autoimmunity
18  ExpIT  Expressed Primarily in Immune Tissues
19  Other  Other
20  InKil  Innate NK Killing
21  RlDis  Related to Disease
22  Coagl  Coagulation
    All    All immunologic lineages

Categories & species
Columns: Cat :: Category (# lineages, # genes); Lin; Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mm Oc Mm Pt Hs; Total [per-species values not preserved]
All :: All immunologic lineages (585, 18933)
InImm :: Innate Immunity (272, 8640)
Inflm :: Inflammation (117, 4568)
Chmtx :: Chemotaxis (54, 2374)
Phago :: Phagocytosis (17, 890)
Compl :: Complement (33, 958)
Cy_Ch :: Cytokines and Chemokines (109, 2947)
AdImm :: Adaptive Immunity (140, 4983)
ClRsp :: Cellular Response (63, 2358)
HmRsp :: Humoral Response (34, 1087)
BMImm :: Barrier and Mucosal Immunity (18, 713)
Devlp :: Development of Immune System (50, 2044)
AgPrc :: Antigen Processing (31, 830)
PtSig :: Immune Pathway or Signalling (224, 8245)
Recpt :: Receptor (118, 3506)
IndIm :: Induced by Immunomodulator (86, 3487)
ImDef :: Involved in Immunodeficiency (30, 1013)
AutIm :: Involved in Autoimmunity (19, 530)
ExpIT :: Expressed Primarily in Immune Tissues (134, 3970)
Other :: Other (43, 1843)
InKil :: Innate NK Killing (33, 1015)
RlDis :: Related to Disease (91, 3141)
Coagl :: Coagulation (51, 2624)

Largest expansion(s) for each species
Nr.  Species  # genes  IP     HUGO
1    S.cer.   4        IP017  HSPA1A,HSPA1B,HSPA1L,HSPA8
2    C.ele.   7        IP008  CPB2
                       IP090  NR3C1
3    A.gam.   10       IP008  CPB2
4    A.aeg.   21       IP008  CPB2
5    D.mel.   14       IP008  CPB2
6    C.sav.   9        IP069  TRAF3,TRAF4,TRAF5
7    C.int.   10       IP069  TRAF3,TRAF4,TRAF5
                       IP162  C6
8    T.nig.   15       IP033  MAPK8,MAPK10,MAPK11,MAPK12,MAPK13,MAPK14
9    T.rub.   12       IP035  SLC4A1
10   O.lat.   13       IP035  SLC4A1
11   G.acu.   14       IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
12   D.rer.   19       IP047  A2M
13   X.tro.   16       IP229  SIGLEC5,SIGLEC6,SIGLEC7,SIGLEC8,SIGLEC9,SIGLEC10,SIGLEC11
14   G.gal.   9        IP035  SLC4A1
                       IP047  A2M
15   M.dom.   54       IP463  CEACAM1,CEACAM5,CEACAM6,CEACAM8
16   D.nov.   10       IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
                       IP116  LYZ
                       IP129  ANKRD15
                       IP229  SIGLEC5,SIGLEC6,SIGLEC7,SIGLEC8,SIGLEC9,SIGLEC10,SIGLEC11
17   B.tau.   46       IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
18   C.fam.   14       IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
19   E.tel.   12       IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
20   L.afr.   10       IP035  SLC4A1
                       IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
                       IP147  MS4A12,MS4A4A,MS4A6A,MS4A6E,MS4A8B
21   R.nor.   17       IP530  KLRA1
22   M.mus.   23       IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
23   O.cun.   18       IP291  HMGB2
24   M.mul.   16       IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
25   P.tro.   17       IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
26   H.sap.   16       IP061  ADORA1,ADORA2A,ADORA3,NCR2,PIGR,TREM1
                       IP377  IFNA10,IFNA13,IFNA14,IFNA16,IFNA17,IFNA21
                       IP463  CEACAM1,CEACAM5,CEACAM6,CEACAM8
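A table of largest expansions like this one can be derived from per-lineage gene counts by taking, for each species, the lineage or lineages with the highest count. The sketch below assumes a nested dictionary of counts; the lineage names, species names and numbers are invented and do not reproduce the ImmunoPhyle data.

```python
def largest_expansions(counts):
    """Map each species to (max gene count, [lineages reaching that count])."""
    best = {}
    for lineage, per_species in counts.items():
        for species, n_genes in per_species.items():
            top, lineages = best.get(species, (0, []))
            if n_genes > top:
                best[species] = (n_genes, [lineage])
            elif n_genes == top and n_genes > 0:
                lineages.append(lineage)  # tie: keep every lineage with the maximum
    return best


# Hypothetical counts: lineage -> species -> number of genes.
counts = {
    "LIN_A": {"sp1": 3, "sp2": 9},
    "LIN_B": {"sp1": 7, "sp2": 9},
    "LIN_C": {"sp1": 1, "sp2": 2},
}
print(largest_expansions(counts))
```

Ties are kept rather than broken arbitrarily, which matches the table layout where some species list more than one lineage for the same count.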

Interleukin evolution: receptors & ligands
Columns: IPID; Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mm Oc Mm Pt Hs; HUGO [IDs and per-species values not preserved]
IP IL6R
IP IL13RA2
IP IL23R
IP IL21R
IP IL12RB2
IP IL7R
IP IL28RA
IP IL20
IP IL8RA IL8RB
IP IL8
IP ILF3
IP IL20RA IL22RA2
IP IL1R2
IP IL4R
IP IL6
IP IL13RA1
IP IL22RA1
IP IL9
IP IL5RA
IP IL21
IP IL2RA
IP IL22 IL26
IP IL12RB1
IP IL5
IP IL1F6 IL1F9
IP IL27
IP IL24
IP IL2
IP IL7
IP IL1F8
IP IL23A
IP IL27RA
IP IL3
IP IL4

Example pathway: Toll-like receptors
- GeneGo MetaCore, canonical pathway
→ Interspecies differences can possibly be explained by looking at the number of orthologs for each gene in the pathway

Example pathway: Toll-like receptors
Check ImmunoPhyle for each gene involved in the TLR pathway. Green: ‘first’ occurrence. Red: deletion.
Columns: Lineage; Sc Ce Ag Aa Dm Cs Ci Tn Tr Ol Ga Dr Xt Gg Md Dn Bt Cf Et La Rn Mm Oc Mm Pt Hs; HUGO [IDs and per-species values not preserved]
IP TLR1/6/10
IP TLR2
IP TLR3
IP TLR4
IP TLR5
IP TLR7/8
IP TLR9
IP IRAK1
IP IRAK2
IP IRAK3/IRAK-M
IP IRAK4
IP IL4
IP IL6
IP IL8
IP LBP
IP LTA
IP TOLLIP
IP NFKB1,NFKB2,NFKBIA
IP TRAF6
IP JUN/JUNB/JUND
IP MAP3K7/TAK1
IP MAP3K7IP2
IP MAP3K14
IP MAP4K4
IP MAP2K3
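The green and red markings described for this table can be sketched as a scan over one lineage's per-species gene counts: the first species with a non-zero count is the ‘first’ occurrence, and any later species with a zero count is a candidate deletion. The function name, species labels and counts are invented for illustration, and a zero may also reflect incomplete genome information rather than a true loss.

```python
def first_occurrence_and_deletions(species, counts):
    """Return (first species with the gene, later species where the count is zero)."""
    first = next((i for i, n in enumerate(counts) if n > 0), None)
    if first is None:
        return None, []  # the lineage does not occur in any species
    deleted = [species[i] for i in range(first + 1, len(counts)) if counts[i] == 0]
    return species[first], deleted


# Hypothetical ordered species and gene counts for one lineage.
species = ["A", "B", "C", "D", "E"]
counts = [0, 0, 2, 0, 1]
print(first_occurrence_and_deletions(species, counts))
```

The result depends on the species order, which is why the columns are sorted by approximate distance to human.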

Current/future directions
- Differences in immune system between model organism and man cannot be explained only by looking at numbers of orthologs → connect to literature, expression data, protein interaction data, structural data
- Zoom in to families with help of immunologists

Acknowledgements Peter Groenen Wilco Fleuren … and others (Martien Groenen, Hinri Kerstens, Erik Franck, Arnold Kuzniar, etc.) for suggestions!