An update on ongoing projects within Biorange SP3.2.2.1 Biorange Project Meeting Leiden, September 15 Tim Hulsen.


1 An update on ongoing projects within Biorange SP3.2.2.1 Biorange Project Meeting Leiden, September 15 Tim Hulsen

2 Biorange SP3.2.2
Diagram labels: CoPub, Knowledge integration, ArrayExpress db, User, Xref db
Gene annotation through applications: PhyloPat, BioVenn, OrthoPath, CoPub

3 Overview
- PhyloPat: published in BMC Bioinformatics (2006); update submitted to the Nucleic Acids Res. Database issue
- BioVenn: revised version submitted to BMC Genomics
- Orthologous networks & OrthoPath: manuscript in preparation
- CoPub (Taverna workflows): published in the Nucleic Acids Res. Web Server issue (2008)

4 PhyloPat - Introduction
- Phylogenetic patterns show the presence or absence of certain genes in a set of full genomes derived from different species
- PhyloPat allows the complete Ensembl gene database to be queried using phylogenetic patterns
- Published in September 2006; the new version adds:
  - Ensembl v50
  - Support of HGNC and EntrezGene IDs
  - FASTA-format sequences of the members of a phylogenetic lineage
  - Gene neighborhood view
- http://www.cmbi.ru.nl/phylopat
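A minimal sketch of querying by phylogenetic pattern. The pattern syntax used here ('1' = present, '0' = absent, '*' = either) and the toy orthologous groups are assumptions for illustration; this is not PhyloPat's actual code or data.

```python
def matches(pattern, profile):
    """Return True if a presence/absence profile fits the query pattern.

    pattern: string over '1' (present), '0' (absent), '*' (either)
    profile: string over '1'/'0', one character per species
    """
    if len(pattern) != len(profile):
        raise ValueError("pattern and profile must cover the same species")
    return all(p == "*" or p == q for p, q in zip(pattern, profile))


# Hypothetical orthologous groups over four species (species order is fixed by the caller).
groups = {
    "GROUP_A": "1111",  # present everywhere
    "GROUP_B": "1100",  # present in the first two species only
    "GROUP_C": "0011",  # present in the last two species only
}

# Query: present in species 1 and 2, anything in species 3, absent in species 4.
hits = sorted(name for name, profile in groups.items() if matches("11*0", profile))
print(hits)
```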

5 PhyloPat: Update to Ensembl v50
- 39 species, including model organisms such as C. elegans, D. melanogaster, D. rerio, G. gallus, M. musculus, R. norvegicus, C. familiaris, M. mulatta, and human
- In total 814,936 genes
- In total 244,114 orthologous groups, created by clustering the orthologous gene pairs predicted by Ensembl
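The slide does not say which clustering method turns orthologous gene pairs into groups. As an illustration only, the sketch below assumes single-linkage clustering (connected components) over the pairs, using a small union-find; the gene names are made up.

```python
def cluster_pairs(pairs):
    """Group genes into clusters by treating each orthologous pair as a link.

    Assumption: single-linkage clustering, i.e. connected components.
    """
    parent = {}

    def find(x):
        # Register unseen genes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise ortholog predictions.
pairs = [("hsaA", "mmuA"), ("mmuA", "rnoA"), ("hsaB", "mmuB")]
groups = cluster_pairs(pairs)
print(groups)
```

Single linkage is the simplest choice, but it can merge distinct families through one spurious pair, so a real pipeline may well use something stricter.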

6 PhyloPat: Support of HGNC and EntrezGene IDs
- HGNC-Ensembl mapping for 29 species
- EntrezGene-Ensembl mapping for 18 species
- Choose from four types of IDs

7 PhyloPat: FASTA-format sequences
- “L”: Longest peptide sequences from this orthologous group (only the longest peptide per gene)
- “A”: All peptide sequences from this orthologous group (all peptides per gene)
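The difference between the “L” and “A” options can be sketched as follows. The record layout (gene ID, peptide ID, sequence), the FASTA header format, and the example sequences are assumptions; PhyloPat's real output format may differ.

```python
def select_peptides(peptides, mode):
    """Pick peptides for FASTA export.

    peptides: list of (gene_id, peptide_id, sequence) tuples
    mode: 'A' keeps every peptide, 'L' keeps only the longest peptide per gene
    """
    if mode == "A":
        return list(peptides)
    if mode != "L":
        raise ValueError("mode must be 'L' or 'A'")
    longest = {}
    for gene_id, pep_id, seq in peptides:
        # Keep the first peptide seen for a gene unless a strictly longer one appears.
        if gene_id not in longest or len(seq) > len(longest[gene_id][2]):
            longest[gene_id] = (gene_id, pep_id, seq)
    return list(longest.values())


def to_fasta(records):
    """Render records as FASTA text (hypothetical header layout)."""
    return "".join(f">{pep_id} gene:{gene_id}\n{seq}\n" for gene_id, pep_id, seq in records)


# Made-up peptides for two genes of one orthologous group.
peptides = [("G1", "P1", "MKT"), ("G1", "P2", "MKTAY"), ("G2", "P3", "MA")]
longest_only = select_peptides(peptides, "L")
print(to_fasta(longest_only), end="")
```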

8 PhyloPat: Gene neighborhood view
- The ‘Gene neighborhood view’ shows all genes from all species in a certain phylogenetic lineage, and all genes in their proximity on the genome (10 genes on each side)
- Neighboring genes are color-coded according to the orthologous groups they belong to
- Gene neighborhood gives information about functional relationships (genes involved in similar processes are often clustered together)
- Can be used to find the ‘true’ ortholog from a set of genes, by using not only phylogenetic information but also genomic context
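Taking 10 genes on each side of a gene amounts to a window over the genes in chromosomal order. A minimal sketch, assuming the genes of one chromosome are already sorted by position; the function name and gene IDs are hypothetical.

```python
def neighborhood(ordered_genes, gene, flank=10):
    """Return the gene plus up to `flank` genes on each side.

    ordered_genes: gene IDs of one chromosome, sorted by genomic position
    The window is truncated at the chromosome ends.
    """
    i = ordered_genes.index(gene)  # raises ValueError if the gene is unknown
    start = max(0, i - flank)
    return ordered_genes[start:i + flank + 1]


# Hypothetical chromosome with 30 genes.
genes = [f"g{i}" for i in range(30)]
window = neighborhood(genes, "g15")
edge_window = neighborhood(genes, "g2", flank=3)
print(len(window), window[0], window[-1])
```

Coloring each gene in the window by its orthologous group ID would then make shared context between species visible, as the slide describes.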

9 PhyloPat: Gene neighborhood view
Each cell:
- Ensembl Gene ID
- PhyloPat ID
- HGNC Symbol
ERN1 and ERN2 can be distinguished by looking at gene context

10 Overview
- PhyloPat: published in BMC Bioinformatics (2006); update submitted to the Nucleic Acids Res. Database issue
- BioVenn: revised version submitted to BMC Genomics
- Orthologous networks & OrthoPath: manuscript in preparation
- CoPub (Taverna workflows): published in the Nucleic Acids Res. Web Server issue (2008)

11 BioVenn
- Web application to see the overlap between different lists of biological identifiers, using area-proportional Venn diagrams
- Support of a wide range of IDs, which are recognized and linked to the corresponding database: Affymetrix, COG, Ensembl, EntrezGene, Gene Ontology, InterPro, IPI, KEGG Pathway, KOG, PhyloPat and RefSeq
- Optional mapping of Affymetrix and EntrezGene to Ensembl
- Output in SVG (with drag-and-drop functionality) or PNG
- http://www.cmbi.ru.nl/biovenn/

12 BioVenn
Options shown: embedded / standalone, SVG / PNG, ID mapping, absolute numbers / percentages

13 BioVenn
- Lists for all 13 sets (X total, X only, XY total overlap, XY only overlap, XYZ overlap, etc.)
- If the type of ID (e.g. Affymetrix, Ensembl) is recognized, output is linked to the corresponding database
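For three input lists X, Y and Z, the 13 sets can be derived with plain set operations. This is a sketch, not BioVenn's code; the labels for the Y and Z variants extend the naming on the slide, and the example IDs are made up.

```python
def venn_sets(x, y, z):
    """Return the 13 named subsets for three ID lists."""
    x, y, z = set(x), set(y), set(z)
    return {
        "X total": x,
        "Y total": y,
        "Z total": z,
        "X only": x - y - z,
        "Y only": y - x - z,
        "Z only": z - x - y,
        "XY total overlap": x & y,
        "XZ total overlap": x & z,
        "YZ total overlap": y & z,
        "XY only overlap": (x & y) - z,  # shared by X and Y but not Z
        "XZ only overlap": (x & z) - y,
        "YZ only overlap": (y & z) - x,
        "XYZ overlap": x & y & z,
    }


sets = venn_sets(["a", "b", "c", "d"], ["b", "c", "e"], ["c", "d", "f"])
for name, members in sets.items():
    print(f"{name}: {sorted(members)}")
```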

14 Overview
- PhyloPat: published in BMC Bioinformatics (2006); update submitted to the Nucleic Acids Res. Database issue
- BioVenn: revised version submitted to BMC Genomics
- Orthologous networks & OrthoPath: manuscript in preparation
- CoPub (Taverna workflows): published in the Nucleic Acids Res. Web Server issue (2008)

15 Assessing orthologous biology in groups of genes: Application to GC-induced insulin resistance
Biorange meeting 2008-03-11:
- Goal: Gain better insight into the conservation of genes involved in glucocorticoid-induced insulin resistance (GC-induced IR) between human, mouse and rat.
- Use CoPub to build literature networks, map orthology
- Validation needed

16 Network
Clusters labeled in the network figure:
- Cytochrome P450s
- Lipid transport
- Adipocyte differentiation
- Jak/Stat/IL6
- Insulin signaling
- Fatty acid oxidation/catabolism
- Misc: amino acid metabolism, MAPK signaling, osteoblast
- Dexamethasone & insulin

17 Validation approach
- Get all genes from a KEGG pathway
- Select a random 10% of these genes
- Create a gene network using these genes (CoPub)
- Compare with the original KEGG pathway
- Repeat with varying thresholds
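One round of this validation loop can be sketched as below. The slides do not define how the rates are computed, so the definitions here are assumptions: true positives are recovered pathway genes that were not seeds, false positives are predicted genes outside the pathway, and the false-positive rate is taken over a supplied gene universe. `build_network` stands in for the CoPub network step and is hypothetical.

```python
import random


def validate(pathway_genes, all_genes, build_network, fraction=0.10, seed=0):
    """Seed a network with a random fraction of a pathway and score the result."""
    rng = random.Random(seed)
    pathway = set(pathway_genes)
    n_seed = max(1, round(fraction * len(pathway)))
    seeds = set(rng.sample(sorted(pathway), n_seed))

    predicted = set(build_network(seeds)) - seeds  # genes the network added
    positives = pathway - seeds                    # pathway genes left to recover
    negatives = set(all_genes) - pathway           # everything outside the pathway

    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return {
        "TP_rate": tp / len(positives) if positives else 0.0,
        "FP_rate": fp / len(negatives) if negatives else 0.0,
        "PPV": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy data: a 20-gene pathway inside a 120-gene universe.
universe = [f"g{i}" for i in range(120)]
pathway = universe[:20]


def toy_network(seeds):
    # Stand-in for CoPub: returns the whole pathway plus 10 unrelated genes.
    return set(pathway) | set(universe[20:30])


result = validate(pathway, universe, toy_network)
print(result)
```

Repeating the call with different thresholds inside `build_network` and different seeds would give the spread of rates per pathway.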

18 Results
Pathway                                       ID        # genes in pathway  FP_rate  TP_rate  Pos.Pred.Val  Percentage Manual Pos
Hematopoietic cell lineage                    hsa04640  88                  0.04     0.69     0.11          0.53
Jak-STAT signaling pathway                    hsa04630  153                 0.02     0.54     0.20          0.33
Cytokine-cytokine receptor interaction        hsa04060  256                 0.03     0.51     0.24          0.53
Toll-like receptor signaling pathway          hsa04620  90                  0.02     0.50     0.17          0.40
Metabolism of xenobiotics by cytochrome P450  hsa00980  70                  0.00     0.50     0.43          0.32
Melanoma                                      hsa05218  71                  0.02     0.49     0.11          0.20
Renal cell carcinoma                          hsa05211  69                  0.04     0.49     0.06          0.40
VEGF signaling pathway                        hsa04370  70                  0.04     0.49     0.06          0.20
GnRH signaling pathway                        hsa04912  97                  0.03     0.48     0.09          0.13
Endometrial cancer                            hsa05213  52                  0.02     0.48     0.07          0.47
Average TP = 0.24
Average FP = 0.01

19 Application to all human genes
- Create a network for each gene with R scaled = 30, literature count = 5
- Calculate average conservation for each network, based on the conservation of all genes in the network in 4 species (P. tro., M. mus., R. nor., C. fam.)
- Get all genes in the 100 least conserved networks
- Get all genes in the 100 most conserved networks
- Calculate GO enrichment
- Compare
Figure labels: 6,181 networks with size > 2; 211 genes / 309 genes (Non-conserved / Conserved)
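The ranking step can be sketched as follows. It assumes each gene has a conservation score between 0 and 1 (for example, the fraction of the 4 species with an ortholog) and that a network's score is the plain mean over its genes; both are assumptions, since the slide does not give the formula. All names and data are hypothetical.

```python
def rank_networks(networks, conservation, n=100):
    """Return the genes of the n least and n most conserved networks.

    networks: {center_gene: set of genes in its literature network}
    conservation: {gene: score in [0, 1]}; missing genes count as 0.0
    Only networks with more than 2 genes are scored, as on the slide.
    """
    scores = {}
    for center, genes in networks.items():
        if len(genes) <= 2:
            continue
        scores[center] = sum(conservation.get(g, 0.0) for g in genes) / len(genes)

    # Sort by score, breaking ties by name so the result is deterministic.
    ranked = sorted(scores, key=lambda c: (scores[c], c))
    least = set().union(*(networks[c] for c in ranked[:n]))
    most = set().union(*(networks[c] for c in ranked[-n:]))
    return least, most


# Toy input: two scorable networks and one that is too small.
networks = {
    "A": {"A", "B", "C"},
    "D": {"D", "E", "F"},
    "G": {"G", "H"},
}
conservation = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 0.25, "E": 0.5, "F": 0.0}
least, most = rank_networks(networks, conservation, n=1)
print(sorted(least), sorted(most))
```

The two gene sets would then go into a GO enrichment test, which is left out here.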

20 OrthoPath
OrthoPath is a gene-centric search tool for literature networks and their orthologs
Three input methods:
- Single Gene Search: Get the literature network for a given gene. OrthoPath will create a network of genes that are connected to this single gene.
- Keyword Search: Get the literature network based on a certain keyword. OrthoPath looks for genes that are connected to the keyword, and creates a network from all these genes.
- Multi Gene Search: Get the literature network for a set of genes. OrthoPath creates a network from only those genes that are entered by the user.
http://ws2.grid.sara.nl/cgi-bin/orthopath/op.pl

21 OrthoPath
- Search with a single gene
- Search with a keyword
- Search with a list of genes
- Output in HTML, SVG, Cytoscape or Ingenuity format
- Set the minimum strength of a co-citation between two keywords
- Set the minimum number of abstracts in which a co-citation between the two genes is found
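The two thresholds act as edge filters on the co-citation data. A minimal sketch, assuming each co-citation is a tuple of two genes, a strength score, and an abstract count, and that the cutoffs are inclusive; the default values 30 and 5 are the ones quoted on slide 19, and the example co-citations are invented.

```python
def build_network(cocitations, genes=None, min_r_scaled=30, min_abstracts=5):
    """Keep co-citation edges that pass both thresholds.

    cocitations: iterable of (gene_a, gene_b, r_scaled, n_abstracts)
    genes: if given, keep only edges between these genes (multi gene search)
    """
    allowed = set(genes) if genes is not None else None
    edges = []
    for a, b, r_scaled, n_abstracts in cocitations:
        if r_scaled < min_r_scaled or n_abstracts < min_abstracts:
            continue
        if allowed is not None and not (a in allowed and b in allowed):
            continue
        edges.append((a, b))
    return edges


# Invented co-citation records: (gene, gene, strength, abstracts).
cocitations = [
    ("INS", "IRS1", 45, 12),
    ("INS", "LEP", 28, 40),   # strength below threshold
    ("IRS1", "AKT1", 60, 3),  # too few abstracts
    ("INS", "AKT1", 35, 9),
]
all_edges = build_network(cocitations)
pair_edges = build_network(cocitations, genes={"INS", "IRS1"})
print(all_edges, pair_edges)
```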

22 OrthoPath
Each node in the network:
- EntrezGene information: ID, symbol, description
- number of neighbours
- number of orthologs from human (for all five species)

23 Overview
- PhyloPat: published in BMC Bioinformatics (2006); update submitted to the Nucleic Acids Res. Database issue
- BioVenn: revised version submitted to BMC Genomics
- Orthologous networks & OrthoPath: manuscript in preparation
- CoPub (Taverna workflows): published in the Nucleic Acids Res. Web Server issue (2008)

24 CoPub Taverna Workflows
Taverna: free software tool for designing and executing workflows
Workflow files for CoPub have been developed:
(1) Search gene
(2) Get literature neighbours

25 CoPub Taverna Workflows
(3) Get a list of categories
(4) Get the complete network

26 Acknowledgements
Wynand Alkema, Wilco Fleuren, Raoul Frijters, Peter Groenen

