An update on ongoing projects within Biorange SP3.2.2.1 Biorange Project Meeting Leiden, September 15 Tim Hulsen.

Biorange SP3.2.2
Diagram labels: CoPub, Knowledge integration, ArrayExpress db, User, Xref db
- Gene annotation through applications: PhyloPat, BioVenn, OrthoPath, CoPub

Overview
- PhyloPat: published in BMC Bioinformatics (2006); update submitted to Nucleic Acids Res. Database issue
- BioVenn: revised version submitted to BMC Genomics
- Orthologous networks & OrthoPath: manuscript in preparation
- CoPub (Taverna workflows): published in Nucleic Acids Res. Web Server issue (2008)

PhyloPat - Introduction
- Phylogenetic patterns show the presence or absence of certain genes in a set of full genomes derived from different species
- PhyloPat allows the complete Ensembl gene database to be queried using phylogenetic patterns
- Published in September 2006; the new version adds:
  - Ensembl v50
  - Support of HGNC and EntrezGene IDs
  - FASTA-format sequences of the members of a phylogenetic lineage
  - Gene neighborhood view
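
A minimal sketch of what querying with a phylogenetic pattern could look like. The species list, the group IDs, the presence/absence strings and the "*" wildcard are all assumptions made for illustration; this is not PhyloPat's actual implementation or data.

```python
# Hypothetical species order; each profile string has one character per species.
SPECIES = ["C. elegans", "D. melanogaster", "M. musculus", "H. sapiens"]


def matches(pattern: str, profile: str) -> bool:
    """Return True if a presence/absence profile fits the query pattern.

    '1' = gene present, '0' = gene absent, '*' = either (assumed wildcard).
    """
    return len(pattern) == len(profile) and all(
        p == "*" or p == q for p, q in zip(pattern, profile)
    )


def query(groups: dict[str, str], pattern: str) -> list[str]:
    """Return the IDs of all orthologous groups whose profile fits the pattern."""
    return sorted(gid for gid, profile in groups.items() if matches(pattern, profile))


# Invented example groups (IDs and profiles are not real PhyloPat data).
groups = {"GROUP_1": "1111", "GROUP_2": "0011", "GROUP_3": "0001"}

# Genes absent in worm and fly, present in human, mouse unconstrained.
hits = query(groups, "00*1")
```

Here `hits` contains the two groups that are missing from both invertebrates but present in human.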

PhyloPat: Update to Ensembl v50
- 39 species, including model organisms such as C. elegans, D. melanogaster, D. rerio, G. gallus, M. musculus, R. norvegicus, C. familiaris, M. mulatta, and human
- In total 814,936 genes
- In total 244,114 orthologous groups, created by clustering the orthologous gene pairs predicted by Ensembl
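
The slide does not say which clustering method was used. One simple possibility, assumed here purely for illustration, is to treat each predicted orthologous pair as an edge and take connected components (single linkage) with a union-find structure. Gene names below are invented.

```python
def cluster_pairs(pairs):
    """Group genes into clusters: two genes share a cluster if a chain of
    orthologous pairs connects them (connected components via union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for gene in parent:
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in clusters.values())


# Invented orthologous pairs (not real Ensembl predictions).
pairs = [("hsaA", "mmuA"), ("mmuA", "rnoA"), ("hsaB", "mmuB")]
orthologous_groups = cluster_pairs(pairs)
```

With this choice, `hsaA` and `rnoA` end up in the same group even though no direct pair links them, which is the defining property of single-linkage clustering.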

PhyloPat: Support of HGNC and EntrezGene IDs
- HGNC-Ensembl mapping for 29 species
- EntrezGene-Ensembl mapping for 18 species
- Choose from four types of IDs

PhyloPat: FASTA-format sequences
- “L”: Longest peptide sequences from this orthologous group (only the longest peptide per gene)
- “A”: All peptide sequences from this orthologous group (all peptides per gene)
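
The difference between the “L” and “A” options can be sketched as follows. The record layout (gene ID, peptide ID, sequence), the FASTA header format and the example sequences are assumptions for illustration only.

```python
def select_peptides(peptides, mode):
    """Select peptides for one orthologous group.

    peptides: list of (gene_id, peptide_id, sequence) tuples (assumed layout).
    mode "A": all peptides per gene; mode "L": only the longest peptide per gene.
    """
    if mode == "A":
        return list(peptides)
    if mode != "L":
        raise ValueError("mode must be 'L' or 'A'")
    longest = {}
    for gene, pep, seq in peptides:
        # Keep the first peptide seen for a gene unless a strictly longer one appears.
        if gene not in longest or len(seq) > len(longest[gene][2]):
            longest[gene] = (gene, pep, seq)
    return list(longest.values())


def to_fasta(records, width=60):
    """Render records as FASTA text, wrapping sequences at `width` residues."""
    lines = []
    for gene, pep, seq in records:
        lines.append(f">{pep} gene:{gene}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)


# Invented peptides: gene G1 has two peptides, gene G2 has one.
peptides = [("G1", "P1", "MKT"), ("G1", "P2", "MKTAY"), ("G2", "P3", "MA")]
fasta_longest = to_fasta(select_peptides(peptides, "L"))
```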

PhyloPat: Gene neighborhood view
- The ‘Gene neighborhood view’ shows all genes from all species in a certain phylogenetic lineage, and all genes in their proximity on the genome (10 genes to both sides)
- Neighbouring genes are color-coded according to the orthologous groups they belong to
- Gene neighborhood gives information about functional relationships (genes involved in similar processes are often clustered together)
- Can be used to find the ‘true’ ortholog from a set of genes, by using not only phylogenetic information but also genomic context
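
The "10 genes to both sides" window can be sketched as a slice over genes ordered by chromosomal position. The function name and the gene list are hypothetical; how PhyloPat handles chromosome ends or strand is not stated on the slide, so the window here is simply truncated at the ends.

```python
def neighborhood(ordered_genes, target, flank=10):
    """Return the target gene plus up to `flank` genes on each side.

    ordered_genes: gene IDs sorted by position on one chromosome (assumed input).
    The window is truncated at the ends of the list.
    """
    i = ordered_genes.index(target)
    return ordered_genes[max(0, i - flank): i + flank + 1]


# Invented chromosome with 30 genes named g0 .. g29.
chromosome = [f"g{n}" for n in range(30)]
window_mid = neighborhood(chromosome, "g15")
window_edge = neighborhood(chromosome, "g3")
```

Comparing such windows across species, with each neighbor labeled by its orthologous group, is what allows genomic context to separate close paralogs.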

PhyloPat: Gene neighborhood view
Each cell:
- Ensembl Gene ID
- PhyloPat ID
- HGNC Symbol
ERN1 and ERN2 can be distinguished by looking at gene context

BioVenn
- Web application to see the overlap between different lists of biological identifiers, using area-proportional Venn diagrams
- Support of a wide range of IDs, which are recognized and linked to the corresponding database: Affymetrix, COG, Ensembl, EntrezGene, Gene Ontology, InterPro, IPI, KEGG Pathway, KOG, PhyloPat and RefSeq
- Optional mapping of Affymetrix and EntrezGene to Ensembl
- Output in SVG (with drag-and-drop functionality) or PNG

BioVenn
- Embedded / standalone, SVG / PNG
- ID mapping
- Absolute numbers / percentages

BioVenn
- Lists for all 13 sets (X total, X only, XY total overlap, XY only overlap, XYZ overlap, etc.)
- If type of ID (e.g. Affymetrix, Ensembl) is recognized, output is linked to the corresponding database
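
The 13 lists follow directly from set algebra on three input lists: three totals, three "only" sets, three pairwise total overlaps, three pairwise "only" overlaps, and the triple overlap. A minimal sketch, with the labels modeled on the slide and invented example IDs (this is not BioVenn's own code):

```python
def venn_lists(x, y, z):
    """Compute the 13 sets of a three-way Venn diagram from ID lists X, Y and Z."""
    x, y, z = set(x), set(y), set(z)
    return {
        "X total": x,
        "Y total": y,
        "Z total": z,
        "X only": x - y - z,
        "Y only": y - x - z,
        "Z only": z - x - y,
        "XY total overlap": x & y,
        "XZ total overlap": x & z,
        "YZ total overlap": y & z,
        "XY only overlap": (x & y) - z,
        "XZ only overlap": (x & z) - y,
        "YZ only overlap": (y & z) - x,
        "XYZ overlap": x & y & z,
    }


# Invented identifier lists.
lists = venn_lists(["a", "b", "c", "d"], ["b", "c", "e"], ["c", "d", "f"])
```

The seven disjoint regions ("only" sets, "only" overlaps and the triple overlap) are the ones whose sizes determine the areas in an area-proportional diagram.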

Assessing orthologous biology in groups of genes: Application to GC induced insulin resistance
Biorange meeting
Goal: Gain better insight into the conservation of genes involved in glucocorticoid induced insulin resistance (GC induced IR) between human, mouse and rat.
- Use CoPub to build literature networks, map orthology
- Validation needed

Network
- Cytochrome P450s
- Lipid transport
- Adipocyte differentiation
- Jak/Stat/IL6
- Insulin signaling
- Fatty acid oxidation/catabolism
- Misc: amino acid metabolism, MAPK signaling, osteoblast
- Dexamethasone & insulin

Validation approach
- Get all genes from a KEGG pathway
- Select random 10% of these genes
- Create Gene Network using these genes (CoPub)
- Compare with original KEGG pathway
- Repeat with varying thresholds
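
The comparison step can be sketched with standard confusion-matrix rates. Everything below is an assumption for illustration: the slides do not define the gene universe used for the false-positive rate, nor whether the seed genes are counted, and the CoPub network-building step is represented only by an invented `predicted` list.

```python
import random


def sample_seeds(pathway_genes, fraction=0.10, seed=0):
    """Pick a reproducible random fraction of pathway genes as network seeds."""
    genes = sorted(pathway_genes)
    k = max(1, round(len(genes) * fraction))
    return set(random.Random(seed).sample(genes, k))


def rates(predicted, pathway, universe):
    """Compare a predicted gene network with the original pathway.

    Returns (TP rate, FP rate, positive predictive value), computed over an
    assumed background `universe` of candidate genes.
    """
    universe = set(universe)
    predicted = set(predicted) & universe
    pathway = set(pathway) & universe
    tp = len(predicted & pathway)
    fp = len(predicted - pathway)
    fn = len(pathway - predicted)
    tn = len(universe - predicted - pathway)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return tp_rate, fp_rate, ppv


# Invented example: 100 candidate genes, a 20-gene pathway, 6 predicted genes.
universe = [f"g{n}" for n in range(100)]
pathway = universe[:20]
seeds = sample_seeds(pathway)
predicted = universe[:5] + ["g50"]
tp_rate, fp_rate, ppv = rates(predicted, pathway, universe)
```

Repeating this with different co-citation thresholds gives one (TP rate, FP rate) point per threshold, which is how a trade-off between the two rates can be explored.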

Results
Columns: Pathway | ID | # genes in pathway | FP_rate | TP_rate | Pos.Pred.Val | Percentage | Manual Pos
Hematopoietic cell lineage | hsa
Jak-STAT signaling pathway | hsa
Cytokine-cytokine receptor interaction | hsa
Toll-like receptor signaling pathway | hsa
Metabolism of xenobiotics by cytochrome P450 | hsa
Melanoma | hsa
Renal cell carcinoma | hsa
VEGF signaling pathway | hsa
GnRH signaling pathway | hsa
Endometrial cancer | hsa
Average TP = 0.24
Average FP = 0.01

Application to all human genes
- Create network for each gene with R scaled = 30, literature count = 5
- Calculate average conservation for each network based on conservation for all the genes in the network in 4 species (P.tro., M.mus., R.nor., C.fam.)
- Get all genes in 100 least conserved networks
- Get all genes in 100 most conserved networks
- Calculate GO enrichment
- Compare
Figure labels: 211 genes, 309 genes; 6,181 networks with size > 2; Non-conserved, Conserved
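
A sketch of the ranking step under stated assumptions: the slides do not define per-gene conservation, so it is taken here as the fraction of the four species in which the human gene has an ortholog, and a network's score is the mean over its genes. Function names, gene names and network names are invented.

```python
SPECIES = ("P.tro.", "M.mus.", "R.nor.", "C.fam.")


def gene_conservation(gene, orthologs):
    """Fraction of the four species with an ortholog of this gene (assumed definition)."""
    return sum(sp in orthologs.get(gene, set()) for sp in SPECIES) / len(SPECIES)


def network_conservation(genes, orthologs):
    """Average conservation over all genes in one network."""
    return sum(gene_conservation(g, orthologs) for g in genes) / len(genes)


def extremes(networks, orthologs, n=100, min_size=3):
    """Return (least conserved, most conserved) network names, n of each.

    Networks smaller than min_size are skipped (the slide uses size > 2).
    """
    scored = sorted(
        (network_conservation(genes, orthologs), name)
        for name, genes in networks.items()
        if len(genes) >= min_size
    )
    least = [name for _, name in scored[:n]]
    most = [name for _, name in scored[-n:]]
    return least, most


# Invented ortholog table: species in which each human gene has an ortholog.
orthologs = {
    "a1": set(SPECIES), "a2": set(SPECIES), "a3": set(SPECIES),
    "b": {"M.mus.", "R.nor."},
    "c1": set(), "c2": set(), "c3": set(),
}
networks = {
    "hi": ["a1", "a2", "a3"],
    "mid": ["a1", "b", "c1"],
    "lo": ["c1", "c2", "c3"],
    "tiny": ["a1", "a2"],
}
least, most = extremes(networks, orthologs, n=1)
```

The genes pooled from the least and most conserved networks would then be the two input lists for the GO enrichment comparison.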

OrthoPath
OrthoPath is a gene-centric search tool for literature networks and their orthologs
Three input methods:
- Single gene search: Get the literature network for a given gene. OrthoPath will create a network of genes that are connected to this single gene.
- Keyword search: Get the literature network based on a certain keyword. OrthoPath looks for genes that are connected to the keyword, and creates a network from all these genes.
- Multi gene search: Get the literature network for a set of genes. OrthoPath creates a network from only those genes that are entered by the user.

OrthoPath
- Search with a single gene
- Search with a keyword
- Search with a list of genes
- Output in HTML, SVG, Cytoscape or Ingenuity format
- Set the minimum strength of a co-citation between two keywords
- Set the minimum number of abstracts in which a co-citation between the 2 genes is found
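
The two thresholds and the single-gene and multi-gene searches can be sketched as filters over a list of co-citation edges. The edge layout (gene, gene, strength, abstract count), the function names and the data are assumptions; the default values mirror the "R scaled = 30, literature count = 5" settings mentioned earlier in the talk, and nothing here reflects OrthoPath's real interface.

```python
def filter_edges(edges, min_strength=30.0, min_abstracts=5):
    """Keep co-citation edges that pass both thresholds.

    edges: iterable of (gene_a, gene_b, strength, n_abstracts) tuples (assumed layout).
    """
    return [
        (a, b) for a, b, strength, n in edges
        if strength >= min_strength and n >= min_abstracts
    ]


def single_gene_network(edges, gene, **thresholds):
    """Genes directly connected to `gene` by an edge that passes the thresholds."""
    kept = filter_edges(edges, **thresholds)
    return sorted({b if a == gene else a for a, b in kept if gene in (a, b)})


def multi_gene_network(edges, genes, **thresholds):
    """Edges that pass the thresholds and connect only user-supplied genes."""
    genes = set(genes)
    return [(a, b) for a, b in filter_edges(edges, **thresholds)
            if a in genes and b in genes]


# Invented co-citation edges.
edges = [
    ("GENE_A", "GENE_B", 45.0, 12),
    ("GENE_A", "GENE_C", 31.0, 3),   # too few abstracts
    ("GENE_D", "GENE_A", 60.0, 8),
    ("GENE_B", "GENE_D", 10.0, 20),  # too weak
]
neighbours = single_gene_network(edges, "GENE_A")
```

Lowering `min_abstracts` to 3 would also admit `GENE_C`, which illustrates how the thresholds trade network size against confidence in each link.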

OrthoPath
Each node in the network:
- EntrezGene information: ID, symbol, description
- number of neighbours
- number of orthologs from human (for all five species)

CoPub Taverna Workflows
- Taverna: free software tool for designing and executing workflows
- Workflow files for CoPub have been developed:
  (1) Search gene
  (2) Get literature neighbours

CoPub Taverna Workflows
  (3) Get a list of categories
  (4) Get the complete network

Acknowledgements
- Wynand Alkema
- Wilco Fleuren
- Raoul Frijters
- Peter Groenen