Semantic Similarity Measures Across The Gene Ontology: Relating Sequence to Annotation

P.W. Lord, R.D. Stevens, A. Brass, and C. Goble
Department of Computer Science, The University of Manchester, M13 9PL, UK.

Abstract:
● Bioinformatics resources are rich in knowledge, but that knowledge is often held as free text.
● Ontologies provide a way of representing knowledge in a computationally accessible form.
● The Gene Ontology (GO) represents knowledge about:
  ● the molecular function of a gene product
  ● the biological process it is involved in
  ● the cellular compartment of which it is a part.
● Can we ask a database for proteins with “semantically similar” annotation to a query protein?
● We present and validate a measure of semantic similarity, and show several uses for it.

Information Content Measures
● Originally developed by Resnik (1995) for WordNet (Fellbaum, 1998), but adaptable to GO.
● Less frequently occurring terms are “more informative”.
● To calculate the information content for each node:
  ● For each term, count the number of occurrences of that term or any of its children.
  ● Divide by the total number of term occurrences to give a probability, $p(c)$.
● The similarity of two terms is then given by $\mathrm{sim}(c_1, c_2) = -\ln p_{ms}(c_1, c_2)$,
● where $p_{ms}(c_1, c_2)$ is the lowest probability among the parents shared by $c_1$ and $c_2$.

Validation
● Two proteins with similar sequences should usually also have semantically similar annotation.
● We tested this by BLAST searching all SWISS-PROT proteins, taking the top bit scores, and comparing them to semantic similarity.
● Semantic similarity over the molecular function aspect is most strongly correlated with sequence similarity.
● A similar experiment shows that “Traceable Author Statement” associations are the most tightly correlated with sequence similarity.
● These results fit well with biological expectations, and therefore serve to validate the semantic similarity measure.
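The information content calculation described under "Information Content Measures" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation (which, per the acknowledgements, used the GO database, its API, and bioperl). The toy ontology, the annotation counts, and the function names `ancestors`, `term_probabilities`, and `resnik_similarity` are all hypothetical, and the similarity is taken to be the negative log of the lowest probability among shared parents, following Resnik (1995).

```python
import math
from collections import Counter

# Hypothetical toy ontology (child -> parents). GO is a DAG, so a term
# may have more than one parent; this small example happens to be a tree.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "catalytic_activity": ["molecular_function"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase_activity": ["catalytic_activity"],
}

# Hypothetical counts of how often each term is used directly in annotations.
DIRECT_COUNTS = Counter(
    {"binding": 2, "dna_binding": 5, "rna_binding": 3, "kinase_activity": 10}
)


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def term_probabilities():
    """Count each term's occurrences (its own plus its descendants'),
    then divide by the total number of occurrences."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = Counter()
    for term, count in DIRECT_COUNTS.items():
        # Every use of a term also counts as a use of each ancestor.
        for ancestor in ancestors(term):
            cumulative[ancestor] += count
    return {term: cumulative[term] / total for term in PARENTS}


def resnik_similarity(term_a, term_b, probabilities):
    """Negative log of the lowest probability among shared parents.

    Assumes every shared parent has a non-zero probability, which holds
    whenever at least one of the two terms is actually used.
    """
    shared = ancestors(term_a) & ancestors(term_b)
    p_ms = min(probabilities[term] for term in shared)
    return -math.log(p_ms)


if __name__ == "__main__":
    probs = term_probabilities()
    print(resnik_similarity("dna_binding", "rna_binding", probs))
    print(resnik_similarity("dna_binding", "kinase_activity", probs))
```

In this toy data the two binding terms share the fairly specific parent "binding" and so score higher than a binding term paired with a kinase term, whose only shared parent is the root. With real GO data, the counts would come from an annotation set such as the SWISS-PROT associations used in the validation.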

Applications
● We have developed two prototype applications:
  ● A simple search tool, which uses the similarity score for ranking.
  ● An annotation checker, which looks for high semantic similarity and low sequence similarity.
● The annotation checker has identified several “misannotations”, and errors in GO.
● The search tool, while primitive, appears to produce results that are intuitively “correct”.

Future Work
● We are currently investigating several other information content based measures, and their behaviour over the GO dataset.
● We plan to offer a web-based portal, to enable us to seek user feedback on our search tool.

Acknowledgements
● The GO curators and SWISS-PROT annotators, for helpful comments.
● The GO database and API, and bioperl, were used during this work.
● This work was funded under the EPSRC/BBSRC Bioinformatics Programme (Grant number BIF/10507).

References
C. Fellbaum (1998) WordNet: an electronic lexical database. MIT Press.
P. Resnik (1995) Using information content to evaluate semantic similarity in a taxonomy. Proc. 14th Intl Joint Conf. on Artificial Intelligence. Morgan Kaufmann.