@Interontology08, February 27, 2008 The Semantic Web for Scientific Research: A ‘perfect storm’ for the development of Ontology Alan Ruttenberg Principal.

Presentation transcript:

@Interontology08, February 27, 2008 The Semantic Web for Scientific Research: A ‘perfect storm’ for the development of Ontology Alan Ruttenberg Principal Scientist

Weather conditions
Open source ethic is mainstream
Beginnings of a viable Semantic Web
Funders: products of public science not optimally used
Burgeoning quality-focused developer community

Beginnings of a viable Semantic Web
Initial standardizations
  OWL 1.0 (OWL 1.1 WG in progress)
  SPARQL
Viable tools
  Scalable triple stores, e.g. Virtuoso, Oracle…
  Reasoners: Pellet, FaCT++, CEL, QuOnto…

Funders: Products of public science not optimally used
Both government and philanthropies
  Data sharing mandates
  Open access publication mandates
Recognition that Ontology can play key role (and funding)
  Wonderweb, NCBO, JCOR (more in Europe, beginnings in Australia, China)
  E.g. NIH Ontology grants

Burgeoning quality-focused developer community
W3C Semantic Web for Life Sciences Interest Group
  Brings together scientists, medical researchers, science writers and informaticians from academia, government, non-profit organizations, health care, pharmaceuticals and industry vendors
  Chartering of second phase in progress
OBO Foundry
  Principle-based development of science-based ontologies with the goal of creating a suite of interoperable reference ontologies for biomedicine
  Process and governance are being refined
  Groups are lining up to join

Some projects I’m involved in
The challenge of data integration at Web scales
  The Neurocommons
Collaborative Ontology Development
  OBI – The Ontology for Biomedical Investigations
Identifying and working through aspects of Ontology
  Working with, and on, the Basic Formal Ontology
  What is a Gene Ontology Annotation?

The Neurocommons AddGene Plasmids NeuronDB BAMS Neurocommons text mining Homologene SWAN Entrez Gene Gene ontology annotations Mammalian Phenotype PDSPki BrainPharm AlzGene Antibodies PubChem MESH Reactome Allen Brain Atlas Publications CCDB Neuronbank OBO Ontologies NeuroMorpho SAO Coriell cells

What’s a (Science) Commons? Built on open resources: public domain, open databases, open literature Encoded in open architectures and technical standards

Science Commons
Science Commons is a project of Creative Commons
  Creative Commons provides free tools that let authors, scientists, artists, and educators easily mark their creative work with the freedoms they want it to carry
  140,000,000 objects on the Web under CC licenses in 40+ countries
  700+ peer-reviewed journals carry CC licensing, including Public Library of Science
Science Commons specializes CC to science
  For consumers of knowledge: make it easy to use and re-use information and increase chances for discovery
  For providers of knowledge: provide legal certainty and automated attribution and tracking
  For funders: provide new metrics for tracking return on investment based on re-use

Neurocommons approach
From OBO Foundry: Carefully model biology to enable integration of data sources. “Audit trail to reality”
From Web: Assign all biological entities URIs (lots already provided by OBO) and translate to OWL/RDF
From OWL: Add triples inferred by a reasoner to increase the expressiveness of queries, even with a simple query engine
From software engineering: Provide data via SPARQL first (API). Build tools on top of that.
From open source movement: Make it freely available, reproducible

The Gene Ontology The gene ontology names many biological processes and tells us which genes are known to be involved in those processes.

The Gene Ontology (a small portion) Activation of innate immune response Cell surface pattern recognition receptor signaling pathway Biological Process is_a part_of

A simple query: Biological processes in dendrites? Alzheimer’s disease is characterized by neural degeneration. Among other things, there is damage to dendrites and axons, parts of nerve cells. What resources do we have available to learn more about biological processes in dendrites?

Biological processes naming dendrites
PREFIX owl:
PREFIX go:
PREFIX obo:
PREFIX rdfs:
select ?name ?class ?definition
from
where {
  graph { ?class rdfs:subClassOf go:GO_ }
  ?class rdfs:label ?name.
  ?class obo:hasDefinition ?def.
  ?def rdfs:label ?definition
  filter(regex(?name,"[Dd]endrite"))
}
URI for Biological Process (OBO Foundry principles guarantee unique names for each Universal)

From the “console”

But answers are also available by a “GET”
/sparql/?query=PREFIX%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2 F2002%2F07%2Fowl%23%3E%0APREFIX%20go%3A%20%3Chttp%3A%2F% 2Fpurl.org%2Fobo%2Fowl%2FGO%23%3E%0APREFIX%20obo%3A%20%3C http%3A%2F%2Fwww.geneontology.org%2Fformats%2FoboInOwl%23%3E%0 APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01 %2Frdf- schema%23%3E%0A%0Aselect%20%20%3Fname%20%20%3Fclass%20%3F definition%0Afrom%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2 F %3E%0Awhere%0A%7B%20%20%20graph%20%3Chttp%3A%2F %2Fpurl.org%2Fcommons%2Fhcls%2F %2Fclassrelations%3E%0A% 20%20%20%20%20%7B%3Fclass%20rdfs%3AsubClassOf%20go%3AGO_ %7D%0A%20%20%20%20%3Fclass%20rdfs%3Alabel%20%3Fname.%0 A%20%20%20%20%3Fclass%20obo%3AhasDefinition%20%3Fdef.%0A%20% 20%20%20%3Fdef%20rdfs%3Alabel%20%3Fdefinition%20%0A%20%20%20% 20filter(regex(%3Fname%2C%22%5BDd%5Dendrite%22))%0A%7D%0A&form at=&maxrows=50
So someone, somewhere else, can build something better
*Note: Different query than previous slide
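As a rough illustration of how such a GET request can be assembled, the sketch below percent-encodes a SPARQL query and attaches it to an endpoint URL. The host name `sparql.example.org` is a placeholder (the slide shows only the path `/sparql/`), and the query is a shortened stand-in for the one on the slide, since the graph URIs and GO identifier are not recoverable from the transcript. The `format` and `maxrows` parameters are copied from the slide's URL.

```python
from urllib.parse import quote

# Hypothetical endpoint: the slide shows only the path "/sparql/".
ENDPOINT = "http://sparql.example.org/sparql/"

# Shortened stand-in for the slide's query, not the original.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?class
where {
  ?class rdfs:label ?name .
  filter(regex(?name, "[Dd]endrite"))
}"""


def build_get_url(endpoint: str, query: str, max_rows: int = 50) -> str:
    """Percent-encode the query and pass it as the `query` parameter."""
    # Parentheses are left readable, as in the URL on the slide.
    encoded = quote(query, safe="()")
    return f"{endpoint}?query={encoded}&format=&maxrows={max_rows}"


url = build_get_url(ENDPOINT, QUERY)
print(url)
```

Because the whole query travels in the URL, anyone can bookmark it, link to it, or call it from their own code without a client library.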

Three levels of representing scientific knowledge
Record level: Represent database records. Inconsistent if two sources disagree about contents of a field.
Statement level: Represent what researchers say. Inconsistent if two people disagree about what a paper said.
Domain level: OBO Foundry approach. Represent your best understanding of consensus. Inconsistent if facts contradict.
We need all three (but make clear which is which)
Next slide query is hybrid of Record/Domain

A SPARQL query for processes involved in pyramidal neurons
prefix go:
prefix rdfs:
prefix owl:
prefix mesh:
prefix sc:
prefix ro:
select ?genename ?processname
where {
  graph {
    ?paper ?p mesh:D
    ?article sc:identified_by_pmid ?paper.
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
  }
  graph {
    ?protein rdfs:subClassOf ?res.
    ?res owl:onProperty ro:has_function.
    ?res owl:someValuesFrom ?res2.
    ?res2 owl:onProperty ro:realized_as.
    ?res2 owl:someValuesFrom ?process.
    graph {{?process go:GO_ } union {?process rdfs:subClassOf go:GO_ }}
    ?protein rdfs:subClassOf ?parent.
    ?parent owl:equivalentClass ?res3.
    ?res3 owl:hasValue ?gene.
  }
  graph { ?gene rdfs:label ?genename }
  graph { ?process rdfs:label ?processname }
}
Mesh: Pyramidal Neurons
Pubmed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction
Inference required

Google: 223,000 results

Results
Gene, Entrez Gene ID    GO process
DRD1, 1812       adenylate cyclase activation
ADRB2, 154       adenylate cyclase activation
ADRB2, 154       arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632    dopamine receptor signaling pathway
DRD1, 1812       dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813       dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917       G-protein coupled receptor protein signaling pathway
GNG3, 2785       G-protein coupled receptor protein signaling pathway
GNG12, 55970     G-protein coupled receptor protein signaling pathway
DRD2, 1813       G-protein coupled receptor protein signaling pathway
ADRB2, 154       G-protein coupled receptor protein signaling pathway
CALM3, 808       G-protein coupled receptor protein signaling pathway
HTR2A, 3356      G-protein coupled receptor protein signaling pathway
DRD1, 1812       G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755      G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543     G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269       G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362       G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898      glutamate signaling pathway
GRIN1, 2902      glutamate signaling pathway
GRIN2A, 2903     glutamate signaling pathway
GRIN2B, 2904     glutamate signaling pathway
ADAM10, 102      integrin-mediated signaling pathway
GRM7, 2917       negative regulation of adenylate cyclase activity
LRP1, 4035       negative regulation of Wnt receptor signaling pathway
ADAM10, 102      Notch receptor processing
ASCL1, 429       Notch signaling pathway
HTR2A, 3356      serotonin receptor signaling pathway
ADRB2, 154       transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793      transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043      transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902       transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500     Wnt receptor signaling pathway
Many of the genes are indeed related to Alzheimer’s Disease through gamma secretase (presenilin) activity

What happens when data is discoverable, queryable, and accessible on the open web? Allen Brain Institute Servers Javascript SPARQL AJAX Query URL Google Maps API Neurocommons Servers

Others can “view source”, use our code in their own applications

Background Technology So far about 350M triples in Openlink Virtuoso (~20Gb) Commodity Hardware: 2x2core duo/2 disks/8G Ram Biggest so far is MeSH associations to articles (200M triples) Smaller, from 10K to 10M triples/source A small fraction of biological knowledge (another element of the perfect storm is that computer hardware is so cheap and powerful)

Results are a success, but the process more so
Sample of three interesting cases on the way to the Neurocommons
  Integration of SenseLab
  Finding and addressing inconsistency
  Modeling Gene Ontology Annotations

Process(1): NeuronDB
Started with a homegrown ontology.
Problem: How to link with anything else? E.g. no links to evidence, “receptors” versus proteins with receptor activity (like GOA)
Process: iterate many times, fixing OWL, GO understanding/conformance, augmenting what is in the ontology.
Ends with something that links with GO Function.
Accepted process for how to move both NeuronDB and GO forward.
Next slides – in detail how the discussion/teaching goes

Words mix up functions and objects Ligand Neurotransmitter Hormone Peptide Looking for peptides?

Foundry approach connects words to their corresponding entities in reality
PeptideReceptorLigand - A peptide that has a function which makes it able to bind to a receptor
PeptideNeurotransmitter - A peptide expressed in a neuron that has a function which makes it able to regulate another neuron
PeptideHormone - A peptide that is produced in one organ and has a regulatory effect in another
Peptide - A “short” polymer of amino acids
Looking for peptides?

Peptides from CHEBI Chemical Entities of Biological Interest

Hormone Activity from GO Molecular Function

Towards RDF/OWL (1) ALL instances of PeptideHormone are an instance of Peptide that has_role SOME instance of HormoneActivity

Towards RDF/OWL (2) ALL instances of PeptideHormone are an instance of Peptide that has_role SOME instance of HormoneActivity

Towards RDF/OWL (3) - Instances

Towards RDF/OWL (4) URIs chebi:25905 =

Towards OWL (5): triples
chebi:25905 rdfs:subClassOf chebi:
chebi:25905 rdfs:subClassOf _:1.
_:1 owl:onProperty ro:hasRole.
_:1 owl:someValuesFrom go:GO_
…

SPARQLing: Put ?variables where you are looking for matches
chebi:25905 rdfs:subClassOf chebi:
chebi:25905 rdfs:subClassOf _:1.
_:1 owl:onProperty ro:hasRole.
_:1 owl:someValuesFrom go:GO_

select ?moleculeClass where {
  ?moleculeClass rdfs:subClassOf chebi:
  ?moleculeClass rdfs:subClassOf ?res.
  ?res owl:onProperty ro:hasRole.
  ?res owl:someValuesFrom go:GO_
}

?moleculeClass = chebi:25905
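To make the matching step concrete, here is a minimal sketch of how a query with ?variables can be answered against a handful of triples. It is a toy matcher written for illustration, not how a real SPARQL engine such as Virtuoso works. The names `chebi:PEPTIDE`, `go:HORMONE_ACTIVITY` and `chebi:OTHER` are hypothetical placeholders, because the numeric identifiers of the parent ChEBI class and the GO term are missing from the transcript.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical placeholders for identifiers the transcript does not preserve.
PEPTIDE = "chebi:PEPTIDE"
HORMONE_ACTIVITY = "go:HORMONE_ACTIVITY"

TRIPLES: List[Triple] = [
    ("chebi:25905", "rdfs:subClassOf", PEPTIDE),
    ("chebi:25905", "rdfs:subClassOf", "_:1"),
    ("_:1", "owl:onProperty", "ro:hasRole"),
    ("_:1", "owl:someValuesFrom", HORMONE_ACTIVITY),
    # A class with no hormone-activity restriction; it should not match.
    ("chebi:OTHER", "rdfs:subClassOf", PEPTIDE),
]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Join the patterns left to right, keeping every consistent binding."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    extended_bindings.append(extended)
        bindings = extended_bindings
    return bindings


QUERY: List[Triple] = [
    ("?moleculeClass", "rdfs:subClassOf", PEPTIDE),
    ("?moleculeClass", "rdfs:subClassOf", "?res"),
    ("?res", "owl:onProperty", "ro:hasRole"),
    ("?res", "owl:someValuesFrom", HORMONE_ACTIVITY),
]

results = select(QUERY, TRIPLES)
answers = sorted({b["?moleculeClass"] for b in results})
print(answers)
```

The query is the data with some terms replaced by variables; the same ?res in three patterns is what ties the restriction's pieces back to one class.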

Process(2): Inconsistency!
Once NeuronDB is coded properly, and an OWL reasoner is run, it declares the ontology inconsistent
Problem: There are contradictory assertions about whether a particular ionic current occurs in a particular cell type.
What to do?
“Three levels of representing scientific knowledge” tells us how inconsistency arises in each
Inconsistency is NOT acceptable, but might this be an issue of confusion over desired level?

The dispute: Ionic current? Yes or No Another investigation One investigation Illustration – not the particular cell/current

Resolving the inconsistency
If at the statement level, there need be no inconsistency if the assertions are qualified as being statements of someone.
  Choice 1: Rework the representation to make this so
If at the domain level, then only one can be right.
  Choice 2: As curator, make a judgement about which is right, or see whether information is missing from the representation that would keep this from being a contradiction.
Resolution: Domain level is desired. Closer examination of the papers finds results from different species.
Example of “ontological commitment” and dealing with consequences.
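The resolution above can be sketched in a few lines: whether two findings contradict depends on what the representation includes. The data below is invented for illustration (as on the earlier slide, it is not the particular cell or current), and the `Finding` record and `contradictions` helper are hypothetical names rather than anything taken from NeuronDB or an OWL reasoner.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, List


@dataclass(frozen=True)
class Finding:
    source: str     # the investigation reporting the result
    cell_type: str
    current: str
    species: str
    present: bool   # does the ionic current occur in the cell type?


# Illustrative data only, not the actual papers.
FINDINGS = [
    Finding("investigation A", "cell type X", "current Y", "rat", True),
    Finding("investigation B", "cell type X", "current Y", "mouse", False),
]


def contradictions(findings: List[Finding],
                   key: Callable[[Finding], Hashable]) -> list:
    """Return the keys under which both 'present' and 'absent' are asserted."""
    seen = defaultdict(set)
    for finding in findings:
        seen[key(finding)].add(finding.present)
    return sorted(k for k, values in seen.items() if len(values) == 2)


# Domain level with species left out: the two assertions contradict.
without_species = contradictions(
    FINDINGS, lambda f: (f.cell_type, f.current))

# Domain level with species represented: the contradiction disappears.
with_species = contradictions(
    FINDINGS, lambda f: (f.cell_type, f.current, f.species))

# Statement level: each assertion is qualified by who made it.
by_source = contradictions(
    FINDINGS, lambda f: (f.source, f.cell_type, f.current))

print(without_species, with_species, by_source)
```

Choice 1 corresponds to `by_source`; the resolution actually reached corresponds to `with_species`, where the missing information was the species.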

Process(3): What is a GO Annotation

Problems with integrating annotations with other knowledge
What are the entities?
What are the relationships between the process and the entities?
How can we make All-Some statements involving annotations?

A closer look Ask me about evidence?

Semantic Web technology and ontology in the service of science
Let our tools help us find mistakes (and other insights) by having a representation that is good enough to be wrong.
Once a class is expressed formally, a reasoner might find that there cannot possibly be any instances of it (the class is unsatisfiable).

Public science: What we’d like to do better
Broader knowledge base - cells, anatomy, physiology, behavior, protocols, reagents
Beyond simple interaction: More precise representations of mechanism to be able to query and exploit computationally
Built in an open, scalable, scientifically credible way, to encourage sustained contribution, and to take advantage of “web effects”

How do we get there?
Interoperation is paramount, but modeling is hard:
  Work with the OBO Foundry
  Build a skilled community
Use (open!) Semantic Web Technologies to enable web effects
Support and nurture a growing and vigorous community (SWAN, BIRN, OBI) all of whom build on the rest and enable others to build more
Work to advance key technologies and infrastructure - text mining, structured abstracts, query, reasoning.
Recruit more ontologists! (That’s you)