OBD : technical overview Chris Mungall. Outline  The annotation lifecycle  OBD Model and modeling requirements  Current OBD architecture  Discussion.


OBD : technical overview Chris Mungall

Outline  The annotation lifecycle  OBD Model and modeling requirements  Current OBD architecture  Discussion

The need for OBD  The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data  Current knowledge encoded using ontologies is fragmented across multiple databases and multiple schemas  OBD provides a common means of accessing and querying across these annotations

OBD - What is it?  General purpose biomedical knowledgebase  Repository of biomedical annotations  Ontology-based queries and analysis  Annotations from multiple sources can be compared through use of ontologies and ontology mappings  Current primary use  Genotype-phenotype associations for DBPs  Future uses  Annotation of information entities  Documents, datasets, records, images  Annotation of any biomedical entity using bio-ontologies

The annotation lifecycle [Diagram; labels: investigator; bio-entity (Shh); experiment/investigation; observation (Shh, absence of aorta); publish/create; information entity (Dev Biol 2005 Jul 15;283(2): “Sonic hedgehog is required for cardiac outflow tract and neural crest cell development”); read; agent + tools (human/computer); community/expert; direct annotation; computational representation (Shh + heart development; Shh - absence of aorta); communicate; lab db; query/meta-analysis]

What is an annotation?  OBD has a very inclusive definition of annotation  An attributed statement positing some relation(s) between entities  Typically accompanied by associations to evidence-oriented entities and metadata  Examples: Shh participates_in heart development; p53 implicated_in cancer; p53 has_function DNA repair; PMID:1234 mentions melanoma; depicts (lesion that located_in CA4); Abc[-] influences blood pressure; Trial3456 has_inclusion_criteria (age that < 65) [Diagram: Shh, participates in, heart development]
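
The definition on this slide can be pictured as a small record type. The following is a minimal sketch in Python, not OBD code: the class name `Annotation` and the field names `subject`, `relation`, `target`, `source` and `evidence` are assumptions chosen for illustration, and the attribution values are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """An attributed statement relating two entities (hypothetical field names)."""
    subject: str                      # annotated entity, e.g. a gene symbol or a PMID
    relation: str                     # relation posited between subject and target
    target: str                       # ontology class or class expression
    source: Optional[str] = None      # who or what asserted the statement
    evidence: Tuple[str, ...] = ()    # evidence-oriented entities, e.g. publications

    def statement(self) -> str:
        """Render the bare statement without its attribution."""
        return f"{self.subject} {self.relation} {self.target}"


# Illustrative values only; the sources and evidence are invented.
examples = [
    Annotation("Shh", "participates_in", "heart development",
               source="example-curator", evidence=("PMID:1234",)),
    Annotation("PMID:1234", "mentions", "melanoma", source="example-text-miner"),
]
for a in examples:
    print(a.statement(), "| asserted by", a.source)
```

Keeping attribution and evidence outside the bare statement means the same subject, relation and target can be asserted independently by several sources.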

OBD and annotations [Diagram; labels: investigator; bio-entity (Shh); experiment/investigation; observation (Shh, absence of aorta); publish/create; information entity (Dev Biol 2005 Jul 15;283(2): “Sonic hedgehog is required for cardiac outflow tract and neural crest cell development”); read; agent (human/computer); community/expert; direct annotation; computational representation (Shh + heart development; Shh - absence of aorta); communicate; local db; multiple schemas; annotation (subj, relation, obj); relations shown: influences, participates in, represents; submit/consume; query/meta-analysis]

Flexibility of OBD  Most ontology-based bio-curation focuses on stating associations between bio-entities and types as represented in ontologies  Where bio-entities can be types or instances  Genes, proteins, genotypes, cells, organisms, strains  OBD can also accommodate ‘tagging’ annotations  E.g. Ontrez, term extraction from literature  Associations between information entities and ontology terms  E.g. documents, document parts, datasets, images

Ontrez in OBD [Diagram; labels: investigator; bio-entity (Shh); experiment/investigation; observation (Shh, absence of aorta); publish/create; information entity (PMID:1234 abstract; Dev Biol 2005 Jul 15;283(2): “Sonic hedgehog is required for cardiac outflow tract and neural crest cell development”); read/search; agent (computer); community/expert; extraction; direct annotation; computational representation (Cardiac outflow tract, PMID:1234 abstract; Shh, PMID:1234 abstract); annotation (subj, relation, obj); relations shown: describes, representation; communicate; query/meta-analysis]

OBD model: Requirements  Generic  We can’t define a rigid schema for all of biomedicine  Let the domain ontologies do the modeling of the domain  Expressive  Use cases vary from simple ‘tagging’ to complex descriptions of biological phenomena  Formal semantics  Amenable to logical reasoning  FOL and/or OWL1.1  Standards-compatible  Integratable with semantic web

OBD Model: overview  Graph-based: nodes and links  Nodes: Classes, instances, relations  Links: Relation instances  Connect subject and object via relation plus additional properties  Annotations: Posited links with attribution / evidence  Expressivity equivalent to RDF and OWL  Links correspond to axioms and facts in OWL  Attributed links:  Named graphs  Reification  N-ary relation pattern  Supports construction of complex descriptions through graph model
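
Of the attributed-link options listed on this slide, reification is the easiest to show in a few lines: the link becomes a node of its own, and attribution hangs off that node. This is a sketch under stated assumptions: the `rdf:` terms are the standard RDF reification vocabulary, while the `obd:` property names, the blank-node id and the attribution values are invented for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reify(link_id: str, link: Triple, attribution: Dict[str, str]) -> List[Triple]:
    """Expand one attributed link into RDF-style reification triples."""
    subj, rel, obj = link
    triples = [
        (link_id, "rdf:type", "rdf:Statement"),
        (link_id, "rdf:subject", subj),
        (link_id, "rdf:predicate", rel),
        (link_id, "rdf:object", obj),
    ]
    # Attribution and evidence attach to the statement node, not to the subject.
    triples += [(link_id, prop, value) for prop, value in sorted(attribution.items())]
    return triples


triples = reify(
    "_:annot1",
    ("Shh", "participates_in", "heart development"),
    # Hypothetical property names and values.
    {"obd:source": "example-curator", "obd:evidence": "PMID:1234"},
)
for t in triples:
    print(t)
```

Named graphs and the n-ary relation pattern carry the same information with a different shape; reification is shown here only because it needs no extra machinery.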

Modeling requirement: descriptions  Descriptions are class expressions composed using multiple classes  Genus and differentia  Post-composed at annotation time  Examples (in OWL Manchester syntax*):  GO Dendrite_spine that part_of CL Golgi_cell  PATO Decreased_length that inheres_in (GO Dendrite_spine that part_of CL Golgi_cell)  Ontologies can also contain these class expressions  Pre-composed logical definitions  The ability to represent and reason over these descriptions is a key OBD requirement  * Existential quantifier omitted
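
A genus-and-differentia description is a small recursive data structure. The sketch below is one possible representation, not the OBD or org.obo model: the class names `Named` and `Description` and the `render` method are hypothetical, and ontology prefixes are left out of the labels for brevity.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Named:
    """A named ontology class, e.g. a GO or CL term."""
    label: str

    def render(self) -> str:
        return self.label


@dataclass(frozen=True)
class Description:
    """Genus plus differentia: 'genus that relation filler'."""
    genus: Named
    differentia: Tuple[Tuple[str, "ClassExpr"], ...]

    def render(self) -> str:
        parts = []
        for relation, filler in self.differentia:
            text = filler.render()
            # Parenthesize nested descriptions so the grouping stays unambiguous.
            if isinstance(filler, Description):
                text = f"({text})"
            parts.append(f"{relation} {text}")
        return f"{self.genus.render()} that " + " and ".join(parts)


ClassExpr = Union[Named, Description]

spine_in_golgi = Description(Named("Dendrite_spine"), (("part_of", Named("Golgi_cell")),))
short_spine = Description(Named("Decreased_length"), (("inheres_in", spine_in_golgi),))
print(short_spine.render())
```

Because a filler can itself be a description, the second example from the slide nests the first one without any special casing.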

Reasoning over descriptions  Query requirement  Queries for annotations to “CNS neuron cell projection”  Should return:  Annotations to: GO Dendrite_spine that part_of CL Golgi_cell  Computational Requirements  Entailments  EL++ or greater  OWL constructs  intersectionOf  equivalentClass  Representing Phenotypes in OWL (OWLED 2007)
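
The query requirement amounts to a subsumption test between two descriptions. The sketch below uses a sufficient structural condition (the genus and the filler of the annotated description are each subclasses of the corresponding parts of the query); it is not an EL++ reasoner. The `is_a` edges and the reading of "CNS neuron cell projection" as "Cell_projection that part_of CNS_neuron" are assumptions made up for the example.

```python
from typing import Dict, Set, Tuple

# Toy is_a edges (child -> parents); a simplification invented for illustration.
IS_A: Dict[str, Set[str]] = {
    "Dendrite_spine": {"Cell_projection"},
    "Golgi_cell": {"CNS_neuron"},
    "CNS_neuron": {"Neuron"},
}

# A description is (genus, relation, filler), read as "genus that relation filler".
Desc = Tuple[str, str, str]


def ancestors(cls: str) -> Set[str]:
    """Reflexive-transitive closure of is_a starting at cls."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(specific: Desc, general: Desc) -> bool:
    """True if, under the toy hierarchy, `specific` is a subclass of `general`."""
    (g1, r1, f1), (g2, r2, f2) = specific, general
    return r1 == r2 and g2 in ancestors(g1) and f2 in ancestors(f1)


query = ("Cell_projection", "part_of", "CNS_neuron")     # assumed logical definition
annotated = ("Dendrite_spine", "part_of", "Golgi_cell")  # description from the slide
print(subsumed_by(annotated, query))
```

A real deployment would hand this to a reasoner that supports intersectionOf and equivalentClass; the point here is only why the annotation should be returned for the broader query.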

Example of Annotation in OBD [Figure with key: post-composition of phenotype classes (PATO EQ formalism); post-composition of complex anatomical entity descriptions]

OBD Architecture  Two stacks  Semantic web stack  First iteration  Built using Sesame triplestore + OWLIM  Limited developer resources  Future iterations: Science-commons Virtuoso  OBD-SQL stack  Current focus  Traditional enterprise architecture  Plugs into Semantic Web stack via D2RQ

OBD Architecture: Two stacks

OBD-SQL Stack  Alpha version of API implemented  Test clients access via SOAP  Phenote currently accesses via org.obo model & JDBC  Wraps org.obo model and OBD schema  Share relational abstraction layer  Org.obo wraps OWLAPI  Phenote currently connects via JDBC connectivity in org.obo

OBDAPI examples
node = getNodeById("OMIM:601653")
nodes = getNodesBySearch("p53*")
Sources = getSourceNodes()
nodes = getNodesBySource("OMIM")
nodes = getNodesByQuery(queryExpr)
graph = getAnnotationGraphAroundNode("PATO: ", true)
graph = getAnnotationGraphAroundNode(classExpr, true)
annots = getAnnotationStatementsForAnnotatedEntity("Entrez:2138")
stats = getSummaryStatistics()
stats = getCoAnnotatedNodes("CL: ")
stats = getEnrichedClasses(entityNodeList,Distribution.HYPERGEOMETRIC)
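
The last call names a hypergeometric distribution for class enrichment. How OBD computes it is not shown in the slides; the sketch below is the textbook upper-tail hypergeometric test, with a hypothetical function name and made-up counts.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` entities are drawn without replacement from
    `population` entities, of which `annotated` carry the class of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up counts: 20 entities, 5 annotated to the class, 4 sampled, 3 of them hits.
p = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=3)
print(round(p, 4))
```

A small p-value suggests the class is over-represented in the sampled entity list relative to the whole annotation set; when many classes are tested at once, a multiple-testing correction would normally follow.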

Objects sent over the wire  RESTful: OBD-XML  rnc on sourceforge  SOAP: obd.model objects  Core classes:  Graph  Node  (instance nodes, class nodes, relation nodes)  Statements  LiteralStatement  LinkStatement  Payload can be requested ‘frame-style’ or ‘axiom- style’
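
The slide does not define the two payload styles. One plausible reading, taken here as an assumption, is that axiom-style is a flat list of statements and frame-style groups statements under the node they describe. The class names follow the slide; the fields, the function names and the literal values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class LinkStatement:
    node: str       # subject node id
    relation: str
    target: str     # object node id


@dataclass(frozen=True)
class LiteralStatement:
    node: str       # subject node id
    relation: str
    value: str      # literal value such as a label


Statement = Union[LinkStatement, LiteralStatement]


def axiom_style(statements: List[Statement]) -> List[Statement]:
    """Flat payload: one entry per statement, in the given order."""
    return list(statements)


def frame_style(statements: List[Statement]) -> Dict[str, List[Statement]]:
    """Grouped payload: each node id maps to the statements about that node."""
    frames: Dict[str, List[Statement]] = defaultdict(list)
    for s in statements:
        frames[s.node].append(s)
    return dict(frames)


# Node ids are taken from the API examples; relations and values are invented.
stmts: List[Statement] = [
    LiteralStatement("Entrez:2138", "label", "example gene"),
    LinkStatement("Entrez:2138", "influences", "example phenotype"),
    LinkStatement("OMIM:601653", "is_a", "example class"),
]
print(len(axiom_style(stmts)), sorted(frame_style(stmts)))
```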

Phenote components as OBD clients Currently Implemented

Genome browser mashup  Under development (Holmes lab) [Figure; labels: sensory neuron, vulva, uterine muscle, locomotion, oviposition]

OBD Mediator Architecture  OBDAPI can act as client to other OBDAPIs  Mediator node distributes queries to source nodes
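
The mediator idea can be sketched as a fan-out and merge. This is an illustration of the pattern only: the `Mediator` class, its methods and the two toy sources are hypothetical, and a real OBDAPI source would be a remote service rather than a local function.

```python
from typing import Callable, Dict, List

# A source is modeled as a function from a node id to annotation strings.
Source = Callable[[str], List[str]]


class Mediator:
    """Fans a query out to registered sources and merges the answers."""

    def __init__(self) -> None:
        self.sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self.sources[name] = source

    def annotations_for(self, node_id: str) -> List[str]:
        merged: List[str] = []
        for name in sorted(self.sources):          # deterministic order
            for annot in self.sources[name](node_id):
                if annot not in merged:            # drop duplicates across sources
                    merged.append(annot)
        return merged

    # A mediator has the same call shape as a source, so mediators can be nested.
    __call__ = annotations_for


def curated_source(node_id: str) -> List[str]:
    return ["Shh participates_in heart development"] if node_id == "Shh" else []


def tagging_source(node_id: str) -> List[str]:
    if node_id != "Shh":
        return []
    return ["Shh participates_in heart development", "PMID:1234 mentions Shh"]


mediator = Mediator()
mediator.register("curated", curated_source)
mediator.register("tagging", tagging_source)

top = Mediator()
top.register("regional", mediator)   # an API acting as client to another API
print(top.annotations_for("Shh"))
```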

OBD-SQL Database  Generic minimal table model  Makes heavy use of views for core capabilities  E.g.  analyzing information content of classes based on annotation  Views can be materialized for speed  Deductive closure of classes (named and class expressions) pre-computed  Not a blind transitive closure  Subset of OWL-DL semantics (EL++)
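
The combination of a minimal table, a pre-computed closure and views can be tried out with SQLite. The schema below is a stand-in, not the OBD schema: the table, column and view names and the rows are assumptions, and the closure covers `is_a` links only, so it does not handle class expressions or the EL++ semantics mentioned on the slide.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One generic link table (hypothetical columns).
CREATE TABLE link (node TEXT, predicate TEXT, object TEXT, is_annotation INTEGER);
INSERT INTO link VALUES
  ('Golgi_cell', 'is_a', 'CNS_neuron', 0),
  ('CNS_neuron', 'is_a', 'Neuron', 0),
  ('gene1', 'expressed_in', 'Golgi_cell', 1),
  ('gene2', 'expressed_in', 'CNS_neuron', 1),
  ('gene3', 'expressed_in', 'Neuron', 1);

-- Pre-computed reflexive-transitive closure over is_a.
CREATE TABLE is_a_closure AS
WITH RECURSIVE closure(node, ancestor) AS (
  SELECT object, object FROM link
  UNION
  SELECT c.node, l.object
  FROM closure c JOIN link l ON l.node = c.ancestor AND l.predicate = 'is_a'
)
SELECT node, ancestor FROM closure;

-- View: distinct annotated entities per class, counting annotations to subclasses.
CREATE VIEW class_annotation_count AS
SELECT c.ancestor AS class, COUNT(DISTINCT a.node) AS n
FROM link a JOIN is_a_closure c ON c.node = a.object
WHERE a.is_annotation = 1
GROUP BY c.ancestor;
""")

counts = dict(conn.execute("SELECT class, n FROM class_annotation_count"))
total = conn.execute(
    "SELECT COUNT(DISTINCT node) FROM link WHERE is_annotation = 1"
).fetchone()[0]

# Information content: rarer classes (fewer annotated entities) score higher.
ic = {cls: -math.log2(n / total) for cls, n in counts.items()}
for cls in sorted(ic, key=ic.get):
    print(cls, counts[cls], round(ic[cls], 3))
```

Materializing the closure as a table trades storage for query speed, which matches the slide's remark that views can be materialized.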

OBD Dataflow

Analysis requirements  The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data  OBD must have capabilities for using ontologies to query and analyze data effectively  Example:  Classes in common between similar entities  E.g. Gene homology and phenotype
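
"Classes in common between similar entities" can be made concrete by comparing annotation sets after expanding them with their ancestors. The sketch below uses invented class names, invented entity names and an invented hierarchy; the Jaccard score is one common choice of similarity measure, not necessarily the one used in OBD.

```python
from typing import Dict, Set

# Toy phenotype hierarchy and annotations, all invented for illustration.
IS_A: Dict[str, Set[str]] = {
    "absent_aorta": {"abnormal_aorta"},
    "narrow_aorta": {"abnormal_aorta"},
    "abnormal_aorta": {"abnormal_heart_vessel"},
}
ANNOTATIONS: Dict[str, Set[str]] = {
    "mouse_gene": {"absent_aorta"},
    "human_ortholog": {"narrow_aorta"},
}


def with_ancestors(classes: Set[str]) -> Set[str]:
    """Expand a set of classes with everything they are subclasses of."""
    seen, stack = set(classes), list(classes)
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classes_in_common(a: str, b: str) -> Set[str]:
    return with_ancestors(ANNOTATIONS[a]) & with_ancestors(ANNOTATIONS[b])


def jaccard(a: str, b: str) -> float:
    ca, cb = with_ancestors(ANNOTATIONS[a]), with_ancestors(ANNOTATIONS[b])
    return len(ca & cb) / len(ca | cb)


print(sorted(classes_in_common("mouse_gene", "human_ortholog")))
print(jaccard("mouse_gene", "human_ortholog"))
```

The two entities share no directly asserted class, yet the ontology exposes their common ancestors, which is the kind of cross-species comparison the slide has in mind.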

Sequence homology Phenotype Homology of anatomical structure

Visualisation and display of annotations  Annotation comparison  Within species  Combining annotation sources  Across species  Translational research  OBD web-based interface prototype

Discussion: Integration  How should OBD be integrated with BioPortal?  Use case:  User queries for Sonic hedgehog on BioPortal  What happens?  What APIs are called?  What components in the persistence layer are used?

OBDAPI in BioPortal: two choices  Choice 1: Two separate APIs  Ontology API  Annotation API  Choice 2: Unified API  Use same API for search, implementing same behaviour  Same submission services  Same query model

Some requirements for unified API  Expressive model  Logical expressivity on a par with OWL-DL  Rich terminological and lifecycle model on a par with OBOF  Rich query model and capabilities  Logical entailment for both named classes and class expressions  Simple facades to express common queries  Expressive queries for more complex cases  Compiles to SQL & SPARQL
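
The last requirement, one query model that compiles to both SQL and SPARQL, can be illustrated with a single simple facade. Everything in this sketch is an assumption: the `LinkQuery` class, the `link` table and its columns, and the example IRIs are hypothetical and only show the shape of the idea.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LinkQuery:
    """Simple facade: find nodes linked by `relation` to `target`."""
    relation: str
    target: str

    def to_sql(self) -> Tuple[str, List[str]]:
        # Hypothetical table and column names; values are passed as parameters.
        sql = "SELECT node FROM link WHERE predicate = ? AND object = ?"
        return sql, [self.relation, self.target]

    def to_sparql(self) -> str:
        # Assumes relation and target are already full IRIs.
        return f"SELECT ?node WHERE {{ ?node <{self.relation}> <{self.target}> . }}"


q = LinkQuery("http://example.org/participates_in", "http://example.org/heart_development")
sql, params = q.to_sql()
print(sql, params)
print(q.to_sparql())
```

More expressive queries, such as those over class expressions, would need a richer query tree, but the same two-backend compilation step applies.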

OBD Roadmap  Jan 2008  Package OBD website  OBD core API released  Local-OBD installer  Mar 2008  Port wrappers and import/export pipeline to Java  Prototype RoR BioPortal integration  RESTful layer over API  May 2008  SPARQL wrapper  Integrate with Science Commons triplestore  Dynamic wrappers for other data sources  Analysis service layer released  Pluggable reasoner framework  Sep 2008  Integration with BIRN mediator

 end

Requirements breakout

| | OBD | Ontrez |
| Model | assertions | tagging |
| Analogy | Database/knowledgebase | Search engine; flickr; index |
| Statements about | Any bio or info entity; genetic entities; individuals; trials; … | Document and dataset elements |
| Canonical example | P53 protein variant gives rise to cancer | Document mentions p53; document mentions cancer |
| Granularity | high | low |
| Accuracy | Function of expertise | Function of concept recognition engine |
| Content generation | Human (expert/community); automated | Automated (text matching); can be regenerated |
| Use | Search; finding annotations for entity of interest; finding similar entities; analysis; complex queries | Finding documents and datasets; input to curation? |
| Size | Curated: 100s to millions; automated: ? | 500gb? |
| Risk | Scalability; not enough assertions to have utility; ability to reason/query over large knowledgebase; truth maintenance? | Scalability; variation in precision/accuracy across domains (biology vs clinical) |

 Ontrez annotation/tagging can be modeled by OBD annotation model

 Share same API, model  Separate underlying databases, API collects results

| Capability requirement | OBD | Ontrez |
| Content maintenance | | |
| Annotation tracking and mapping | Yes | no |
| Use of cross-ontology links | Yes (query expansion and in query) | Yes (‘semantic query expansion’) |
| Boolean queries | yes |
| Composite descriptions | Yes | ? - perhaps in future |
| Search on annotated entities | Yes | ? |
| Reasoning; detecting contradictions | Yes | ?; no |
| Detailed provenance | Yes | ? |
| Modeling element metadata | no | yes |
| Distribution and local installation | yes | Parking lot |
| Content submission pipeline | yes | ? |

| Requirements for other resources | OBD | Ontrez |
| Ontology text definitions | yes | no |
| Distribution and local installation | yes | disagreement |

Capabilities  Today  Get annotations for ‘Shh’  (synonym for “sonic hedgehog gene”)  NCI Thesaurus axioms (BioPortal)

Use case  What happens when a user queries on Shh?  Sources:  Ontologies  Ncithesaurus  Annotations  Tagging  Returns documents, datasets