Ontologies in Biomedicine Mark A. Musen Stanford University.

Presentation transcript:

What Is An Ontology?
The study of being
A discipline co-opted by Computer Science to enable the “explicit specification of the conceptualization” of application domains:
  – Entities
  – Properties and attributes of entities
  – Constraints on properties and attributes
  – Individuals (often, but not always)
A theory that provides
  – a common vocabulary
  – a shared understanding of the entities in an application area

Why Develop an Ontology?
To share common understanding of the structure of descriptive information
  – among people
  – among software agents
  – between people and software
To enable reuse of domain knowledge
  – to avoid “re-inventing the wheel”
  – to introduce standards to allow interoperability

Ontologies are just the beginning
[Diagram: Ontologies linked to Software agents, Problem-solving methods, Annotated Data, Databases, and Knowledge bases. Link labels: “Declare structure”, “Provide domain descriptions”, “Enumerate domain terms”.]

Porphyry’s depiction of Aristotle’s Categories
Supreme genus: SUBSTANCE
Differentiae: material, immaterial
Subordinate genera: BODY, SPIRIT
Differentiae: animate, inanimate
Subordinate genera: LIVING, MINERAL
Differentiae: sensitive, insensitive
Proximate genera: ANIMAL, PLANT
Differentiae: rational, irrational
Species: HUMAN, BEAST
Individuals: Socrates, Plato, Aristotle, …

Foundational Model of Anatomy
Long-term project at University of Washington to create a comprehensive ontology of human anatomy
72K concepts, 1.9M relationships
One of the largest and best developed ontologies in biomedicine

Top level of the Foundational Model of Anatomy
[Diagram nodes: Physical Anatomical Entity, Anatomical Spatial Entity, Anatomical Structure, Body Substance, Body Part, Organ System, Organism, The Body, Organ Part, Organ, Cell, Organ Subdivision, Organ Component, Tissue]

Parts of the heart: Heart, Cavity of Heart, Wall of Heart, Right Atrium, Cavity of Right Atrium, Wall of Right Atrium, Fossa Ovalis, Myocardium, Sinus Venarum, SA Node, Myocardium of Right Atrium
Classes of anatomical structures: Cardiac Chamber, Hollow Viscus, Internal Feature, Organ Cavity, Organ Cavity Subdivision, Anatomical Spatial Entity, Anatomical Feature, Body Space, Organ Component, Organ Subdivision, Viscus, Organ Part, Organ, Anatomical Structure
Link types shown in the diagram: Is-a, Part-of
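The two link types on this slide can be sketched as two separate lookup tables that are traversed independently. This is only an illustration: the specific links below are assumptions chosen to resemble the slide, not a transcription of the FMA, and the helper name `ancestors` is hypothetical.

```python
# Illustrative subset only; which term links to which is assumed for this sketch.
IS_A = {
    "Heart": "Hollow Viscus",
    "Hollow Viscus": "Viscus",
    "Viscus": "Organ",
    "Organ": "Anatomical Structure",
}
PART_OF = {
    "Right Atrium": "Heart",
    "Wall of Right Atrium": "Right Atrium",
    "Myocardium of Right Atrium": "Wall of Right Atrium",
}

def ancestors(term, relation):
    """Follow a single relation upward and return every term reached, nearest first."""
    chain = []
    while term in relation:
        term = relation[term]
        chain.append(term)
    return chain

# The same term can be asked two different questions:
print(ancestors("Heart", IS_A))                          # what kind of thing is it?
print(ancestors("Myocardium of Right Atrium", PART_OF))  # what is it a part of?
```

Keeping the relations in separate tables makes it harder to confuse "is a kind of" with "is a part of", which is a common source of modeling errors.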

But we really want ontologies in electronic form
Ontology contents can be processed and interpreted by computers
Interactive tools can assist developers in ontology authoring

The FMA demonstrates that distinctions are not universal
Blood is not a tissue, but rather a body substance (like saliva or sweat)
The pericardium is not part of the heart, but rather an organ in and of itself
Each joint, each tendon, each piece of fascia is a separate organ
These views are not shared by many anatomists!

Ontologies are cropping up everywhere!
Indexing of online information for access by humans or search engines
Product catalogs for e-commerce
Reference terminologies for machine translation and data interchange
Standard terms for describing experimental data
Frameworks for structuring knowledge for decision support

The New Philosophers
Categorizing “what exists” in machine-understandable form
Providing a structure that enables
  – Developers to locate and update relevant descriptions
  – Computers to infer relationships and properties
Creating new abstractions about the world to facilitate the creation of this structure

Lots of ontology builders are not very good philosophers
Nearly always, ontologies are created to address pressing professional needs
The people who have the most insight into professional knowledge may have little appreciation for metaphysics, principles of knowledge representation, or computational logic
There simply aren’t enough good philosophers to go around

A case in point: The International Classification of Diseases
An enumeration of diseases that forms the basis for all medical claims and reimbursements in the United States
A “legacy” terminology that has its roots in 19th-century epidemiology
Created initially by biostatisticians with a pressing need to compare death statistics in different European countries
A system that won’t go away, and yet we would never create anything like it again

A Small Portion of ICD9-CM
724    Unspecified disorders of the back
724.0  Spinal stenosis, other than cervical
         Spinal stenosis, unspecified region
         Spinal stenosis, thoracic region
         Spinal stenosis, lumbar region
         Spinal stenosis, other
724.1  Pain in thoracic spine
724.2  Lumbago
724.3  Sciatica
724.4  Thoracic or lumbosacral neuritis
724.5  Backache, unspecified
724.6  Disorders of sacrum
724.7  Disorders of coccyx
         Unspecified disorder of coccyx
         Hypermobility of coccyx
         Coccygodynia
724.8  Other symptoms referable to back
724.9  Other unspecified back disorders

ICD9 (1977): A Handful of Codes for Traffic Accidents

ICD10 (1999): 587 codes for such accidents
V31.22  Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
W65.40  Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
X35.44  Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

ICD is used for lots of (too many?) things!
ICD is used to code all patient encounters with the health-care system for purposes of
  – Billing and reimbursement
  – Institutional planning
  – Disease surveillance and public health
  – Quality assurance
  – Economic modeling by third-party payors
ICD was never intended to make the distinctions relevant to all these tasks!
When patient encounters are encoded with ICD, it is impossible to keep all these uses in mind

If real ontologists could build the ICD from scratch …
Diseases would be organized with well-defined relationships
Diseases would be associated with computer-understandable definitions
There would be well-defined rules to enable aggregation of primitive concepts into complex descriptions, and for ensuring that those descriptions were sensible
There would be well-defined mechanisms for creating use-specific views of the ICD

The components of ontologies
Classes: The primary entities in the world being modeled (e.g., “organ”)
Attributes: The properties of classes (e.g., “shape”, “location”)
Relations: Statements regarding how one class may relate to others (e.g., “the heart” is-a “organ”)
Axioms: More complex logical statements (e.g., “only paired organs can be left-sided or right-sided”)

Classes and attributes in the FMA

Attributes of a class (e.g., “Esophagus”)

“is-a” is a special relation
If a sub-class is-a member of a super-class, then
  – every instance of the sub-class is also an instance of the super-class (e.g., every member of the set aorta is necessarily a member of the set artery)
  – values of attributes of the super-class are inherited by every instance of the sub-class (e.g., if arteries have cylindrical shape, then aorta has cylindrical shape)
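Both consequences of is-a have a rough analogue in ordinary object-oriented subclassing. The sketch below uses the slide's aorta/artery example; the class and attribute names are illustrative, and Python inheritance is only an approximation of ontology semantics.

```python
class Artery:
    # Attribute value stated once, on the super-class.
    shape = "cylindrical"

class Aorta(Artery):
    # Aorta is-a Artery; nothing is restated here.
    pass

aorta = Aorta()

# Consequence 1: every instance of the sub-class is an instance of the super-class.
print(isinstance(aorta, Artery))

# Consequence 2: attribute values of the super-class are inherited.
print(aorta.shape)
```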

“Frame-based” knowledge-representation systems
Allow developers to encode
  – Taxonomic hierarchies of classes
  – Other relations among classes (e.g., “part-of”) in addition to the is-a hierarchy
  – Attributes of classes that take on particular values to define instances of the classes
Support inheritance of attributes and values along taxonomic relations
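A frame system can be approximated with plain dictionaries: each frame has an is-a link and a set of slots, and slot lookup climbs the is-a chain. This is a minimal sketch, not the data model of any particular frame tool; the frame names, slot names, and values below are invented for illustration.

```python
# Hypothetical frames; "part_of" is just another slot alongside the is-a link.
FRAMES = {
    "Anatomical Structure": {"is_a": None, "slots": {"dimension": "3D"}},
    "Artery": {
        "is_a": "Anatomical Structure",
        "slots": {"shape": "cylindrical", "part_of": "Cardiovascular System"},
    },
    "Aorta": {"is_a": "Artery", "slots": {}},
}

def get_slot(frame_name, slot):
    """Look for a slot locally, then climb the is-a chain until a value is found."""
    while frame_name is not None:
        frame = FRAMES[frame_name]
        if slot in frame["slots"]:
            return frame["slots"][slot]
        frame_name = frame["is_a"]
    return None

def make_instance(class_name, **values):
    """An instance fills in particular slot values for one class."""
    return {"instance_of": class_name, "values": values}

def instance_value(instance, slot):
    """Prefer the instance's own value, otherwise inherit from its class."""
    if slot in instance["values"]:
        return instance["values"][slot]
    return get_slot(instance["instance_of"], slot)

patient_aorta = make_instance("Aorta", diameter_mm=30)
print(instance_value(patient_aorta, "diameter_mm"))  # own value
print(instance_value(patient_aorta, "shape"))        # inherited from Artery
print(instance_value(patient_aorta, "dimension"))    # inherited from the top frame
```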

Distinctions about ontologies
“Light” versus “heavy”: Is the ontology a simple taxonomy, or does the ontology provide additional detail regarding the nature of classes?
“Upper-level” versus “domain-oriented”: Does the ontology try to describe general, abstract concepts or concepts tied to a particular application area?

Suggested Upper Merged Ontology (SUMO)

Part of the CYC Upper Ontology

The story so far …
Ontologies define the entities, and relationships among entities, in some application area
The authors’ point of view determines which distinctions are appropriate in a particular ontology
Ontologies often use frame-based representations (including classes, attributes, relationships, and axioms) to encode knowledge
People are building ontologies for nearly every niche of biomedicine

The pressing need to standardize the names of human genes

But the human genome is only part of the problem …
Scientists maintain huge databases of gene sequences and gene expression for a wide range of “model organisms” (e.g., mouse, rat, yeast, fruit fly, round worm, slime mold)
Database entries are annotated with entries such as the name of a gene, the function of the gene, and so on
How do you ensure uniformity in the nature of these annotations?

Gene Ontology Consortium
Founded in 1998 as a collaboration among scientists responsible for developing different databases of genomic data for model organisms (fruit fly, yeast, mouse)
Now, essentially all developers of all model-organism databases participate
Goal: To produce a dynamic, controlled vocabulary that can be applied to all organism databases even as knowledge of gene and protein roles in cells is accumulating and changing

Gene Ontology (GO)
Comprises three independent “ontologies”
  – molecular function of gene products
  – cellular component of gene products
  – biological process representing the gene product’s higher order role
Uses these terms as attributes of gene products in the collaborating databases (gene product associations)
Allows queries across databases using GO terms, providing linkage of biological information across species
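The cross-database query idea can be sketched as an index from GO term to the gene products annotated with it. The database names and gene-product names below are placeholders invented for this sketch, and the flat record layout is an assumption, not the GO consortium's actual annotation file format.

```python
from collections import defaultdict

# Made-up annotation records: (database, gene product, GO term ID).
# GO:0005634 is the cellular-component term "nucleus";
# GO:0003677 is the molecular-function term "DNA binding".
ANNOTATIONS = [
    ("fly_db", "fly_gene_1", "GO:0005634"),
    ("yeast_db", "yeast_gene_7", "GO:0005634"),
    ("mouse_db", "mouse_gene_3", "GO:0003677"),
    ("yeast_db", "yeast_gene_7", "GO:0003677"),
]

def index_by_term(annotations):
    """Group (database, gene product) pairs by the GO term they carry."""
    index = defaultdict(set)
    for database, product, term in annotations:
        index[term].add((database, product))
    return index

index = index_by_term(ANNOTATIONS)

# One query, phrased with a shared term, reaches entries from several species.
for database, product in sorted(index["GO:0005634"]):
    print(database, product)
```

Because every database uses the same term identifiers, the query does not need to know each database's local vocabulary.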

GO has been wildly successful!!
Dozens of biologists around the world contribute to GO on a regular basis
The ontology is updated every 30 minutes!
It’s now impossible to work in most areas of computational biology without making use of GO terms

But GO has had real problems …
Ontologies initially were represented in an idiosyncratic format that was not compatible with standard knowledge-representation systems (DAG-Edit)
The format was based on directed acyclic graphs of concepts, without the general ability to specify machine-interpretable properties of entities or definitions of entities
Because of the informal knowledge-representation system, lots of errors crept into GO
  – Terms that were duplicated in different places
  – Terms with no superclasses
  – Uncertain relationships between terms
The GO consortium is working hard to rectify these problems by means of a new representation (OBO-Edit) and enhanced quality control
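Two of the error classes listed above (duplicated terms and terms with no superclass) are easy to detect mechanically once the graph is in a machine-readable form. The sketch below runs such checks on a toy term graph; the IDs, names, and function names are invented and do not reflect GO's real content or tooling.

```python
from collections import Counter

# Toy directed acyclic graph of terms; every entry here is invented.
TERMS = {
    "T:0": {"name": "process", "parents": []},              # the one intended root
    "T:1": {"name": "transport", "parents": ["T:0"]},
    "T:2": {"name": "ion transport", "parents": ["T:1"]},
    "T:3": {"name": "ion transport", "parents": ["T:0"]},   # same name, different place
    "T:4": {"name": "signaling", "parents": []},            # no superclass, not a root
}

def duplicated_names(terms):
    """Return names that appear on more than one term."""
    counts = Counter(term["name"] for term in terms.values())
    return sorted(name for name, n in counts.items() if n > 1)

def terms_without_superclass(terms, roots):
    """Return IDs of terms that have no parent and are not declared roots."""
    return sorted(
        term_id
        for term_id, term in terms.items()
        if not term["parents"] and term_id not in roots
    )

print(duplicated_names(TERMS))
print(terms_without_superclass(TERMS, roots={"T:0"}))
```

Checks like these catch structural slips only; deciding whether a relationship between two terms is the right one still needs a curator or a richer formal definition.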

Creating ontologies has become a widespread cottage industry
Professional Societies
  – HL7: Reference Information Model
  – MGED: Microarray Gene Expression Data Society Ontology
  – HUPO: Human Proteome Organization Ontology
Government
  – NCI Thesaurus
  – NIST: Process Specification Language
Open Biological Ontologies
  – GO
  – Three dozen (and growing) other ontologies
  – Mostly in DAG-Edit, some in Protégé format

A Portion of the OBO Library

HL-7 Reference Information Model (RIM)

HL7 RIM
Provides a uniform framework for specification of information required by health-care information systems
Based on six top-level, very general classes: Act, Entity, Role, Participation, Act_relationship, and Role_link
Designed to facilitate information exchange among distributed elements of clinical information systems
Has the same limitations that all “upper level” ontologies share:
  – Abstract entities are hard to define
  – It’s hard to know what should be “in” and what should be “out”

Description Logic (DL)
A subset of logic designed to focus on categories and their definitions in terms of existing relations
More expressive than frame-based representation systems (as in FMA) but less expressive than first-order logic (as in CYC)
Major inference tasks:
  – Subsumption: Is category C1 a subset of C2?
  – Classification: Does object O belong to C?

Kinds of classes
Defined
  – Have explicit necessary and sufficient properties (roles)
  – Often are specializations of primitive concepts
Primitive
  – Have no sufficient properties
  – May have other, necessary properties
  – Correspond to natural kinds

A simple network of Generic Concepts
[Diagram nodes: THING, ANIMAL, PLANT, MINERAL, MAMMAL, FISH, FEMALE-ANIMAL, MALE-ANIMAL, HUMAN, HORSE, WOMAN, MAN]
Defined concepts are in yellow; primitive concepts are in green.

A classifier is a program that can use DL to conclude:
All WOMEN are FEMALE ANIMALS
A HORSE may not also be a PLANT
HUMAN subsumes MAN and WOMAN
A MAN may not also be a WOMAN
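The four conclusions above can be reproduced with a very small reasoner over named concepts. This is a toy sketch, not a real DL classifier: the parent links, the choice of WOMAN and MAN as the defined concepts, and the disjointness pairs are all assumptions, since the slide's colors and links did not survive in the transcript.

```python
# Assumed primitive hierarchy: concept -> direct parents.
PARENTS = {
    "ANIMAL": ["THING"], "PLANT": ["THING"], "MINERAL": ["THING"],
    "MAMMAL": ["ANIMAL"], "FISH": ["ANIMAL"],
    "FEMALE-ANIMAL": ["ANIMAL"], "MALE-ANIMAL": ["ANIMAL"],
    "HUMAN": ["MAMMAL"], "HORSE": ["MAMMAL"],
}
# Assumed defined concepts: necessary AND sufficient conditions.
DEFINED = {
    "WOMAN": {"HUMAN", "FEMALE-ANIMAL"},
    "MAN": {"HUMAN", "MALE-ANIMAL"},
}
# Assumed disjointness axioms.
DISJOINT = [("ANIMAL", "PLANT"), ("FEMALE-ANIMAL", "MALE-ANIMAL")]

def closure(concepts):
    """Everything that follows from being an instance of the given concepts."""
    seen, stack = set(), list(concepts)
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        stack.extend(DEFINED.get(concept, ()))   # a definition implies its conditions
        stack.extend(PARENTS.get(concept, ()))   # a primitive implies its parents
    for name, conditions in DEFINED.items():     # conditions met => defined concept holds
        if conditions <= seen:
            seen.add(name)
    return seen

def subsumes(general, specific):
    return general in closure({specific})

def satisfiable(concepts):
    """False if the combination entails two concepts declared disjoint."""
    full = closure(concepts)
    return not any(a in full and b in full for a, b in DISJOINT)

print(subsumes("FEMALE-ANIMAL", "WOMAN"))                    # all WOMEN are FEMALE ANIMALS
print(satisfiable({"HORSE", "PLANT"}))                       # a HORSE may not also be a PLANT
print(subsumes("HUMAN", "MAN"), subsumes("HUMAN", "WOMAN"))  # HUMAN subsumes both
print(satisfiable({"MAN", "WOMAN"}))                         # a MAN may not also be a WOMAN
```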

The Primitive Concept MESSAGE
A MESSAGE is, among other things, a THING with at least one Sender, all of which are PERSONs, at least one Recipient, all of which are PERSONs, a Body, which is a TEXT, a SendDate, which is a DATE, and a ReceivedDate, which is a DATE.
[Diagram nodes: THING, MESSAGE, PERSON, TEXT, DATE. Link label: v/r]
Roles of MESSAGE, with (min, max) cardinality:
  – Sender (1,NIL)
  – Recipient (1,NIL)
  – Body (1,1)
  – SendDate (1,1)
  – ReceivedDate (1,1)

Defined concepts are derived from primitive concepts
A STARFLEET-MESSAGE is a MESSAGE, all of whose Senders are STARFLEET-COMMANDERS.
[Diagram nodes: MESSAGE, PERSON, TEXT, DATE, STARFLEET-MESSAGE, STARFLEET-COMMANDER. Link labels: v/r, restricts]
Roles of MESSAGE: Sender (1,NIL), Recipient (1,NIL), Body (1,1), SendDate (1,1), ReceivedDate (1,1)
STARFLEET-MESSAGE restricts the v/r of Sender to STARFLEET-COMMANDER

A DL Classifier
Takes a new Concept and automatically determines all subsumption relations between it and all other Concepts in the network
Adds new links when new subsumption relations are discovered
Automates the placement of new Concepts in the taxonomy

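Before Classifying the Concept X
X: A MESSAGE with exactly one Recipient, and all of whose Senders are STARFLEET-COMMANDERs.
[Diagram nodes: MESSAGE, PERSON, TEXT, DATE, STARFLEET-MESSAGE, STARFLEET-COMMANDER, X. Link labels: v/r, restricts]
Roles of MESSAGE: Sender (1,NIL), Recipient (1,NIL), Body (1,1), SendDate (1,1), ReceivedDate (1,1)
STARFLEET-MESSAGE restricts the v/r of Sender to STARFLEET-COMMANDER
X restricts the v/r of Sender to STARFLEET-COMMANDER and restricts Recipient to (1,1)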

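After Classifying the Concept X
[Diagram nodes: MESSAGE, PERSON, TEXT, DATE, STARFLEET-MESSAGE, STARFLEET-COMMANDER, X. Link labels: v/r, restricts]
Roles of MESSAGE: Sender (1,NIL), Recipient (1,NIL), Body (1,1), SendDate (1,1), ReceivedDate (1,1)
STARFLEET-MESSAGE restricts the v/r of Sender to STARFLEET-COMMANDER
X restricts Recipient to (1,1)
X IS-A STARFLEET-MESSAGE!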
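The placement of X can be reproduced with a simple structural subsumption test: one concept subsumes another if every role restriction it states is met at least as tightly by the other. This is a hedged sketch covering only the Sender and Recipient roles; the data layout and function names are assumptions, NIL is modeled as infinity, and real DL classifiers handle far richer constructs.

```python
import math

# Atomic concept hierarchy (child -> parent), as described on the slides.
ATOM_PARENT = {"MESSAGE": "THING", "PERSON": "THING", "STARFLEET-COMMANDER": "PERSON"}

def atom_subsumes(general, specific):
    """True if `specific` equals `general` or lies below it in the hierarchy."""
    while specific is not None:
        if specific == general:
            return True
        specific = ATOM_PARENT.get(specific)
    return False

# Role restrictions inherited from MESSAGE: role -> (min, max, value restriction).
MESSAGE_ROLES = {
    "Sender": (1, math.inf, "PERSON"),
    "Recipient": (1, math.inf, "PERSON"),
}

def concept(base, **overrides):
    """A concept is a base atom plus role restrictions, with optional tightening."""
    roles = dict(MESSAGE_ROLES)
    roles.update(overrides)
    return {"base": base, "roles": roles}

def subsumes(general, specific):
    if not atom_subsumes(general["base"], specific["base"]):
        return False
    for role, (g_min, g_max, g_fill) in general["roles"].items():
        if role not in specific["roles"]:
            return False
        s_min, s_max, s_fill = specific["roles"][role]
        if s_min < g_min or s_max > g_max or not atom_subsumes(g_fill, s_fill):
            return False
    return True

NETWORK = {
    "MESSAGE": concept("MESSAGE"),
    "STARFLEET-MESSAGE": concept("MESSAGE", Sender=(1, math.inf, "STARFLEET-COMMANDER")),
}
X = concept(
    "MESSAGE",
    Sender=(1, math.inf, "STARFLEET-COMMANDER"),
    Recipient=(1, 1, "PERSON"),
)

def most_specific_subsumers(new, network):
    """Named concepts that subsume `new` and have no narrower subsumer below them."""
    found = [name for name, c in network.items() if subsumes(c, new)]
    return sorted(
        n for n in found
        if not any(m != n and subsumes(network[n], network[m]) for m in found)
    )

print(most_specific_subsumers(X, NETWORK))
```

X is never told that it is a STARFLEET-MESSAGE; the link is derived because X's Sender restriction matches and its Recipient restriction is tighter than the one STARFLEET-MESSAGE inherits.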

The Beauty of Classification for Ontologies
The classifier takes care of where to place a new concept in the hierarchy
All inheritance relationships are automatically propagated to the new concept
Relationships among a new concept and other entities are automatically simplified by classifying the new concept as a specialization of existing concepts

Classification generates a new, inferred hierarchy

The Web Ontology Language (OWL)
Comes in three flavors:
  – OWL Lite (frame-based)
  – OWL DL (description logic)
  – OWL Full (first-order logic and then some)
Rapidly being adopted for use in biomedical ontologies, including:
  – NCI Thesaurus (cancer biology and oncology)
  – MGED Ontology (DNA micro-array experiments)
  – BioPAX (metabolic pathways)
The new editor and representation system for OBO ontologies (OBO-Edit) uses a subset of OWL

DL and Ontologies
There is not just one “description logic”; DLs come in different varieties with different expressivity
DLs are of value primarily to ontology developers, to see the implications of modeling decisions
DLs also can be used by end users, when reasoning about systems that ontologies model

A thousand flowers are blooming!
Ontologies are being developed by interested groups from every sector of academia, industry, and government
Many of these ontologies have been proven to be extraordinarily useful to wide communities
We finally have tools and representation languages that can enable us to create durable and maintainable ontologies with rich semantic content