Ontologies in Biomedicine Mark A. Musen Stanford University.


1 Ontologies in Biomedicine Mark A. Musen Stanford University

2 What Is An Ontology? The study of being A discipline co-opted by Computer Science to enable the “explicit specification of the conceptualization” of application domains: –Entities –Properties and attributes of entities –Constraints on properties and attributes –Individuals (often, but not always) A theory that provides –a common vocabulary –a shared understanding of the entities in an application area

3 Why Develop an Ontology? To share common understanding of the structure of descriptive information –among people –among software agents –between people and software To enable reuse of domain knowledge –to avoid “re-inventing the wheel” –to introduce standards to allow interoperability

4 Ontologies are just the beginning. Diagram nodes: Ontologies; Software agents; Problem-solving methods; Annotated data; Databases; Knowledge bases. Arrow labels: Enumerate domain terms; Declare structure; Provide domain descriptions.

5 Porphyry’s depiction of Aristotle’s Categories
Supreme genus: SUBSTANCE
Differentiae: material / immaterial
Subordinate genera: BODY / SPIRIT
Differentiae: animate / inanimate
Subordinate genera: LIVING / MINERAL
Differentiae: sensitive / insensitive
Proximate genera: ANIMAL / PLANT
Differentiae: rational / irrational
Species: HUMAN / BEAST
Individuals: Socrates, Plato, Aristotle, …

6

7 Foundational Model of Anatomy Long-term project at University of Washington to create a comprehensive ontology of human anatomy 72K concepts, 1.9M relationships One of the largest and best developed ontologies in biomedicine

8 Top level of the Foundational Model of Anatomy. Diagram nodes: Physical Anatomical Entity; Anatomical Spatial Entity; Anatomical Structure; Body Substance; Body Part; Organ System; Organism; The Body; Organ Part; Organ; Cell; Organ Subdivision; Organ Component; Tissue.

9 Parts of the heart: Heart; Cavity of Heart; Wall of Heart; Right Atrium; Cavity of Right Atrium; Wall of Right Atrium; Fossa Ovalis; Myocardium; Sinus Venarum; SA Node; Myocardium of Right Atrium. Classes of anatomical structures: Cardiac Chamber; Hollow Viscus; Internal Feature; Organ Cavity; Organ Cavity Subdivision; Anatomical Spatial Entity; Anatomical Feature; Body Space; Organ Component; Organ Subdivision; Viscus; Organ Part; Organ; Anatomical Structure. Link types in the diagram: Is-a, Part-of.

10 But we really want ontologies in electronic form Ontology contents can be processed and interpreted by computers Interactive tools can assist developers in ontology authoring

11 The FMA demonstrates that distinctions are not universal Blood is not a tissue, but rather a body substance (like saliva or sweat) The pericardium is not part of the heart, but rather an organ in and of itself Each joint, each tendon, each piece of fascia is a separate organ These views are not shared by many anatomists!

12 Ontologies are cropping up everywhere! Indexing of online information for access by humans or search engines Product catalogs for e-commerce Reference terminologies for machine translation and data interchange Standard terms for describing experimental data Frameworks for structuring knowledge for decision support

13 The New Philosophers Categorizing “what exists” in machine-understandable form Providing a structure that enables –Developers to locate and update relevant descriptions –Computers to infer relationships and properties Creating new abstractions about the world to facilitate the creation of this structure

14 Lots of ontology builders are not very good philosophers Nearly always, ontologies are created to address pressing professional needs The people who have the most insight into professional knowledge may have little appreciation for metaphysics, principles of knowledge representation, or computational logic There simply aren’t enough good philosophers to go around

15 A case in point: The International Classification of Diseases An enumeration of diseases that forms the basis for all medical claims and reimbursements in the United States A “legacy” terminology that has its roots in 19th century epidemiology Created initially by biostatisticians with a pressing need to compare death statistics in different European countries A system that won’t go away, and yet we would never create anything like it again

16 A Small Portion of ICD9-CM
724 Unspecified disorders of the back
724.0 Spinal stenosis, other than cervical
724.00 Spinal stenosis, unspecified region
724.01 Spinal stenosis, thoracic region
724.02 Spinal stenosis, lumbar region
724.09 Spinal stenosis, other
724.1 Pain in thoracic spine
724.2 Lumbago
724.3 Sciatica
724.4 Thoracic or lumbosacral neuritis
724.5 Backache, unspecified
724.6 Disorders of sacrum
724.7 Disorders of coccyx
724.70 Unspecified disorder of coccyx
724.71 Hypermobility of coccyx
724.79 Coccygodynia
724.8 Other symptoms referable to back
724.9 Other unspecified back disorders

17 ICD9 (1977): A Handful of Codes for Traffic Accidents

18 ICD10 (1999): 587 codes for such accidents
V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

19 ICD is used for lots of (too many?) things! ICD is used to code all patient encounters with the health-care system for purposes of – Billing and reimbursement – Institutional planning – Disease surveillance and public health – Quality assurance – Economic modeling by third-party payors ICD was never intended to make the distinctions relevant to all these tasks! When patient encounters are encoded with ICD, it is impossible to keep all these uses in mind

20 If real ontologists could build the ICD from scratch … Diseases would be organized with well-defined relationships Diseases would be associated with computer-understandable definitions There would be well-defined rules to enable aggregation of primitive concepts into complex descriptions, and for ensuring that those descriptions were sensible There would be well-defined mechanisms for creating use-specific views of the ICD

21 The components of ontologies Classes: The primary entities in the world being modeled (e.g., “organ”) Attributes: The properties of classes (e.g., “shape”, “location”) Relations: Statements regarding how one class may relate to others (e.g., “the heart” is-a “organ”) Axioms: More complex logical statements (e.g., “only paired organs can be left-sided or right-sided”)
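A minimal sketch of how the four components on this slide might look in code. The class layout and the attribute names `paired` and `laterality` are assumptions made for illustration; they are not taken from the FMA or from any particular ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """A class with attributes (name -> value) and relations to other classes."""
    name: str
    attributes: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (relation, target class name)


def satisfies_laterality_axiom(cls: OntologyClass) -> bool:
    """Axiom from the slide: only paired organs can be left- or right-sided."""
    if cls.attributes.get("laterality") in ("left", "right"):
        return bool(cls.attributes.get("paired", False))
    return True


organ = OntologyClass("Organ")
heart = OntologyClass("Heart", {"paired": False}, [("is-a", "Organ")])
left_kidney = OntologyClass(
    "Left kidney", {"paired": True, "laterality": "left"}, [("is-a", "Organ")]
)
# Deliberately inconsistent class, used only to show the axiom failing.
left_heart = OntologyClass(
    "Left heart", {"paired": False, "laterality": "left"}, [("is-a", "Heart")]
)

for cls in (organ, heart, left_kidney, left_heart):
    print(cls.name, satisfies_laterality_axiom(cls))
```

Classes, attributes, and relations are plain data here; the axiom is the only part that needs logic, which is why axioms are usually the hardest component to represent.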

22 Classes and attributes in the FMA

23 Attributes of a class (e.g., “Esophagus”)

24 “is-a” is a special relation If a sub-class is-a member of a super-class, then –every instance of the sub-class is also an instance of the super-class (e.g., every member of the set aorta is necessarily a member of the set artery) –Values of attributes of the super-class are inherited by every instance of the sub-class (e.g., if arteries have cylindrical shape, then aorta has cylindrical shape)
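The two consequences of is-a described on this slide can be sketched with a toy frame structure. The `Frame` class and its method names are hypothetical; real frame systems offer far more (multiple parents, slot constraints, defaults).

```python
class Frame:
    """A toy frame: a named class with one optional parent and some slots."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def is_a(self, other):
        """True if this frame is `other` or a descendant of it."""
        frame = self
        while frame is not None:
            if frame is other:
                return True
            frame = frame.parent
        return False

    def get(self, slot):
        """Look up a slot value, inheriting from ancestors when not set locally."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


artery = Frame("Artery", shape="cylindrical")
aorta = Frame("Aorta", parent=artery)

print(aorta.is_a(artery))   # every aorta is an artery
print(aorta.get("shape"))   # value inherited from Artery
```

The lookup walks up the hierarchy, so a value set on a sub-class would override the inherited one; whether such overrides should be allowed is a modeling decision this sketch does not settle.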

25 “Frame-based” knowledge-representation systems Allow developers to encode –Taxonomic hierarchies of classes –Other relations among classes (e.g., “part-of”) in addition to the is-a hierarchy –Attributes of classes that take on particular values to define instances of the classes Support inheritance of attributes and values along taxonomic relations

26 Distinctions about ontologies “Light” versus “heavy”: Is the ontology a simple taxonomy or does the ontology include additional detail regarding the nature of classes? “Upper-level” versus “domain-oriented”: Does the ontology try to describe general, abstract concepts or concepts tied to a particular application area?

27 Suggested Upper Merged Ontology (SUMO)

28 Part of the CYC Upper Ontology

29 The story so far … Ontologies define the entities—and relationships among entities—in some application area The authors’ point of view determines which distinctions are appropriate in a particular ontology Ontologies often use frame-based representations (including classes, attributes, relationships, and axioms) to encode knowledge People are building ontologies for nearly every niche of biomedicine

30 The pressing need to standardize the names of human genes

31 But the human genome is only part of the problem … Scientists maintain huge databases of gene sequences and gene expression for a wide range of “model organisms” (e.g., mouse, rat, yeast, fruit fly, round worm, slime mold) Database entries are annotated with entries such as the name of a gene, the function of the gene, and so on How do you ensure uniformity in the nature of these annotations?

32 Gene Ontology Consortium Founded in 1998 as a collaboration among scientists responsible for developing different databases of genomic data for model organisms (fruit fly, yeast, mouse) Now, essentially all developers of all model-organism databases participate Goal: To produce a dynamic, controlled vocabulary that can be applied to all organism databases even as knowledge of gene and protein roles in cells is accumulating and changing

33 Gene Ontology (GO) Comprises three independent “ontologies” –molecular function of gene products –cellular component of gene products –biological process representing the gene product’s higher order role. Uses these terms as attributes of gene products in the collaborating databases (gene product associations) Allows queries across databases using GO terms, providing linkage of biological information across species
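A rough sketch of why shared terms make cross-database queries possible. The database names and gene-product names below are placeholders invented for the example, not real annotation records; only the idea of indexing annotations by a common term comes from the slide.

```python
from collections import defaultdict

# Hypothetical annotations: (database, gene product, shared vocabulary term).
annotations = [
    ("fly_db", "fly_gene_A", "DNA binding"),
    ("yeast_db", "yeast_gene_B", "DNA binding"),
    ("mouse_db", "mouse_gene_C", "nucleus"),
    ("mouse_db", "mouse_gene_D", "DNA binding"),
]


def index_by_term(rows):
    """Group (database, gene product) pairs under the term that annotates them."""
    index = defaultdict(list)
    for database, product, term in rows:
        index[term].append((database, product))
    return index


index = index_by_term(annotations)

# One query now reaches gene products in every participating database.
hits = index["DNA binding"]
for database, product in hits:
    print(database, product)
```

If each database used its own wording for the same function, the lookup key would differ per source and this single-index query would not work.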

34

35 GO has been wildly successful!! Dozens of biologists around the world contribute to GO on a regular basis The ontology is updated every 30 minutes! It’s now impossible to work in most areas of computational biology without making use of GO terms

36 But GO has had real problems … Ontologies initially were represented in an idiosyncratic format that was not compatible with standard knowledge-representation systems (DAG-Edit) The format was based on directed acyclic graphs of concepts, without the general ability to specify machine interpretable properties of entities or definitions of entities Because of the informal knowledge-representation system, lots of errors crept into GO –Terms that were duplicated in different places –Terms with no superclasses –Uncertain relationships between terms The GO consortium is working hard to rectify these problems by means of a new representation (OBO-Edit) and enhanced quality control
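Two of the error classes listed on this slide (duplicated terms and terms with no superclasses) are easy to detect mechanically once the graph is in a regular form. The sketch below uses an invented term table with made-up identifiers; it is not the GO file format and not the consortium's actual quality-control procedure.

```python
from collections import Counter

# Hypothetical term table: term id -> (name, list of parent term ids).
terms = {
    "T1": ("biological process", []),   # intended root
    "T2": ("cell death", ["T1"]),
    "T3": ("Cell death", ["T1"]),       # same name as T2 under another id
    "T4": ("transport", []),            # non-root term with no superclass
}
ROOTS = {"T1"}


def terms_without_superclass(terms, roots):
    """Ids of terms that have no parent and are not declared roots."""
    return sorted(
        tid for tid, (_, parents) in terms.items()
        if not parents and tid not in roots
    )


def duplicated_names(terms):
    """Names (case-insensitive) that appear under more than one id."""
    counts = Counter(name.strip().lower() for name, _ in terms.values())
    return sorted(name for name, n in counts.items() if n > 1)


def dangling_parents(terms):
    """Parent ids that are referenced but never defined."""
    return sorted(
        {p for _, parents in terms.values() for p in parents if p not in terms}
    )


print(terms_without_superclass(terms, ROOTS))
print(duplicated_names(terms))
print(dangling_parents(terms))
```

The third class of error on the slide, uncertain relationships between terms, is a modeling problem and cannot be caught by structural checks like these.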

37 Creating ontologies has become a widespread cottage industry Professional Societies –HL7: Reference Information Model –MGED: Microarray Gene Expression Data Society Ontology –HUPO: Human Proteome Organization Ontology Government –NCI Thesaurus –NIST: Process Specification Language Open Biological Ontologies –GO –Three dozen (and growing) other ontologies –Mostly in DAG-Edit, some in Protégé format

38 A Portion of the OBO Library

39 HL-7 Reference Information Model (RIM)

40 HL7 RIM Provides a uniform framework for specification of information required by health-care information systems Based on six top-level, very general classes: Act, Entity, Role, Participation, Act_relationship, and Role_link Designed to facilitate information exchange among distributed elements of clinical information systems Has the same limitations that all “upper level” ontologies share: –Abstract entities are hard to define –It’s hard to know what should be “in” and what should be “out”

41 Description Logic (DL) A subset of logic designed to focus on categories and their definitions in terms of existing relations More expressive than frame-based representation systems (as in FMA) but less expressive than first-order logic (as in CYC) Major inference tasks: –Subsumption: Is category C1 a subset of C2? –Classification: Does object O belong to C?

42 Kinds of classes Defined –Have explicit necessary and sufficient properties (roles) –Often are specializations of primitive concepts Primitive –Have no sufficient properties –May have other, necessary properties –Correspond to natural kinds

43 A simple network of Generic Concepts. Diagram nodes: THING; PLANT; ANIMAL; MINERAL; MAMMAL; FISH; FEMALE-ANIMAL; MALE-ANIMAL; HUMAN; HORSE; WOMAN; MAN. Defined concepts are in yellow; primitive concepts are in green.

44 A classifier is a program that can use DL to conclude: All WOMEN are FEMALE ANIMALS A HORSE may not also be a PLANT HUMAN subsumes MAN and WOMAN A MAN may not also be a WOMAN

45 The Primitive Concept MESSAGE. A MESSAGE is, among other things, a THING with at least one Sender, all of which are PERSONs, at least one Recipient, all of which are PERSONs, a Body, which is a TEXT, a SendDate, which is a DATE, and a ReceivedDate, which is a DATE. Diagram nodes: THING; DATE; MESSAGE; PERSON; TEXT. Roles of MESSAGE with (min, max) number restrictions, where NIL means no upper bound: SendDate (1,1); ReceivedDate (1,1); Body (1,1); Recipient (1,NIL); Sender (1,NIL). Each role points to its filler class through a value restriction (v/r).

46 Defined concepts are derived from primitive concepts. A STARFLEET-MESSAGE is a MESSAGE, all of whose Senders are STARFLEET-COMMANDERs. Diagram nodes: DATE; MESSAGE; PERSON; TEXT; STARFLEET-MESSAGE; STARFLEET-COMMANDER. Roles of MESSAGE: SendDate (1,1); ReceivedDate (1,1); Body (1,1); Recipient (1,NIL); Sender (1,NIL). STARFLEET-MESSAGE restricts the v/r of Sender to STARFLEET-COMMANDER.

47 A DL Classifier Takes a new Concept and automatically determines all subsumption relations between it and all other Concepts in the network Adds new links when new subsumption relations are discovered Automates the placement of new Concepts in the taxonomy

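48 Before Classifying the Concept X. X is a MESSAGE with exactly one Recipient, and all of whose Senders are STARFLEET-COMMANDERs. Diagram nodes: DATE; MESSAGE; PERSON; TEXT; STARFLEET-MESSAGE; STARFLEET-COMMANDER; X. Roles of MESSAGE: SendDate (1,1); ReceivedDate (1,1); Body (1,1); Recipient (1,NIL); Sender (1,NIL). STARFLEET-MESSAGE restricts the v/r of Sender to STARFLEET-COMMANDER. X, still attached directly to MESSAGE, restricts the v/r of Sender to STARFLEET-COMMANDER and restricts Recipient to (1,1).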

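49 After Classifying the Concept X. Diagram nodes: DATE; MESSAGE; PERSON; TEXT; STARFLEET-MESSAGE; STARFLEET-COMMANDER; X. Roles of MESSAGE: SendDate (1,1); ReceivedDate (1,1); Body (1,1); Recipient (1,NIL); Sender (1,NIL). STARFLEET-MESSAGE restricts the v/r of Sender to STARFLEET-COMMANDER. X now sits below STARFLEET-MESSAGE and keeps only its Recipient restriction (1,1). X IS-A STARFLEET-MESSAGE!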
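The classification of X in the STARFLEET-MESSAGE example can be imitated with a small structural subsumption check. This is only a sketch under simplifying assumptions: each concept has one primitive base, roles carry one value restriction and one (min, max) number restriction, only the Sender and Recipient roles are modeled, and STARFLEET-COMMANDER is assumed to be a kind of PERSON. Real DL classifiers handle much richer languages.

```python
from dataclasses import dataclass, field
from math import inf

# Primitive is-a links (child -> parent). The COMMANDER link is an assumption.
PRIMITIVE_PARENT = {
    "MESSAGE": "THING",
    "PERSON": "THING",
    "STARFLEET-COMMANDER": "PERSON",
}


def primitive_is_a(child, ancestor):
    """Walk the primitive hierarchy upward from `child` looking for `ancestor`."""
    while child is not None:
        if child == ancestor:
            return True
        child = PRIMITIVE_PARENT.get(child)
    return False


@dataclass
class Restriction:
    filler: str = "THING"   # value restriction: every filler belongs to this class
    at_least: int = 0       # number restriction, lower bound
    at_most: float = inf    # number restriction, upper bound (inf plays NIL)


@dataclass
class Concept:
    name: str
    base: str                                   # primitive concept specialized
    roles: dict = field(default_factory=dict)   # role name -> Restriction


def subsumes(general, specific):
    """True if every restriction of `general` is met or tightened by `specific`."""
    if not primitive_is_a(specific.base, general.base):
        return False
    for role, g in general.roles.items():
        s = specific.roles.get(role, Restriction())
        if not (primitive_is_a(s.filler, g.filler)
                and s.at_least >= g.at_least
                and s.at_most <= g.at_most):
            return False
    return True


def classify(new, known):
    """Names of the most specific known concepts that subsume `new`."""
    subsumers = [c for c in known if subsumes(c, new)]
    return [
        c.name for c in subsumers
        if not any(subsumes(c, o) and not subsumes(o, c) for o in subsumers)
    ]


MESSAGE = Concept("MESSAGE", "MESSAGE", {
    "Sender": Restriction("PERSON", 1, inf),
    "Recipient": Restriction("PERSON", 1, inf),
})
STARFLEET_MESSAGE = Concept("STARFLEET-MESSAGE", "MESSAGE", {
    **MESSAGE.roles,
    "Sender": Restriction("STARFLEET-COMMANDER", 1, inf),
})
X = Concept("X", "MESSAGE", {
    **MESSAGE.roles,
    "Sender": Restriction("STARFLEET-COMMANDER", 1, inf),
    "Recipient": Restriction("PERSON", 1, 1),
})

print(classify(X, [MESSAGE, STARFLEET_MESSAGE]))
```

Both MESSAGE and STARFLEET-MESSAGE subsume X, but the classifier reports only the most specific one, which is the placement the slide shows.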

50 The Beauty of Classification for Ontologies The classifier takes care of where to place a new concept in the hierarchy All inheritance relationships are automatically propagated to the new concept Relationships among a new concept and other entities are automatically simplified by classifying the new concept as a specialization of existing concepts

51 Classification generates a new, inferred hierarchy

52 The Web Ontology Language (OWL) Comes in three flavors: –OWL Lite (frame-based) –OWL DL (description logic) –OWL Full (first-order logic and then some) Rapidly being adopted for use in biomedical ontologies, including: –NCI Thesaurus (cancer biology and oncology) –MGED Ontology (DNA micro-array experiments) –BioPAX (metabolic pathways) The new editor and representation system for OBO ontologies (OBO-Edit) uses a subset of OWL

53 DL and Ontologies There is not just one “description logic”; DLs come in different varieties with different expressivity DLs are of value primarily to ontology developers, to see the implications of modeling decisions DLs also can be used by end users, when reasoning about systems that ontologies model

54 A thousand flowers are blooming! Ontologies are being developed by interested groups from every sector of academia, industry, and government Many of these ontologies have been proven to be extraordinarily useful to wide communities We finally have tools and representation languages that can enable us to create durable and maintainable ontologies with rich semantic content

