Alan Rector, Luigi Iannone, Robert Stevens

Quality Assurance of the Content of a Large DL-based Terminology using Mixed Lexical and Semantic Criteria: Experience with SNOMED CT
Alan Rector, Luigi Iannone, Robert Stevens
rector@cs.manchester.ac.uk

“A report from the trenches”
SNOMED CT: the mandated terminology for electronic patient records in the UK and US, with worldwide aspirations
  The result of a merger of two other systems, SNOMED and Clinical Terms v3
  Long history with much opportunity for error
Expressed in a Description Logic and now available in OWL
  A subset of EL++ without disjoint axioms
Has been resistant to independent analysis, although there are many known problems
  Despite several global QA attempts based on lexical criteria, which have identified errors without explaining them

It’s very big - and classification matters
~400,000 concepts/classes; >1,000,000 axioms
Much of the richness is only evident in the classified form
Most errors are only present in the classified form
[Figure: stated vs. classified hierarchy]

…and some classification horrendously complicated (Skin of Ankle)

An experiment of opportunity
The opportunities
  Tried to use SNOMED for a commercial collaboration on clinical systems
  Tried to use SNOMED as a contribution to WHO’s revision of the International Classification of Diseases (ICD-11)
  Problems with both
Therefore, an experiment to see whether QA & repair were possible
  Conventional wisdom said that it was not
However, we had new resources
  Core Problem List Subset from NLM (8500 most used classes)
  Software to extract “modules”
  SNOROCKET classifier for EL++
  4-8 GB machines

Step 1: Cut it down & find a classifier
Find a subset
  UMLS Core Problem List subset: 8500 most used disease concepts
  Collected by the US National Library of Medicine by combining sets from 6 major institutions
Extract a “module” (built into OWL API v3)
  Use the core subset as the “signature”
  Guaranteed that all inferences amongst the classes in the “signature” that hold in the whole will hold in the module
  35,000 concepts, including most of anatomy
Find a classifier that can cope (at least two, for checking)
  SNOROCKET (EL++, a polynomial-time subset of OWL): 30 sec
  Pellet 2.1: 200 sec
  FaCT++: 250 sec
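
The idea of growing a module from a signature can be illustrated with a deliberately simplified sketch. This is not the locality-based extractor shipped with the OWL API: it only follows asserted references (parents and property fillers) outward from the seed classes, and the class names and the `REFERENCES` table are hypothetical toy data rather than SNOMED CT content.

```python
from collections import deque

# Toy axiom table: class -> classes it refers to (parents and property fillers).
# All names are illustrative assumptions, not real SNOMED CT identifiers.
REFERENCES = {
    "Hypertension": {"Disorder of blood vessel", "Systemic arterial structure"},
    "Disorder of blood vessel": {"Disorder"},
    "Systemic arterial structure": {"Anatomical structure"},
    "Fracture of femur": {"Disorder", "Femur"},
    "Femur": {"Anatomical structure"},
}


def extract_module(signature, references):
    """Return every class reachable from the signature by following references."""
    module = set(signature)
    queue = deque(signature)
    while queue:
        cls = queue.popleft()
        for ref in references.get(cls, ()):
            if ref not in module:
                module.add(ref)
                queue.append(ref)
    return module


module = extract_module({"Hypertension"}, REFERENCES)
print(sorted(module))
```

Even in this toy form the effect described on the slide is visible: a small signature pulls in the anatomy it depends on, while unrelated classes stay out of the module.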

Step 2: Pick some areas of interest to clinicians, some with anomalies already spotted
Myocardial Infarction (heart attack)
  Should be a kind of Ischemic Heart Disease, but wasn’t
Hypertension (high blood pressure)
  Odd to find it a kind of Soft Tissue disorder
Diabetes
  Odd to find it as a Disorder of the Abdomen
Allergies
  Odd to find some but not all autoimmune disorders classified as Allergies
…

Look at classification: Most initial errors spotted looking upwards
Look up the hierarchy (with OWLViz)
  Let clinicians find important concepts and check them
  Face validity and then look up the hierarchy
Check any anomalies against the complete SNOMED in a standard browser
  Guard against artifacts in various transformations
Trace anomalies to their root
  Decide which links to add or break
  Decide how to break them
Edit, classify and check
  Hierarchies
  Usages
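
A rough sketch of the "look upwards" check, assuming the classified hierarchy has been exported as a simple parent table. The `PARENTS` data and the path through it are invented for illustration; only the anomaly itself (Hypertension turning up under a soft tissue disorder) comes from the slides.

```python
# Toy inferred hierarchy: class -> direct superclasses (hypothetical data).
PARENTS = {
    "Hypertension": ["Disorder of blood vessel"],
    "Disorder of blood vessel": ["Disorder of soft tissue", "Cardiovascular disorder"],
    "Disorder of soft tissue": ["Disorder"],
    "Cardiovascular disorder": ["Disorder"],
    "Disorder": [],
}


def ancestors(cls, parents):
    """Collect all superclasses of cls by walking up the parent table."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def unexpected_ancestors(cls, parents, unexpected):
    """Return the ancestors of cls that a clinician has flagged as implausible."""
    return ancestors(cls, parents) & set(unexpected)


anomalies = unexpected_ancestors("Hypertension", PARENTS, {"Disorder of soft tissue"})
print(anomalies)
```

The list of implausible ancestors has to come from the experts; the script only makes their check repeatable after each edit and reclassification.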

OwlViz Upwards for Hypertension

And check for the desired result

Check in standard browser in full SNOMED (snob.eggbird.eu/)

Examine definition & formulate solution
Before: Disorder of blood vessel that (Finding site some Systemic arterial structure) and (Has definitional manifestation some Increased blood pressure)
After: Disorder of blood vessel that (Finding site some Cardiovascular system structure) and (Has definitional manifestation some Increased blood pressure)

Then check usages for unwanted results: is there anything that should relate to arteries instead of the Cardiovascular system?

Also look down hierarchy: Combine lexical & semantic search
Hard to spot what is missing
  Hypertensive disorders included some complications as well as kinds of hypertension. Did it contain them all?
Use OPPL, combining lexical matching, OWL semantics & queries
  ?C:CLASS=MATCH(".*[Hh]ypertensive.*")  lexical
  SELECT ?C SubClassOf Thing  open world OWL semantics
  WHERE FAIL ?C SubClassOf "Hypertensive disorder"  closed world query
  BEGIN ADD ?C SubClassOf Candidate_hypertensive END;  action
Classify and look at odd cases …
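
For readers unfamiliar with OPPL, the same three steps (lexical match, failed subclass test, flag as candidate) can be mimicked in plain Python over a toy hierarchy. This is only an analogy under stated assumptions: the `PARENTS` table and its class names are hypothetical, and a real run would query a classified ontology rather than a dictionary.

```python
import re

# Toy classified hierarchy: class -> direct superclasses (hypothetical data).
PARENTS = {
    "Hypertensive disorder": ["Disorder"],
    "Essential hypertension": ["Hypertensive disorder"],
    "Hypertensive renal disease": ["Hypertensive disorder"],
    "Hypertensive retinopathy": ["Disorder of retina"],
    "Diabetic retinopathy": ["Disorder of retina"],
    "Disorder of retina": ["Disorder"],
    "Disorder": [],
}


def is_subclass_of(cls, target, parents):
    """Reflexive, transitive subclass test over the parent table."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def find_candidates(pattern, target, parents):
    """Names matching the pattern that are NOT already under the target class."""
    regex = re.compile(pattern)
    return sorted(
        cls for cls in parents
        if regex.fullmatch(cls) and not is_subclass_of(cls, target, parents)
    )


candidates = find_candidates(".*[Hh]ypertensive.*", "Hypertensive disorder", PARENTS)
print(candidates)
```

The candidates are then reviewed by hand, which is the "classify and look at odd cases" step on the slide.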

Classify and look at odd cases

Look for regularities
Of the hypertensive complications
  1 linked to Hypertensive disorder by property “due to”
  1 linked to Hypertensive disorder by property “associated with”
  2 are subclasses of Hypertensive disorder
  2 not linked at all
No class for Hypertensive complication
  Although there is a class for Diabetic complication
Regularise
  Create classes for Hypertension, Hypertensive complication and Hypertension AND/OR Hypertensive complication
  Edit all complications to the schema: Disorder due to some Hypertension

Which concept should carry the old ID?
Look at usages of Hypertensive disorder
  All fit Hypertension; none fit Hypertensive complication
Therefore, label the original ID for Hypertensive disorder as Hypertension
New hierarchy:
  Hypertension AND/OR Hypertensive complication  new ID/concept
    Hypertension  old ID/concept
      … kinds of hypertension
    Hypertensive complication  new ID/concept
      … kinds of hypertensive complication

Looking down hierarchy: Analysis by categorisation
Even short alphabetic lists are difficult to check
Break it up logically

Always trace errors to root to fix mish-mash modelling
Simple error
  The axiom that Skin is a kind of Soft tissue was omitted
  Therefore injuries to skin are not listed as kinds of Soft tissue injuries
Authors have noticed some cases and tried to compensate
  Cut of skin of foot is a kind of soft tissue injury, but
  Cut of the skin of lower limb was NOT a soft tissue injury
One axiom to fix it all: Skin subClassOf SoftTissue
  And then a script to find the redundant axioms
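
The slides do not show the script for finding redundant axioms. One possible shape for it, sketched here under simplifying assumptions, is to flag any asserted subclass link that is still entailed when that link is removed. The toy data collapses the finding-site reasoning into plain subclass pairs, and the class names are illustrative.

```python
# Asserted subclass links as (child, parent) pairs. Toy data: the third pair stands
# in for what the classifier infers once "Skin subClassOf SoftTissue" is added.
ASSERTED = {
    ("Cut of skin of foot", "Injury of skin"),
    ("Cut of skin of lower limb", "Injury of skin"),
    ("Injury of skin", "Soft tissue injury"),
    ("Cut of skin of foot", "Soft tissue injury"),  # manual compensation by authors
}


def entailed_without(child, parent, asserted):
    """True if child still reaches parent after removing the direct link."""
    others = asserted - {(child, parent)}
    stack, seen = [child], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for c, p in others:
            if c == current:
                if p == parent:
                    return True
                stack.append(p)
    return False


redundant = sorted(ax for ax in ASSERTED if entailed_without(ax[0], ax[1], ASSERTED))
print(redundant)
```

Only the hand-added compensation is reported, which matches the intent on the slide: fix the root cause with one axiom, then clean up the workarounds.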

Trace errors to their roots: Incomplete modelling: Example
Why is Myocardial Infarction not a kind of Ischemic Heart Disease?
  Ischemia = “lack of blood supply”
  Myocardium = “heart muscle”
Infarction is not fully defined in SNOMED. References say…
  “Tissue death due to ischemia”
Ischemic heart disease is not fully defined in SNOMED. References say…
  Heart disease due to ischemia
Ischemic disorder does not exist in SNOMED. Natural closure…
  Disorder due to some Ischemia (NB: always involves the Cardiovascular system)
Add the definitions, and Myocardial infarction is classified correctly
  Also discover a long list of Ischemic diseases that have not been classified as cardiovascular
Check lexically for other uses of “ischemic”
  None found in this subset

Error in schema for anatomy: Conflates branches with parts
Example
  Injury to artery of the ankle is located in the pelvis and in the abdomen (as well as the ankle)!
Extends to all nerves & blood vessels
Requires a generic change
  Simplest involves about 20 axioms for arteries
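
A small sketch of why conflating branches with parts produces the ankle-in-the-pelvis result. The arterial chain below is heavily simplified and the two relation tables are assumptions for illustration; the slides do not show the actual SNOMED axioms.

```python
# Toy anatomy (simplified, illustrative): genuine part-of links and branch-of links.
PART_OF = {
    "Artery of ankle": ["Ankle"],
    "External iliac artery": ["Pelvis"],
    "Abdominal aorta": ["Abdomen"],
}
BRANCH_OF = {
    "Artery of ankle": ["Femoral artery"],
    "Femoral artery": ["External iliac artery"],
    "External iliac artery": ["Abdominal aorta"],
}


def locations(structure, part_of, branch_of, conflate):
    """Structures reached via part-of links, plus branch-of links if conflated."""
    found = set()
    stack = [structure]
    while stack:
        current = stack.pop()
        links = list(part_of.get(current, []))
        if conflate:
            links += branch_of.get(current, [])
        for target in links:
            if target not in found:
                found.add(target)
                stack.append(target)
    return found


conflated = locations("Artery of ankle", PART_OF, BRANCH_OF, conflate=True)
separate = locations("Artery of ankle", PART_OF, BRANCH_OF, conflate=False)
print(sorted(conflated))
print(sorted(separate))
```

With the relations kept apart, an injury to the artery of the ankle is located only in the ankle; once branch-of is treated as part-of, the location leaks all the way up the arterial tree.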

Overgeneralisation - explains many arguments
The dictionary says “Neuropathy” is a disease of nerves
  But in practice it is a “dysfunction” of nerves
  Doctors don’t consider tumors or injuries to nerves to be neuropathies
SNOMED often does not distinguish structural and functional disorders
Needs a consistent pattern:

Naming issues
All SNOMED terms have at least two names
  “Fully qualified name” & “Preferred name”
  “Fully qualified names” should be consistent but…
Example - conflicting names
  “Immune hypersensitivity disorder (disorder)” = “Allergic disorder”
Structure nodes in SEP triples
  “Structure of X”, “X Structure”, X
  Leads to “Swelling of gums” being a kind of “Swelling of face”

Doing everything in a separate module (insofar as possible)
Perform queries as “probes”
Keep changes in modules
Compromise: system of diffs and merges
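
The slides do not describe how the diffs and merges were implemented. As a minimal sketch of the idea, assuming axioms can be serialised to comparable strings, a diff is just a pair of set differences and a merge replays it onto a base. The axiom strings below are illustrative.

```python
def axiom_diff(base, edited):
    """Return (added, removed) axioms needed to turn base into edited."""
    return edited - base, base - edited


def apply_diff(base, added, removed):
    """Replay a diff onto a (possibly newer) base set of axioms."""
    return (base - removed) | added


# Hypothetical axioms serialised as strings.
base = {
    "Hypertension: Finding site some Systemic arterial structure",
    "Hypertension: Has definitional manifestation some Increased blood pressure",
}
edited = {
    "Hypertension: Finding site some Cardiovascular system structure",
    "Hypertension: Has definitional manifestation some Increased blood pressure",
}

added, removed = axiom_diff(base, edited)
merged = apply_diff(base, added, removed)
print(sorted(added))
print(sorted(removed))
```

Keeping the change set separate from the source terminology is what allows the edits to be reviewed, logged, and reapplied to a later release.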

Summary: QA of a large DL-based ontology is possible!
Find a useful subset and use it as the signature to extract a manageable module
Start with things that are important to your experts
Look upwards rather than downwards in the first instance
Follow up analogies and patterns
When looking downwards, enrich categorization to reduce noise
Combine lexical and semantic techniques
Analysis by synthesis: test alternative potential changes with the classifier
  As far as possible in a separate module; scripting where possible
Tooling gaps / weaknesses
  Scripting tools need work
  Combining filtering with imports
  Diffs & change management: needed, but don’t do enough
Log everything!