Query Languages for SNOMED: Use Cases and Issues for Binding to Health Records and to ICD
& background for comments on the DRAFT SNOMED Query Language spec ( locally
Alan Rector
BioHealth Informatics Group, University of Manchester
Copyright University of Manchester 2012
Licensed under Creative Commons Attribution Non-commercial Licence v3
Background
►Use cases for terminology query languages
►Binding of ontologies to health records: HL7 & EN 13606/Archetypes
  Specifying “value sets” for fields
  Expanding SQL queries to include subsumed concepts
►Use of a common ontology in ICD
►Questions
►Theoretical
  Query expansion for querying data bases rather than DL queries on A-Boxes
    ‣ Negation: “not necessarily” vs “necessarily not”
    ‣ Natural level of incompleteness: the frame problem and Grice
  Coping with representations in subsets of EL++ without disjointness
►Practical for ICD
  Are there flaws in SNOMED’s proposed query language? Are there alternatives?
    ‣ “build or borrow”: relation to standards
  Establishing a “reference representation”: what it should have been
    ‣ and a migration path
►Major issues in Query Language Spec
►Pragmatic requirements for ICD
  “Arbitrary selection of classes”
  Negation: exclusions, residual classes (“other”), with/without
  Using queries to cope with known errors in SNOMED
  Comprehensible rules for assigning cases to codes
Necessary background:
►SNOMED CT
►Binding to EHR
►Separation of Domain Ontology from Data schema
►HL7 and Archetypes
►Three component architecture for ICD11.
►Requirements and status of SNOMED Terminology Query Language (& its Ocean Informatics predecessor)
SNOMED CT (SCT)
►Large terminology formulated in an old description logic
►Roughly EL++ without disjointness
  Logical content available in OWL syntax
  OWL version classifies with ELK or SNOROCKET in a few seconds
►~300K active classes; ~1.2M axioms
  Convenient to extract modules for experiments
    ‣ Most tools get bogged down in bulk
►Role Group Translation into OWL not identical to KRSS original
►Idiosyncratic schema & many errors
  See papers on my website.
►Canonical form mechanism that is often used in lieu of classification
  A good topic for a separate discussion, not for today
Role Groups
►Purpose: group qualifiers (restrictions) together to distinguish
►Cancer originating in breast and metastatic to bone*
  Cancer & RoleGroup some (has_status some Primary & has_site some Breast) & RoleGroup some (has_status some Metastases & has_site some Bone)
►Cancer originating in bone and metastatic to breast
  Cancer & RoleGroup some (has_status some Metastases & has_site some Breast) & RoleGroup some (has_status some Primary & has_site some Bone)
►OWL translation pragmatic
►Role groups inserted everywhere for consistency. Native syntax omits them when not required
* Easy to understand example. Not literally correct for SNOMED
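Why the grouping matters can be sketched in a few lines of Python. This is a minimal illustration under an assumed toy representation (a definition is a set of role groups, each group a frozenset of attribute-value pairs); the variable names and the `flatten` helper are hypothetical, and like the example above the content is not literally SNOMED.

```python
# Toy illustration (not SNOMED's actual model): a definition is a set of
# role groups; each group is a frozenset of (attribute, value) pairs.
breast_primary_bone_mets = {
    frozenset({("has_status", "Primary"), ("has_site", "Breast")}),
    frozenset({("has_status", "Metastases"), ("has_site", "Bone")}),
}
bone_primary_breast_mets = {
    frozenset({("has_status", "Primary"), ("has_site", "Bone")}),
    frozenset({("has_status", "Metastases"), ("has_site", "Breast")}),
}


def flatten(groups):
    """Discard the grouping, keeping only the bare attribute-value pairs."""
    return set().union(*groups)


# Without role groups the two definitions collapse into the same four pairs.
print(flatten(breast_primary_bone_mets) == flatten(bone_primary_breast_mets))
# With role groups they remain distinguishable.
print(breast_primary_bone_mets == bone_primary_breast_mets)
```

The first comparison shows the ambiguity that arises if qualifiers are left ungrouped; the second shows that keeping the groups preserves the distinction.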
Major issue: What should a code represent?
The “Condition” vs “Situation” debate (now largely resolved in favour of “situations”)
►Does a code represent
►A “disorder”? The “Condition” interpretation
►“Having a disorder”? The “Situation” interpretation
  ‣ “Situation of having a disorder” /
  ‣ “Patient having the disorder (at a given place and time as observed by a given clinician)”
Example: Fracture of Radius & Ulna (Forearm), a single code in ICD and SNOMED
►Nothing can be both a “fracture of radius” and “fracture of ulna”
  ‣ “Condition interpretation”
►A patient can simultaneously have both a “fracture of radius” and “fracture of ulna”
  ‣ “Situation interpretation”
The evidence
►Should responses to queries / rules for patients with “Fracture of Radius” include patients with “Fracture of the radius & ulna”?
►Most doctors say “yes”
►Hierarchies of SNOMED and ICD imply “yes”, i.e. “Fracture of Radius and Ulna” is a kind of “Fracture of Radius”
►Which is safer?
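The behaviour the hierarchies imply can be sketched as follows. This is a minimal illustration, assuming a hypothetical three-code fragment in which the combined code has both single-bone codes as parents; the parent “Fracture of forearm”, the patient identifiers and the helper `is_kind_of` are invented for the example.

```python
# Hypothetical fragment of the hierarchy: child -> set of parents.
PARENTS = {
    "Fracture of radius": {"Fracture of forearm"},
    "Fracture of ulna": {"Fracture of forearm"},
    "Fracture of radius and ulna": {"Fracture of radius", "Fracture of ulna"},
}


def is_kind_of(code, ancestor):
    """True if `code` equals `ancestor` or reaches it via parent links."""
    if code == ancestor:
        return True
    return any(is_kind_of(p, ancestor) for p in PARENTS.get(code, ()))


# Invented records: (patient, recorded code).
records = [
    ("patient-1", "Fracture of radius"),
    ("patient-2", "Fracture of radius and ulna"),
    ("patient-3", "Fracture of ulna"),
]

# A query for "Fracture of radius" that follows the hierarchy also
# returns the patient coded with the combined radius-and-ulna code.
hits = [pid for pid, code in records if is_kind_of(code, "Fracture of radius")]
print(hits)
```

Under the “situation” reading this is the answer most doctors expect; under a strict “condition” reading the combined code would not be a kind of either single fracture.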
Implications in OWL
…but
►For the foreseeable future:
►The hierarchies behave as if the codes represented situations
►Separate entities for the condition and the situation will not be created
  It is up to software and users to disambiguate or to manage as best they can
    ‣ One of the many legacy idiosyncrasies
Most common use case: eHealth records
[Figure: Data schema and Ontology, linked by dotted arrows]
Are the dotted arrows: Class expressions? Queries? Other?
Most common use case: eHealth records
[Figure: Ontology and Data base]
To determine what is legal for entries in the database
Consider retrieval from a database
►I want to retrieve all situations with hypertension during pregnancy…
►Pregnancy is recorded separately only if the kind of hypertension does not necessarily involve pregnancy, so we need the union of:
  All situations with kinds of hypertension necessarily involving pregnancy
    - e.g. SELECT ?situation, ?diagnosis FROM DiagnosticTable WHERE ?diagnosis IN {SubclassesOf Hypertension_necessarily_involving_pregnancy}
  All situations involving kinds of hypertension not necessarily involving pregnancy but with pregnancy recorded separately.
    - e.g. SELECT ?situation, ?diagnosis1 FROM DiagnosticTable WHERE ?diagnosis1 IN {SubclassesOf Hypertension_not_necessarily_involving_pregnancy} & EXISTS ?situation, ?diagnosis2 WHERE ?diagnosis2 IN {SubclassesOf Pregnancy}
►In the terminology query language we need a query for:
  “Kinds of hypertension not necessarily involving X”
  “Kinds of hypertension necessarily involving X”
    ‣ (but that’s simple: “Subclasses of X” usually abbreviated “X”)
  “Kinds of hypertension necessarily not involving X”
    ‣ Straightforward if we had negation and disjointness, which we don’t
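The query expansion described above can be sketched with Python's standard `sqlite3` module. Everything in the sketch is an assumption made for illustration: the toy hierarchy, the concept names, the table contents and the helpers `descendants_and_self` and `marks`. “Not necessarily involving pregnancy” is approximated here as a set difference over the stated hierarchy, which is exactly the part that needs no negation in the ontology; the “necessarily not” case is not attempted.

```python
import sqlite3

# Hypothetical toy hierarchy (child -> parents); names are illustrative only.
PARENTS = {
    "Pre-eclampsia": {"Hypertension necessarily involving pregnancy"},
    "Hypertension necessarily involving pregnancy": {"Hypertension"},
    "Essential hypertension": {"Hypertension"},
    "Twin pregnancy": {"Pregnancy"},
}


def descendants_and_self(concept):
    """Expand a concept to itself plus everything beneath it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in PARENTS.items():
            if child not in found and parents & found:
                found.add(child)
                changed = True
    return found


def marks(codes):
    """One SQL placeholder per code, e.g. '?,?,?'."""
    return ",".join("?" * len(codes))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE DiagnosticTable (situation TEXT, diagnosis TEXT)")
db.executemany("INSERT INTO DiagnosticTable VALUES (?, ?)", [
    ("s1", "Pre-eclampsia"),
    ("s2", "Essential hypertension"), ("s2", "Twin pregnancy"),
    ("s3", "Essential hypertension"),
])

necessarily = sorted(
    descendants_and_self("Hypertension necessarily involving pregnancy"))
not_necessarily = sorted(
    descendants_and_self("Hypertension") - set(necessarily))
pregnancy = sorted(descendants_and_self("Pregnancy"))

sql = f"""
SELECT situation FROM DiagnosticTable
 WHERE diagnosis IN ({marks(necessarily)})
UNION
SELECT d1.situation FROM DiagnosticTable d1
 WHERE d1.diagnosis IN ({marks(not_necessarily)})
   AND EXISTS (SELECT 1 FROM DiagnosticTable d2
                WHERE d2.situation = d1.situation
                  AND d2.diagnosis IN ({marks(pregnancy)}))
"""
rows = db.execute(sql, necessarily + not_necessarily + pregnancy).fetchall()
result = sorted(row[0] for row in rows)
print(result)
```

The terminology query runs first and yields plain code lists; the database then sees only an ordinary `IN (...)` query, so no reasoner is needed at retrieval time.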
Consider specification of “value sets”
►Main cases
►Simple value sets not used elsewhere
  severity in {mild | moderate | severe}
►Complete hierarchies: all descendants
  diagnosis in {SubclassesOf Disorder}
►Ordered hierarchies and defaults, with specialisation
  “Reason for admission” in {Chest pain, Major trauma, Hypothermia,…}
►Arbitrary lists of one or more specific classes
  “Radiation of chest pain” in {left arm, shoulder, neck, axilla, abdomen}
    ‣ Exist elsewhere and used for many other purposes
►Union, intersection & difference of all of the above
►Other issues
►Declarative specification updating with changes in terminology; changes in data schema.
►Addition or removal of values by context (discussion for another day)
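The main cases above map naturally onto set operations. The following is a minimal sketch over an invented hierarchy; the concept names and the helper `subclasses_of` are hypothetical, and ordered hierarchies and defaults are not covered.

```python
# Hypothetical toy hierarchy: parent -> children.
CHILDREN = {
    "Disorder": ["Heart disease", "Lung disease"],
    "Heart disease": ["Angina", "Myocardial infarction"],
    "Lung disease": ["Pneumonia"],
}


def subclasses_of(concept):
    """The concept itself and all of its descendants."""
    result = {concept}
    for child in CHILDREN.get(concept, []):
        result |= subclasses_of(child)
    return result


# Simple enumerated value set, not taken from the hierarchy.
severity = {"mild", "moderate", "severe"}

# Complete hierarchy: all descendants.
diagnosis = subclasses_of("Disorder")

# Union of a hierarchy with an arbitrary specific class.
cardiac_or_pneumonia = subclasses_of("Heart disease") | {"Pneumonia"}

# Difference: every disorder except the cardiac ones.
non_cardiac = diagnosis - subclasses_of("Heart disease")

print(sorted(non_cardiac))
```

Because each value set is stated as an expression and recomputed from the hierarchy, a new child added under “Heart disease” would appear in (or disappear from) the derived sets without editing the specification.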
ICD and ICD-11 (“International Classification of Diseases”)
►ICD is a classification NOT an ontology
►Used for national and international statistical returns
►Also for billing in many jurisdictions (including an extra layer of “Clinical Modifications” for each country)
►Lots of legacy idiosyncrasies
  Designed to be printed in books & manuals
►Basic rule: Everything must add up to 100% at each level: therefore…
►Each code has only one parent
►Children of every code mutually exclusive and exhaustive
►Therefore…
  If a code fits logically in two places it must be “excluded” from all but one.
  Residual categories “other” & “not elsewhere classified” are required to make siblings exhaustive
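The “adds up to 100%” rule can be sketched with plain sets. This is only an illustration under invented data: the member identifiers, the sibling codes `X` and `Y`, and the residual label are all hypothetical.

```python
# Hypothetical concept sets, as a terminology query might return them.
parent = {"a", "b", "c", "d", "e"}
siblings = {
    "X": {"a", "b"},
    "Y": {"b", "c"},   # "b" fits logically under both X and Y
}
exclusions = {"Y": {"b"}}  # so it is "excluded" from all but one

# Apply the exclusions so that siblings become mutually exclusive.
linear = {
    code: members - exclusions.get(code, set())
    for code, members in siblings.items()
}

# The residual category makes the siblings exhaustive.
linear["Other"] = parent - set().union(*linear.values())

for code, members in linear.items():
    print(code, sorted(members))
```

After these two steps every member of the parent falls under exactly one child, which is the property a printed, single-parent classification relies on.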
ICD 11 Revision use case: Multi-layer system
[Figure labels: SNOMED CT; Common Ontology Subset; Foundation Component (signs, symptoms, causes, …); Ontology Component (kinds); Linearizations: Mortality, Morbidity, Primary Care, …]
ICD 11 Revision
►Aims to provide a persistent structure for computer access
►Foundation component
  An “ontological core” shared with SNOMED
  A “Content model” of other information that folk want
    ‣ signs, symptoms, effects, relation to disability, …
►“Linearizations” that look like the legacy system
  But can be generated from the Foundation Component and its annotations
    ‣ Coherent with Foundation Model (except for flagged legacy issues)
    ‣ A single tree of mutually exclusive and exhaustive subclasses at each level
      - Therefore must have
      - “Exclusions”
      - “Residual categories”: “other”, “not elsewhere classified”
Assumptions
►SNOMED disorder codes to be treated as “situations”
►Conjunctions and negation “wrapped” in code
►Hierarchies consistent with “situation” interpretation
►Queries will be against either the asserted or the inferred form of the ontology, but no reasoner will be used
►To be used with separate data schemas
►For lists of potential values
►For expanding queries for retrieval
►To be used with ICD “Linearizations”
►Specify meaning of each item in a linearization in terms of the ontology
Requirements listed for SNOMED Terminology Query Language ( locally
►Support
►Select class itself only, children, and/or descendants
►Set operations on results: union, intersection, difference
►Differentiate primitive and fully defined concepts; leaf concepts from others
  C SubclassOf … vs C EquivalentTo …; no subclasses vs has subclasses
    ‣ And possibly other syntactic selection/filtering
►Concepts asserted to be related to another given concept
  And possibly the reciprocals (‘used in’)
►String matching
►Use results of previous queries in nested queries and subsequent queries?
►Other
►Functional & all functions returning a set of concepts
►Easy to use, understand, and implement
►Questions
►What’s missing? How best to satisfy the requirements?
Examples
►/* This query expression returns concepts in the Clinical finding sub-hierarchy */
►DescendantsAndSelf( |Clinical finding|)
►/* This query expression returns all fully defined concepts in the Clinical finding sub-hierarchy */
►FilterOnFullyDefined(DescendantsAndSelf( |Clinical finding|))
►/* This query expression returns the first three levels of the Clinical findings hierarchy. */
►ChildrenAndSelf( ChildrenAndSelf( ChildrenAndSelf( |Clinical finding|)))
►/* This query expression returns all concepts in the ‘Immune hypersensitivity reaction’ hierarchy that have an explicit ungrouped ‘Causative agent’ relationship defined to any target concept. */
►Intersection( DescendantsAndSelf( |Immune hypersensitivity reaction|), HasDirectRel( |Causative agent|, All))
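How such set-returning functions compose can be sketched in Python. The function names below echo the draft's examples, but the bodies are guesses at plausible behaviour, not the draft specification's semantics, and the four-concept hierarchy with its fully-defined flags is invented.

```python
# Toy data, hypothetical: concept -> (parents, fully_defined flag).
CONCEPTS = {
    "Clinical finding": (set(), False),
    "Disease": ({"Clinical finding"}, False),
    "Hernia of abdominal wall": ({"Disease"}, True),
    "Inguinal hernia": ({"Hernia of abdominal wall"}, True),
}


def as_set(x):
    """Accept either a single concept name or a set of names."""
    return {x} if isinstance(x, str) else set(x)


def children_and_self(concepts):
    concepts = as_set(concepts)
    kids = {c for c, (parents, _) in CONCEPTS.items() if parents & concepts}
    return concepts | kids


def descendants_and_self(concepts):
    current = as_set(concepts)
    while True:  # repeat one-level expansion until nothing new is added
        bigger = children_and_self(current)
        if bigger == current:
            return current
        current = bigger


def filter_on_fully_defined(concepts):
    return {c for c in as_set(concepts) if CONCEPTS[c][1]}


def intersection(*sets):
    return set.intersection(*(as_set(s) for s in sets))


print(sorted(descendants_and_self("Clinical finding")))
print(sorted(filter_on_fully_defined(descendants_and_self("Clinical finding"))))
print(sorted(children_and_self(children_and_self("Clinical finding"))))
```

Because every function takes and returns a set of concepts, nesting works without special cases, which is the “functional” requirement in practice.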
Inferred & asserted: Use of Role Groups
►/* When run against the inferred view, this query expression returns all concepts that contain a first group with a ‘Finding site’ of ‘Inguinal canal structure’ and an ‘Associated morphology’ of ‘Hernial opening’, and a second group with a ‘Finding site’ of ‘Abdominal cavity structure’ and an ‘Associated morphology’ of ‘Hernia’. Concepts with inherited grouped relationships are also returned. */
►Intersection( HasGroupedRels( |Finding site|, |Inguinal canal structure|, |Associated morphology|, |Hernial opening|), HasGroupedRels( |Finding site|, |Abdominal cavity structure|, |Associated morphology|, |Hernia|))
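A grouped-relationship test of this kind can be sketched as follows. The representation and the function `has_grouped_rels` are hypothetical, the two concepts are invented placeholders, and values are matched exactly rather than by subsumption, which is a simplification.

```python
# Hypothetical inferred view: concept -> list of role groups,
# each group a set of (attribute, value) pairs.
GROUPS = {
    "Concept A": [
        {("Finding site", "Inguinal canal structure"),
         ("Associated morphology", "Hernial opening")},
        {("Finding site", "Abdominal cavity structure"),
         ("Associated morphology", "Hernia")},
    ],
    "Concept B": [  # the same four pairs, grouped the other way round
        {("Finding site", "Inguinal canal structure"),
         ("Associated morphology", "Hernia")},
        {("Finding site", "Abdominal cavity structure"),
         ("Associated morphology", "Hernial opening")},
    ],
}


def has_grouped_rels(*attr_value):
    """Concepts with at least one single group containing every given pair.

    Arguments alternate attribute, value, attribute, value, ...
    """
    wanted = set(zip(attr_value[0::2], attr_value[1::2]))
    return {c for c, groups in GROUPS.items()
            if any(wanted <= group for group in groups)}


result = (
    has_grouped_rels("Finding site", "Inguinal canal structure",
                     "Associated morphology", "Hernial opening")
    & has_grouped_rels("Finding site", "Abdominal cavity structure",
                       "Associated morphology", "Hernia")
)
print(result)
```

Only the concept whose pairs sit together in the right groups is returned; a test that ignored grouping would return both.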
Example using descendants and has rel without role groups
►/* This query expression returns concepts describing infectious arthritis */
►Intersection( Descendants( |Clinical finding|), HasRel( |Associated morphology|, DescendantsAndSelf( |Inflammation|)), HasRel( |Finding site|, DescendantsAndSelf( |Joint structure|)), HasRel( |Causative agent|, DescendantsAndSelf( |Organism|)) )
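The same query shape can be sketched in Python over invented data. The helper names mirror the draft's functions but are hypothetical implementations, and the two findings and their definitions are toy content, not taken from SNOMED.

```python
# Hypothetical hierarchy (child -> parents) covering findings and values.
PARENTS = {
    "Bacterial arthritis of knee": {"Clinical finding"},
    "Traumatic arthritis of knee": {"Clinical finding"},
    "Acute inflammation": {"Inflammation"},
    "Knee joint structure": {"Joint structure"},
    "Bacterium": {"Organism"},
}

# Hypothetical ungrouped relationships: concept -> (attribute, value) pairs.
RELS = {
    "Bacterial arthritis of knee": {
        ("Associated morphology", "Acute inflammation"),
        ("Finding site", "Knee joint structure"),
        ("Causative agent", "Bacterium"),
    },
    "Traumatic arthritis of knee": {
        ("Associated morphology", "Acute inflammation"),
        ("Finding site", "Knee joint structure"),
    },
}


def descendants(concept):
    direct = {c for c, parents in PARENTS.items() if concept in parents}
    return direct.union(*(descendants(c) for c in direct))


def descendants_and_self(concept):
    return {concept} | descendants(concept)


def has_rel(attribute, targets):
    """Concepts with the attribute pointing at any concept in `targets`."""
    return {c for c, rels in RELS.items()
            if any(a == attribute and v in targets for a, v in rels)}


result = (
    descendants("Clinical finding")
    & has_rel("Associated morphology", descendants_and_self("Inflammation"))
    & has_rel("Finding site", descendants_and_self("Joint structure"))
    & has_rel("Causative agent", descendants_and_self("Organism"))
)
print(result)
```

Expanding each target with `descendants_and_self` is what lets a definition stated in terms of a specific morphology, site or organism match a query stated in general terms.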