4e édition du Symposium sur l'Ingénierie de l'Information Médicale

Slides:



Advertisements
Similar presentations
Semantic Interoperability in Health Informatics: Lessons Learned 10 January 2008Semantic Interoperability in Health Informatics: Lessons Learned 1 Medical.
Advertisements

Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Configuration management
Catalina Martínez-Costa, Stefan Schulz: Ontology-based reinterpretation of the SNOMED CT context model Ontology-based reinterpretation of the SNOMED CT.
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
10 December, 2013 Katrin Heinze, Bundesbank CEN/WS XBRL CWA1: DPM Meta model CWA1Page 1.
Ontology-based Convergence of Medical Terminologies: SNOMED CT and ICD-11 Institute for Medical Informatics, Statistics and Documentation, Medical University.
Karen Gibson.  Significant investment in eHealth is underway  Clinical records: ◦ Not only a record for the author ◦ Essential to inform the next person.
Developing an OWL-DL Ontology for Research and Care of Intracranial Aneurysms – Challenges and Limitations Holger Stenzhorn, Martin Boeker, Stefan Schulz,
Concept Model for observables, investigations, and observation results For the IHTSDO Observables Project Group and LOINC Coordination Project Revision.
A Case Study of ICD-11 Anatomy Value Set Extraction from SNOMED CT Guoqian Jiang, PhD ©2011 MFMER | slide-1 Division of Biomedical Statistics & Informatics,
SNOMED CT – Distributed Content Management Stefan Schulz Content Committee April 2, 2009.
1 Introduction to Software Engineering Lecture 1.
WHO – IHTSDO: SNOMED CT – ICD-11 coordination: Conditions vs. Situations Stefan Schulz IHTSDO conference Copenhagen, Apr 24, 2012.
Ontological Analysis, Ontological Commitment, and Epistemic Contexts Stefan Schulz WHO – IHTSDO Joint Advisory Group First Face-to-Face Meeting Heathrow,
Sharing Ontologies in the Biomedical Domain Alexa T. McCray National Library of Medicine National Institutes of Health Department of Health & Human Services.
FDT Foil no 1 On Methodology from Domain to System Descriptions by Rolv Bræk NTNU Workshop on Philosophy and Applicablitiy of Formal Languages Geneve 15.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
Metadata Common Vocabulary a journey from a glossary to an ontology of statistical metadata, and back Sérgio Bacelar
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Lab and Messaging CoP 11/29/2012. Agenda Agenda review Introduction to the Specimen Cross-Mapping Table (Specimen CMT) – Background – Expected uses –
© University of Manchester Creative Commons Attribution-NonCommercial 3.0 unported 3.0 license Lexically Suggest, Logically Define: QA of Qualifiers &
Approach to building ontologies A high-level view Chris Wroe.
Clinical research data interoperbility Shared names meeting, Boston, Bosse Andersson (AstraZeneca R&D Lund) Kerstin Forsberg (AstraZeneca R&D.
Detection of underspecifications in SNOMED CT concept definitions using language processing 1 Federal Technical University of Paraná (UTFPR), Curitiba,
Ontologies for Terminologies, Knowledge Representation & Software: Benefits & Gaps (“Don’t make the tea”) (Only a part of Knowledge Representation) Alan.
© University of Manchester Creative Commons Attribution-NonCommercial 3.0 unported 3.0 license Quality Assurance, Ontology Engineering, and Semantic Interoperability.
SNOMED CT Vendor Introduction 27 th October :30 (CET) Implementation Special Interest Group Tom Seabury IHTSDO.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
FROM ONE NOMENCLATURES TO ANOTHER… Drs. Sven Van Laere.
Essential SNOMED. Simplifying S-CT. Supporting integration with health
Oncology in SNOMED CT NCI Workshop The Role of Ontology in Big Cancer Data Session 3: Cancer big data and the Ontology of Disease Bethesda, Maryland May.
© University of Manchester Creative Commons Attribution-NonCommercial 3.0 unported 3.0 license Quality Assurance, Ontology Engineering, and Semantic Interoperability.
© NCSR, Frascati, July 18-19, 2002 CROSSMARC big picture Domain-specific Web sites Domain-specific Spidering Domain Ontology XHTML pages WEB Focused Crawling.
Frequent Criticisms too ambitious recreation of a Textbook of Medicine competing with SNOMED-CT replicates the work done elsewhere: DSM, ICPC, too academic.
A Proposed Approach to Binding SNOMED CT to HL7 FHIR Dr Linda Bird Senior Implementation Specialist.
Acquisition of Character Translation Rules for Supporting SNOMED CT Localizations Jose Antonio Miñarro-Giménez a Johannes Hellrich b Stefan Schulz a a.
Information Retrieval in Practice
WP4 Models and Contents Quality Assessment
Pepper modifying Sommerville's Book slides
Assessing SNOMED CT for Large Scale eHealth Deployments in the EU Workpackage 2- Building new Evidence Daniel Karlsson, Linköping University Stefan Schulz,
SNOMED CT and Surgical Pathology
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
The “Common Ontology” between ICD 11 and SNOMED CT
Using language technology for SNOMED CT localization
SNOMED CT Search & Data Entry
Scope The scope of this test is to make a preliminary assessment of the fitness of the Draft Observables and Investigation Model (Observables model)
Achieving Semantic Interoperability of Cancer Registries
SNOMED CT and Surgical Pathology
Can SNOMED CT be harmonized with an upper-level ontology?
Stefan Schulz Medical University of Graz (Austria)
The Systems Engineering Context
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Kin Wah Fung, MD, MS, MA National Library of Medicine, NIH
Networking and Health Information Exchange
Taxonomies, Lexicons and Organizing Knowledge
Social Knowledge Mining
Lecture Software Process Definition and Management Chapter 3: Descriptive Process Models Dr. Jürgen Münch Fall
Architecture for ICD 11 and SNOMED CT Harmonization
Tools of Software Development
Harmonizing SNOMED CT with BioTopLite
Lexical ambiguity in SNOMED CT
Stefan Schulz Medical University of Graz (Austria)
Clinical research seminar SNOMED CT University Hospital Basel, May 24th, 2018 Annotating clinical narratives with SNOMED CT The thorny way towards interoperability.
Reference terminologies vs. Interface terminologies
Assessing SNOMED CT for Large Scale eHealth Deployments in the EU
ece 627 intelligent web: ontology and beyond
Stefan SCHULZ IMBI, University Medical Center, Freiburg, Germany
ONTOMERGE Ontology translations by merging ontologies Paper: Ontology Translation on the Semantic Web by Dejing Dou, Drew McDermott and Peishen Qi 2003.
BFO meets critics Workshop organised by Barry Smith
Presentation transcript:

4e édition du Symposium sur l'Ingénierie de l'Information Médicale Les 23 et 24 Novembre 2017 à Toulouse Keynote address: Annotating clinical narratives with SNOMED CT Aspects of Reliability and Semantic Interoperability Stefan Schulz Medical University of Graz (Austria) purl.org/steschu

Typical information engineering workflow Data Acquisition Represen- tation Reasoning Result Data data is acquired, represented and reasoned about

Typical information engineering workflow Reasoning A Result A Data Acquisition Represen- tation Data AI research has much focused on developing and assessing reasoning methods Reasoning B Result B

Typical information engineering workflow Represen- tation A Reasoning Result A Data Acquisition Data and representation appproaches Represen- tation B Reasoning Result B

Typical information engineering workflow Represen- tation A Reasoning Result A reliability validity Data Acquisition Data Represen- tation B Reasoning Result B

Focus on data acquisition for information engineering Represen- tation Reasoning Result A DA Data Acquisition B Represen- tation Reasoning Result B DB

Data reliability  Data interoperability high Data Acquisition A DA=DB DA DA DB Data Acquisition B DA DB DA low

Data reliability  Data interoperability Acquisition A DA Query: , Data Acquisition B DA

Focus of the talk Analysis of coded extracts from clinical texts Inter-annotator agreement (reliability) Reasons for inter-annotator disagreement Discussion: how to improve agreement Assumption: better data reliability -> better semantic interoperability of clinical data

Annotating clinical narratives with SNOMED CT

Annotating clinical narratives with SNOMED CT Coding observation map reality observed standardised representation  Vocabularies Ontologies Annotation … intermediate representation, which is, on its own, result of observation and interpretation of reality observation, interpretation Resektat nach Whipple": Ein noch nicht eröffnetes Resektat, bestehend aus einem distalen Magen mit einer kleinen Kurvaturlänge von 9,5 cm und einer großen Kurvaturlänge von 13,5 cm, sowei einem duodenalen Anteil von 14 cm Länge. 2 cm aboral des Pylorus zeigt die Dünndarmwandung eine sanduhrartige … map standardised representation symbols  intermediate representation

Annotating clinical narratives with SNOMED CT eHealth standard, maintained by transnational SDO Huge clinical reference terminology ~300,000 "concepts" representable as OWL EL preferred terms and synonyms in several languages SNOMED CT emphasise "definitions" (appendicitis is defined as inflammatory disorder with finding site appendix) covers disorders, procedures, body parts, substances, devices, organisms, qualities… (quasi-) ontological definitional and qualifying axioms multiple hierarchies

Annotation: Sources of complexity Resektat nach Whipple": Ein noch nicht eröffnetes Resektat, bestehend aus einem distalen Magen mit einer kleinen Kurvaturlänge von 9,5 cm und einer großen Kurvaturlänge von 13,5 cm, sowei einem duodenalen Anteil von 14 cm Länge. 2 cm aboral des Pylorus zeigt die Dünndarmwandung eine sanduhrartige Stenose. Im Magen- und Duodenallumen reichlich zähflüssiger Schleim, sanguinolent; SNOMED CT map Ontology classes relations logical constructors axioms Terminology concepts preferred terms synonyms definitions Human language words, multiword terms syntactic structures relations at various levels Clinical language Compact Paragrammatical Context-dependent Best text span to annotate? Naïve or analytic annotation? SNOMED CT Ill-defined concepts Similar concepts Pre-coordination vs. post-coordination Complex annotations (> 1 concept / term)

Examples Clinical text SNOMED CT concepts (FSNs) ? ? ? 'Duodenal structure (body structure)' "… the duodenum . The mucosa is…" 'Mucous membrane structure (body structure)' ? 'Duodenal mucous membrane structure (body structure)' "…Hemorrhagic shock after RTA … " 'Traffic accident on public road (event)' ? 'Traffic accident on public road (event)', 'Renal tubular acidosis (disorder)' ? 'Traffic accident on public road (event)' or 'Renal tubular acidosis (disorder)' 'Suspected dengue (situation)' "…travel history of suspected dengue…" 'Suspected (qualifier value)' 'Dengue (disorder)'

Coding / Annotation guidelines Examples: German coding guidelines for ICD and OPS, 171 pages Using SNOMED CT in CDA models: 147 pages CHEMDNER-patents: annotation of chemical entities in patent corpus: annotation manual 30 pages CRAFT Concept Annotation guidelines: 47 pages Gene Ontology Annotation conventions: 7 pages Complex rule sets, requiring intensive training http://www.dkgev.de/media/file/21502.Deutsche_Kodierrichtlinien_Version_2016.pdf http://www.snomed.org/resource/resource/249 http://www.biocreative.org/media/store/files/2015/cemp_patent_guidelines_v1.pdf http://bionlp-corpora.sourceforge.net/CRAFT/guidelines/CRAFT_concept_annotation_guidelines.pdf http://geneontology.org/page/go-annotation-conventions

Annotation experiments in ASSESS-CT

Annotation of clinical narratives Annotation experiments in ASSESS-CT EU support action on the fitness of SNOMED CT as a EU core reference terminology http://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf

Annotation of clinical narratives Annotation experiments in ASSESS-CT EU support action on the fitness of SNOMED CT as a EU core reference terminology Domain experts annotate 60 samples of clinical documents with SNOMED CT paid domain experts… http://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf

Annotation of clinical narratives Annotation experiments in ASSESS-CT EU support action on the fitness of SNOMED CT as a EU core reference terminology Domain experts annotate 60 samples of clinical documents with SNOMED CT 1/3 of samples annotated twice Support: Webinars, annotation guidelines http://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf

Principal quantitative results (English) Concept coverage [95% CI] SNOMED CT Text annotations – English .86 [.82-.88] Term coverage [95% CI] SNOMED CT Text annotations – English .68 [.64; .70] Inter annotator agreement Krippendorff's Alpha [95% CI] SNOMED CT Text annotations .37 [.33-.41] has not scored that well… (similar results with alternative annotation task, using non-SNOMED UMLS extract) Krippendorff, Klaus (2013). Content analysis: An introduction to its methodology, 3rd edition. Thousand Oaks, CA: Sage.

Agreement map: SNOMED annotations Therefore we decided to dig more deeply into the annotation results green: agreement – yellow: only annotated by one coder – red: disagreement - white no annotations

Systematic error analysis to see what really happend

Systematic error analysis Creation of gold standard for SNOMED CT 20 English text samples annotated twice 208 NPs Analysis of English SNOMED CT annotations by two additional terminology experts Consensus finding, according to pre-established annotation guidelines Inspection, analysis and classification of text annotation disagreements Presentation of some disagreement cases for SNOMED CT

Reasons for disagreement

Human issues Lack of domain knowledge / carelessness Tokens Annotator #1 Annotator #2 Gold standard "IV" 'Structure of abductor hallucis muscle (body structure)' 'Abducens nerve structure (body structure) ' 'Abducens nerve structure (body structure)' Retrieval error (synonym not recognised) Tokens Annotator #1 Annotator #2 Gold standard "Glibenclamide" 'Glyburide (substance)' –

Ontology issues (I) Logical polysemy ("dot categories")* Tokens Annotator #1 Annotator #2 Gold standard 'Lymphoma" 'Malignant lymphoma (disorder)' 'Malignant lymphoma - category (morphologic abnormality)' 'Malignant lymphoma (disorder)' *Alexandra Arapinis, Laure Vieu: A plea for complex categories in ontologies. Applied Ontology 10(3-4): 285-296 (2015)

Ontological issues (II) Incomplete definitions Tokens Annotator #1 Annotator #2 Gold standard "Motor: normal bulk and tone" 'Skeletal muscle structure (body structure)' 'Muscle finding (finding)' 'Skeletal muscle normal (finding)' 'Normal (qualifier value)'

Ontological issues (II) Incomplete definitions Tokens Annotator #1 Annotator #2 Gold standard "Motor: normal bulk and tone" 'Skeletal muscle structure (body structure)' 'Muscle finding (finding)' 'Skeletal muscle normal (finding)' 'Normal (qualifier value)' Tokens Annotator #1 Annotator #2 Gold standard "Former smoker" 'In the past (qualifier value)' 'History of (contextual qualifier) (qualifier value)' 'Ex-smoker (finding)' 'Smoker (finding)'

Ontological issues (III) Navigational concepts Fuzzy, undefined qualifiers Tokens Annotator #1 Annotator #2 Gold standard "palpebral fissure" Finding of measures of palpebral fissure (finding) Structure of palpebral fissure (body structure) Measure of palpebral fissure (observable entity) Tokens Annotator #1 Annotator #2 Gold standard "Significant bleeding" 'Significant (qualifier value)' 'Severe (severity modifier) (qualifier value)' 'Moderate (severity modifier) (qualifier value)' 'Bleeding (finding)'

Interface term (synonym) issues Tokens Annotator #1 Annotator #2 Gold standard "Blood extra- vacation" 'Blood (substance)' 'Hemorrhage (morphologic abnormality)' 'Hemorrhage (morphologic abnormality)' 'Extravasation (morphologic abnormality)' "extravasation of blood"

Interface term (synonym) issues Tokens Annotator #1 Annotator #2 Gold standard "Blood extra- vasation" 'Blood (substance)' 'Hemorrhage (morphologic abnormality)' 'Hemorrhage (morphologic abnormality)' 'Extravasation (morphologic abnormality)' "extravasation of blood" Tokens Annotator #1 Annotator #2 Gold standard "anxious" 'Anxiety (finding)' 'Worried (finding)' "anxious cognitions"

Prevention and remediation of annotation disagreements

Prevention and remediation of annotation disagreements Rationales: More principled SNOMED CT coding of EHR content More principled binding of SNOMED CT codes to clinical models Consistent manual annotations for training corpora and reference standards Improvement of performance of NLP-based annotations

Preventive measures

Prevention: annotation processes Training with continuous feedback Early detection of inter annotator disagreement triggers guideline enforcement / revision Tooling Optimised concept retrieval (fuzzy, substring, synonyms) Guideline enforcement by appropriate tools Postcoordination support (complex syntactic expressions instead of simple concept grouping)

Prevention: improve SNOMED CT quality Fill gaps Add missing equivalence axioms Self-explaining labels, text definitions where necessary Preference rules to manage polysemy Strengthen ontological foundations Upper-level ontology alignment Better distinction between domain entities and information entities Overhaul problematic subhierarchies, especially qualifiers

Prevention: improve content maintenance Data-driven terminology maintenance Harvest notorious disagreements between annotations from clinical datasets Detect imbalances by analysing concept frequency and co-occurrence between comparable institutions Community processes: crowdsourcing of interface terms by languages, dialects, specialties, user groups (ASSESS-CT: interface terminologies to be maintained separately from reference terminologies)

Remediation of annotation disagreements

Remediation of annotation disagreements Exploit ontological dependencies / implications Concept A Concept B Dependency 'Mast cell neoplasm (disorder)' 'Mast cell neoplasm (morphologic abnormality)' A subclassOf AssociatedMorphology some B 'Isosorbide dinitrate' (product)' 'Isosorbide dinitrate (substance)' A subclassOf HasActiveIngredient some B 'Palpation (procedure)' 'Palpation - action (qualifier value)' A subclassOf Method some B 'Blood pressure taking (procedure)' 'Blood pressure (observable entity)' A subclassOf hasOutcome some B 'Increased size (finding)' 'Increased (qualifier value)' A subclassOf isBearerOf some B 'Finding of heart rate (finding)' 'Heart rate (observable entity)' A subclassOf Interprets some B

Experiment Gold standard expansion: Step 1: include concepts linked by attributive relations: A subclassOf Rel some B Step 2: include additional first-level taxonomic relations: A subclassOf B only insignificant improvement possibly due to missing relations in SNOMED CT (see "former smoker" and "skeletal muscle normal" examples) just a side issue… requires more investigation Language of text sample Gold standard expansion F measure English no expansion 0.28 expansion step 1 expansion step 2 0.29

Conclusions

Poor agreement hampers SNOMED CT use: Conclusions Poor agreement hampers SNOMED CT use: Clinical decision support, cohort building, content retrieval, summarisation, analytics,… (but not specific for SNOMED CT  ACCESS CT) Prevention & Remediation: Education, tooling, guidelines Large-scale SNOMED CT content and structure improvement High coverage local interface terminologies, representing real language of clinicians

Outlook "Learning systems" for improvement terminology content / structure / tooling. "Clinical big data": pooling of non-re-identifiable annotations from multiple institutions Community efforts for interface terminology creation and maintenance Post processing of SNOMED CT annotations: Stream of codes  text knowledge graph *Martínez-Costa, Kalra, Schulz. Semantic enrichment of clinical models towards semantic interoperability. JAMIA 2015 May;22(3):565-76

Thanks for your attention Slides will be made accessible at purl.org/steschu Acknowledgements: ASSESS CT team: Jose Antonio Miñarro-Giménez, Catalina Martínez-Costa, Daniel Karlsson, Kirstine Rosenbeck Gøeg, Kornél Markó, Benny Van Bruwaene, Ronald Cornet, Marie-Christine Jaulent, Päivi Hämäläinen, Heike Dewenter, Reza Fathollah Nejad, Sylvia Thun, Veli Stroetmann, Dipak Kalra Contact: stefan.schulz@medunigraz.at

Ecosystem of semantic assets Process Models Information Models Terminologies Guideline Models

Information Models Guideline Models Reference Terminologies …describe and standardize a neutral, language-independent sense The meaning of domain terms The properties of the objects that these terms denote Representational units are commonly called “concepts” RTs enhanced by formal descriptions = "Ontologies" Information Models Reference Terminologies Guideline Models

Terminologies (Classifications) Systems of non-overlapping classes in single hierarchies, for data aggregation and ordering. aka classifications, e.g. the WHO classifications Typically used for health statistics and reimbursement AT3 Core Reference Terminology Information Models AT2 AT1 AT4 Aggregation Terminologies (Classifications) Guideline Models

Information Models Guideline Models Core Reference Terminology AT3 AT2 Reference and aggregation terminologies represent / organize the domain They are not primarily representations of language They use human language labels as a means to univocally describe the entities they denote, independently of the language actually used in human communication Systems of non-overlapping classes in single hierarchies, for data aggregation and ordering. aka classifications, e.g. the WHO classifications Typically used for health statistics and reimbursement AT3 Core Reference Terminology Information Models AT2 AT1 AT4 Guideline Models

Information Models Guideline Models User Interface Terminology Collections of terms used in written and oral communication within a group of users Terms often ambiguous. Entries in user interface terminologies to be further specified by language, dialect, time, sub(domain), user group. User Interface Terminology (language specific) Information Models Guideline Models

"Ca" "Cálcio" "Ca" "Câncer" "Carcinoma" User Interface Terminology (e.g. Portuguese) Reference Terminology "Ca" "Cálcio" [chemistry] 5540006 | Calcium (substance) | "Ca" "Câncer" "Carcinoma" 68453008 | Carcinoma (morphologic abnormality) | [oncology]

User Interface Terminology Process Models AT3 RT4 Core Reference Terminology Information Models AT2 RT1 AT1 RT2 AT4 RT3 Guideline Models

MUG-GIT: Creation of German Interface Terminologie for SNOMED CT Rules Char Token translation Rules trans - rule acquisition lations Reference Clinical corpus (DE) corpus (DE) Chunker rule untranslated exec New tokens Token Translatable SCT trans - descriptions (EN) lations POS n - grams (EN) tags filter concepts with identical terms across translations n - gram Human curation translations • correct most frequent mis - n - grams (DE) translations • remove wrong Non - Translatable Phrase translations SCT descriptions generation All SCT descriptions (EN) • check POS tags rules • normalise adjectives • add synonyms Term reassembling heuristics Raw full terms Curated ngram (DE) Human Validation translations(DE) • dependent on use cases • e.g. input for official translation • e.g. starting point for crowdsourcing process for interface term generation • lexicon for NLP approaches

ngram – core vocabulary

Machine-generated Interface terms