Reference terminologies vs. Interface terminologies

Slides:



Advertisements
Similar presentations
Catalina Martínez-Costa, Stefan Schulz: Ontology-based reinterpretation of the SNOMED CT context model Ontology-based reinterpretation of the SNOMED CT.
Advertisements

©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 8 Slide 1 System modeling 2.
SYSTEM ANALYSIS & DESIGN (DCT 2013)
The Unreasonable Effectiveness of Data Alon Halevy, Peter Norvig, and Fernando Pereira Kristine Monteith May 1, 2009 CS 652.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 8 Slide 1 System models.
Modified from Sommerville’s originalsSoftware Engineering, 7th edition. Chapter 8 Slide 1 System models.
1/31 CS 426 Senior Projects Chapter 1: What is UML? Chapter 2: What is UP? [Arlow and Neustadt, 2005] January 22, 2009.
1 Software Requirements Specification Lecture 14.
1 CS 426 Senior Projects Chapter 1: What is UML? Chapter 2: What is UP? [Arlow and Neustadt, 2002] January 26, 2006.
The LC-STAR project (IST ) Objectives: Track I (duration 2 years) Specification and creation of large word lists and lexica suited for flexible.
10 December, 2013 Katrin Heinze, Bundesbank CEN/WS XBRL CWA1: DPM Meta model CWA1Page 1.
What is UML? What is UP? [Arlow and Neustadt, 2005] January 23, 2014
SC32 WG2 Metadata Standards Tutorial Metadata Registries and Big Data WG2 N1945 June 9, 2014 Beijing, China.
Karen Gibson.  Significant investment in eHealth is underway  Clinical records: ◦ Not only a record for the author ◦ Essential to inform the next person.
Ontology Development in the Sciences Some Fundamental Considerations Ontolytics LLC Topics:  Possible uses of ontologies  Ontologies vs. terminologies.
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 7 Slide 1 System models l Abstract descriptions of systems whose requirements are being.
Chapter 4 System Models A description of the various models that can be used to specify software systems.
System models Abstract descriptions of systems whose requirements are being analysed Abstract descriptions of systems whose requirements are being analysed.
NLP superficial and lexic level1 Superficial & Lexical level 1 Superficial level What is a word Lexical level Lexicons How to acquire lexical information.
A Case Study of ICD-11 Anatomy Value Set Extraction from SNOMED CT Guoqian Jiang, PhD ©2011 MFMER | slide-1 Division of Biomedical Statistics & Informatics,
Spoken dialog for e-learning supported by domain ontologies Dario Bianchi, Monica Mordonini and Agostino Poggi Dipartimento di Ingegneria dell’Informazione.
Using Text Mining and Natural Language Processing for Health Care Claims Processing Cihan ÜNAL
School of Computing FACULTY OF ENGINEERING Developing a methodology for building small scale domain ontologies: HISO case study Ilaria Corda PhD student.
Nancy Lawler U.S. Department of Defense ISO/IEC Part 2: Classification Schemes Metadata Registries — Part 2: Classification Schemes The revision.
SNOMED CT – Distributed Content Management Stefan Schulz Content Committee April 2, 2009.
Tommie Curtis SAIC January 17, 2000 Open Forum on Metadata Registries Santa Fe, NM SDC JE-2023.
Chapter 7 System models.
System models l Abstract descriptions of systems whose requirements are being analysed.
Modified by Juan M. Gomez Software Engineering, 6th edition. Chapter 7 Slide 1 Chapter 7 System Models.
A School of Information Science, Federal University of Minas Gerais, Brazil b Medical University of Graz, Austria, c University Medical Center Freiburg,
FDT Foil no 1 On Methodology from Domain to System Descriptions by Rolv Bræk NTNU Workshop on Philosophy and Applicablitiy of Formal Languages Geneve 15.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
MedKAT Medical Knowledge Analysis Tool December 2009.
PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design, 2 nd Edition Copyright 2003 © John Wiley & Sons, Inc. All rights reserved.
Approach to building ontologies A high-level view Chris Wroe.
INTRODUCTION TO APPLIED LINGUISTICS
1 Alberta Health Services Capital Health Palliative Care Program Clinical Vocabulary Pilot Project Project Update Friday April 24, 2009 Dennis Lee & Francis.
© NCSR, Frascati, July 18-19, 2002 CROSSMARC big picture Domain-specific Web sites Domain-specific Spidering Domain Ontology XHTML pages WEB Focused Crawling.
Acquisition of Character Translation Rules for Supporting SNOMED CT Localizations Jose Antonio Miñarro-Giménez a Johannes Hellrich b Stefan Schulz a a.
Engineering, 7th edition. Chapter 8 Slide 1 System models.
Assessing SNOMED CT for Large Scale eHealth Deployments in the EU Workpackage 2- Building new Evidence Daniel Karlsson, Linköping University Stefan Schulz,
SNOMED CT and Surgical Pathology
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
The “Common Ontology” between ICD 11 and SNOMED CT
Using language technology for SNOMED CT localization
Systems Analysis and Design
What is UML? What is UP? [Arlow and Neustadt, 2005] October 5, 2017
SNOMED CT and Surgical Pathology
Use Case Model.
Can SNOMED CT be harmonized with an upper-level ontology?
CRF &SVM in Medication Extraction
Stefan Schulz Medical University of Graz (Austria)
Development of the Amphibian Anatomical Ontology
Education SIG Implementors Curriculum
How to bring both kinds of data together for biomarker research ?
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Computational and Statistical Methods for Corpus Analysis: Overview
WP1: D 1.3 Standards Framework Status June 25, 2015
Clinical Assessment Instruments and the LOINC/SNOMED CT Relationship
Abstract descriptions of systems whose requirements are being analysed
Architecture for ICD 11 and SNOMED CT Harmonization
4e édition du Symposium sur l'Ingénierie de l'Information Médicale
Part of the Multilingual Web-LT Program
Harmonizing SNOMED CT with BioTopLite
Information Retrieval
Lexical ambiguity in SNOMED CT
Stefan Schulz Medical University of Graz (Austria)
Clinical research seminar SNOMED CT University Hospital Basel, May 24th, 2018 Annotating clinical narratives with SNOMED CT The thorny way towards interoperability.
Assessing SNOMED CT for Large Scale eHealth Deployments in the EU
Attributes and Values Describing Entities.
Presentation transcript:

Reference terminologies vs. Interface terminologies Semantics@Roche – September 27th, 2017 Stefan Schulz Medical University of Graz (Austria) purl.org/steschu Reference terminologies vs. Interface terminologies

Common problems with specialised domain terminologies I need to automatically annotate textual content that uses an idiosyncratic sublanguage which abounds of abbreviations and errors (syntax, spelling) which uses highly ambiguous terms the meaning of which depends on local contexts is characterised by constantly new terms I need to find close-to-user terms for structured entry Current situation: Numerous quasi-standard terminologies None of them cover all my concepts Many terms I use are not covered (although concepts are available)

Popularity of Terms in Pubmed Preferred term (SNOMED CT) Count Synonym Primary malignant neoplasm of lung Lung cancer Bronchial carcinoma 120682 3452 Cerebrovascular accident 3819 Stroke 191559 Block dissection of cervical lymph nodes  1 Neck dissection 7512 Electrocardiographic procedure Electrocardiogram ECG 33670 55120 Backache 3489 Back pain 38132 Capillary blood specimens 32 Capillary blood samples 574

Two Aspects of Terminologies Normative   Codes + Labels (names) denote well-defined entities in a realm of discourse "Explanatory“ labels, e.g. "Primary malignant neoplasm of lung (disorder)". Scope notes and definitions explain meaning of concepts Meaning formalised by logic descriptions  formal ontology Descriptive Describes language as being used: lexicon "Lung cancer"

Reference terms – Interface terms Reference terms: Normative Primarily language-independent: described by textual and / or formal (ontological) definitions Labels univocally identify a concept, often not terms of choice in communication / documentation Interface terms: Descriptive Collection of human language terms / expressions used in (informal) written and oral communication Grounding by connection with reference terminologies Inherent ambiguity of interface terms, often not perceived Need to deal with short forms Many domain terminologies contain interface terms, but far from being exhaustive http://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf

Case study: German interface terminology for SNOMED CT Context: Information extraction from clinical narratives  CBmed IICCAB* (see poster) Limited resources, incremental approach Top-down: Modularization of original terms (split into noun phrases): Translating / finding synonyms of derived, highly repetitive phrases: "Magnetic resonance imaging" in 627 SNOMED terms "Second degree burn" in 166 SNOMED terms Acquisition of translations and synonyms by decreasing frequency (machine translation, automated synonym acquisition) Manual revision Bottom-up: Addition of terms from n-gram frequency lists from reference corpora *Schulz S. Innovative Nutzung von Informationen für klinische Versorgung und Biomarkerforschung. http://goo.gl/wHMedz

MUG-GIT: Erstellung einer deutschen Interface-terminologie für SNOMED CT (II) Rules Char Token translation Rules trans - rule acquisition lations Reference Clinical corpus (DE) corpus (DE) Chunker rule untranslated exec New tokens Token Translatable SCT trans - descriptions (EN) lations POS n - grams (EN) tags filter concepts with identical terms across translations n - gram Human curation translations • correct most frequent mis - n - grams (DE) translations • remove wrong Non - Translatable Phrase translations SCT descriptions generation All SCT descriptions (EN) • check POS tags rules • normalise adjectives • add synonyms Term reassembling heuristics Raw full terms Curated ngram (DE) Human Validation translations(DE) • dependent on use cases • e.g. input for official translation • e.g. starting point for crowdsourcing process for interface term generation • lexicon for NLP approaches

Case study: German interface terminology for SNOMED CT Core vocabulary: Constantly checked and enhanced by domain experts (medical students) Priorisation by Use cases Guidelines Avoidance of ambiguous entries: inclusion of composed terms (e.g. "delivery"  "drug delivery", "preterm delivery") Acronyms only in context: not "CT" but "CT guided" Inflection, compositions rules requires special markers (German) and rules for reassembly of terms Current state: ca. 2 Million Interface-Terms Automatically generated from core vocabulary with 92,500 n-grams, out of 85,400 English n-grams Benchmark: parallel corpus extracted from Medline: term coverage 33.1% for German vs. 55.4% for English

Core n-gram vocabulary

Automatically generated interface terminology

Crowdsourcing for terminology development Functionality: entry of new terms, commenting and validating existing terms Possible central data element : Interface Term – External Code "DM" - 81827009 |Diameter (qualifier value) Possible Attributes: Creator, creation type, date, (sub)domain, user group Max Muster, manual, 20170803, Dermatology Graz, Doctos Example annotation, e.g. "ein 3 cm im DM haltender Tumor" Validation/ commenting by other users John Doe, 20180912,  "Example incomprehensible – additional examples needed"

Short forms – Document preprocessing Ambiguous short forms not in dictionary Difficulty of maintenance Abundance of concurring readings High productivity Instead: Main assumption: short forms and expansions occur in the same corpus Automatically create N-gram model from specific reference corpus Replace short forms by most plausible expansions The same for other out of-lexicon words, e.g. misspellings If assumption fails: try Web mining

Example: Resolution of short forms "dilat. Kardiomyopathie, hochgr. red. EF" Word-n-gram model (30,000 discharge summaries) Web mining 1035 dilat. Kardiomyopathie 1442 dilatative Kardiomyopathie 7 hochgr. red. EF 4 hochgradig reduzierte EF

Example: Resolution of short forms "Pat. mit rez. HWI und VUR" Wort-N-gram Modell No clear expansions of "HWI" and "VUR" Web mining 381 Pat. mit 120 Patient mit 2 rez. 707 rezenten 468 rezidivierende

Example: Resolution of short forms Interpretation: Frequency Acronym-definition patterns Regular expressions created from short forms

Recommendations Need for interface terms not satisfied by domain terminologies Acquisition of interface terms Top-down (from reference terminologies) Bottom up (from corpora) Use collaborative approach (crowdsourcing) Avoid ambiguous terms in lexicon Use n-gram models (alternatively deep learning ?) for out-of-lexicon terms, extracted from related corpora Benefitting from typical local contexts in "similar" documents Resolution of short forms Correction of typos

Questions? Contact: stefan.schulz@medunigraz.at Semantics@Roche – September 27th, 2017 Stefan Schulz Medical University of Graz (Austria) purl.org/steschu Questions? Contact: stefan.schulz@medunigraz.at

Additional Slides

Example SNOMED CT http://browser.ihtsdotools.org

"Is SNOMED CT well suited as an European reference terminology ?" H2020: Assessing SNOMED CT for Large Scale eHealth Deployments in the EU "Is SNOMED CT well suited as an European reference terminology ?" Manual annotation of a corpus of clinical texts with SNOMED CT vs. UMLS-Extract Assessment: concept coverage term coverage Unterschiede SNOMED CT Schwedisch – Englisch Schwedisch: ein (Vorzugs-)Term pro concept Englisch: durchschnittlich 2,3 Terme pro concept (Vorzugsterme, Synonyme)

"Is SNOMED CT well suited as an European reference terminology ?" H2020: Assessing SNOMED CT for Large Scale eHealth Deployments in the EU Concept coverage "Is SNOMED CT well suited as an European reference terminology ?" Manual annotation of a corpus of clinical texts with SNOMED CT vs. UMLS-Extract Assessment: concept coverage term coverage Unterschiede SNOMED CT Schwedisch – Englisch Schwedisch: ein (Vorzugs-)Term pro concept Englisch: durchschnittlich 2,3 Terme pro concept (Vorzugsterme, Synonyme)

"Is SNOMED CT well suited as an European reference terminology ?" H2020: Assessing SNOMED CT for Large Scale eHealth Deployments in the EU Concept coverage "Is SNOMED CT well suited as an European reference terminology ?" Manual annotation of a corpus of clinical texts with SNOMED CT vs. UMLS-Extract Assessment: concept coverage term coverage Differences SNOMED CT Swedish – English Swedish: one term per concept English: on average 2.3 terms per concept (Preferred terms , synonyms) Term coverage

SNOMED CT als Terminology Label (Fully Specified Name) Code (Concept ID) (…) Interface terms (Synonyme) Text definition

SNOMED CT als formal Ontologie Logig Axioms Code (Concept ID) Taxonomy

Interface-Terms  Interface-Terminology Interface terms (Synonyms) http://browser.ihtsdotools.org

H2020: Assessing SNOMED CT for Large Scale eHealth Deployments in the EU "SNOMED CT should be part of an ecosystem of terminologies, including international aggregation terminologies (e.g., the WHO Family of Classifications), and user interface terminologies, which address multilingualism in Europe and clinical communication with multidisciplinary professional language and lay language" AT: aggregation terminology, RT: reference terminology http://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf