Automatic Mapping of Clinical Documentation to SNOMED CT Holger Stenzhorn Saarland University Hospital, Homburg, Germany Edson Pacheco Percy Nohama Stefan.


Automatic Mapping of Clinical Documentation to SNOMED CT

Holger Stenzhorn
Saarland University Hospital, Homburg, Germany

Edson Pacheco, Percy Nohama, Stefan Schulz
Freiburg University Medical Center, Germany
Federal Technological University of Paraná, Brazil

Introduction Methods Results Conclusion

Background
- Important role of narrative content in the EHR
- Manual coding: cost, quality and scope problems
- Increasing demand for high-quality structured data
- SNOMED CT, as a new terminological standard, claims to represent the whole clinical process
- Can language technology help semantically enrich narratives in the Electronic Health Record?

Case study
- Source: discharge summaries from the cardiology department of the Hospital de Clínicas de Porto Alegre, Brazil
  - Language: Portuguese
- Target: SNOMED Clinical Terms, 01/2009
  - Languages: English, Spanish

Sample Discharge Summary

# HAS
# DM
# Miocardiopatia dilatada chagásica (FE 35%)
# Ca de prostata - orquiectomia (2004)
# Cardiopatia isquêmica - IAM em 2005, com colocação de stent em DA e lesão severa inoperável em CD

Pct vem a emergência em 20/03 com quadro de dor torácica típica, sem elevação enzimática, com diagnóstico de angina instável e fibrilação atrial não identificada em avaliações prévias. Adicionalmente, apresentava descompensação do diabetes com sindrome hiperosmlar não cetótica. Recebe tratamento clínico para otimização do quadro e é submetido a novo cateterismo em 28/03, que demonstra CD ocluída no terço proximal, DA com stent rpoximal com lesão de 40% no seu interior e Mg de Cx com lesão de 60-65%. Recebe alta em bom estado geral, sem dor torácica, anticoagulado, com plano de retorno ambulatorial para equipe de cardiopatia isquêmica e para o ambulatório de anticoagulação.

Features highlighted in the sample: acronyms, abbreviations, punctuation errors, typing errors, telegram style

NLP pipeline
[Diagram]
- Document side: sentence detection, spell checking, acronym expansion, NE recognition, POS tagging, NP extraction, context detection, morphosemantic abstraction; output: MID representation of term candidates
- SNOMED CT side: SCT-EN and SCT-SP, subset creation, morphosemantic abstraction; output: MID representation of SNOMED CT

Language processing tools implemented
- Sentence splitter, POS tagger: OpenNLP, trained with manually annotated texts
- Acronym expander: RegExp matching against an acronym database, disambiguation by local context (token co-occurrence in a three-token window)
- Noun phrase detector: driven by typical POS patterns in Spanish SNOMED CT descriptions (with few adaptations to Portuguese, due to the similarity between the two languages)
- Not yet implemented: spell checker, NE recognizer, context (e.g. negation) detector
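The acronym expander described on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ACRONYMS` table, its cue words, the function name `expand_acronyms`, and the reading of "three-token window" as three tokens on each side of the acronym are all assumptions made for the example.

```python
import re

# Hypothetical acronym database (illustrative entries only): each acronym
# maps to candidate expansions, each paired with context cue words.
ACRONYMS = {
    "DA": [
        ("artéria descendente anterior", {"stent", "lesão", "cateterismo"}),
        ("doença de Alzheimer", {"demência", "memória"}),
    ],
    "IAM": [("infarto agudo do miocárdio", set())],
}

WINDOW = 3  # assumed: tokens inspected on each side of the acronym
TOKEN_RE = re.compile(r"\w+")
# One alternation over all known acronyms, longest first, whole words only.
ACRONYM_RE = re.compile(
    r"\b(?:"
    + "|".join(sorted(map(re.escape, ACRONYMS), key=len, reverse=True))
    + r")\b"
)


def expand_acronyms(text):
    """Replace known acronyms, choosing the expansion whose cue words
    co-occur most often with the acronym inside the token window."""

    def replace(match):
        left = TOKEN_RE.findall(text[: match.start()])[-WINDOW:]
        right = TOKEN_RE.findall(text[match.end():])[:WINDOW]
        context = {tok.lower() for tok in left + right}
        candidates = ACRONYMS[match.group(0)]
        # Ties (including no cue match at all) fall back to the first entry.
        expansion, _cues = max(candidates, key=lambda c: len(c[1] & context))
        return expansion

    return ACRONYM_RE.sub(replace, text)


print(expand_acronyms("colocação de stent em DA e lesão severa"))
```

Matching is case-sensitive on purpose, so that the Portuguese preposition "da" is not mistaken for the acronym "DA".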

Morphosemantic Abstraction
- Using the MSI (morphosemantic indexing) toolkit (Averbis GmbH, Freiburg)
- Extraction of significant word fragments (subwords) and mapping to semantic identifiers (MIDs):
  #cardiac = { heart, cardiac, herz, kard, corac, cardiac, coeur, … }
  #inflamm = { inflamm, -itic, -itis, -phlog, entzuend, -itis, inflam, flog, inflam, flog, ... }
- Thesaurus: ~ equivalence classes
- Lexicon entries: English ~, German ~, Portuguese ~, Spanish ~, French ~, Swedish ~, Italian ~
[Diagram: subwords grouped into equivalence classes. MUSCLE: muscle, myo, muskel, muscul. INFLAMM: inflamm, -itis, inflam, entzünd. HEART: herz, heart, card, corazon, card.]
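To make the idea of subword-to-MID mapping concrete, here is a small sketch that segments a term by greedy longest match against a toy lexicon. The slides do not describe how the MSI toolkit actually segments words, so the algorithm, the `SUBWORD_TO_MID` table and the function name `to_mids` are assumptions; the MID names follow the "#cardiac"-style identifiers used elsewhere in the deck.

```python
# Toy subword lexicon (hypothetical): subword -> semantic identifier (MID).
SUBWORD_TO_MID = {
    "heart": "#cardiac", "cardiac": "#cardiac", "card": "#cardiac",
    "herz": "#cardiac", "kard": "#cardiac",
    "inflamm": "#inflamm", "itis": "#inflamm", "entzuend": "#inflamm",
    "myo": "#muscle", "muskel": "#muscle", "muscul": "#muscle",
}


def to_mids(term):
    """Segment each word greedily into the longest known subwords and
    return the MIDs in order; characters not covered are skipped."""
    mids = []
    for word in term.lower().split():
        pos = 0
        while pos < len(word):
            # Try the longest substring starting at pos first.
            for end in range(len(word), pos, -1):
                mid = SUBWORD_TO_MID.get(word[pos:end])
                if mid is not None:
                    mids.append(mid)
                    pos = end
                    break
            else:
                pos += 1  # no lexicon entry starts here
    return mids


print(to_mids("myocarditis"))
print(to_mids("Herzmuskelentzuendung"))
```

With this toy lexicon the English and the German term yield the same set of MIDs, which is the property that allows matching across language boundaries.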

Methods: NLP pipeline
[Diagram: the NLP pipeline again (sentence detection through morphosemantic abstraction on the document side; SCT-EN and SCT-SP, subset creation and morphosemantic abstraction on the SNOMED CT side), with a Mapping Heuristics step added next to the two MID representations.]

SNOMED CT Concepts as Subwords

Language  SNOMED CT Concept Description       MIDs
ENG       Congestive heart failure            #abund #cardiac #deficien
ENG       Congestive heart disease            #abund #cardiac #disorder
ENG       Congestive cardiac failure          #abund #cardiac #deficien
SPA       Insuficiencia cardíaca              #insuff #cardiac
SPA       Insuficiencia cardíaca congestiva   #insuff #cardiac #abund

Mapping heuristics
- For each term candidate:
  - decide whether there is a matching SNOMED description
  - if yes, find the best SNOMED description
  - map to the pertaining SNOMED description
- Preference criteria:
  - matching with "term-typical" POS patterns
  - MID coincidence (weighted by tf-idf)
  - threshold: 60%
- In case of failure: test whether the term candidate corresponds to two SNOMED concepts
  - plausibility of concept coordinations is checked using the SNOMED relationship table
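The MID-coincidence criterion with its 60% threshold might look like the following sketch. The slides do not give the exact scoring formula, so several choices here are assumptions: each MID counts once per description (term frequency 1), weights are smoothed inverse document frequencies, and the score is a weighted overlap ratio (shared weight divided by combined weight). The descriptions are taken from the "SNOMED CT Concepts as Subwords" slide; the function names are hypothetical.

```python
import math

# MID sets per SNOMED CT description, as listed on the subwords slide.
DESCRIPTIONS = {
    "Congestive heart failure": {"#abund", "#cardiac", "#deficien"},
    "Congestive heart disease": {"#abund", "#cardiac", "#disorder"},
    "Congestive cardiac failure": {"#abund", "#cardiac", "#deficien"},
    "Insuficiencia cardíaca": {"#insuff", "#cardiac"},
    "Insuficiencia cardíaca congestiva": {"#insuff", "#cardiac", "#abund"},
}


def idf_weights(descriptions):
    """Smoothed inverse document frequency per MID (assumed weighting)."""
    n = len(descriptions)
    df = {}
    for mids in descriptions.values():
        for mid in mids:
            df[mid] = df.get(mid, 0) + 1
    return {mid: math.log((1 + n) / (1 + count)) + 1 for mid, count in df.items()}


def best_match(candidate_mids, descriptions, threshold=0.6):
    """Return (description, score) for the best-scoring description, or
    (None, score) when the best score stays below the threshold."""
    weights = idf_weights(descriptions)
    unseen = math.log(1 + len(descriptions)) + 1  # weight of unknown MIDs

    def weight(mid):
        return weights.get(mid, unseen)

    best_name, best_score = None, 0.0
    for name, mids in descriptions.items():
        shared = sum(weight(m) for m in candidate_mids & mids)
        combined = sum(weight(m) for m in candidate_mids | mids)
        score = shared / combined if combined else 0.0
        if score > best_score:  # ties keep the earlier description
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# MIDs of a Portuguese term candidate such as "insuficiência cardíaca congestiva".
print(best_match({"#insuff", "#cardiac", "#abund"}, DESCRIPTIONS))
```

The fallback described on the slide (testing whether a candidate corresponds to two coordinated SNOMED concepts) is not covered by this sketch.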

Gold standard (kappa = 0.89)
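The slide reports only the agreement value. Assuming it refers to Cohen's kappa between two annotators (the transcript does not say which kappa statistic was used), the computation can be sketched as below. The labels in the example call are made-up toy data, not the study's annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy example: two annotators assign concept codes to four term candidates.
annotator_1 = ["c1", "c1", "c2", "c2"]
annotator_2 = ["c1", "c1", "c2", "c1"]
print(cohen_kappa(annotator_1, annotator_2))
```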

First results

Conclusion
- Work in progress
  - Encouraging preliminary results
  - SNOMED mapping possible across language boundaries
- Future work
  - Implement and test pipeline elements not implemented so far
  - Measure the impact of each pipeline element on mapping quality
  - Scientific challenges:
    - Automated context (e.g. plan, order, negation) identification
    - Use of SNOMED CT's ontological structure for improving the mapping result

Acknowledgements
- German Research Foundation (DFG)
- International Bureau of the German Ministry of Research (BMBF-IB)
- Brazilian National Research Council (CNPq)
- Hospital de Clínicas de Porto Alegre (HCPA)
- Averbis GmbH, Germany