Detection of Spelling Errors in Swedish Clinical Text
Nizamuddin Uddin and Hercules Dalianis
Department of Computer and Systems Sciences (DSV)



Introduction
Hospitals produce text-based records at different stages of the patient health care process.
– Stored in an Electronic Patient Record System (EPRS).
– Contain valuable information such as patient symptoms, diagnoses, treatments and future plans.

These records can be used to improve the patient health care process and are an important resource for research in the clinical domain.
Automatic text processing methods that can use them:
– information retrieval
– text summarization
– decision support and statistical analysis

Problem
The text is in non-standardized form. It may contain:
– Spelling errors.
– Medical jargon and domain-specific abbreviations.
– Telegraphic style rather than complete sentences.
The reasons are:
– It may be entered under time pressure.
– It is mostly used for internal communication.

Related Research
The spelling problem has been addressed previously by many researchers.
– Domeij et al. (1994) developed an algorithm for detecting spelling errors in ordinary Swedish text.
– Patrick and Nguyen (2006) developed a rule-based system for detecting and correcting spelling errors in clinical text.
– Isenius et al. (2012) developed an algorithm that detects abbreviations.
– Ruch et al. (2003) found 10 percent spelling errors in clinical text written in French.

Methods
Techniques:
– N-gram analysis. Uses a predefined table, called an N-gram table.
– Lexical lookup. Uses dictionaries or word lists and simply searches for each input word in the available dictionaries.
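The two techniques can be illustrated with a minimal Python sketch. The slides do not say whether the N-gram table holds character or word n-grams; this sketch assumes character trigrams built from a list of known-good words. All function names and the toy word list are hypothetical and not taken from the presentation.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with '#' boundary markers."""
    padded = "#" + word.lower() + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_ngram_table(word_list, n=3):
    """Collect every n-gram that occurs in a list of known-good words."""
    table = set()
    for word in word_list:
        table.update(char_ngrams(word, n))
    return table


def has_unseen_ngram(word, table, n=3):
    """N-gram analysis: flag a word that contains an n-gram absent from the table."""
    return any(gram not in table for gram in char_ngrams(word, n))


def in_lexicon(word, lexicon):
    """Lexical lookup: accept a word only if it appears in the word list."""
    return word.lower() in lexicon


# Toy word list (an assumption for illustration, not data from the study).
known_words = ["patient", "feber", "hosta", "smärta"]
table = build_ngram_table(known_words)
lexicon = set(known_words)
```

N-gram analysis can flag implausible letter sequences without a full dictionary, while lexical lookup is stricter and rejects any word that is not listed.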

Data
Stockholm EPR Corpus
– Karolinska University Hospital patient records
– Over one million patients
Subset: Stockholm EPR PHI Corpus
– 100 patient records
– 151,924 words

Preprocessing
– Tokenization. Split the stream of words into tokens. Remove PHI (named entity) instances. Remove digits etc.
– Compound splitting. Swedish is a compound-rich language. A compound splitter was used.
– Lemmatization. Convert each input string into its base form. The CST lemmatizer was used.
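The first two preprocessing steps can be sketched as below. This is a simplified stand-in, not the tools used in the study: the tokenizer keeps only runs of letters, PHI removal assumes the PHI terms are already known and passed in as a set, and the compound splitter is a naive two-part split that allows a linking "s". Lemmatization is left out because it relies on the external CST lemmatizer. All names are hypothetical.

```python
import re

# Runs of letters only, so digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text, phi_terms=frozenset()):
    """Split text into lowercase word tokens, dropping digits and PHI terms."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return [t for t in tokens if t not in phi_terms]


def split_compound(word, lexicon, min_part=3):
    """Try to split a word into two known parts, allowing a linking 's'."""
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if tail in lexicon:
            if head in lexicon:
                return [head, tail]
            if head.endswith("s") and head[:-1] in lexicon:
                return [head[:-1], tail]
    return [word]  # no split found: keep the word whole
```

For example, `split_compound("huvudvärk", {"huvud", "värk"})` yields the two parts `huvud` and `värk`, so a compound missing from the dictionary can still be recognized through its parts.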

Algorithm
Developed an algorithm that:
– Uses the Stockholm EPR PHI Corpus.
– Uses seven different dictionaries (medical + Swedish).
The algorithm detects spelling errors in two stages:
– Perform preprocessing (tokenization + compound splitting).
– Compare or search each input string in the available dictionaries.
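The two stages can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the authors' implementation: the dictionaries are plain sets of lowercase words, tokenization keeps only letters, and compound handling is reduced to accepting a word whose two parts are both known. The function name and the toy dictionaries are hypothetical.

```python
import re


def detect_misspellings(text, dictionaries, min_part=3):
    """Return tokens found in none of the dictionaries, whole or as a two-part compound."""
    lexicon = set().union(*dictionaries)  # merge all word lists into one lookup set

    def known(word):
        if word in lexicon:
            return True
        # Fallback: accept a compound whose two parts are both known words.
        return any(
            word[:i] in lexicon and word[i:] in lexicon
            for i in range(min_part, len(word) - min_part + 1)
        )

    # Stage 1: preprocessing (letters-only tokenization, lowercasing).
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    # Stage 2: lexical lookup of every token.
    return [t for t in tokens if not known(t)]


# Toy dictionaries (assumptions for illustration only).
swedish = {"patienten", "har", "och", "dagar"}
medical = {"feber", "huvud", "värk"}
flagged = detect_misspellings("Patienten har febr och huvudvärk 2 dagar", [swedish, medical])
```

With these toy dictionaries only `febr` is flagged. Any correctly spelled word missing from every dictionary would be flagged as well, which is how false positives arise in a purely dictionary-based detector.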

Results


Results (continued)
– 36.8 percent were correctly spelled words that were detected as misspelled (false positives).
– 7.6 percent were spelling errors after correcting for false positives.
– 4.2 percent were compound words.
– 2.9 percent were abbreviations.

Discussion
The error rate in patient records is much higher (8%) than in other types of text (1-2%).
Reasons:
– Other types of text are read by a large number of people,
– while patient records are mostly used for internal communication.
– There is no spelling correction in the EPRS.
The result supports previous research:
– 10 percent spelling errors in clinical text written in French (Ruch et al., 2003).

Discussion (continued)
Performance of the algorithm:
– Depends exclusively on the dictionaries.
– Improved word lists or dictionaries may improve the result.
– Improved preprocessing, for example a better compound splitter, may also improve the result.

Future Work
– Use the Web instead of dictionaries for detecting spelling errors.
– The existing algorithm can be extended with a correction feature by using an edit distance algorithm.
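As a rough illustration of the proposed correction feature, the sketch below computes Levenshtein edit distance and ranks dictionary words by their distance to a flagged word. The slides name no specific edit distance variant, so plain Levenshtein is assumed here. The function names and the `max_distance` threshold are hypothetical choices.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance=2):
    """Return dictionary words within max_distance edits of word, closest first."""
    scored = [(edit_distance(word, candidate), candidate) for candidate in lexicon]
    return [candidate for distance, candidate in sorted(scored) if distance <= max_distance]
```

Comparing a flagged word against every dictionary entry is linear in the dictionary size, which is acceptable for a sketch but would likely need indexing for large word lists.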

Thank you! Questions?