Knowledge Discovery and Data Mining to Assist Natural Language Understanding (Adam Wilcox, M.A., George Hripcsak, M.D. Department of Medical Informatics,


Knowledge Discovery and Data Mining to Assist Natural Language Understanding (Adam Wilcox, M.A., George Hripcsak, M.D., Department of Medical Informatics, Columbia University, New York, NY, 1998). Presented by Chaveevan Pechsiri

Outline
  Objective
  Methodologies
  Results
  Discussion
  Suggestion

Objective
  Generate queries and rules to interpret the output of the MedLEE processor at Columbia-Presbyterian Medical Center
  Techniques:
    NLP
    Data mining: classification using C5.0
  Data: chest radiograph reports + clinic encounters

Methodologies
  NLP
    Findings with modifiers
  Generate a vector report
    Flattening = finding + modifier
    Coding = flattening + modifier value
  Classification
    The decision tree C5.0 (ID3 family)
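As a rough illustration of how an ID3-style decision tree chooses which attribute to split on, the sketch below computes information gain over a toy set of coded reports. It is only a sketch under stated assumptions: the four rows and their class labels are hypothetical, and C5.0's actual splitting criterion and pruning are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical coded reports: attribute names follow the slides,
# but the values and class labels are invented for illustration.
rows = [
    {"pleural effusion": "present", "congestive change": "present"},
    {"pleural effusion": "present", "congestive change": "absent"},
    {"pleural effusion": "absent", "congestive change": "present"},
    {"pleural effusion": "absent", "congestive change": "absent"},
]
labels = ["positive", "positive", "negative", "negative"]

# Pick the attribute with the highest gain as the first split.
best = max(rows[0], key=lambda attr: information_gain(rows, labels, attr))
print(best)
```

In this toy set, "pleural effusion" separates the two classes perfectly, so it would be chosen for the first split.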

NLP
  MedLEE processing steps:
    1. Word and phrase recognition
    2. Standard term generation
    3. Classification of terms into semantic categories
    4. Parsing of sequences of semantic categories into structures
  Flow: Narrative report -> MedLEE processor -> Findings with modifiers
  Resources: clinical dictionary, grammar rules dictionary
  Example terms:
    congestive heart failure, heart failure, CHF
    left pleural effusion……
    …….. new pleural effusion

NLP
  Narrative report:
    “Probable mild pulmonary vascular congestion with new left pleural effusion, question mild congestive changes”
  Processor output from MedLEE (3 findings with modifiers):
    Pulmonary vascular congestion
      certainty: high
      degree: low
    Pleural effusion
      region: left
      status: new
    Congestive change
      certainty: moderate
      degree: low

Coding finding-modifier pair
  Processor output:
    Pulmonary vascular congestion
      certainty: high
      degree: low
    Pleural effusion
      region: left
      status: new
    Congestive change
      certainty: moderate
      degree: low
  Finding vector report:
    pulmonary vascular congestion = present
    pulmonary vascular congestion: certainty = high
    pulmonary vascular congestion: degree = low
    pleural effusion = present
    pleural effusion: region = left
    pleural effusion: status = new
    congestive change = present
    congestive change: certainty = moderate
    congestive change: degree = low
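The flattening and coding step on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the input structure (a list of dictionaries with "finding" and "modifiers" keys) is an assumption made for illustration and is not MedLEE's real output format.

```python
def flatten(findings):
    """Turn findings with modifiers into a flat attribute -> value vector.

    Each finding contributes one "finding = present" attribute, plus one
    "finding: modifier = value" attribute per modifier.
    """
    vector = {}
    for finding in findings:
        name = finding["finding"]
        vector[name] = "present"
        for modifier, value in finding.get("modifiers", {}).items():
            vector[f"{name}: {modifier}"] = value
    return vector


# Assumed input shape (hypothetical); the content is taken from the slide.
findings = [
    {"finding": "pulmonary vascular congestion",
     "modifiers": {"certainty": "high", "degree": "low"}},
    {"finding": "pleural effusion",
     "modifiers": {"region": "left", "status": "new"}},
    {"finding": "congestive change",
     "modifiers": {"certainty": "moderate", "degree": "low"}},
]

vector = flatten(findings)
for attribute, value in vector.items():
    print(f"{attribute} = {value}")
```

Each report becomes one such vector, and the set of vectors is what a classifier such as C5.0 would take as training cases.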

Diagnosing Hypothyroidism
C5.0 decision table

  Attribute                   Assay 1    Assay 2                Assay 3
  age                         [values not shown]
  sex                         F          M                      M
  on thyroxine                t          f                      f
  query on thyroxine          f          f                      f
  on antithyroid medication   f          f                      f
  sick                        f          f                      f
  pregnant                    t          N/A                    N/A
  thyroid surgery             f          f                      f
  I131 treatment              f          f                      f
  query hypothyroid           f          f                      t
  query hyperthyroid          t          f                      f
  lithium                     f          f                      f
  tumor                       f          f                      f
  goitre                      f          f                      f
  hypopituitary               f          f                      f
  psych                       f          f                      f
  TSH                         [values not shown]
  T3                          [values not shown]
  TT4                         [values not shown]
  T4U                         [values not shown]
  FTI                         [values not shown]
  referral source             other      SVI                    other
  diagnosis                   negative   primary hypothyroid    compensated hypothyroid

C5.0 If-then rules

  Rule 1: (31, lift 42.7)
      thyroid surgery = f
      TSH > 6
      TT4 <= 37
      -> class primary [0.970]

  Rule 2: (63/6, lift 39.3)
      TSH > 6
      FTI <= 65
      -> class primary [0.892]

  Rule 3: (270/116, lift 10.3)
      TSH > 6
      -> class compensated [0.570]

  Rule 4: (2225/2, lift 1.1)
      TSH <= 6
      -> class negative [0.999]

  Rule 5: (296, lift 1.1)
      on thyroxine = t
      FTI > 65
      -> class negative [0.997]
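The numbers attached to each rule can be read as (cases covered / cases misclassified) and a bracketed confidence. The bracketed values appear consistent with the Laplace ratio (n - m + 1) / (n + 2), where n is the number of covered cases and m the number misclassified. The sketch below checks that against the five rules and then applies the rules to a case. The test cases and the conflict-resolution step (summing confidences per class) are assumptions made for illustration, not a statement of how C5.0 behaves in every situation.

```python
def laplace_confidence(covered, errors=0):
    """Laplace ratio (n - m + 1) / (n + 2) for a rule covering n cases with m errors."""
    return (covered - errors + 1) / (covered + 2)


# Rules transcribed from the slide; "when" encodes each rule's conditions.
RULES = [
    {"name": "Rule 1", "covered": 31, "errors": 0, "reported": 0.970, "cls": "primary",
     "when": lambda c: c["thyroid surgery"] == "f" and c["TSH"] > 6 and c["TT4"] <= 37},
    {"name": "Rule 2", "covered": 63, "errors": 6, "reported": 0.892, "cls": "primary",
     "when": lambda c: c["TSH"] > 6 and c["FTI"] <= 65},
    {"name": "Rule 3", "covered": 270, "errors": 116, "reported": 0.570, "cls": "compensated",
     "when": lambda c: c["TSH"] > 6},
    {"name": "Rule 4", "covered": 2225, "errors": 2, "reported": 0.999, "cls": "negative",
     "when": lambda c: c["TSH"] <= 6},
    {"name": "Rule 5", "covered": 296, "errors": 0, "reported": 0.997, "cls": "negative",
     "when": lambda c: c["on thyroxine"] == "t" and c["FTI"] > 65},
]


def classify(case):
    """Sum the confidences of all firing rules per class (assumed voting scheme)."""
    votes = {}
    for rule in RULES:
        if rule["when"](case):
            votes[rule["cls"]] = votes.get(rule["cls"], 0.0) + rule["reported"]
    # None means no rule fired; a real system would fall back to a default class.
    return max(votes, key=votes.get) if votes else None


for rule in RULES:
    computed = laplace_confidence(rule["covered"], rule["errors"])
    print(f'{rule["name"]}: computed {computed:.3f}, reported {rule["reported"]:.3f}')

# Hypothetical cases: the numeric values are invented for illustration.
high_tsh = {"thyroid surgery": "f", "on thyroxine": "f", "TSH": 8.0, "TT4": 30.0, "FTI": 60.0}
low_tsh = {"thyroid surgery": "f", "on thyroxine": "f", "TSH": 2.0, "TT4": 100.0, "FTI": 110.0}
print(classify(high_tsh), classify(low_tsh))
```

For the first hypothetical case, Rules 1, 2, and 3 all fire, and the two "primary" rules outweigh the single "compensated" rule under the assumed voting scheme.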

Error Measurement
  TP = True Positive
  FN = False Negative
  TN = True Negative
  FP = False Positive
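These four counts are the usual inputs to sensitivity and specificity, the two measures the discussion refers to. A minimal sketch using the standard definitions follows; the counts are hypothetical and are not the study's results.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive reports that the classifier flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative reports that the classifier clears: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical confusion counts, chosen only to illustrate the arithmetic.
tp, fn, tn, fp = 40, 10, 135, 15

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

A classifier that misses many positive findings has low sensitivity, while one that raises many false alarms has low specificity.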

Results

Discussion
  The automated method did not reach the level of the physicians.
  The training set contained a high level of noise.
  The training set was too small to properly train the system to detect positive findings.
  The training set classified with ICD9 codes was not accurate enough to create rules.
  Ambiguities caused C5.0 errors, or a lack of strong specificity.

Suggestion
  A large training set is needed to generate a sensitive classifier.
  An ontology should be incorporated into the clinical dictionary.
  The ICD9 coding needs to be modified.
  The discovered knowledge should be generalized knowledge.
  Other classifiers are worth trying: Bayesian belief networks, backpropagation neural networks, the sequential covering algorithm.