Published by Adelia Edwards. Modified over 9 years ago.
1
Knowledge Discovery and Data Mining to Assist Natural Language Understanding
Adam Wilcox, M.A., and George Hripcsak, M.D., Department of Medical Informatics, Columbia University, New York, NY, 1998
Presented by Chaveevan Pechsiri
2
Outline
- Objective
- Methodologies
- Results
- Discussion
- Suggestion
3
Objective
- Generate queries and rules
- Interpret the output from the MedLEE processor at Columbia-Presbyterian Medical Center
- Techniques:
  - NLP
  - Data mining: classification using C5.0
- Chest radiograph reports + clinic encounters
4
Methodologies
- NLP
  - Findings with modifiers
- Generate a vector report
  - Flattening = finding + modifier
  - Coding = flattening + modifier value
- Classification
  - The decision tree learner C5.0 (a successor to ID3 and C4.5)
5
NLP
Narrative report -> MedLEE processor -> findings with modifiers
MedLEE processing steps:
- Word and phrase recognition
- Standard term generation
- Classification of terms into semantic categories
- Parsing of sequences of semantic categories into structures
Resources used by the processor:
- Clinical dictionary
- Grammar rules
Dictionary examples:
- congestive heart failure, heart failure, CHF
- left pleural effusion……
- …….. new pleural effusion
6
NLP
Narrative report -> NLP (MedLEE) -> processor output
Narrative report:
"Probable mild pulmonary vascular congestion with new left pleural effusion, question mild congestive changes"
Processor output (3 findings with modifiers):
- Pulmonary vascular congestion (certainty: high; degree: low)
- Pleural effusion (region: left; status: new)
- Congestive change (certainty: moderate; degree: low)
7
Coding finding-modifier pairs
Processor output:
- Pulmonary vascular congestion (certainty: high; degree: low)
- Pleural effusion (region: left; status: new)
- Congestive change (certainty: moderate; degree: low)
Finding vector report:
  pulmonary vascular congestion = present
  pulmonary vascular congestion: certainty = high
  pulmonary vascular congestion: degree = low
  pleural effusion = present
  pleural effusion: region = left
  pleural effusion: status = new
  congestive change = present
  congestive change: certainty = moderate
  congestive change: degree = low
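The flattening and coding step on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `flatten_findings` and the list-of-pairs input layout are assumptions made for the example, while the findings and modifier values are the ones shown on the slide.

```python
def flatten_findings(findings):
    """Flatten (finding, modifiers) pairs into attribute = value entries.

    Hypothetical helper: each finding becomes "<finding> = present" and each
    modifier becomes "<finding>: <modifier> = <value>", as on the slide.
    """
    vector = {}
    for finding, modifiers in findings:
        vector[finding] = "present"
        for modifier, value in modifiers.items():
            vector[f"{finding}: {modifier}"] = value
    return vector


# Processor output for the example report (values taken from the slide).
processor_output = [
    ("pulmonary vascular congestion", {"certainty": "high", "degree": "low"}),
    ("pleural effusion", {"region": "left", "status": "new"}),
    ("congestive change", {"certainty": "moderate", "degree": "low"}),
]

report_vector = flatten_findings(processor_output)
for attribute, value in report_vector.items():
    print(f"{attribute} = {value}")
```

Each report then becomes one row of attribute values, which is the kind of input a classifier such as C5.0 expects.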
8
Diagnosing Hypothyroidism
C5.0 decision table

Attribute                   Assay 1    Assay 2               Assay 3                  .....
age                         32         63                    19
sex                         F          M                     M
on thyroxine                t          f                     f
query on thyroxine          f          f                     f
on antithyroid medication   f          f                     f
sick                        f          f                     f
pregnant                    t          N/A                   N/A
thyroid surgery             f          f                     f
I131 treatment              f          f                     f
query hypothyroid           f          f                     t
query hyperthyroid          t          f                     f
lithium                     f          f                     f
tumor                       f          f                     f
goitre                      f          f                     f
hypopituitary               f          f                     f
psych                       f          f                     f
TSH                         0.025      108                   9
T3                          3.7        .4                    2.2
TT4                         139        14                    117
T4U                         1.34       .98                   -
FTI                         104        14                    -
referral source             other      SVI                   other
diagnosis                   negative   primary hypothyroid   compensated hypothyroid
9
C5.0 If-then rules

Rule 1: (31, lift 42.7)
    thyroid surgery = f
    TSH > 6
    TT4 <= 37
    -> class primary [0.970]

Rule 2: (63/6, lift 39.3)
    TSH > 6
    FTI <= 65
    -> class primary [0.892]

Rule 3: (270/116, lift 10.3)
    TSH > 6
    -> class compensated [0.570]

Rule 4: (2225/2, lift 1.1)
    TSH <= 6
    -> class negative [0.999]

Rule 5: (296, lift 1.1)
    on thyroxine = t
    FTI > 65
    -> class negative [0.997]
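To make the rule statistics concrete, the sketch below encodes the five rules in Python. It assumes two things that the slide does not state: that the bracketed confidence is the Laplace ratio (n - m + 1) / (n + 2) for a rule covering n training cases with m errors, and that a case is classified by confidence-weighted voting over all matching rules. The names `laplace_confidence`, `RULES`, and `classify`, and the `default` class, are hypothetical.

```python
from collections import defaultdict


def laplace_confidence(covered, errors=0):
    """Laplace ratio (n - m + 1) / (n + 2); assumed to match the bracketed values."""
    return (covered - errors + 1) / (covered + 2)


# (condition, predicted class, cases covered, cases misclassified), from the slide.
RULES = [
    (lambda c: c["thyroid surgery"] == "f" and c["TSH"] > 6 and c["TT4"] <= 37,
     "primary", 31, 0),
    (lambda c: c["TSH"] > 6 and c["FTI"] <= 65, "primary", 63, 6),
    (lambda c: c["TSH"] > 6, "compensated", 270, 116),
    (lambda c: c["TSH"] <= 6, "negative", 2225, 2),
    (lambda c: c["on thyroxine"] == "t" and c["FTI"] > 65, "negative", 296, 0),
]


def classify(case, default="negative"):
    """Sum the confidence of every matching rule per class; the largest total wins.

    The default class for cases that match no rule is an assumption.
    """
    votes = defaultdict(float)
    for matches, label, covered, errors in RULES:
        if matches(case):
            votes[label] += laplace_confidence(covered, errors)
    return max(votes, key=votes.get) if votes else default


# Only the attributes used by the rules, taken from Assay 1 and Assay 2.
assay_1 = {"thyroid surgery": "f", "on thyroxine": "t", "TSH": 0.025, "TT4": 139, "FTI": 104}
assay_2 = {"thyroid surgery": "f", "on thyroxine": "f", "TSH": 108, "TT4": 14, "FTI": 14}

print(classify(assay_1), classify(assay_2))
```

Under these assumptions, Assay 2 matches Rules 1, 2, and 3, and the two "primary" votes outweigh the single "compensated" vote. Cases with missing values, such as the FTI of Assay 3, would need extra handling that this sketch omits.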
10
Error Measurement
- TP = True Positive
- FN = False Negative
- TN = True Negative
- FP = False Positive
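The slide lists only the four counts. A minimal sketch of the standard measures derived from them, sensitivity and specificity, follows; it assumes these are the measures the presentation evaluates, since the later slides discuss both. The counts in the usage example are made up purely for illustration and are not results from the study.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative cases that were rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Illustrative counts only (hypothetical, not taken from the presentation).
tp, fn, tn, fp = 8, 2, 85, 5
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```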
11
Results
13
Discussion
- The automated method did not reach the level of the physicians.
- The training set contained a high level of noise.
- The training set was too small to train the system properly to detect positive findings.
- The training set labeled with ICD9 codes was not accurate enough to create rules.
- Ambiguities caused C5.0 errors or a lack of strong specificity.
14
Suggestion
- A large training set is needed to generate a sensitive classifier.
- An ontology should be incorporated into the clinical dictionary.
- The ICD9 codes need to be modified.
- The discovered knowledge should be generalized knowledge.
- Try other classifiers: Bayesian belief networks, the backpropagation neural network, the sequential covering algorithm.