Knowledge Discovery and Data Mining to Assist Natural Language Understanding (Adam Wilcox, M.A., George Hripcsak, M.D. Department of Medical Informatics,


1 Knowledge Discovery and Data Mining to Assist Natural Language Understanding (Adam Wilcox, M.A., George Hripcsak, M.D., Department of Medical Informatics, Columbia University, New York, NY, 1998)
Presented by Chaveevan Pechsiri

2 Outline
- Objective
- Methodologies
- Results
- Discussion
- Suggestion

3 Objective
- Generate queries and rules
- Interpret the output from the MedLEE processor at Columbia-Presbyterian Medical Center
- Techniques:
  - NLP
  - Data mining: classification using C5.0
- Chest radiograph reports + clinic encounters

4 Methodologies
- NLP
  - Findings with modifiers
- Generate a vector report
  - Flattening = finding + modifier
  - Coding = flattening + modifier value
- Classification
  - The decision tree C5.0 (ID3)

5 NLP
Steps:
- Word and phrase recognition
- Standard term generation
- Classification of terms into semantic categories
- Parsing of sequences of semantic categories into structures
Diagram labels: Narrative report, MedLEE processor, Findings with modifiers, Clinical dictionary, Grammar rules dictionary
Example terms: congestive heart failure, heart failure, CHF; left pleural effusion ...; new pleural effusion

6 NLP
Narrative report:
"Probable mild pulmonary vascular congestion with new left pleural effusion, question mild congestive changes"
NLP (MedLEE)
Processor output (3 findings with modifiers):
- Pulmonary vascular congestion
    certainty: high
    degree: low
- Pleural effusion
    region: left
    status: new
- Congestive change
    certainty: moderate
    degree: low

7 Coding finding-modifier pair
Processor output:
- Pulmonary vascular congestion
    certainty: high
    degree: low
- Pleural effusion
    region: left
    status: new
- Congestive change
    certainty: moderate
    degree: low
Finding vector report:
  pulmonary vascular congestion = present
  pulmonary vascular congestion: certainty = high
  pulmonary vascular congestion: degree = low
  pleural effusion = present
  pleural effusion: region = left
  pleural effusion: status = new
  congestive change = present
  congestive change: certainty = moderate
  congestive change: degree = low
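
The coding step on this slide can be sketched in a few lines of Python. This is only an illustration of the finding-modifier flattening shown above, not the authors' implementation: the function name `flatten_findings` and the nested-dictionary input format are assumptions made for the example.

```python
def flatten_findings(findings):
    """Turn {finding: {modifier: value}} into a flat attribute -> value mapping.

    Hypothetical helper: each finding becomes "<finding> = present" and each
    modifier becomes "<finding>: <modifier> = <value>", as on the slide.
    """
    vector = {}
    for finding, modifiers in findings.items():
        vector[finding] = "present"
        for modifier, value in modifiers.items():
            vector[f"{finding}: {modifier}"] = value
    return vector


# Processor output from the slide, written as a nested dict (assumed format).
processor_output = {
    "pulmonary vascular congestion": {"certainty": "high", "degree": "low"},
    "pleural effusion": {"region": "left", "status": "new"},
    "congestive change": {"certainty": "moderate", "degree": "low"},
}

vector = flatten_findings(processor_output)
for attribute, value in vector.items():
    print(f"{attribute} = {value}")
```

Each report then becomes one row of attribute values, which is the kind of input a decision-tree learner expects.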

8 Diagnosing Hypothyroidism

Attribute                   Assay 1    Assay 2               Assay 3                   ...
age                         32         63                    19
sex                         F          M                     M
on thyroxine                t          f                     f
query on thyroxine          f          f                     f
on antithyroid medication   f          f                     f
sick                        f          f                     f
pregnant                    t          N/A                   N/A
thyroid surgery             f          f                     f
I131 treatment              f          f                     f
query hypothyroid           f          f                     t
query hyperthyroid          t          f                     f
lithium                     f          f                     f
tumor                       f          f                     f
goitre                      f          f                     f
hypopituitary               f          f                     f
psych                       f          f                     f
TSH                         0.025      108                   9
T3                          3.7        0.4                   2.2
TT4                         139        14                    117
T4U                         1.34       0.98                  -
FTI                         104        14                    -
referral source             other      SVI                   other
diagnosis                   negative   primary hypothyroid   compensated hypothyroid

C5.0 Decision table

9 C5.0 If-then rules

Rule 1: (31, lift 42.7)
    thyroid surgery = f
    TSH > 6
    TT4 <= 37
    -> class primary [0.970]

Rule 2: (63/6, lift 39.3)
    TSH > 6
    FTI <= 65
    -> class primary [0.892]

Rule 3: (270/116, lift 10.3)
    TSH > 6
    -> class compensated [0.570]

Rule 4: (2225/2, lift 1.1)
    TSH <= 6
    -> class negative [0.999]

Rule 5: (296, lift 1.1)
    on thyroxine = t
    FTI > 65
    -> class negative [0.997]
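
To see how such a ruleset reads in practice, the five rules can be applied to the three assay cases from slide 8. The following Python sketch is a simplification and not C5.0 itself: it assumes that when several rules match, the one with the highest confidence wins, that a condition on a missing value does not match, and that `negative` is the default class. The helper names (`gt`, `le`, `eq`, `classify`) are hypothetical.

```python
# Condition builders; a numeric condition fails when the value is missing (None).
def gt(attr, threshold):
    return lambda case: case.get(attr) is not None and case[attr] > threshold


def le(attr, threshold):
    return lambda case: case.get(attr) is not None and case[attr] <= threshold


def eq(attr, value):
    return lambda case: case.get(attr) == value


# (rule name, predicted class, confidence, conditions), copied from the slide.
RULES = [
    ("Rule 1", "primary", 0.970,
     [eq("thyroid surgery", "f"), gt("TSH", 6), le("TT4", 37)]),
    ("Rule 2", "primary", 0.892, [gt("TSH", 6), le("FTI", 65)]),
    ("Rule 3", "compensated", 0.570, [gt("TSH", 6)]),
    ("Rule 4", "negative", 0.999, [le("TSH", 6)]),
    ("Rule 5", "negative", 0.997, [eq("on thyroxine", "t"), gt("FTI", 65)]),
]


def classify(case, default="negative"):
    """Return the class of the highest-confidence matching rule (assumed policy)."""
    matches = [(confidence, predicted)
               for _, predicted, confidence, conditions in RULES
               if all(condition(case) for condition in conditions)]
    if not matches:
        return default  # assumed default class
    return max(matches, key=lambda match: match[0])[1]


# Only the attributes the rules use, taken from the slide 8 table.
assays = [
    {"thyroid surgery": "f", "on thyroxine": "t", "TSH": 0.025, "TT4": 139, "FTI": 104},
    {"thyroid surgery": "f", "on thyroxine": "f", "TSH": 108, "TT4": 14, "FTI": 14},
    {"thyroid surgery": "f", "on thyroxine": "f", "TSH": 9, "TT4": 117, "FTI": None},
]

predictions = [classify(assay) for assay in assays]
for number, predicted in enumerate(predictions, start=1):
    print(f"Assay {number}: {predicted}")
```

Under these assumptions the three cases are classified as negative, primary, and compensated, which agrees with the diagnosis row of the slide 8 table.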

10 Error Measurement
TP = True Positive
FN = False Negative
TN = True Negative
FP = False Positive
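
The measures built from these four counts did not survive in the transcript. Since the later slides speak of sensitivity and specificity, a minimal Python sketch of their conventional definitions may help. The function names and the counts in the usage lines are made up for illustration and are not results from the study.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases that were detected: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative cases that were rejected: TN / (TN + FP)."""
    if tn + fp == 0:
        raise ValueError("no negative cases")
    return tn / (tn + fp)


# Hypothetical counts, chosen only to show the calculation.
tp, fn, tn, fp = 8, 2, 85, 5
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```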

11 Results

12

13 Discussion
- The automated method did not reach the level of the physicians.
- High noise in the training set.
- The training set is too small to properly train the system to detect positive findings.
- The training set based on ICD9 codes was not accurate enough to create rules.
- Ambiguities cause C5.0 errors, or a lack of strong specificity.

14 Suggestion
- A larger training set is needed to generate a sensitive classifier.
- An ontology should be incorporated into the clinical dictionary.
- The ICD9 codes need to be modified.
- The discovered knowledge should be generalized knowledge.
- Try some other classifiers: Bayesian belief networks, the backpropagation neural network, the sequential covering algorithm.

