1 Information Extraction for Clinical Data Mining: A Mammography Case Study H. Nassif, R. Woods, E. Burnside, M. Ayvaci, J. Shavlik and D. Page University of Wisconsin – Madison, USA

2 The American Cancer Society, Cancer Facts & Figures 2009.

3 [Pipeline diagram] Mammogram, Radiologist, Impression (free text), Structured Database, Predictive Model, Benign / Malignant

4 Task Formulation Given: - Free text radiology report - Standard lexicon (BI-RADS) Do: - Extract lexicon concepts from text - Populate a structured database Why: - Automate information extraction - Manual extraction is labor intensive - Consistency checks

5 BI-RADS Lexicon Concepts

6 Example
“In the right breast, an approximately 1.0 cm mass is identified in the right upper slightly inner breast. This mass is noncalcified and partially obscured and lobulated in appearance.”
Concepts:
           Lobular Shape   Oval Shape   Obscured Margin   …
Report 1   0               1            0                 …
Report 2   1               0            1                 …
…          …               …            …                 …

8 Syntax Analyzer Tokenize sentences Discard punctuation Keep stop words Stem words
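The four steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not name a stemmer, so the suffix list and the `stem` function here are hypothetical stand-ins for whatever stemmer was actually used.

```python
import re

# Hypothetical suffix list for a crude stemmer (an assumption; the slides
# do not say which stemming algorithm the system uses).
SUFFIXES = ("ations", "ation", "ated", "ing", "ed", "es", "s")


def split_sentences(text):
    """Split on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def stem(word):
    """Strip the first listed suffix that leaves at least four characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Return one list of lower-cased, stemmed tokens per sentence.

    Punctuation is discarded by the token pattern; stop words are kept.
    """
    result = []
    for sentence in split_sentences(text):
        # Keep decimals such as "1.0" together; drop all other punctuation.
        tokens = re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", sentence.lower())
        result.append([stem(t) for t in tokens])
    return result
```

Stop words such as "is" and "in" survive this step, presumably because later stages rely on words like "no" and "not".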

10 Information from Lexicon
Lexicon specifies synonyms, e.g. Equal density, Isodense
Lexicon allows for ambiguous wording:
Text                       Concept
indistinct margin          indistinct margin
indistinct calcification   amorphous calcification
indistinct image           not a concept
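One way to picture the lexicon information is as a phrase-to-concept lookup. The sketch below is an assumption about representation only: `PHRASE_TO_CONCEPT` and `lookup_concepts` are hypothetical names, and the entries are limited to the examples on this slide.

```python
import re

# Phrase -> canonical concept. None marks a phrase that is explicitly not a
# concept. Entries come from the slide's examples only; the table layout is
# an illustrative assumption.
PHRASE_TO_CONCEPT = {
    "equal density": "equal density",
    "isodense": "equal density",
    "indistinct margin": "indistinct margin",
    "indistinct calcification": "amorphous calcification",
    "indistinct image": None,
}


def lookup_concepts(text):
    """Return the canonical concepts whose phrases occur in the text."""
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    found = []
    for phrase, concept in PHRASE_TO_CONCEPT.items():
        if concept is None:
            continue  # known non-concept wording, never extracted
        if re.search(rf"\b{re.escape(phrase)}\b", normalized) and concept not in found:
            found.append(concept)
    return found
```

Here the ambiguous word "indistinct" is resolved by the word that follows it, so the same lexicon word can map to different concepts or to none.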

12 Experts Provide domain-specific information – Synonyms: Oval, Ovoid – Acronyms, abbreviations – Domain idiosyncrasies Interact with and modify semantic rules

14 Concept Finder Context Free Grammar rules Extract concepts from text Rule formation: – Initial rules based on lexicon – Rules refined by experts

15 Rule Generation Example 1 Aim: Regional Distribution Concept Lexicon specifies the word “regional” Initial rule: presence of the word “regional” Run on training set, experts see results Many false positives: – “regional medical center”, “regional hospital” Rule refined by experts: – “regional.* !(medical|hospital)”
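The refinement step can be approximated with an ordinary regular expression. The slide's rule notation is not standard regex syntax, so the pattern below is one possible reading (an assumption): "regional" counts unless the very next word is "medical" or "hospital". The names `INITIAL_RULE`, `REFINED_RULE` and `has_regional_distribution` are illustrative.

```python
import re

# Initial rule: any occurrence of the word "regional".
INITIAL_RULE = re.compile(r"\bregional\b", re.IGNORECASE)

# Refined rule (assumed reading of the slide's notation): "regional" that is
# not immediately followed by "medical" or "hospital".
REFINED_RULE = re.compile(
    r"\bregional\b(?!\s+(?:medical|hospital)\b)", re.IGNORECASE
)


def has_regional_distribution(sentence, rule=REFINED_RULE):
    """True if the rule fires anywhere in the sentence."""
    return rule.search(sentence) is not None
```

With this reading, "referred from the regional medical center" fires the initial rule but not the refined one, while "calcifications in a regional distribution" fires both.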

16 Rule Generation Example 2 Aim: Skin Thickening Concept Lexicon specifies “skin thickening” Try “skin” and “thickening” in same sentence – “skin retraction and thickening” – “thickening of the overlying skin” – “A BB placed on the skin overlying a palpable focal area of thickening in the upper outer right breast” Experts suggest “skin” and “thickening” in close proximity

17 Scope Scope: distance between two words Start with a large scope: – assess number of true and false positives Move to smaller scopes: – assess number of false negatives Check precision and recall estimates Experts decide on the best distance
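A proximity rule with an adjustable scope might look like the sketch below. It assumes distance is measured as the difference between token positions within a sentence; the slides do not define the unit, and the scope values used in the usage note are illustrative rather than the ones the experts chose.

```python
import re


def tokens_of(sentence):
    """Lower-case alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def within_scope(sentence, first, second, scope):
    """True if `first` and `second` occur at most `scope` tokens apart,
    in either order."""
    tokens = tokens_of(sentence)
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= scope for i in first_pos for j in second_pos)


def sweep(sentences, first, second, scopes):
    """Count how many sentences match at each candidate scope, so that
    experts can compare large and small scopes side by side."""
    return {
        s: sum(within_scope(x, first, second, s) for x in sentences)
        for s in scopes
    }
```

On the three "skin ... thickening" sentences from Example 2, the word distances are 3, 4 and 7, so a scope of 4 keeps the first two and drops the "BB placed on the skin" sentence, which appears to be the false positive that motivated the proximity constraint.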

19 Negation Detector Negation triggers (Mutalik 01, Gindl 08): – “not”, if not preceded by “where” – “no” – “without” A trigger precedes or appears within the subsentence it negates Establish the negation scope, e.g. “without evidence of suspicious cluster of microcalcifications”

20 Negation Deactivation “there is no change in the rounded density” Negation-deactivation triggers: – Change – All – Correlation – Differ – Other
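The interplay of negation triggers and deactivation triggers can be sketched as below. Several simplifications are assumptions of this sketch, not statements from the slides: the trigger must precede the concept word in the same subsentence, the negation scope runs from the trigger up to the concept, and "change", "correlation" and "differ" are matched as prefixes while "all" and "other" are matched as whole words.

```python
import re

NEGATION_TRIGGERS = {"no", "not", "without"}
# Assumed matching policy for the slide's deactivation triggers.
DEACTIVATION_WORDS = {"all", "other"}
DEACTIVATION_PREFIXES = ("change", "correlation", "differ")


def deactivates(token):
    """True if the token is a negation-deactivation trigger."""
    return token in DEACTIVATION_WORDS or token.startswith(DEACTIVATION_PREFIXES)


def is_negated(subsentence, concept_word):
    """True if a live negation trigger precedes `concept_word`."""
    tokens = re.findall(r"[a-z]+", subsentence.lower())
    if concept_word not in tokens:
        return False
    concept_index = tokens.index(concept_word)
    for i, token in enumerate(tokens[:concept_index]):
        if token not in NEGATION_TRIGGERS:
            continue
        # "not" does not count when preceded by "where".
        if token == "not" and i > 0 and tokens[i - 1] == "where":
            continue
        # A deactivation trigger between the negation and the concept
        # cancels the negation ("no change in the rounded density").
        if any(deactivates(t) for t in tokens[i + 1 : concept_index]):
            continue
        return True
    return False
```

Under these assumptions, "without evidence of suspicious cluster of microcalcifications" negates "microcalcifications", whereas "there is no change in the rounded density" leaves "density" asserted.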

21 Multiple Latent Concepts Mammography reports: – Radiology concepts – Ultrasound concepts – MRI concepts… “round hypoechoic mass” – An ultrasound finding: the mammography concept should not be extracted Provide an ultrasound lexicon: – Algorithm handles multiple latent concepts

22 Experiment Training set: 146,198 reports, unlabeled Testing set: 100 reports, labeled by radiologist Algorithm and radiologist differ on 43 concept occurrences – Algorithm is correct on 28 of them
           Lobular Shape   Oval Shape   Obscured Margin   …
Report 1   0               1            0                 …
Report 2   1               0            1                 …
…          …               …            …                 …

23 Contingency Table on Test Set: Automated vs. Manual Feature Extraction. Counts are given as automated (manual).
                             Actual: concept present   Actual: concept absent
Predicted: concept present   211 (198)                 5 (5)
Predicted: concept absent    10 (23)                   4074 (4074)
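Precision and recall follow directly from the contingency table above. The sketch assumes the unparenthesized counts belong to the automated method and the parenthesized ones to the manual method, which is consistent with the 43 diverging cases reported on the Experiment slide (15 automated errors plus 28 manual errors). `precision_recall_f1` is an illustrative helper name.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts read off the contingency table (assumed: automated, then manual).
automated = precision_recall_f1(tp=211, fp=5, fn=10)
manual = precision_recall_f1(tp=198, fp=5, fn=23)

for name, (p, r, f) in (("automated", automated), ("manual", manual)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Under that reading, precision is nearly identical for the two methods (about 0.977 and 0.975), and the gap is in recall (about 0.955 automated against 0.896 manual).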

24 Statistics Ground truth? – Features that both methods agree on – Experts re-label diverging cases Probabilistic interpretation of contingency table (Goutte 05) Computational method is statistically superior to the manual method (p=0.024)

25 Conclusion Automated extraction that matches experts Novel contributions: – Negation-deactivation triggers – Handling multiple latent concepts Improves our current breast cancer classifier (work in progress)

