1
Extracting BI-RADS Features from Portuguese Clinical Texts
H. Nassif, F. Cunha, I.C. Moreira, R. Cruz-Correia, E. Sousa, D. Page, E. Burnside, and I. Dutra
University of Wisconsin – Madison, and University of Porto, Portugal
2
The American Cancer Society, Cancer Facts & Figures 2009.
3
[Diagram labels: Impression (free text); Mammogram; Radiologist; Structured Database; Predictive Model; Benign; Malignant]
4
BI-RADS Lexicon Concepts
5
Example
In the right breast, an approximately 1.0 cm mass is identified in the right upper slightly inner breast. This mass is noncalcified and partially obscured and lobulated in appearance.

Concepts
            Lobular Shape   Oval Shape   Obscured Margin   …
Report 1    0               1            0                 …
Report 2    1               0            1                 …
…           …               …            …                 …
6
Nassif 09
7
Syntax Analyzer
Tokenize sentences
Discard punctuation
Keep stop words
Stem words
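The four steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the suffix list in `SUFFIXES` is a hypothetical toy stand-in for a real Portuguese stemmer, and the function names are invented for this example. Stop words such as "sem" are deliberately kept, since they matter later for negation.

```python
import re

# Hypothetical toy suffix list (an assumption for illustration only);
# a real system would use a proper Portuguese stemming algorithm.
SUFFIXES = ("mente", "ados", "adas", "ado", "ada", "os", "as", "es", "o", "a", "s")

def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Lowercase and keep only word tokens, which discards punctuation."""
    return re.findall(r"\w+", sentence.lower())

def stem(word):
    """Strip the first matching suffix (in SUFFIXES order), keeping >= 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Return one list of stemmed tokens per sentence; stop words are kept."""
    return [[stem(tok) for tok in tokenize(s)] for s in split_sentences(text)]
```

For instance, `analyze("Massa lobulada. Sem espessamento.")` yields two token lists, with "sem" preserved in the second.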
8
Nassif 09
9
Information from Lexicon
Translate lexicon into Portuguese
Lexicon specifies synonyms:
– E.g.: Equal density, Isodense
Lexicon allows for ambiguous wording:

Text                        Concept
indistinct margin
indistinct calcification    amorphous calcification
indistinct image            not a concept
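One simple way to hold this kind of lexicon information is a phrase-to-concept table. The sketch below is an assumption-laden illustration: the table name, the choice of "equal density" as the canonical concept name, and the use of English phrases (rather than the translated Portuguese ones) are all hypothetical. It only includes rows whose mapping is stated on the slide.

```python
# Hypothetical phrase-to-concept table, in English for readability.
# A value of None marks wording that the lexicon does not treat as a concept.
PHRASE_TO_CONCEPT = {
    "equal density": "equal density",
    "isodense": "equal density",  # synonym pair given on the slide
    "indistinct calcification": "amorphous calcification",
    "indistinct image": None,
}

def lookup(phrase):
    """Return the concept for a phrase, or None if unknown or not a concept."""
    return PHRASE_TO_CONCEPT.get(phrase.lower().strip())
```

The point of the table is that the same word ("indistinct") resolves to different concepts, or to none, depending on the word it modifies.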
10
Nassif 09
11
Experts
Provide domain-specific information
– Synonyms: Oval, Ovoid
– Acronyms, abbreviations
– Domain idiosyncrasies
Interact with and modify semantic rules
12
Nassif 09
13
Concept Finder
Regular expression rules
Extract concepts from text
Rule formation:
– Initial rules based on lexicon
– Rules refined by experts
14
Rule Generation Example 1
Aim: Regional Distribution Concept
Lexicon specifies the word “regional”
Initial rule: presence of the word “regional”
Run on training set, experts see results
Many false positives:
– “regional medical center”, “regional hospital”
Rule refined by experts:
– “regional.* !(medical|hospital)”
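The refinement above can be approximated with a negative lookahead in Python's `re` module. This is one possible reading of the slide's rule notation, not the authors' actual rule syntax: it assumes the excluded word directly follows "regional", and it uses the English examples from the slide rather than Portuguese text. The names `INITIAL`, `REFINED` and `matches` are invented for this sketch.

```python
import re

# Initial rule: the bare word "regional" anywhere in the text.
INITIAL = re.compile(r"\bregional\b", re.IGNORECASE)

# Refined rule (assumed interpretation): "regional" that is NOT directly
# followed by the word "medical" or "hospital".
REFINED = re.compile(r"\bregional\b(?!\s+(?:medical|hospital)\b)", re.IGNORECASE)

def matches(rule, text):
    """True if the compiled rule fires anywhere in the text."""
    return rule.search(text) is not None
```

With these rules, "regional medical center" triggers `INITIAL` but not `REFINED`, while a phrase such as "calcifications in a regional distribution" triggers both.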
15
Rule Generation Example 2
Aim: Skin Thickening Concept
Lexicon specifies “skin thickening”
Try “skin” and “thickening” in same sentence
– “skin retraction and thickening”
– “thickening of the overlying skin”
– “A BB placed on the skin overlying a palpable focal area of thickening in the upper outer right breast”
Experts suggest “skin” and “thickening” in close proximity
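A proximity check of this kind can be sketched by counting the tokens between the two words. The helper names and the token-based distance measure are assumptions made for illustration; the slides do not say how distance was actually measured.

```python
import re

def word_distance(sentence, first, second):
    """Smallest gap, in tokens, between any occurrences of the two words.

    Returns None when either word is missing from the sentence.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)

def within_scope(sentence, first, second, scope):
    """True if both words occur and lie at most `scope` tokens apart."""
    d = word_distance(sentence, first, second)
    return d is not None and d <= scope
```

On the three sentences quoted above, the distances between "skin" and "thickening" are 3, 4 and 7 tokens respectively, so a small scope keeps the first two and drops the "BB placed on the skin" sentence.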
16
Scope
Scope: distance between two words
Start with a large scope:
– assess number of true and false positives
Move to smaller scopes:
– assess number of false negatives
Check precision and recall estimates
Experts decide on the best distance
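The scope-tuning loop can be illustrated by sweeping the distance threshold over a small labeled sample and reporting precision and recall at each setting. Everything here is a hypothetical sketch: the three sentences are the ones quoted for the skin thickening rule, the True/False labels are assumed for the example, and the function names are invented. In the talk, the final choice of distance is made by the experts, not by code.

```python
import re

def word_distance(sentence, first, second):
    """Smallest gap, in tokens, between the two words, or None if one is absent."""
    tokens = re.findall(r"\w+", sentence.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)

# Hypothetical labeled sample: (sentence, concept truly present?).
SAMPLE = [
    ("skin retraction and thickening", True),
    ("thickening of the overlying skin", True),
    ("A BB placed on the skin overlying a palpable focal area of thickening", False),
]

def evaluate(scope, sample=SAMPLE):
    """Return (precision, recall) of the proximity rule at a given scope."""
    tp = fp = fn = 0
    for sentence, present in sample:
        d = word_distance(sentence, "skin", "thickening")
        predicted = d is not None and d <= scope
        if predicted and present:
            tp += 1
        elif predicted and not present:
            fp += 1
        elif present:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

# Start with a large scope and move to smaller ones.
for scope in (10, 7, 4, 3):
    print(scope, evaluate(scope))
```

On this toy sample, large scopes admit the false positive, and a scope that is too small starts losing a true positive, which is the trade-off the experts weigh.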
17
Nassif 09
18
Negation Detector
Negation triggers (Mutalik 01, Gindl 08):
– “não” (not) when not preceded by “onde” (where)
– “sem” (without)
– “nem” (nor)
Precedes or appears within the subsentence
Establish negation scope
“without evidence of suspicious cluster of microcalcifications”
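A minimal sketch of trigger-based negation detection follows. It makes several simplifying assumptions that are not stated in the slides: "preceded by" is read as "immediately preceded by", a subsentence is taken to be the text between punctuation marks, and the negation scope is the whole subsentence containing a trigger. The names `TRIGGER`, `negated_subsentences` and `is_negated` are invented for this example.

```python
import re

# Triggers from the slide. The lookbehind skips "não" when it directly
# follows "onde" and one whitespace character (an assumed reading).
TRIGGER = re.compile(r"(?<!\bonde\s)\bnão\b|\bsem\b|\bnem\b", re.IGNORECASE)

def negated_subsentences(text):
    """Return the punctuation-delimited subsentences that contain a trigger."""
    parts = [p.strip() for p in re.split(r"[.,;:]", text)]
    return [p for p in parts if p and TRIGGER.search(p)]

def is_negated(text, concept_word):
    """True if `concept_word` appears inside a subsentence with a trigger."""
    word = concept_word.lower()
    return any(word in p.lower() for p in negated_subsentences(text))
```

For a report fragment such as "Mama direita com nódulo oval, sem microcalcificações suspeitas.", only the second subsentence is flagged, so a concept found there would be recorded as absent while the oval shape stays positive.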
19
Dataset
Training set: 1,129 reports, unlabeled
Testing set: 153 pairs, labeled by radiologist
– Basic screening report
– Detailed diagnostic report
Perform three refinement passes
– Double blind, based on lexicon
– Refine rules
– Refine manual labeling and rules
20
Results
21
Conclusion
Out of 48 disputed cases, parser correctly classified 25 (52.1%)
First Portuguese BI-RADS extractor
– Discovers features missed or misclassified
– Similar performance to manual annotation
Method portable to other languages