Information Extraction for Clinical Data Mining: A Mammography Case Study H. Nassif, R. Woods, E. Burnside, M. Ayvaci, J. Shavlik and D. Page University of Wisconsin – Madison, USA

The American Cancer Society, Cancer Facts & Figures 2009.

[Figure: workflow diagram with the labels Mammogram, Radiologist, Impression (free text), Structured Database, Predictive Model, Benign, Malignant]

Task Formulation
Given:
- Free text radiology report
- Standard lexicon (BI-RADS)
Do:
- Extract lexicon concepts from text
- Populate a structured database
Why:
- Automate information extraction
- Manual extraction is labor intensive
- Consistency checks

BI-RADS Lexicon Concepts

Example
In the right breast, an approximately 1.0 cm mass is identified in the right upper slightly inner breast. This mass is noncalcified and partially obscured and lobulated in appearance.

Concepts
            Lobular Shape   Oval Shape   Obscured Margin   …
Report 1    0               1            0                 …
Report 2    1               0            1                 …
…           …               …            …                 …

Syntax Analyzer
- Tokenize sentences
- Discard punctuation
- Keep stop words
- Stem words
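The four steps on this slide can be sketched in a few lines of Python. This is only an illustration: the slides do not say which tokenizer or stemmer was used, so the `SUFFIXES` list and the `stem` and `analyze` functions below are hypothetical stand-ins (a crude suffix stripper, not the Porter stemmer or any other published algorithm).

```python
import re

# Hypothetical suffix list: a crude stand-in for a real stemmer.
SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least four characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def analyze(report: str) -> list[list[str]]:
    """Split a report into sentences, then into lowercase stemmed tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    result = []
    for sentence in sentences:
        # Keep letters, digits and in-word hyphens; punctuation is discarded.
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
        # Stop words such as "no" and "not" are kept on purpose:
        # a later negation step depends on them.
        result.append([stem(t) for t in tokens])
    return [s for s in result if s]

print(analyze("This mass is noncalcified. No masses seen!"))
```

Keeping stop words is the unusual choice here: words like "no", "not" and "without" carry the negation information that later slides rely on.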

Information from Lexicon
Lexicon specifies synonyms:
- E.g.: Equal density, Isodense
Lexicon allows for ambiguous wording:
  Text                       Concept
  indistinct margin
  indistinct calcification   amorphous calcification
  indistinct image           not a concept
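A minimal sketch of how such lexicon information might be stored, assuming plain dictionaries. The names `SYNONYMS`, `CONTEXT_RULES`, `normalize` and `resolve` are hypothetical, and only the entries readable on the slide are included.

```python
from typing import Optional

# Synonym -> canonical lexicon term (one entry from the slide).
SYNONYMS = {"isodense": "equal density"}

# (modifier, head word) -> concept; None marks a phrase that is not a concept.
CONTEXT_RULES = {
    ("indistinct", "calcification"): "amorphous calcification",
    ("indistinct", "image"): None,
}

def normalize(phrase: str) -> str:
    """Map a synonym to its canonical term; other phrases pass through."""
    phrase = phrase.lower().strip()
    return SYNONYMS.get(phrase, phrase)

def resolve(modifier: str, head: str) -> Optional[str]:
    """Return the concept for a word pair, or None if there is none."""
    return CONTEXT_RULES.get((modifier.lower(), head.lower()))

print(normalize("Isodense"))
print(resolve("indistinct", "calcification"))
```

The point of keying on the word pair is that "indistinct" alone decides nothing: the neighbouring word selects the concept, or rules one out.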

Experts
Provide domain specific information:
- Synonyms: Oval, Ovoid
- Acronyms, abbreviations
- Domain idiosyncrasies
Interact with and modify semantic rules

Concept Finder
Context Free Grammar rules
Extract concepts from text
Rule formation:
- Initial rules based on lexicon
- Rules refined by experts

Rule Generation Example 1
Aim: Regional Distribution Concept
Lexicon specifies the word “regional”
Initial rule: presence of the word “regional”
Run on training set, experts see results
Many false positives:
- “regional medical center”, “regional hospital”
Rule refined by experts:
- “regional.* !(medical|hospital)”
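The refined rule on the slide is written in an informal notation. One possible reading, as a Python regular expression with a negative lookahead, is sketched below. It assumes the excluded word directly follows "regional"; the pattern and the function name `has_regional_distribution` are illustrative, not the authors' actual grammar rule.

```python
import re

# "regional" as a whole word, unless the next word is "medical" or "hospital".
REGIONAL = re.compile(
    r"\bregional\b(?!\s+(?:medical|hospital)\b)", re.IGNORECASE
)

def has_regional_distribution(sentence: str) -> bool:
    """True if the sentence mentions "regional" outside an institution name."""
    return REGIONAL.search(sentence) is not None

for text in (
    "Regional calcifications in the left breast.",
    "Prior films from the regional medical center.",
    "Referred by the regional hospital.",
):
    print(has_regional_distribution(text), text)
```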

Rule Generation Example 2
Aim: Skin Thickening Concept
Lexicon specifies “skin thickening”
Try “skin” and “thickening” in same sentence:
- “skin retraction and thickening”
- “thickening of the overlying skin”
- “A BB placed on the skin overlying a palpable focal area of thickening in the upper outer right breast”
Experts suggest “skin” and “thickening” in close proximity

Scope
Scope: distance between two words
Start with a large scope:
- assess number of true and false positives
Move to smaller scopes:
- assess number of false negatives
Check precision and recall estimates
Experts decide on the best distance
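The scope search can be sketched as a token-distance test followed by a precision and recall check at each candidate distance. The three phrases come from the skin thickening example; their True/False labels and the candidate scopes (10, 4, 3) are assumptions made for illustration, since the slides do not report the distance the experts chose.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def within_scope(tokens, first, second, scope):
    """True if the two words occur at most `scope` tokens apart, either order."""
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(a - b) <= scope for a in pos_a for b in pos_b)

# Phrases from the slides; the labels are assumed for this sketch.
SAMPLES = [
    ("skin retraction and thickening", True),
    ("thickening of the overlying skin", True),
    ("A BB placed on the skin overlying a palpable focal area of thickening "
     "in the upper outer right breast", False),
]

def precision_recall(samples, scope):
    """Score the proximity rule for skin thickening at a given scope."""
    tp = fp = fn = 0
    for text, is_concept in samples:
        predicted = within_scope(tokenize(text), "skin", "thickening", scope)
        if predicted and is_concept:
            tp += 1
        elif predicted:
            fp += 1
        elif is_concept:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for scope in (10, 4, 3):
    print(scope, precision_recall(SAMPLES, scope))
```

On these three phrases, a scope of 10 admits the BB sentence as a false positive, a scope of 4 separates the phrases cleanly, and a scope of 3 already misses the second phrase. That is the trade-off the experts weigh when choosing the distance.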

Negation Detector
Negation triggers (Mutalik 01, Gindl 08):
- “not”, if not preceded by “where”
- “no”
- “without”
Precedes or appears within the subsentence
Establish negation scope
“without evidence of suspicious cluster of microcalcifications”

Negation Deactivation
“there is no change in the rounded density”
Negation-deactivation triggers:
- Change
- All
- Correlation
- Differ
- Other
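A minimal sketch of negation detection with deactivation, under several simplifying assumptions that the slides do not confirm: subsentences are split on commas and semicolons, a trigger only negates words that follow it in the same subsentence, and a deactivation word must come directly after the trigger. The names `NEGATION_TRIGGERS`, `DEACTIVATORS` and `is_negated` are hypothetical.

```python
import re

NEGATION_TRIGGERS = {"no", "not", "without"}
# Word forms of the deactivation triggers: change, all, correlation, differ, other.
DEACTIVATORS = re.compile(r"chang\w*|all|correlat\w*|differ\w*|others?")

def is_negated(sentence: str, concept_word: str) -> bool:
    """True if an active negation trigger precedes the concept word."""
    concept_word = concept_word.lower()
    for clause in re.split(r"[,;]", sentence.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        if concept_word not in tokens:
            continue
        target = tokens.index(concept_word)
        for i, tok in enumerate(tokens[:target]):
            if tok not in NEGATION_TRIGGERS:
                continue
            if tok == "not" and i > 0 and tokens[i - 1] == "where":
                continue  # "where not" is not treated as a negation trigger
            following = tokens[i + 1] if i + 1 < len(tokens) else ""
            if DEACTIVATORS.fullmatch(following):
                continue  # "no change in ...": the finding is still present
            return True
    return False

print(is_negated(
    "without evidence of suspicious cluster of microcalcifications",
    "microcalcifications",
))
print(is_negated("there is no change in the rounded density", "density"))
```

In the second call the trigger "no" is followed by "change", so the negation is deactivated and the rounded density is still reported as present.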

Multiple Latent Concepts
Mammography reports:
- Radiology concepts
- Ultrasound concepts
- MRI concepts…
“round hypoechoic mass”
- Concept should not be extracted
Provide an ultrasound lexicon:
- Algorithm handles multiple latent concepts
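One way to picture this is a second lexicon that vetoes mammography extraction whenever an ultrasound term appears. The sketch below assumes a sentence-level veto and uses tiny illustrative term lists; `MAMMO_TERMS`, `ULTRASOUND_TERMS` and `extract_mammo_concepts` are hypothetical names, and the slides do not describe how the algorithm actually separates the modalities.

```python
import re

# Illustrative entries only; the real lexicons are much larger.
MAMMO_TERMS = {
    "round": "round shape",
    "oval": "oval shape",
    "lobular": "lobular shape",
}
ULTRASOUND_TERMS = {"hypoechoic", "hyperechoic", "anechoic"}

def extract_mammo_concepts(sentence: str) -> set[str]:
    """Return mammography concepts, unless the sentence describes ultrasound."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if ULTRASOUND_TERMS.intersection(tokens):
        return set()  # an ultrasound finding: extract no mammography concept
    return {MAMMO_TERMS[t] for t in tokens if t in MAMMO_TERMS}

print(extract_mammo_concepts("round hypoechoic mass"))
print(extract_mammo_concepts("There is a round mass."))
```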

Experiment
Training set: 146,198 reports, unlabeled
Testing set: 100 reports, labeled by radiologist
Algorithm differs from the radiologist's labels over 43 concept occurrences:
- Algorithm is correct in 28 of them

            Lobular Shape   Oval Shape   Obscured Margin   …
Report 1    0               1            0                 …
Report 2    1               0            1                 …
…           …               …            …                 …

Contingency Table on Test Set
Automated vs. Manual Feature Extraction (manual counts in parentheses)

                                Actual
                                Concept present   Concept absent
Predicted   Concept present     211 (198)         5 (5)
            Concept absent      10 (23)           4074 (4074)
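As a worked example, precision, recall and F1 can be computed directly from the counts in the table. The helper `scores` is a hypothetical name; the numbers are the ones on the slide, reading the first count in each cell as the automated method and the parenthesized count as the manual one.

```python
def scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from contingency-table counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

automated = scores(tp=211, fp=5, fn=10)
manual = scores(tp=198, fp=5, fn=23)

for name, (p, r, f) in (("automated", automated), ("manual", manual)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Both methods have precision near 0.98, so the gap between them is almost entirely recall: about 0.955 for the automated method against about 0.896 for manual extraction.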

Statistics
Ground truth?
- Features that both methods agree on
- Experts re-label diverging cases
Probabilistic interpretation of contingency table (Goutte 05)
Computational method is statistically superior to the manual method (p=0.024)
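To give a feel for what a probabilistic reading of a contingency table can look like, the sketch below treats each method's recall as a Beta-distributed quantity and estimates by sampling how often one exceeds the other. This is a simplified illustration and not the authors' analysis: the uniform Beta(1, 1) prior, the restriction to recall, the independence of the two methods and the function name `prob_recall_higher` are all assumptions, and the result is not expected to reproduce the reported p-value.

```python
import random

def prob_recall_higher(tp_a, fn_a, tp_b, fn_b, draws=20000, seed=0):
    """Estimate P(recall_a > recall_b) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Uniform prior: posterior for recall is Beta(TP + 1, FN + 1).
        recall_a = rng.betavariate(tp_a + 1, fn_a + 1)
        recall_b = rng.betavariate(tp_b + 1, fn_b + 1)
        wins += recall_a > recall_b
    return wins / draws

# Counts from the contingency table: automated (211, 10) vs. manual (198, 23).
print(prob_recall_higher(211, 10, 198, 23))
```

Because both methods were scored on the same 100 reports, treating them as independent is a rough approximation at best.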

Conclusion
Automated extraction that matches experts
Novel contributions:
- Negation-deactivation triggers
- Handling multiple latent concepts
Improves our current breast cancer classifier (work in progress)