

Copyright © 2009 Siemens Medical Solutions USA, Inc. All rights reserved.

Mining Medical Images

R. Bharat Rao, Glenn Fung, Balaji Krishnapuram, Jinbo Bi, Murat Dundar, Vikas Raykar, Shipeng Yu, Sriram Krishnan, Xiang Zhou, Arun Krishnan, Marcos Salganicoff, Luca Bogoni, Matthias Wolf, Anna Jerebko, Jonathan Stoeckel

Outline of the talk
Mining medical images
Computer aided diagnosis (CAD)
Key data mining challenges
Clinical impact
Lessons learnt
Several thousand units of the products described in this paper have been commercially deployed in hospitals around the world since 2004.

Medical Imaging
1895: X-rays used for broken bones and for locating foreign objects.
1970s: computed tomography (CT), 3-D imaging.
As resolution increased, in-vivo imaging became widely used to locate medical abnormalities for diagnosis and surgery planning.
(Images: digital mammogram; CT scan)

Mining medical imaging data
Increased resolution has resulted in data overload and increased total study time, and an increase in data does not always translate into improved diagnosis.
Goal: automatically extract the actionable information from the imaging data, improving patient care while simultaneously reducing total study time.
(Diagram: raw imaging data → knowledge-based data-mining algorithms → clinically relevant information, i.e. computer aided diagnosis/detection, CAD)

Computer-aided diagnosis/detection (CAD)
CAD is used as a second reader. It improves the detection performance of a radiologist and reduces mistakes related to misinterpretation.
The principal value of CAD is determined by carefully measuring its incremental value in normal clinical practice.
CAD technologies support the physician by drawing attention to structures in the image that may require further review.

Lung CAD
Identify suspicious regions called nodules (which can be precursors of cancer) in CT scans of the lung.

Colon PEV (Polyp Enhanced Viewer)
Identify suspicious regions called polyps in CT scans of the colon.

Mammo CAD
Identify abnormal masses/calcifications in digital mammograms.
PECAD and MammoCAD are only sold outside the US.

PE CAD
Pulmonary embolism (PE) is a sudden blockage in a pulmonary artery caused by an embolus that forms in one part of the body and travels through the bloodstream, via the heart, to the lungs.

CAD
Goal: detect potentially malignant nodules (lung), polyps (colon), and lesions (breast), as well as pulmonary emboli (lung), in medical images such as CT scans, X-rays, and MRI.
Early detection provides the best prognosis.

Typical CAD architecture
Image [X-ray | CT scan | MRI] → Candidate generation → Feature computation → Classification → Location of lesions
Candidate generation produces potential candidates at > 90% lesion sensitivity.
After classification: > 80% sensitivity at 2-5 FP/image.
Focus of the current talk: the stages after candidate generation.

Key Data Mining Challenges
High accuracy: 2-5 FP/image at sensitivity > 80%
1. The breakdown of assumptions
2. Highly unbalanced data
3. Feature computation cost
4. Incorporating domain knowledge
5. No objective ground truth

The breakdown of assumptions
(Figure: a region on a mammogram, with candidates marked "lesion" or "not a lesion")
Traditional classification algorithms (neural networks, support vector machines, logistic regression, ...) make two key assumptions that are often violated in CAD:
(1) training samples are independent;
(2) the goal is to maximize classification accuracy over all candidates.

Violation 1: Training examples are correlated
Candidate generation produces many spatially adjacent candidates, so there is a high level of correlation among candidates. Correlations also exist across different images, detector types, and hospitals.

Violation 2: Candidate-level accuracy is not important
Several candidates from candidate generation (CG) point to the same lesion in the breast. A lesion is detected if at least one of them is detected, so it is fine to miss adjacent overlapping candidates. Hence CAD system accuracy is measured in terms of per-lesion, per-image, or per-patient sensitivity.
Most algorithms maximize classification accuracy: they try to classify every candidate correctly. So why not optimize the performance metric we actually use to evaluate our system?
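To make the two evaluation views concrete, here is a minimal sketch that scores the same candidates both ways. It assumes each candidate has already been matched to a lesion (or to none) and carries a classifier score; the function names `lesion_metrics` and `candidate_accuracy`, the tuple layout, and the toy data are hypothetical. Lesions that candidate generation missed entirely are not counted in this sketch.

```python
def lesion_metrics(candidates, threshold, num_images):
    """Per-lesion sensitivity and false positives per image.

    candidates: list of (image_id, lesion_id, score) tuples, where lesion_id
    is None for candidates that do not overlap any radiologist mark.
    """
    all_lesions, detected = set(), set()
    false_positives = 0
    for image_id, lesion_id, score in candidates:
        if lesion_id is None:
            if score >= threshold:
                false_positives += 1
            continue
        key = (image_id, lesion_id)
        all_lesions.add(key)
        if score >= threshold:
            detected.add(key)  # one detected candidate is enough for the lesion
    sensitivity = len(detected) / len(all_lesions) if all_lesions else 0.0
    return sensitivity, false_positives / num_images


def candidate_accuracy(candidates, threshold):
    """Fraction of individual candidates classified correctly (the usual objective)."""
    correct = sum((score >= threshold) == (lesion_id is not None)
                  for _, lesion_id, score in candidates)
    return correct / len(candidates)


# Hypothetical toy data: two images, lesion A on img1 and lesion B on img2.
cands = [
    ("img1", "A", 0.9),   # detected candidate on lesion A
    ("img1", "A", 0.2),   # adjacent candidate on the same lesion, missed
    ("img1", None, 0.7),  # false positive
    ("img2", "B", 0.3),   # lesion B missed
    ("img2", None, 0.1),  # correctly rejected
]
print(lesion_metrics(cands, 0.5, num_images=2))  # (0.5, 0.5)
print(candidate_accuracy(cands, 0.5))            # 0.4
```

Missing the low-scoring second candidate on lesion A costs nothing in lesion sensitivity, yet it counts as an error under candidate-level accuracy.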

Solution 1: Multiple Instance Learning
Fung et al. 2006; Bi et al. 2007; Raykar et al. 2008; Krishnapuram et al. 2008
How do we acquire labels? Candidates that overlap with the radiologist's mark are positive; the rest are negative.
Single instance learning: classify every candidate correctly.
Multiple instance learning: the candidates overlapping one lesion form a positive bag; classify at least one candidate in each positive bag correctly.
We have modified SVM and logistic regression for multiple instance learning.
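A minimal sketch of one way logistic regression can be adapted to multiple instance learning, assuming a noisy-OR bag model: a positive bag is positive with probability 1 - Π(1 - p_i), so its loss is satisfied once any single candidate in the bag scores high, while each negative candidate is pushed toward 0. This is one common MIL formulation, not necessarily the exact one in the cited papers; the function names and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Probability that a single candidate x is a lesion."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_mil_logreg(pos_bags, neg_instances, dim, lr=0.1, epochs=500):
    """Gradient descent on the noisy-OR multiple-instance log-loss.

    pos_bags: list of bags; each bag holds the feature vectors of candidates
              overlapping one radiologist mark (at least one is a lesion).
    neg_instances: feature vectors of candidates known to be negative.
    """
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for bag in pos_bags:
            ps = [score(w, b, x) for x in bag]
            q = math.prod(1.0 - p for p in ps)  # P(no candidate in the bag is positive)
            q = min(q, 1.0 - 1e-12)             # guard against division by zero
            for x, p in zip(bag, ps):
                g = -q * p / (1.0 - q)          # d/dz_i of -log(1 - q)
                gw = [gj + g * xj for gj, xj in zip(gw, x)]
                gb += g
        for x in neg_instances:
            g = score(w, b, x)                  # d/dz of -log(1 - p)
            gw = [gj + g * xj for gj, xj in zip(gw, x)]
            gb += g
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


# Hypothetical 1-D toy data: each bag holds candidates on one lesion.
pos_bags = [[(2.0,), (0.5,)], [(1.8,), (-0.5,)]]
neg_instances = [(-2.0,), (-1.5,), (-1.0,)]
w, b = train_mil_logreg(pos_bags, neg_instances, dim=1)
```

The bag term only needs the best candidate in each bag to score high, which is how the "detect at least one candidate per lesion" objective enters training.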

Simple illustration
(Figure panels: single instance learning vs. multiple instance learning)
Single instance learning: reject as many negative candidates as possible; detect as many positives as possible.
Multiple instance learning: reject as many negative candidates as possible; detect at least one candidate in each positive bag. This accounts for correlation during training.

Solution 2: Batch Classification
Vural et al., 2009
Accounts for correlation during testing: the decision boundary is changed at test time.
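The idea of letting correlated test candidates influence one another can be illustrated with simple graph smoothing. This is a hypothetical stand-in, not the probabilistic model of Vural et al.: it assumes each candidate has a raw classifier score z and a spatial position, builds Gaussian affinities W between candidates, and solves (I + λL)f = z with L = D - W (the graph Laplacian) by Jacobi iteration. The parameter names `lam` and `sigma` are illustrative.

```python
import math


def smooth_scores(scores, positions, lam=1.0, sigma=1.0, iters=100):
    """Jointly re-score a batch of test candidates.

    Solves (I + lam * (D - W)) f = scores by Jacobi iteration, so nearby
    candidates pull each other's scores together while isolated ones keep theirs.
    """
    n = len(scores)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - c) ** 2 for a, c in zip(positions[i], positions[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))  # spatial affinity
    deg = [sum(row) for row in W]
    f = list(scores)
    for _ in range(iters):
        # Row i of (I + lam*D - lam*W) f = z, rearranged for f_i.
        f = [(scores[i] + lam * sum(W[i][j] * f[j] for j in range(n)))
             / (1 + lam * deg[i]) for i in range(n)]
    return f


# Hypothetical batch: two adjacent candidates and one far-away candidate.
scores = [2.0, -0.5, -1.0]
positions = [(0.0, 0.0), (0.5, 0.0), (100.0, 0.0)]
smoothed = smooth_scores(scores, positions)
```

The iteration converges because each row of the update has weights summing to less than one; the isolated third candidate keeps its original score.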

Skewed data and expensive features
1. Highly unbalanced class distribution (less than 1% are abnormal)
2. Huge number of experimentally engineered features
3. Many of them are irrelevant and redundant
4. Feature computation is expensive
5. Stringent run-time requirements
Solutions:
1. Feature selection / sparse classifiers
2. Cascaded classification architecture
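One standard way to obtain a sparse linear classifier, and hence a feature selection, is L1-regularized logistic regression trained by proximal gradient descent (ISTA). Features whose weight is driven exactly to zero never need to be computed at run time. The talk does not say which sparse method is used; this sketch, its function names, and its toy data are illustrative assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward zero, exactly zero if |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def l1_logreg(X, y, lam=0.1, lr=0.1, epochs=2000):
    """L1-regularized logistic regression via proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))
            for j in range(d):
                gw[j] += (p - t) * x[j] / n  # gradient of the mean log-loss
            gb += (p - t) / n
        # Gradient step on the smooth loss, then shrink; the bias is not penalized.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def selected_features(w):
    """Indices of features with non-zero weight, i.e. the ones worth computing."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Hypothetical data: feature 0 separates the classes, feature 1 carries no label information.
X = [(2, 1), (2, -1), (1, 1), (1, -1), (-1, 1), (-1, -1), (-2, 1), (-2, -1)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = l1_logreg(X, y, lam=0.1)
w_big, _ = l1_logreg(X, y, lam=10.0)
```

With `lam=0.1` only the informative feature survives; with `lam=10.0` the penalty removes every feature, which shows the sparsity/accuracy trade-off that `lam` controls.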

Cascaded classification architecture
Bi et al. 2006
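The cost argument behind a cascade can be sketched as follows. Cheap features are evaluated first, and a candidate rejected by an early stage never pays for the expensive features of later stages. This is a generic illustration of cascaded evaluation, not the training method of Bi et al.; the `Stage` fields, feature names, costs, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, float]


@dataclass
class Stage:
    name: str
    cost: float                          # hypothetical cost of this stage's features
    score: Callable[[Candidate], float]  # stage classifier
    threshold: float                     # reject if score < threshold


def run_cascade(candidates: List[Candidate],
                stages: List[Stage]) -> Tuple[List[Candidate], float]:
    """Return the candidates accepted by every stage and the total feature cost paid."""
    accepted, total_cost = [], 0.0
    for cand in candidates:
        for stage in stages:
            total_cost += stage.cost
            if stage.score(cand) < stage.threshold:
                break                    # early rejection: later stages are skipped
        else:
            accepted.append(cand)        # passed every stage
    return accepted, total_cost


stages = [
    Stage("intensity", 1.0, lambda c: c["intensity"], 0.5),  # cheap stage first
    Stage("shape", 10.0, lambda c: c["shape"], 0.7),         # expensive stage last
]
cands = [
    {"intensity": 0.9, "shape": 0.8},
    {"intensity": 0.2, "shape": 0.9},
    {"intensity": 0.6, "shape": 0.1},
]
accepted, cost = run_cascade(cands, stages)
print(len(accepted), cost)  # 1 23.0
```

Evaluating both stages on all three candidates would cost 33.0; the cascade pays 23.0 because the second candidate is rejected by the cheap stage.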

Novel AND-OR training of cascades
Dundar and Bi 2007

Incorporating domain knowledge
We know that lesions have different shapes, sizes, and appearances.

Gated classification architecture

Incorporating domain knowledge
Dundar et al.
Exploit different sub-classes of false positives.
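The polyhedral classifier listed among the solutions for domain knowledge can be illustrated by its decision rule alone. A candidate is labeled a lesion only if it lies on the positive side of every linear face, so each face can be dedicated to rejecting one sub-class of false positives. How the faces are trained jointly is not shown here; the helper names and face weights below are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Face = Tuple[Sequence[float], float]  # (weights w, bias b) of one half-space w.x + b >= 0


def rejecting_face(x: Sequence[float], faces: List[Face]) -> Optional[int]:
    """Index of the first face that rejects x, or None if x is inside the polyhedron."""
    for k, (w, b) in enumerate(faces):
        if sum(wi * xi for wi, xi in zip(w, x)) + b < 0:
            return k
    return None


def polyhedral_predict(x: Sequence[float], faces: List[Face]) -> bool:
    """Positive (lesion) only if every linear face accepts the candidate."""
    return rejecting_face(x, faces) is None


faces = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]  # hypothetical faces: x0 >= 0 and x1 >= 0
print(polyhedral_predict((1.0, 1.0), faces))    # True
print(rejecting_face((1.0, -1.0), faces))       # 1
```

Because each face is a separate linear test, the index returned by `rejecting_face` indicates which false-positive sub-class a rejected candidate was attributed to.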

Subjective ground truth
Raykar et al.
(Table: for each lesion ID, the labels assigned by radiologists 1 to 4; the true label is unknown.)
Each radiologist is asked to annotate whether a lesion is malignant (1) or not (0). In practice there is a substantial amount of disagreement, and we have no knowledge of the actual gold-standard ground truth. Getting absolute ground truth (e.g. by biopsy) can be expensive.
We have proposed an EM algorithm to simultaneously learn the ground truth and the classifier.
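A simplified sketch of the EM idea follows. It assumes every radiologist gives a binary label for every lesion, and it drops the classifier that is learned jointly in the approach above. Each radiologist is modeled by a sensitivity (alpha) and a specificity (beta). The E-step computes the posterior probability that each lesion is truly malignant, and the M-step re-estimates alpha, beta, and the prevalence from those posteriors. This is essentially the classical Dawid-Skene scheme; the function name and toy labels are hypothetical.

```python
def em_annotators(labels, iters=50, eps=1e-6):
    """Estimate true-label posteriors and per-annotator sensitivity/specificity.

    labels[i][j] is the 0/1 label that annotator j gave to item i.
    Returns (mu, alpha, beta): mu[i] = P(item i is positive).
    """
    n, r = len(labels), len(labels[0])
    mu = [sum(row) / r for row in labels]  # initialize with a soft majority vote
    for _ in range(iters):
        # M-step: re-estimate annotator parameters and prevalence from mu.
        pos = sum(mu)
        neg = n - pos
        alpha, beta = [], []
        for j in range(r):
            a = sum(mu[i] * labels[i][j] for i in range(n)) / max(pos, eps)
            s = sum((1 - mu[i]) * (1 - labels[i][j]) for i in range(n)) / max(neg, eps)
            alpha.append(min(max(a, eps), 1 - eps))  # sensitivity, clamped away from 0/1
            beta.append(min(max(s, eps), 1 - eps))   # specificity, clamped away from 0/1
        prior = min(max(pos / n, eps), 1 - eps)
        # E-step: posterior that each item is truly positive.
        mu = []
        for row in labels:
            like_pos, like_neg = prior, 1 - prior
            for j, lab in enumerate(row):
                like_pos *= alpha[j] if lab else 1 - alpha[j]
                like_neg *= (1 - beta[j]) if lab else beta[j]
            mu.append(like_pos / (like_pos + like_neg))
    return mu, alpha, beta


# Hypothetical labels from four radiologists; the fourth always answers 1.
labels = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
mu, alpha, beta = em_annotators(labels)
```

The radiologist who always answers 1 ends up with near-zero specificity, so that radiologist's labels carry almost no weight in the posterior.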

Key Data Mining Challenges
Challenge: Solutions
1. Training/testing data is correlated: multiple instance learning, batch classification
2. Evaluation metric is CAD-specific: multiple instance learning
3. Highly unbalanced data: cascaded classifiers
4. Feature computation cost: cascaded classifiers, feature selection methods
5. Incorporating domain knowledge: gated classifiers, polyhedral classifiers
6. No objective ground truth: EM algorithm

Clinical Impact
1. How much can a radiologist benefit from using the CAD software?
2. CAD is mostly deployed in second-reader mode.
3. Measure the improvement in a radiologist's performance with CAD.
4. Several independent clinical studies/trials have been conducted by our collaborators worldwide.

Lung CAD
1. FDA clinical validation study with 17 radiologists and 196 cases from 4 hospitals. Average reader AUC increased by … (p<0.001) because of CAD.
2. Recent study at NYU by Godoy et al.: the new prototype also helps detect different kinds of nodules.
Mean sensitivity without CAD, with CAD, and increase in sensitivity:
Solid nodules: 60% without CAD, 85% with CAD (increase 15%)
Part-solid nodules: 80% without CAD, 95% with CAD (increase 15%)
Ground glass opacities: 75% without CAD, 86% with CAD (increase 11%)
Sensitivity without CAD, with CAD, and increase in sensitivity:
Reader …: … without CAD, 66.0% with CAD (increase 9.8%)
Reader …: … without CAD, 89.8% with CAD (increase 10.6%)

Colon PEV
Colon PEV (Polyp Enhanced Viewer) was evaluated by Baker et al. in a study with seven less-experienced readers.
Average sensitivity without PEV: …; with PEV: …; a 9.8% increase in average sensitivity (p=0.0152).

PE CAD
Das et al. conducted a study with 43 patients to assess the sensitivity of detection of pulmonary embolism.
Reader 1: 87% without CAD, 98% with CAD (increase 11%)
Reader 2: 82% without CAD, 93% with CAD (increase 11%)
Reader 3: 77% without CAD, 92% with CAD (increase 15%)

Key data mining lessons
1. The true measure of impact is how much CAD helps the radiologists.
2. Design algorithms to optimize the metric you care about.
3. Carefully analyze the assumptions behind off-the-shelf data-mining algorithms. In CAD most of these assumptions break down, so new methods need to be designed.
4. Domain knowledge is very important. Collaboration with radiologists is crucial in eliciting it.

Conclusions
1. Radiologists have access to orders of magnitude more data for diagnosing various cancers.
2. It is difficult and time-consuming to identify the key clinical findings in this data.
3. We described the data-mining challenges in a commercially deployed CAD software.
4. Using CAD as a second reader improves radiologists' detection performance.
5. This is a key opportunity for data mining technologies to impact patient care worldwide.

Acknowledgements
Dr. D. Naidich, MD, of New York University
Dr. M. E. Baker, MD, of the Cleveland Clinic Foundation
Dr. M. Das, MD, of the University of Aachen
Dr. U. J. Schoepf, MD, of the Medical University of South Carolina
Dr. Peter Herzog, MD, of Klinikum Grosshadern, Munich
Alok Gupta, Ph.D., Ingo Schmuecking, MD, Harald Steck, Ph.D., Stefan Niculescu, Ph.D., Romer Rosales, Ph.D., Sangmin Park, Ph.D., Gerardo Valadez, Ph.D., Maleeha Qazi, and the entire SISL team.