1
CAD Panel Meeting General CAD Methods Nicholas Petrick, Ph.D.
Deputy Director, Division of Imaging and Applied Math, OSEL Radiological Devices Panel Meeting March 4, 2008
2
Outline What is CAD Basic components of CAD algorithms
Clinical implementations of CAD Evaluating CAD Algorithms Non-clinical testing Clinical testing Basic statistical tools
3
What is CAD?
4
What is CADe? CADe: Computer-aided detection devices Also termed CAD
Designed to identify findings (or regions) on an image that may be abnormal Prompting devices only
5
What is CADx? CADx: Computer-aided diagnosis Also termed CAD
Designed to process a specific finding (or region) to characterize the finding: likelihood of malignancy, recommended clinical action, or a description of the finding. Helps the physician determine what he/she is looking at. (Figure: example likelihood scores 0.26, 0.77, 0.27.)
6
What is CADx? CADx: Computer-aided diagnosis Also termed CAD
Designed to process a specific finding (or region) to characterize the finding: likelihood of malignancy, recommended clinical action, or a description of the finding. Helps the physician determine what he/she is looking at. (Figure: example assessment categories B1, B4, B2.)
7
What is CAD?
CAD encompasses many disciplines: statistics, pattern recognition, artificial intelligence, medicine, physics, biology, and image processing.
8
Basic Blocks in CADe Algorithms
Image processing; segmentation; features/feature selection; classification. Sequencing and block details differ between CADe algorithms. (Block diagram: Acquire Digital Data, Image Processing, Segmentation, Features & Feature Selection, Classification, Annotation.)
9
Acquire Digital Data
Digital data can come from: digitized film, or direct digital devices (FFDM, CT, and many others).
10
Image Processing
The image is enhanced or processed to facilitate analysis.
11
Segmentation
Identify boundaries or regions within the image: lesion candidates, organs.
12
Features & Feature Selection
Features characterize regions or pixels within a dataset: shape, texture, curvature, and so on. Feature selection is the process of selecting informative features. (Example features: F1 = area, F2 = perimeter.)
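The feature block above can be illustrated with the two named example features. A minimal sketch (not from the talk; the 4-connected definition of perimeter is an assumption, and real CAD feature extractors are far richer):

```python
import numpy as np

def region_features(mask):
    """Compute two simple shape features from a binary region mask:
    F1 = area (pixel count) and F2 = perimeter (count of region pixels
    with at least one 4-connected background neighbor)."""
    mask = mask.astype(bool)
    area = int(mask.sum())
    # Pad so border pixels have an implicit background neighbor.
    padded = np.pad(mask, 1, constant_values=False)
    boundary = mask & ~(
        padded[:-2, 1:-1] & padded[2:, 1:-1] &
        padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    perimeter = int(boundary.sum())
    return area, perimeter

# Example: a solid 3x3 square inside a 5x5 image.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
print(region_features(mask))  # (9, 8): 9 pixels, 8 of them on the boundary
```

Feature selection would then keep only the features that carry discriminating information for the classifier that follows.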
13
Classification
Features are input to a learning algorithm and combined into an output score. Classifier types include multiple thresholds, LDA, and neural networks. The training/test paradigm is critical. (Diagram: features F1 through FN, trained learning machine, object score.)
14
Classification (continued)
Same blocks as above, with a threshold applied to the object scores.
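The LDA classifier named on the slide, followed by a threshold on the object scores, can be sketched in a few lines. This is a toy illustration (synthetic two-feature data, an equal-weight pooled covariance, and a midpoint threshold are all assumptions, not anything from the talk):

```python
import numpy as np

def fit_lda(x_pos, x_neg):
    """Fisher LDA sketch: w = pooled_covariance^{-1} (mu_pos - mu_neg);
    the object score is the projection w . x."""
    mu_p, mu_n = x_pos.mean(axis=0), x_neg.mean(axis=0)
    pooled = 0.5 * (np.cov(x_pos.T) + np.cov(x_neg.T))
    w = np.linalg.solve(pooled, mu_p - mu_n)
    return lambda x: float(np.dot(w, x))

rng = np.random.default_rng(0)
lesions = rng.normal([3.0, 2.0], 1.0, size=(50, 2))  # candidate features (F1, F2)
normals = rng.normal([1.0, 1.0], 1.0, size=(50, 2))
score = fit_lda(lesions, normals)

s_les = np.array([score(x) for x in lesions])
s_nor = np.array([score(x) for x in normals])
threshold = 0.5 * (s_les.mean() + s_nor.mean())  # illustrative threshold choice
print("fraction of lesions marked:", (s_les > threshold).mean())
print("fraction of normals marked:", (s_nor > threshold).mean())
```

Moving the threshold trades sensitivity against false marks, which is exactly the operating-point choice the later slides discuss.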
15
Annotation
CADe annotations are prompts of potential abnormalities.
16
Basic Blocks in a CADx Algorithm
Characterization of a finding. The basic blocks are similar: image processing, features/feature selection, classification. Sequencing and block details differ between CADx algorithms. (Block diagram: Identified Region, Image Processing, Features & Feature Selection, Classification, Annotation.)
17
Basic Blocks in a CADx Algorithm (continued)
Same blocks as above. (Figure: example output scores 0.79, 0.91, 0.66.)
18
Training CADs
A process for systematically improving performance on a set of data known as the training set. Objectives: maximize sensitivity; maximize area under the ROC curve. Training can be performed by computer (regression or optimization techniques) or by humans (tweaking parameters or combinations of parameters). The algorithm is fixed after training.
19
Training CADs: Learning Curve
Training (learning) is a dynamic process: increasing the training data increases performance and decreases variability. (Figure: learning curve of ROC area versus number of patients per class, for a 3-feature linear classifier.)
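A learning curve like the one in the figure can be simulated: train on increasing numbers of patients per class and measure test AUC each time. A rough sketch under assumed conditions (Gaussian synthetic data, a deliberately simple difference-of-means classifier, and a Mann-Whitney AUC estimate; none of this is the device or data from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(pos, neg):
    """Mann-Whitney estimate of ROC area: P(pos score > neg score)."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return float((pos[:, None] > neg[None, :]).mean())

def auc_for_training_size(n_train):
    # Train a simple classifier (difference-of-means weights) on n_train
    # patients per class; score a fixed-size independent test set.
    tr_pos = rng.normal(1.0, 1.0, size=(n_train, 3))
    tr_neg = rng.normal(0.0, 1.0, size=(n_train, 3))
    w = tr_pos.mean(axis=0) - tr_neg.mean(axis=0)
    te_pos = rng.normal(1.0, 1.0, size=(500, 3))
    te_neg = rng.normal(0.0, 1.0, size=(500, 3))
    return auc(te_pos @ w, te_neg @ w)

# Learning curve: test AUC as the number of training patients grows.
for n in (5, 20, 100):
    print(n, round(auc_for_training_size(n), 3))
```

With more training patients, the weight estimates stabilize and the test AUC climbs toward its asymptote with less run-to-run variability, which is the behavior the slide's learning curve depicts.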
20
Clinical Use of CAD
21
CAD Reading Paradigms First reader: the physician reviews only regions or findings marked by the CAD device; unmarked regions are not necessarily evaluated by the physician. No radiological CAD device has been approved/cleared for this mode. Discussion questions: M6, C7, L6
22
CAD Reading Paradigms Second reader
The physician first conducts a complete interpretation without CAD (unaided read), then re-conducts the interpretation with the CAD device (aided read). Also termed "second detector" or "sequential reader." Examples: mammography CADs, some lung CADs.
23
CAD Reading Paradigms Concurrent read
The physician performs a complete interpretation in the presence of CAD marks; the marks are available at any time. Example: some colon CAD devices are potentially used in this way.
24
CAD Factors Influencing Clinical Use
Physical characteristics of marks: physicians may respond differently to different types of marks.* CAD standalone performance and number of CAD marks: knowledge of the Se and FP rate may affect user confidence in, or attention to, CAD marks. Change in interpretation. Change in reading time: may increase review time, or maintain/decrease review time. *E.A. Krupinski et al., "A perceptually based method for enhancing pulmonary nodule recognition," Investigative Radiology 28(4):289, 1993.
25
Evaluating CAD Algorithms Non-clinical Evaluation
26
Non-Clinical Evaluation
Device & algorithm descriptions Stability analysis
27
Non-Clinical Evaluation
Algorithm description Different CAD devices contain different processing; devices are easier to assess and compare if they are not "black boxes." To understand a CAD, the following information is needed: patients targeted by the device; device usage (e.g., reading mode); image processing, segmentation, etc.; features, classifiers, etc.; training and training data. Discussion question: G1
28
Algorithm Stability A stable algorithm shows similar performance with changes in the algorithm, features, training, or training databases. Stability increases as the number of training cases increases, the number of initial features decreases, and the complexity of the CAD decreases. Discussion question: G1
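One simple way to probe the stability described above is to redraw the training set many times, retrain, and look at the spread of test AUC. A sketch under assumed conditions (synthetic Gaussian data and a toy difference-of-means classifier; the slide does not prescribe any particular experiment):

```python
import numpy as np

rng = np.random.default_rng(2)
N_FEAT = 5

def auc(pos, neg):
    """Mann-Whitney AUC estimate."""
    return float((pos[:, None] > neg[None, :]).mean())

# One fixed, independent test set.
te_pos = rng.normal(0.8, 1.0, size=(300, N_FEAT))
te_neg = rng.normal(0.0, 1.0, size=(300, N_FEAT))

def retrained_auc(n_train):
    """Redraw the training cases, retrain (difference-of-means weights),
    and return the test AUC: one repetition of a stability experiment."""
    tr_pos = rng.normal(0.8, 1.0, size=(n_train, N_FEAT))
    tr_neg = rng.normal(0.0, 1.0, size=(n_train, N_FEAT))
    w = tr_pos.mean(axis=0) - tr_neg.mean(axis=0)
    return auc(te_pos @ w, te_neg @ w)

# The spread of test AUC across retrainings is one measure of stability;
# it should shrink as the number of training cases grows.
for n_train in (10, 100):
    aucs = [retrained_auc(n_train) for _ in range(20)]
    print(n_train, round(float(np.mean(aucs)), 3), round(float(np.std(aucs)), 3))
```

A tight spread corresponds to the "more stable" training confidence intervals on the next slides; a wide spread to the "less stable" case.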
29
Why Stability Analysis?
Indicates whether performance is due to a fortuitous training/test set, and whether algorithm updates produce evolving performance. (Figure: a more stable algorithm, with tight training confidence intervals. Example only: not an actual device.)
30
Why Stability Analysis?
Indicates whether performance is due to a fortuitous training/test set, and whether algorithm updates produce evolving performance. (Figure: a less stable algorithm, with wide training confidence intervals. Example only: not an actual device.)
31
Evaluating CAD Algorithms Clinical Testing
32
Hierarchical Model of Efficacy*
Level 1 Technical efficacy Physical & bench tests Level 2 Diagnostic accuracy Se/Sp, ROC curve, etc Level 3 Diagnostic thinking Effect on clinicians’ estimates of diagnostic probabilities, pretest to posttest Level 4 Therapeutic efficacy Effect on therapeutic management Level 5 Patient outcome Value in terms of quality-adjusted life years (QALYs), etc. Level 6 Societal efficacy Overall societal benefit *Fryback, Thornbury, “The efficacy of diagnostic imaging,” Med Decis Making 11:88–94, 1991.
33
Hierarchical Model of Efficacy (continued)
These are the levels on which imaging technology sponsors generally focus when going through FDA; sponsors and FDA are not constrained to these levels.
34
Classes of Tests
Standalone performance testing: performance of the device by itself (the intrinsic functionality of the device). Reader performance testing: performance of physicians using the device (the impact on physician performance).
35
Standalone Testing Performance of the device by itself. (Flowchart: Acquire Test Dataset; Apply CADe Device; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.) Discussion questions: M1, C3, L2
36
Test Dataset Clinical images used to determine safety and effectiveness of a CAD Different from set used to train/develop or validate CAD Represents target population & target disease condition Usually includes clinically relevant spectrum of patients, imaging hardware & protocols Discussion questions: M1, M4, C3, C6, L2, L5
37
Acquiring Test Dataset
Field test accrual Collection during real-time clinical interpretation Enrichment accrual Enrichment for low prevalence of disease Enrich with disease cases at a higher proportion than in population Enrichment for stress testing Enrich with cases containing challenging findings Stress testing usually includes a comparison modality
38
Reuse of Test Data
Ideal testing paradigm: develop the CAD algorithm; collect testing cases; apply the CAD; report standalone and/or reader performance results. Discussion question: G2
39
Reuse of Test Data Sponsor may want to compare performance of revised algorithm with same or expanded version of test cases Developer may have gained knowledge (learned) by knowing performance of original CAD on test data For larger datasets and minimal feedback, knowledge gain may be quite small May be possible to reuse test data under appropriate constraints to streamline assessment What may be appropriate constraints to balance data integrity & data collection? Discussion question: G2
40
Standalone Testing Performance of the device by itself
(Flowchart: Acquire Test Dataset; Apply CADe Device; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.)
41
Ground Truth Ground truthing includes: Whether or not disease is present (patient level) Location and/or extent of the disease (lesion level) Types of ground truthing Cancerous lesions Biopsy/pathology (Follow-up imaging for normals) Non-cancerous lesions Expert panel reviews all available clinical information May be others Discussion questions: C2, L1
42
Ground Truth by Expert Panel
Experts almost always required to determine lesion locations May also determine if abnormality is present Experts are susceptible to reader variability Multiple readers allow measure of truth variability
43
Ground Truthing: Mammography
Patient-level Pathology verified cancer in left breast
44
Ground Truthing: Mammography
Lesion-level: the radiologist identifies the region of the lesion. (Figure: clinician-identified ROI.)
45
Ground Truthing: Mammography
Lesion-level Radiologist segments region
46
Standalone Testing Performance of the device by itself
(Flowchart: Acquire Test Dataset; Apply CADe Device; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.)
47
Scoring Rules and Methods
Truth segmentation: used to determine whether the CAD marks a true lesion, based on the overlap between the CAD and truth segmentations. (Figure: CAD segmentation versus truth segmentation.) Discussion questions: M1, C3, L2
48
Scoring Rules and Methods
Truth centroid: used to determine whether the CAD marks a true lesion, based on the distance between the CAD and truth centroids. (Figure: CAD and truth centroids, example distance = 2.1 mm.) Scoring may also be performed by a physician.
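The two scoring rules above, overlap and centroid distance, can each be reduced to a few lines of array code. A minimal sketch on toy binary masks (the particular overlap definition, pixel spacing, and hit criteria are illustrative assumptions, not rules from the talk):

```python
import numpy as np

def overlap_score(cad_mask, truth_mask):
    """Fraction of the truth region covered by the CAD mark."""
    inter = np.logical_and(cad_mask, truth_mask).sum()
    return float(inter / truth_mask.sum())

def centroid_distance(cad_mask, truth_mask, pixel_mm=1.0):
    """Euclidean distance between the CAD and truth centroids, in mm."""
    c = np.argwhere(cad_mask).mean(axis=0)
    t = np.argwhere(truth_mask).mean(axis=0)
    return float(np.linalg.norm(c - t) * pixel_mm)

# Toy masks: a 4x4 truth lesion and a CAD mark shifted by one pixel.
truth = np.zeros((10, 10), dtype=bool); truth[2:6, 2:6] = True
cad = np.zeros((10, 10), dtype=bool); cad[3:7, 3:7] = True
print(overlap_score(cad, truth))      # 9/16 of the truth area is covered
print(centroid_distance(cad, truth))  # centroids are sqrt(2) pixels apart
# A mark counts as a hit only if it passes the chosen rule, e.g.:
hit = overlap_score(cad, truth) >= 0.3
```

Because different cutoffs (overlap fraction, distance in mm) classify the same marks differently, the scoring rule must be fixed before the standalone performance numbers are computed.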
49
Standalone Performance Measures
Lesion-based sensitivity and number of FPs per image (or per scan), [Se, FPs/Image]; Free-Response Receiver Operating Characteristic (FROC) curve. (Figure: FROC curve, TPF/sensitivity versus number of FPs per image.)
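A lesion-based FROC curve like the one in the figure can be traced by sweeping a threshold over the candidate-mark scores. A minimal sketch on toy data (the four marks, their scores, and the scoring outcome are all invented for illustration):

```python
def froc_points(scores, is_tp, lesion_id, n_lesions, n_images):
    """Sweep a threshold over candidate-mark scores. At each threshold:
    lesion sensitivity = fraction of true lesions hit by at least one
    retained TP mark; FP rate = retained false marks per image."""
    pts = []
    for t in sorted(set(scores), reverse=True):
        keep = [i for i, s in enumerate(scores) if s >= t]
        hit = {lesion_id[i] for i in keep if is_tp[i]}
        fps = sum(1 for i in keep if not is_tp[i])
        pts.append((fps / n_images, len(hit) / n_lesions))
    return pts

# Toy example: 4 candidate marks across 2 images containing 2 true lesions.
scores = [0.9, 0.8, 0.6, 0.4]
is_tp = [True, False, True, False]
lesion_id = [0, None, 1, None]   # which lesion each TP mark points at
print(froc_points(scores, is_tp, lesion_id, n_lesions=2, n_images=2))
# [(0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Note the x-axis (FPs per image) is unbounded, unlike an ROC's false positive fraction, which is why FROC summary measures need a chosen FP range (see the later FROC metrics slide).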
50
Evaluating CAD Algorithms Clinical Testing Reader Performance Testing
51
Reader Performance Testing
Performance of physicians using the device. (Flowchart: Acquire Test Dataset; Identify Study Readers; Apply CADe; Read with CADe; Read without CADe; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.) Discussion questions: M2, C4, L3
52
Reader Selection Readers are generally selected to be representative of intended users: representative of the clinicians who will use the device, and of the proper clinician experience level. Reader performance testing depends on proper understanding and use of the CAD device, and on proper understanding and implementation of the study protocol. Training of readers is key to achieving both.
53
Designing Reader Studies
Common endpoints Common CAD study designs
54
Evaluating CAD Algorithms Clinical Testing Study Endpoints
55
Study Endpoints Patient analysis [Sensitivity, Specificity] ROC analysis Location–specific analysis Location-specific ROC Free-response ROC (FROC) Discussion questions: M2, C4, L3
56
Patient Endpoints
Assessing CADx and CADe without accounting for location. (Figure: identified region; CADx aid POM 0.91, clinician POM 0.95.)
57
Patient-Based Endpoints
Patient analysis does not account for localizing the lesion. Endpoints: binary decision (single threshold), the [Sensitivity, Specificity] ([Se, Sp]) operating point; rating/ranking (range of thresholds), the receiver operating characteristic (ROC) curve.
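The [Se, Sp] operating point for the binary-decision endpoint is just two ratios over the confusion counts. A minimal sketch on invented toy data:

```python
def operating_point(decisions, truth):
    """[Se, Sp] at a single decision threshold.
    decisions[i] / truth[i]: True means called diseased / actually diseased."""
    tp = sum(d and t for d, t in zip(decisions, truth))
    tn = sum((not d) and (not t) for d, t in zip(decisions, truth))
    n_pos = sum(truth)
    se = tp / n_pos
    sp = tn / (len(truth) - n_pos)
    return se, sp

# Toy data: 3 diseased and 5 non-diseased patients.
truth = [True, True, True, False, False, False, False, False]
decisions = [True, True, False, True, False, False, False, False]
se, sp = operating_point(decisions, truth)
print(se, sp)  # Se = 2/3, Sp = 4/5
```

A single [Se, Sp] pair describes one threshold only; the rating/ranking endpoint generalizes this to the full ROC curve on the following slides.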
58
Se/Sp Operating Point
[Se, Sp] operating point. Comparing without/with CAD: often higher Se, lower Sp. Many other possible endpoints. (Figure: TPF = sensitivity versus FPF = 1 - specificity, with reader alone and reader+CAD operating points.)
59
ROC Assessment
(Figure: distributions of computer scores for non-diseased and diseased cases.)
60
Single Operating Point
A single threshold on the computer scores yields a single operating point: FPF (1 - specificity) and TPF (sensitivity). (Figure: non-diseased and diseased score distributions with a single threshold.)
61
Entire ROC Curve
Sweeping the threshold over its full range traces out the entire ROC curve: TPF (sensitivity) versus FPF (1 - specificity). (Figure: non-diseased and diseased score distributions with the threshold range.)
62
ROC Analysis
Comparing without/with CAD: often higher Se, lower Sp. ROC analysis can facilitate the comparison. It requires ordering cases from least to most suspicious; ratings are often used to facilitate the ordering. (Figure: ROC curves for reader alone and reader+CAD.)
63
ROC Analysis
Performance metrics: ROC area (AUC), the average TPF across all possible FPFs; and partial area under the curve (PAUC). A challenge is linking AUC measures to clinical relevance. (Figure: two ROC curves with areas AUC1 and AUC2.)
64
Location-Specific Endpoints
Assessing CADe: location is important, and there may be multiple prompts on the same image. The truthing rule is now a critical component. (Figure: CADe device output.)
65
Location-Specific ROC
ROC analysis that requires correct location of the lesion One scored location per patient Location must be on the lesion
66
Location-Based Operating Points
[Se, No. FPs] operating point. Comparing without/with CAD: often higher Se along with more FPs. Many other possible endpoints. (Figure: TPF/sensitivity versus number of FPs per image, with operating points.)
67
Free-Response ROC
All [Se, No. FPs] combinations across all thresholds. (Figure: FROC curve, TPF/sensitivity versus number of FPs per image.)
68
FROC Performance Metrics
Area under the FROC curve (requires choosing an FP range); area under the alternative FROC (AFROC). Challenges: linking the measures to clinical relevance, and statistical methodology. (Figure: FROC curve, TPF/sensitivity versus number of FPs per image.)
69
Evaluating CAD Algorithms Clinical Testing Reader Study Designs
70
Reader Performance Study Designs
Prospective studies Retrospective studies Some CAD study designs Warren-Burhenne MRMC Discussion questions: M4, C6, L5
71
Prospective Reader Studies
CAD performance measured as part of actual clinical practice Field testing of CAD devices
72
Retrospective Reader Studies
Cases are collected prior to image interpretation Typically enriched or stress test dataset used Read offline by one or more readers under specific reading conditions CAD Examples Mammography CAD devices Lung nodule CAD devices
73
Warren-Burhenne Study Design*
Two separate studies: (1) a retrospective study of CAD Se in detecting abnormalities "missed" in clinical practice, estimating the relative reduction in the false negative (FN) rate with CAD; (2) a prospective study of the work-up rate of readers with and without CAD in clinical practice, where the difference in work-up rate is attributed to the use of CAD. This was the study design in early mammography CAD approvals. *Warren Burhenne et al., Radiology 215, 2000.
74
Warren-Burhenne Study Design
The fundamental limitation is that the reduction in FN rate and the increase in work-up rate are not evaluated in the same study. The study design can be difficult to interpret statistically; its goal is to estimate the "potential" effect on the FN rate.
75
Multiple Reader Multiple Case (MRMC) Study Design
Study where a set of readers interpret a set of patient images, in each of two competing reading conditions With and without CAD Could be either prospective or retrospective Fully-crossed design All readers read all of the cases in both modalities Most statistical power for given number of cases Hybrid designs are also evaluable
76
MRMC Study Design Advantages
Generalizes to new readers and cases: cases are random effects, and readers are random effects. Greater statistical power for a given number of cases. MRMC studies can accommodate [Se, Sp], ROC, and FROC endpoints, and are generally statistically interpretable.
77
Patient-Based MRMC Analysis
Includes [Se, Sp] or ROC endpoints. Well-established methodologies and tools: jackknife/ANOVA (Dorfman, Berbaum, and Metz); ANOVA and correlation model (Obuchowski); ordinal regression (Toledano and Gatsonis); bootstrap (Beiden, Wagner, and Campbell); one-shot estimate (Gallas).
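The jackknife ingredient of the Dorfman-Berbaum-Metz approach can be sketched for a single reader: each case's pseudovalue measures its leverage on the AUC, and pseudovalues are what the subsequent ANOVA operates on. This toy sketch (invented scores, one reader, no ANOVA step) only illustrates the pseudovalue computation, not the full MRMC analysis:

```python
def auc(pos, neg):
    """Mann-Whitney AUC with tie handling."""
    total = 0.0
    for p in pos:
        for q in neg:
            total += (p > q) + 0.5 * (p == q)
    return total / (len(pos) * len(neg))

def jackknife_pseudovalues(pos, neg):
    """Case-jackknife pseudovalues for the AUC, one per case:
    psi_i = N*theta - (N - 1)*theta_without_case_i, with N = total cases."""
    theta = auc(pos, neg)
    n = len(pos) + len(neg)
    psi = []
    for i in range(len(pos)):
        psi.append(n * theta - (n - 1) * auc(pos[:i] + pos[i + 1:], neg))
    for j in range(len(neg)):
        psi.append(n * theta - (n - 1) * auc(pos, neg[:j] + neg[j + 1:]))
    return psi

pos = [0.9, 0.8, 0.7, 0.4]  # diseased-case scores, one reader
neg = [0.6, 0.5, 0.3, 0.2]  # non-diseased-case scores
psi = jackknife_pseudovalues(pos, neg)
print(round(sum(psi) / len(psi), 3))  # mean pseudovalue recovers the AUC: 0.875
```

In the full MRMC analysis, pseudovalues are computed per reader and per modality, and an ANOVA on them separates reader, case, and modality variance components.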
78
Location-Based MRMC Analysis
Accounts for correct localization of lesions. Statistical methodologies and tools are available: region-of-interest ROC analysis (Obuchowski et al., Rutter), which divides patient data into ROIs (e.g., quadrant or lobe); jackknife FROC (JAFROC) (Chakraborty and Berbaum); bootstrap FROC analysis (Samuelson and Petrick; Bornefalk and Hermansson).
79
Evaluating CAD Algorithms Further Statistical Issues
Next talk: "Evaluating CAD Algorithms: Further Statistical Issues"
80
Extra Slides
81
ROC and Operating Point
It is possible to obtain both a rating/ranking and an action item within the same reader study; it is not necessarily just one or the other. Examples: determine if the patient should have workup; rate the patient level of suspicion; rate the level of suspicion for individual lesions; determine if individual lesions require workup.
82
Example from literature
Jiang et al., "Improving breast cancer diagnosis with computer-aided diagnosis," Academic Radiology 6(1):22-33, 1999. The authors studied ROC curves, ROC areas, and the [Se, Sp] operating point for the characterization of microcalcifications, using quasi-continuous ratings and an action item.