1
CAD Panel Meeting General CAD Methods Nicholas Petrick, Ph.D.
Deputy Director, Division of Imaging and Applied Math, OSEL Radiological Devices Panel Meeting March 4, 2008
2
Outline What is CAD Basic components of CAD algorithms
Clinical implementations of CAD Evaluating CAD Algorithms Non-clinical testing Clinical testing Basic statistical tools
3
What is CAD?
4
What is CADe? CADe: Computer-aided detection devices Also termed CAD
Designed to identify findings (or regions) on an image that may be abnormal Prompting devices only
5
What is CADx? CADx: Computer-aided diagnosis Also termed CAD
Designed to process a specific finding (or region) to characterize the finding: likelihood of malignancy, recommended clinical action, or a description of the finding. Helps the physician determine what he/she is looking at. (Figure: example likelihood scores 0.26, 0.77, 0.27.)
6
What is CADx? CADx: Computer-aided diagnosis Also termed CAD
Designed to process a specific finding (or region) to characterize the finding: likelihood of malignancy, recommended clinical action, or a description of the finding. Helps the physician determine what he/she is looking at. (Figure: example assessment categories B1, B4, B2.)
7
What is CAD?
CAD encompasses many disciplines: statistics, pattern recognition, artificial intelligence, medicine, physics, biology, and image processing.
8
Basic Blocks in CADe Algorithms
Image processing; segmentation; features/feature selection; classification. Sequencing and block details differ between CADe algorithms. (Block diagram: Acquire Digital Data, Image Processing, Segmentation, Features & Feature Selection, Classification, Annotation.)
9
Acquire Digital Data
Digital data can come from: digitized film, or direct digital devices (FFDM, CT, and many others).
10
Image Processing
The image is enhanced or processed to facilitate analysis.
11
Segmentation
Identify boundaries or regions within the image: lesion candidates, organs.
12
Features & Feature Selection
Features characterize regions or pixels within a dataset: shape, texture, curvature, and so on. Feature selection is the process of selecting informative features. (Example features: F1 = area, F2 = perimeter.)
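The feature block above can be illustrated with the two named example features. A minimal sketch (not from the talk; the 4-connected definition of perimeter is an assumption, and real CAD feature extractors are far richer):

```python
import numpy as np

def region_features(mask):
    """Compute two simple shape features from a binary region mask:
    F1 = area (pixel count) and F2 = perimeter (count of region pixels
    with at least one 4-connected background neighbor)."""
    mask = mask.astype(bool)
    area = int(mask.sum())
    # Pad so border pixels have an implicit background neighbor.
    padded = np.pad(mask, 1, constant_values=False)
    boundary = mask & ~(
        padded[:-2, 1:-1] & padded[2:, 1:-1] &
        padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    perimeter = int(boundary.sum())
    return area, perimeter

# Example: a solid 3x3 square inside a 5x5 image.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
print(region_features(mask))  # (9, 8): 9 pixels, 8 of them on the boundary
```

Feature selection would then keep only the features that carry discriminating information for the classifier that follows.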
13
Classification
Features are input to a learning algorithm and combined into an output score. Classifier types include multiple thresholds, LDA, and neural networks. The training/test paradigm is critical. (Diagram: features F1 through FN, trained learning machine, object score.)
14
Classification (continued)
Same blocks as above, with a threshold applied to the object scores.
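The LDA classifier named on the slide, followed by a threshold on the object scores, can be sketched in a few lines. This is a toy illustration (synthetic two-feature data, an equal-weight pooled covariance, and a midpoint threshold are all assumptions, not anything from the talk):

```python
import numpy as np

def fit_lda(x_pos, x_neg):
    """Fisher LDA sketch: w = pooled_covariance^{-1} (mu_pos - mu_neg);
    the object score is the projection w . x."""
    mu_p, mu_n = x_pos.mean(axis=0), x_neg.mean(axis=0)
    pooled = 0.5 * (np.cov(x_pos.T) + np.cov(x_neg.T))
    w = np.linalg.solve(pooled, mu_p - mu_n)
    return lambda x: float(np.dot(w, x))

rng = np.random.default_rng(0)
lesions = rng.normal([3.0, 2.0], 1.0, size=(50, 2))  # candidate features (F1, F2)
normals = rng.normal([1.0, 1.0], 1.0, size=(50, 2))
score = fit_lda(lesions, normals)

s_les = np.array([score(x) for x in lesions])
s_nor = np.array([score(x) for x in normals])
threshold = 0.5 * (s_les.mean() + s_nor.mean())  # illustrative threshold choice
print("fraction of lesions marked:", (s_les > threshold).mean())
print("fraction of normals marked:", (s_nor > threshold).mean())
```

Moving the threshold trades sensitivity against false marks, which is exactly the operating-point choice the later slides discuss.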
15
Annotation
CADe annotations are prompts of potential abnormalities.
16
Basic Blocks in a CADx Algorithm
Characterization of a finding. The basic blocks are similar: image processing, features/feature selection, classification. Sequencing and block details differ between CADx algorithms. (Block diagram: Identified Region, Image Processing, Features & Feature Selection, Classification, Annotation.)
17
Basic Blocks in a CADx Algorithm (continued)
Same blocks as above. (Figure: example output scores 0.79, 0.91, 0.66.)
18
Training CADs
A process for systematically improving performance on a set of data known as the training set. Objectives: maximize sensitivity; maximize area under the ROC curve. Training can be performed by computer (regression or optimization techniques) or by humans (tweaking parameters or combinations of parameters). The algorithm is fixed after training.
19
Training CADs: Learning Curve
Training (learning) is a dynamic process: increasing the training data increases performance and decreases variability. (Figure: learning curve of ROC area versus number of patients per class, for a 3-feature linear classifier.)
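A learning curve like the one in the figure can be simulated: train on increasing numbers of patients per class and measure test AUC each time. A rough sketch under assumed conditions (Gaussian synthetic data, a deliberately simple difference-of-means classifier, and a Mann-Whitney AUC estimate; none of this is the device or data from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(pos, neg):
    """Mann-Whitney estimate of ROC area: P(pos score > neg score)."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return float((pos[:, None] > neg[None, :]).mean())

def auc_for_training_size(n_train):
    # Train a simple classifier (difference-of-means weights) on n_train
    # patients per class; score a fixed-size independent test set.
    tr_pos = rng.normal(1.0, 1.0, size=(n_train, 3))
    tr_neg = rng.normal(0.0, 1.0, size=(n_train, 3))
    w = tr_pos.mean(axis=0) - tr_neg.mean(axis=0)
    te_pos = rng.normal(1.0, 1.0, size=(500, 3))
    te_neg = rng.normal(0.0, 1.0, size=(500, 3))
    return auc(te_pos @ w, te_neg @ w)

# Learning curve: test AUC as the number of training patients grows.
for n in (5, 20, 100):
    print(n, round(auc_for_training_size(n), 3))
```

With more training patients, the weight estimates stabilize and the test AUC climbs toward its asymptote with less run-to-run variability, which is the behavior the slide's learning curve depicts.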
20
Clinical Use of CAD
21
CAD Reading Paradigms First reader: the physician reviews only regions or findings marked by the CAD device; unmarked regions are not necessarily evaluated by the physician. No radiological CAD device has been approved/cleared for this mode. Discussion questions: M6, C7, L6
22
CAD Reading Paradigms Second reader
The physician first conducts a complete interpretation without CAD (unaided read), then re-conducts the interpretation with the CAD device (aided read). Also termed "second detector" or "sequential reader." Examples: mammography CADs, some lung CADs.
23
CAD Reading Paradigms Concurrent read
The physician performs a complete interpretation in the presence of CAD marks; the marks are available at any time. Example: some colon CAD devices are potentially used in this way.
24
CAD Factors Influencing Clinical Use
Physical characteristics of marks: physicians may respond differently to different types of marks.* CAD standalone performance and number of CAD marks: knowledge of the Se and FP rate may affect user confidence in, or attention to, CAD marks. Change in interpretation. Change in reading time: may increase review time, or maintain/decrease review time. *E.A. Krupinski et al., "A perceptually based method for enhancing pulmonary nodule recognition," Investigative Radiology 28(4):289, 1993.
25
Evaluating CAD Algorithms Non-clinical Evaluation
26
Non-Clinical Evaluation
Device & algorithm descriptions Stability analysis
27
Non-Clinical Evaluation
Algorithm description Different CAD devices contain different processing; devices are easier to assess and compare if they are not "black boxes." To understand a CAD, the following information is needed: patients targeted by the device; device usage (e.g., reading mode); image processing, segmentation, etc.; features, classifiers, etc.; training and training data. Discussion question: G1
28
Algorithm Stability A stable algorithm shows similar performance with changes in the algorithm, features, training, or training databases. Stability increases as the number of training cases increases, the number of initial features decreases, and the complexity of the CAD decreases. Discussion question: G1
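One simple way to probe the stability described above is to redraw the training set many times, retrain, and look at the spread of test AUC. A sketch under assumed conditions (synthetic Gaussian data and a toy difference-of-means classifier; the slide does not prescribe any particular experiment):

```python
import numpy as np

rng = np.random.default_rng(2)
N_FEAT = 5

def auc(pos, neg):
    """Mann-Whitney AUC estimate."""
    return float((pos[:, None] > neg[None, :]).mean())

# One fixed, independent test set.
te_pos = rng.normal(0.8, 1.0, size=(300, N_FEAT))
te_neg = rng.normal(0.0, 1.0, size=(300, N_FEAT))

def retrained_auc(n_train):
    """Redraw the training cases, retrain (difference-of-means weights),
    and return the test AUC: one repetition of a stability experiment."""
    tr_pos = rng.normal(0.8, 1.0, size=(n_train, N_FEAT))
    tr_neg = rng.normal(0.0, 1.0, size=(n_train, N_FEAT))
    w = tr_pos.mean(axis=0) - tr_neg.mean(axis=0)
    return auc(te_pos @ w, te_neg @ w)

# The spread of test AUC across retrainings is one measure of stability;
# it should shrink as the number of training cases grows.
for n_train in (10, 100):
    aucs = [retrained_auc(n_train) for _ in range(20)]
    print(n_train, round(float(np.mean(aucs)), 3), round(float(np.std(aucs)), 3))
```

A tight spread corresponds to the "more stable" training confidence intervals on the next slides; a wide spread to the "less stable" case.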
29
Why Stability Analysis?
Indicates whether performance is due to a fortuitous training/test set, and whether algorithm updates produce evolving performance. (Figure: a more stable algorithm, with tight training confidence intervals. Example only: not an actual device.)
30
Why Stability Analysis?
Indicates whether performance is due to a fortuitous training/test set, and whether algorithm updates produce evolving performance. (Figure: a less stable algorithm, with wide training confidence intervals. Example only: not an actual device.)
31
Evaluating CAD Algorithms Clinical Testing
32
Hierarchical Model of Efficacy*
Level 1 Technical efficacy Physical & bench tests Level 2 Diagnostic accuracy Se/Sp, ROC curve, etc Level 3 Diagnostic thinking Effect on clinicians’ estimates of diagnostic probabilities, pretest to posttest Level 4 Therapeutic efficacy Effect on therapeutic management Level 5 Patient outcome Value in terms of quality-adjusted life years (QALYs), etc. Level 6 Societal efficacy Overall societal benefit *Fryback, Thornbury, “The efficacy of diagnostic imaging,” Med Decis Making 11:88–94, 1991.
33
Hierarchical Model of Efficacy (continued)
These are the levels on which imaging technology sponsors generally focus when going through FDA; sponsors and FDA are not constrained to these levels.
34
Classes of Tests
Standalone performance testing: performance of the device by itself (the intrinsic functionality of the device). Reader performance testing: performance of physicians using the device (the impact on physician performance).
35
Standalone Testing Performance of the device by itself. (Flowchart: Acquire Test Dataset; Apply CADe Device; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.) Discussion questions: M1, C3, L2
36
Test Dataset Clinical images used to determine safety and effectiveness of a CAD Different from set used to train/develop or validate CAD Represents target population & target disease condition Usually includes clinically relevant spectrum of patients, imaging hardware & protocols Discussion questions: M1, M4, C3, C6, L2, L5
37
Acquiring Test Dataset
Field test accrual Collection during real-time clinical interpretation Enrichment accrual Enrichment for low prevalence of disease Enrich with disease cases at a higher proportion than in population Enrichment for stress testing Enrich with cases containing challenging findings Stress testing usually includes a comparison modality
38
Reuse of Test Data
Ideal testing paradigm: develop the CAD algorithm; collect testing cases; apply the CAD; report standalone and/or reader performance results. Discussion question: G2
39
Reuse of Test Data Sponsor may want to compare performance of revised algorithm with same or expanded version of test cases Developer may have gained knowledge (learned) by knowing performance of original CAD on test data For larger datasets and minimal feedback, knowledge gain may be quite small May be possible to reuse test data under appropriate constraints to streamline assessment What may be appropriate constraints to balance data integrity & data collection? Discussion question: G2
40
Standalone Testing Performance of the device by itself
(Flowchart: Acquire Test Dataset; Apply CADe Device; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.)
41
Ground Truth Ground truthing includes: Whether or not disease is present (patient level) Location and/or extent of the disease (lesion level) Types of ground truthing Cancerous lesions Biopsy/pathology (Follow-up imaging for normals) Non-cancerous lesions Expert panel reviews all available clinical information May be others Discussion questions: C2, L1
42
Ground Truth by Expert Panel
Experts almost always required to determine lesion locations May also determine if abnormality is present Experts are susceptible to reader variability Multiple readers allow measure of truth variability
43
Ground Truthing: Mammography
Patient-level Pathology verified cancer in left breast
44
Ground Truthing: Mammography
Lesion-level: the radiologist identifies the region of the lesion. (Figure: clinician-identified ROI.)
45
Ground Truthing: Mammography
Lesion-level Radiologist segments region
46
Standalone Testing Performance of the device by itself
(Flowchart: Acquire Test Dataset; Apply CADe Device; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.)
47
Scoring Rules and Methods
Truth segmentation: used to determine whether the CAD marks a true lesion, based on the overlap between the CAD and truth segmentations. (Figure: CAD segmentation versus truth segmentation.) Discussion questions: M1, C3, L2
48
Scoring Rules and Methods
Truth centroid: used to determine whether the CAD marks a true lesion, based on the distance between the CAD and truth centroids. (Figure: CAD and truth centroids, example distance = 2.1 mm.) Scoring may also be performed by a physician.
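The two scoring rules above, overlap and centroid distance, can each be reduced to a few lines of array code. A minimal sketch on toy binary masks (the particular overlap definition, pixel spacing, and hit criteria are illustrative assumptions, not rules from the talk):

```python
import numpy as np

def overlap_score(cad_mask, truth_mask):
    """Fraction of the truth region covered by the CAD mark."""
    inter = np.logical_and(cad_mask, truth_mask).sum()
    return float(inter / truth_mask.sum())

def centroid_distance(cad_mask, truth_mask, pixel_mm=1.0):
    """Euclidean distance between the CAD and truth centroids, in mm."""
    c = np.argwhere(cad_mask).mean(axis=0)
    t = np.argwhere(truth_mask).mean(axis=0)
    return float(np.linalg.norm(c - t) * pixel_mm)

# Toy masks: a 4x4 truth lesion and a CAD mark shifted by one pixel.
truth = np.zeros((10, 10), dtype=bool); truth[2:6, 2:6] = True
cad = np.zeros((10, 10), dtype=bool); cad[3:7, 3:7] = True
print(overlap_score(cad, truth))      # 9/16 of the truth area is covered
print(centroid_distance(cad, truth))  # centroids are sqrt(2) pixels apart
# A mark counts as a hit only if it passes the chosen rule, e.g.:
hit = overlap_score(cad, truth) >= 0.3
```

Because different cutoffs (overlap fraction, distance in mm) classify the same marks differently, the scoring rule must be fixed before the standalone performance numbers are computed.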
49
Standalone Performance Measures
Lesion-based sensitivity and number of FPs per image (or per scan), [Se, FPs/Image]; Free-Response Receiver Operating Characteristic (FROC) curve. (Figure: FROC curve, TPF/sensitivity versus number of FPs per image.)
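A lesion-based FROC curve like the one in the figure can be traced by sweeping a threshold over the candidate-mark scores. A minimal sketch on toy data (the four marks, their scores, and the scoring outcome are all invented for illustration):

```python
def froc_points(scores, is_tp, lesion_id, n_lesions, n_images):
    """Sweep a threshold over candidate-mark scores. At each threshold:
    lesion sensitivity = fraction of true lesions hit by at least one
    retained TP mark; FP rate = retained false marks per image."""
    pts = []
    for t in sorted(set(scores), reverse=True):
        keep = [i for i, s in enumerate(scores) if s >= t]
        hit = {lesion_id[i] for i in keep if is_tp[i]}
        fps = sum(1 for i in keep if not is_tp[i])
        pts.append((fps / n_images, len(hit) / n_lesions))
    return pts

# Toy example: 4 candidate marks across 2 images containing 2 true lesions.
scores = [0.9, 0.8, 0.6, 0.4]
is_tp = [True, False, True, False]
lesion_id = [0, None, 1, None]   # which lesion each TP mark points at
print(froc_points(scores, is_tp, lesion_id, n_lesions=2, n_images=2))
# [(0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Note the x-axis (FPs per image) is unbounded, unlike an ROC's false positive fraction, which is why FROC summary measures need a chosen FP range (see the later FROC metrics slide).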
50
Evaluating CAD Algorithms Clinical Testing Reader Performance Testing
51
Reader Performance Testing
Performance of physicians using the device. (Flowchart: Acquire Test Dataset; Identify Study Readers; Apply CADe; Read with CADe; Read without CADe; Establish Truthing Rule & Method; Establish Ground Truth; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis.) Discussion questions: M2, C4, L3
52
Reader Selection Readers are generally selected to be representative of intended users: representative of the clinicians who will use the device, and of the proper clinician experience level. Reader performance testing depends on proper understanding and use of the CAD device, and on proper understanding and implementation of the study protocol. Training of readers is key to achieving both.
53
Designing Reader Studies
Common endpoints Common CAD study designs
54
Evaluating CAD Algorithms Clinical Testing Study Endpoints
55
Study Endpoints Patient analysis [Sensitivity, Specificity] ROC analysis Location–specific analysis Location-specific ROC Free-response ROC (FROC) Discussion questions: M2, C4, L3
56
Patient Endpoints
Assessing CADx and CADe without accounting for location. (Figure: identified region; CADx aid POM 0.91, clinician POM 0.95.)
57
Patient-Based Endpoints
Patient analysis does not account for localizing the lesion. Endpoints: binary decision (single threshold), the [Sensitivity, Specificity] ([Se, Sp]) operating point; rating/ranking (range of thresholds), the receiver operating characteristic (ROC) curve.
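The [Se, Sp] operating point for the binary-decision endpoint is just two ratios over the confusion counts. A minimal sketch on invented toy data:

```python
def operating_point(decisions, truth):
    """[Se, Sp] at a single decision threshold.
    decisions[i] / truth[i]: True means called diseased / actually diseased."""
    tp = sum(d and t for d, t in zip(decisions, truth))
    tn = sum((not d) and (not t) for d, t in zip(decisions, truth))
    n_pos = sum(truth)
    se = tp / n_pos
    sp = tn / (len(truth) - n_pos)
    return se, sp

# Toy data: 3 diseased and 5 non-diseased patients.
truth = [True, True, True, False, False, False, False, False]
decisions = [True, True, False, True, False, False, False, False]
se, sp = operating_point(decisions, truth)
print(se, sp)  # Se = 2/3, Sp = 4/5
```

A single [Se, Sp] pair describes one threshold only; the rating/ranking endpoint generalizes this to the full ROC curve on the following slides.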
58
Se/Sp Operating Point
[Se, Sp] operating point. Comparing without/with CAD: often higher Se, lower Sp. Many other possible endpoints. (Figure: TPF = sensitivity versus FPF = 1 - specificity, with reader alone and reader+CAD operating points.)
59
ROC Assessment
(Figure: distributions of computer scores for non-diseased and diseased cases.)
60
Single Operating Point
A single threshold on the computer scores yields a single operating point: FPF (1 - specificity) and TPF (sensitivity). (Figure: non-diseased and diseased score distributions with a single threshold.)
61
Entire ROC Curve
Sweeping the threshold over its full range traces out the entire ROC curve: TPF (sensitivity) versus FPF (1 - specificity). (Figure: non-diseased and diseased score distributions with the threshold range.)
62
ROC Analysis
Comparing without/with CAD: often higher Se, lower Sp. ROC analysis can facilitate the comparison. It requires ordering cases from least to most suspicious; ratings are often used to facilitate the ordering. (Figure: ROC curves for reader alone and reader+CAD.)
63
ROC Analysis
Performance metrics: ROC area (AUC), the average TPF across all possible FPFs; and partial area under the curve (PAUC). A challenge is linking AUC measures to clinical relevance. (Figure: two ROC curves with areas AUC1 and AUC2.)
64
Location-Specific Endpoints
Assessing CADe: location is important, and there may be multiple prompts on the same image. The truthing rule is now a critical component. (Figure: CADe device output.)
65
Location-Specific ROC
ROC analysis that requires correct location of the lesion One scored location per patient Location must be on the lesion
66
Location-Based Operating Points
[Se, No. FPs] operating point. Comparing without/with CAD: often higher Se along with more FPs. Many other possible endpoints. (Figure: TPF/sensitivity versus number of FPs per image, with operating points.)
67
Free-Response ROC
All [Se, No. FPs] combinations across all thresholds. (Figure: FROC curve, TPF/sensitivity versus number of FPs per image.)
68
FROC Performance Metrics
Area under the FROC curve (requires choosing an FP range); area under the alternative FROC (AFROC). Challenges: linking the measures to clinical relevance, and statistical methodology. (Figure: FROC curve, TPF/sensitivity versus number of FPs per image.)
69
Evaluating CAD Algorithms Clinical Testing Reader Study Designs
70
Reader Performance Study Designs
Prospective studies Retrospective studies Some CAD study designs Warren-Burhenne MRMC Discussion questions: M4, C6, L5
71
Prospective Reader Studies
CAD performance measured as part of actual clinical practice Field testing of CAD devices
72
Retrospective Reader Studies
Cases are collected prior to image interpretation Typically enriched or stress test dataset used Read offline by one or more readers under specific reading conditions CAD Examples Mammography CAD devices Lung nodule CAD devices
73
Warren-Burhenne Study Design*
Two separate studies: (1) a retrospective study of CAD Se in detecting abnormalities "missed" in clinical practice, estimating the relative reduction in the false negative (FN) rate with CAD; (2) a prospective study of the work-up rate of readers with and without CAD in clinical practice, where the difference in work-up rate is attributed to the use of CAD. This was the study design in early mammography CAD approvals. *Warren Burhenne et al., Radiology 215, 2000.
74
Warren-Burhenne Study Design
The fundamental limitation is that the reduction in FN rate and the increase in work-up rate are not evaluated in the same study. The study design can be difficult to interpret statistically; its goal is to estimate the "potential" effect on the FN rate.
75
Multiple Reader Multiple Case (MRMC) Study Design
Study where a set of readers interpret a set of patient images, in each of two competing reading conditions With and without CAD Could be either prospective or retrospective Fully-crossed design All readers read all of the cases in both modalities Most statistical power for given number of cases Hybrid designs are also evaluable
76
MRMC Study Design Advantages
Generalizes to new readers and cases: cases are random effects, and readers are random effects. Greater statistical power for a given number of cases. MRMC studies can accommodate [Se, Sp], ROC, and FROC endpoints, and are generally statistically interpretable.
77
Patient-Based MRMC Analysis
Includes [Se, Sp] or ROC endpoints. Well-established methodologies and tools: jackknife/ANOVA (Dorfman, Berbaum, and Metz); ANOVA and correlation model (Obuchowski); ordinal regression (Toledano and Gatsonis); bootstrap (Beiden, Wagner, and Campbell); one-shot estimate (Gallas).
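The jackknife ingredient of the Dorfman-Berbaum-Metz approach can be sketched for a single reader: each case's pseudovalue measures its leverage on the AUC, and pseudovalues are what the subsequent ANOVA operates on. This toy sketch (invented scores, one reader, no ANOVA step) only illustrates the pseudovalue computation, not the full MRMC analysis:

```python
def auc(pos, neg):
    """Mann-Whitney AUC with tie handling."""
    total = 0.0
    for p in pos:
        for q in neg:
            total += (p > q) + 0.5 * (p == q)
    return total / (len(pos) * len(neg))

def jackknife_pseudovalues(pos, neg):
    """Case-jackknife pseudovalues for the AUC, one per case:
    psi_i = N*theta - (N - 1)*theta_without_case_i, with N = total cases."""
    theta = auc(pos, neg)
    n = len(pos) + len(neg)
    psi = []
    for i in range(len(pos)):
        psi.append(n * theta - (n - 1) * auc(pos[:i] + pos[i + 1:], neg))
    for j in range(len(neg)):
        psi.append(n * theta - (n - 1) * auc(pos, neg[:j] + neg[j + 1:]))
    return psi

pos = [0.9, 0.8, 0.7, 0.4]  # diseased-case scores, one reader
neg = [0.6, 0.5, 0.3, 0.2]  # non-diseased-case scores
psi = jackknife_pseudovalues(pos, neg)
print(round(sum(psi) / len(psi), 3))  # mean pseudovalue recovers the AUC: 0.875
```

In the full MRMC analysis, pseudovalues are computed per reader and per modality, and an ANOVA on them separates reader, case, and modality variance components.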
78
Location-Based MRMC Analysis
Accounts for correct localization of lesions. Statistical methodologies and tools are available: region-of-interest ROC analysis (Obuchowski et al., Rutter), which divides patient data into ROIs (e.g., quadrant or lobe); jackknife FROC (JAFROC) (Chakraborty and Berbaum); bootstrap FROC analysis (Samuelson and Petrick; Bornefalk and Hermansson).
79
Evaluating CAD Algorithms Further Statistical Issues
Next talk: "Evaluating CAD Algorithms: Further Statistical Issues"
80
Extra Slides
81
ROC and Operating Point
It is possible to obtain both a rating/ranking and an action item within the same reader study; it is not necessarily just one or the other. Examples: determine if the patient should have workup; rate the patient level of suspicion; rate the level of suspicion for individual lesions; determine if individual lesions require workup.
82
Example from literature
Jiang et al., "Improving breast cancer diagnosis with computer-aided diagnosis," Academic Radiology 6(1):22-33, 1999. The authors studied ROC curves, ROC areas, and the [Se, Sp] operating point for the characterization of microcalcifications, using quasi-continuous ratings and an action item.