CAD Panel Meeting General CAD Methods Nicholas Petrick, Ph.D.


1 CAD Panel Meeting: General CAD Methods
Nicholas Petrick, Ph.D.
Deputy Director, Division of Imaging and Applied Mathematics, OSEL
Radiological Devices Panel Meeting, March 4, 2008

2 Outline
What is CAD?
Basic components of CAD algorithms
Clinical implementations of CAD
Evaluating CAD algorithms:
Non-clinical testing
Clinical testing
Basic statistical tools

3 What is CAD?

4 What is CADe?
CADe: computer-aided detection devices; also termed CAD
Designed to identify findings (or regions) on an image that may be abnormal
Prompting devices only

5 What is CADx?
CADx: computer-aided diagnosis; also termed CAD
Designed to process a specific finding (or region) to characterize it:
Likelihood of malignancy
Recommended clinical action
Description of the finding
Helps the physician determine what he/she is looking at
[Image: example findings annotated with likelihood scores 0.26, 0.77, 0.27]

6 What is CADx? (continued)
[Image: the same findings annotated with categories B1, B4, B2]

7 What is CAD?
CAD encompasses many disciplines
[Venn diagram: CAD at the intersection of statistics, pattern recognition, artificial intelligence, image processing, medicine, physics, and biology]

8 Basic Blocks in CADe Algorithms
Image processing
Segmentation
Features / feature selection
Classification
Sequencing and block details differ between CADe algorithms
[Flowchart: Acquire Digital Data → Image Processing → Segmentation → Features & Feature Selection → Classification → Annotation]

9 Acquire Digital Data
Digital data can come from:
Digitized film
Direct digital devices: FFDM, CT, many others
[Flowchart with Acquire Digital Data highlighted; example image shows a mass]

10 Image Processing
The image is enhanced or processed to facilitate analysis
[Flowchart with Image Processing highlighted]

11 Segmentation
Identify boundaries or regions within the image:
Lesion candidates
Organs
[Flowchart with Segmentation highlighted]
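
The segmentation step can be sketched with a toy example: thresholding a small grayscale image and grouping bright pixels into connected regions (lesion candidates). This is a minimal sketch only; real CADe segmentation is far more sophisticated, and the intensity threshold here is an arbitrary assumption.

```python
# Toy segmentation: label connected bright regions in a small grayscale image.
# The threshold value (128) is an arbitrary assumption for illustration.

def segment(image, threshold=128):
    """Return (label map, region count): 0 = background, 1..K = bright regions."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold and labels[r][c] == 0:
                next_label += 1
                stack = [(r, c)]            # flood fill, 4-connectivity
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < rows and 0 <= x < cols \
                            and image[y][x] >= threshold and labels[y][x] == 0:
                        labels[y][x] = next_label
                        stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return labels, next_label

image = [[ 10,  10, 200, 210],
         [ 10,  10, 220,  10],
         [180,  10,  10,  10],
         [190, 185,  10,  10]]
labels, n_regions = segment(image)
print(n_regions)  # two bright regions become two lesion candidates
```

Each labeled region would then be passed downstream as a lesion candidate for feature extraction.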

12 Features & Feature Selection
Features characterize regions or pixels within a dataset: shape, texture, curvature
Feature selection: the process of selecting informative features
[Flowchart with Features & Feature Selection highlighted; example features F1: area, F2: perimeter]
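
The two example features named on the slide (F1: area, F2: perimeter) can be computed directly from a binary lesion mask. This is a minimal sketch; real CAD feature sets also include texture, curvature, and many other descriptors.

```python
# Two simple shape features for a binary lesion mask:
# F1 = area (pixel count), F2 = perimeter (count of foreground pixel edges
# that border background or the image boundary).

def area_perimeter(mask):
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                area += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                        perimeter += 1
    return area, perimeter

mask = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
f1, f2 = area_perimeter(mask)
print(f1, f2)  # a 2x2 lesion: area 4, perimeter 8
```

A feature-selection step would then keep only the features that best separate true lesions from false candidates.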

13 Classification
Features are input to a learning algorithm and combined into an output score
Classifier types: multiple thresholds, LDA, neural network
The training/test paradigm is critical
[Flowchart with Classification highlighted; features F1…FN feed a trained learning machine that outputs an object score]

14 Classification (continued)
A threshold is applied to the object scores
[Flowchart with Classification highlighted]
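
A minimal sketch of this block: combining features into an object score with a linear discriminant (one of the classifier types named above) and applying a threshold. The weights, bias, and threshold below are made-up illustrative values; in practice they are fixed during training.

```python
# Sketch of classification + thresholding: a linear combination of features
# produces an object score, and a score threshold decides which candidates
# survive to the annotation step. Weights/bias/threshold are assumed values.

WEIGHTS = [0.8, -0.3]   # assumed, as if learned from a training set
BIAS = -0.2
THRESHOLD = 0.5

def object_score(features):
    """Linear discriminant: weighted sum of features plus a bias."""
    return sum(w * f for w, f in zip(WEIGHTS, features)) + BIAS

candidates = {"A": [1.2, 0.4], "B": [0.3, 0.9], "C": [1.5, 0.1]}
kept = [name for name, feats in candidates.items()
        if object_score(feats) >= THRESHOLD]
print(kept)  # candidates whose score clears the threshold get annotated
```

Raising the threshold trades fewer false prompts for lower sensitivity, which is exactly the operating-point trade-off discussed in the evaluation slides later.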

15 Annotation
CADe annotations: prompts of potential abnormalities
[Flowchart with Annotation highlighted]

16 Basic Blocks in a CADx Algorithm
Characterization of a finding
Basic blocks are similar: image processing, features/feature selection, classification
Sequencing and block details differ between CADx algorithms
[Flowchart: Identified Region → Image Processing → Features & Feature Selection → Classification → Annotation]

17 Basic Blocks in a CADx Algorithm (continued)
[Image: example findings annotated with likelihood scores 0.79, 0.91, 0.66]

18 Training CADs
Training: systematically improving performance on a set of data known as the training set
Maximize sensitivity
Maximize area under the ROC curve
Training can be performed:
By computer: regression or optimization techniques
By humans: tweaking parameters or combinations of parameters
The algorithm is fixed after training

19 Training CADs: Learning Curve
Training (learning) is a dynamic process
Increasing the training data:
Increases performance
Decreases variability
[Plot: learning curve, ROC area vs. number of patients per class, for a 3-feature linear classifier]

20 Clinical Use of CAD

21 CAD Reading Paradigms: First Reader
Physician reviews only the regions or findings marked by the CAD device
Unmarked regions are not necessarily evaluated by the physician
No radiological CAD device has been approved/cleared for this mode
Discussion questions: M6, C7, L6

22 CAD Reading Paradigms: Second Reader
Physician first conducts a complete interpretation without CAD (unaided read), then re-conducts the interpretation with the CAD device (aided read)
Also termed "second detector" or "sequential reader"
Examples: mammography CADs, some lung CADs

23 CAD Reading Paradigms: Concurrent Read
Physician performs a complete interpretation in the presence of CAD marks
CAD marks are available at any time
Example: some colon CAD devices are potentially used in this way

24 CAD Factors Influencing Clinical Use
Physical characteristics of marks: physicians may respond differently to different types of marks*
CAD standalone performance and number of CAD marks: knowledge of the sensitivity & FP rate may affect user confidence in, or attention to, CAD marks
Change in interpretation
Change in reading time: may increase review time, or maintain/decrease it
*EA Krupinski et al., "A Perceptually Based Method for Enhancing Pulmonary Nodule Recognition," Investigative Radiology 28(4):289, 1993.

25 Evaluating CAD Algorithms Non-clinical Evaluation

26 Non-Clinical Evaluation
Device & algorithm descriptions Stability analysis

27 Non-Clinical Evaluation: Algorithm Description
Different CAD devices contain different processing
Devices are easier to assess and compare if they are not "black boxes"
To understand a CAD, the following information is needed:
Patients targeted by the device
Device usage (e.g., reading mode)
Image processing, segmentation, etc.
Features, classifiers, etc.
Training & training data
Discussion question: G1

28 Algorithm Stability
A stable algorithm shows similar performance under changes in the algorithm, features, training, or training databases
Stability increases as:
The number of training cases increases
The number of initial features decreases
The complexity of the CAD decreases
Discussion question: G1

29 Why Stability Analysis?
Indicates whether performance is due to a fortuitous training/test set
Algorithm updates produce evolving performance
[Plot: more stable training confidence intervals. Example only: not an actual device]

30 Why Stability Analysis? (continued)
[Plot: less stable training confidence intervals. Example only: not an actual device]

31 Evaluating CAD Algorithms Clinical Testing

32 Hierarchical Model of Efficacy*
Level 1, technical efficacy: physical & bench tests
Level 2, diagnostic accuracy: Se/Sp, ROC curve, etc.
Level 3, diagnostic thinking: effect on clinicians' estimates of diagnostic probabilities, pretest to posttest
Level 4, therapeutic efficacy: effect on therapeutic management
Level 5, patient outcome: value in terms of quality-adjusted life years (QALYs), etc.
Level 6, societal efficacy: overall societal benefit
*Fryback, Thornbury, "The efficacy of diagnostic imaging," Med Decis Making 11:88-94, 1991.

33 Hierarchical Model of Efficacy (continued)
These are the levels on which imaging technology sponsors generally focus when going through FDA
Sponsors & FDA are not constrained to these levels
[Same level diagram as the previous slide]

34 Classes of Tests
Standalone performance testing: performance of the device by itself; the intrinsic functionality of the device
Reader performance testing: performance of physicians using the device; the impact on physician performance

35 Standalone Testing
Performance of the device by itself
[Flowchart: Acquire Test Dataset → Establish Truthing Rule & Method → Establish Ground Truth → Apply CADe Device → Establish Scoring Rule & Method → Apply Scoring → Statistical Analysis]
Discussion questions: M1, C3, L2

36 Test Dataset
Clinical images used to determine the safety and effectiveness of a CAD
Different from the set used to train/develop or validate the CAD
Represents the target population & target disease condition
Usually includes a clinically relevant spectrum of patients, imaging hardware & protocols
Discussion questions: M1, M4, C3, C6, L2, L5

37 Acquiring the Test Dataset
Field test accrual: collection during real-time clinical interpretation
Enrichment accrual:
Enrichment for low prevalence of disease: enrich with disease cases at a higher proportion than in the population
Enrichment for stress testing: enrich with cases containing challenging findings; stress testing usually includes a comparison modality

38 Reuse of Test Data
Ideal testing paradigm:
Develop the CAD algorithm
Collect testing cases
Apply the CAD
Report standalone and/or reader performance results
Discussion question: G2

39 Reuse of Test Data (continued)
A sponsor may want to compare the performance of a revised algorithm using the same or an expanded version of the test cases
The developer may have gained knowledge (learned) from knowing the performance of the original CAD on the test data
For larger datasets and minimal feedback, the knowledge gain may be quite small
It may be possible to reuse test data under appropriate constraints to streamline assessment
What constraints are appropriate to balance data integrity & data collection?
Discussion question: G2

40 Standalone Testing
Performance of the device by itself
[Flowchart repeated from slide 35]

41 Ground Truth
Ground truthing includes:
Whether or not disease is present (patient level)
Location and/or extent of the disease (lesion level)
Types of ground truthing:
Cancerous lesions: biopsy/pathology (follow-up imaging for normals)
Non-cancerous lesions: expert panel reviews all available clinical information
There may be others
Discussion questions: C2, L1

42 Ground Truth by Expert Panel
Experts are almost always required to determine lesion locations
They may also determine whether an abnormality is present
Experts are susceptible to reader variability
Multiple readers allow a measure of truth variability

43 Ground Truthing: Mammography
Patient level: pathology-verified cancer in the left breast

44 Ground Truthing: Mammography (continued)
Lesion level: the radiologist identifies the region of the lesion
[Image: clinician-identified ROI]

45 Ground Truthing: Mammography (continued)
Lesion level: the radiologist segments the region

46 Standalone Testing
Performance of the device by itself
[Flowchart repeated from slide 35]

47 Scoring Rules and Methods
Truth segmentation: used to determine whether the CAD marks a true lesion
Criterion: overlap between the CAD and truth segmentations
[Image: CAD segmentation overlaid on truth segmentation]
Discussion questions: M1, C3, L2
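
An overlap-based scoring rule can be sketched as follows. The slides do not fix a specific rule; intersection-over-union with a 0.3 cutoff is an assumed example criterion for illustration, not a regulatory standard.

```python
# Sketch of overlap-based scoring: a CAD mark counts as a true detection when
# its segmentation overlaps the truth segmentation "enough". The IoU metric
# and the 0.3 cutoff below are assumed example choices.

def iou(region_a, region_b):
    """Intersection-over-union of two pixel sets given as (row, col) pairs."""
    a, b = set(region_a), set(region_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

truth = {(r, c) for r in range(2, 6) for c in range(2, 6)}      # 4x4 truth lesion
cad_mark = {(r, c) for r in range(3, 7) for c in range(3, 7)}   # 4x4 CAD mark

overlap = iou(truth, cad_mark)
is_true_positive = overlap >= 0.3   # assumed cutoff
print(round(overlap, 3), is_true_positive)
```

Different overlap definitions (e.g., fraction of the truth region covered) and different cutoffs can change which marks count as true positives, which is why the scoring rule must be established before scoring begins.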

48 Scoring Rules and Methods (continued)
Truth centroid: used to determine whether the CAD marks a true lesion
Criterion: distance between the CAD and truth centroids (e.g., distance = 2.1 mm)
Scoring by a physician is another option
[Image: CAD centroid and truth centroid]
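
The centroid-distance rule can be sketched the same way: a mark is a true detection when its centroid falls within some distance of the truth centroid. The 5 mm acceptance radius and 0.5 mm pixel spacing below are assumed example values.

```python
# Sketch of centroid-based scoring. The acceptance radius and pixel spacing
# are assumed values for illustration; real studies must justify both.

import math

PIXEL_SPACING_MM = 0.5   # assumed
RADIUS_MM = 5.0          # assumed acceptance radius

def centroid(pixels):
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)

def centroid_distance_mm(region_a, region_b):
    (ra, ca), (rb, cb) = centroid(region_a), centroid(region_b)
    return math.hypot(ra - rb, ca - cb) * PIXEL_SPACING_MM

truth = [(r, c) for r in range(10, 14) for c in range(10, 14)]
cad = [(r, c) for r in range(12, 16) for c in range(11, 15)]

dist = centroid_distance_mm(truth, cad)
hit = dist <= RADIUS_MM
print(round(dist, 2), hit)
```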

49 Standalone Performance Measures
Lesion-based sensitivity and number of FPs per image (or per scan): [Se, FPs/image]
Free-response receiver operating characteristic (FROC) curve
[Plot: FROC curve, TPF (sensitivity) vs. number of FPs per image]
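
Given CAD marks that have already been matched to truth by a scoring rule, the [Se, FPs/image] endpoint at one threshold can be computed as below. The marks, threshold, and counts are made-up illustrative data.

```python
# Sketch of the standalone endpoint [Se, FPs/image] at a single threshold.
# Each mark carries (image_id, score, hits_true_lesion); the data are made up.

marks = [
    (1, 0.9, True), (1, 0.4, False),
    (2, 0.7, True), (2, 0.6, False), (2, 0.2, False),
    (3, 0.8, False),
]
n_lesions = 3   # one lesion (in image 3) was never marked: a false negative
n_images = 3
threshold = 0.5

kept = [m for m in marks if m[1] >= threshold]
sensitivity = sum(1 for m in kept if m[2]) / n_lesions
fps_per_image = sum(1 for m in kept if not m[2]) / n_images
print(sensitivity, fps_per_image)
```

Sweeping the threshold over its whole range and recording each [Se, FPs/image] pair traces out the FROC curve shown on the slide.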

50 Evaluating CAD Algorithms Clinical Testing Reader Performance Testing

51 Reader Performance Testing
Performance of physicians using the device
[Flowchart: Acquire Test Dataset → Establish Truthing Rule & Method → Establish Ground Truth → Identify Study Readers → Read w/o CADe and Read w/ CADe (Apply CADe) → Establish Scoring Rule & Method → Apply Scoring (each arm) → Statistical Analysis]
Discussion questions: M2, C4, L3

52 Reader Selection
Readers are generally selected to be representative of the intended users:
Representative of the clinicians who will use the device
Representative of the proper clinician experience level
Reader performance testing depends on:
Proper understanding & use of the CAD device
Proper understanding & implementation of the study protocol
Training of readers is key to achieving both

53 Designing Reader Studies
Common endpoints Common CAD study designs

54 Evaluating CAD Algorithms Clinical Testing Study Endpoints

55 Study Endpoints
Patient analysis: [sensitivity, specificity]; ROC analysis
Location-specific analysis: location-specific ROC; free-response ROC (FROC)
Discussion questions: M2, C4, L3

56 Patient Endpoints
Assessing CADx or CADe without accounting for location
[Diagram: identified region → CADx aid (CADx POM: 0.91) → clinician (POM: 0.95)]

57 Patient-Based Endpoints
Patient analysis does not account for localizing the lesion
Endpoints:
Binary decision (single threshold): [sensitivity, specificity] ([Se, Sp]) operating point
Rating/ranking (range of thresholds): receiver operating characteristic (ROC) curve

58 Se/Sp Operating Point
[Se, Sp] operating point
Comparing without/with CAD: often higher Se, lower Sp
Many other possible endpoints
[Plot: ROC space, TPF (sensitivity) vs. FPF (1 − specificity), showing the reader+CAD point relative to the reader-alone point]

59 ROC Assessment
[Plot: distributions of computer scores for non-diseased and diseased cases]

60 Single Operating Point
[Plot: a single threshold on the score distributions yields a single operating point, (FPF, TPF)]

61 Entire ROC Curve
[Plot: sweeping the threshold over its full range traces out the entire ROC curve, TPF (sensitivity) vs. FPF (1 − specificity)]
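
The threshold sweep in the figure can be sketched directly: each distinct threshold yields one (FPF, TPF) point, with TPF = sensitivity and FPF = 1 − specificity. The scores below are made-up examples.

```python
# Sketch of tracing an ROC curve by sweeping a score threshold.
# Each threshold gives one (FPF, TPF) point; scores are made-up examples.

diseased = [0.9, 0.8, 0.8, 0.6, 0.4]
non_diseased = [0.7, 0.5, 0.3, 0.2, 0.1]

def roc_points(pos, neg):
    """(FPF, TPF) at every distinct threshold, from strictest to laxest."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos + neg), reverse=True):
        tpf = sum(s >= t for s in pos) / len(pos)
        fpf = sum(s >= t for s in neg) / len(neg)
        points.append((fpf, tpf))
    return points

curve = roc_points(diseased, non_diseased)
print(curve)  # starts at (0.0, 0.0), ends at (1.0, 1.0)
```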

62 ROC Analysis
Comparing without/with CAD: often higher Se, lower Sp
ROC can facilitate the comparison
Requires ordering cases from least to most suspicious; ratings are often used to facilitate ordering
[Plot: ROC curves for reader alone and reader+CAD]

63 ROC Analysis: Performance Metrics
ROC area (AUC): the average TPF across all possible FPFs
Partial area under the curve (PAUC)
Challenge: linking AUC measures to clinical relevance
[Plot: two ROC curves with areas AUC1 and AUC2]
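
The AUC has a useful nonparametric form: it equals the probability that a randomly chosen diseased case scores higher than a randomly chosen non-diseased case (ties counting half). This Mann-Whitney estimate is a standard way to compute AUC without fitting a curve; the scores are made-up examples.

```python
# Nonparametric (Mann-Whitney) AUC: the fraction of diseased/non-diseased
# score pairs in which the diseased case scores higher (ties count 0.5).

diseased = [0.9, 0.8, 0.8, 0.6, 0.4]
non_diseased = [0.7, 0.5, 0.3, 0.2, 0.1]

def auc(pos, neg):
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc(diseased, non_diseased))  # 0.88 for these example scores
```

This value is exactly the trapezoidal area under the empirical ROC curve built from the same scores.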

64 Location-Specific Endpoints
Assessing CADe: location is important
Multiple prompts can appear on the same image
The truthing rule is now a critical component
[Image: CADe device output with multiple prompts]

65 Location-Specific ROC
ROC analysis that requires the correct location of the lesion
One scored location per patient
The location must be on the lesion

66 Location-Based Operating Points
[Se, No. FPs] operating point
Comparing without/with CAD: often higher Se along with more FPs
Many other possible endpoints
[Plot: TPF (sensitivity) vs. number of FPs per image]

67 Free-Response ROC
All [Se, No. FPs] combinations, over all thresholds
[Plot: FROC curve, TPF (sensitivity) vs. number of FPs per image]

68 FROC Performance Metrics
Area under the FROC curve: requires choosing an FP range
Area under the alternative FROC (AFROC)
Challenges: linking measures to clinical relevance; statistical methodology
[Plot: FROC curve]

69 Evaluating CAD Algorithms Clinical Testing Reader Study Designs

70 Reader Performance Study Designs
Prospective studies
Retrospective studies
Some CAD study designs: Warren-Burhenne, MRMC
Discussion questions: M4, C6, L5

71 Prospective Reader Studies
CAD performance measured as part of actual clinical practice Field testing of CAD devices

72 Retrospective Reader Studies
Cases are collected prior to image interpretation
Typically an enriched or stress-test dataset is used
Read offline by one or more readers under specific reading conditions
Examples: mammography CAD devices, lung nodule CAD devices

73 Warren-Burhenne Study Design*
Two separate studies:
A retrospective study of CAD sensitivity for detecting abnormalities "missed" in clinical practice; estimates the relative reduction in the false-negative (FN) rate with CAD
A prospective study of the work-up rate of readers with & without CAD in clinical practice; the difference in work-up rate is attributed to the use of CAD
This was the study design in early mammography CAD approvals
*Warren Burhenne et al., Radiology 215, 2000.

74 Warren-Burhenne Study Design (continued)
A fundamental limitation: the reduction in FN rate & the increase in work-up rate are not evaluated in the same study
The study design can be difficult to interpret statistically
The design's goal is to estimate a "potential" effect on the FN rate

75 Multiple Reader Multiple Case (MRMC) Study Design
A study in which a set of readers interprets a set of patient images under each of two competing reading conditions (with and without CAD)
Can be either prospective or retrospective
Fully crossed design: all readers read all cases in both modalities; most statistical power for a given number of cases
Hybrid designs are also evaluable

76 MRMC Study Design: Advantages
Generalizes to new readers & cases: cases and readers are treated as random effects
Greater statistical power for a given number of cases
MRMC studies can accommodate [Se, Sp], ROC, and FROC endpoints
MRMC studies are generally statistically interpretable

77 Patient-Based MRMC Analysis
Includes [Se, Sp] or ROC endpoints
Well-established methodologies & tools:
Jackknife/ANOVA: Dorfman, Berbaum, and Metz
ANOVA and correlation model: Obuchowski
Ordinal regression: Toledano and Gatsonis
Bootstrap: Beiden, Wagner, and Campbell
One-shot estimate: Gallas
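
Of the methods listed, the bootstrap idea can be sketched compactly: resample cases with replacement, recompute each reader's AUC with and without CAD, and examine the distribution of the reader-averaged AUC difference. This is a deliberately simplified sketch on simulated data; a full MRMC analysis (e.g., the Beiden-Wagner-Campbell method) also accounts for reader variability, and all parameters below are assumptions.

```python
# Simplified case-bootstrap sketch for an MRMC-style comparison of
# reader-averaged AUC with vs. without CAD. Data are simulated; a full
# MRMC analysis would also model reader variability.

import random

random.seed(0)

def auc(pos, neg):
    """Nonparametric AUC: fraction of pairs where a diseased case scores higher."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Each case is (is_diseased, {reader: (score_without_cad, score_with_cad)}).
n_cases = 60
cases = []
for i in range(n_cases):
    is_diseased = i < 20                     # 20 diseased, 40 non-diseased
    base = random.gauss(1.5 if is_diseased else 0.0, 1.0)
    readers = {r: (base + random.gauss(0, 0.8),   # without CAD: noisier read
                   base + random.gauss(0, 0.3))   # with CAD: less reader noise
               for r in ("R1", "R2", "R3")}
    cases.append((is_diseased, readers))

def mean_auc_diff(sample):
    """Reader-averaged AUC(with CAD) minus AUC(without CAD) on a case sample."""
    pos = [c for c in sample if c[0]]
    neg = [c for c in sample if not c[0]]
    diffs = []
    for r in ("R1", "R2", "R3"):
        auc_wo = auc([c[1][r][0] for c in pos], [c[1][r][0] for c in neg])
        auc_w = auc([c[1][r][1] for c in pos], [c[1][r][1] for c in neg])
        diffs.append(auc_w - auc_wo)
    return sum(diffs) / len(diffs)

boot = sorted(mean_auc_diff(random.choices(cases, k=n_cases))
              for _ in range(500))
ci = (boot[12], boot[487])   # rough 95% percentile interval
print(round(mean_auc_diff(cases), 3), round(ci[0], 3), round(ci[1], 3))
```

If the resulting interval excludes zero, the bootstrap suggests a real difference between the two reading conditions for this reader group and case set.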

78 Location-Based MRMC Analysis
Accounts for correct localization of lesions
Statistical methodologies & tools are available:
Region-of-interest ROC analysis (Obuchowski et al.; Rutter): divide patient data into ROIs (e.g., quadrant or lobe)
Jackknife FROC (JAFROC): Chakraborty & Berbaum
Bootstrap FROC analysis: Samuelson and Petrick; Bornefalk and Hermansson

79 Next Talk
Evaluating CAD Algorithms: Further Statistical Issues

80 Extra Slides

81 ROC and Operating Point
It is possible to obtain both a rating/ranking and an action item within the same reader study; not necessarily just one or the other
Examples:
Determine whether the patient should have a workup
Rate the patient's level of suspicion
Rate the level of suspicion for individual lesions
Determine whether individual lesions require workup

82 Example from the Literature
Jiang et al., "Improving breast cancer diagnosis with computer-aided diagnosis," Academic Radiology 6(1):22-33, 1999.
The authors studied ROC curves, ROC areas, and the [Se, Sp] operating point
Characterization of microcalcifications
Quasi-continuous ratings & an action item



