CAD Panel Meeting General CAD Methods Nicholas Petrick, Ph.D.


CAD Panel Meeting: General CAD Methods. Nicholas Petrick, Ph.D., Deputy Director, Division of Imaging and Applied Math, OSEL. Radiological Devices Panel Meeting, March 4, 2008.

Outline: What is CAD; basic components of CAD algorithms; clinical implementations of CAD; evaluating CAD algorithms; non-clinical testing; clinical testing; basic statistical tools.

What is CAD?

What is CADe? CADe: computer-aided detection devices, also termed CAD. Designed to identify findings (or regions) on an image that may be abnormal. Prompting devices only.

What is CADx? CADx: computer-aided diagnosis, also termed CAD. Designed to process a specific finding (or region) to characterize the finding: likelihood of malignancy; recommended clinical action; description of the finding. Helps the physician determine what he/she is looking at. [Figure: example findings labeled 0.26, 0.77, 0.27]

What is a CAD? CAD encompasses many disciplines: statistics, pattern recognition, artificial intelligence, medicine, physics, biology, and image processing.

Basic Blocks in CADe Algorithms: image processing; segmentation; features/feature selection; classification. Sequencing and block details differ between CADe algorithms. [Diagram: Acquire Digital Data -> Image Processing -> Segmentation -> Features & Feature Selection -> Classification -> Annotation]

Acquire Data: digital data can come from digitized film or from direct digital devices (FFDM, CT, many others). [Figure label: Mass]

Image Processing: the image is enhanced or processed to facilitate analysis.

Segmentation: identify boundaries or regions within the image, such as lesion candidates and organs.

Features: characterize regions or pixels within a dataset (shape, texture, curvature, ...). Feature selection: the process for selecting informative features. [Figure: example features F1: Area, F2: Perim]
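As a rough illustration of the two example features named on this slide (F1: area, F2: perimeter), the sketch below computes both from a binary segmentation mask. The mask representation (a list of rows of 0/1), the function name `area_and_perimeter`, and the choice to measure perimeter as the count of exposed pixel edges are assumptions made for illustration, not details from the presentation.

```python
def area_and_perimeter(mask):
    """Return (area, perimeter) of the foreground in a binary mask.

    Area is the number of foreground pixels. Perimeter is approximated as
    the number of pixel edges that border background or the image boundary
    (an assumed, simple definition; real CAD systems may differ).
    """
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Check the four edge-sharing neighbours of this pixel.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = nr < 0 or nr >= rows or nc < 0 or nc >= cols
                if outside or not mask[nr][nc]:
                    perimeter += 1
    return area, perimeter


# Hypothetical 2x3 lesion candidate inside a 4x5 image.
candidate = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
f1_area, f2_perim = area_and_perimeter(candidate)
print(f1_area, f2_perim)
```

Features such as these would then be passed, possibly after feature selection, to the classification block.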

Classification: features are input to a learning algorithm and combined into an output score. Classifier types: multiple thresholds, LDA, neural network. The training/test paradigm is critical. [Diagram: features F1 ... FN -> Trained Learning Machine -> Object Score]

Classification (continued): a threshold is applied to the object scores.
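A minimal sketch of the two steps described on the classification slides: features are combined into an object score, and a threshold on that score decides which candidates are kept. It assumes a linear combination followed by a logistic function as the "trained learning machine"; the weights, bias, feature values, and threshold are all hypothetical, and the function names are illustrative only.

```python
import math


def object_score(features, weights, bias):
    """Combine a feature vector into a single score in (0, 1).

    Uses a linear combination followed by a logistic function; the
    weights and bias stand in for parameters fixed during training.
    """
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def prompt_indices(scores, threshold):
    """Return indices of candidates whose score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical trained parameters and candidate feature vectors (F1, F2).
weights = [2.0, -1.0]
bias = -0.5
candidates = [[0.1, 0.9], [1.2, 0.3], [0.4, 0.4]]

scores = [object_score(f, weights, bias) for f in candidates]
marks = prompt_indices(scores, threshold=0.5)
print(marks)
```

Lowering the threshold keeps more candidates, which tends to raise sensitivity at the cost of more false-positive marks.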

Annotation: CADe annotations are prompts of potential abnormalities.

Basic Blocks in a CADx Algorithm: characterization of a finding. Basic blocks are similar: image processing; features/feature selection; classification. Sequencing and block details differ between CADx algorithms. [Diagram: Identified Region -> Image Processing -> Features & Feature Selection -> Classification -> Annotation]

Training CADs: a process for systematically improving performance on a set of data known as the training set (e.g., maximize sensitivity, maximize area under the ROC curve). Training can be performed by computer (regression or optimization techniques) or by humans (tweaking parameters or combinations of parameters). The algorithm is fixed after training.

Training CADs, Learning Curve: training (learning) is a dynamic process. Increasing training data increases performance and decreases variability. [Figure: learning curve for a 3-feature linear classifier; ROC area vs. no. of patients per class]

Clinical Use of CAD

CAD Reading Paradigms, First reader: the physician reviews only regions or findings marked by the CAD device; unmarked regions are not necessarily evaluated by the physician. No radiological CAD device is approved/cleared for this mode. Discussion questions: M6, C7, L6.

CAD Reading Paradigms, Second reader: the physician first conducts a complete interpretation without CAD (unaided read), then re-conducts an interpretation with the CAD device (aided read). Also termed “second detector” or “sequential reader”. Examples: mammography CADs; some lung CADs.

CAD Reading Paradigms, Concurrent read: the physician performs a complete interpretation in the presence of CAD marks; CAD marks are available at any time. Example: some colon CAD devices are potentially used in this way.

CAD Factors Influencing Clinical Use: Physical characteristics of the mark: physicians may respond differently to different types of marks*. CAD standalone performance: number of CAD marks; knowledge of Se & FP rate may affect user confidence in or attention to CAD marks. Change in interpretation. Change in reading time: increase review time; maintain/decrease review time. *EA Krupinski et al., A Perceptually Based Method for Enhancing Pulmonary Nodule Recognition. 28(4) Investigative Radiology 289 (1993).

Evaluating CAD Algorithms: Non-clinical Evaluation

Non-Clinical Evaluation: device & algorithm descriptions; stability analysis.

Non-Clinical Evaluation, Algorithm description: different CAD devices contain different processing. Devices are easier to assess/compare if they are not “black boxes”. To understand a CAD, the following information is needed: patients targeted by the device; device usage (e.g., reading mode); image processing, segmentation, etc.; features, classifiers, etc.; training & training data, etc. Discussion question: G1.

Algorithm Stability: a stable algorithm shows similar performance with changes in algorithm, features, training, or training databases. Stability increases as the no. of training cases increases, the no. of initial features decreases, and the complexity of the CAD decreases. Discussion question: G1.

Why Stability Analysis? It indicates whether performance is due to a fortuitous training/test set. Algorithm updates produce evolving performance. [Figure: "More Stable" example showing training and CI. *Example only: not an actual device]

Why Stability Analysis? (continued) [Figure: "Less Stable" example showing training and CI. *Example only: not an actual device]

Evaluating CAD Algorithms: Clinical Testing

Hierarchical Model of Efficacy*: Level 1, technical efficacy: physical & bench tests. Level 2, diagnostic accuracy: Se/Sp, ROC curve, etc. Level 3, diagnostic thinking: effect on clinicians’ estimates of diagnostic probabilities, pretest to posttest. Level 4, therapeutic efficacy: effect on therapeutic management. Level 5, patient outcome: value in terms of quality-adjusted life years (QALYs), etc. Level 6, societal efficacy: overall societal benefit. *Fryback, Thornbury, “The efficacy of diagnostic imaging,” Med Decis Making 11:88–94, 1991.

Hierarchical Model of Efficacy* (continued): [Figure: the six levels listed above, marking the levels that imaging technology sponsors generally focus on when going through FDA.] Sponsors & FDA are not constrained to these levels. *Fryback, Thornbury, “The efficacy of diagnostic imaging,” Med Decis Making 11:88–94, 1991.

Classes of Tests: Standalone performance testing: performance of the device by itself; intrinsic functionality of the device. Reader performance testing: performance of physicians using the device; impact on physician performance.

Standalone Testing: performance of the device by itself. [Diagram blocks: Acquire Test Dataset; Establish Truthing Rule & Method; Establish Ground Truth; Apply CADe Device; Establish Scoring Rule & Method; Apply Scoring; Statistical Analysis] Discussion questions: M1, C3, L2.

Test Dataset: clinical images used to determine safety and effectiveness of a CAD. Different from the set used to train/develop or validate the CAD. Represents the target population & target disease condition. Usually includes a clinically relevant spectrum of patients, imaging hardware & protocols. Discussion questions: M1, M4, C3, C6, L2, L5.

Acquiring Test Dataset: Field test accrual: collection during real-time clinical interpretation. Enrichment accrual: (a) enrichment for low prevalence of disease: enrich with disease cases at a higher proportion than in the population; (b) enrichment for stress testing: enrich with cases containing challenging findings; stress testing usually includes a comparison modality.

Reuse of Test Data: Ideal testing paradigm: develop CAD algorithm; collect testing cases; apply CAD; report standalone and/or reader performance results. Discussion question: G2.

Reuse of Test Data: a sponsor may want to compare performance of a revised algorithm on the same or an expanded version of the test cases. The developer may have gained knowledge (learned) by knowing the performance of the original CAD on the test data. For larger datasets and minimal feedback, the knowledge gain may be quite small. It may be possible to reuse test data under appropriate constraints to streamline assessment. What may be appropriate constraints to balance data integrity & data collection? Discussion question: G2.

Ground Truth: ground truthing includes whether or not disease is present (patient level) and the location and/or extent of the disease (lesion level). Types of ground truthing: Cancerous lesions: biopsy/pathology (follow-up imaging for normals). Non-cancerous lesions: expert panel reviews all available clinical information. There may be others. Discussion questions: C2, L1.

Ground Truth by Expert Panel: experts are almost always required to determine lesion locations, and may also determine whether an abnormality is present. Experts are susceptible to reader variability. Multiple readers allow a measure of truth variability.

Ground Truthing: Mammography, patient level: pathology-verified cancer in left breast.

Ground Truthing: Mammography, lesion level: radiologist identifies the region of the lesion. [Figure: clinician-identified ROI]

Ground Truthing: Mammography, lesion level: radiologist segments the region.

Scoring Rules and Methods: used to determine whether CAD marks a true lesion. Overlap between CAD/truth. [Figure labels: Truth Segmentation, CAD Segmentation] Discussion questions: M1, C3, L2.

Scoring Rules and Methods: used to determine whether CAD marks a true lesion. Distance between CAD/truth centroids. Scoring by a physician. [Figure labels: Truth Centroid, CAD Centroid, Distance = 2.1 mm]
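The two automated scoring rules mentioned on these slides (overlap between CAD and truth regions, and distance between their centroids) can be sketched as follows. This is only an illustration: regions are assumed to be sets of (row, col) pixels, overlap is assumed to be measured as intersection over union, and the pixel size and the hit criteria (0.25 overlap, 2.0 mm) are hypothetical values, not rules from the presentation.

```python
import math


def overlap_fraction(cad_pixels, truth_pixels):
    """Intersection over union of two sets of (row, col) pixels."""
    cad, truth = set(cad_pixels), set(truth_pixels)
    union = cad | truth
    return len(cad & truth) / len(union) if union else 0.0


def centroid(pixels):
    """Mean (row, col) position of a non-empty pixel collection."""
    pts = list(pixels)
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)


def centroid_distance_mm(cad_pixels, truth_pixels, pixel_size_mm):
    """Distance between region centroids, converted to millimetres."""
    d_pixels = math.dist(centroid(cad_pixels), centroid(truth_pixels))
    return d_pixels * pixel_size_mm


# Hypothetical 4x4 truth region and a CAD region shifted down by 2 rows.
truth = {(r, c) for r in range(0, 4) for c in range(0, 4)}
cad = {(r, c) for r in range(2, 6) for c in range(0, 4)}

iou = overlap_fraction(cad, truth)
dist = centroid_distance_mm(cad, truth, pixel_size_mm=0.5)

# Assumed hit criteria; a real study would fix these in its scoring rule.
is_hit_by_overlap = iou >= 0.25
is_hit_by_distance = dist <= 2.0
print(iou, dist, is_hit_by_overlap, is_hit_by_distance)
```

Whichever rule is chosen, it determines which CAD marks count as true positives, so it directly affects the reported sensitivity and false-positive rate.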

Standalone Performance Measures: lesion-based sensitivity and number of FPs per image (or per scan) [Se, FPs/Image]; free-response receiver operating characteristic (FROC) curve. [Figure: FROC curve; TPF (sensitivity), 0.0 to 1.0, vs. no. of FPs per image, 0.0 to 5.0]
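To make the [Se, FPs/Image] pair and the FROC curve concrete, the sketch below sweeps a threshold over scored CAD marks and records lesion-based sensitivity against false positives per image at each threshold. The input format is an assumption: each mark is a `(score, lesion_id)` pair, with `lesion_id` set to `None` for a mark that hit no true lesion after scoring. The data are invented.

```python
def froc_points(marks, n_lesions, n_images):
    """Return a list of (FPs per image, lesion-based sensitivity) points.

    marks: list of (score, lesion_id) pairs pooled over all images, where
    lesion_id is None for a false-positive mark. A lesion counts as
    detected once, no matter how many marks hit it.
    """
    thresholds = sorted({score for score, _ in marks}, reverse=True)
    points = []
    for t in thresholds:
        kept = [(s, lesion) for s, lesion in marks if s >= t]
        detected = {lesion for _, lesion in kept if lesion is not None}
        false_positives = sum(1 for _, lesion in kept if lesion is None)
        points.append((false_positives / n_images, len(detected) / n_lesions))
    return points


# Hypothetical marks from 2 images containing 3 true lesions ("a", "b", "c").
marks = [(0.9, "a"), (0.8, None), (0.7, "b"), (0.4, None), (0.3, "a")]
curve = froc_points(marks, n_lesions=3, n_images=2)
for fps_per_image, sensitivity in curve:
    print(fps_per_image, round(sensitivity, 3))
```

Each point on the curve is one candidate operating point; a device is typically fixed at a single threshold, which gives the single [Se, FPs/Image] pair.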

Evaluating CAD Algorithms: Clinical Testing, Reader Performance Testing

Reader Performance Testing: performance of physicians using the device. [Diagram blocks: Identify Study Readers; Acquire Test Dataset; Establish Truthing Rule & Method; Establish Ground Truth; Apply CADe; Read w/o CADe; Read w CADe; Establish Scoring Rule & Method; Apply Scoring (for each read); Statistical Analysis] Discussion questions: M2, C4, L3.

Reader Selection: readers are generally selected to be representative of intended users: representative of the clinicians who will use the device and of the proper clinician experience level. Reader performance testing depends on proper understanding & use of the CAD device and on proper understanding & implementation of the study protocol. Training of readers is key to achieving both.

Designing Reader Studies: common endpoints; common CAD study designs.

Evaluating CAD Algorithms: Clinical Testing, Study Endpoints

Study Endpoints: Patient analysis: [Sensitivity, Specificity]; ROC analysis. Location-specific analysis: location-specific ROC; free-response ROC (FROC). Discussion questions: M2, C4, L3.

Patient Endpoints: used when assessing CADx, and when assessing CADe without accounting for location. [Figure: identified region with CADx aid; CADx POM: 0.91; clinician POM: 0.95]

Patient-Based Endpoints: patient analysis does not account for localizing the lesion. Endpoints: Binary decision (single threshold): [Sensitivity, Specificity] ([Se, Sp]) operating point. Rating/ranking (range of thresholds): receiver operating characteristic (ROC) curve.

Se/Sp Operating Point: [Se, Sp] operating point. Comparing without/with CAD: often higher Se, lower Sp. Many other possible endpoints. [Figure: operating points for reader alone and reader+CAD; True Positive Fraction = Sensitivity vs. False Positive Fraction = 1.0 - Specificity, each from 0.0 to 1.0]
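A small sketch of how a patient-level [Se, Sp] operating point could be computed for the two reading conditions. The truth labels and reader decisions are invented for illustration; they were chosen only so that the aided read shows the pattern described on the slide (higher Se, lower Sp).

```python
def sensitivity_specificity(truth, decisions):
    """Return (Se, Sp) from parallel lists of booleans.

    truth[i] is True for a diseased patient; decisions[i] is True when the
    reader called the patient positive (e.g., recommended work-up).
    """
    tp = sum(1 for t, d in zip(truth, decisions) if t and d)
    fn = sum(1 for t, d in zip(truth, decisions) if t and not d)
    tn = sum(1 for t, d in zip(truth, decisions) if not t and not d)
    fp = sum(1 for t, d in zip(truth, decisions) if not t and d)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical study: 4 diseased patients followed by 6 non-diseased.
truth = [True] * 4 + [False] * 6
reader_alone = [True, True, False, False,
                False, False, False, False, False, True]
reader_with_cad = [True, True, True, False,
                   False, False, False, False, True, True]

se_alone, sp_alone = sensitivity_specificity(truth, reader_alone)
se_cad, sp_cad = sensitivity_specificity(truth, reader_with_cad)
print(se_alone, sp_alone)
print(se_cad, sp_cad)
```

Because one condition wins on Se and the other on Sp, the two operating points are not directly ordered, which is one motivation for the ROC analysis on the following slides.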

ROC Assessment: [Figure: non-diseased cases and diseased cases plotted against computer score]

Single Operating Point: [Figure: a single threshold applied to the non-diseased and diseased cases gives a single operating point; TPF (sensitivity) vs. FPF (1 - specificity)]

Entire ROC Curve: [Figure: a range of thresholds gives the entire ROC curve; TPF (sensitivity) vs. FPF (1 - specificity)]

ROC Analysis: comparing without/with CAD: often higher Se, lower Sp. ROC can facilitate the comparison. Requires ordering cases from least to most suspicious; ratings are often used to facilitate ordering. [Figure: ROC curves for reader alone and reader+CAD; True Positive Fraction = Sensitivity vs. False Positive Fraction = 1.0 - Specificity, each from 0.0 to 1.0]

ROC Analysis, Performance metrics: ROC area (AUC): average TPF across all possible FPFs. Partial area under the curve (PAUC). It is a challenge to link AUC measures to clinical relevance. [Figure: two ROC curves with areas AUC1 and AUC2; True Positive Fraction = Sensitivity vs. False Positive Fraction = 1.0 - Specificity, each from 0.0 to 1.0]
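The sketch below builds an empirical ROC curve by sweeping a threshold over case ratings and computes the ROC area two ways: by the trapezoidal rule over the curve, and as the fraction of diseased/non-diseased pairs that are correctly ordered (ties counted as one half). The ratings are invented, and the function names are illustrative; this is a plain empirical estimate, not any particular fitted-ROC method.

```python
def roc_points(diseased, nondiseased):
    """Empirical ROC points (FPF, TPF), starting at (0, 0)."""
    thresholds = sorted(set(diseased) | set(nondiseased), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(s >= t for s in diseased) / len(diseased)
        fpf = sum(s >= t for s in nondiseased) / len(nondiseased)
        points.append((fpf, tpf))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_auc(diseased, nondiseased):
    """Fraction of (diseased, non-diseased) pairs ranked correctly."""
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))


# Hypothetical suspicion ratings for 3 diseased and 3 non-diseased cases.
diseased = [0.9, 0.8, 0.4]
nondiseased = [0.7, 0.3, 0.2]

curve = roc_points(diseased, nondiseased)
auc_trapezoid = trapezoid_auc(curve)
auc_rank = rank_auc(diseased, nondiseased)
print(auc_trapezoid, auc_rank)
```

The two estimates agree for empirical curves, which reflects the interpretation of AUC as the probability that a randomly chosen diseased case is rated more suspicious than a randomly chosen non-diseased case.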

Location-Specific Endpoints: used when assessing a CADe device. Location is important. There may be multiple prompts on the same image. The truthing rule is now a critical component.

Location-Specific ROC: ROC analysis that requires correct location of the lesion. One scored location per patient; the location must be on the lesion.

Location-Based Operating Points: [Se, No. FPs] operating point. Comparing without/with CAD: often higher Se along with more FPs. Many other possible endpoints. [Figure: TPF (sensitivity), 0.0 to 1.0, vs. no. of FPs per image, 0.0 to 5.0]

Free-Response ROC: all [Se, No. FPs] combinations across all thresholds. [Figure: FROC curve; TPF (sensitivity), 0.0 to 1.0, vs. no. of FPs per image, 0.0 to 5.0]

FROC Performance Metrics: area under the FROC curve (need to choose an FP range); area under the alternative FROC (AFROC). Challenges: linking measures to clinical relevance; statistical methodology. [Figure: FROC curve; TPF (sensitivity), 0.0 to 1.0, vs. no. of FPs per image, 0.0 to 5.0]
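Because the FROC x-axis is unbounded, an area under the FROC curve only makes sense once an FP range is chosen. The sketch below integrates a piecewise-linear FROC curve from 0 up to a chosen `fp_max` false positives per image. The interpolation at the cut-off, the flat extension when the curve ends early, and the example points are all assumptions made for illustration; this is not the AFROC or JAFROC figure of merit.

```python
def froc_area(points, fp_max):
    """Area under a FROC curve between 0 and fp_max FPs per image.

    points: (FPs per image, sensitivity) pairs. The curve is treated as
    piecewise linear from (0, 0); it is cut at fp_max by linear
    interpolation and extended flat if it ends before fp_max.
    """
    pts = [(0.0, 0.0)] + sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= fp_max:
            break
        if x1 > fp_max:
            # Interpolate the sensitivity at the cut-off.
            y1 = y0 + (y1 - y0) * (fp_max - x0) / (x1 - x0)
            x1 = fp_max
        area += (x1 - x0) * (y0 + y1) / 2.0
    last_x, last_y = pts[-1]
    if last_x < fp_max:
        area += (fp_max - last_x) * last_y
    return area


# Hypothetical FROC points: (FPs per image, sensitivity).
example_curve = [(0.0, 0.5), (1.0, 0.5), (1.0, 1.0), (2.0, 1.0)]

area_to_2 = froc_area(example_curve, fp_max=2.0)
area_to_1_5 = froc_area(example_curve, fp_max=1.5)
area_to_3 = froc_area(example_curve, fp_max=3.0)
print(area_to_2, area_to_1_5, area_to_3)
```

The result changes with `fp_max`, which illustrates why the FP range has to be fixed in advance and justified clinically; dividing by `fp_max` gives an average sensitivity over the chosen range.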

Evaluating CAD Algorithms: Clinical Testing, Reader Study Designs

Reader Performance Study Designs: prospective studies; retrospective studies. Some CAD study designs: Warren-Burhenne; MRMC. Discussion questions: M4, C6, L5.

Prospective Reader Studies: CAD performance is measured as part of actual clinical practice; field testing of CAD devices.

Retrospective Reader Studies: cases are collected prior to image interpretation. Typically an enriched or stress-test dataset is used. Read offline by one or more readers under specific reading conditions. CAD examples: mammography CAD devices; lung nodule CAD devices.

Warren-Burhenne Study Design*: two separate studies. (1) A retrospective study of CAD Se to detect abnormalities “missed” in clinical practice, giving an estimated relative reduction in the false negative (FN) rate with CAD. (2) Commonly a prospective study of the work-up rate of readers with & without CAD in clinical practice; the difference in work-up rate is attributed to use of CAD. Study design used in early mammography CAD approvals. *Warren Burhenne et al., Radiology 215:554-562, 2000.

Warren-Burhenne Study Design: the fundamental limitation is that the reduction in FN rate & the increase in work-up rate are not evaluated in the same study. The study design can be difficult to interpret statistically. Its goal is to estimate the “potential” effect on the FN rate.

Multiple Reader Multiple Case (MRMC) Study Design: a study where a set of readers interprets a set of patient images in each of two competing reading conditions (with and without CAD). Could be either prospective or retrospective. Fully-crossed design: all readers read all of the cases in both modalities; most statistical power for a given number of cases. Hybrid designs are also evaluable.

MRMC Study Design, Advantages: generalizes to new readers & cases (cases are random effects; readers are random effects). Greater statistical power for a given number of cases. MRMC studies can accommodate [Se, Sp] endpoints, ROC endpoints, and FROC endpoints. MRMC studies are generally statistically interpretable.

Patient-Based MRMC Analysis: includes [Se, Sp] or ROC endpoints. Well-established methodologies & tools: jackknife/ANOVA (Dorfman, Berbaum, and Metz); ANOVA and correlation model (Obuchowski); ordinal regression (Toledano and Gatsonis); bootstrap (Beiden, Wagner, and Campbell); one-shot estimate (Gallas).

Location-Based MRMC Analysis: accounts for correct localization of lesions. Statistical methodologies & tools are available: region-of-interest ROC analysis (Obuchowski et al.; Rutter), which divides patient data into ROIs (e.g., quadrant or lobe); jackknife FROC (JAFROC) (Chakraborty & Berbaum); bootstrap FROC analysis (Samuelson and Petrick; Bornefalk and Hermansson).

Next Talk: Evaluating CAD Algorithms, Further Statistical Issues

Extra Slides

ROC and Operating Point: it is possible to obtain both a rating/ranking and an action item within the same reader study; it is not necessarily just one or the other. Examples: determine if the patient should have work-up; rate the patient-level suspicion; rate the level of suspicion for individual lesions; determine if individual lesions require work-up.

Example from literature: Jiang et al., “Improving breast cancer diagnosis with computer-aided diagnosis,” Academic Radiology 6(1):22-33, 1999. The authors studied ROC curves, ROC areas, and the [Se, Sp] operating point. Task: characterization of microcalcifications. Quasi-continuous ratings & action item.