
1 FDA Radiological Devices Panel Meeting, March 4-5, 2008
Mammography CAD Devices
Robert C. Smith, MD, JD
Medical Officer (Radiologist)
Division of Reproductive, Abdominal, and Radiological Devices
Office of Device Evaluation
Center for Devices & Radiological Health

2 Mammography CAD Devices
The primary purpose of mammography computer-aided detection (CAD) devices is to reduce errors in the interpretation of screening mammograms.
Screening mammograms are performed to identify patients with breast cancer.

3 Breast Cancer
Excluding skin cancers, breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer-related death among women in the United States.

4 Screening for Breast Cancer
There are 13,610 accredited mammography machines in the U.S., performing approximately 36 million mammography procedures annually.
Approximately 80% of all mammography examinations are performed for screening.

5 Screening for Breast Cancer
More than 500,000 women have participated in randomized trials investigating the effect of screening mammography.
A 20–30% reduction in breast cancer mortality was demonstrated in women aged 50 years and older.

6 Patient Characteristics in a Screening Population
The clinical, mammographic, and pathologic characteristics of patients who undergo screening mammography in the United States are well known from large published clinical trials and publicly available databases.

7 Patient Characteristics in a Screening Population
The largest publicly available database is that of the Breast Cancer Surveillance Consortium (BCSC).
It contains information on 6 million mammograms from more than 2 million women, including 74,000 breast cancers.

8 Patient and Cancer Characteristics in a Screening Population
The relevant patient characteristics include:
- Cancer size
- Breast density
- Finding type
- Histologic type
- Palpability

9 Cancer Size
Approximate cumulative distribution of cancer size on screening mammography:
- 35% are < 10 mm
- 60% are < 15 mm
- 75% are < 20 mm
Larger cancers are more readily identified and characterized on mammography.
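Because the percentages above are cumulative, the share of cancers in each size band is the difference between consecutive thresholds. A minimal Python sketch using only the approximate figures on this slide; the names `CUMULATIVE` and `interval_fractions` are hypothetical:

```python
# Approximate cumulative distribution from the slide:
# (size threshold in mm, fraction of screen-detected cancers smaller than it)
CUMULATIVE = [(10, 0.35), (15, 0.60), (20, 0.75)]


def interval_fractions(cumulative):
    """Turn 'fraction below threshold' pairs into per-interval fractions."""
    bins = {}
    prev_thr, prev_frac = None, 0.0
    for thr, frac in cumulative:
        label = f"< {thr} mm" if prev_thr is None else f"{prev_thr}-{thr} mm"
        bins[label] = round(frac - prev_frac, 2)
        prev_thr, prev_frac = thr, frac
    # Whatever remains is at or above the last threshold.
    bins[f">= {prev_thr} mm"] = round(1.0 - prev_frac, 2)
    return bins


print(interval_fractions(CUMULATIVE))
# {'< 10 mm': 0.35, '10-15 mm': 0.25, '15-20 mm': 0.15, '>= 20 mm': 0.25}
```

Read this way, roughly a quarter of screen-detected cancers are 20 mm or larger.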

10 Breast Density
- 10% almost entirely fatty
- 40% scattered fibroglandular densities
- 40% heterogeneously dense
- 10% extremely dense
Greater breast density is associated with lower sensitivity for breast cancer detection and with a higher incidence of interval breast cancer, i.e., cancer that develops following a negative mammogram.

11 Finding Type
- 30-40% masses
- 30-40% microcalcifications
- 10-20% combination of mass and microcalcifications
- 10-20% architectural distortion or focal asymmetry

12 Histologic Type
- 70-80% are invasive cancers
- 20-30% are ductal carcinoma in situ (DCIS)

13 Palpability
By definition, patients who undergo screening mammography are asymptomatic.
However, approximately 2-5% will retrospectively be shown to have had symptoms.

14 Types of Mammography Devices
There are two types of mammography devices:
- Screen-film
- Digital:
  - DR (digital radiography, using a direct or indirect flat-panel detector)
  - CR (computed radiography, using a photostimulable phosphor)

15 Mammographic Projections
There are two standard projections of each breast:
- Craniocaudal (CC) view
- Mediolateral oblique (MLO) view

16 Mammography Interpretation
Breast cancer is detected on the basis of four types of mammographic findings:
- characteristic morphology of a mass
- shape and spatial configuration of microcalcifications
- distortion of breast tissue architecture
- asymmetry between the left and right breasts

17 MQSA
Mammography is unique among imaging tests in that it must be performed (and even interpreted) in accordance with the Mammography Quality Standards Act (MQSA).
MQSA does not apply to mammography CAD devices.

18 Interpretation
The CC and MLO projections of each breast are considered complementary, and both are necessary for interpretation.
Mammography studies are always interpreted by examining the CC views of both breasts side by side, and likewise for the MLO views.

19 Interpretation
When a finding is identified on a single view (either CC or MLO), the corresponding region on the complementary view is examined to confirm the three-dimensionality of the finding.
Comparison should always be made with prior mammograms when these are available.

20 Reporting Mammographic Examinations
Mammographic characteristics and findings are reported according to the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) Atlas.
BI-RADS is meant to standardize the language and descriptions used in mammography reports.

21 Reporting Location
A finding should always be triangulated so that its three-dimensional location within the breast is known.

22 Reporting Location

23 BI-RADS Final Assessment Categories
The ACR developed and standardized the final diagnostic assessment into 7 categories, which correspond to the reporting requirements in MQSA:

24 Final Assessment Categories
- Category 0 (Need Additional Imaging Evaluation and/or Prior Mammograms)
- Category 1 (Negative)
- Category 2 (Benign Finding(s))
- Category 3 (Probably Benign Finding–Initial Short-Interval Follow-Up Suggested): “A finding placed in this category should have less than a 2% risk of malignancy;” “it is inadvisable to render such an assessment when interpreting a screening examination”

25 Final Assessment Categories
- Category 4 (Suspicious Abnormality–Biopsy Should Be Considered)
  - 4A: low suspicion for malignancy
  - 4B: intermediate suspicion of malignancy
  - 4C: moderate concern for malignancy; a malignant result in this category is expected

26 Final Assessment Categories
- Category 5 (Highly Suggestive of Malignancy): these lesions have a high probability (≥95%) of being cancer
- Category 6 (Known Biopsy-Proven Malignancy)
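For a study database or a reader-study case report form, the seven categories above map naturally onto an integer enumeration. The sketch below is hypothetical: the member names are our own shorthand, not ACR terminology, and the descriptions are copied from slides 24-26.

```python
from enum import IntEnum


class BiRads(IntEnum):
    """BI-RADS final assessment categories as listed on slides 24-26.

    Integer values match the category numbers. They are labels, not a risk
    scale: category 0 is an incomplete assessment, not the lowest risk.
    """
    INCOMPLETE = 0         # Need additional imaging evaluation and/or prior mammograms
    NEGATIVE = 1
    BENIGN = 2
    PROBABLY_BENIGN = 3    # < 2% risk of malignancy; short-interval follow-up
    SUSPICIOUS = 4         # Biopsy should be considered; subdivided 4A/4B/4C
    HIGHLY_SUGGESTIVE = 5  # >= 95% probability of malignancy
    KNOWN_MALIGNANCY = 6   # Known biopsy-proven malignancy


# Category 4 subdivisions, as worded on slide 25.
SUBCATEGORY_4 = {
    "4A": "low suspicion for malignancy",
    "4B": "intermediate suspicion of malignancy",
    "4C": "moderate concern; a malignant result is expected",
}

print(len(BiRads), BiRads(3).name)  # 7 PROBABLY_BENIGN
```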

27 Performance Measures of Mammographers
There is great variability in the published literature:
- Sensitivity ranges from 60% to 100%
- Specificity ranges from 35% to 98%
The BCSC database shows an average sensitivity of 79% and specificity of 90%.
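Sensitivity and specificity are simple ratios over a 2x2 table of reader calls versus ground truth. A minimal sketch; the function names are illustrative, and the counts in the example are hypothetical numbers chosen only to reproduce the BCSC averages quoted above:

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of cancers that the reader called positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of cancer-free exams that the reader called negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 100 cancers (79 called positive) and
# 10,000 cancer-free exams (1,000 called positive).
print(sensitivity(79, 21), specificity(9_000, 1_000))  # 0.79 0.9
```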

28 Performance Measures of Mammographers
Mammographic sensitivity is lowest in patients with dense breasts and for small masses.

29 Radiologist Errors on Mammography Exams
Given that more than 99% of patients who undergo screening mammography do not have cancer, the mammographer’s task is to find a “needle in a haystack.”
In a clinical setting, a practicing radiologist may therefore perform detection and analysis very rapidly (i.e., with both tasks performed almost simultaneously).

30 Radiologist Errors on Mammography Exams
Cancers visible on a mammogram may draw the radiologist’s attention and then be dismissed, with or without formal description in the radiology report.
Such “missed” cancers are not errors of detection but errors of analysis.

31 Radiologist Errors on Mammography Exams
Any device designed to reduce radiologist errors should focus on the types of cancers that radiologists tend to miss.
However, it might also be beneficial to detect cancers that are missed simply because the radiologist is “asleep at the wheel.”
Either way, increased detection should always be weighed against increased false positives.

32 False Positive Mammograms
False-positive (FP) mammograms greatly outnumber the breast cancers actually found: approximately 10% of patients who undergo screening mammography are recalled for diagnostic mammography, whereas only approximately 0.4% of patients who undergo screening mammography have breast cancer.
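The imbalance can be made concrete with a simple expected-value calculation. This sketch combines the 0.4% prevalence and 10% recall rate from this slide with the 79% BCSC average sensitivity from slide 27; the cohort size of 10,000 and the function name `screening_outcomes` are hypothetical, and the model ignores variation between readers and populations.

```python
def screening_outcomes(n_screened, prevalence, sensitivity, recall_rate):
    """Expected counts for one screening round (simple expected-value model)."""
    cancers = n_screened * prevalence
    detected = cancers * sensitivity          # true-positive recalls
    recalled = n_screened * recall_rate       # all recalls
    false_positive_recalls = recalled - detected
    return {
        "cancers": cancers,
        "detected": detected,
        "false_positive_recalls": false_positive_recalls,
        "ppv_of_recall": detected / recalled,
    }


# 10,000 women; 0.4% prevalence and 10% recall (this slide);
# 79% sensitivity (BCSC average, slide 27).
print(screening_outcomes(10_000, 0.004, 0.79, 0.10))
```

Under these assumptions about 968 women are recalled without cancer against about 32 cancers found, i.e., roughly 30 false-positive recalls per detected cancer.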

33 False Positive Mammograms
About 50% of all women will have at least one false-positive mammogram over 10 years of screening.
False-positive mammograms can cause:
- increased radiation dose
- biopsy
- complications associated with biopsy
- unnecessary anxiety
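How a modest per-exam false-positive rate accumulates to about 50% over a decade can be shown with the complement rule. This is a deliberately simplified sketch that assumes (hypothetically, not from the slide) 10 annual screens with independent outcomes and a constant per-exam false-positive probability; real risk varies between women and over time.

```python
def cumulative_fp(per_exam_fp: float, n_exams: int) -> float:
    """P(at least one false positive) if each exam is independent."""
    return 1 - (1 - per_exam_fp) ** n_exams


def per_exam_fp_for(cumulative: float, n_exams: int) -> float:
    """Per-exam false-positive probability implied by a cumulative one."""
    return 1 - (1 - cumulative) ** (1 / n_exams)


p = per_exam_fp_for(0.50, 10)   # 10 annual screens, 50% cumulative
print(round(p, 3))              # 0.067
```

Under these assumptions, a per-exam false-positive probability of only about 6.7% is enough to give half of all women at least one false-positive mammogram in 10 years.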

34 False Negative Mammograms
Approximately 20% of cancers are missed on screening mammography.
Of these missed cancers (expressed as percentages of all cancers), approximately:
- 10% are visible
- 10% are not visible
References in briefing package.

35 False Negative Mammograms
Therefore, there is room for improvement: up to approximately an additional 10% of cancers, which are visible but go undetected or are misclassified on screening mammography, could be captured.

36 False Negative Mammograms
Of the missed cancers that are visible on mammography (10% of all cancers), approximately:
- 5% result from errors of detection
- 5% result from errors of analysis
References in briefing package.
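The nested percentages on slides 34-36 are easiest to follow as counts in a cohort. This sketch reads every percentage as a share of all cancers, which is how slides 35 and 42 use them; the cohort of 1,000 cancers and the names below are hypothetical.

```python
# Approximate shares of ALL breast cancers, in percent (slides 34-36).
MISSED = 20              # missed on screening mammography
VISIBLE_MISSED = 10      # visible in retrospect
NOT_VISIBLE = 10         # not visible even in retrospect
DETECTION_ERRORS = 5     # visible, never noticed
ANALYSIS_ERRORS = 5      # visible, noticed, but dismissed

# The slides' breakdown is internally consistent.
assert MISSED == VISIBLE_MISSED + NOT_VISIBLE
assert VISIBLE_MISSED == DETECTION_ERRORS + ANALYSIS_ERRORS


def breakdown(total_cancers: int) -> dict:
    """Expected counts for a cohort of `total_cancers` cancers."""
    def share(pct: int) -> float:
        return total_cancers * pct / 100

    return {
        "detected at screening": share(100 - MISSED),
        "missed, not visible": share(NOT_VISIBLE),
        "missed, detection error": share(DETECTION_ERRORS),
        "missed, analysis error": share(ANALYSIS_ERRORS),
    }


print(breakdown(1_000))
```

Of every 1,000 cancers, about 50 fall in the detection-error group, the group that current CAD devices target (slide 42).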

37 False Negative Mammograms
Compared with cancers that are not missed, the missed cancers that are visible on mammography (the 10%):
- are mostly masses, architectural distortion, or focal asymmetry
- are smaller
- occur in patients with denser breasts

38 Reducing Radiologist Interpretation Errors
How can we reduce radiologist errors when interpreting mammograms?

39 Reducing Radiologist Interpretation Errors
Double reading of screening mammograms (i.e., reading by two radiologists) has been advocated as a way to increase radiologist sensitivity.
Clinical studies have shown that double reading improves cancer detection by 5% to 15%, but typically with an associated 5-10% increase in recall rate unless consensus double reading is used.

40 Reducing Radiologist Interpretation Errors
Double reading of screening mammograms can capture both errors of detection and errors of analysis.
The published literature shows that double reading can capture a substantial portion of the 10% of cancers that are visible but currently go undiagnosed.

41 Origin of Mammography CAD Devices
CAD devices have been developed as a potential “replacement” for one of the two readers in double reading.

42 Origin of Mammography CAD Devices
The intended use of current commercially available mammography CAD devices is to reduce errors of detection.
That is, these devices attempt to capture the approximately 5% of cancers that are visible but not detected.

43 Origin of Mammography CAD Devices
The potential for improved detection should always be weighed against the potential for false-positive interpretations by radiologists using CAD devices.

44 False Positive CAD Marks
Cancer incidence at screening is 4 per 1,000 patients.
CAD devices place at least 2 marks per patient.

45 False Positive CAD Marks
Even assuming 100% CAD sensitivity, with marks placed on both views of each cancer, there will be at least 249 false-positive marks for every true-positive mark.
It is therefore important to measure how easy or difficult it is for radiologists to dismiss false-positive CAD marks.
It is also important to measure the extent to which false-positive CAD marks distract the radiologist from other findings that may not be marked.
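The 249:1 ratio follows from counting marks in 1,000 screens. The sketch reproduces it under the slide's assumptions: 4 cancers per 1,000 patients, 100% CAD sensitivity with each cancer marked on both the CC and MLO views, and exactly 2 marks per patient (the slide says "at least", so 249 is a lower bound). The function name is hypothetical. The second call uses 4 marks per patient, roughly the upper end of the range reported on slide 60.

```python
from fractions import Fraction


def fp_marks_per_tp_mark(cancers_per_1000=4, marks_per_patient=2,
                         views_marked_per_cancer=2):
    """False-positive CAD marks per true-positive mark in 1,000 screens,
    assuming 100% CAD sensitivity and a fixed number of marks per patient."""
    total_marks = 1000 * marks_per_patient                    # every mark placed
    true_marks = cancers_per_1000 * views_marked_per_cancer   # CC + MLO per cancer
    false_marks = total_marks - true_marks
    return Fraction(false_marks, true_marks)


print(fp_marks_per_tp_mark())                      # 249
print(fp_marks_per_tp_mark(marks_per_patient=4))   # 499
```

Using `Fraction` keeps the ratio exact: (2,000 - 8) / 8 = 249.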

46 Mammography CAD Devices Approved by FDA
Four mammography CAD systems have been approved through premarket approval (PMA) applications (one approval order in 1998, two in 2002, and one in 2004).
The ‘first-of-a-kind’ device was the subject of a Radiological Devices Advisory Panel meeting held on May 11, 1998.

47 Mammography CAD Devices Approved by FDA
All devices were first approved for use with digitized versions of screen-film mammograms obtained for screening purposes.

48 Mammography CAD Devices Approved by FDA
Further PMA supplements were approved over time to expand the use of CAD devices to digitized diagnostic screen-film mammograms and to mammograms obtained from full-field digital mammography (FFDM) devices.
Supplements have also been approved for modified software versions of CAD algorithms.

49 Mammography CAD Devices Approved by FDA
The labeling of approved mammography CAD devices includes an indications for use (IFU) statement similar to the following:
“… intended to identify and mark regions of interest on routine screening and [the CC and MLO views of] diagnostic mammograms to bring them to the attention of the radiologist after the initial reading has been completed. … [and to] … assist the radiologist in minimizing observational oversights by identifying areas on the original mammogram that may warrant a second review”

50 Data and Information Provided for Original Approval of Mammography CAD Devices
At the time the first mammography CAD device was approved in 1998, there was limited experience with its use by radiologists in clinical practice.

51 Original Approval
The data that served as the basis for approval of the currently approved mammography CAD devices included four components:

52 Original Approval
1. Standalone performance on “missed” cancers
“Missed” breast cancers were identified by obtaining “prior” mammograms from patients with newly diagnosed cancer (i.e., patients with interval cancers) and determining whether the cancers were “visible” in retrospect and should have led to clinical action.

53 Original Approval
1. Standalone performance on “missed” cancers (continued)
This was considered a surrogate for the ability of the device to detect “difficult” findings.
It was used to estimate the maximum potential reduction in detection errors if radiologists used the device in practice.

54 Original Approval
2. Standalone performance on cancers detected at screening mammography
This was considered a measure of the ability of the device to detect more obvious findings and findings of intermediate difficulty.

55 Original Approval
3. Standalone performance on normal screening mammograms, to determine the rate of false-positive CAD marks on normal cases
4. Screening exams (with or without enrichment with some cancers), used to determine the potential increase in recall rate resulting from use of the CAD devices

56 What Has Been Learned Since Original Approval?
There is a large body of published literature on the standalone performance of mammography CAD devices.

57 What Has Been Learned Since Original Approval?
There is also a large body of published literature on reader performance testing of mammography CAD devices, in which radiologist performance is measured both without and with use of the CAD device.
These studies employed two general designs: retrospective clinical performance testing and prospective clinical performance testing.

58 What Has Been Learned Since Original Approval?
Retrospective clinical studies use radiologist interpretations that are not part of actual clinical practice.
Prospective clinical studies use radiologist interpretations that are part of actual clinical practice.
The prospective studies use either sequential or historical control designs.

59 What Has Been Learned Since Original Approval?
The sequential design presents the radiologist with an image without CAD information and requires an interpretation; it then presents the same image with CAD marks and allows the radiologist to modify the assessment.
The historical control design compares radiologist performance over a period of time before CAD introduction with radiologist performance over a period of time after CAD introduction.

60 Key Points From the Published Literature
Standalone testing has shown:
- very high sensitivity for marking calcifications
- much lower sensitivity for marking masses, architectural distortion, or focal asymmetry
- a FP mark rate of between 2 and 4 marks per patient
Discussion questions: M7

61 Key Points From the Published Literature
Reader performance testing has shown:
- conflicting results for detection of invasive cancers
- a trend toward CAD improving radiologist detection of calcifications, especially DCIS
- an increased recall rate when using CAD devices; in some studies, these increases are statistically significant
Discussion questions: M7

62 Clinical Testing Issues Specific to Mammography CAD

63 Ground Truth
Ground truth is crucial for both standalone and reader performance testing.
Ground truth includes:
- whether or not the patient has a breast finding
- whether or not the patient has one or more benign and/or malignant findings
- the precise location and extent of each finding on each view
- the BI-RADS descriptors and final assessment of each finding
Discussion questions: M1

64 Ground Truth Definition
- Ground truth for cancer is determined by biopsy or surgery.
- Ground truth for benign findings is determined by biopsy/surgery OR by a one-year follow-up mammogram.
- Ground truth for normal cases is determined by a one-year follow-up mammogram.
Discussion questions: M1
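The three ground-truth rules above can be written as a small decision function, for example when labeling cases for a test dataset. This is a hypothetical sketch: the parameter names and the extra `UNRESOLVED` label (for cases that satisfy none of the rules) are our own additions, not part of the slide.

```python
from enum import Enum
from typing import Optional


class Truth(Enum):
    CANCER = "cancer"
    BENIGN = "benign"
    NORMAL = "normal"
    UNRESOLVED = "unresolved"   # not enough evidence to assign a label


def ground_truth(has_finding: bool,
                 pathology: Optional[str],
                 negative_at_one_year: Optional[bool]) -> Truth:
    """Apply the slide-64 rules.

    pathology: "malignant", "benign", or None if no biopsy/surgery was done.
    negative_at_one_year: True if the one-year follow-up mammogram showed no
    evidence of cancer, False if it did, None if no follow-up is available.
    """
    if pathology == "malignant":
        return Truth.CANCER                  # cancer: biopsy or surgery only
    if has_finding:
        if pathology == "benign" or negative_at_one_year:
            return Truth.BENIGN              # benign: pathology OR follow-up
        return Truth.UNRESOLVED
    if negative_at_one_year:
        return Truth.NORMAL                  # normal: one-year follow-up
    return Truth.UNRESOLVED


print(ground_truth(True, None, True))   # Truth.BENIGN
```

Note that a suspicious follow-up without pathology stays `UNRESOLVED`, because the slide allows only biopsy or surgery to establish cancer.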

65 Ground Truth Definition
Ground truth for the location and extent (i.e., lesion boundary) of a finding is determined by a panel of experts and can be annotated either manually or digitally on the image.
Discussion questions: M1

66 Standalone Performance Testing
Standalone performance is highly dependent on case selection, including:
Discussion questions: M1, M4

67 Standalone Performance Testing
The precise mammographic characteristics:
- finding size
- pathologic type
- number of masses versus microcalcifications
- breast density
Discussion questions: M1, M4

68 Standalone Performance Testing
Standalone performance also depends on:
- the precise method of ground truth determination for the location and extent of disease
- the precise scoring metric (e.g., per lesion, per CAD mark, per patient, or per view)
- the precise scoring methodology (e.g., applying overlap criteria to the displayed CAD mark, or to the region actually identified by the CAD system, i.e., the algorithm segmentation, which may not be displayed to the user)
Discussion questions: M1, M4
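To show why the scoring methodology matters, the sketch below scores the same CAD mark against the same annotated lesion under two overlap criteria: "mark center inside the lesion boundary" and "intersection over union (IoU) of at least 0.5". Both criteria, the 0.5 threshold, the rectangular boxes (real lesion boundaries are irregular), and all names are hypothetical illustrations, not methods taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def center_inside(mark: Box, lesion: Box) -> bool:
    cx, cy = (mark.x0 + mark.x1) / 2, (mark.y0 + mark.y1) / 2
    return lesion.x0 <= cx <= lesion.x1 and lesion.y0 <= cy <= lesion.y1


def iou_at_least_half(mark: Box, lesion: Box) -> bool:
    return iou(mark, lesion) >= 0.5


def score_view(marks, lesions, hit):
    """Return (lesions hit, false-positive marks) for one view."""
    lesions_hit = sum(1 for les in lesions if any(hit(m, les) for m in marks))
    fp_marks = sum(1 for m in marks if not any(hit(m, les) for les in lesions))
    return lesions_hit, fp_marks


lesion = Box(10, 10, 20, 20)    # expert-annotated boundary (hypothetical)
mark = Box(5, 5, 25, 25)        # larger CAD region centered on the lesion
print(score_view([mark], [lesion], center_inside))       # (1, 0)
print(score_view([mark], [lesion], iou_at_least_half))   # (0, 1)
```

The identical output is a true positive under one rule and a missed lesion plus a false-positive mark under the other (IoU here is 0.25).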

69 Standalone Performance Testing
Standalone performance testing can be done using a larger database than that used for reader performance testing.
This may allow meaningful stratified analysis of clinical, mammographic, and pathologic subgroups, which may influence user confidence in the ability of the device to detect findings in each subgroup.
Discussion questions: M1, M4

70 Standalone Performance Testing
Stratified measures may include:
- mammographic finding types: mass, microcalcifications, architectural distortion, and asymmetries
- pathologic types
- size of mammographic findings
- breast composition
Of particular interest is the number of small masses (< 10 mm) in patients with dense breasts.
Discussion questions: M1, M4
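A stratified standalone analysis amounts to computing sensitivity separately within each subgroup. A minimal sketch; the lesion records and field names are hypothetical, and the stratum is passed as a function so that compound subgroups such as small masses in dense breasts can be expressed.

```python
from collections import defaultdict


def stratified_sensitivity(lesions, stratum):
    """Standalone sensitivity per stratum.

    lesions: iterable of dicts with a boolean "marked" field.
    stratum: function mapping a lesion to its subgroup key.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for les in lesions:
        key = stratum(les)
        totals[key] += 1
        hits[key] += les["marked"]
    return {key: hits[key] / totals[key] for key in totals}


def by_type(les):
    return les["type"]


def small_mass_in_dense_breast(les):
    return les["type"] == "mass" and les["size_mm"] < 10 and les["dense"]


# Hypothetical lesion-level results.
lesions = [
    {"type": "calcification", "size_mm": 6, "dense": True, "marked": True},
    {"type": "calcification", "size_mm": 12, "dense": False, "marked": True},
    {"type": "mass", "size_mm": 8, "dense": True, "marked": False},
    {"type": "mass", "size_mm": 18, "dense": False, "marked": True},
]
print(stratified_sensitivity(lesions, by_type))
print(stratified_sensitivity(lesions, small_mass_in_dense_breast))
```

With a single small mass in a dense breast, the stratum estimate here is meaningless, which illustrates why meaningful stratified analysis needs the larger databases that standalone testing allows.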

71 Standalone Performance Testing
Both overall and stratified standalone performance can be reported on a per-lesion, per-view, per-breast (2 views), or per-patient (4 views) basis.
Given that clinical actions following mammography are finding specific, what are the advantages and disadvantages of each of these reporting measures?
Discussion questions: M1, M4
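The choice of reporting unit can change the reported sensitivity for the same set of CAD marks. In this hypothetical sketch, each row is one cancer as it appears on one view, and a unit (lesion, view, breast, or patient) counts as detected if any cancer appearance within it was marked. The cases, field names, and the "any mark counts" rule are illustrative assumptions.

```python
from collections import defaultdict

# Reporting units: each maps a (lesion, view) row to the unit it is counted in.
UNITS = {
    "per lesion": lambda r: (r["patient"], r["lesion"]),
    "per view": lambda r: (r["patient"], r["breast"], r["view"]),
    "per breast": lambda r: (r["patient"], r["breast"]),
    "per patient": lambda r: r["patient"],
}


def sensitivity_by_unit(rows, unit):
    """A unit counts as detected if any cancer appearance in it was marked."""
    detected = defaultdict(bool)
    for r in rows:
        key = unit(r)
        detected[key] = detected[key] or r["marked"]
    return sum(detected.values()) / len(detected)


# Hypothetical cases: patient A has two cancers in the left breast, one of
# them marked on the CC view only; patient B's single cancer is unmarked.
rows = [
    {"patient": "A", "breast": "L", "lesion": 1, "view": "CC", "marked": True},
    {"patient": "A", "breast": "L", "lesion": 1, "view": "MLO", "marked": False},
    {"patient": "A", "breast": "L", "lesion": 2, "view": "CC", "marked": False},
    {"patient": "A", "breast": "L", "lesion": 2, "view": "MLO", "marked": False},
    {"patient": "B", "breast": "R", "lesion": 3, "view": "CC", "marked": False},
    {"patient": "B", "breast": "R", "lesion": 3, "view": "MLO", "marked": False},
]
for name, unit in UNITS.items():
    print(name, round(sensitivity_by_unit(rows, unit), 2))
```

The same marks yield 25% per view, 33% per lesion, and 50% per breast or per patient. Per-breast and per-patient scoring credit patient A fully even though one of her two cancers was never marked, which bears on the question of finding-specific clinical actions.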

72 Standalone Performance Testing
Without standardized methodologies for case selection, ground truth, scoring metric, scoring methodology, and reporting:
- Is it difficult, or even invalid, to compare performance between devices from different manufacturers, or between different versions of a device from the same manufacturer?
- Is it also difficult to account for differences in the location and number of detections, for both true positives and false positives, in such comparisons?
Discussion questions: M1, M4

73 Reader Performance Testing
While standalone performance testing indicates how well the device marks locations of interest in the absence of radiologist interaction, it does not measure the safety or effectiveness of the device for its intended uses and conditions of use by a reader.
Several types of reader performance tests are designed to determine the impact of a CAD device on reader performance (please refer to the ‘General CAD Methods’ and ‘Statistical issues’ sections).
Discussion questions: M2, M4

74 Testing Dataset
The prevalence of breast cancer in a screening population is approximately 0.4%.
Following the Least Burdensome approach, reader performance testing may be accomplished using a dataset “enriched” with a substantially greater percentage of patients with breast cancer (i.e., a population in which the prevalence of cancer is much higher than in a real screening population).
Discussion questions: M2, M4

75 Testing Dataset
Apart from prevalence, if the cancer and non-cancer cases used for enrichment otherwise have the clinical, mammographic, and pathologic characteristics typically seen in a screening population, such testing simulates a so-called “field test” (i.e., a clinical assessment of the system).
However, the much higher prevalence in enriched datasets can introduce bias.
Can reader performance testing using enriched datasets give an estimate of device effectiveness as it would be seen in clinical practice?
Discussion questions: M2, M4
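One mechanical consequence of enrichment is easy to quantify: even if a reader's sensitivity and specificity were unchanged, the positive predictive value (PPV) of a positive call depends strongly on prevalence. The sketch applies Bayes' rule with the BCSC averages from slide 27. The 50% enriched prevalence is a hypothetical example, and the assumption that readers behave identically at both prevalences is exactly what enrichment may undermine.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical reader (79% sensitivity, 90% specificity, the BCSC
# averages from slide 27) applied at two prevalences.
for prevalence in (0.004, 0.5):   # screening vs. a 50%-cancer enriched set
    print(prevalence, round(ppv(0.79, 0.90, prevalence), 3))
```

A positive call is about 3% likely to be cancer at screening prevalence but about 89% likely in the enriched set, so recall and PPV figures from enriched studies do not transfer directly to clinical practice.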

76 Testing Dataset
When enrichment is performed with only difficult cases, the testing is often referred to as a stress test.
In this situation, the test dataset is enriched with only difficult cases that challenge the readers.
Stress testing will not capture information about the effect of CAD on cases that radiologists do not tend to miss, which may lead to an incomplete assessment.
Is stress testing alone sufficient to measure safety and effectiveness?
Discussion questions: M2, M4

77 Study Endpoints for Reader Performance Testing
Both overall and stratified reader performance can be reported on a per-lesion, per-view, per-breast (2 views), or per-patient (4 views) basis.
Given that clinical actions following mammography are finding specific, how critical is it to account for reader location accuracy?
Discussion questions: M2, M4

78 Study Endpoints for Reader Performance Testing
Meaningful stratified analysis may include:
- breast density
- finding size
- finding type: masses, microcalcifications, architectural distortion, and focal asymmetry
- histologic type: invasive cancer, DCIS
Discussion questions: M2, M4

79 Other Issues for Mammography CAD Devices
Screen-film and FFDM have different spatial resolution, contrast resolution, and noise.
FFDM systems also vary in spatial and contrast resolution because of differences in solid-state detectors, pixel sizes, and quantum and electronic noise.
Different FFDM manufacturers use different technologies and different image-processing techniques.
Discussion questions: M5

80 Other Issues for Mammography CAD Devices
Therefore, the results of standalone and reader performance testing of CAD devices on screen-film images may differ from testing results on FFDM images.
This may also apply to testing on different FFDM devices.
Is there a reason why testing of CAD devices on a new or modified image input (digitized film or FFDM) should be any different from the standalone and reader performance testing already discussed?
Discussion questions: M5

81 Indications and Reader Paradigms
Mammography CAD can be clinically implemented as a second reader or as a concurrent reader.
Second reading can only increase reading time, whereas concurrent reading may reduce reading time.
How is mammography CAD currently used in clinical practice?
Mammography CAD has been reported to have very high sensitivity for calcifications. Is there a clinical role for mammography CAD as a concurrent reader for calcifications (but not masses)?
Discussion questions: M6

