Rapid Critical Appraisal of diagnostic accuracy studies
Professor Paul Glasziou
Centre for Evidence Based Medicine
University of Oxford
What are tests used for?
Diagnosis – what is the problem?
Log of reasons by several docs:
Monitoring – has it changed?
Prognosis – risk/stage within Dx
Treatment planning, e.g., location
Stalling for time!
Is the test accurate?
To be accurate a test should be:
Reproducible
o We get the same (wrong?) answer every time
o P I-I question
Valid
o We get the right answer
o P I O question
Reproducibility: Agreement of histopathologists
Organ | Feature | Agreement | Kappa | Reference
Rectal Cancer | Grading | 50% to 69% | 0.11 to 0.5 | Thomas
Hodgkins | Classification | 56% | 0.44 | Holman
Melanoma | depth | 82%; 64% | 0.68; 0.23 | Breslow; Clark
Breast cancer | classification | 73% | 0.46 | Stenkvist
Ken Fleming, Evidence-based pathology. EBM 1997
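The table reports raw agreement alongside kappa, which corrects agreement for what two raters would reach by chance. A minimal sketch of that calculation follows, assuming two raters and Cohen's unweighted kappa; the counts are hypothetical and are not taken from any of the studies cited, and the function name is illustrative only.

```python
def cohen_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square agreement table.

    table[i][j] = number of cases rater A placed in category i
    and rater B placed in category j.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the raters' marginal proportions, summed.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    # Note: undefined (division by zero) if chance agreement is exactly 1.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: two pathologists grading 100 slides as low/high grade.
ratings = [[40, 10],
           [10, 40]]
observed, kappa = cohen_kappa(ratings)
print(f"agreement = {observed:.0%}, kappa = {kappa:.2f}")
```

In this made-up example, 80% raw agreement shrinks to a kappa of 0.6 because half of the agreement would be expected by chance alone.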
Diagnostic Test Accuracy measurements
Sensitivity is the probability of a positive test in a diseased person.
Specificity is the probability of a negative test in a non-diseased person.
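As an illustration of these two definitions, the sketch below computes both measures from the four cells of a 2x2 table of index test result against gold standard. The counts and function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diseased people who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-diseased people who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical 2x2 table: 100 diseased and 100 non-diseased patients.
tp, fn = 90, 10   # diseased: test positive / test negative
tn, fp = 80, 20   # non-diseased: test negative / test positive

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```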
Is the test helpful (valid)? The Youden Index
Youden Index = sensitivity + specificity - 1
For a test to be useful:
o sensitivity + specificity > 1 (Youden Index > 0)
Example: Coin toss with +ve = "heads"
sensitivity = 0.5
specificity = 0.5
Youden = 0
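The coin-toss example can be checked directly. This is a small sketch of the formula on the slide; the function names are illustrative, and the second set of figures (0.9 and 0.8) is hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden Index = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def is_useful(sensitivity, specificity):
    """A test carries information only if its Youden Index is above zero."""
    return youden_index(sensitivity, specificity) > 0


# Coin toss with +ve = "heads": no better than chance.
coin = youden_index(0.5, 0.5)
print("coin toss:", coin, "useful?", is_useful(0.5, 0.5))

# Hypothetical test with sensitivity 0.9 and specificity 0.8.
example = youden_index(0.9, 0.8)
print("hypothetical test:", round(example, 2), "useful?", is_useful(0.9, 0.8))
```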
Can a test rule in or rule out?
SpPIn: Specific test, Positive rules In
e.g. Rovsing's sign, ST elevation > 2 mm
SnNout: Sensitive test, Negative rules Out
e.g. erect abdominal film for obstruction, elevated WCC in CSF (> 5/mm³)
Can I trust the accuracy data from the study? RAMMbo
Recruitment: Was an appropriate spectrum of patients included? (Spectrum Bias)
Maintenance: Were all patients subjected to a Gold Standard? (Verification Bias)
Measurements: Was there an independent, blind or objective comparison with a Gold Standard? (Observer Bias; Differential Reference Bias)
Appraisal of Tests: RAMMbo. Was the evaluation fair?
[Diagram labels: Population; Representation?; Index test; Comparator Test; Maintained?; Outcome Measures (Gold Standard); Blinded or Objective?]
QUADAS
The Literary Digest Poll: Landon versus Roosevelt, 1936
Source | Prediction | % for Roosevelt
Literary Digest: 2.4 million reader poll | Prediction for Roosevelt | 43%
Gallup's 50,000 random sample | Prediction of the election result | 56%
Gallup's 3,000 Digest readers | Prediction of Digest prediction | 44%
Election result | | 62%
Good sampling: needs a sample frame & unbiased selection
Target Population → Sample Frame → Actual Sample → Complete data
Were reference Measurements blinded or objective?
[Figure labels: Index +ve; Index -ve; High Threshold; Low Threshold; Apparent difference]
Use a standardised measurement strategy across ALL patients
Smith H, et al. BMJ, 2000
BNP screen of GP elderly patients
UK GP setting
155 patients yrs old
Echocardiogram
12 with CCF
Sens=92% Spec=65%
Sens=50% Spec=90%
Assessment process: systematic reviews
Potentially Eligible Systematic Reviews N=
systematic reviews excluded
34 systematic reviews with 39 meta-analyses containing 678 original studies
6 systematic reviews with 8 meta-analyses excluded
28 systematic reviews with 31 meta-analyses containing 545 original studies
Assessment process: original studies
58 original studies excluded
31 meta-analyses containing 487 original studies
Replication & Extension Study
AWS Rutjes et al. 2005
Study characteristics (487 studies; 31 meta-analyses)
o Severe cases and healthy controls
o Other case-control designs
o Differential verification
o Partial verification
o Non blinded studies
o Non-consecutive
o Retrospective data collection
o No description index
o No description reference
o No description population
How well are diagnostic studies reported?
112 studies in 4 major journals ( )
Standard | N | (%)
Spectrum composition | 30 | (27)
Avoidance of workup bias | 51 | (46)
Avoidance of review bias | 43 | (38)
Test accuracy precision | 12 | (11)
Indeterminate test results | 26 | (23)
Test reproducibility | 26 | (23)
Accuracy in subgroups | 9 | (8)
Reid MC, Lachs MS, Feinstein AR. Use of Methodological Standard in diagnostic test research. JAMA 1995;274:
Using Evidence about Tests
Appraise the study - PICO
Always ask: Is it useful at all? (Youden Index > 0)
Usually ask: Can it “rule in” or “rule out” a disease?
Often ask: What is the post-test probability in the same situation as the study?
Rarely ask: Calculation of post-test probabilities in a different situation
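The last two questions both come down to combining a pre-test probability with sensitivity and specificity. The sketch below does this with Bayes' theorem, assuming sensitivity and specificity carry over unchanged to the new setting, which is itself a judgment call. All numbers and names are hypothetical.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of disease after a positive (or negative) test result."""
    if positive:
        diseased = sensitivity * pre_test                  # true positives
        healthy = (1 - specificity) * (1 - pre_test)       # false positives
    else:
        diseased = (1 - sensitivity) * pre_test            # false negatives
        healthy = specificity * (1 - pre_test)             # true negatives
    return diseased / (diseased + healthy)


# Hypothetical test: sensitivity 0.9, specificity 0.8.
sens, spec = 0.9, 0.8

# Same situation as the (hypothetical) study: pre-test probability 10%.
same_pos = post_test_probability(0.10, sens, spec, positive=True)
same_neg = post_test_probability(0.10, sens, spec, positive=False)

# Different situation: a referral setting with pre-test probability 50%.
other_pos = post_test_probability(0.50, sens, spec, positive=True)

print(f"pre-test 10%: after +ve {same_pos:.1%}, after -ve {same_neg:.1%}")
print(f"pre-test 50%: after +ve {other_pos:.1%}")
```

The same test result means something quite different in the two settings, which is why the pre-test probability has to be re-estimated whenever the evidence is applied outside the study population.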