Diagnostic Testing
Ethan Cowan, MD, MS
Department of Emergency Medicine, Jacobi Medical Center
Department of Epidemiology and Population Health, Albert Einstein College of Medicine
The Provider Dilemma
• A 26-year-old pregnant woman presents after twisting her ankle. She has no abdominal or urinary complaints. The nurse sends a UA and uricult dipslide before you see the patient. What should you do with the results of these tests?
The Provider Dilemma
• Should a provider give antibiotics if one or both of these tests come back positive?
Why Order a Diagnostic Test?
• When the diagnosis is uncertain
• When an incorrect diagnosis leads to clinically significant morbidity or mortality
• When the test result changes management
• When the test is cost-effective
Clinician Thought Process
• Clinician derives the patient's prior probability of disease from:
  • H & P
  • Literature
  • Experience
• "Index of Suspicion"
  • 0% - 100%
  • "Low, Med., High"
Threshold Approach to Diagnostic Testing
• P < P(-): testing and therapy not indicated
• P(-) < P < P(+): testing needed prior to therapy
• P > P(+): intervention indicated without further testing
[Figure: probability of disease scale from 0% to 100%, with the testing zone between P(-) and P(+)]
Pauker and Kassirer, 1980; Gallagher, 1998
Threshold Approach to Diagnostic Testing
• Width of the testing zone depends on:
  • Test properties
  • Risk of excess morbidity/mortality attributable to the test
  • Risk/benefit ratio of the available therapies for the Dx
[Figure: probability of disease scale from 0% to 100%, with the testing zone between P(-) and P(+)]
Pauker and Kassirer, 1980; Gallagher, 1998
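A minimal sketch of the two-threshold logic above, in Python. The function name, threshold values, and messages are hypothetical illustrations, not from Pauker and Kassirer:

```python
def testing_decision(p: float, p_minus: float, p_plus: float) -> str:
    """Classify a prior probability of disease against the two thresholds.

    p       -- clinician's prior probability of disease (0.0 to 1.0)
    p_minus -- testing threshold P(-): below it, neither test nor treat
    p_plus  -- treatment threshold P(+): above it, treat without testing
    """
    if p < p_minus:
        return "testing and therapy not indicated"
    if p > p_plus:
        return "treat without further testing"
    return "order the diagnostic test before deciding on therapy"

# Example: a wide testing zone (accurate, low-risk test; risky therapy)
print(testing_decision(0.30, p_minus=0.05, p_plus=0.80))
# -> order the diagnostic test before deciding on therapy
```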
Test Characteristics
• Reliability
  • Inter-observer
  • Intra-observer
  • Correlation
  • B&A Plot
  • Simple Agreement
  • Kappa Statistics
• Validity
  • Sensitivity
  • Specificity
  • NPV
  • PPV
  • ROC Curves
Reliability
• The extent to which results obtained with a test are reproducible.
Reliability
[Figure: two panels labeled "Not Reliable" and "Reliable"]
Intra-rater reliability
• The extent to which a measure produces the same result at different times for the same subjects
Inter-rater reliability
• The extent to which a measure produces the same result on each subject regardless of who makes the observation
Correlation (r)
• For continuous data
• r = 1: perfect correlation
• r = 0: no correlation
[Figure: scatter plot of O1 versus O2 with the line O1 = O2]
Bland & Altman, 1986
Correlation (r)
• Measures strength of relation, not agreement
• Problem: even near-perfect correlation may indicate significant differences between observations
[Figure: scatter plot of O1 versus O2, r = 0.8, with the line O1 = O2]
Bland & Altman, 1986
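A small demonstration of Bland & Altman's point, with made-up numbers: two observers can correlate perfectly yet disagree on every subject.

```python
import numpy as np

o1 = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])  # observer 1
o2 = o1 + 3.0        # observer 2 reads 3 units higher on every subject

r = np.corrcoef(o1, o2)[0, 1]
print(f"r = {r:.3f}")    # r = 1.000, "perfect" correlation
print(o2 - o1)           # yet every pair of observations differs by 3 units
```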
Bland & Altman Plot
• For continuous data
• Plot of the differences between observations (O1 - O2) versus their means ([O1 + O2] / 2)
• Data that are evenly distributed around 0 and fall within 2 SDs exhibit good agreement
Bland & Altman, 1986
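A minimal Bland & Altman computation, again with invented observations: the differences are the y-axis, the pairwise means the x-axis, and agreement is judged against the 2 SD limits.

```python
import numpy as np

o1 = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])  # observer 1
o2 = np.array([10.4, 11.6, 14.3, 15.8, 18.1, 20.2])  # observer 2

diffs = o1 - o2          # y-axis of the Bland & Altman plot
means = (o1 + o2) / 2    # x-axis of the Bland & Altman plot
bias = diffs.mean()
sd = diffs.std(ddof=1)

print(f"bias = {bias:.2f}")
print(f"limits of agreement: {bias - 2*sd:.2f} to {bias + 2*sd:.2f}")
# Good agreement: differences scatter evenly around 0 within these limits.
```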
Simple Agreement
• The extent to which two or more raters agree on the classifications of all subjects
• % of concordance in the 2 x 2 table: (a + d) / N
• Not ideal: subjects may fall on the diagonal by chance

              Rater 1
Rater 2     -      +      total
   -        a      b      a + b
   +        c      d      c + d
 total    a + c  b + d      N
Kappa
• The proportion of the best possible improvement in agreement beyond chance obtained by the observers
• K = (p_a - p_0) / (1 - p_0)
• p_a = (a + d) / N (observed proportion of subjects on the main diagonal)
• p_0 = [(a + b)(a + c) + (c + d)(b + d)] / N^2 (proportion expected by chance)

              Rater 1
Rater 2     -      +      total
   -        a      b      a + b
   +        c      d      c + d
 total    a + c  b + d      N
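A sketch of the kappa calculation using the cell labels a, b, c, d from the table above; the example counts are invented.

```python
def kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2 x 2 agreement table."""
    n = a + b + c + d
    p_a = (a + d) / n                                     # observed agreement
    p_0 = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    return (p_a - p_0) / (1 - p_0)

# Example: raters agree on 80 of 100 subjects, but half of the expected
# agreement is already attributable to chance
print(round(kappa_2x2(a=40, b=10, c=10, d=40), 3))  # 0.6
```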
Interpreting Kappa Values
K = 1                Perfect
K > 0.75             Excellent
0.60 < K < 0.75      Good
0.40 < K < 0.60      Fair
0 < K < 0.40         Poor
K = 0                Chance (p_a = p_0)
K < 0                Less than chance
Weighted Kappa
• Used for more than 2 observers or categories
• Perfect agreement on the main diagonal is weighted more than partial agreement off of it

              Rater 1
Rater 2     1      2     ...    C     total
   1       n11    n12    ...   n1C     n1.
   2       n21    n22    ...   n2C     n2.
  ...      ...    ...    ...   ...     ...
   C       nC1    nC2    ...   nCC     nC.
 total     n.1    n.2    ...   n.C      N
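A minimal weighted-kappa sketch for the two-rater, C-category case. Linear weights are an assumption here (the talk does not specify a weighting scheme), and the table counts are invented.

```python
import numpy as np

def weighted_kappa(counts: np.ndarray) -> float:
    """Weighted kappa for a C x C agreement table (linear weights)."""
    p = counts / counts.sum()            # cell proportions p_ij
    c = p.shape[0]
    i, j = np.indices((c, c))
    w = 1 - np.abs(i - j) / (c - 1)      # full credit on the diagonal,
                                         # partial credit off of it
    row = p.sum(axis=1)                  # marginals n_i. / N
    col = p.sum(axis=0)                  # marginals n_.j / N
    expected = np.outer(row, col)        # chance proportions
    p_a = (w * p).sum()                  # weighted observed agreement
    p_0 = (w * expected).sum()           # weighted chance agreement
    return (p_a - p_0) / (1 - p_0)

counts = np.array([[30, 5, 1],
                   [4, 25, 6],
                   [2, 5, 22]], dtype=float)
print(round(weighted_kappa(counts), 3))
```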
Validity
• The degree to which a test correctly diagnoses people as having or not having a condition
• Internal Validity
• External Validity
Validity
[Figure: two panels labeled "Valid, not reliable" and "Reliable and Valid"]
Internal Validity
• Performance Characteristics
  • Sensitivity
  • Specificity
  • NPV
  • PPV
  • ROC Curves
2 x 2 Table

              Disease Status
Test Result   cases   noncases   total
    +          TP        FP      positives
    -          FN        TN      negatives
  total       cases   noncases      N

TP = True Positives    FP = False Positives
FN = False Negatives   TN = True Negatives
Gold Standard
• Definitive test used to identify cases
• Example: traditional agar culture
• The dipstick and dipslide are measured against the gold standard
Sensitivity (SN)
• Probability of correctly identifying a true case
• SN = TP / (TP + FN) = TP / cases
• High SN: a negative test result rules out the Dx (SnNout)
Sackett & Straus, 1998
Specificity (SP)
• Probability of correctly identifying a true noncase
• SP = TN / (TN + FP) = TN / noncases
• High SP: a positive test result rules in the Dx (SpPin)
Sackett & Straus, 1998
Problems with Sensitivity and Specificity
• They remain constant over patient populations
• But SN and SP convey how likely a test result is to be positive or negative given that the patient does or does not have disease
• This is a paradoxical inversion of clinical logic: prior knowledge of disease status would obviate the need for the diagnostic test
Gallagher, 1998
Positive Predictive Value (PPV)
• Probability that a subject labeled (+) is a true case
• PPV = TP / (TP + FP) = TP / total positives
• High SP corresponds to very high PPV (SpPin)
Sackett & Straus, 1998
Negative Predictive Value (NPV)
• Probability that a subject labeled (-) is a true noncase
• NPV = TN / (TN + FN) = TN / total negatives
• High SN corresponds to very high NPV (SnNout)
Sackett & Straus, 1998
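A sketch computing the four measures above from the cells of the 2 x 2 table; the counts below are hypothetical.

```python
def performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four internal validity measures from a 2 x 2 table."""
    return {
        "SN":  tp / (tp + fn),   # TP / cases
        "SP":  tn / (tn + fp),   # TN / noncases
        "PPV": tp / (tp + fp),   # TP / total positives
        "NPV": tn / (tn + fn),   # TN / total negatives
    }

# Hypothetical counts for a sensitive test: NPV comes out high (SnNout)
print(performance(tp=90, fp=15, fn=10, tn=85))
```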
Predictive Value Problems
• Vulnerable to shifts in disease prevalence (P)
• Do not remain constant over patient populations
• As P rises, PPV rises and NPV falls
Gallagher, 1998
Flipping a Coin to Dx AMI for People with Chest Pain (ED, AMI Prevalence 6%)

            AMI   No AMI   total
Heads (+)    3      47      50
Tails (-)    3      47      50
  total      6      94     100

SN = 3 / 6 = 50%      SP = 47 / 94 = 50%
PPV = 3 / 50 = 6%     NPV = 47 / 50 = 94%
Worster, 2002
Flipping a Coin to Dx AMI for People with Chest Pain (CCU, AMI Prevalence 90%)

            AMI   No AMI   total
Heads (+)   45       5      50
Tails (-)   45       5      50
  total     90      10     100

SN = 45 / 90 = 50%    SP = 5 / 10 = 50%
PPV = 45 / 50 = 90%   NPV = 5 / 50 = 10%
Worster, 2002
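The same arithmetic in code: a hypothetical helper that reproduces the coin-flip numbers from the two slides above, showing PPV and NPV flipping with prevalence while SN and SP stay fixed at 50%.

```python
def ppv_npv(sn: float, sp: float, prevalence: float) -> tuple:
    """Predictive values for a test with given SN/SP at a given prevalence."""
    tp = sn * prevalence
    fp = (1 - sp) * (1 - prevalence)
    fn = (1 - sn) * prevalence
    tn = sp * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

print(ppv_npv(0.5, 0.5, 0.06))  # ED:  PPV = 0.06, NPV = 0.94
print(ppv_npv(0.5, 0.5, 0.90))  # CCU: PPV = 0.90, NPV = 0.10
```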
Receiver Operator Curve
• Allows consideration of test performance across a range of threshold values
• Well suited for continuous-variable Dx tests
[Figure: ROC curve, sensitivity (TPR) versus 1 - specificity (FPR)]
Receiver Operator Curve
• Avoids the "single cutoff trap"
[Figure: overlapping WBC count distributions, "No Effect" versus "Sepsis Effect", illustrating the single cutoff trap]
Gallagher, 1998
Area Under the Curve (θ)
• Measure of test accuracy
• θ = 0.5 - 0.7: no to low discriminatory power
• θ = 0.7 - 0.9: moderate discriminatory power
• θ > 0.9: high discriminatory power
[Figure: ROC curve, sensitivity (TPR) versus 1 - specificity (FPR)]
Gryzybowski, 1997
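A minimal ROC sketch for a continuous test. The WBC-like values are simulated, not from the talk; each candidate cutoff yields one (FPR, TPR) point, and θ is estimated by its rank-based interpretation (the probability a random case scores higher than a random noncase).

```python
import numpy as np

rng = np.random.default_rng(0)
cases = rng.normal(14, 3, 200)     # simulated WBC (x1000), diseased
noncases = rng.normal(9, 3, 200)   # simulated WBC (x1000), healthy

# One (FPR, TPR) point per candidate cutoff; the ROC curve sweeps them all
for cutoff in (8, 10, 12, 14):
    tpr = (cases >= cutoff).mean()     # sensitivity at this cutoff
    fpr = (noncases >= cutoff).mean()  # 1 - specificity at this cutoff
    print(f"cutoff {cutoff}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")

# theta = P(random case scores higher than a random noncase)
diff = cases[:, None] - noncases[None, :]
theta = (diff > 0).mean() + 0.5 * (diff == 0).mean()
print(f"theta = {theta:.3f}")  # about 0.88: moderate discriminatory power
```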
Problems with ROC Curves
• Same "reverse logic" problems as SN and SP
• Mainly used to describe Dx test performance
Appendicitis Example
• Study design: prospective cohort
• Gold standard: pathology report from appendectomy, or CT finding (negatives)
• Diagnostic test: total WBC
[Figure: study flow, physical exam then CT scan or OR, classifying Appy versus No Appy]
Cardall, 2004
Appendicitis Example
• Test: WBC, cutoff 10,000
• SN  76% (65%-84%)
• SP  52% (45%-60%)
• PPV 42% (35%-51%)
• NPV 82% (74%-89%)
Cardall, 2004
Appendicitis Example
• Patient WBC: 13,000
• Management: get CT with PO & IV contrast
Cardall, 2004
Abdominal CT
Follow Up
• CT result: acute appendicitis
• Patient taken to OR for appendectomy
But Was the WBC Necessary?
• Answer given in the talk on Likelihood Ratios