Diagnosis Articles Much Thanks to: Rob Hayward & Tanya Voth, CCHE
Outline
Philosophy of Diagnosis:
–Probability of disease
–Test and treatment thresholds
ANALYZING STUDIES
Validity:
–Gold (reference) standard
Numbers:
–Sensitivity, Specificity, Likelihood ratio
Applicability:
–Observer agreement, Kappa
Philosophy of Diagnosis? Pre-test Probability –The probability that a disease is present before doing a test. –A clinical best guess Post-test Probability –The probability that a disease is present after doing a test –a combination of clinical best guess & test result.
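Philosophy of Diagnosis? When tests are good: [Figure: test results for Target Negative (normal) and Target Positive (severely ill) patients on an axis from Very Normal to Very Abnormal, with points A and B marked]
Philosophy of Diagnosis? When tests aren't so good: [Figure: Target Positive and Target Negative test results on an axis from Very Normal to Very Abnormal, with one test result marked LR = 4 (ratio 4 to 1) and another marked LR = 1]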
EBM TP: Diagnostic Tests How good are: –Phalen’s Test, –Shifting Dullness, –Patient Report of Fever, –Interstitial Edema on C-Xray, –Ottawa Ankle Rules –Canadian C-Spine Rules vs NEXUS.
Users Guides: Diagnosis
Are the results valid?
Did clinicians face diagnostic uncertainty?
–Were subjects drawn from a common group in which it is not known whether the condition of interest is present or absent?
–E.g., the first CEA studies used patients already known to have bowel cancer. [1]
1. Proc Natl Acad Sci USA 1969; 64: 161-7
Are the results valid?
Was an acceptable gold standard used?
–Imagine a study investigating WBC for appendicitis that used U/S as the gold standard.
Are the results valid?
The test being studied and the gold standard should be completely separate.
1) Were the test and gold standard independent?
–E.g., a study of serum amylase for pancreatitis used a gold standard built from a combination of tests that included serum amylase. [1]
2) Were the test and gold standard results assessed blindly?
–Imagine a study of the Ottawa Ankle Rules in which the radiologist was told the result of the rules before reading the films.
1. NEJM 1997; 336:
Are the results valid?
Did the result of the test being studied affect whether the gold standard was done?
–Was a different gold standard applied to subjects testing negative?
–E.g., when VQ scans were evaluated for PE, those with normal scans often did not go on to the gold standard (pulmonary angiography). [1]
–In these (frequent) cases we need to be assured of a reasonable back-up gold standard.
1. JAMA 1990; 263:
EBM Tool for Diagnostic Tests Should: Tell if a symptom, sign or test is useful Useful in which way: –Screening (Ruling out) –Making a Diagnosis (Ruling in) Help us determine the probability of a disease
EBM Diagnostic Test Standards
Sensitivity: SNOUT
–Sensitive tests, if Negative, rule OUT disease.
Specificity: SPIN
–Specific tests, if Positive, rule IN disease.
Helpful for sorting out whether a test is good for Screening (ruling out) or Diagnosis (ruling in).
LR Advantage
LRs:
–Take into account all elements (false positives/negatives and true positives/negatives)
–Have criteria for the usefulness of each test
–Can be used over a range of test results (e.g. WBC)
–Can be used to calculate the actual likelihood of a disease
Key Concept
Likelihood Ratio: determines the usefulness of a test.
(Positive) Likelihood Ratio > 1:
–↑ Likelihood Ratio (1 to ∞) = ↑ likelihood of disease
–Make the diagnosis (rule in disease)
(Negative) Likelihood Ratio < 1:
–↓ Likelihood Ratio (1 to 0) = ↓ likelihood of disease
–Exclude the diagnosis (rule out disease)
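For a test with only two outcomes, the positive and negative likelihood ratios are conventionally written in terms of sensitivity and specificity. The slides do not spell this out; the block below is the standard textbook definition, included as a reference:

```latex
\[
LR^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}},
\qquad
LR^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}
\]
```

Read this way, LR+ is how much more often a positive result occurs in patients with the disease than in those without it, and LR− is the same comparison for a negative result.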
What does the LR mean? (Criteria for Usefulness)
LR             | Increase probability | Decrease probability
Excellent      | > 10                 | < 0.1
Good           |                      |
Moderate/Small |                      |
Poor           |                      |
How do I use the LR? [Figure: nomogram / LR calculator]
What are the results?
What range of likelihood ratios were associated with the range of possible test results?
–Ferritin to detect Fe deficiency (GS = bone marrow)
Serum Ferritin  | Iron Deficient Patients | Not Iron Deficient
Positive (< 45) | 70                      | 15
Negative (> 45) | 15                      | 135
Sensitivity = 82%, Specificity = 90%, LR+ = 8.2, LR− = 0.2
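The summary figures for the ferritin table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the usual 2x2 definitions; the function name `two_by_two_metrics` and its argument names are illustrative and do not come from the slides.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, LR+ and LR- from 2x2 counts.

    tp/fn: diseased patients with a positive/negative test.
    fp/tn: non-diseased patients with a positive/negative test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Counts from the ferritin table (cut-off 45, gold standard = bone marrow).
metrics = two_by_two_metrics(tp=70, fp=15, fn=15, tn=135)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to the precision used on the slide, this gives a sensitivity of 82%, a specificity of 90%, LR+ of 8.2 and LR− of 0.2.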
What are the results?
What range of likelihood ratios were associated with the range of possible test results?
–Ferritin to detect Fe deficiency (GS = bone marrow)
Serum Ferritin | Iron Deficient Patients | Not Iron Deficient
<              |                         |
–              |                         |
–              |                         |
>              |                         |
Total patients | 85                      | 150
What are the results?
What range of likelihood ratios were associated with the range of possible test results?
–Ferritin to detect Fe deficiency (GS = bone marrow)
Serum Ferritin | Iron Deficient Patients (L1) | Not Iron Deficient (L2) | LR = L1/L2
<              | /85 =                        | /150 =                  |
–              | /85 =                        | /150 =                  |
–              | /85 =                        | /150 =                  |
>              | /85 =                        | /150 =                  |
Total patients | 85                      | 150
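The LR = L1/L2 column can be computed for any number of result ranges. The sketch below assumes each range's LR is the share of diseased patients falling in that range divided by the share of non-diseased patients falling in it. The function name `stratum_likelihood_ratios` is hypothetical. Because the per-range counts are not recoverable from this slide, the demonstration uses the two-range split at a ferritin of 45 from the earlier table (totals 85 and 150).

```python
def stratum_likelihood_ratios(diseased_counts, non_diseased_counts):
    """Return one likelihood ratio per test-result range (stratum).

    Both arguments list patient counts per stratum, in the same order.
    """
    total_diseased = sum(diseased_counts)
    total_non_diseased = sum(non_diseased_counts)
    ratios = []
    for diseased, non_diseased in zip(diseased_counts, non_diseased_counts):
        l1 = diseased / total_diseased          # share of diseased in stratum
        l2 = non_diseased / total_non_diseased  # share of non-diseased in stratum
        # A stratum containing no non-diseased patients has an unbounded LR.
        ratios.append(l1 / l2 if l2 > 0 else float("inf"))
    return ratios


# Two strata from the dichotomized table: ferritin < 45 and ferritin > 45.
ratios = stratum_likelihood_ratios([70, 15], [15, 135])
for label, ratio in zip(["< 45", "> 45"], ratios):
    print(f"ferritin {label}: LR = {ratio:.2f}")
```

With more than two ranges the same function applies unchanged; only the lists get longer.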
Applying LR: Examples
A 30 y.o. woman complaining of fatigue and vague MDD Sx (normal periods).
–Guess 20% anemia before the test.
–Ferritin = 12 (LR = 42.5): anemia ≈ 90%
Same woman:
–Ferritin = 108 (LR = 0.13): anemia ≈ 3%
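The step from pre-test to post-test probability can also be done without a nomogram, by converting the probability to odds, multiplying by the LR, and converting back. This is a minimal sketch of that standard calculation; the function name `post_test_probability` is illustrative. Values read off a nomogram are approximate, so they can differ by a point or so from the arithmetic.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Worked example: 20% pre-test probability of anemia.
low_ferritin = post_test_probability(0.20, 42.5)   # ferritin = 12
high_ferritin = post_test_probability(0.20, 0.13)  # ferritin = 108

print(f"Ferritin 12:  {low_ferritin:.0%}")   # prints 91%
print(f"Ferritin 108: {high_ferritin:.0%}")  # prints 3%
```

An LR of 1 leaves the probability where it started, which is why such a result is uninformative.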
LR Examples
–Phalen Test (Carpal Tunnel): LR = 1.3
–Shifting Dullness (Ascites): LR = 2.3
–Patient Reporting Fever (Temp > 38): LR = 4.9
–Interstitial Edema on Chest X-Ray (CHF): LR = 12.7
–Ottawa Ankle Rules (Ankle #): –ve LR = 0.08
–Canadian C-Spine Rules (C-spine #): –ve LR = (vs NEXUS –ve LR = 0.25)
References: JAMA 2000; 283: · J Gen Intern Med 1988: · Ann Emerg Med 1996; 27: · Am J Med 2004; 116: · BMJ 2003; 326: 417 · NEJM 2003; 349:
Math Diagnostic Tests: Summary Likelihood Ratios are the best we have Tell if a symptom, sign or test is useful Help us determine the probability of a diagnosis
Apply to patient care?
Are the test and its interpretation reproducible (Kappa)?
–Is the test result the same when reapplied by the same observer (intra-observer variability)?
–Do different observers agree about the test result (inter-observer variability)?
Examples (Kappa):
–Specialist doing JVP = 0.42
–Specialist assessing DM retinopathy from photograph = 0.55
–Interpreting mammogram = 0.67
Greenhalgh T. How to Read a Paper (The basics of evidence based medicine). 2001
Apply to patient care?
Are the results applicable to the patient in my practice?
–Are the patients in the study like mine?
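Kappa measures agreement beyond what chance alone would produce. The sketch below computes Cohen's kappa for two observers rating the same patients as positive or negative. The function name `cohens_kappa` and the counts are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the studies quoted above.

```python
def cohens_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two observers and a positive/negative rating."""
    total = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / total   # raw agreement
    a_pos = (both_pos + a_pos_b_neg) / total   # observer A's positive rate
    b_pos = (both_pos + a_neg_b_pos) / total   # observer B's positive rate
    # Agreement expected if the two observers rated independently.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 100 patients, the observers agree on 75 of them.
kappa = cohens_kappa(both_pos=40, a_pos_b_neg=10, a_neg_b_pos=15, both_neg=35)
print(f"kappa = {kappa:.2f}")  # prints kappa = 0.50
```

Here raw agreement is 75%, but half of that would be expected by chance, so kappa is 0.50.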
Apply to patient care? Will the results change my management strategy? –Are the test LRs high or low enough to shift post-test probability across a test or treatment threshold?
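Whether a result changes management comes down to which side of the thresholds the post-test probability lands on. The following is a minimal sketch of that check. The function name `decision_zone`, the zone labels, and the 5% and 80% thresholds are assumptions for illustration only; real thresholds depend on the disease, the next test, and the treatment.

```python
def decision_zone(probability, test_threshold, treatment_threshold):
    """Classify a disease probability against test and treatment thresholds."""
    if not 0 <= test_threshold < treatment_threshold <= 1:
        raise ValueError("need 0 <= test_threshold < treatment_threshold <= 1")
    if probability < test_threshold:
        return "ignore"        # low enough to stop testing
    if probability >= treatment_threshold:
        return "act"           # high enough to treat
    return "test further"      # still in the zone of uncertainty


# Hypothetical thresholds: stop below 5%, treat at or above 80%.
for probability in (0.20, 0.91, 0.03):
    zone = decision_zone(probability, test_threshold=0.05, treatment_threshold=0.80)
    print(f"{probability:.0%} -> {zone}")
```

A test is worth ordering when at least one of its possible results would move the patient out of the middle zone.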
Apply to patient care? Will patients be better off as a result of the test? –Will the anticipated changes do more good than harm? –Effect of clinically insignificant disease
Summary
Key concepts:
Reference Standard
–You cannot decide if a test works unless you have a “gold standard”.
Likelihood Ratio
–To determine the utility of a test, find how much a given result will shift the likelihood of a diagnosis.
Who cares?
–Think about the “ignore” and “act” thresholds and whether the test moves you from uncertainty into either zone.
The End Much Thanks to: Rob Hayward & Tanya Voth, CCHE