Common Errors by Teachers and Proponents of EBM


Common Errors by Teachers and Proponents of EBM Thomas B. Newman, MD, MPH with thanks to Michael Kohn, MD, MPP and Andi Marmor, MD Evidence-Based Pediatrics SIG, 2012

Outline/Menu
Interval likelihood ratios: Septic arthritis
When not to use likelihood ratios: UTI in young febrile children
Critical appraisal of studies of diagnostic tests, beyond the checklist: Signs and symptoms of appendicitis
Getting the most out of ROC curves (LAST YEAR): Meningitis in young infants; ROC curve demonstration

Septic Arthritis Bacterial infection in a joint.

Does this Adult Patient Have Septic Arthritis? JAMA. 2007;297:1478-1488. “A 48-year-old woman…presents to the emergency department with a 2-day history of a red, swollen right knee that is painful to touch…. On examination, she is afebrile and has a right knee effusion…An arthrocentesis is performed and initial laboratory results show a negative Gram stain...” Pre-Test Probability of Septic Arthritis = 38% Synovial Fluid WBC Count = 48,000/µL Post-Test Probability of Septic Arthritis = ?

Test Characteristics of Synovial Fluid Studies. Margaretten, M. E. et al. JAMA 2007;297:1478-1488.

Sensitivity, Specificity, LR(+), and LR(-) of the Synovial Fluid WBC Count for Septic Arthritis at Different Cutoffs

WBC (/uL)    Sensitivity   Specificity   LR+    LR-
>100,000     29%           99%           29.0   0.7
>50,000      62%           92%           7.8    0.4
>25,000      77%           73%           2.9    0.3

Synovial WBC Count = 48,000/uL
Which LR should we use?

Synovial WBC Count = 48,000/uL
The JAMA authors used the LR+ for the >25,000/uL cutoff (LR+ = 2.9).

Clinical Scenario
Synovial WBC = 48,000/uL
Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(+) = 2.9 (according to JAMA authors)
Post-test odds = Pre-test odds x LR(+) = 0.61 x 2.9 = 1.77
Post-test prob = 1.77/(1.77+1) = 0.64

Synovial WBC Count = 48,000/uL
Which LR should we use? NONE of THESE! Each LR in the table describes a result above or below a cutoff, not a result of 48,000/uL.

Likelihood Ratios
LR(result) = P(result|D+) / P(result|D-)
That is, the probability of the result in patients WITH disease divided by the probability of the result in patients WITHOUT disease.

Likelihood Ratio

WBC (/uL) Interval    % of D+   % of D-   Interval LR
>100,000              29%       1%        29.0
>50,000-100,000       33%       7%        4.7
>25,000-50,000        15%       19%       0.8
0-25,000              23%       73%       0.3

More appropriate LR? A synovial WBC count of 48,000/uL falls in the >25,000-50,000 interval, so the interval LR is 0.8.

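LR = Slope of ROC Curve
[ROC curve figure: between the >50k and >25k cutoffs the curve rises 15% (sensitivity) over a run of 19% (1 - specificity). Slope = 15%/19% = 0.8]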
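
The interval LRs and the ROC-slope reading above can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: it takes the cumulative sensitivity and specificity at each cutoff from the table, differences them to get the share of D+ and D- patients in each interval, and divides. The function name interval_likelihood_ratios and the assumption that the lowest interval is closed by sensitivity = 1 and specificity = 0 are mine.

```python
# Cumulative sensitivity and specificity for "positive if WBC > cutoff",
# listed from the highest cutoff (>100,000/uL) to the lowest (>25,000/uL).
sensitivity = [0.29, 0.62, 0.77]
specificity = [0.99, 0.92, 0.73]


def interval_likelihood_ratios(sens, spec):
    """Return (share of D+, share of D-, LR) per interval, highest first.

    Each interval is one segment of the ROC curve: the share of D+ is the
    rise, the share of D- is the run, and the LR is the slope.
    """
    tpr = list(sens) + [1.0]               # P(result > cutoff | D+)
    fpr = [1.0 - s for s in spec] + [1.0]  # P(result > cutoff | D-)
    rows = []
    prev_tpr, prev_fpr = 0.0, 0.0
    for t, f in zip(tpr, fpr):
        d_pos = t - prev_tpr
        d_neg = f - prev_fpr
        rows.append((d_pos, d_neg, d_pos / d_neg))
        prev_tpr, prev_fpr = t, f
    return rows


labels = [">100,000", ">50,000-100,000", ">25,000-50,000", "0-25,000"]
results = interval_likelihood_ratios(sensitivity, specificity)
for label, (d_pos, d_neg, lr) in zip(labels, results):
    print(f"{label:>16}  D+ {d_pos:.0%}  D- {d_neg:.0%}  LR {lr:.1f}")
```

Working from cumulative sensitivity and specificity means the only inputs are the numbers a paper usually reports at each cutoff.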

Clinical Scenario
Synovial WBC = 48,000/uL
Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(WBC between 25,000 and 50,000) = 0.8
Post-test odds = Pre-test odds x LR = 0.61 x 0.8 = 0.49
Post-test prob = 0.49/(0.49+1) = 0.33

Doing it right makes a difference
From JAMA paper: “Her synovial WBC count of 48,000/µL increases the probability from 38% to 64%.” (Used LR = 2.9)
Alternative calculation: “Her synovial WBC count of 48,000/µL decreases the probability from 38% to 33%.” (Used LR = 0.8)
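
The two calculations differ only in the LR that is plugged in. As a short sketch (the function name post_test_probability is my own), the odds form of Bayes' rule used in the clinical scenario can be written as:

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


pre_test = 0.38  # pre-test probability from the JAMA scenario

# LR+ for the >25,000/uL cutoff, as used in the JAMA paper
cutoff_result = post_test_probability(pre_test, 2.9)
# Interval LR for a WBC count between 25,000 and 50,000/uL
interval_result = post_test_probability(pre_test, 0.8)

print(f"Cutoff LR+ of 2.9:   {cutoff_result:.2f}")
print(f"Interval LR of 0.8:  {interval_result:.2f}")
```

The same function works for any LR, including the interval LRs for the other WBC ranges.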

Does This Dyspneic Patient in the Emergency Department Have Congestive Heart Failure? JAMA. 2005;294:1944-1956. How to interpret serum BNP (B-type Natriuretic Peptide) results? “In this case, a BNP level could be very helpful. If it were less than 100 pg/mL, heart failure would be extremely unlikely (LR 0.09). If it were elevated, the probability of heart failure is higher but not diagnostic.”

Summary of Operating Characteristics of Serum BNP in Emergency Department Patients. Wang, C. S. et al. JAMA 2005;294:1944-1956.

When NOT to use LR

Background
Black children (at least girls) appear to be at lower risk of UTI (RR ~0.3)
Circumcised boys are at much lower risk than uncircumcised boys (RR ~0.1)
In diagnosing UTI, it makes sense to combine history findings like these with physical examination findings (height of fever, etc.) and laboratory results (urine white cells)
But there is a very important difference!

Does This Child Have a UTI? JAMA. 2007;298(24):2895-2904

What is wrong with using LRs for these risk factors? LR will vary tremendously with the prevalence of the risk factor in each study!

Definitions

                                    Disease
Risk factor or Test Result      Yes     No      Total
Present (+)                     a       b       a+b
Absent (-)                      c       d       c+d
Total                           a+c     b+d     N

LR+ = [a/(a+c)] / [b/(b+d)]
LR- = [c/(a+c)] / [d/(b+d)]
OR = ad/bc = LR+/LR-
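
These definitions translate directly into code. The sketch below uses made-up cell counts (they are not from any of the studies discussed here) only to confirm numerically that OR = ad/bc equals LR+/LR-. The function name two_by_two_measures is my own.

```python
def two_by_two_measures(a, b, c, d):
    """Return (LR+, LR-, OR) for a 2x2 table laid out as in the definitions.

    a: factor present, disease yes    b: factor present, disease no
    c: factor absent,  disease yes    d: factor absent,  disease no
    """
    lr_pos = (a / (a + c)) / (b / (b + d))
    lr_neg = (c / (a + c)) / (d / (b + d))
    odds_ratio = (a * d) / (b * c)
    return lr_pos, lr_neg, odds_ratio


# Hypothetical counts, for illustration only
lr_pos, lr_neg, odds_ratio = two_by_two_measures(a=30, b=20, c=70, d=880)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"OR = {odds_ratio:.2f}, LR+/LR- = {lr_pos / lr_neg:.2f}")
```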

Figure 8.9. Relationship between prior odds, LR+ and LR−, posterior odds and the OR. Panel A: Low prevalence of strong risk factor.

Figure 8.9. Relationship between prior odds, LR+ and LR−, posterior odds and the OR. Panel B: High prevalence of strong risk factor.
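
A numeric sketch may help show why LRs for risk factors are study-dependent. The risks below (2% without the factor, 20% with it) and the two risk-factor prevalences are hypothetical values chosen for illustration, and the function name risk_factor_lrs is my own. The strength of the risk factor is held fixed, so the OR does not move, while LR+ and LR- change a great deal.

```python
def risk_factor_lrs(risk_exposed, risk_unexposed, prevalence_of_factor):
    """Return (LR+, LR-, OR) for a risk factor treated as a 'test'."""
    q = prevalence_of_factor
    # Joint proportions of the four cells of the 2x2 table
    a = q * risk_exposed               # factor present, diseased
    b = q * (1 - risk_exposed)         # factor present, not diseased
    c = (1 - q) * risk_unexposed       # factor absent, diseased
    d = (1 - q) * (1 - risk_unexposed) # factor absent, not diseased
    lr_pos = (a / (a + c)) / (b / (b + d))
    lr_neg = (c / (a + c)) / (d / (b + d))
    return lr_pos, lr_neg, (a * d) / (b * c)


# Hypothetical strong risk factor: 20% risk with it, 2% without it
low = risk_factor_lrs(0.20, 0.02, prevalence_of_factor=0.05)
high = risk_factor_lrs(0.20, 0.02, prevalence_of_factor=0.90)

for name, (lr_pos, lr_neg, odds_ratio) in [("Low prevalence", low),
                                           ("High prevalence", high)]:
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, OR = {odds_ratio:.2f}")
```

Under these assumptions, a rare risk factor has an LR+ far from 1 and an LR- close to 1, and a common one shows the reverse, even though the OR is identical in both cases.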

OR vs LR

Additional problem: failing to quantify risks and benefits of tests and treatments, leading to overly aggressive testing recommendations
Except in blacks, urinalysis and urine culture recommended for:*
Girls and uncircumcised boys 3-24 months with any fever of any duration, even if they look well and have an apparent source
Circumcised boys with any fever > 24 hours, even if they look well and have an apparent source
*Shaikh N et al. JAMA 2007;298:2895-2904, figures 2 & 3

Critical Appraisal of Studies of Diagnostic Test Accuracy Index Test = Test Being Evaluated Gold Standard = Test Used to Determine True Disease Status

Chapter 5 – Studies of Diagnostic Tests
Incorporation Bias: index test part of gold standard (Sensitivity Up, Specificity Up)
Verification/Referral Bias: positive index test increases referral to gold standard (Sensitivity Up, Specificity Down)
Double Gold Standard: positive index test causes application of definitive gold standard, negative index test results in clinical follow-up (Sensitivity Up, Specificity Up)*
Spectrum Bias: D+ sickest of the sick (Sensitivity Up); D- wellest of the well (Specificity Up)
*If cases resolve spontaneously.

Bias #2 Example: Visual assessment of jaundice in newborns
Study patients who are getting a bilirubin measurement
Ask clinicians to estimate extent of jaundice at time of blood draw
Compare with blood test

Visual Assessment of jaundice*: Results
Sensitivity of jaundice below the nipple line for bilirubin ≥ 12 mg/dL = 97%
Specificity = 19%
What is the problem?
Editor’s Note: “The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication.” --Catherine D. DeAngelis, MD
*Moyer et al., APAM 2000; 154:391

Bias #2: Verification Bias* -1
Inclusion criterion for study: gold standard test was done (in this case, a blood test for bilirubin)
Subjects with positive index tests are more likely to get the gold standard and to be included in the study
Clinicians usually don’t order a blood test for bilirubin if there is little or no jaundice
How does this affect sensitivity and specificity?
*AKA Work-up, Referral Bias, or Ascertainment Bias

Verification Bias*

                            TSB >12   TSB <12
Jaundice below nipple       a         b
No jaundice below nipple    c         d

Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
*AKA Work-up, Referral Bias, or Ascertainment Bias
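
To see the direction of the bias, it can help to start from a full population and keep only the patients who get verified. The counts and verification probabilities below are hypothetical (they are not the Moyer et al. data), the function name observed_accuracy is my own, and the sketch assumes that whether a patient is verified depends only on the index test result.

```python
def observed_accuracy(a, b, c, d, p_verify_pos, p_verify_neg):
    """Sensitivity and specificity among verified patients only.

    a, b, c, d are the cells of the full 2x2 table (expected counts).
    Index-test positives (a, b) are verified with probability p_verify_pos,
    index-test negatives (c, d) with probability p_verify_neg.
    """
    a_v, b_v = a * p_verify_pos, b * p_verify_pos
    c_v, d_v = c * p_verify_neg, d * p_verify_neg
    sens = a_v / (a_v + c_v)
    spec = d_v / (b_v + d_v)
    return sens, spec


# Hypothetical full population: true sensitivity 80%, true specificity 78%
a, b, c, d = 80, 200, 20, 700

true_sens, true_spec = observed_accuracy(a, b, c, d, 1.0, 1.0)
# Assume 90% of test-positives but only 10% of test-negatives are verified
biased_sens, biased_spec = observed_accuracy(a, b, c, d, 0.9, 0.1)

print(f"Everyone verified:     sens {true_sens:.0%}, spec {true_spec:.0%}")
print(f"Selective verification: sens {biased_sens:.0%}, spec {biased_spec:.0%}")
```

Under-verifying test-negatives shrinks cells c and d, which raises a/(a+c) and lowers d/(b+d).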

Double Gold Standard Bias
Two different “gold standards”
One gold standard (usually an immediate, more invasive test, e.g., angiogram, surgery) is more likely to be applied in patients with a positive index test
Second gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test

Double Gold Standard Bias
There are some patients in whom the two “gold standards” do not give the same answer
Spontaneously resolving disease (positive with immediate invasive test, but not with follow-up)
Newly occurring or newly detectable disease (positive with follow-up but not with immediate invasive test)

Effect of Double Gold Standard Bias: Spontaneously resolving disease
Test result will always agree with gold standard
Both sensitivity and specificity increase
Example: Joey has an intussusception that will resolve spontaneously.
If his ultrasound scan is positive, he will get a contrast enema that will show (and cure) the intussusception (true positive)
If his ultrasound scan is negative, his intussusception will resolve and we will think he never had one (true negative)
Ultrasound scan can’t be wrong!
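
The Joey example can be scaled up to a whole study. The sketch below is an illustration under stated assumptions, not data from any study: the patient counts, the true sensitivity and specificity, and the function name double_gold_standard are all hypothetical, and the index test is assumed to be equally sensitive for persistent and spontaneously resolving disease.

```python
def double_gold_standard(n_persistent, n_resolving, n_no_disease, sens, spec):
    """Observed (sensitivity, specificity) under a double gold standard.

    Index-test positives get the immediate invasive test, which finds both
    persistent and spontaneously resolving disease. Index-test negatives get
    clinical follow-up, which finds persistent disease only, so resolving
    cases with a negative index test are counted as true negatives.
    """
    tp = sens * (n_persistent + n_resolving)
    fp = (1 - spec) * n_no_disease
    fn = (1 - sens) * n_persistent
    tn = spec * n_no_disease + (1 - sens) * n_resolving
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical study: true sensitivity 80%, true specificity 90%
obs_sens, obs_spec = double_gold_standard(
    n_persistent=60, n_resolving=40, n_no_disease=900, sens=0.80, spec=0.90
)
print(f"Observed sensitivity {obs_sens:.1%} (true 80.0%)")
print(f"Observed specificity {obs_spec:.1%} (true 90.0%)")
```

With no spontaneously resolving cases (n_resolving=0) the function returns the true values, which is consistent with the footnote that this bias operates only if cases resolve spontaneously.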

Does This Child Have Appendicitis? JAMA. 2007;298:438-451.
RLQ Pain: Sensitivity = 96%; Specificity = 5% (1 – Specificity = 95%); Likelihood Ratio = 1.0
RLQ pain was present in 96% of those with appendicitis and 95% of those without appendicitis.

Verification (Referral) Bias
Biases the accuracy of a finding when the presence of the finding makes the patient more likely to be studied.
Specificity biased down (5%).
Sensitivity biased up (96%).

Does the LR of 1 mean that, in children, RLQ pain is not indicative of appendicitis?
Study Population: Children who underwent appendectomy
No; it means only kids with RLQ pain get appendectomies.
Bundy, D. G. et al. JAMA 2007;298:438-451.

Studies of Diagnostic Test Accuracy: Checklist
Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
Was the reference standard applied regardless of the diagnostic test result?
Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

A clinical decision rule to identify children at low risk for appendicitis* (Problem 5.6 in EBD)
Study design: prospective cohort study
Subjects:
4140 patients 3-18 years presenting to Boston Children’s Hospital ED with abdominal pain
767 (19%) received surgical consultation for possible appendicitis
113 excluded (chronic diseases, recent imaging)
53 missed
601 included in the study (425 in derivation set)
*Kharbanda et al. Pediatrics 2005; 116(3): 709-16

A clinical decision rule to identify children at low risk for appendicitis
Predictor variables:
Standardized assessment by pediatric ED attending
Focus on “Pain with percussion, hopping or cough” (complete data in N=381)
Outcome variable:
Pathologic diagnosis of appendicitis (or not) for those who received surgery (37%)
Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 116(3): 709-16

A clinical decision rule to identify children at low risk for appendicitis Results: Pain with percussion, hopping or cough 78% sensitivity and 83% NPV seem low to me. Are they valid for me in deciding whom to image? Kharbanda et al. Pediatrics 116(3): 709-16

In what direction would these biases affect results?
Sample not representative (population referred to pedi surgery)? Sample NOT representative. Prevalence of appy too high for decision about imaging.
Verification bias? Probably operating: lack of pain with hopping would make me LESS likely to seek surgical consultation. But this would bias sensitivity UP.
Double gold standard bias? DGSB COULD be a bias, if some cases of appendicitis spontaneously resolve, but this would bias sensitivity and specificity UP.
Spectrum bias? Probably operates for specificity, not sensitivity. Presumably the non-appy cases referred to pedi surgery looked more like appendicitis, and are therefore likely to have a higher FP rate for pain with hopping than those not studied.

For children presenting with abdominal pain to SFGH 6-M
Sensitivity probably valid (not falsely low)
  But whether all of the kids in the study tried to hop is not clear
Specificity probably low
PPV is too high
NPV is too low
Does not address surgical consultation decision
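
The claims about PPV and NPV follow from the difference in prevalence between the study sample and the children in whom the finding would actually be used. A rough sketch: the 78% sensitivity is the figure quoted above, but the 80% specificity and both prevalence values are assumptions picked only to illustrate the direction of the effect, and the function name predictive_values is my own.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENS = 0.78  # sensitivity quoted above for pain with percussion, hopping or cough
SPEC = 0.80  # assumed value, for illustration only

# Assumed prevalences: a referred (surgical consult) sample versus all
# children presenting with abdominal pain
ppv_referred, npv_referred = predictive_values(SENS, SPEC, prevalence=0.40)
ppv_all, npv_all = predictive_values(SENS, SPEC, prevalence=0.10)

print(f"Referred sample:   PPV {ppv_referred:.0%}, NPV {npv_referred:.0%}")
print(f"All abdominal pain: PPV {ppv_all:.0%}, NPV {npv_all:.0%}")
```

With sensitivity and specificity held fixed, moving to the lower-prevalence population lowers the PPV and raises the NPV, which is why predictive values from a referred sample read as too high and too low respectively.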