Welcome Back From Lunch

Thursday Afternoon
1:30-2:15 Studies and Systematic Review of Diagnostic Test Accuracy (Tom)
2:15-3:00 Prognostic and Genetic Tests (Mark)
3:00-3:45 Combining Tests (Michael)
3:45-4:00 Break
4:00-6:00 Small Groups
6:00 Meet in 6702 to head to Giants game

Studies of Diagnostic Test Accuracy
After lunch. Tom again or Michael to start with incorporation and spectrum bias?

Checklist
Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
Was the reference standard applied regardless of the diagnostic test result?
Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

Beyond the Checklist
Consider not only the possibility of bias, but WHY it may occur and the DIRECTION in which it would affect results:
Incorporation bias
Spectrum bias
Verification bias
Double gold standard bias

Incorporation Bias
Occurs when the test itself can be incorporated into the gold standard
Prevented by blinding those who determine the gold standard to the test result

Example: Study of BNP as a test for congestive heart failure (CHF)*
Gold standard: determination of CHF by two cardiologists blinded to BNP
“The best clinical predictor of congestive heart failure was an increased heart size on chest roentgenogram (accuracy, 81 percent)”
Is there a problem with assessing accuracy of chest x-rays to diagnose CHF in this study?
*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.
Problem 4.3

Incorporation bias
The cardiologists were not blinded to the chest x-ray
They used (incorporated) the chest x-ray in making the CHF diagnosis
So there is incorporation bias in the assessment of the chest x-ray, not of BNP

Spectrum of Disease and Nondisease
Disease is often easier to diagnose if severe
“Nondisease” is easier to diagnose if the patient is well than if the patient has other diseases

Spectrum Bias
Sensitivity depends on the spectrum of disease in the population being tested.
Specificity depends on the spectrum of non-disease in the population being tested.
Example: Absence of Nasal Bone (on 13-week ultrasound) as a Test for Chromosomal Abnormality

Spectrum Bias Example: Absence of Nasal Bone as a Test for Trisomy 21*
Sensitivity = 229/333 = 69%
Specificity = 5094/5223 = 97.5%
BUT the D- group only included chromosomally normal fetuses
*Cicero et al., Ultrasound Obstet Gynecol 2004; 23: 218-23
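A minimal sketch of the arithmetic on this slide, using only the counts quoted from Cicero et al. The function and variable names are illustrative, not from the slides.

```python
def sensitivity(true_positives, diseased_total):
    """Proportion of D+ subjects with a positive test."""
    return true_positives / diseased_total


def specificity(true_negatives, nondiseased_total):
    """Proportion of D- subjects with a negative test."""
    return true_negatives / nondiseased_total


# Counts as quoted on the slide: D+ = Trisomy 21, D- = chromosomally normal.
sens = sensitivity(229, 333)
spec = specificity(5094, 5223)

print(f"Sensitivity = {sens:.1%}")
print(f"Specificity = {spec:.1%}")
print(f"False-positive rate = {1 - spec:.1%}")
```

The false-positive rate printed last is the "2.5%" that the next slide contrasts with the 32% seen in fetuses with other chromosomal abnormalities.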

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
The D- group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18)
Among these fetuses, 32% had absent nasal bone (not 2.5%)
What decision is this test supposed to help with?
If it is whether to test chromosomes using chorionic villus sampling or amniocentesis, these 295 fetuses should be included in the D+ group!

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality, effect of including other trisomies in D+ group
Sensitivity = 324/628 = 52%, NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21
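A short sketch of how the pooled figure follows from the counts quoted on the slides. The count of other-abnormality fetuses with absent nasal bone is not stated directly; here it is derived as 324 - 229 = 95, which is an inference from the quoted totals rather than a number from the paper. Variable names are illustrative.

```python
# Counts quoted on the slides.
t21_absent, t21_total = 229, 333        # Trisomy 21 only
other_total = 295                       # other chromosomal abnormalities
pooled_absent, pooled_total = 324, 628  # all chromosomal abnormalities

# Derived (assumption: the pooled numerator is the sum of the two groups).
other_absent = pooled_absent - t21_absent

print(f"Other abnormalities with absent nasal bone: "
      f"{other_absent}/{other_total} = {other_absent / other_total:.0%}")

sens_t21_only = t21_absent / t21_total
sens_pooled = pooled_absent / pooled_total
print(f"Sensitivity, D+ = Trisomy 21 only: {sens_t21_only:.0%}")
print(f"Sensitivity, D+ = any abnormality: {sens_pooled:.0%}")
```

The derived 95/295 is about 32%, consistent with the proportion quoted for the excluded fetuses.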

Verification bias: Example
Visual assessment of jaundice in newborns
Study patients who are getting a bilirubin measurement
Ask clinicians to estimate extent of jaundice at time of blood draw

Visual Assessment of jaundice*: Results
Sensitivity of jaundice below the nipple line for bilirubin ≥ 12 mg/dL = 97%
Specificity = 19%
What is the problem?
Editor’s Note: The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication. --Catherine D. DeAngelis, MD
*Moyer et al., Archives Pediatr Adol Med 2000; 154:391

Verification Bias*
Inclusion criterion for study: gold standard test was done (in this case, blood test for bilirubin)
Subjects with positive index tests are more likely to get the gold standard and to be included in the study
Clinicians usually don’t order a blood test for bilirubin if there is little or no jaundice
How does this affect sensitivity and specificity?
*AKA Work-up Bias, Referral Bias, or Ascertainment Bias

Verification Bias Effects
                              TSB >12    TSB < 12
Jaundice below nipple            a          b
No jaundice below nipple         c          d
(Don’t change slide yet)
Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
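One way to reason about the direction of the bias is to start from a full cohort and keep only the subjects who would have been verified. The sketch below does that with expected counts. Every number in it (the cohort cell counts and the verification probabilities) is hypothetical and chosen only for illustration; none comes from the jaundice study.

```python
def sens_spec(a, b, c, d):
    """2x2 cells: a = TP, b = FP, c = FN, d = TN."""
    return a / (a + c), d / (b + d)


# Hypothetical full cohort, before anyone decides who gets the blood test.
full = {"a": 90, "b": 300, "c": 10, "d": 600}

# Hypothetical chance of getting the gold standard, by index test result.
p_verify_if_test_pos = 0.8
p_verify_if_test_neg = 0.1

# Expected cells among verified subjects only: row 1 (a, b) is test-positive,
# row 2 (c, d) is test-negative, so each row shrinks by its own probability.
verified = {
    "a": full["a"] * p_verify_if_test_pos,
    "b": full["b"] * p_verify_if_test_pos,
    "c": full["c"] * p_verify_if_test_neg,
    "d": full["d"] * p_verify_if_test_neg,
}

true_sens, true_spec = sens_spec(**full)
obs_sens, obs_spec = sens_spec(**verified)

print(f"Full cohort:   sensitivity {true_sens:.1%}, specificity {true_spec:.1%}")
print(f"Verified only: sensitivity {obs_sens:.1%}, specificity {obs_spec:.1%}")
```

Because cells c and d shrink more than a and b, a/(a+c) rises and d/(b+d) falls: sensitivity is biased up and specificity is biased down.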

Double Gold Standard Bias - 1
Two different “gold standards”
One gold standard (e.g., surgery, invasive test) is more likely to be applied in patients with a positive index test
The other gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test

Double Gold Standard Bias - 2
There are some patients in whom the two “gold standards” do not give the same answer:
Spontaneously resolving disease (positive with immediate invasive test, but not with follow-up)
Newly occurring or newly detectable disease (positive with follow-up but not with immediate invasive test)

Double Gold Standard Bias, example*
Study Population: All patients presenting to the ED who received a V/Q scan
Test: V/Q scan
Disease: Pulmonary embolism (PE)
Gold Standards:
1. Pulmonary arteriogram (PA-gram) if done (more likely with more abnormal V/Q scan)
2. Clinical follow-up in other patients (more likely with normal V/Q scan)
What happens if some PE resolve spontaneously?
*PIOPED. JAMA 1990;263(20):2753-9.

Effect of Double Gold Standard Bias 1: Spontaneously resolving disease
Test result will always agree with gold standard
Both sensitivity and specificity increase
Example: Joe has a small pulmonary embolus (PE) that will resolve spontaneously.
If his V/Q scan is positive, he will get an angiogram that shows the PE (true positive)
If his V/Q scan is negative, his PE will resolve and we will think he never had one (true negative)
V/Q scan can’t be wrong!

Effect of Double Gold Standard Bias 2: Newly occurring or newly detectable disease
Test result will always disagree with gold standard
Both sensitivity and specificity decrease
Example: Jane has or will soon get a nasty breast cancer that is currently undetectable.
If her mammogram is positive, she will get biopsies that will not find the tumor (mammogram will look falsely positive)
If her mammogram is negative, she will return in several months and we will think the tumor was initially missed (mammogram will look falsely negative)
Mammogram can’t be right!

Effect of Double Gold Standard Bias
Newly occurring or newly detectable disease: sensitivity falsely decreased, specificity falsely decreased
Spontaneously resolving disease: sensitivity falsely increased, specificity falsely increased
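The two directions can be checked with a small counting sketch. It compares applying one gold standard (the invasive test) to everyone against the double gold standard design, where test-positives get the invasive test and test-negatives get follow-up. All group sizes are hypothetical, and the names (`Group`, `sens_spec`) are illustrative only.

```python
from collections import namedtuple

# Each group: counts by index test result, and what each gold standard would say.
Group = namedtuple("Group", "test_pos test_neg invasive_pos followup_pos")


def sens_spec(groups, standard_if_test_pos, standard_if_test_neg):
    """Sensitivity and specificity when the gold standard depends on the index test."""
    tp = fp = fn = tn = 0
    for g in groups:
        verdict = {"invasive": g.invasive_pos, "followup": g.followup_pos}
        if verdict[standard_if_test_pos]:
            tp += g.test_pos
        else:
            fp += g.test_pos
        if verdict[standard_if_test_neg]:
            fn += g.test_neg
        else:
            tn += g.test_neg
    return tp / (tp + fn), tn / (fp + tn)


# Hypothetical counts shared by both scenarios.
persistent = Group(80, 20, invasive_pos=True, followup_pos=True)
healthy = Group(50, 450, invasive_pos=False, followup_pos=False)

# Scenario 1: some disease resolves (seen by invasive test, not by follow-up).
resolving = Group(10, 10, invasive_pos=True, followup_pos=False)
s1 = [persistent, resolving, healthy]
single_1 = sens_spec(s1, "invasive", "invasive")
double_1 = sens_spec(s1, "invasive", "followup")

# Scenario 2: some disease is newly detectable (missed by invasive test, found later).
newly = Group(10, 10, invasive_pos=False, followup_pos=True)
s2 = [persistent, newly, healthy]
single_2 = sens_spec(s2, "invasive", "invasive")
double_2 = sens_spec(s2, "invasive", "followup")

for label, single, double in [("Resolving", single_1, double_1),
                              ("Newly detectable", single_2, double_2)]:
    print(f"{label}: single standard sens/spec = {single[0]:.3f}/{single[1]:.3f}, "
          f"double standard = {double[0]:.3f}/{double[1]:.3f}")
```

With these made-up counts, the double gold standard raises both measures in the resolving-disease scenario and lowers both in the newly-detectable scenario, matching the directions on the slide.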

Bias | Description | Sensitivity is falsely … | Specificity is falsely …
Incorporation | Gold standard incorporates index test. | |
Spectrum | D+ only includes “sickest of the sick” | |
Spectrum | D- only includes “wellest of the well” | |
Verification | Positive index test makes gold standard more likely. | |
Double Gold Standard | Disease resolves spontaneously | |
Double Gold Standard | Disease becomes detectable during follow-up | |

Systematic Reviews of Diagnostic Accuracy Studies

Meta-analyses of Diagnostic Tests
Systematic and reproducible approach to finding studies
Summary of results of each study
Investigation into heterogeneity
Summary estimate of results, if appropriate
Unlike other meta-analyses (risk factors, treatments), results aren’t summarized with a single number (e.g., RR), but with two related numbers (sensitivity and specificity)
These can be plotted on an ROC plane
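As a rough sketch of what "plotted on an ROC plane" means: each study contributes one point, with the false-positive rate (1 - specificity) on the x-axis and sensitivity on the y-axis. The study values below are hypothetical placeholders, not data from any of the cited reviews, and `dominates` is an illustrative helper for the idea of one result being unequivocally better than another (up and to the left).

```python
def roc_point(sens, spec):
    """Map (sensitivity, specificity) to ROC-plane coordinates (x = FPR, y = TPR)."""
    return (1 - spec, sens)


def dominates(a, b):
    """True if result a is at least as good as b on both measures and better on one."""
    (sens_a, spec_a), (sens_b, spec_b) = a, b
    return (sens_a >= sens_b and spec_a >= spec_b
            and (sens_a > sens_b or spec_a > spec_b))


# Hypothetical (sensitivity, specificity) pairs for three studies.
studies = {
    "Study A": (0.90, 0.80),
    "Study B": (0.75, 0.95),
    "Study C": (0.85, 0.70),
}

for name, (sens, spec) in studies.items():
    x, y = roc_point(sens, spec)
    print(f"{name}: FPR = {x:.2f}, TPR = {y:.2f}")

print("A dominates C:", dominates(studies["Study A"], studies["Study C"]))
print("A dominates B:", dominates(studies["Study A"], studies["Study B"]))
```

Study A and Study B trade sensitivity against specificity, so neither dominates; that kind of spread is part of why the two numbers are summarized together rather than pooled separately.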

MRI for the diagnosis of MS Whiting et al. BMJ 2006;332:875-84

Dermoscopy vs Naked Eye for Diagnosis of Malignant Melanoma
Dermoscopy performed unequivocally better in 7 of the 9 studies.
Can you circle results for the 2 studies for which this was not the case?
Br J Dermatol. 2008 Sep;159(3):669-76

Kharbanda et al. Pediatrics 2005; 116(3): 709-16
Example: A clinical decision rule to identify children at low risk for appendicitis (Problem 5.6)
Study design: prospective cohort study
Subjects: 4140 patients 3-18 years presenting to Boston Children’s Hospital ED with abdominal pain
Of these, 767 (19%) received surgical consultation for possible appendicitis
113 excluded (chronic diseases, recent imaging)
53 missed
601 included in the study (425 in derivation set)
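A quick check of the subject flow, using only the counts quoted on the slide. Variable names are illustrative. The last figure (subjects not in the derivation set) is simple subtraction and is not stated on the slide.

```python
presenting = 4140        # patients with abdominal pain
surgical_consult = 767   # received surgical consultation
excluded = 113           # chronic diseases, recent imaging
missed = 53
derivation_set = 425

included = surgical_consult - excluded - missed
not_in_derivation = included - derivation_set  # derived, not quoted on the slide

print(f"Consulted: {surgical_consult / presenting:.1%} of those presenting")
print(f"Included in study: {included}")
print(f"Not in derivation set: {not_in_derivation}")
```

Fewer than one in five children with abdominal pain entered the surgical-consultation group, which matters for the later question of whether the sample is representative.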

Kharbanda et al. Pediatrics 2005; 116(3): 709-16
A clinical decision rule to identify children at low risk for appendicitis
Predictor variable:
Standardized assessment by pediatric ED attending
Focus on “Pain with percussion, hopping or cough” (complete data in N=381)
Outcome variable:
Pathologic diagnosis of appendicitis (or not) for those who received surgery (37%)
Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)

Kharbanda et al. Pediatrics 2005; 116(3): 709-16
A clinical decision rule to identify children at low risk for appendicitis
Results: Pain with percussion, hopping or cough
78% sensitivity and 83% NPV seem low to me. Are they valid for me in deciding whom to image?

Checklist
Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
Was the reference standard applied regardless of the diagnostic test result?
Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

In what direction would these biases affect results?
Sample not representative (population referred to pedi surgery)?
Verification bias?
Double gold standard bias?
Spectrum bias?
Sample NOT representative: prevalence of appy is too high for a decision about imaging.
Verification bias is probably operating: lack of pain with hopping would make me LESS likely to seek surgical consultation. But this would bias sensitivity UP.
Double gold standard bias COULD be operating, if some cases of appendicitis spontaneously resolve, but this would bias sensitivity and specificity UP.
Spectrum bias probably operates for specificity, not sensitivity. Presumably the non-appy cases referred to pedi surgery looked more like appendicitis, and are therefore likely to have a higher FP rate for pain with hopping than those not studied.

For children presenting with abdominal pain to SFGH 6-M
Sensitivity probably valid (not falsely low)
But whether all of them tried to hop is not clear
Specificity probably low
PPV is too high
NPV is too low
Does not address surgical consultation decision
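Why PPV would be too high and NPV too low when a referral sample is applied to a lower-prevalence population can be sketched with Bayes' rule. The 78% sensitivity is the figure quoted from the study; the specificity and both prevalence values below are hypothetical, chosen only to illustrate the direction of the effect.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENS = 0.78  # quoted on the results slide
SPEC = 0.80  # hypothetical: the slides do not report a specificity

# Hypothetical prevalences: a surgical-referral sample vs. all ED abdominal pain.
ppv_referral, npv_referral = predictive_values(SENS, SPEC, prevalence=0.40)
ppv_general, npv_general = predictive_values(SENS, SPEC, prevalence=0.05)

print(f"Referral sample (prevalence 40%): PPV {ppv_referral:.1%}, NPV {npv_referral:.1%}")
print(f"General ED (prevalence 5%):       PPV {ppv_general:.1%}, NPV {npv_general:.1%}")
```

With sensitivity and specificity held fixed, lowering the prevalence lowers PPV and raises NPV, so predictive values taken from the referral sample overstate PPV and understate NPV for the broader population.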

Prognostic and Genetic Tests (Mark)