Studies of Diagnostic Tests
Thomas B. Newman, MD, MPH
October 14, 2010


Reminders/Announcements
- Write down answers to as many of the problems in the book as you can and check your answers!
- Final exam to be passed out 12/2, reviewed 12/9
  - Send questions!

Overview
- Common biases of studies of diagnostic test accuracy
- Prevalence, spectrum, and nonindependence
- Meta-analysis of diagnostic tests
- Checklist & systematic approach
- Examples:
  - Pain with percussion, hopping, or cough for appendicitis
  - Pertussis

Bias #1 Example
- Study of BNP to diagnose congestive heart failure (CHF; Chapter 4, Problem 3)

Bias #1 Example
- Gold standard: determination of CHF by two cardiologists blinded to BNP
- "The best clinical predictor of congestive heart failure was an increased heart size on chest roentgenogram (accuracy, 81 percent)"*
- Is there a problem with assessing the accuracy of chest x-rays to diagnose CHF in this study?

*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.

Bias #1: Incorporation Bias
- Cardiologists were not blinded to the chest x-ray
- They probably used (incorporated) it in making the final diagnosis
- Incorporation bias affects the assessment of the chest x-ray (not BNP)
- Biases both sensitivity and specificity upward

Bias #2 Example
- Visual assessment of jaundice in newborns
  - Study patients who are getting a bilirubin measurement
  - Ask clinicians to estimate the extent of jaundice at the time of the blood draw

Visual Assessment of Jaundice: Results*
- Sensitivity of jaundice below the nipple line for bilirubin ≥ 12 mg/dL = 97%
- Specificity = 19%
- What is the problem?

Editor's Note: The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there's some other indication. --Catherine D. DeAngelis, MD

*Moyer et al., Arch Pediatr Adolesc Med 2000;154:391
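The arithmetic behind figures like these can be checked with a small helper. The 2×2 counts below are hypothetical, chosen only to reproduce the reported 97% sensitivity and 19% specificity, not taken from the study itself.

```python
# Sensitivity and specificity from a 2x2 table:
# a = true positives, b = false positives,
# c = false negatives, d = true negatives
def sens_spec(a, b, c, d):
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity, specificity

# Hypothetical counts consistent with the reported results
# (jaundice below the nipple line as a test for TSB >= 12 mg/dL)
sens, spec = sens_spec(a=97, b=81, c=3, d=19)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```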

Bias #2: Verification Bias* - 1
- Inclusion criterion for the study: the gold standard test was done
  - in this case, a blood test for bilirubin
- Subjects with positive index tests are more likely to get the gold standard and to be included in the study
  - clinicians usually don't order a blood test for bilirubin if there is little or no jaundice
- How does this affect sensitivity and specificity?

*AKA work-up bias, referral bias, or ascertainment bias

Bias #2: Verification Bias*

                          TSB > 12   TSB < 12
Jaundice below nipple         a          b
No jaundice below nipple      c ↓        d ↓

(c and d are depleted, because test-negatives are less likely to be verified)

Sensitivity, a/(a+c), is biased ___. Specificity, d/(b+d), is biased ___.

But is sensitivity what we really want to know to support Cathy's conclusion?

*AKA work-up bias, referral bias, or ascertainment bias

Bias #2: Verification Bias

                          TSB > 12   TSB < 12
Jaundice below nipple         a          b
No jaundice below nipple      c          d

- Negative predictive value was 94%. Is it biased?
- The "test negative" group (no jaundice) that still gets the gold standard may have other risk factors or indications
- Therefore, c may be too high relative to d, and NPV may be underestimated
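The direction of verification bias can be checked with a small deterministic sketch. All numbers here are hypothetical: a population with known accuracy, in which test-positives are verified far more often than test-negatives, and apparent accuracy is recomputed among the verified only.

```python
# Expected 2x2 cells in a population of 1000 (hypothetical numbers)
prev, true_sens, true_spec = 0.2, 0.80, 0.90
n = 1000
a = n * prev * true_sens              # index test +, disease +
c = n * prev * (1 - true_sens)        # index test -, disease +
d = n * (1 - prev) * true_spec        # index test -, disease -
b = n * (1 - prev) * (1 - true_spec)  # index test +, disease -

# Differential verification: the gold standard is done in 95% of
# test-positives but only 20% of test-negatives (hypothetical rates)
v_pos, v_neg = 0.95, 0.20
obs_sens = (a * v_pos) / (a * v_pos + c * v_neg)
obs_spec = (d * v_neg) / (d * v_neg + b * v_pos)

assert obs_sens > true_sens   # sensitivity biased upward
assert obs_spec < true_spec   # specificity biased downward
```

With these rates, apparent sensitivity rises from 80% to 95%, while apparent specificity falls from 90% to about 65%, the pattern described on the slides.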

Bias #3
- Example: PIOPED study of the accuracy of the ventilation/perfusion (V/Q) scan to diagnose pulmonary embolus*
- Study population: all patients presenting to the ED who received a V/Q scan
- Test: V/Q scan
- Disease: pulmonary embolism (PE)
- Gold standards:
  1. Pulmonary arteriogram (PA-gram) if done (more likely with a more abnormal V/Q scan)
  2. Clinical follow-up in other patients (more likely with a normal V/Q scan)

*PIOPED. JAMA 1990;263(20):

Double Gold Standard Bias
- Two different "gold standards"
  - One gold standard (usually an immediate, more invasive test, e.g., angiogram, surgery) is more likely to be applied in patients with a positive index test
  - The second gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test

Double Gold Standard Bias
- There are some patients in whom the two "gold standards" do not give the same answer
  - Spontaneously resolving disease (positive with the immediate invasive test, but not with follow-up)
  - Newly occurring or newly detectable disease (positive with follow-up, but not with the immediate invasive test)

Effect of Double Gold Standard Bias 1: Spontaneously Resolving Disease
- Test result will always agree with the gold standard
- Both sensitivity and specificity increase
- Example: Joe has a small pulmonary embolus (PE) that will resolve spontaneously.
  - If his V/Q scan is positive, he will get an angiogram that shows the PE (true positive)
  - If his V/Q scan is negative, his PE will resolve and we will think he never had one (true negative)
- The V/Q scan can't be wrong!

Effect of Double Gold Standard Bias 2: Newly Occurring or Newly Detectable Disease
- Test result will always disagree with the gold standard
- Both sensitivity and specificity decrease
- Example: Jane has a nasty breast cancer that is currently undetectable
  - If her mammogram is positive, she will get biopsies that will not find the tumor (the mammogram will look falsely positive)
  - If her mammogram is negative, she will return in several months and we will think the tumor was initially missed (the mammogram will look falsely negative)
- The mammogram can't be right!
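Both directions of double gold standard bias can be verified with a toy 2×2 (all counts hypothetical). Patients with spontaneously resolving disease always agree with the index test, inflating both sensitivity and specificity; patients with newly occurring disease always disagree, deflating both.

```python
# Baseline 2x2 under a single gold standard (hypothetical counts):
# a = TP, b = FP, c = FN, d = TN
a, b, c, d = 80, 30, 20, 170
base_sens = a / (a + c)   # 0.80
base_spec = d / (b + d)   # 0.85

# Case 1: add 20 patients with spontaneously resolving disease.
# Under a double gold standard they always agree with the scan:
# test+ -> immediate test finds the disease (counted TP);
# test- -> disease resolves before follow-up (counted TN).
r_pos, r_neg = 10, 10
sens1 = (a + r_pos) / (a + r_pos + c)
spec1 = (d + r_neg) / (b + d + r_neg)
assert sens1 > base_sens and spec1 > base_spec

# Case 2: add 20 patients with newly occurring/detectable disease.
# They always disagree: test+ -> immediate gold standard negative
# (counted FP); test- -> disease appears on follow-up (counted FN).
n_pos, n_neg = 10, 10
sens2 = a / (a + c + n_neg)
spec2 = d / (b + d + n_pos)
assert sens2 < base_sens and spec2 < base_spec
```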

Spectrum of Disease, Nondisease, and Test Results
- Disease is often easier to diagnose if severe
- "Nondisease" is easier to diagnose if the patient is well than if the patient has other diseases
- Test results will be more reproducible if ambiguous results are excluded

Spectrum Bias
- Sensitivity depends on the spectrum of disease in the population being tested.
- Specificity depends on the spectrum of nondisease in the population being tested.
- Example: absence of the nasal bone (on 13-week ultrasound) as a test for chromosomal abnormality

Spectrum Bias Example: Absence of Nasal Bone as a Test for Chromosomal Abnormality*
- Sensitivity = 229/333 = 69%
- BUT the D+ group only included fetuses with Trisomy 21

*Cicero et al., Ultrasound Obstet Gynecol 2004;23:

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
- The D+ group excluded 295 fetuses with other chromosomal abnormalities (mainly Trisomy 18)
- Among these fetuses, the sensitivity of nasal bone absence was 32% (not 69%)
- What decision is this test supposed to help with?
  - If it is whether to test chromosomes using chorionic villus sampling or amniocentesis, these 295 fetuses should be included!

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality, Effect of Including Other Trisomies in the D+ Group
- Sensitivity = 324/628 = 52%, NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21
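The slide's arithmetic can be reproduced directly from the figures given. The count of 95 affected fetuses among the 295 with other abnormalities is back-calculated here from the reported 32% sensitivity.

```python
# Trisomy 21 fetuses: 229 of 333 with an absent nasal bone (slide figures)
t21_absent, t21_total = 229, 333

# Other chromosomal abnormalities: 295 fetuses, ~32% with an absent
# nasal bone, i.e. about 95 (back-calculated from the reported percentage)
other_absent, other_total = 95, 295

sens_t21_only = t21_absent / t21_total
sens_all = (t21_absent + other_absent) / (t21_total + other_total)

print(f"{sens_t21_only:.0%} vs {sens_all:.0%}")  # 69% vs 52%
```

Restricting D+ to the easiest-to-detect abnormality (Trisomy 21) is exactly what inflates the apparent sensitivity from 52% to 69%.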

Quiz: What if we considered nasal bone absence as a test for Trisomy 21?
- Then instead of excluding subjects with other chromosomal abnormalities or including them as D+, we should count them as D-. Compared with excluding them:
- What would happen to sensitivity?
- What would happen to specificity?

Quiz: What if we considered nasal bone absence as a test for Trisomy 21?

[2×2 table: nasal bone absent (yes/no) vs. D+/D-; cell values garbled in the transcript (the sums 573, 4945, and 608 survive)]

- Sensitivity unchanged
- Specificity reduced

Prevalence, Spectrum, and Nonindependence
- Prevalence (prior probability) of disease may be related to disease severity
- One mechanism is different spectra of disease or nondisease
- Another is that whatever is causing the high prior probability is related to the same aspect of the disease as the test

Prevalence, Spectrum, and Nonindependence
- Examples
  - Iron deficiency, HIV
  - Diseases identified by screening
- Urinalysis as a test for UTI in women with more and fewer symptoms (high and low prior probability)

Overfitting

Meta-analyses of Diagnostic Tests
- Systematic and reproducible approach to finding studies
- Summary of the results of each study
- Investigation into heterogeneity
- Summary estimate of results, if appropriate
- Unlike other meta-analyses (risk factors, treatments), results aren't summarized with a single number (e.g., RR), but with two related numbers (sensitivity and specificity)
- These can be plotted on an ROC plane

MRI for the diagnosis of MS Whiting et al. BMJ 2006;332:875-84

Dermoscopy vs Naked Eye for Diagnosis of Malignant Melanoma Br J Dermatol Sep;159(3):669-76

Studies of Diagnostic Test Accuracy: Checklist
- Was there an independent, blind comparison with a reference ("gold") standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?

From Sackett et al., Evidence-Based Medicine, 2nd ed. (NY: Churchill Livingstone), p. 68

Systematic Approach
- Authors and funding source
- Research question
- Study design
- Study subjects
- Predictor variable
- Outcome variable
- Results & analysis
- Conclusions

Consider possible biases due to deviations from a perfect study and estimate the magnitude and direction of each.

A Clinical Decision Rule to Identify Children at Low Risk for Appendicitis (Problem 5.6)
- Study design: prospective cohort study
- Subjects
  - 4140 patients 3-18 years presenting to Boston Children's Hospital ED with abdominal pain
  - Of these, 767 (19%) received surgical consultation for possible appendicitis
  - 113 excluded (chronic diseases, recent imaging)
  - 53 missed
  - 601 included in the study (425 in the derivation set)

Kharbanda et al. Pediatrics 2005;116(3):

A Clinical Decision Rule to Identify Children at Low Risk for Appendicitis
- Predictor variable
  - Standardized assessment by a pediatric ED attending
  - Focus on "pain with percussion, hopping, or cough" (complete data in N=381)
- Outcome variable:
  - Pathologic diagnosis of appendicitis (or not) for those who received surgery (37%)
  - Follow-up telephone call to the family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)

Kharbanda et al. Pediatrics 116(3):

A Clinical Decision Rule to Identify Children at Low Risk for Appendicitis
- Results: pain with percussion, hopping, or cough
- 78% sensitivity and 83% NPV seem low to me. Are they valid for me in deciding whom to image?

Kharbanda et al. Pediatrics 116(3):
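Whether the reported predictive values carry over to another setting depends on prevalence, which is why the later slides argue the PPV and NPV would shift in a lower-risk ED population. A sketch using Bayes' rule makes the direction concrete; the 78% sensitivity is from the slide, while the specificity and the two prevalences are hypothetical, for illustration only.

```python
# Predictive values from sensitivity, specificity, and prevalence (Bayes' rule)
def ppv_npv(sens, spec, prev):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

sens = 0.78   # reported for pain with percussion, hopping, or cough
spec = 0.60   # hypothetical, for illustration

# High-prevalence surgical-consult population vs. a hypothetical
# lower-prevalence general ED population
ppv_hi, npv_hi = ppv_npv(sens, spec, prev=0.37)
ppv_lo, npv_lo = ppv_npv(sens, spec, prev=0.05)

assert npv_lo > npv_hi  # same test, lower prevalence -> higher NPV
assert ppv_lo < ppv_hi  # ... and lower PPV
```

This is the mechanism behind the later claim that, applied to a general ED population, the study's PPV is too high and its NPV too low.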

Checklist
- Was there an independent, blind comparison with a reference ("gold") standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?

From Sackett et al., Evidence-Based Medicine, 2nd ed. (NY: Churchill Livingstone), p. 68

In What Direction Would These Biases Affect Results?
- Sample not representative (population referred to pediatric surgery)?
- Verification bias?
- Double gold standard bias?
- Spectrum bias?

For Children Presenting with Abdominal Pain to SFGH 6-M
- Sensitivity probably valid (not falsely low)
  - But whether all of them tried to hop is not clear
- Specificity probably low
- PPV is too high
- NPV is too low
- Does not address the surgical consultation decision

Does This Coughing Patient Have Pertussis?*
- RQ (for us): what are the LRs for coughing fits, whoop, and post-tussive vomiting in adults with persistent cough?
- Design (for one study we reviewed**): prospective cross-sectional study
- Subjects: 217 adults ≥18 years with cough days, no fever or other clear cause for cough, enrolled by 80 French GPs.
  - In a subsample from 58 GPs, of 710 who met inclusion criteria only 99 (14%) enrolled

*Cornia et al. JAMA 2010;304(8):
**Gilberg S et al. J Inf Dis 2002;186:415-8

Pertussis Diagnosis*
- Predictor variables: "GPs interviewed patients using a standardized questionnaire."
- Outcome variable: evidence of pertussis based on
  - Culture (N=1)
  - PCR (N=36)
  - Or ≥2-fold change in anti-pertussis toxin IgG (N=40)
  - Total N = 70/217 with evidence of pertussis (32%)

*Gilberg S et al. J Inf Dis 2002;186:415-8

Results
- 89% in both groups met CDC criteria for pertussis

Issues
- Verification (selection) bias: only 14% of eligible subjects included
- Questionable gold standard
  - 2-fold dilution is too small a change
  - Both increases and decreases were counted
  - Internally inconsistent: patients with a positive PCR were no more likely to have a change in antibody titres

Questions?

Additional slides

Double Gold Standard Bias: Effect of Spontaneously Resolving Disease

                PE +   PE -
V/Q scan +       a      b
V/Q scan -       c      d

Sensitivity, a/(a+c), biased __. Specificity, d/(b+d), biased __.
- Double gold standard compared with an immediate invasive test for all
- Double gold standard compared with follow-up for all

Double Gold Standard Bias: Effect of Newly Occurring Cases

                PE +   PE -
V/Q scan +       a      b
V/Q scan -       c      d

Sensitivity, a/(a+c), biased __. Specificity, d/(b+d), biased __.
- Double gold standard compared with PA-gram for all
- Double gold standard compared with follow-up for all

Double Gold Standard Bias: Ultrasound diagnosis of intussusception

What if 10% of the 86 U/S- followed subjects actually had intussusceptions that resolved spontaneously?