Studies of Diagnostic Tests Thomas B. Newman, MD, MPH October 16, 2008.



Reminders/Announcements
• Corrected page proofs of all of EBD are now on the web
  – Tell us if you find additional mistakes, ASAP
  – Index is a mess; if you look for things there and do not find them, let us know
• Final exam to be passed out 12/4, reviewed 12/11
  – Send questions!

Overview
• Common biases of studies of diagnostic test accuracy
  – Incorporation bias
  – Verification bias
  – Double gold standard bias
  – Spectrum bias
• Prevalence, spectrum and nonindependence
• Meta-analysis of diagnostic tests
• Checklist & systematic approach
• Examples:
  – Physical examination for presentation
  – Pain with percussion, hopping or cough for appendicitis

Incorporation Bias
• Recall study of BNP to diagnose congestive heart failure (CHF, Chapter 4, Problem 3)

Incorporation Bias
• Gold standard: determination of CHF by two cardiologists blinded to BNP
• Chest X-ray found to be highly predictive of CHF, but cardiologists not blinded to Chest X-ray
• Incorporation bias for assessment of Chest X-ray, not BNP
*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.

Verification Bias*
• Inclusion criterion: gold standard was applied
• Subjects with positive index tests are more likely to be referred for the gold standard
• Example: V/Q Scan as a test for pulmonary embolism (PE; blood clot in lungs)
  – Gold standard is a pulmonary arteriogram
  – Retrospective study of patients receiving arteriograms to rule out PE
  – Patients with negative V/Q scans less likely to be referred for PA-gram
• Many additional examples
  – E.g., visual assessment of jaundice mentioned in DCR
*AKA Work-up, Referral Bias, or Ascertainment Bias

Verification Bias

              PA-gram +   PA-gram -
V/Q Scan +        a           b
V/Q Scan -        c ↓         d ↓

Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
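A minimal sketch of how partial verification can distort the 2×2 table. All counts and referral fractions are hypothetical (they are not taken from the V/Q scan study); the only idea carried over from the slides is that patients with a negative index test are less likely to receive the gold standard.

```python
def sensitivity(a, c):
    """a = test+/disease+, c = test-/disease+."""
    return a / (a + c)


def specificity(b, d):
    """b = test+/disease-, d = test-/disease-."""
    return d / (b + d)


# Hypothetical full cohort, as if everyone had received the gold standard.
a, b, c, d = 80, 40, 20, 360

# Hypothetical referral fractions: test-positive patients are verified
# more often than test-negative patients.
verified_if_positive = 0.9
verified_if_negative = 0.2

# Only verified patients enter the study, so each row shrinks by its own fraction.
a_v, b_v = a * verified_if_positive, b * verified_if_positive
c_v, d_v = c * verified_if_negative, d * verified_if_negative

true_sens, true_spec = sensitivity(a, c), specificity(b, d)
obs_sens, obs_spec = sensitivity(a_v, c_v), specificity(b_v, d_v)

print(f"sensitivity: full cohort {true_sens:.2f}, verified only {obs_sens:.2f}")
print(f"specificity: full cohort {true_spec:.2f}, verified only {obs_spec:.2f}")
```

Under these assumed numbers the bottom row (c and d) shrinks the most, so the observed sensitivity rises and the observed specificity falls relative to the full cohort.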

Double Gold Standard Bias
• Two different “gold standards”
  – One gold standard (e.g., surgery, invasive test) is more likely to be applied in patients with a positive index test
  – The other gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test
• There are some patients in whom the two gold standards do not give the same answer
  – Spontaneously resolving disease
  – Newly occurring disease

Double Gold Standard Bias, example*
• Study population: All patients presenting to the ED who received a V/Q scan
• Test: V/Q Scan
• Disease: Pulmonary embolism (PE)
• Gold standards:
  – 1. Pulmonary arteriogram (PA-gram) if done (more likely with more abnormal V/Q scan)
  – 2. Clinical follow-up in other patients (more likely with normal V/Q scan)
• What happens if some PEs resolve spontaneously?
*PIOPED. JAMA 1990;263(20):

Double Gold Standard Bias: effect of spontaneously resolving cases

              PE +   PE -
V/Q Scan +     a      b
V/Q Scan -     c      d

Sensitivity, a/(a+c): biased __
Specificity, d/(b+d): biased __
• Double gold standard compared with PA-gram for all
• Double gold standard compared with follow-up for all
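A rough sketch of the spontaneous-resolution scenario, comparing the double gold standard with each single gold standard. The counts and the resolution fraction are hypothetical, and the sketch assumes that resolution is equally likely whatever the V/Q scan result.

```python
def sens_spec(a, b, c, d):
    """Return (sensitivity, specificity) for a 2x2 table."""
    return a / (a + c), d / (b + d)


# Hypothetical table if every patient had a PA-gram at presentation.
a, b, c, d = 80, 40, 20, 360
resolve = 0.10  # hypothetical fraction of PEs that resolve before follow-up

# PA-gram for all: disease status is assessed immediately, resolution is irrelevant.
pa_all = sens_spec(a, b, c, d)

# Follow-up for all: resolved PEs are classified as "no PE" in both rows.
followup_all = sens_spec(a * (1 - resolve), b + a * resolve,
                         c * (1 - resolve), d + c * resolve)

# Double gold standard: scan-positive patients get the PA-gram (row unchanged),
# scan-negative patients get follow-up (resolved PEs move from c to d).
double_gs = sens_spec(a, b, c * (1 - resolve), d + c * resolve)

for label, (sens, spec) in [("PA-gram for all", pa_all),
                            ("follow-up for all", followup_all),
                            ("double gold standard", double_gs)]:
    print(f"{label:22s} sensitivity {sens:.3f}  specificity {spec:.3f}")
```

With these assumed inputs the double gold standard gives a higher sensitivity and a higher specificity than either single gold standard applied to everyone.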

Double Gold Standard Bias: effect of newly occurring cases

              PE +   PE -
V/Q Scan +     a      b
V/Q Scan -     c      d

Sensitivity, a/(a+c): biased __
Specificity, d/(b+d): biased __
• Double gold standard compared with PA-gram for all
• Double gold standard compared with follow-up for all

Double Gold Standard Bias: Ultrasound diagnosis of intussusception

What if 10% resolve spontaneously?

Spectrum of Disease, Nondisease and Test Results
• Disease is often easier to diagnose if severe
• “Nondisease” is easier to diagnose if the patient is well than if the patient has other diseases
• Test results will be more reproducible if ambiguous results are excluded

Spectrum Bias
• Sensitivity depends on the spectrum of disease in the population being tested.
• Specificity depends on the spectrum of non-disease in the population being tested.
• Example: Absence of Nasal Bone (on 13-week ultrasound) as a Test for Chromosomal Abnormality

Spectrum Bias Example: Absence of Nasal Bone as a Test for Chromosomal Abnormality*
• Sensitivity = 229/333 = 69%
• BUT the D+ group only included fetuses with Trisomy 21
*Cicero et al., Ultrasound Obstet Gynecol 2004; 23:

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
• D+ group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18)
• Among these fetuses, sensitivity 32% (not 69%)
• What decision is this test supposed to help with?
  – If it is whether to test chromosomes using chorionic villus sampling or amniocentesis, these 295 fetuses should be included!

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality, effect of including other trisomies in D+ group
• Sensitivity = 324/628 = 52%
• NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21
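The arithmetic behind these figures can be checked with a short sketch. The four counts come from the slides; the counts for the "other abnormalities" group are obtained here by subtraction, which is an inference rather than something the slides state directly.

```python
# Counts from the slides: Trisomy 21 fetuses only.
t21_positive, t21_total = 229, 333

# Counts from the slides after adding other chromosomal abnormalities to D+.
all_positive, all_total = 324, 628

# The "other abnormalities" group, obtained by subtraction.
other_positive = all_positive - t21_positive
other_total = all_total - t21_total

sens_t21 = t21_positive / t21_total
sens_other = other_positive / other_total
sens_all = all_positive / all_total

print(f"Trisomy 21 only:      {sens_t21:.0%}")
print(f"Other abnormalities:  {sens_other:.0%}")
print(f"All abnormalities:    {sens_all:.0%}")
```

The pooled sensitivity is a weighted average of the two subgroup sensitivities, so it falls between them and depends on the mix of abnormalities in the D+ group.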

Quiz: What if we considered the nasal bone absence as a test for Trisomy 21?
• Then instead of excluding subjects with other chromosomal abnormalities or including them as D+, we should count them as D-.
Compared with excluding them:
• What would happen to sensitivity?
• What would happen to specificity?
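One way to explore the quiz numerically is sketched below. The Trisomy 21 and other-abnormality counts are derived from the figures on the slides; the chromosomally normal group is NOT given in this transcript, so its counts are purely hypothetical placeholders.

```python
def sens_spec(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (fp + tn)


# Trisomy 21 fetuses (from the slides): 229 of 333 had an absent nasal bone.
t21_pos, t21_neg = 229, 333 - 229

# Other chromosomal abnormalities (by subtraction from 324/628).
other_pos = 324 - 229
other_neg = (628 - 333) - other_pos

# HYPOTHETICAL chromosomally normal fetuses (placeholder counts, not study data).
normal_pos, normal_neg = 125, 4875

# Option 1: exclude the other abnormalities altogether.
excluded = sens_spec(t21_pos, t21_neg, normal_pos, normal_neg)

# Option 2: count the other abnormalities as D- (not Trisomy 21).
as_d_minus = sens_spec(t21_pos, t21_neg,
                       normal_pos + other_pos, normal_neg + other_neg)

print(f"excluded:      sensitivity {excluded[0]:.3f}, specificity {excluded[1]:.3f}")
print(f"counted as D-: sensitivity {as_d_minus[0]:.3f}, specificity {as_d_minus[1]:.3f}")
```

In this sketch the D+ group is the same under both options, so only the specificity can move; how far it moves depends on the assumed size of the normal group.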

Prevalence, spectrum and nonindependence
• Prevalence (prior probability) of disease may be related to disease severity
• One mechanism is different spectra of disease or nondisease
• Another is that whatever is causing the high prior probability is related to the same aspect of the disease as the test

Prevalence, spectrum and nonindependence
• Examples
  – Iron deficiency
  – Diseases identified by screening
• Urinalysis as a test for UTI in women with more and fewer symptoms (high and low prior probability)

Meta-analyses of Diagnostic Tests
• Systematic and reproducible approach to finding studies
• Summary of results of each study
• Investigation into heterogeneity
• Summary estimate of results, if appropriate
• Unlike other meta-analyses (risk factors, treatments), results aren’t summarized with a single number (e.g., RR), but with two related numbers (sensitivity and specificity)
• These can be plotted on an ROC plane
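As an illustration of the last point, each study's 2×2 table can be turned into one point on the ROC plane. The three studies below are invented for the example and do not correspond to any real meta-analysis; no pooling is attempted.

```python
# Hypothetical 2x2 counts per study: (TP, FP, FN, TN). Not real data.
studies = {
    "Study A": (45, 10, 5, 90),
    "Study B": (30, 4, 20, 96),
    "Study C": (60, 25, 10, 75),
}


def roc_point(tp, fp, fn, tn):
    """Return (1 - specificity, sensitivity), the study's position on the ROC plane."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return 1 - specificity, sensitivity


points = {name: roc_point(*counts) for name, counts in studies.items()}

for name, (fpr, tpr) in points.items():
    print(f"{name}: x = {fpr:.2f} (1 - specificity), y = {tpr:.2f} (sensitivity)")
```

Points scattered widely across the plane would suggest heterogeneity between studies, which is one reason a single summary estimate may not be appropriate.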

MRI for the diagnosis of MS Whiting et al. BMJ 2006;332:875-84

Studies of Diagnostic Test Accuracy: Checklist
• Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
• Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
• Was the reference standard applied regardless of the diagnostic test result?
• Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), p 68

Systematic Approach
• Authors and funding source
• Research question
  – Relevance?
  – What decision is the test supposed to help you make?
• Study design
  – Timing of measurements of predictor and outcome
  – Cross-sectional vs “case-control” sampling

Systematic Approach, cont’d
• Study subjects
  – Diseased subjects representative?
  – Nondiseased subjects representative?
  – If not, in what direction will results be affected?
• Predictor variable
  – How was the test done?
  – Is it difficult?
  – Will it be done as well in your setting?

Systematic Approach, cont’d
• Outcome variable
  – Is the “Gold Standard” really gold?
  – Were those measuring it blinded to results of the index test?
• Results & Analysis
  – Were all subjects analyzed?
  – If predictive value was reported, is prevalence similar to your population?
  – Would clinical implications change depending on location of true result within confidence intervals?
• Conclusions
  – Do they go beyond data?
  – Do they apply to patients in your setting?
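The question about predictive value and prevalence can be made concrete with Bayes' theorem. The sketch below uses hypothetical test characteristics (sensitivity 80%, specificity 90%) and hypothetical prevalences; none of these numbers come from the studies discussed in the slides.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = sensitivity * prevalence              # true positives per person tested
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics and prevalences.
sens, spec = 0.80, 0.90
results = {prev: predictive_values(sens, spec, prev) for prev in (0.01, 0.10, 0.50)}

for prev, (ppv, npv) in results.items():
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, which is why a predictive value reported in one setting may not transfer to a population with a different prevalence.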

Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
• RQ: (above)
  – Important to know presentation before onset of labor to know whether to try external version
• Study design: Cross-sectional study
• Subjects:
  – 1633 women with singleton pregnancies at weeks at antenatal clinics at a Women’s and Babies Hospital in Australia
  – 96% of those eligible for the study consented
*BMJ 2006;333:578-80

Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
• Predictor variable
  – Clinical examination by one of more than 60 clinicians
    • Residents or registrars: 55%
    • Midwives: 28%
    • Obstetricians: 17%
  – Results classified as cephalic or noncephalic
• Outcome variable: presentation by ultrasound, blinded to clinical examination
*BMJ 2006;333:578-80

Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
• Results
• No significant differences in accuracy by experience level
• Conclusions: clinical examination is not sensitive enough
*BMJ 2006;333:578-80

Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy: Issues*
• RQ
• Subjects
• Predictor
• Outcome
• Results
• Conclusions: what decision was the test supposed to help with?
*BMJ 2006;333:578-80

A clinical decision rule to identify children at low risk for appendicitis
• Study design: prospective cohort study
• Subjects
  – Of 4140 patients 3-18 years presenting to Boston Children’s Hospital ED with CC abdominal pain
  – 767 (19%) received surgical consultation for possible appendicitis
  – 113 excluded (chronic diseases, recent imaging)
  – 53 missed
  – 601 included in the study (425 in derivation set)
Kharbanda et al. Pediatrics 116(3):

A clinical decision rule to identify children at low risk for appendicitis
• Predictor variable
  – Standardized assessment by PEM attending
  – For today, focus on “Pain with percussion, hopping or cough” (complete data in N=381)
• Outcome variable:
  – Pathologic diagnosis of appendicitis for those who received surgery (37%)
  – Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 116(3):

A clinical decision rule to identify children at low risk for appendicitis
• Results: Pain with percussion, hopping or cough
• 78% sensitivity seems low to me. Is it valid for me in deciding whom to image?
Kharbanda et al. Pediatrics 116(3):

Checklist
• Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
• Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
• Was the reference standard applied regardless of the diagnostic test result?
• Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), p 68

Systematic approach
• Study design: prospective cohort study
• Subjects
  – Of 4140 patients 3-18 years presenting to Boston Children’s Hospital ED with CC abdominal pain
  – 767 (19%) received surgical consultation for possible appendicitis
Kharbanda et al. Pediatrics 116(3):

A clinical decision rule to identify children at low risk for appendicitis
• Predictor variable
  – “Pain with percussion, hopping or cough” (complete data in N=381)
• Outcome variable:
  – Pathologic diagnosis of appendicitis for those who received surgery (37%)
  – Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 116(3):

Issues
• Sample representative?
• Verification bias?
• Double gold standard bias?
• Spectrum bias?

For children presenting with abdominal pain to SFGH 6-M
• Sensitivity probably valid (not falsely low)
  – But whether all of them tried to hop is not clear
• Specificity probably low
• PPV is high
• NPV is low