Welcome Back From Lunch. Thursday Afternoon 2:00-3:00 Studies of Diagnostic Test Accuracy (Tom) 3:00-3:45 Combining Tests (Mark) 3:45-4:00 Break 4:00-5:30.

Welcome Back From Lunch

Thursday Afternoon
2:00-3:00 Studies of Diagnostic Test Accuracy (Tom)
3:00-3:45 Combining Tests (Mark)
3:45-4:00 Break
4:00-5:30 Small Groups
6:00 Meet in 6702 to head to Giants game

Studies of Diagnostic Test Accuracy: Outline
- Diagnostic accuracy study checklist
- Understanding specific biases
  - Incorporation
  - Spectrum
  - Verification
  - Double gold standard
- Example: Does ability to hop without pain rule out appendicitis in children?

Checklist
- Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

Beyond the Checklist
Consider not only the possibility of bias, but WHY it may occur and in which DIRECTION it would push the results:
- Incorporation bias
- Spectrum bias
- Verification bias
- Double gold standard bias

Incorporation Bias
- Occurs when the index test itself is incorporated into the gold standard
- Prevented by blinding: those who determine the gold standard should not know the index test result

Example: Study of BNP as a test for congestive heart failure (CHF)*
- Gold standard: determination of CHF by two cardiologists blinded to BNP
- “The best clinical predictor of congestive heart failure was an increased heart size on chest roentgenogram (accuracy, 81 percent)”
- Is there a problem with assessing accuracy of chest x-rays to diagnose CHF in this study?
*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.
Problem 4.3

Incorporation bias
- The cardiologists were not blinded to the chest x-ray
- They used (incorporated) the chest x-ray in making the CHF diagnosis
- Result: incorporation bias in the assessment of the chest x-ray, not of BNP
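To see which way incorporation bias pushes the estimates, here is a minimal sketch with invented counts (not Maisel et al.'s data). It assumes a simple, hypothetical model: reviewers who can see the index test re-label a fraction `f` of the discordant patients so that the "gold standard" agrees with the test. The names `sens_spec` and `incorporate` are illustrative only.

```python
# Hypothetical illustration of incorporation bias; the counts are invented, not Maisel et al.'s data.


def sens_spec(a, b, c, d):
    """Sensitivity and specificity of a 2x2 table.

    a = test+/D+, b = test+/D-, c = test-/D+, d = test-/D-.
    """
    return a / (a + c), d / (b + d)


def incorporate(a, b, c, d, f):
    """Assumed model: reviewers who can see the index test re-label a fraction f of
    the discordant patients so that the "gold standard" agrees with the index test."""
    fp_relabelled = f * b  # test+/D- patients called D+ -> counted as true positives
    fn_relabelled = f * c  # test-/D+ patients called D- -> counted as true negatives
    return (a + fp_relabelled, b - fp_relabelled,
            c - fn_relabelled, d + fn_relabelled)


true_table = (60, 20, 20, 100)
biased_table = incorporate(*true_table, f=0.5)

print("true:  ", sens_spec(*true_table))    # sensitivity 0.75, specificity about 0.83
print("biased:", sens_spec(*biased_table))  # sensitivity 0.875, specificity about 0.92
```

Because every re-labelled patient moves into a concordant cell, both sensitivity and specificity rise in this model; the size of the effect depends entirely on the assumed `f`.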

Spectrum of Disease and Nondisease
- Disease is often easier to diagnose if it is severe
- “Nondisease” is easier to diagnose if the patient is well than if the patient has other diseases

Spectrum Bias
- Sensitivity depends on the spectrum of disease in the population being tested.
- Specificity depends on the spectrum of non-disease in the population being tested.
- Example: Absence of Nasal Bone (on 13-week ultrasound) as a Test for Chromosomal Abnormality

Spectrum Bias Example: Absence of Nasal Bone as a Test for Chromosomal Abnormality*
- Sensitivity = 229/333 = 69%
- BUT the D+ group only included fetuses with Trisomy 21
*Cicero et al., Ultrasound Obstet Gynecol 2004; 23: 218-23

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
- The D+ group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18)
- Among these fetuses, sensitivity was 32% (not 69%)
- What decision is this test supposed to help with?
- If it is whether to test chromosomes using chorionic villus sampling or amniocentesis, these 295 fetuses should be included!

Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality, effect of including other trisomies in D+ group
- Sensitivity = 324/628 = 52%
- NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21
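The three sensitivities on these slides can be reproduced from the counts shown. The "other abnormalities" counts (95/295) are derived here by subtraction (324 - 229 and 628 - 333); they are not printed on the slides. The sketch also checks that the overall sensitivity is simply a weighted average of the subgroup sensitivities, weighted by each subgroup's share of the D+ group, which is why the answer depends on which D+ fetuses are included.

```python
# Sensitivity of "absent nasal bone" computed from the counts on these slides (Cicero et al. 2004).
# The "other abnormalities" row is derived by subtraction: 324 - 229 and 628 - 333.
groups = {
    "Trisomy 21 only": (229, 333),
    "Other chromosomal abnormalities": (324 - 229, 628 - 333),
    "All chromosomal abnormalities": (324, 628),
}

for name, (absent, total) in groups.items():
    print(f"{name}: {absent}/{total} = {absent / total:.0%}")

# Overall sensitivity = weighted average of subgroup sensitivities,
# weighted by each subgroup's share of all D+ fetuses.
t21_pos, t21_n = groups["Trisomy 21 only"]
oth_pos, oth_n = groups["Other chromosomal abnormalities"]
n_all = t21_n + oth_n
weighted = (t21_n / n_all) * (t21_pos / t21_n) + (oth_n / n_all) * (oth_pos / oth_n)
print(f"Weighted average: {weighted:.0%}")
```

The printed values are 69%, 32% and 52%, and the weighted average reproduces the 52%: adding a large subgroup in which the test performs poorly drags the overall sensitivity down.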

Verification bias: Example
- Visual assessment of jaundice in newborns
- Study patients who are getting a bilirubin measurement
- Ask clinicians to estimate extent of jaundice at time of blood draw

Visual Assessment of jaundice*: Results
- Sensitivity of jaundice below the nipple line for bilirubin ≥ 12 mg/dL = 97%
- Specificity = 19%
- What is the problem?
Editor’s Note: The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication. --Catherine D. DeAngelis, MD
*Moyer et al., Archives Pediatr Adol Med 2000; 154:391

Verification Bias*
- Inclusion criterion for the study: the gold standard test was done (in this case, a blood test for bilirubin)
- Subjects with positive index tests are more likely to get the gold standard and to be included in the study
  - Clinicians usually don’t order a blood test for bilirubin if there is little or no jaundice
- How does this affect sensitivity and specificity?
*AKA Work-up, Referral Bias, or Ascertainment Bias

Verification Bias Effects*

                              TSB ≥ 12    TSB < 12
Jaundice below nipple             a           b
No jaundice below nipple          c           d

(TSB = total serum bilirubin, mg/dL)
Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
*AKA Work-up, Referral Bias, or Ascertainment Bias
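One way to work out the blanks is to start from a hypothetical complete 2x2 table and keep only the babies who would have had bilirubin drawn. The counts and the verification rates below (every jaundiced baby tested, only 10% of non-jaundiced babies tested) are invented assumptions for illustration, not Moyer et al.'s data; `verified_only` is a hypothetical helper.

```python
# Hypothetical illustration of verification (work-up) bias. The counts and verification
# rates are invented for illustration; they are not Moyer et al.'s data.
# a = jaundice below nipple & TSB >= 12, b = jaundice below nipple & TSB < 12
# c = no jaundice below nipple & TSB >= 12, d = no jaundice below nipple & TSB < 12
ALL_BABIES = {"a": 90, "b": 400, "c": 10, "d": 500}

# Assumption: bilirubin is drawn in every baby jaundiced below the nipple line,
# but in only 10% of babies who are not.
P_DRAW_IF_JAUNDICED = 1.0
P_DRAW_IF_NOT = 0.1


def sens_spec(t):
    """Return (sensitivity, specificity) = (a/(a+c), d/(b+d))."""
    return t["a"] / (t["a"] + t["c"]), t["d"] / (t["b"] + t["d"])


def verified_only(t, p_pos, p_neg):
    """Expected counts among babies who actually had a bilirubin drawn (the study sample)."""
    return {"a": t["a"] * p_pos, "b": t["b"] * p_pos,
            "c": t["c"] * p_neg, "d": t["d"] * p_neg}


study_sample = verified_only(ALL_BABIES, P_DRAW_IF_JAUNDICED, P_DRAW_IF_NOT)
print("all babies:   sens %.2f, spec %.2f" % sens_spec(ALL_BABIES))
print("study sample: sens %.2f, spec %.2f" % sens_spec(study_sample))
```

Under these assumptions the study sample loses most of the index-test-negative babies (cells c and d), so sensitivity rises from 0.90 to about 0.99 while specificity falls from about 0.56 to about 0.11, the same pattern as the very low specificity reported in the jaundice study.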

Visual Assessment of jaundice*: Results
- Recall the “Gold Standard” was bilirubin ≥ 12 mg/dL
- Specificity = 19%
- This low specificity was a clue! What does it mean?
  - 19% of newborns who don’t have a bilirubin ≥ 12 mg/dL are not jaundiced below the nipple line
  - So 81% of babies with bilirubin < 12 mg/dL are jaundiced below the nipple line
*Moyer et al., Archives Pediatr Adol Med 2000; 154:391

Does This Child Have Appendicitis? JAMA. 2007;298:438-451.
- RLQ Pain: Sensitivity = 96%, Specificity = 5% (1 – Specificity = 95%)
- Likelihood Ratio = 1.0
- RLQ pain was present in 96% of those with appendicitis and 95% of those without appendicitis.
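The likelihood ratio on this slide follows directly from sensitivity and specificity. The small function below is a generic sketch (the name `likelihood_ratios` is illustrative); the negative likelihood ratio is computed here from the same two numbers and is not reported on the slide.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous finding."""
    lr_pos = sensitivity / (1 - specificity)  # P(finding | disease) / P(finding | no disease)
    lr_neg = (1 - sensitivity) / specificity  # P(no finding | disease) / P(no finding | no disease)
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.96, 0.05)  # RLQ pain, values from the slide
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 1.01, LR- = 0.80
```

An LR+ of about 1.0 means that finding RLQ pain barely changes the odds of appendicitis, even though the sensitivity of 96% sounds impressive on its own.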

Double Gold Standard Bias-1*
- Two different “gold standards”
- One gold standard (e.g., surgery, invasive test) is more likely to be applied in patients with a positive index test
- The other gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test.
*AKA Differential verification bias

Double Gold Standard Bias-2
- There are some patients in whom the two “gold standards” do not give the same answer:
  - Spontaneously resolving disease (positive with immediate invasive test, but not with follow-up)
  - Newly occurring or newly detectable disease (positive with follow-up but not with immediate invasive test)

Double Gold Standard Bias, example*
- Study Population: All patients presenting to the ED who received a V/Q scan
- Test: V/Q Scan
- Disease: Pulmonary embolism (PE)
- Gold Standards:
  1. Pulmonary arteriogram (PA-gram) if done (more likely with more abnormal V/Q scan)
  2. Clinical follow-up in other patients (more likely with normal V/Q scan)
- What happens if some PE resolve spontaneously?
*PIOPED. JAMA 1990;263(20):2753-9.

Effect of Double Gold Standard Bias 1: Spontaneously resolving disease
- Test result will always agree with gold standard
- Both sensitivity and specificity increase
- Example: Joe has a small pulmonary embolus (PE) that will resolve spontaneously.
  - If his V/Q scan is positive, he will get an angiogram that shows the PE (true positive)
  - If his V/Q scan is negative, his PE will resolve and we will think he never had one (true negative)
  - Joe’s V/Q scan can’t be wrong!

Effect of Double Gold Standard Bias 2: Newly occurring or newly detectable disease
- Test result will always disagree with gold standard
- Both sensitivity and specificity decrease
- Example: Jane has or will soon get a nasty breast cancer that is currently undetectable
  - If her mammogram is positive, she will get biopsies that will not find the tumor (mammogram will look falsely positive)
  - If her mammogram is negative, she will return in several months and we will think the tumor was initially missed (mammogram will look falsely negative)
  - Jane’s mammogram can’t be right!

Effect of Double Gold Standard Bias
- Spontaneously resolving disease
  - Sensitivity falsely high
  - Specificity falsely high
- Newly occurring or newly detectable disease
  - Sensitivity falsely low
  - Specificity falsely low
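The Joe and Jane examples can be turned into a small count-based sketch. The baseline counts are invented, and it assumes the rule described above: a positive index test leads to the immediate invasive gold standard, a negative one to clinical follow-up. The helper `add_patients` is hypothetical.

```python
# Hypothetical illustration of double gold standard (differential verification) bias.
# Counts are invented. Assumed rule: positive index test -> immediate invasive gold standard;
# negative index test -> clinical follow-up.


def sens_spec(a, b, c, d):
    """a = test+/D+, b = test+/D-, c = test-/D+, d = test-/D-."""
    return a / (a + c), d / (b + d)


def add_patients(a, b, c, d, resolving=(0, 0), emerging=(0, 0)):
    """Add patients whose apparent disease status depends on which gold standard they get.

    resolving: (test+, test-) counts with spontaneously resolving disease
    emerging:  (test+, test-) counts with newly occurring or newly detectable disease
    """
    res_pos, res_neg = resolving
    emg_pos, emg_neg = emerging
    a += res_pos  # invasive test still finds the disease: looks like a true positive
    d += res_neg  # disease gone by follow-up: looks like a true negative
    b += emg_pos  # invasive test cannot find it yet: looks like a false positive
    c += emg_neg  # disease found at follow-up: looks like a false negative
    return a, b, c, d


base = (80, 20, 20, 80)
print("baseline:  sens %.2f, spec %.2f" % sens_spec(*base))
print("resolving: sens %.2f, spec %.2f" % sens_spec(*add_patients(*base, resolving=(10, 10))))
print("emerging:  sens %.2f, spec %.2f" % sens_spec(*add_patients(*base, emerging=(10, 10))))
```

Resolving cases only ever land in the concordant cells (a, d), so both estimates rise (0.80 to about 0.82 here); newly detectable cases only land in the discordant cells (b, c), so both fall (to about 0.73).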

Bias                  Description                                          Sensitivity is falsely …   Specificity is falsely …
Incorporation         Gold standard incorporates index test                ___                        ___
Spectrum              D+ only includes “sickest of the sick”               ___                        ___
                      D- only includes “wellest of the well”               ___                        ___
Verification          Positive index test makes gold standard more likely  ___                        ___
Double Gold Standard  Disease resolves spontaneously                       ___                        ___
                      Disease becomes detectable during follow-up          ___                        ___

Example: Does ability to hop or jump without pain rule out appendicitis in children? Kharbanda et al. Pediatrics 2005; 116(3): 709-16

Example: A clinical decision rule to identify children at low risk for appendicitis (Problem 5.6)
- Study design: prospective cohort study
- Subjects
  - 4140 patients 3-18 years old presenting to Boston Children’s Hospital ED with abdominal pain
  - Of these, 767 (19%) received surgical consultation for possible appendicitis
  - 113 excluded (chronic diseases, recent imaging)
  - 53 missed
  - 601 included in the study (425 in derivation set)
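The participant flow on this slide can be checked arithmetically. The only number not on the slide is the count of included patients outside the derivation set (176); that they formed a validation set is an assumption here, since the slide does not say so.

```python
# Participant flow from the slide, checked arithmetically.
screened = 4140       # children 3-18 years with abdominal pain in the ED
consulted = 767       # received surgical consultation
excluded = 113        # chronic diseases, recent imaging
missed = 53
derivation_set = 425

included = consulted - excluded - missed
print(f"Consulted: {consulted}/{screened} = {consulted / screened:.1%}")  # slide rounds to 19%
print(f"Included: {included}")                                           # matches the 601 on the slide
print(f"Included but not in derivation set: {included - derivation_set}")
```

Note that the 601 included children are a subset of the 767 referred to surgery, not of all 4140 ED patients with abdominal pain, which matters for the spectrum questions that follow.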

A clinical decision rule to identify children at low risk for appendicitis
- Predictor variable
  - Standardized assessment by pediatric ED attending
  - Focus on “Pain with percussion, hopping or cough” (complete data in N=381)
- Outcome variable:
  - Pathologic diagnosis of appendicitis (or not) for those who received surgery (37%)
  - Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 2005; 116(3): 709-16

A clinical decision rule to identify children at low risk for appendicitis
- Results: Pain with percussion, hopping or cough
- 78% sensitivity and 83% NPV seem low to me.
- Are they valid for me in deciding whom to image?
Kharbanda et al. Pediatrics 2005; 116(3): 709-16

Checklist
- Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

In what direction would these biases affect results?
- Sample not representative (population referred to pediatric surgery)?
- Verification bias?
- Double gold standard bias?
- Spectrum bias?

For children presenting with abdominal pain to SFGH 6-M
- Sensitivity probably valid (not falsely low)
  - But whether all of them tried to hop is not clear
- Specificity probably low
- PPV is too high
- NPV is too low
- Does not address surgical consultation decision
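One reason the study's PPV would be too high and its NPV too low for an unselected clinic population is that predictive values depend on prevalence. The sketch below keeps the 78% sensitivity from the results slide but uses a hypothetical specificity (60%) and two hypothetical prevalences (30% vs 5%); none of these three numbers come from Kharbanda et al.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prior probability (prevalence) of disease."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENSITIVITY = 0.78  # from the Kharbanda et al. results slide
SPECIFICITY = 0.60  # hypothetical: not shown on these slides

# Hypothetical prevalences: a population referred to surgery vs. an unselected clinic population.
results = {}
for prevalence in (0.30, 0.05):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    results[prevalence] = (ppv, npv)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With the same sensitivity and specificity, moving to the lower-prevalence setting lowers the PPV and raises the NPV, which is the direction of the last two bullets above.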

Questions?
