Selective cutoff reporting in studies of diagnostic test accuracy of depression screening tools: Comparing traditional meta-analysis to individual patient data meta-analysis


Selective cutoff reporting in studies of diagnostic test accuracy of depression screening tools: Comparing traditional meta-analysis to individual patient data meta-analysis
Brooke Levis, MSc, PhD Candidate
Jewish General Hospital and McGill University, Montreal, Quebec, Canada

Does Selective Reporting of Data-driven Cutoffs Exaggerate Accuracy? The Hockey Analogy

What is Screening?
- Purpose: to identify otherwise unrecognisable disease
- By sorting out apparently well persons who probably have a condition from those who probably do not
- Not diagnostic
- Positive tests require referral for diagnosis and, as appropriate, treatment
- A program, of which a test is one component
Illustration: This information was originally developed by the UK National Screening Committee/NHS Screening Programmes and is used under the Open Government Licence v1.0.

The Patient Health Questionnaire (PHQ-9) depression screening tool
- Scores range from 0 to 27
- Higher scores = more severe symptoms

Selective Reporting of Results Using Data-Driven Cutoffs
Extreme scenarios:
- Cutoff of ≥ 0: all subjects are at or above the cutoff, so sensitivity = 100%
- Cutoff of ≥ 27: (almost) all subjects are below the cutoff, so specificity approaches 100%
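The two extreme scenarios can be checked with a small calculation. The sketch below is illustrative only: the function name `sens_spec` and the ten toy patients are assumptions made for this example, not data from the presentation. A screen counts as positive when the PHQ-9 score is at or above the cutoff.

```python
def sens_spec(scores, has_mdd, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is a positive screen."""
    pairs = list(zip(scores, has_mdd))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)       # cases screening positive
    fn = sum(1 for s, d in pairs if d and s < cutoff)        # cases screening negative
    tn = sum(1 for s, d in pairs if not d and s < cutoff)    # non-cases screening negative
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)   # non-cases screening positive
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: PHQ-9 totals (0-27) and reference-standard MDD status.
scores = [2, 5, 9, 12, 14, 20, 3, 7, 11, 16]
has_mdd = [False, False, False, True, True, True, False, False, False, True]

for cutoff in (0, 10, 27):
    sens, spec = sens_spec(scores, has_mdd, cutoff)
    print(f"cutoff >= {cutoff:2d}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy sample nobody scores 27, so the highest cutoff classifies every patient as negative: specificity is perfect and sensitivity is zero. At a cutoff of 0 the reverse holds.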

Does Selective Reporting of Data-driven Cutoffs Exaggerate Accuracy?
- Sensitivity increases from cutoff of 8 to cutoff of 11, although raising the cutoff cannot increase sensitivity within a single dataset
- For standard cutoff of 10, missing 897 cases (13%)
- For cutoffs of 7-9 and 11, missing 52-58% of data
Manea et al., CMAJ, 2012

Questions
- Does selective cutoff reporting lead to exaggerated estimates of accuracy?
- Can we identify predictable patterns of selective cutoff reporting?
- Why does selective cutoff reporting appear to impact sensitivity, but not specificity?
- Does selective cutoff reporting transfer high heterogeneity in sensitivity due to small numbers of cases to heterogeneity in cutoff scores, but homogeneous accuracy estimates?

Methods
Data source:
- Studies included in a published traditional meta-analysis on the diagnostic accuracy of the PHQ-9 (Manea et al., CMAJ 2012)
Inclusion criteria:
- Unique patient sample
- Published diagnostic accuracy for MDD for at least one PHQ-9 cutoff
Data transfer:
- Invited authors of the eligible studies to contribute their original patient data (de-identified)
- Received data from 13 of 16 eligible datasets (80% of patients, 94% of MDD cases)

Methods
Data preparation:
- For each dataset, extracted PHQ-9 scores and MDD diagnostic status for each patient, and information pertaining to weighting
Statistical analyses (2 sets performed):
- Traditional meta-analysis: for each cutoff between 7 and 15, included data from the studies that reported accuracy results for the respective cutoff in the original publication
- IPD meta-analysis: for each cutoff between 7 and 15, included data from all studies
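The difference between the two sets of analyses comes down to which studies contribute data at each cutoff. A minimal sketch of that selection rule follows; the study names and their published cutoffs are hypothetical, invented only to illustrate the rule, and the function `included_studies` is not from the presentation.

```python
# Hypothetical example: the cutoffs each primary study reported in its publication.
published_cutoffs = {
    "study_A": {10},
    "study_B": {8, 9, 10, 11},
    "study_C": {12},
    "study_D": {10, 11},
}
CUTOFFS = range(7, 16)  # cutoffs 7 through 15, as in the slides


def included_studies(cutoff, approach):
    """Return the studies that contribute data at this cutoff."""
    if approach == "traditional":
        # Only studies that published accuracy results for this exact cutoff.
        return sorted(s for s, cs in published_cutoffs.items() if cutoff in cs)
    if approach == "ipd":
        # Patient-level data allow every study to be evaluated at every cutoff.
        return sorted(published_cutoffs)
    raise ValueError(f"unknown approach: {approach}")


for c in CUTOFFS:
    n_trad = len(included_studies(c, "traditional"))
    n_ipd = len(included_studies(c, "ipd"))
    print(f"cutoff {c:2d}: traditional n={n_trad}, IPD n={n_ipd}")
```

With these made-up inputs, the traditional analysis has no data at all for several cutoffs, while the IPD analysis uses the same set of studies throughout.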

Comparison of data availability
[Table: columns are Cutoff, # of studies, # of patients, and # of MDD cases, shown for published data (traditional MA) and for all data (IPD MA); cell values are not preserved in this transcript]

Methods
Model: bivariate random-effects* meta-analysis models
- Models sensitivity and specificity at the same time
- Accounts for clustering by study
- Provides an overall pooled sensitivity and specificity for each cutoff, for the 2 sets of analyses
- Within each set of analyses, each cutoff requires its own model
- Estimates between-study heterogeneity
Note: the model accounts for correlation between sensitivity and specificity at each threshold, but not for correlation of parameters across thresholds.
*Random-effects model: sensitivity and specificity are assumed to vary across primary studies.
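The slides do not give the model equations. For orientation, one commonly used formulation of a bivariate random-effects model for a single cutoff is sketched below; the symbols are assumptions introduced here, and the presenters' exact specification (for example, the likelihood or link function) may differ.

```latex
% Study i, one fixed cutoff (notation assumed for this sketch):
%   y_{1i} = cases screening positive,     n_{1i} = number of MDD cases
%   y_{0i} = non-cases screening negative, n_{0i} = number of non-cases
\begin{gather}
y_{1i} \sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i), \qquad
y_{0i} \sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i) \\[4pt]
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},
\begin{pmatrix}
\sigma_{\mathrm{Se}}^{2} & \rho\,\sigma_{\mathrm{Se}}\sigma_{\mathrm{Sp}} \\
\rho\,\sigma_{\mathrm{Se}}\sigma_{\mathrm{Sp}} & \sigma_{\mathrm{Sp}}^{2}
\end{pmatrix}
\right) \\[4pt]
\text{pooled sensitivity} = \operatorname{logit}^{-1}(\mu_{\mathrm{Se}}), \qquad
\text{pooled specificity} = \operatorname{logit}^{-1}(\mu_{\mathrm{Sp}})
\end{gather}
```

In this notation the variances describe between-study heterogeneity, and the correlation term links sensitivity and specificity within a cutoff. Because a separate model is fitted at each cutoff, nothing ties the parameters of one cutoff to those of the next.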

Comparison of Diagnostic Accuracy
[Table: columns are Cutoff, N studies, Sens, and Spec, shown for published data (traditional MA) and for all data (IPD MA); cell values are not preserved in this transcript]

Comparison of ROC Curves

Publishing trends by study

Comparison of Sensitivity by Cutoff

Comparison of Diagnostic Accuracy
[Table: columns are Cutoff, N studies, Sens, and Spec, shown for published data (traditional MA) and for all data (IPD MA); cell values are not preserved in this transcript]

Why Sensitivity Changes with Moving Cutoffs, but Not Specificity

Heterogeneity

Summary
- Selective cutoff reporting in depression screening tool DTA studies may distort accuracy across cutoffs.
- It will lead to exaggerated estimates of accuracy.
- These distortions were relatively minor for the PHQ-9, but would likely be much larger for other measures where standard cutoffs are less consistently reported and more data-driven reporting seems to occur (e.g., HADS).
- IPD meta-analysis can address this and will allow subgroup-based accuracy evaluation.
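A related mechanism can be illustrated with a toy simulation: when a small study reports only its best-looking cutoff, the reported accuracy tends to be optimistic. The sketch below is not the presenters' analysis. The binomial score models, the sample sizes, and the use of Youden's J (sensitivity + specificity - 1) as the selection criterion are all assumptions chosen for illustration.

```python
import math
import random

MAX_SCORE = 27
P_CASE, P_NONCASE = 0.42, 0.22  # hypothetical score models, not PHQ-9 estimates
CUTOFFS = range(7, 16)
STANDARD = 10


def true_youden(cutoff):
    """Population Youden's J under the assumed binomial score models."""
    def tail(p):  # P(score >= cutoff) for a Binomial(MAX_SCORE, p) score
        return sum(math.comb(MAX_SCORE, k) * p**k * (1 - p) ** (MAX_SCORE - k)
                   for k in range(cutoff, MAX_SCORE + 1))
    return tail(P_CASE) - tail(P_NONCASE)  # sensitivity - (1 - specificity)


def draw_scores(rng, n, p):
    """Draw n scores, each a sum of MAX_SCORE Bernoulli(p) trials."""
    return [sum(rng.random() < p for _ in range(MAX_SCORE)) for _ in range(n)]


def sample_youden(cases, noncases, cutoff):
    sens = sum(s >= cutoff for s in cases) / len(cases)
    spec = sum(s < cutoff for s in noncases) / len(noncases)
    return sens + spec - 1


def simulate(n_studies=500, n_cases=15, n_noncases=85, seed=1):
    rng = random.Random(seed)
    at_standard, at_selected = [], []
    for _ in range(n_studies):
        cases = draw_scores(rng, n_cases, P_CASE)
        noncases = draw_scores(rng, n_noncases, P_NONCASE)
        js = {c: sample_youden(cases, noncases, c) for c in CUTOFFS}
        at_standard.append(js[STANDARD])       # always report the standard cutoff
        at_selected.append(max(js.values()))   # report only the best-looking cutoff
    return sum(at_standard) / n_studies, sum(at_selected) / n_studies


mean_standard, mean_selected = simulate()
print(f"true J at cutoff {STANDARD}:          {true_youden(STANDARD):.3f}")
print(f"mean sample J at cutoff {STANDARD}:   {mean_standard:.3f}")
print(f"mean sample J at selected cutoff: {mean_selected:.3f}")
```

By construction the selected-cutoff average can never fall below the standard-cutoff average within the same simulated studies. How far it exceeds the true value depends on the assumed sample sizes, and the gap typically grows as the number of cases shrinks.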

Summary
STARD undergoing revision:
- Needs to require precision-based sample size calculation to avoid very small samples (particularly number of cases) and unstable estimates
- Needs to require reporting of spectrum of cutoffs, which is easily done with online appendices
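As a rough illustration of a precision-based sample size calculation, the sketch below uses the standard normal-approximation (Wald) interval for a proportion to estimate how many MDD cases a study would need for a target confidence-interval half-width around sensitivity. The function names and the example inputs are assumptions; the slides do not specify a particular method, and other interval methods would give somewhat different numbers.

```python
import math
from statistics import NormalDist


def cases_needed(expected_sens, half_width, confidence=0.95):
    """Cases required so the Wald CI for sensitivity has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * expected_sens * (1 - expected_sens) / half_width**2)


def total_sample(n_cases, prevalence):
    """Total participants needed to expect n_cases at the given prevalence."""
    return math.ceil(n_cases / prevalence)


# Hypothetical planning inputs: expected sensitivity 0.80, half-width 0.10,
# and an MDD prevalence of 12.5% in the screened population.
n_cases = cases_needed(0.80, 0.10)
print(n_cases, total_sample(n_cases, 0.125))
```

Because the calculation is driven by the number of cases rather than the total sample, a study in a low-prevalence setting needs many more participants to reach the same precision for sensitivity.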

Acknowledgements
DEPRESSD Investigators and Other Contributors: Brett Thombs, Andrea Benedetti, Roy Ziegelstein, Pim Cuijpers, Simon Gilbody, John Ioannidis, Alex Levis, Danielle Rice, Scott Patten, Dean McMillan, Ian Shrier, Russell Steele, Lorie Kloda