Net Reclassification Risk: a graph to clarify the potential prognostic utility of new markers
Ewout Steyerberg, Professor of Medical Decision Making, Dept of Public Health, Erasmus MC
Birmingham, July 2013

Ideas from
- Colleagues: Michael Pencina, Stuart Baker, Douwe Posmus, Ben van Calster
- PhD students: Moniek Vedder, Maarten Leening

Overview
- Performance evaluation: prediction vs classification
- Graphical displays
- Traditional and novel measures
- Additional value of markers: NRI and other measures
- Apparent vs external validation

Performance evaluation of predictions
- First requirement: statistical significance
  - Likelihood ratio test: model vs no model; model with vs without the marker
  - Do not use other tests (DeLong test for the ROC area; NRI test; ...)
- Traditional: focus on prediction
  - Model fit (R² / Brier / ...)
  - Calibration (testing; recalibration; graphics)
  - AUC (c statistic / ...)
Steyerberg, Vickers, ..., Kattan, Epidemiology 2010
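
As a concrete illustration of the likelihood ratio test above, a minimal sketch (not from the presentation) using statsmodels; the variable names (X_base, marker, y) are hypothetical.

```python
# Sketch: likelihood ratio test for an added marker, assuming a binary outcome
# and a logistic regression base model. Names X_base, marker, y are hypothetical.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lr_test_added_marker(X_base, marker, y):
    """Compare a base logistic model with the same model plus one marker."""
    X0 = sm.add_constant(X_base)                              # base covariates + intercept
    X1 = sm.add_constant(np.column_stack([X_base, marker]))   # base covariates + marker
    fit0 = sm.Logit(y, X0).fit(disp=0)
    fit1 = sm.Logit(y, X1).fit(disp=0)
    lr_stat = 2 * (fit1.llf - fit0.llf)                       # difference in -2 log likelihood
    p_value = stats.chi2.sf(lr_stat, df=1)                    # one extra parameter
    return lr_stat, p_value
```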

Performance evaluation: value of markers
- Better decision making by adding a marker
  - Net Reclassification Improvement (Pencina, Stat Med 2008; weighted version 2011; AJE 2012)
  - Net Benefit and decision curves (Vickers, MDM 2006)
  - Relative Utility (Baker, JNCI 2009)
- Focus shifts from continuous predictions to categorization
  - NRI often considered in CVD: 3 categories (thresholds 6% and 20%)
  - NB popular in (prostate) cancer: 2 categories (biopsy threshold 10-40%)

Aim: compare graphical formats and summary measures to quantify the usefulness of adding a marker to a model

Framingham case study
- Cohort: 3264 participants in the Framingham Heart Study, age 30 to 74 years
- 183 developed CHD (10-year risk: 5.6%)
- Incremental value of adding HDL to the Framingham model: HR = 0.65; p < .0001
- Data used in Pencina, Stat Med 2008 (>1000 citations) and Steyerberg, Rev Esp Cardiol 2011 (7 citations)

Basic idea for discrimination

Density plots: marker leads to higher prediction for events, lower for non-events?

Box plots

Summarize differences in distributions
- Difference in means + SD: delta c statistic
- Log-based scoring rules: Nagelkerke R²: +1.6%
- Mean prediction by outcome: discrimination slope ≈ Pearson R² ≈ scaled Brier: +1.9%
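
A minimal sketch of two of these summary measures, the discrimination slope and the scaled Brier score; the helper names and array inputs are assumptions, not part of the original slides.

```python
# Sketch: discrimination slope and scaled Brier score for predicted risks p
# and binary outcomes y (NumPy arrays).
import numpy as np

def discrimination_slope(p, y):
    """Mean predicted risk for events minus mean predicted risk for non-events."""
    return p[y == 1].mean() - p[y == 0].mean()

def scaled_brier(p, y):
    """Brier score scaled to its maximum (a Brier-based analogue of R^2)."""
    brier = np.mean((p - y) ** 2)
    brier_max = np.mean((y.mean() - y) ** 2)   # Brier of the average-risk 'null' model
    return 1 - brier / brier_max

# The gain from a marker is then, e.g.,
# discrimination_slope(p_with_marker, y) - discrimination_slope(p_base, y)
```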

Classic: ROC curve plot (but often a waste of journal space). The AUC increases by about 0.01; how to interpret a 0.01 increase?

Interpretation of AUC
- Typically small improvement in discriminative ability according to the AUC (or c statistic)
- The c statistic is blamed for being insensitive

A new perspective: reclassification

Net Reclassification Improvement:
NRI = (P(move up | event) - P(move down | event)) + (P(move down | non-event) - P(move up | non-event))
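
A minimal sketch of the category-based NRI as defined above; the function and variable names are hypothetical, and the default cut-offs follow the 6% and 20% thresholds mentioned earlier.

```python
# Sketch: category-based NRI from predicted risks of a base and an extended model.
import numpy as np

def categorical_nri(p_base, p_new, y, cutoffs=(0.06, 0.20)):
    """NRI = (P(up|event) - P(down|event)) + (P(down|non-event) - P(up|non-event))."""
    cat_base = np.digitize(p_base, cutoffs)   # risk category under the base model
    cat_new = np.digitize(p_new, cutoffs)     # risk category with the marker added
    up, down = cat_new > cat_base, cat_new < cat_base
    event, nonevent = y == 1, y == 0
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[nonevent].mean() - up[nonevent].mean()
    return nri_events + nri_nonevents, nri_events, nri_nonevents
```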

Reclassification table: NRI component for events = 12% (of 183 events); for non-events = 1/3081 = 0.03%

NRI
- Introduced with 3 categories
- Extended to “any category”: continuous NRI
- Simplified to 2 categories: delta Youden index

Extension to any threshold: continuous NRI (Stat Med 2011). NRI events 24.6%; NRI non-events 5.5%; NRI = 30.1%
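
The continuous NRI differs only in what counts as a move: any increase or decrease in predicted risk. A minimal sketch under the same assumptions as the categorical version above:

```python
# Sketch: continuous ("category-free") NRI, where any increase in predicted risk
# counts as moving up and any decrease as moving down.
def continuous_nri(p_base, p_new, y):
    up, down = p_new > p_base, p_new < p_base
    event, nonevent = y == 1, y == 0
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[nonevent].mean() - up[nonevent].mean()
    return nri_events + nri_nonevents, nri_events, nri_nonevents
```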

Binary ROC curve: the AUC increases by 0.029. How to interpret this increase? Make it twice as big: NRI = 2 x 0.029 = 0.058

Criticism of the NRI
- ‘Nothing new’; related to other measures
  - For binary classification: NRI = delta sens + delta spec
  - AUC for binary classification = (sens + spec) / 2, so NRI = 2 x delta AUC
- Interpretation:
  - delta AUC larger with fewer cut-offs; NRI larger with more cut-offs
  - Interpretation as ‘net reclassified in the population’ (%)
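
A short derivation of the relation stated above between the binary NRI and the change in the single-cut-off AUC:

```latex
% Binary (single cut-off) NRI versus the change in the binary AUC
\begin{align*}
\mathrm{NRI}_{\text{binary}} &= \Delta\,\text{sensitivity} + \Delta\,\text{specificity},\\
\mathrm{AUC}_{\text{binary}} &= \tfrac{1}{2}\left(\text{sensitivity} + \text{specificity}\right),\\
\Rightarrow\quad \Delta\mathrm{AUC}_{\text{binary}} &= \tfrac{1}{2}\left(\Delta\,\text{sensitivity} + \Delta\,\text{specificity}\right)
  = \tfrac{1}{2}\,\mathrm{NRI}_{\text{binary}}.
\end{align*}
```

In the Framingham example this gives NRI = 2 x 0.029 = 0.058, as on the previous slide.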

Interpretation as a %, with p-value (letter to the editor by Leening, JAMA 2013)
- “The net reclassification index increased significantly after addition of intima-media thickness of the internal carotid artery (7.6%, P<0.001)”
- Recent example: Gulati et al, JAMA 2013; letter to the editor by Leening
- Reply: NRI components more important than their sum; and an erratum

Limitations of the NRI
- Nothing new
- Interpretability / potential to mislead
- Utility / weighting

NRI has ‘absurd’ weighting?

Loss function in the NRI
- Binary NRI = delta(sens) + delta(spec)
- Improvements for events and non-events are weighted equally; this implies weighting TP (or FN, for events) against FP (or TN, for non-events) by the odds of non-events vs events
  - E.g. incidence 5.6%: relative weight 94.4 : 5.6 ≈ 17
  - But a cut-off of 20% implies a relative weight of 80 : 20 = 4
  - Inconsistent?
- Choosing a cut-off on the probability scale implies a relative weight of TP vs FP, or harm vs benefit, and vice versa (Peirce, Science 1884; Pauker, NEJM 1975; Localio & Goodman, "Beyond the usual prediction accuracy metrics: reporting results for clinical decision making", Ann Intern Med 2012)
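
The weighting argument on this slide, written out:

```latex
% Implied relative weight of a true-positive vs a false-positive classification
\begin{align*}
\text{NRI weighting:}\quad & w_{TP:FP} = \frac{P(\text{non-event})}{P(\text{event})}
  = \frac{0.944}{0.056} \approx 17,\\
\text{Cut-off of 20\%:}\quad & w_{TP:FP} = \frac{1 - p_t}{p_t} = \frac{0.80}{0.20} = 4.
\end{align*}
```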

Decision-analytic measures
- Decision analytic, e.g. Net Benefit in a decision curve
- Net Benefit = (TP - w x FP) / N, with w = threshold probability / (1 - threshold probability)
  - e.g. threshold 50%: w = .5/.5 = 1; threshold 20%: w = .2/.8 = 1/4
- The number of true-positive classifications, penalized for false-positive classifications
Vickers & Elkin, MDM 2006
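
A minimal sketch of the Net Benefit formula above and of a simple decision curve over a range of thresholds; the helper names are hypothetical.

```python
# Sketch: Net Benefit of treating patients whose predicted risk p exceeds a
# threshold, and a decision curve across thresholds.
import numpy as np

def net_benefit(p, y, threshold):
    """Net Benefit = (TP - w * FP) / N with w = threshold / (1 - threshold)."""
    treat = p >= threshold
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    w = threshold / (1 - threshold)
    return (tp - w * fp) / len(y)

def decision_curve(p, y, thresholds=None):
    """Net Benefit of the model across a range of threshold probabilities."""
    if thresholds is None:
        thresholds = np.arange(0.01, 0.50, 0.01)
    return [(t, net_benefit(p, y, t)) for t in thresholds]
```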

Reclassification table / reclassification risk table
- NRI events = (14 - 3) = 11 of 183 events = +6.0%
- NRI non-events = (26 - 31) = -5 of 3081 non-events = -0.2%
- Delta NB = (11 - 0.25 x 5) / 3264 = 0.003
- Test tradeoff = 1 / 0.003 = 335
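
A worked check of the numbers above (a sketch; it assumes the 20% treatment threshold from the previous slide, so w = 0.2/0.8 = 0.25).

```python
# Worked check of the reclassification-table summaries.
n_events, n_nonevents = 183, 3081
delta_tp = 14 - 3        # net events reclassified above the threshold
delta_fp = 31 - 26       # net non-events reclassified above the threshold
w = 0.20 / 0.80          # odds of the 20% threshold

nri_events = delta_tp / n_events                                   # +6.0%
nri_nonevents = -delta_fp / n_nonevents                            # -0.2%
delta_nb = (delta_tp - w * delta_fp) / (n_events + n_nonevents)    # ~0.003
test_tradeoff = 1 / delta_nb                                       # ~335 markers measured per TP-equivalent gained
print(nri_events, nri_nonevents, delta_nb, test_tradeoff)
```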

Net Reclassification Risk graph
- Area = N x risk
- Preventable events = area L/H - area H/L
- Overtreatment = complementary area L/H - complementary area H/L
- Large prognostic utility: large areas and high risk differences
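
A minimal sketch of how the quantities on this slide could be computed, assuming ‘area’ means N x observed risk in a reclassified group and ‘complementary area’ means N x (1 - risk); this interpretation is an assumption, not stated on the slide.

```python
# Sketch: summaries of a Net Reclassification Risk graph from the sizes and
# observed risks of the two reclassified groups (L/H: low -> high; H/L: high -> low).
def nrr_summaries(n_LH, risk_LH, n_HL, risk_HL):
    preventable_events = n_LH * risk_LH - n_HL * risk_HL
    overtreatment = n_LH * (1 - risk_LH) - n_HL * (1 - risk_HL)
    return preventable_events, overtreatment
```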

External validation: miscalibration. Panels: predictions twice too high; predictions twice too low

Sensitivity analyses for cut-off: decision curve

Conclusions
- Evaluation of improvement in predictive ability requires different graphs and measures for continuous predictions than for classifications
  - Overall discriminative ability: delta AUC
- Usefulness for decision making
  - Utility-respecting measures such as Net Benefit and the test tradeoff
  - Net Reclassification Risk graph for visual assessment, and the decision curve as a sensitivity analysis

Limitations of utility-respecting measures
- Properties of the Net Reclassification Risk graph need more work:
  - External validation: model misspecification / miscalibration
  - Extension to survival / competing risks
  - Indicators of uncertainty
- The definition of an ‘important gain’ remains arbitrary, whereas p < .05 is well accepted; formal cost-effectiveness analysis comes at the end of the evaluation pyramid

References
- Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21(1).
- Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, Van Calster B. Assessing the incremental value of diagnostic and prognostic markers: a review and illustration. Eur J Clin Invest 2012;42(2).
- Steyerberg EW, Van Calster B, Pencina MJ. [Performance measures for prediction models and markers: evaluation of predictions and classifications]. Rev Esp Cardiol 2011;64(9).
- Baker SG, Van Calster B, Steyerberg EW. Evaluating a new marker for risk prediction using the test tradeoff: an update. Int J Biostat 2012;8(1).
- Van Calster B, Vickers AJ, Pencina MJ, Baker SG, Timmerman D, Steyerberg EW. Evaluation of markers and risk prediction models: overview of relationships between NRI and decision-analytic measures. Med Decis Making, in press.