Published by Ashley Sparks. Modified over 5 years ago.
5
Why Most Statistical Predictions Cannot Reliably Support Decision-Making: Problems Caused By Common Regression Modeling Approaches
Ewout Steyerberg, Professor of Clinical Biostatistics and Medical Decision Making
Leiden, October 2017
6
Why most prediction models are false
Methods at development
Rigorous validation
7
Validation
8
Three competitive models
MMRPredict (NEJM 2006) - Common regression modeling approach; small data set
MMRPro (JAMA 2006) - Bayesian modeling approach; moderate size data set
PREMM (JAMA 2006; Gastroenterology 2011; JCO 2016) - Sensible regression modeling approach; large data set
Which model wins? Which may do harm?
9
6 clinic-based, 5 population-based cohorts
10
Discrimination
Panels: clinic-based, population-based
11
Calibration plots: observed vs predicted
Calibration slope as a measure of overfitting
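To make the calibration slope concrete, here is a minimal sketch. It assumes the usual definition: the slope b in a logistic recalibration model logit P(y=1) = a + b * lp, where lp is the model's linear predictor (the log-odds of the predicted risk) evaluated in validation data. The function name and the toy data are hypothetical, not taken from the slides. A slope below 1 means predictions are too extreme, which is the typical signature of overfitting.

```python
import math

def calibration_slope(y, lp, iterations=25):
    """Fit logit P(y=1) = a + b * lp by Newton-Raphson; return (a, b).

    y  : list of 0/1 outcomes in the validation data
    lp : list of linear predictors (log-odds of predicted risk)
    """
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, lp):
            pi = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            wi = pi * (1.0 - pi)
            g0 += yi - pi            # score for the intercept
            g1 += (yi - pi) * xi     # score for the slope
            h00 += wi                # information matrix entries
            h01 += wi * xi
            h11 += wi * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Hypothetical validation data: the model predicts risks of about 12% and 88%
# (lp = -2 and lp = +2), but the observed event rates are only 25% and 75%.
lp = [-2.0] * 100 + [2.0] * 100
y = [1] * 25 + [0] * 75 + [1] * 75 + [0] * 25

a, b = calibration_slope(y, lp)
```

In this toy example the fitted slope is ln(3)/2, roughly 0.55, so the predictions would need to be pulled substantially toward the average risk to be well calibrated.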
12
Calibration
13
Calibration
14
Clinical usefulness
Statistical performance: discrimination and calibration
  Consider full range of predictions
Decision-analytic performance:
  Define a decision threshold: act if risk > threshold
  TP and FP classifications
  Net Benefit as a summary measure: NB = (TP – w FP) / n, with w = harm/benefit (Vickers & Elkin, MDM 2006)
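The Net Benefit formula can be sketched in a few lines. This is an illustration only: the function names and the toy data are hypothetical, and the weight w is taken to be the odds of the decision threshold, threshold / (1 - threshold), which is the usual choice in decision curve analysis.

```python
def net_benefit(y_true, risk, threshold):
    """Net Benefit at one decision threshold: NB = (TP - w * FP) / n.

    Acts (classifies as positive) when predicted risk > threshold.
    Assumption: w = harm/benefit is set to the odds of the threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    w = threshold / (1.0 - threshold)
    tp = sum(1 for y, p in zip(y_true, risk) if p > threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, risk) if p > threshold and y == 0)
    return (tp - w * fp) / len(y_true)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on everyone, regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)

# Hypothetical outcomes (1 = mutation carrier) and predicted risks.
y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p = [0.80, 0.10, 0.30, 0.60, 0.05, 0.20, 0.15, 0.40, 0.55, 0.02]

nb_model = net_benefit(y, p, threshold=0.25)
nb_all = net_benefit_treat_all(y, threshold=0.25)
```

A model is clinically useful at a given threshold only if its Net Benefit exceeds both the treat-all strategy and the treat-none strategy, whose Net Benefit is 0 by definition. Repeating the calculation over a range of thresholds gives a decision curve.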
15
Decision curve analysis
Panels: clinic-based, population-based
16
Overview
Clinical context: testing for Lynch syndrome
Statistical and decision-analytic performance
Could poor performance have been foreseen? Prevented?
17
Example of “barbarian modeling strategy”
18
Selection based on statistical significance
19
Many predictors, >37 df; dichotomized
20
Exaggerated effects
21
Sample size issues
Robust: strong, vigorous, sturdy, tough, powerful, powerfully built, solidly built, as strong as a horse/ox, muscular, sinewy, rugged, hardy, strapping, brawny, burly, husky
22
Poor performance foreseeable?
Simulate modeling strategy
  Small sample size
    38 events at development
    35 events vs >2000 at validation
  Stepwise selection
    Univariate and multivariable statistical testing
  Dichotomization
New cohort: n=19,866; 2,051 mutations
23
Poor discrimination
Poor calibration
24
Poor decision-making
Illustration with 10 random samples
25
Could poor performance be prevented?
PREMM modeling strategy
  Coding of family history
  Continuous age
26
SiM 2007
27
Could poor performance be prevented?
PREMM modeling strategy
  Coding of family history
  Continuous age
Larger sample size
28
Better discrimination and calibration if a) more sensible modeling and b) larger sample size
29
Substantially better decision-making if a) more sensible modeling and b) larger sample size
30
Discussion
Avoid stepwise selection
  Prespecification with summary variables
  Advanced estimation
Avoid dichotomization
  Keep continuous
Increase sample size
  Combining development and validation sets
  Collaborative efforts
Rigorous validation
  Statistical and decision-analytic perspective
32
Evaluation of decision-making
Net Benefit: “utility of the method” (Peirce, Science 1884)
Youden index: sens + spec – 1
Net Benefit (Vickers, MDM 2006)
  Weight FP:TP = H:B = odds(threshold) (Vergouwe 2003)
Decision Curve Analysis
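The contrast between the two measures can be sketched as follows. The Youden index weights sensitivity and specificity equally, whereas Net Benefit weights false positives by the odds of the decision threshold and also depends on prevalence. The rewrite of Net Benefit in terms of sensitivity, specificity and prevalence follows from TP/n = sens × prevalence and FP/n = (1 – spec) × (1 – prevalence). The function names and the numbers are hypothetical illustrations.

```python
def youden_index(sens, spec):
    """Youden index: sens + spec - 1 (equal weight to both error types)."""
    return sens + spec - 1.0

def net_benefit_from_rates(sens, spec, prevalence, threshold):
    """Net Benefit written in terms of sensitivity, specificity, prevalence.

    TP/n = sens * prevalence and FP/n = (1 - spec) * (1 - prevalence);
    false positives are weighted by odds(threshold) = threshold / (1 - threshold).
    """
    odds = threshold / (1.0 - threshold)
    return sens * prevalence - (1.0 - spec) * (1.0 - prevalence) * odds

# Hypothetical test characteristics at a 5% risk threshold, 10% prevalence.
j = youden_index(sens=0.9, spec=0.6)
nb = net_benefit_from_rates(sens=0.9, spec=0.6, prevalence=0.1, threshold=0.05)
```

At a low threshold such as 5%, a false positive counts for only about 1/19 of a true positive, so a rule with high sensitivity and modest specificity can have a favourable Net Benefit even though its Youden index is unremarkable.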
34
Youden index and Net Benefit
35
Avoid miscalibration by overfitting
Shrinkage
  Reduce coefficients by multiplying by s, s<1
  E.g.: multiply by 0.8
Penalization
  Ridge regression: shrink during fitting
  LASSO: shrink to zero; implicit selection
  Elastic Net: combination of Ridge and LASSO
Machine learning?
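Uniform shrinkage can be sketched as below. The slide does not say how s is chosen; as an assumption, this sketch uses the heuristic shrinkage factor of van Houwelingen and le Cessie, s = (model χ² – df) / model χ², which is one common option (bootstrap estimation is another). The function names and the numbers are hypothetical.

```python
def heuristic_shrinkage(model_chi2, df):
    """Heuristic shrinkage factor s = (model chi2 - df) / model chi2.

    Assumption: this estimator is one common choice for s and is not
    taken from the slides. model_chi2 is the likelihood-ratio chi-square
    of the fitted model; df is the number of candidate-predictor
    degrees of freedom.
    """
    if model_chi2 <= 0:
        raise ValueError("model_chi2 must be positive")
    return (model_chi2 - df) / model_chi2

def shrink_coefficients(coefs, s):
    """Multiply each regression coefficient (intercept excluded) by s."""
    return [s * b for b in coefs]

# Hypothetical model: likelihood-ratio chi-square of 40 on 8 df.
s = heuristic_shrinkage(model_chi2=40.0, df=8)
shrunk = shrink_coefficients([1.0, -0.5], s)
```

With these numbers s equals 0.8, matching the example on the slide. After shrinking the coefficients, the intercept is normally re-estimated so that the average predicted risk still matches the observed event rate. The formula also shows why many degrees of freedom with few events is a problem: as df approaches the model χ², s approaches 0.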