Evaluating Risk Adjustment Models
Andy Bindman, MD
Department of Medicine, Epidemiology and Biostatistics
Evaluating a Model's Predictive Power
• Linear regression (continuous outcomes)
• Logistic regression (dichotomous outcomes)
Evaluating Linear Regression Models
• R² is the percentage of variation in outcomes explained by the model – best suited to continuous dependent variables
  – Length of stay
  – Health care costs
• Ranges from 0–100%
• Generally, more is better
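A minimal sketch of how R² can be computed once a linear model has produced predictions; the outcome (length of stay) and the numbers are made up for illustration:

    import numpy as np

    # Hypothetical example: observed vs. model-predicted length of stay (days)
    observed = np.array([2, 5, 3, 8, 4, 6, 7, 3])
    predicted = np.array([3, 4, 3, 7, 5, 5, 6, 4])   # from some fitted linear model

    # R^2 = 1 - SS_residual / SS_total
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot
    print(f"R^2 = {r_squared:.2f}")   # share of outcome variation explained by the model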
Risk Adjustment Models
• Typically explain only 20–25% of the variation in health care utilization
• Explaining even this amount of variation can be important if the remaining variation is largely random
• Example: supports equitable allocation of capitation payments from health plans to providers
More to Modeling than Numbers
• R² is biased upward by adding more predictors (a common safeguard, adjusted R², is sketched below)
• The approach to categorizing outliers can affect R²: predicting less skewed data gives a higher R²
• The model is subject to the random tendencies of the particular dataset used to fit it
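Adjusted R² is not covered in the slides, but it is the standard correction for the upward bias from extra predictors; a minimal sketch for n observations and p predictors:

    def adjusted_r_squared(r_squared, n_obs, n_predictors):
        """Adjusted R^2 penalizes R^2 for the number of predictors in the model."""
        return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

    # Example: a raw R^2 of 0.25 from 1,000 patients and 30 predictors
    print(adjusted_r_squared(0.25, 1000, 30))   # about 0.23, slightly below the raw R^2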
Evaluating Logistic Models
• Discrimination – how accurately the model distinguishes individuals who experience the outcome from those who do not, based on their characteristics
• Calibration – how well predictions match observed outcomes across the range of risk
Discrimination
• C index – compares all pairs of individuals drawn from the two outcome groups (alive vs. dead) to see whether the risk adjustment model predicts a higher likelihood of death for the member of the pair who died (a concordant pair)
• Ranges from 0 to 1, based on the proportion of concordant pairs plus half of the tied pairs
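A minimal sketch of this pairwise definition, assuming arrays of predicted death probabilities and observed outcomes (1 = died, 0 = survived); the data are made up for illustration:

    import numpy as np

    def c_index(predicted_risk, died):
        """C index: fraction of (died, survived) pairs in which the decedent
        was assigned the higher predicted risk; ties count as one half."""
        concordant, ties, pairs = 0, 0, 0
        for i in np.where(died == 1)[0]:          # each patient who died...
            for j in np.where(died == 0)[0]:      # ...paired with each survivor
                pairs += 1
                if predicted_risk[i] > predicted_risk[j]:
                    concordant += 1
                elif predicted_risk[i] == predicted_risk[j]:
                    ties += 1
        return (concordant + 0.5 * ties) / pairs

    risk = np.array([0.10, 0.40, 0.05, 0.70, 0.30])
    died = np.array([0,    1,    0,    1,    0])
    print(c_index(risk, died))   # 1.0 here: both decedents outrank every survivor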
Adequacy of Risk Adjustment Models
• A C index of 0.5 is no better than random
• A C index of 1.0 indicates perfect prediction
• Typical risk adjustment models fall between these extremes
C statistic
• The area under the ROC curve for a predictive model no better than chance at predicting death is 0.5
• Models whose prediction of death is better than chance by:
  – 0.5 SDs give a c statistic of 0.64
  – 1.0 SD gives a c statistic of 0.76
  – 1.5 SDs give a c statistic of 0.86
  – 2.0 SDs give a c statistic of 0.92
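These figures can be reproduced under a binormal assumption (risk scores normally distributed with equal variance in decedents and survivors), where the c statistic equals Φ(d/√2) for a separation of d standard deviations; a sketch:

    from math import sqrt
    from scipy.stats import norm

    for d in (0.5, 1.0, 1.5, 2.0):
        # c statistic = Phi(d / sqrt(2)) under the equal-variance binormal model
        print(f"{d} SDs better than chance -> c statistic = {norm.cdf(d / sqrt(2)):.2f}")
    # prints 0.64, 0.76, 0.86, 0.92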
Best Model Doesn't Always Have the Biggest C statistic
• Adding health conditions that result from complications of care will raise the c statistic of the model but will not make the model better for assessing quality.
Spurious Assessment of Model Performance
• Missing values can lead to some patients being dropped from models
• When comparing models, be certain that the same group of patients is used for all models; otherwise comparisons may reflect more than model performance (see the sketch below)
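A minimal sketch of restricting competing models to a shared analytic sample; the file name and column names are hypothetical:

    import pandas as pd

    # Hypothetical analytic file with missing values in some predictors
    patients = pd.read_csv("cabg_patients.csv")
    model_a_vars = ["age", "diabetes", "creatinine"]
    model_b_vars = ["age", "diabetes", "creatinine", "ejection_fraction"]

    # Fit BOTH models on patients with complete data for the union of predictors,
    # so differences in fit reflect the models rather than different samples
    shared_vars = sorted(set(model_a_vars) | set(model_b_vars))
    common_sample = patients.dropna(subset=shared_vars)
    print(len(patients), "patients overall;", len(common_sample), "in the shared sample")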
Calibration – Hosmer-Lemeshow
• The size of the C index does not indicate how well the model performs across the range of risk
• Stratify individuals into groups of equal size (e.g., 10 groups) according to the predicted likelihood of the adverse outcome (e.g., death)
• Compare actual vs. expected outcomes for each stratum
• Want a non-significant p value within each stratum and across strata (the Hosmer-Lemeshow statistic)
Hosmer-Lemeshow
• For k strata, the chi-squared statistic has k − 2 degrees of freedom
• A falsely reassuring (non-significant) p value can result from having too few cases in a stratum (see the sketch below)
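A minimal sketch of the Hosmer-Lemeshow calculation, assuming arrays of predicted probabilities and observed 0/1 outcomes; the ten strata (deciles of predicted risk) and the k − 2 degrees of freedom follow the description above:

    import pandas as pd
    from scipy.stats import chi2

    def hosmer_lemeshow(predicted, observed, groups=10):
        """Chi-squared comparing observed vs. expected events across strata of predicted risk."""
        df = pd.DataFrame({"p": predicted, "y": observed})
        df["stratum"] = pd.qcut(df["p"], groups, labels=False, duplicates="drop")
        chi_sq = 0.0
        for _, g in df.groupby("stratum"):
            observed_events, expected_events, n = g["y"].sum(), g["p"].sum(), len(g)
            # contributions from events and non-events in this stratum
            chi_sq += (observed_events - expected_events) ** 2 / expected_events
            chi_sq += ((n - observed_events) - (n - expected_events)) ** 2 / (n - expected_events)
        k = df["stratum"].nunique()
        return chi_sq, chi2.sf(chi_sq, k - 2)   # p value on k - 2 degrees of freedom

    # A non-significant p value (e.g., > 0.05) suggests adequate calibration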
Calculating Expected Outcomes
• Solve the multivariate model incorporating an individual's specific characteristics
• For continuous outcomes, the predicted values are the expected values
• For dichotomous outcomes, the sum of the derived predictor terms produces a "logit" (the natural log odds), which can be converted algebraically to a probability:
  – probability = e^(log odds) / (1 + e^(log odds))
Individual's CABG Mortality Risk
• A 65-year-old obese, non-white woman with diabetes and a serum creatinine of 1 mg/dl presents with an urgent need for CABG surgery. What is her risk of death?
Individual's Predicted CABG Mortality Risk
• A 65-year-old obese, non-white woman with diabetes presents with an urgent need for CABG surgery. What is her risk of death?
• Log odds = sum of the model's intercept and coefficient × characteristic terms = −3.39
• Probability of death = e^(−3.39) / (1 + e^(−3.39)) = 0.034/1.034 ≈ 3.3%
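A sketch of the conversion from log odds to probability, using the −3.39 log odds implied by the worked example above:

    from math import exp

    log_odds = -3.39   # sum of the intercept and coefficient * characteristic terms
    probability = exp(log_odds) / (1 + exp(log_odds))
    print(f"Predicted mortality risk = {probability:.3f}")   # about 0.033, i.e., 3.3%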
Observed CABG Mortality Risk
• The actual outcome of whether the individual lived or died
• The observed rate for a group is the number of deaths divided by the number of people in that group
Actual and Expected CABG Surgery Mortality Rates by Patient Severity of Illness in New York
[Figure: actual vs. expected mortality rates across severity strata; chi-squared p = .16]
Goodness-of-fit tests for AMI mortality models
Stratifying by Risk
• Hosmer-Lemeshow provides a summary statistic of how well the model is calibrated
• It is also useful to look at how well the model performs at the extremes of risk (highest- and lowest-risk strata)
Validating the Model – Eyeball Test
• Face validity/content validity
• Does the empirically derived model correspond to a pre-determined conceptual model?
• If not, is that because of highly correlated predictors? A dataset limitation? A modeling error?
Validating the Model in Other Datasets: Predicting Mortality Following CABG
[Table: C statistics for a CABG mortality model applied to the STS, NY, VA, Duke, and MN datasets; Jones et al., JACC, 1996]
Recalibrating Risk Adjustment Models
• Necessary when the observed outcome rate differs from the expected rate derived from a different population
• The difference could reflect quality of care or differences in coding practices
• The assumption is that the relative weights of the predictors to one another are correct
• Recalibration is a uniform adjustment applied across the model's predictions to force the average expected outcome rate to equal the observed outcome rate
Recalibrating Risk Adjustment Models
• The New York AMI mortality rate is 15%
• The California AMI mortality rate is 13%
• Is care or coding different?
• To use the New York-derived risk adjustment model to predict expected deaths in California, the expected rates must be recalibrated (e.g., multiplied by 13/15)
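A minimal sketch of this simple proportional recalibration, assuming each California patient's expected probability came from the New York model; the individual predictions are made up for illustration:

    import numpy as np

    ny_expected = np.array([0.05, 0.20, 0.12, 0.30])   # hypothetical NY-model predictions for CA patients

    observed_rate_ca = 0.13   # California AMI mortality rate
    expected_rate_ny = 0.15   # average rate the New York model was calibrated to

    # Scale every prediction so the average expected rate matches the observed California rate
    ca_expected = ny_expected * (observed_rate_ca / expected_rate_ny)
    print(ca_expected)   # each prediction shrunk by the factor 13/15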
Summary
• Summary statistics provide a means for evaluating the predictive power of multivariate models
• Care should be taken to look beyond summary statistics to ensure that the model is not overspecified and that it conforms to a conceptual model
• Models should be validated with internal and, ideally, external data
• Next time we will review how a risk adjustment model can be used to identify providers who perform better or worse than expected given their patient mix