Risk Adjustment Network Meeting. The Hague. October 11-14, 2017

Risk Adjustment, Big Data and Machine Learning: Challenge and Opportunity
Dov Chernichovsky, Ben Gurion University of the Negev, Israel
Alvaro Riascos, Los Andes University, Colombia
Ran Bergman, Deloitte, Israel

Goal of presentation
Prompt further discussion about the role of big data and machine learning in risk adjustment (RA)

Rationale
The technology is there, and evolving fast
Insurers and plans, at least in Israel, are using it for assessing risk and potentially for risk selection
This can induce governments (e.g., Israel, Colombia) to use more rigorous risk adjustment mechanisms than those used today

Presentation
Introduction to Big Data and Machine Learning
The case of Israel
The case of Colombia
Conclusion

Big Data – Its 4 V’s

Major Types of Data
Traditional:
Electronic medical records (e.g., billing, physician visits, measurements, lab tests, prescriptions and purchases)
Claims data
Patient and MDs’ surveys
Registry data
Other:
“Omics” – data at the cellular level
Genomic and genetic data
Patient generated data
Social media
Sensors

Machine Learning and Conventional Methods
Machine learning methods:
Handle different types of data, including numbers, text, and/or visual images, that are of different dimensions and structures
Detect and learn fast, through computer algorithms, highly complex and intricate relationships in high-dimensional data
Do not pre-impose constraints on the relationship between inputs and outcomes
Make fewer assumptions, as in a non-parametric statistical model
Conventional methods:
Need structured data
Slow
Resource consuming
Require pre-specified modeling and parameters

Goals of Machine Learning
Predict an outcome
Measure success by ‘out of sample performance’
Take advantage of rich and complex data that entail outcomes that are determined by many potential predictors with complex interrelationships
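To make ‘out of sample performance’ concrete, here is a minimal sketch: a model is fitted on one part of the data and scored only on enrollees it has not seen. The ages, costs, and the even/odd split are hypothetical toy values chosen for illustration, and a one-predictor least-squares line stands in for whatever model is actually used.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(ys, preds):
    """Share of the variance in ys explained by preds."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: enrollee age and annual cost.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
costs = [110, 90, 150, 160, 220, 210, 300, 280, 390, 400]

# Assumed split: even positions for fitting, odd positions held out.
train_x, train_y = ages[0::2], costs[0::2]
test_x, test_y = ages[1::2], costs[1::2]

intercept, slope = fit_line(train_x, train_y)
r2_in = r_squared(train_y, [intercept + slope * x for x in train_x])
r2_out = r_squared(test_y, [intercept + slope * x for x in test_x])

print(f"in-sample R2:     {r2_in:.3f}")
print(f"out-of-sample R2: {r2_out:.3f}")
```

In this toy case the held-out score is lower than the in-sample score, which is why the held-out figure is the one to report.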

Issues
(Still) imprecision and ambiguity of medical data
Validation of the model for policy making: a model may predict an outcome well (e.g., 90 percent of the time) and still have a high false positive rate
The incentives that the calculations produce
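The validation issue can be illustrated with a small worked example. The numbers are hypothetical: 1,000 enrollees, 900 of whom have the outcome, and a trivial "model" that flags everyone. It is right 90 percent of the time, yet every enrollee without the outcome is a false positive.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for two equal-length lists of booleans."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp, fp, tn, fn


# Hypothetical population: 900 enrollees with the outcome, 100 without.
actual = [True] * 900 + [False] * 100
# Hypothetical trivial model: predicts the outcome for every enrollee.
predicted = [True] * 1000

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + tn + fn)
false_positive_rate = fp / (fp + tn)

print(f"accuracy:            {accuracy:.2f}")
print(f"false positive rate: {false_positive_rate:.2f}")
```

Accuracy alone therefore says little when one class dominates; the error rates per class need to be checked separately.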

Some Evidence
Rose S. (2016) argues that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied.
Buchner, F., Wasem, J. & Schillo, S. (2015) show that including interactions from a machine learning algorithm improves the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal.
Li et al. (2013) argue that the non-linear relations among risk factors are usually very difficult to capture with linear models. The random forest model reaches an R² of 38% with a standard deviation of …, while the linear regression model reaches an R² of 31% with a standard deviation of 0.01.
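For readers unfamiliar with predictive ratios, a minimal sketch follows. It uses the common definition (predicted spending divided by actual spending within a subgroup), which is assumed here rather than taken from the cited paper; the costs, predictions, and subgroup flags are hypothetical.

```python
def predictive_ratio(actual, predicted, members):
    """Predicted / actual total cost over the enrollees flagged in `members`."""
    total_actual = sum(a for a, m in zip(actual, members) if m)
    total_predicted = sum(p for p, m in zip(predicted, members) if m)
    return total_predicted / total_actual


# Hypothetical annual costs and formula predictions for four enrollees.
actual = [100, 200, 300, 400]
predicted = [90, 150, 330, 430]
# Hypothetical subgroup, e.g. enrollees with a given chronic condition.
in_subgroup = [True, True, False, False]
everyone = [True] * len(actual)

pr_all = predictive_ratio(actual, predicted, everyone)
pr_subgroup = predictive_ratio(actual, predicted, in_subgroup)

print(f"overall predictive ratio:  {pr_all:.2f}")
print(f"subgroup predictive ratio: {pr_subgroup:.2f}")
```

A ratio below 1 means the formula underpays for the subgroup even when payments balance overall, which is the kind of gap subgroup ratios are meant to expose.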

Israel
Based on confidential estimates of a sickness fund

Colombia (as in Israel)
Adds variables to the current formula and explores potential interactions
Estimates with conventional and machine learning methods

Colombia – Step I: New Specification
Add to the current state formula, which is based on a linear regression on gender, age groups, location, and their two-way interactions (UPC):
enrollees' morbidity, characterized by 29 long-term disease groups (Dx)
the severity of the health condition, using indicators of hospitalizations (H), consultations with specialists (E), and admission to an intensive care unit (U)
Specifications:
UPC + Dx + H + E + U
UPC * H * E * U + Dx
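The difference between the two specifications can be sketched by listing the terms each one implies. This assumes the `*` follows the usual regression-formula convention (main effects plus all interactions among the listed blocks); the slides do not spell this out. Each name (UPC, Dx, H, E, U) is treated as a single block of variables, and the helper `expand_star` is hypothetical.

```python
from itertools import combinations


def expand_star(blocks):
    """Expand A * B * C into main effects plus all interactions (A, B, A:B, ...)."""
    terms = []
    for size in range(1, len(blocks) + 1):
        for combo in combinations(blocks, size):
            terms.append(":".join(combo))
    return terms


# First specification: purely additive.
additive_terms = ["UPC", "Dx", "H", "E", "U"]

# Second specification: UPC, H, E and U fully interacted, Dx entering additively.
interacted_terms = expand_star(["UPC", "H", "E", "U"]) + ["Dx"]

print(len(additive_terms), "terms:", " + ".join(additive_terms))
print(len(interacted_terms), "terms:", " + ".join(interacted_terms))
```

Under this reading the second specification has sixteen blocks of terms instead of five, so it can fit cost patterns that depend on combinations of severity indicators, at the price of many more parameters.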

Data
Panel of claims and 2011

Estimation Models
The linear model, estimated through weighted least squares
Three machine learning models:
Artificial neural networks (ANN)
Random forests (RF)
Boosted trees (GBM)
To control for selection bias, some specifications of the machine learning models include an additional regressor: the probability of claiming a service, since 20% of enrollees do not claim any service during the year
Negative predictions of the machine learning models were truncated at zero
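Two of the steps above, weighted least squares and truncation of negative predictions at zero, can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the single predictor (number of hospitalizations), the costs, and the weights (fraction of the year enrolled) are hypothetical, and the truncation is shown on a linear prediction for simplicity although the slides apply it to the machine learning models.

```python
def wls_fit(xs, ys, ws):
    """Weighted least squares with one predictor; returns (intercept, slope)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def truncate_at_zero(predictions):
    """Replace negative predicted costs with zero."""
    return [max(0.0, p) for p in predictions]


# Hypothetical toy data for four enrollees.
hospitalizations = [0, 1, 2, 3]
costs = [0, 0, 10, 40]
weights = [1.0, 1.0, 0.5, 1.0]  # assumed: fraction of the year enrolled

intercept, slope = wls_fit(hospitalizations, costs, weights)
raw_predictions = [intercept + slope * x for x in hospitalizations]
final_predictions = truncate_at_zero(raw_predictions)

print("raw:      ", [round(p, 2) for p in raw_predictions])
print("truncated:", [round(p, 2) for p in final_predictions])
```

The enrollee with no hospitalizations gets a negative raw prediction, which is not a meaningful cost, so it is set to zero before evaluation.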

Criteria for Evaluating Models

Entire Distribution Estimate

Lowest Quintile Estimate

Upper Quintile Estimate
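The three evaluation views (entire distribution, lowest quintile, upper quintile) can be sketched as follows. The evaluation criteria themselves are not recoverable from the transcript, so mean absolute error is used here purely as an assumed stand-in; the costs and predictions are hypothetical, and quintiles are formed on actual cost.

```python
def mean_abs_error(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def evaluate_by_quintile(actual, predicted):
    """Score predictions on everyone and on the bottom and top fifth by actual cost."""
    pairs = sorted(zip(actual, predicted))  # ordered by actual cost
    k = len(pairs) // 5  # enrollees per quintile (needs at least 5 enrollees)
    return {
        "entire": mean_abs_error(pairs),
        "lowest_quintile": mean_abs_error(pairs[:k]),
        "upper_quintile": mean_abs_error(pairs[-k:]),
    }


# Hypothetical annual costs and model predictions for ten enrollees.
actual = [0, 0, 5, 10, 20, 30, 50, 80, 150, 400]
predicted = [15, 12, 14, 18, 22, 35, 45, 90, 140, 330]

scores = evaluate_by_quintile(actual, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.1f}")
```

Reporting the tails separately matters because a model can look acceptable on average while missing badly for enrollees with no claims or with very high costs.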

Conclusions (for Colombia and Israel)
Risk adjustment policy can redistribute resources more efficiently by adjusting for enrollees' health conditions and by using non-parametric specifications, which capture the non-linear relations between risk factors better than linear models do
The non-parametric machine learning approach appears (in Colombia) to perform well for the entire cost distribution, poorly at its lower end, and well at its higher end
The Israeli political economy associated with Big Data and Machine Learning suggests that sickness funds “are tempted” to use the new technological options for risk selection

References
Buchner, F., Wasem, J., & Schillo, S. (2015). “Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?” Health Economics 26(1): 74-85.
Li, L., Bagheri, S., Goote, H., Hasan, A., & Hazard, G. (2013). “Risk Adjustment of Patient Expenditures: A Big Data Analytics Approach.” IEEE International Conference on Big Data (pp. 12-14).
Rose, S. (2016). “A Machine Learning Framework for Plan Payment Risk Adjustment.” Health Services Research 51:6, Part I.

Some more food for thought……..
Thanks !
