Published by Kaitlin Roland. Modified over 9 years ago.
1
Predicting Risk of Re-hospitalization for Congestive Heart Failure Patients
(in collaboration with )
Jayshree Agarwal
Senjuti Basu Roy, Ankur Teredesai, Si-Chi Chin, David Hazel, Kiyana, Mehrdad (UWT)
Paul Amoroso, Yoshi Williams, Dr. Lester Reed, Sheila, Eric Johnson (MHS)
2
Motivation
- Congestive Heart Failure (CHF): many hospitalizations and readmissions
  - 19.6% of patients are readmitted within 30 days [Jencks et al. 2009]
  - 31.1% of patients are readmitted within 60 days [Jencks et al. 2009]
- A low readmission rate indicates high quality of care by the hospital
- No reimbursement for readmission within 30 days
- Cost: unplanned readmissions in 2004 cost $17.4 billion [Jencks et al. 2009]
3
MHS - UWT Web and Data Science collaboration objectives
- Predict the risk of readmission for CHF patients
- Reduce the readmission rate and cost
- Improve patient satisfaction and quality of care
- Appropriate pre-discharge and post-discharge planning
- Proper resource utilization
4
Benefits of Predicting Risk of Readmission
- Proper resource utilization
- Improvement in quality of care
- Targeted interventions can be planned
- Proper pre-discharge and post-discharge plans can be made
- Reduction in cost: Medicare expenditure on potentially preventable rehospitalization is around $12 billion [Jencks et al. 2009]
5
Problem
Develop models that can predict the risk of readmission for CHF patients within
- 30 days after discharge
- 60 days after discharge
The readmission may be for CHF or for any other reason.
6
Overall Approach
How to solve the problem?
- Apply predictive data mining techniques such as classification
What do these predictive mining techniques require?
- Data in a homogeneous format
- Information extraction, integration, and data preparation
- A labeled dataset to train the models, used later on for testing
7
Our Challenges
Building domain knowledge
- Which variables to consider?
- How to merge and unify them in a homogeneous format (information extraction and integration)?
- How to understand the relative importance of the variables in the prediction task?
How to prepare data?
- Class label generation
- Noisy real-world data (missing values, inconsistencies, etc.)
- Serious skew in the dataset
8
Solution
9
Building Predictive Classification Models
Data Understanding -> Data Preprocessing -> Modeling -> Evaluation
10
Data Understanding
- Collect initial data
- Acquire domain knowledge
- Describe and explore the dataset
- Create data visualizations
11
Building Predictive Classification Models
Data Understanding -> Data Preprocessing -> Modeling -> Evaluation
12
Data Preprocessing
- Define class label
- Attribute selection
- Data integration
- Removal of incomplete data
- Finding eligible CHF admissions
13
Eligible CHF Admissions and Generating Class Labels
- All CHF admissions -> in-hospital deaths removed -> eligible CHF admissions
- For each eligible admission: is there any readmission within X days of discharge?
  - Yes: the class label is assigned as 1
  - No: the class label is assigned as 0
- X = 30 and X = 60
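The labeling flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual pipeline: the record keys (`patient_id`, `is_chf`, `admit`, `discharge`, `died_in_hospital`) and the sample records are hypothetical, and a readmission is counted for any cause, as the Problem slide allows.

```python
from datetime import date

def label_admissions(admissions, window_days):
    """Label each eligible CHF admission: 1 if readmitted within window_days, else 0.

    `admissions` is a list of dicts with hypothetical keys:
    patient_id, is_chf, admit, discharge, died_in_hospital.
    """
    # Eligible index admissions: CHF admissions without an in-hospital death.
    eligible = [a for a in admissions if a["is_chf"] and not a["died_in_hospital"]]
    labeled = []
    for a in eligible:
        # Any later admission of the same patient counts, whatever the cause.
        readmitted = any(
            b is not a
            and b["patient_id"] == a["patient_id"]
            and 0 <= (b["admit"] - a["discharge"]).days <= window_days
            for b in admissions
        )
        labeled.append((a["patient_id"], a["discharge"], 1 if readmitted else 0))
    return labeled

# Made-up records, for illustration only.
admissions = [
    {"patient_id": "p1", "is_chf": True, "admit": date(2010, 1, 1),
     "discharge": date(2010, 1, 5), "died_in_hospital": False},
    {"patient_id": "p1", "is_chf": False, "admit": date(2010, 1, 25),
     "discharge": date(2010, 1, 28), "died_in_hospital": False},
    {"patient_id": "p2", "is_chf": True, "admit": date(2010, 2, 1),
     "discharge": date(2010, 2, 4), "died_in_hospital": False},
    {"patient_id": "p2", "is_chf": True, "admit": date(2010, 3, 20),
     "discharge": date(2010, 3, 22), "died_in_hospital": False},
    {"patient_id": "p3", "is_chf": True, "admit": date(2010, 4, 1),
     "discharge": date(2010, 4, 3), "died_in_hospital": True},
]

labels_30 = label_admissions(admissions, 30)
labels_60 = label_admissions(admissions, 60)
```

In this made-up data, the second p2 admission falls 44 days after the first discharge, so the first p2 admission is labeled 0 for X = 30 and 1 for X = 60.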
14
Attribute Selection
- "Baseline": Yale model [Krumholz et al.]
  - Socio-demographic variables (2)
  - Comorbidities (35)
- "New": additional predictor variables identified by us (14)
- "Correlated" and "All" attribute sets
- Chi-square correlation test
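The slide names a chi-square correlation test but does not give its details. A minimal sketch, assuming each attribute and the class label are binary (0/1), computes the Pearson chi-square statistic from a 2x2 contingency table. The function name and the sample data are hypothetical.

```python
def chi_square_2x2(attribute, label):
    """Pearson chi-square statistic for two equal-length binary (0/1) sequences."""
    n = len(attribute)
    # observed[a][y] = number of rows with attribute == a and label == y
    observed = [[0, 0], [0, 0]]
    for a, y in zip(attribute, label):
        observed[a][y] += 1
    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    statistic = 0.0
    for a in (0, 1):
        for y in (0, 1):
            # Count expected in this cell if attribute and label were independent.
            expected = row_totals[a] * col_totals[y] / n
            if expected > 0:
                statistic += (observed[a][y] - expected) ** 2 / expected
    return statistic

# Made-up data: one attribute that tracks the label and one that does not.
label = [1, 1, 1, 1, 0, 0, 0, 0]
tracks_label = [1, 1, 1, 1, 0, 0, 0, 0]
unrelated = [1, 0, 1, 0, 1, 0, 1, 0]

strong = chi_square_2x2(tracks_label, label)
weak = chi_square_2x2(unrelated, label)
```

With one degree of freedom, a statistic above 3.841 is significant at the 0.05 level, so under this sketch an attribute like `tracks_label` would be kept and `unrelated` would be dropped.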
15
Data Extraction
- Sources: labeled data, patient details, primary and secondary diagnosis, lab measurement, administrative data
- Table joins
- Incomplete data removed
- Data used for training the models
16
Data Distribution
[Figures: data distribution for the 30-day time frame and the 60-day time frame]
17
Building Predictive Classification Models
Data Understanding -> Data Preprocessing -> Modeling -> Evaluation
18
Modeling
- Selecting a modeling technique for binary classification
  - Logistic regression
  - Naïve Bayes classifier
  - Support Vector Machine
- Balancing imbalanced data by under-sampling and over-sampling
- Building prediction models
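The slide does not say which balancing procedure was used. As one plausible reading, random under-sampling drops majority-class rows and random over-sampling duplicates minority-class rows until the classes are equal in size. The function names below are hypothetical, and the sketch assumes binary labels with both classes present.

```python
import random

def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)

def under_sample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]

def over_sample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = sorted(minority + majority + extra)
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Made-up skewed data: 2 readmissions out of 10 admissions.
rows = list(range(10))
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

under_rows, under_labels = under_sample(rows, labels)
over_rows, over_labels = over_sample(rows, labels)
```

Balancing is normally applied to the training portion only, so that the test data keeps the real class skew.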
19
Logistic Regression Model
[Figure: P (probability of Y) plotted against Z]
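The curve on this slide is the standard logistic function, which maps a linear score Z to a probability P between 0 and 1. The sketch below assumes Z is an intercept plus a weighted sum of the patient's attributes. The weights are made up for illustration and are not fitted values from the study.

```python
import math

def logistic(z):
    """Map any real score z to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(features, weights, intercept):
    """P(Y = 1 | features), with z = intercept + sum of weight * feature."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical weights and features, for illustration only.
p = predict_probability([1.0, 2.0], [0.5, -0.25], 0.0)
```

A patient is then predicted as a readmission when P exceeds a chosen threshold, commonly 0.5.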
20
Naïve Bayesian Classification
21
Support Vector Machine
- A method of classification for both linear and non-linear data
- Searches for the optimal hyperplane separating the two classes
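For the linear case, a separating hyperplane is defined by a weight vector w and an offset b, and a point is classified by the side of the hyperplane it falls on. The sketch below shows only that decision rule and the margin width that training tries to maximize. It does not show how w and b are learned or how kernels handle non-linear data, and the values of w and b are made up.

```python
import math

def decision_value(x, w, b):
    """Signed score w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x, w, b):
    """Return class 1 for points on the non-negative side, else class 0."""
    return 1 if decision_value(x, w, b) >= 0 else 0

def margin_width(w):
    """Distance between the two margin hyperplanes w . x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane, for illustration only.
w, b = [3.0, 4.0], -5.0
width = margin_width(w)
```

The optimal hyperplane is the one with the widest margin among all hyperplanes that separate the training data.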
22
Building Predictive Classification Models
Data Understanding -> Data Preprocessing -> Modeling -> Evaluation
23
Performance Evaluation Metrics
- Precision: percentage of tuples labeled as positive that are actually positive = TP / (TP + FP)
- Recall: percentage of positive tuples that are labeled as positive = TP / (TP + FN)
- Accuracy: percentage of tuples correctly classified = (TP + TN) / (P + N)
- ROC curve and area under the curve (AUC): shows the trade-off between the true positive rate and the false positive rate
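The metrics on this slide can be computed directly from the true and predicted labels. The sketch below also includes the F1 score reported in the result tables, using its standard definition as the harmonic mean of precision and recall. AUC is computed here with the pairwise-ranking formulation, which is an assumption about method, not a statement of how the study computed it. The sample labels and scores are made up.

```python
def evaluate(actual, predicted):
    """Precision, recall, accuracy and F1 for binary 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(actual)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

def auc(actual, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels, predictions and scores, for illustration only.
actual = [1, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.3, 0.6, 0.1, 0.8]

metrics = evaluate(actual, predicted)
area = auc(actual, scores)
```

On skewed data a model can reach high accuracy by predicting "no readmission" for almost everyone, which is why recall and AUC are reported alongside accuracy.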
24
Baseline Model
- Hospital 30-day Heart Failure Readmission Measure submitted by Yale University [Krumholz et al.]
- Uses a hierarchical logistic regression model
- The area under the curve (AUC) is 0.60
25
Evaluation
- Predictive models are assessed using 10-fold cross-validation
- Performance is compared using the evaluation metrics described previously
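In 10-fold cross-validation the data is split into ten parts, and each part is used once for testing while the other nine are used for training. A minimal sketch follows, assuming a plain random split (the slide does not say whether folds were stratified by class). The `fit` and `score` callables are hypothetical placeholders supplied by the caller.

```python
import random

def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train_indices, test_indices); every row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_validate(rows, labels, fit, score, k=10, seed=0):
    """Average score(model, test_rows, test_labels) over k folds.

    Assumes len(rows) >= k so that no test fold is empty.
    """
    results = []
    for train, test in k_fold_indices(len(rows), k, seed):
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        results.append(score(model, [rows[i] for i in test], [labels[i] for i in test]))
    return sum(results) / len(results)

folds = list(k_fold_indices(25, k=10))
```

Any step that looks at the labels, such as balancing or feature selection, is best done inside each training fold so that the test fold stays unseen.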
26
RESULTS
27
Logistic Regression for 30 days
[Charts: Area Under the Curve (AUC) and Recall]
28
Logistic Regression for 60 days
[Charts: Area Under the Curve (AUC) and Recall]
29
Naïve Bayes Classifier for 30 days
[Chart: Area Under the Curve (AUC)]
30
Support Vector Machine for 30 days
[Chart: Area Under the Curve (AUC)]
31
Results of Logistic Regression

30-day time frame
Attribute  Precision  Recall  Accuracy  F1 score  AUC
A1         0.6667     0.0035  0.7706    0.0071    0.5962
A2         0.7494     0.1515  0.7634    0.2501    0.6309
A3         0.6579     0.1595  0.7476    0.2568    0.6333
A4         0.6926     0.1458  0.7488    0.2410    0.6358

60-day time frame
Attribute  Precision  Recall  Accuracy  F1 score  AUC
A1         0.5732     0.0311  0.6855    0.0590    0.5999
A2         0.5705     0.1958  0.6476    0.2910    0.6197
A3         0.5604     0.275   0.6374    0.369     0.6284
A4         0.5693     0.2605  0.6393    0.3574    0.6303
32
Results of Naïve Bayes

30-day time frame
Attribute  Precision  Recall  Accuracy  F1 score  AUC
A1         0.4412     0.1336  0.7105    0.2049    0.5908
A2         0.4592     0.4399  0.6027    0.3475    0.6206
A3         0.4121     0.4838  0.6193    0.4017    0.6348
A4         0.4251     0.4860  0.6186    0.4003    0.6384

60-day time frame
Attribute  Precision  Recall  Accuracy  F1 score  AUC
A1         0.5040     0.2971  0.6117    0.3738    0.6063
A2         0.4280     0.5540  0.4976    0.5340    0.6009
A3         0.4419     0.7344  0.5206    0.5409    0.6303
A4         0.4490     0.7479  0.5015    0.5501    0.6330
33
Results of Support Vector Machine

30-day time frame
Attribute  Precision  Recall  Accuracy  F1 score  AUC
A1         0.5764     0.4882  0.5629    0.5280    0.5896
A2         0.6104     0.2599  0.5578    0.3540    0.5172
A3         0.5868     0.3085  0.5496    0.4015    0.4985
A4         0.5942     0.4317  0.5829    0.4975    0.5220

60-day time frame
Attribute  Precision  Recall  Accuracy  F1 score  AUC
A1         0.5521     0.5669  0.5580    0.5587    0.4663
A2         0.5629     0.3956  0.5392    0.4519    0.5019
A3         0.5489     0.4944  0.5515    0.5193    0.5232
A4         0.5875     0.5673  0.5846    0.5753    0.4707
34
Comparison of AUC of Different Models

Time frame  Baseline Model  Logistic Regression  Naïve Bayesian  Support Vector Machine
30 days     0.60            0.6358               0.6384          0.5220
60 days     -               0.6303               0.6330          0.5232
35
Conclusion and Discussion
- Predicting readmission is a difficult problem to solve
- Feature selection gives the best results
- With data balancing, the recall of the model improves
36
Future Work
- Investigate other classification techniques such as ensemble methods and neural networks
- Explore additional features and study their relevance
- Employ other feature selection techniques
- Devise a method to impute missing values
- Deploy the predictive models
37
Acknowledgement
- MultiCare Health System (MHS) and Dr. Lester Reed for giving us this opportunity
- Data architects and domain experts in MHS for their inputs
- Professors Dr. Ankur Teredesai and Dr. Senjuti Basu Roy for their guidance
- Other team members in UWT for their support
38
References
- S. F. Jencks, M. V. Williams, and E. A. Coleman, “Rehospitalizations among Patients in the Medicare Fee-for-Service Program,” New England Journal of Medicine, vol. 360, no. 14, pp. 1418–1428, 2009.
- J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
- H. M. Krumholz, S. L. T. Normand, P. S. Keenan, Z. Q. Lin, E. E. Drye, K. R. Bhat, Y. F. Wang, J. S. Ross, J. D. Schuur, and B. D. Stauffer, Hospital 30-day heart failure readmission measure methodology. Report prepared for the Centers for Medicare & Medicaid Services.
39
Questions