Predicting Hospital Length of Stay in Intensive Care Unit

Namita Singh

Introduction Methods Analysis Results Discussion Conclusion

Predictive Analytics in Healthcare
Wall of Analytics at Johns Hopkins Hospital
Johns Hopkins Hospital in Baltimore improved access for very sick patients by 78%, reduced ED patient waiting by 35%, and reduced patient waiting following surgery by 70%, all during an 18-month period when inpatient occupancy grew 8 points.

Importance of LOS prediction task
US hospital stays cost the system at least $377.5 billion per year.*
Recent Medicare legislation has proposed a fixed insurance payment for certain procedures.

Machine Learning Applications in LoS Prediction
Machine learning methods have been widely employed for predicting:
LOS as short, medium and long stay
LOS in cardiac patients
LOS in diabetic patients
LOS as >5 days or <5 days
Some researchers have used datasets with detailed clinical information, but such data is not available to all researchers.
Others have used publicly available datasets such as MIMIC.

MIMIC-III Dataset
“MIMIC is an openly available dataset developed by the MIT Lab for Computational Physiology, comprising de-identified health data associated with ~40,000 critical care patients. It includes demographics, vital signs, laboratory tests, medications, and more.”
MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). DOI: /sdata Available at:

Data Used for Experiments
Drop rows with negative LoS, usually related to a time of death before admission.
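The filtering step above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the field names `admittime` and `dischtime`, the timestamp format, and the sample rows are assumptions made for the example.

```python
from datetime import datetime

def length_of_stay_days(admit: str, discharge: str) -> float:
    """LoS in days from two 'YYYY-MM-DD HH:MM:SS' timestamps (assumed format)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(discharge, fmt) - datetime.strptime(admit, fmt)
    return delta.total_seconds() / 86400.0

def drop_negative_los(rows):
    """Keep only rows whose computed LoS is non-negative, adding a 'los' field."""
    kept = []
    for row in rows:
        los = length_of_stay_days(row["admittime"], row["dischtime"])
        if los >= 0:
            kept.append({**row, "los": los})
    return kept

# Hypothetical rows; the second ends before it starts and is dropped.
rows = [
    {"id": 1, "admittime": "2101-01-01 08:00:00", "dischtime": "2101-01-04 08:00:00"},
    {"id": 2, "admittime": "2101-02-10 12:00:00", "dischtime": "2101-02-10 06:00:00"},
]
clean = drop_negative_los(rows)
```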

Reducing the ICD_9 codes from 6,985 distinct codes to 17 categories would make a better machine learning model for this study.
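One way to do this grouping is by the standard ICD-9 chapter ranges, sketched below. The category names follow the feature list in this deck; sending E and V codes to "Misc" and reading the first three characters as the numeric prefix are assumptions, and the slides do not say how the grouping was actually done.

```python
# (upper bound of the 3-digit prefix, category), following ICD-9 chapter ranges.
ICD9_RANGES = [
    (139, "Infectious"), (239, "Neoplasm"), (279, "Endocrine"), (289, "Blood"),
    (319, "Mental"), (389, "Nervous"), (459, "Circulatory"), (519, "Respiratory"),
    (579, "Digestive"), (629, "Genitourinary"), (679, "Pregnancy"), (709, "Skin"),
    (739, "Muscular"), (759, "Congenital"), (779, "Prenatal"), (799, "Misc"),
    (999, "Injury"),
]

def icd9_category(code: str) -> str:
    """Map one ICD-9 code (for example '4019' or '401.9') to one of 17 categories."""
    code = code.strip().upper()
    if not code or code[0] in ("E", "V"):
        return "Misc"  # assumption: supplementary E/V codes are grouped as Misc
    prefix = int(code[:3])  # first three digits identify the chapter
    for upper, name in ICD9_RANGES:
        if prefix <= upper:
            return name
    return "Misc"
```

For example, `icd9_category("4019")` falls in the 390-459 range and is labeled "Circulatory".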

One interesting observation is the fact that Asians have the lowest median stay.

Predicting Length of Stay
Regression approach: predicts the numeric value of the duration of stay, but the exact value is hard to predict.
Classification approach: practically sufficient to enable healthcare providers to make more informed decisions, but not studied sufficiently.

MIMIC-III Features for Predicting LOS
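Feature name | Data type | Number of feature values | Description
Blood | Numeric | 6 | ICD_9 category
Circulatory | Numeric | 16 | ICD_9 category
Congenital | Numeric | 10 | ICD_9 category
Digestive | Numeric | 12 | ICD_9 category
Endocrine | Numeric | 11 | ICD_9 category
Genitourinary | Numeric | 8 | ICD_9 category
Infectious | Numeric | | ICD_9 category
Injury | Numeric | 22 | ICD_9 category
Mental | Numeric | | ICD_9 category
Misc | Numeric | 9 | ICD_9 category
Muscular | Numeric | | ICD_9 category
Neoplasm | Numeric | | ICD_9 category
Nervous | Numeric | | ICD_9 category
Pregnancy | Numeric | 13 | ICD_9 category
Prenatal | Numeric | | ICD_9 category
Respiratory | Numeric | | ICD_9 category
Skin | Numeric | | ICD_9 category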

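Feature name | Data type | Number of feature values | Description
GENDER | Nominal | Binary | Male or Female
ICU | Nominal | Binary | ICU admission
NICU | Nominal | Binary | NICU admission
ADM_ELECTIVE | Nominal | Binary | Elective admission
ADM_EMERGENCY | Nominal | Binary | Emergency admission
ADM_NEWBORN | Nominal | Binary | Newborn admission
ADM_URGENT | Nominal | Binary | Urgent admission
INS_Government | Nominal | Binary | Government insurance
INS_Medicaid | Nominal | Binary | Medicaid insurance
INS_Medicare | Nominal | Binary | Medicare insurance
INS_Private | Nominal | Binary | Private insurance
INS_Self Pay | Nominal | Binary | Self-payment type
REL_NOT SPECIFIED | Nominal | Binary | Religion not specified
REL_RELIGION | Nominal | Binary | Religious or not
REL_UNOBTAINABLE | Nominal | Binary | Religion unobtainable
ETH_ASIAN | Nominal | Binary | Asian ethnicity
ETH_BLACK/AFRICAN AMERICAN | Nominal | Binary | Black/African American ethnicity
ETH_HISPANIC/LATINO | Nominal | Binary | Hispanic/Latino ethnicity
ETH_OTHERS | Nominal | Binary | Ethnicity as others
ETH_WHITE | Nominal | Binary | White ethnicity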

Feature name | Data type | Number of feature values | Description
AGE_middle_adult | Nominal | Binary | Age category as middle_adult
AGE_newborn | Nominal | Binary | Age category as newborn
AGE_senior | Nominal | Binary | Age category as senior
AGE_young_adult | Nominal | Binary | Age category as young_adult
MAR_DIVORCED | Nominal | Binary | Marital status as divorced
MAR_LIFE PARTNER | Nominal | Binary | Marital status as life partner
MAR_MARRIED | Nominal | Binary | Marital status as married
MAR_SEPARATED | Nominal | Binary | Marital status as separated
MAR_SINGLE | Nominal | Binary | Marital status as single
MAR_UNKNOWN | Nominal | Binary | Marital status as unknown
MAR_WIDOWED | Nominal | Binary | Marital status as widowed
LOS | Numeric | 0-299 | Regression model
LOS | | 31 | 1-day classification
LOS | | 16 | 2-day classification
LOS | | 11 | 3-day classification
LOS | | 7 | 5-day classification
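The class counts listed for LOS (31, 16, 11 and 7) would be consistent with fixed-width bins of 1, 2, 3 and 5 days up to 30 days, plus one overflow class for longer stays. The sketch below illustrates that reading; the 30-day cap and the binning rule are assumptions, since the slides do not state how the classes were formed.

```python
def los_bin(los_days: float, width: int, cap_days: int = 30) -> int:
    """Class index for a stay, using bins of `width` days.

    Stays of `cap_days` or more all share the last class (assumed rule).
    """
    capped = min(los_days, cap_days)
    return int(capped // width)

def n_classes(width: int, cap_days: int = 30) -> int:
    """Number of classes produced by los_bin for a given bin width."""
    return cap_days // width + 1

# Class counts for the four bin widths named in the deck.
counts = {width: n_classes(width) for width in (1, 2, 3, 5)}
```

Under these assumptions a 7-day stay lands in class 7 for 1-day bins and class 1 for 5-day bins.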

Regression Results
We built LoS prediction models using the supervised learning algorithm of Linear Regression.
Results are only marginally better than ZeroR, which predicts the average.
Algorithm | RMSE
ZeroR | 2.882
Linear Regression | 2.5869
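For reference, the ZeroR baseline and the RMSE measure can be sketched as follows. The numbers here are made up for illustration and are unrelated to the results reported above.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def zero_r(train_targets):
    """ZeroR for regression: ignore all features, always predict the training mean."""
    mean = sum(train_targets) / len(train_targets)
    return lambda _features=None: mean

# Hypothetical LoS values in days.
train_los = [1.0, 2.0, 3.0, 6.0]
test_los = [2.0, 4.0]

baseline = zero_r(train_los)
baseline_rmse = rmse(test_los, [baseline() for _ in test_los])
```

Any regression model is worth keeping only if its RMSE on held-out data is clearly below this baseline.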

Classification Results - 1
We built a LoS prediction model using three machine learning methods (Naïve Bayes, Logistic Regression and Multilayer Perceptron) for 3-class classification.
Classification condition:
Short | Intermediate | Long
<3 days | 3-5 days | >5 days
Classification result (AUC):
Diagnosis | Instances | Naïve Bayes | Logistic Regression | Multilayer Perceptron
Joint | 53104 | 0.693 | 0.745 | 0.789
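The three-class condition above translates directly into a labeling function. This is a small sketch; the function name is hypothetical, and treating exactly 3 and exactly 5 days as Intermediate follows the "3-5 days" wording.

```python
def los_class(los_days: float) -> str:
    """Label a stay as Short (<3 days), Intermediate (3-5 days) or Long (>5 days)."""
    if los_days < 3:
        return "Short"
    if los_days <= 5:
        return "Intermediate"
    return "Long"

# Hypothetical stays in days.
labels = [los_class(d) for d in (0.5, 3.0, 5.0, 12.0)]
```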

Further, we built LoS prediction models using three machine learning methods (Naïve Bayes, Logistic Regression and Multilayer Perceptron) for 1-day, 2-day, 3-day and 5-day classification.
Classification condition: 1-day, 2-day, 3-day and 5-day classification
Classification result:
Columns: Diagnosis, Instances, then Naïve Bayes, Logistic Reg. and MLP for each of One-Day, Two-Day, Three-Day and Five-Day Classification
Joint 53104 0.650 0.690 0.661 0.648 0.686 0.672 0.691 0.676 0.646 0.680 0.656

We created diagnosis-specific LoS prediction models using three machine learning methods (Naïve Bayes, Logistic Regression and Multilayer Perceptron) for 1-day classification.
Classification condition: 17 diagnosis-specific models based on ICD-9 codes.
Classification result: on next slide

Classification Results - 3
One-Day Classification
Diagnosis Records Naïve Bayes Logistic Regression Multilayer Perceptron
Joint 53104 0.650 0.690 0.661
Blood 15692 0.656 0.696 0.663
Circulation 37537 0.620 0.592
Congenital 3109 0.703 0.745 0.662
Digestive 18407 0.624 0.645 0.593
Endocrine 31862 0.616 0.637 0.630
Genitourinary 18381 0.603 0.622 0.574
Infectious 11918 0.676 0.702 0.689
Injury 41851 0.719 0.727 0.710
Mental 14686 0.647 0.639
Misc 14329 0.611 0.635
Muscular 8805 0.615 0.621
Neoplasm 7481 0.591
Nervous 13788 0.680 0.673
Pregnancy 169 0.600
Prenatal 10241 0.660 0.722 0.709
Respiratory 21126 0.605 0.634
Skin 5694 0.594 0.612 0.602

Evaluation Measures
Sensitivity = Out of all who survived, what fraction were correctly predicted to survive
Specificity = Out of all who did not survive, what fraction were correctly predicted to not survive
There is a trade-off between sensitivity and specificity:
Predict everyone to survive: 100% sensitivity but 0% specificity
Predict everyone to not survive: 100% specificity but 0% sensitivity
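The two measures and the trade-off can be checked with a short sketch. The labels are hypothetical, with True standing for the positive outcome ("survived" in the wording above), and the function assumes at least one positive and one negative case.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for boolean labels and predictions."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # positives predicted positive
    fn = sum(1 for a, p in pairs if a and not p)      # positives missed
    tn = sum(1 for a, p in pairs if not a and not p)  # negatives predicted negative
    fp = sum(1 for a, p in pairs if not a and p)      # negatives flagged positive
    return tp / (tp + fn), tn / (tn + fp)

actual = [True, True, True, False, False]
all_positive = sensitivity_specificity(actual, [True] * 5)
all_negative = sensitivity_specificity(actual, [False] * 5)
```

Predicting the positive class for everyone gives sensitivity 1.0 and specificity 0.0, and the reverse holds when predicting the negative class for everyone.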

ROC Curve
Most machine learning methods also give a confidence (or probability) on their prediction, for example, 90% confident will survive or 75% confident will survive.
Using different thresholds on this confidence, different sensitivity and specificity measures can be obtained, which are then plotted on a graph called the ROC curve.
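The thresholding idea can be sketched as below: each distinct confidence value is used as a cut-off, and the resulting point is recorded as (1 - specificity, sensitivity), the usual ROC axes. The labels and scores are hypothetical, and the sketch assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels are 1 (positive) or 0 (negative); a case is predicted positive
    when its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nobody predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points

labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
points = roc_points(labels, scores)
```

The curve always runs from (0, 0), where nobody is predicted positive, to (1, 1), where everybody is.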

[Figure: ROC curve comparing the predictive model at different confidence thresholds with the random baseline and perfect predictions. Labeled points: predict everyone to survive; randomly predict 80% to survive; randomly predict 50% to survive; predict everyone to not survive.]

Area Under ROC Curve (AUC)
One number that summarizes performance of a predictive model
Commonly used for comparing predictive models
Higher the better; perfect AUC = 1.0
Random baseline has AUC = 0.5
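AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way on hypothetical data; it is a quadratic-time illustration, not an efficient implementation.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
uninformative = auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

A model that ranks every positive above every negative scores 1.0, and one that gives every case the same score comes out at 0.5, matching the two reference values above.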

Past Work by Other Researchers
Some researchers have used MIMIC data to build predictive models for LoS prediction.
All of them built one model for all ICD-9 codes; the ICD-9 code was used only as a feature.
They did not distinguish between different diagnoses during training or testing.

Joint Predictive Model (as used by others in past)
[Diagram: Training Data (all diagnoses together) → Machine Learning Method → Joint Predictive Model]

Diagnosis-Specific Predictive Models (as used in this work)
[Diagram: one pipeline per diagnosis, for example Pregnancy, Congenital and Injury: Training Data for that diagnosis → Machine Learning Method → Predictive Model for that diagnosis]
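The difference between the two setups is only in how the training data is partitioned before learning. A minimal sketch follows, assuming each record carries a list of diagnosis categories (a record with several diagnoses then contributes to several models). The record layout and the stand-in learner `fit_mean_los` are hypothetical; the deck used Naïve Bayes, Logistic Regression and Multilayer Perceptron as learners.

```python
from collections import defaultdict

def train_per_diagnosis(records, fit):
    """Group records by diagnosis and fit one model per group."""
    groups = defaultdict(list)
    for record in records:
        for diagnosis in record["diagnoses"]:
            groups[diagnosis].append(record)
    return {diagnosis: fit(group) for diagnosis, group in groups.items()}

def fit_mean_los(group):
    """Stand-in learner: always predicts the mean LoS of its training group."""
    mean = sum(r["los"] for r in group) / len(group)
    return lambda record: mean

# Hypothetical training records.
records = [
    {"diagnoses": ["Injury"], "los": 2.0},
    {"diagnoses": ["Injury", "Pregnancy"], "los": 4.0},
    {"diagnoses": ["Congenital"], "los": 10.0},
]
models = train_per_diagnosis(records, fit_mean_los)  # one model per diagnosis
joint_model = fit_mean_los(records)                  # one model for everything
```

Any learner with the same "take a list of records, return a predictor" shape can be passed as `fit`.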

Joint vs. Diagnosis-Specific Models
Even though the joint model gets more examples during training, it does not do significantly better than the diagnosis-specific models.
Additional training examples from other diagnoses do not help.
Results show that the most suitable model to predict LoS for a diagnosis is the model trained only for that diagnosis.

Conclusions
One-day classification can be used to get a more accurate prediction model than the binary or three-class classification used in the past.
Predictive models should be evaluated separately for each diagnosis; otherwise their performance is overestimated or underestimated.
