Published by Merryl Boone. Modified over 9 years ago.
Team Dogecoin: An Experience in Predicting Hospital Readmissions

The Problem

Hospitals in the UK must keep track of which patients, once released, return to the hospital within 30 days. A high rate of patient readmission may suggest that a hospital is providing insufficient care. Figuring out which patients are likely to return helps hospitals improve their preventative care and patient assessment strategies. Given data from thousands of patients, including demographic information, test results over time, and the ultimate outcome of their hospital stay, we set out to learn a model to predict when individual patients are likely to be readmitted.

The Data

We had access to labeled data from 14,878 patients, with unlabeled data from an additional 6,359 patients. For each patient, we knew their age, gender, how they were admitted, to where they were released, how many prior admissions they had, and, most significantly, the results and times of every lab test given to each patient. To get a sense of what this data signified, we researched each test to find what physiological system it related to and what its normal ranges were. We tried to derive features and trends in the data based on our analysis of the test results and the demographic data.
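As a rough illustration of deriving range-based and trend features from one patient's results for a single lab test, a minimal sketch follows. The function names, the sample series, and the normal range are all hypothetical; the poster does not say which derived features the team actually computed from the normal ranges.

```python
from statistics import mean


def out_of_range_fraction(values, low, high):
    """Fraction of results that fall outside the [low, high] normal range."""
    if not values:
        return 0.0
    outside = sum(1 for v in values if v < low or v > high)
    return outside / len(values)


def trend_slope(times, values):
    """Least-squares slope of values against times (0.0 if undefined)."""
    if len(values) < 2:
        return 0.0
    t_bar, v_bar = mean(times), mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        return 0.0
    numer = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return numer / denom


# Hypothetical series: hours since admission and the results of one test.
times = [0.0, 12.0, 24.0, 36.0]
values = [4.0, 5.0, 6.5, 7.0]

# Hypothetical normal range, for illustration only (not a clinical reference).
frac = out_of_range_fraction(values, low=3.5, high=6.0)
slope = trend_slope(times, values)
```

Here `frac` summarizes how often the patient was outside the normal range, and `slope` summarizes whether the result was rising or falling over the stay.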
Methodology - Toolset

Logistic Regression
- Starting point for evaluating feature selection
- We obtained a maximum accuracy of 56%

Random Forest
- Ensemble methods draw more complex decision boundaries than logistic regression and ease some feature-finding difficulties
- Two underlying weak learners: classification and regression
- [Figures: error with classification; error with regression]

Boosting
- Alternative ensemble method built on multiple weighted weak learners; the weights represent the importance of reclassification
- Three methods used: AdaBoost, GentleBoost, LogitBoost
- [Figures: error with AdaBoost; error with GentleBoost; error with LogitBoost]

K-means Clustering
- Unsupervised learning algorithm that partitions the data into k clusters by minimizing the distance from each point to the mean (center) of its cluster
- Potential distance measures: Euclidean (Cartesian), city block, and correlation
- Used to find outliers in the data for initial reweighting

Hidden Markov Models
- Method for learning time-series data by using a hidden layer
- Used in an ensemble fashion, because no single series had enough data to estimate the transition matrix accurately
- Used to learn an underlying model for lab-test data
- Converted into features for later processing by another model

Results
- Team name: dogecoin
- Private leaderboard position: 20th
- Final score: 0.59405

Our Generation Strategy
1. We clean up the features by turning admission and leave times into a length of stay.
2. We add 3 features for each test: the number of times that test was administered, the difference between the first and last test results, and the value of the last test.
3. We use an HMM (illustrated on the original poster), which has been trained on the training data and has saved its A and B matrices (the transition and emission matrices in standard HMM notation), to generate a hidden state sequence. For every test we add a feature indicating which hidden state the patient is in after the final test was administered.
4. We use cross-validation on the training data to determine the number of trees to use for the random forest learner.
5. We train the random forest on the training data with the number of trees found above.
6. We use the random forest to classify the test data.

Acknowledgements
- Tools: MathWorks MATLAB, Python
- Data: Dr Tony Wolf, MD
- Testing infrastructure: Kaggle Inc.
- Lab test information: Royal College of Physicians and Surgeons of Canada, US National Library of Medicine, Mayo Clinic

Strategy Evaluation

Hidden Markov Models
- Adding features derived from our HMMs did not improve our model
- Next steps involve synthesizing the hidden-layer information into an overall organ state
- Also combine model data for patients on sequential visits

K-Means for Outlier Detection
- Clusters had very low predictive accuracy, hurting our ability to detect outliers
- Next step: evaluate cluster centers to obtain more knowledge about the underlying data
- Also find distance measures other than Euclidean (Cartesian) and evaluate their effectiveness

Shrinking Features
- Our final model had over 100 features, some of which definitely cluttered the models

Max Payton, Katherine Ford
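Steps 1 to 3 of the generation strategy above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's code: the function names, the zero default for a test that was never administered, the discretization of results into symbols, and every number in the example are hypothetical. Step 3 is read here as Viterbi decoding, which is one plausible interpretation; the poster does not say which decoding method was used.

```python
from datetime import datetime


def length_of_stay_days(admitted, released):
    """Step 1: collapse admission and release timestamps into a stay length."""
    return (released - admitted).total_seconds() / 86400.0


def lab_test_features(results):
    """Step 2: count, last-minus-first difference, and last value for one test.

    `results` holds the numeric results of one test in chronological order.
    """
    if not results:
        # Assumption: a test never administered yields zeros.
        return {"count": 0, "diff": 0.0, "last": 0.0}
    return {
        "count": len(results),
        "diff": results[-1] - results[0],
        "last": results[-1],
    }


def final_hidden_state(obs, start, trans, emit):
    """Step 3 (one reading): last state of the Viterbi path for `obs`.

    obs   : non-empty list of observed symbol indices
    start : start[s]    = probability of starting in state s
    trans : trans[p][s] = probability of moving from state p to s (matrix A)
    emit  : emit[s][o]  = probability of state s emitting symbol o (matrix B)
    """
    n_states = len(start)
    # score[s] is the probability of the best path that ends in state s.
    score = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        score = [
            max(score[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return max(range(n_states), key=lambda s: score[s])


# Hypothetical patient record, for illustration only.
stay = length_of_stay_days(datetime(2014, 1, 1, 8, 0), datetime(2014, 1, 3, 20, 0))
feats = lab_test_features([4.0, 5.0, 6.5])

# Hypothetical two-state HMM over three symbols (0 = low, 1 = normal, 2 = high).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
state = final_hidden_state([0, 1, 2], start, trans, emit)
```

Only the final state is needed as a feature, so the sketch skips the backpointers a full Viterbi path would require. For long test series, the products should be replaced with sums of log-probabilities to avoid numerical underflow.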