Healthcare and Medicine: New frontiers for analytics and data mining


1 Healthcare and Medicine: New frontiers for analytics and data mining
Akshay Bhat With Prof. Ramin Zabih and Dr. George Shih

2 Definitions
Biology: Genes, Proteins and Organisms
- How do proteins work?
- What differentiates different strains of viruses?
Medicine: Drugs, Devices and Surgeries
- How can we correctly diagnose a condition?
- How can we treat a condition?
Healthcare: Surveillance, Clinical trials and Payment policies
- Is Nexium better than Prilosec?
- How should we pay hospitals when they commit errors?

3 This talk is about Medicine and Healthcare!
Not Bioinformatics.
- Understanding the challenges faced by the healthcare system:
  - Who pays whom for what?
  - What is appropriate care?
- How can we use data mining and analytics to identify issues with the healthcare system?
  - Detecting outright fraud (already a huge market)
  - Areas for improving efficiency
  - Decreasing overutilization

4 Ongoing changes in Healthcare & Medicine
Research on EMR/EHR vs. data captured by EMR/EHR
- The first is a usability problem.
- The second is an analytics/data mining problem.
"Escaping the EHR Trap: The Future of Health IT"
- Shifting focus to using data captured by EMR/EHR
- Perspective in the New England Journal of Medicine
"A Glimpse of the Next 100 Years in Medicine"
- Editorial envisioning a data-driven future
- New England Journal of Medicine (December 2012)

5 Hierarchy of Medical/Healthcare data
Size | Availability | Dataset
~ few million | Easily available | Census, Social Security, cause-of-death records
~ 10 – 100 M | Easily available for research | Structured (codes): Medicare, state, insurance records, prescription data
~ 100 – 1000 M | Requires affiliation | Unstructured data (text): discharge notes, radiology/lab reports
Few billion to trillion | Requires affiliation & special arrangement | Signals: lab values, physiologic time series, CT/MRI/ultrasound scans
Infinite? | Very hard, collect yourself | Smartphone, fitness, heart rate sensor data

6 This talk is about
Size | Availability | Dataset
~ 10 – 300 M | Easily available for research | Structured (codes): Medicare, state, insurance records, prescription data
~ 100 – 1000 M | Requires affiliation | Unstructured data (text): discharge notes, radiology/lab reports
Huge opportunity in the next 10 years as EMR/EHR are adopted and standardized.

7 Three projects
Project | Dataset
Determining optimal protocol for CT imaging (multi-label semi-supervised classification problem) | Imaging orders (text) from Weill Cornell Radiology
Building predictive models for assessing risk of readmission (classification problem) | De-identified publicly available hospitalization records from California and Florida
Building analytical tools for understanding healthcare systems (data mining / data analysis / user interface development) | De-identified publicly available records

8 Determining optimal protocol for CT Imaging
Using short-text descriptions of diagnoses and symptoms, recommend an appropriate CT imaging protocol.
- A multi-label classification problem.
- Insight: the domain of all possible symptoms and diagnoses is very large.

9 Determining optimal protocol for CT Imaging
- Extract UMLS (Unified Medical Language System) concepts from the short texts.
- Build a graph to represent the relationships present in UMLS between different concepts, such as the anatomical hierarchy and the relations between diseases and the locations of diseases.
- Semi-supervised label propagation for learning: assign labels to concepts in the graph using training data, then propagate the labels over the graph.
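The label-propagation step might be sketched as follows. This is only an illustration under stated assumptions, not the implementation behind the talk: the concept names, edges, protocol labels, the mixing weight `alpha`, and the iteration count are all hypothetical, and the real graph would be built from UMLS relations rather than typed in by hand.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20, alpha=0.8):
    """Spread label scores from seed nodes over an undirected graph.

    edges: iterable of (node_a, node_b) pairs.
    seeds: dict node -> {label: score}, taken from training data.
    alpha: weight of the neighbour average versus a node's own seed scores.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    nodes = set(neighbours) | set(seeds)
    scores = {n: dict(seeds.get(n, {})) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Average the current label scores of this node's neighbours.
            avg = defaultdict(float)
            for m in neighbours[n]:
                for label, s in scores[m].items():
                    avg[label] += s / len(neighbours[n])
            # Mix the neighbour average with the node's own seed scores.
            mixed = {label: alpha * s for label, s in avg.items()}
            for label, s in seeds.get(n, {}).items():
                mixed[label] = mixed.get(label, 0.0) + (1 - alpha) * s
            updated[n] = mixed
        scores = updated
    return scores


# Hypothetical toy graph: two diagnoses linked through anatomy concepts.
edges = [
    ("appendicitis", "appendix"),
    ("appendix", "abdomen"),
    ("kidney stone", "kidney"),
    ("kidney", "abdomen"),
]
seeds = {
    "appendicitis": {"abdomen/pelvis with contrast": 1.0},
    "kidney stone": {"abdomen/pelvis without contrast": 1.0},
}
scores = propagate_labels(edges, seeds)
best_for_appendix = max(scores["appendix"], key=scores["appendix"].get)
```

Unlabeled concepts such as "appendix" end up scored most highly for the protocol of the nearest labeled concept, which is how a new order mentioning only an unseen concept could still receive a ranked list of protocols.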

10 Determining optimal protocol for CT Imaging
- Performs better than the baseline on the Mean Reciprocal Rank metric.
- Drawbacks: protocols are not consistent across institutions or vendors.
- Represent protocols as structures and use a structure-learning SVM.
- Presented at a KDD 2011 workshop and as an abstract at the RSNA conference.
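For reference, Mean Reciprocal Rank averages 1/rank of the first correct item in each ranked list. A minimal sketch is below; the protocol names are placeholders, and treating an order with no correct protocol in its list as contributing 0 is an assumption, not something stated in the talk.

```python
def mean_reciprocal_rank(ranked_lists, correct_sets):
    """Average over queries of 1/rank of the first correct item.

    ranked_lists: one ranked list of predicted protocols per imaging order.
    correct_sets: one set of acceptable protocols per imaging order.
    A query whose ranked list contains no correct item contributes 0.
    """
    total = 0.0
    for ranked, correct in zip(ranked_lists, correct_sets):
        for position, item in enumerate(ranked, start=1):
            if item in correct:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


# Hypothetical rankings for three imaging orders.
rankings = [
    ["protocol A", "protocol B", "protocol C"],  # correct at rank 1
    ["protocol B", "protocol A", "protocol C"],  # correct at rank 2
    ["protocol C", "protocol B", "protocol A"],  # correct at rank 3
]
truth = [{"protocol A"}, {"protocol A"}, {"protocol A"}]
mrr = mean_reciprocal_rank(rankings, truth)  # (1 + 1/2 + 1/3) / 3
```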

11 Predicting risk of readmission upon discharge
- Unplanned readmissions are frequent.
- A 30-day readmission is generally defined as an admission within 30 days of discharge from a hospital.
- Decreasing readmissions is a big focus of current efforts, since hospitals get paid twice.
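The 30-day definition can be turned into a label for each hospitalization roughly as follows. This is a sketch under assumptions: the record layout (patient id, admit date, discharge date) is hypothetical, and it does not distinguish planned from unplanned readmissions, which a real analysis would need to do.

```python
from datetime import date


def flag_30_day_readmissions(stays, window_days=30):
    """Return one boolean per stay: True if the same patient is admitted
    again within window_days of that stay's discharge date.

    stays: list of (patient_id, admit_date, discharge_date) tuples.
    """
    by_patient = {}
    for index, (patient, admit, discharge) in enumerate(stays):
        by_patient.setdefault(patient, []).append((admit, discharge, index))

    flags = [False] * len(stays)
    for visits in by_patient.values():
        visits.sort()  # chronological order by admit date
        # Compare each stay with the same patient's next stay.
        for (_, discharge, index), (next_admit, _, _) in zip(visits, visits[1:]):
            gap = (next_admit - discharge).days
            flags[index] = 0 <= gap <= window_days
    return flags


# Hypothetical records: p1 returns 15 days after the first discharge.
stays = [
    ("p1", date(2012, 1, 1), date(2012, 1, 5)),
    ("p1", date(2012, 1, 20), date(2012, 1, 25)),
    ("p1", date(2012, 6, 1), date(2012, 6, 3)),
    ("p2", date(2012, 3, 1), date(2012, 3, 4)),
]
flags = flag_30_day_readmissions(stays)
```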

12 Predicting risk of readmission upon discharge
Can we predict the risk of readmission upon discharge?
- Useful for understanding which patients we should follow up closely.
- Important for risk adjustment, to avoid unnecessarily penalizing hospitals that treat complex patients.

13 Predicting risk of readmission upon discharge
Experiment
- 26 M hospitalizations from California and Florida
- Linear SVM models for predicting readmission
- Hospitalizations from California used for training/validation
- Hospitalizations from Florida used as the test set
Results
- AUC score of 0.70 on the test set
- Variation in predictive ability across diagnoses, e.g. readmission following appendectomy was difficult to predict, while readmission following diabetes was predictable.
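The AUC reported above is the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted one. A minimal sketch of that computation is below; the labels and scores are made up, and the quadratic pairwise loop is for clarity only, since a dataset of this size would need a rank-based implementation.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for readmitted, 0 for not readmitted.
    scores: model outputs, higher meaning higher predicted risk.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set labels and decision scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.3, 0.8]
test_auc = auc(labels, scores)  # 4 of 6 positive/negative pairs ranked correctly
```

Applying the same function separately to the hospitalizations for each diagnosis is one way to expose the variation in predictive ability noted in the results.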

14 Questions?

