1
Risk Prediction on Electronic Health Records with Prior Medical Knowledge
Fenglong Ma¹, Jing Gao¹, Qiuling Suo¹, Quanzeng You², Jing Zhou³, Aidong Zhang¹
¹ SUNY at Buffalo, ² Microsoft AI & Research, ³ eHealth Inc.
KDD 2018
2
Electronic Health Records
Background: electronic health records and personalized medicine. “An electronic health record (EHR), or electronic medical record (EMR), is the systematized collection of patient and population electronically-stored health information in a digital format.” (Wikipedia)
3
Electronic Health Records (EHR)
Background: a comprehensive EHR dataset contains everything that happened to a patient at the hospital, including structured codes, spectrograms, lab measures, images, and free text.
4
Challenges of Mining EHR Data
EHR data with structured codes are temporal, high-dimensional, and noisy. (Figure: an example of a patient’s visit information.)
5
Disease Risk Prediction
Task: disease risk prediction uses an individual's historical EHR data to predict whether the patient will develop a certain disease in the future. (Figure: an example of heart failure risk prediction.)
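To make the task concrete, here is a minimal sketch of one common way to set up the input and label, assuming each visit is a set of diagnosis codes encoded as a multi-hot vector and the label is 1 if a target code appears in a later visit. The vocabulary, the choice of "428.0" as the heart failure target, and the helper names are hypothetical, not taken from the deck.

```python
from typing import List, Tuple

# Hypothetical code vocabulary; "428.0" is assumed here to be the target (heart failure) code.
VOCAB = ["401.9", "278.0", "305.02", "428.0"]
CODE_INDEX = {code: i for i, code in enumerate(VOCAB)}


def encode_visits(visits: List[List[str]]) -> List[List[int]]:
    """Encode each visit (a list of diagnosis codes) as a multi-hot vector over VOCAB."""
    matrix = []
    for visit in visits:
        row = [0] * len(VOCAB)
        for code in visit:
            if code in CODE_INDEX:  # codes outside the vocabulary are ignored
                row[CODE_INDEX[code]] = 1
        matrix.append(row)
    return matrix


def make_example(history: List[List[str]], future: List[List[str]],
                 target_code: str) -> Tuple[List[List[int]], int]:
    """Input: encoded historical visits. Label: 1 if target_code appears in any future visit."""
    label = int(any(target_code in visit for visit in future))
    return encode_visits(history), label


X, y = make_example([["401.9"], ["401.9", "278.0"]], [["428.0"]], "428.0")
print(X, y)  # [[1, 0, 0, 0], [1, 1, 0, 0]] 1
```

A deep risk model such as a CNN or RNN would then consume the visit matrix X and output a probability for the label y.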
6
Existing work ignores the importance of prior medical knowledge!
Existing work: deep-learning-based risk prediction.
Convolutional Neural Networks (CNN): Cheng et al. Risk Prediction with Electronic Health Records: A Deep Learning Approach. In SDM’16.
Recurrent Neural Networks (RNN): Choi et al. RETAIN: An Interpretable Predictive Model for Healthcare Using Reverse Time Attention Mechanism. In NIPS’16.
Drawback: these models ignore the importance of prior medical knowledge.
7
Doctor Diagnosis Process
Motivation: the doctor diagnosis process. To decide, for example, whether a patient has heart failure, a doctor draws on medical knowledge.
8
Challenge of Using Medical Knowledge
Almost all medical knowledge is represented by arbitrary rules, for example:
Tobacco use (categorical). Using tobacco can increase your risk of heart failure. Rule: tobacco use → heart failure.
High blood pressure (continuous). Your heart works harder than it has to if your blood pressure is high. Rule: high blood pressure → heart failure.
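A minimal sketch of how the two example rules might be turned into numeric features, one categorical and one continuous. Everything here is an illustrative assumption: the tobacco code set, the 140 mmHg threshold, and the function names are hypothetical; the deck states the rules but not their encoding.

```python
from typing import Set

# Hypothetical code set for tobacco use; a real system would use a curated code list.
TOBACCO_CODES = {"305.1"}


def tobacco_rule(codes: Set[str]) -> float:
    """Categorical rule (tobacco use -> heart failure): 1.0 if any tobacco-use code is present."""
    return 1.0 if codes & TOBACCO_CODES else 0.0


def blood_pressure_rule(systolic_mmhg: float, threshold: float = 140.0) -> float:
    """Continuous rule (high blood pressure -> heart failure): systolic excess over an assumed threshold."""
    return max(0.0, systolic_mmhg - threshold)


def constraint_feature(rule_value: float, y: int) -> float:
    """phi(X, y): the rule only contributes when the candidate label y is positive (1)."""
    return rule_value * y


print(tobacco_rule({"305.1", "401.9"}), blood_pressure_rule(152.0))  # 1.0 12.0
```

Multiplying by the label y is one simple way to make a rule "point at" the positive class; other couplings are possible.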
9
Risk Prediction with Prior Medical Knowledge
Posterior regularization: an effective technique for converting discrete knowledge into continuous, real-valued features by constraining the posterior distribution to a set defined by constraint features (Ganchev et al., Posterior Regularization for Structured Latent Variable Models. JMLR, 2010).
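For reference, the generic posterior-regularization objective from Ganchev et al. (2010) can be written as below. The symbols follow that paper's general formulation rather than this deck: φ are the constraint features, b their bounds, 𝒬 the feasible set of posteriors, and L(θ) the model's log-likelihood.

```latex
\begin{aligned}
\mathcal{Q} &= \left\{ q(\mathbf{Y}) : \mathbb{E}_{q}\left[\boldsymbol{\phi}(\mathbf{X}, \mathbf{Y})\right] \le \mathbf{b} \right\} \\
J_{\mathcal{Q}}(\theta) &= \mathcal{L}(\theta) - \min_{q \in \mathcal{Q}} \mathrm{KL}\left( q(\mathbf{Y}) \,\middle\|\, p_{\theta}(\mathbf{Y} \mid \mathbf{X}) \right)
\end{aligned}
```

Maximizing J fits the data while keeping the model's posterior close to some distribution that satisfies the constraints.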
10
Risk Prediction with Prior Medical Knowledge
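Posterior regularization: the model is fit to the ground truth while its posterior is pushed toward our final goal, a distribution consistent with the rules. Each rule is expressed as a function (a constraint feature) whose expected value is bounded by a given value. Drawback: it is hard to manually set reasonable bounds for the constraint features.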
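The sketch below illustrates the standard posterior-regularization projection step for a single example and a single constraint, using its dual form q(y) ∝ p(y)·exp(−λ·φ(y)) with λ ≥ 0 found by bisection. The function name, the binary-label setup, and the numbers are hypothetical; the point is that the result depends directly on the hand-chosen bound b, which is the drawback noted above.

```python
import math
from typing import List, Sequence


def pr_project(p: Sequence[float], phi: Sequence[float], b: float,
               tol: float = 1e-12, max_iter: int = 200) -> List[float]:
    """KL-project p onto {q : E_q[phi] <= b} for one example and one constraint.

    Uses the dual form q(y) ∝ p(y) * exp(-lam * phi(y)) with lam >= 0,
    finding lam by bisection so that the constraint holds.
    """
    def q_of(lam: float) -> List[float]:
        # Work in log space and shift by the max for numerical stability.
        logs = [math.log(pi) - lam * fi if pi > 0 else -math.inf for pi, fi in zip(p, phi)]
        m = max(logs)
        w = [math.exp(s - m) for s in logs]
        z = sum(w)
        return [wi / z for wi in w]

    def expect(q: Sequence[float]) -> float:
        return sum(qi * fi for qi, fi in zip(q, phi))

    if expect(p) <= b:
        return list(p)  # constraint already satisfied, so lam = 0
    if min(fi for pi, fi in zip(p, phi) if pi > 0) >= b:
        raise ValueError("bound b is not attainable for this distribution")

    lo, hi = 0.0, 1.0
    while expect(q_of(hi)) > b:  # grow hi until the constraint is met
        hi *= 2.0
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        if expect(q_of(mid)) > b:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return q_of(hi)


# Model says P(no heart failure) = 0.7; a hypothetical rule caps it at b = 0.4.
q = pr_project([0.7, 0.3], [1.0, 0.0], 0.4)
print([round(v, 4) for v in q])  # [0.4, 0.6]
```

Changing b to 0.5 or 0.3 gives a different target distribution, and nothing in the data tells us which bound is right.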
11
Risk Prediction with Prior Medical Knowledge
Solution: represent the desired distribution as a log-linear model. The proposed model, PRIME, relates this desired distribution to the predictions of any existing model.
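The deck does not spell out PRIME's exact objective on this slide, so the following is only a sketch under stated assumptions. It uses the standard log-linear form q_w(y | X) ∝ exp(w · φ(X, y)), with learnable weights w taking the place of hand-set bounds, and computes KL(q || p) against an existing model's p(y | X). The KL direction, how it enters the training loss, and all names and values are assumptions.

```python
import math
from typing import List, Sequence


def log_linear(w: Sequence[float], feats: Sequence[Sequence[float]]) -> List[float]:
    """q_w(y | X) ∝ exp(w · phi(X, y)), where feats[y] is the feature vector phi(X, y)."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in feats]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def kl(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) for discrete distributions; terms with q(y) = 0 contribute nothing."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Binary label y in {0, 1}; features fire only for y = 1 (tobacco indicator, excess systolic BP).
feats = [[0.0, 0.0], [1.0, 10.0]]
w = [0.5, 0.05]  # hypothetical weights; in a trained model these would be learned
q = log_linear(w, feats)
p = [0.6, 0.4]   # hypothetical output of an existing risk model
print(round(q[1], 4), round(kl(q, p), 4))
```

Because w is learned rather than fixed, a larger weight on a feature signals that the corresponding rule matters more, which matches the weight analysis later in the deck.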
12
Risk Prediction with Prior Medical Knowledge
Constraint feature design covers patient characteristics (ethnicity, age), underlying diseases, disease duration, genetics, and family history. Notation: ℇ denotes the set of races related to the prediction; 𝐮 is the frequency vector of underlying diseases; 𝐝 is the duration vector of underlying diseases; 𝒞 is the set of all diagnosis codes in 𝐗; 𝒢 denotes the set of genetic disorders; ℋ represents the set of family-history disorders.
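The exact feature formulas are not recoverable from this slide, so the sketch below shows one plausible set-based encoding using the notation above: an indicator for race ∈ ℇ, raw age, and the counts |𝒞 ∩ 𝒢| and |𝒞 ∩ ℋ|. The function name, the indicator/count choices, and the placeholder codes are all hypothetical.

```python
from typing import Dict, Set


def knowledge_features(codes: Set[str], race: str, age: float,
                       risk_races: Set[str], genetic: Set[str],
                       family: Set[str]) -> Dict[str, float]:
    """Hypothetical encodings of patient-characteristic, genetics, and family-history features.

    codes      -> 𝒞, all diagnosis codes in the patient's record X
    risk_races -> ℇ, races related to the prediction
    genetic    -> 𝒢, genetic-disorder codes
    family     -> ℋ, family-history codes
    """
    return {
        "ethnicity": 1.0 if race in risk_races else 0.0,  # indicator: race ∈ ℇ
        "age": age,                                      # raw age; scaling is left to the model
        "genetics": float(len(codes & genetic)),         # count: |𝒞 ∩ 𝒢|
        "family_history": float(len(codes & family)),    # count: |𝒞 ∩ ℋ|
    }


# Placeholder codes only; real sets would come from curated medical code lists.
feats = knowledge_features({"C1", "G1", "H1", "H2"}, "R1", 67.0,
                           risk_races={"R1"}, genetic={"G1"}, family={"H1", "H2"})
```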
13
Example of Designing Constraint Features
Underlying diseases and durations (example)
Underlying diseases: 401.9, 278.0, 305.02
Frequency (𝐮) entries shown: 2, 1
Duration in months (𝐝): 23, 17, 9
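A minimal sketch of how 𝐮 and 𝐝 could be computed from dated visits. Two definitions here are assumptions, not taken from the deck: frequency counts the visits that contain a code, and duration is the number of months from the code's first appearance to the prediction time. The visit data are made up and are not the slide's example.

```python
from typing import List, Optional, Sequence, Tuple


def frequency_and_duration(visits: Sequence[Tuple[int, Sequence[str]]],
                           underlying: Sequence[str],
                           t_pred: int) -> Tuple[List[int], List[int]]:
    """Compute u (visit counts) and d (months since first diagnosis) for each underlying disease.

    visits: (month, codes) pairs; t_pred: the month at which the prediction is made.
    """
    u = [0] * len(underlying)
    first_seen: List[Optional[int]] = [None] * len(underlying)
    for month, codes in sorted(visits, key=lambda v: v[0]):
        for i, code in enumerate(underlying):
            if code in codes:
                u[i] += 1
                if first_seen[i] is None:
                    first_seen[i] = month
    # Diseases never observed get duration 0 in this sketch.
    d = [t_pred - m if m is not None else 0 for m in first_seen]
    return u, d


u, d = frequency_and_duration(
    [(0, ["401.9"]), (6, ["278.0"]), (12, ["401.9", "305.02"])],
    ["401.9", "278.0", "305.02"],
    t_pred=20,
)
print(u, d)  # [2, 1, 1] [20, 14, 8]
```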
14
Risk Prediction with Prior Medical Knowledge
An easy way to understand PRIME: a risk model learned by deep learning produces one prediction, while prior medical knowledge, encoded through feature engineering, produces another; PRIME uses both during training.
15
Experiments: Datasets and Constraint Feature Design
Datasets: heart failure, COPD, and kidney disease. Constraint features considered: patient characteristics (ethnicity, age), underlying diseases, disease duration, genetics, and family history. (Table: which constraint features are used for each dataset.)
16
Performance Evaluation
Measures: F1 score, accuracy, and AUROC (higher is better). (Table: results on the three datasets.)
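For readers who want the definitions behind the three measures, here is a plain-Python sketch for binary labels. AUROC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count half), which equals the area under the ROC curve. The 0.5 decision threshold and the toy data are assumptions; in practice a library such as scikit-learn (`accuracy_score`, `f1_score`, `roc_auc_score`) would normally be used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
    if tp == 0:
        return 0.0  # avoids 0/0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold of 0.5
print(accuracy(y_true, y_pred), f1(y_true, y_pred), auroc(y_true, scores))  # 0.5 0.5 0.75
```

Note that AUROC uses the raw scores, while F1 and accuracy depend on the chosen threshold.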
17
Constraint Feature Analysis
An advantage of the proposed PRIME is that it automatically learns the weights of different risk factors and constraint-feature categories. (Figure: confidence of feature categories, shown as the confidence matrix learned by PRIME on the heart failure dataset.)
18
Constraint Feature Analysis
Weights of constraint features on the heart failure dataset (figure: case group vs. control group).
Underlying diseases by ID:
1. High blood pressure
2. Coronary artery disease
3. Diabetes
4. Congenital heart defects
5. Valvular heart disease
6. Alcohol use
7. Smoking
8. Obesity
19
Our new work is coming soon…
Discussion: the proposed PRIME framework is effective only for common diseases. Our new work, “Fake is the New Real: Predicting Rare Diseases with Deep Generative Networks and Reinforcement Learning”, is coming soon. 350 million people globally are fighting rare diseases; only 5% of rare diseases have FDA-approved therapies; and rare diseases affect more people than HIV and cancer combined.
20
Conclusions: This work is the first attempt to take prior medical knowledge into account for the risk prediction task. We propose a novel framework, PRIME, which models prior medical knowledge as posterior regularization and learns the desired posterior distribution with a log-linear model. PRIME is a general framework that can easily be applied to any predictive model in healthcare. PRIME can also distinguish how much each piece of prior knowledge contributes to risk prediction.
21
Thank You! Questions? Source code, slides and poster are publicly available at
22
Backup: what if the constraint features are used directly to predict patients' labels? 86.3%