Modeling Medical Records of Diabetes using Markov Decision Processes
H. Asoh¹, M. Shiro¹, S. Akaho¹, T. Kamishima¹, K. Hasida¹, E. Aramaki², T. Kohro³
¹National Institute of Advanced Industrial Science and Technology
²Design School, Kyoto University
³The University of Tokyo Hospital
Proceedings of the ICML 2013 Workshop on Role of Machine Learning in Transforming Healthcare
Introduction
- Statement of the problem: Analyzing long-term medical records of patients suffering from chronic diseases is beginning to be recognized as an important issue in medical data analysis.
- Objective of the study: To obtain the optimal policy for the treatment of diabetes and compare it with the average policy of the doctors.
- Method: They modeled the diabetes treatment records as a Markov decision process (MDP).
Data
- Raw data: Medical records of heart disease patients treated at the University of Tokyo Hospital, covering over 10,000 patients since 1987. The data include patient attributes, examination results, drug prescriptions, and surgical operations.
- Data used: They restricted the data to patients who periodically attended the hospital and underwent examinations and treatment, used only records after January 1, 2000, and focused on the data related to diabetes, in particular the value of hemoglobin A1c (HbA1c).
MDP Model
- An MDP is a tuple $(S, A, T, R)$: $S$ is the set of states, $A$ is the set of actions, $T: S \times A \times S \to [0,1]$ gives the state transition probabilities, and $R: S \times A \times S \times \mathbb{R} \to [0,1]$ gives the probability of receiving an immediate reward.
- A policy is a map $\pi: S \times A \to [0,1]$.
- The expected cumulative reward under policy $\pi$ is $V^{\pi}(s) = E_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0} = s\right]$, where $\gamma \in [0,1]$ is the discount factor.
- The value of an action $a$ at state $s$ under the policy $\pi$ is defined as $Q^{\pi}(s,a) = \sum_{s'} T(s,a,s')\left[\bar{r}(s,a,s') + \gamma V^{\pi}(s')\right]$, where $\bar{r}(s,a,s')$ denotes the expected immediate reward.
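The paper's toolbox is in R; as an illustration only, here is a minimal Python sketch of how such a tabular MDP could be represented. All numeric values below are placeholders, not figures from the paper.

```python
import numpy as np

# Dimensions from the paper: three HbA1c levels and 38 prescription patterns.
n_states, n_actions = 3, 38

# T[s, a, s2]: probability of moving to state s2 after action a in state s.
# Initialized uniformly here; in the paper T is estimated from the records.
T = np.full((n_states, n_actions, n_states), 1.0 / n_states)

# r[s, a, s2]: expected immediate reward. The paper sets state-dependent
# rewards following a doctor's opinion; these values are placeholders.
r = np.zeros((n_states, n_actions, n_states))
r[:, :, 0] = 1.0  # hypothetical reward for arriving at the "normal" state

# pi[s, a]: probability that the policy chooses action a in state s.
pi = np.full((n_states, n_actions), 1.0 / n_actions)

gamma = 0.95  # discount factor; the paper's actual value is not given here
```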
MDP Model
- The optimal policy $\pi^{*}$ satisfies $V^{\pi^{*}}(s) \geq V^{\pi}(s)$ for every state $s \in S$ and every policy $\pi$.
- The state values of the optimal policy satisfy the Bellman optimality equation: $V^{\pi^{*}}(s) = \max_{a} \sum_{s'} T(s,a,s')\left[\bar{r}(s,a,s') + \gamma V^{\pi^{*}}(s')\right]$.
- Given an MDP and a policy, they evaluate the state values $V^{\pi}(s)$ and the action values $Q^{\pi}(s,a)$.
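Given the arrays above, $V^{\pi}$ and $Q^{\pi}$ can be computed exactly by solving the linear Bellman equations. This sketch is a standard textbook method, not the authors' R implementation:

```python
import numpy as np

def evaluate_policy(T, r, pi, gamma):
    """Exact policy evaluation for a tabular MDP.

    T: (S, A, S) transition probabilities; r: (S, A, S) expected rewards;
    pi: (S, A) policy; gamma: discount factor. Returns V (S,) and Q (S, A).
    """
    n_states = T.shape[0]
    # Transition matrix and expected reward induced by the policy
    P_pi = np.einsum('sa,sat->st', pi, T)
    r_pi = np.einsum('sa,sat,sat->s', pi, T, r)
    # Solve the linear system (I - gamma * P_pi) V = r_pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = np.einsum('sat,sat->sa', T, r) + gamma * np.einsum('sat,t->sa', T, V)
    return V, Q
```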
State & Action
- State: the value of hemoglobin A1c (HbA1c), discretized into three levels (normal, medium, severe); a discretization sketch is given below.
- Action: pharmaceutical treatment. They grouped the drugs according to their functions and identified the patterns of drug-group combinations prescribed at a time; 38 such combination patterns appeared in the data.
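For illustration, a minimal sketch of how the HbA1c discretization might look in code. The cutoff values 6.5 and 8.0 are hypothetical placeholders; the paper's exact thresholds are not stated here.

```python
def hba1c_state(hba1c: float) -> str:
    """Map an HbA1c value (%) to one of the three states.

    The cutoffs below are assumed for this example only.
    """
    if hba1c < 6.5:
        return "normal"
    elif hba1c < 8.0:
        return "medium"
    else:
        return "severe"
```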
Experiments on the data
- To model and analyze the medical records with an MDP, they developed an MDP toolbox in R that can easily handle multiple episodes, estimate the parameters of an MDP, evaluate state and action values, and compute the optimal policy.
- From the records they estimated the MDP state transition probabilities $T$ and the doctors' policy $\pi$.
- For the reward, they set state-dependent values according to the opinion of a doctor.
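A minimal sketch (in Python, not the authors' R toolbox) of estimating $T$ and the doctors' policy $\pi$ by maximum likelihood, i.e., by counting transitions over episodes. The episode format is an assumption for illustration.

```python
import numpy as np

def estimate_mdp(episodes, n_states, n_actions, smoothing=1e-3):
    """Estimate T[s, a, s2] and pi[s, a] from episodes of (s, a, s2) triples.

    `episodes` is a list of lists of (state, action, next_state) index
    triples; `smoothing` is a small additive count that keeps the
    probabilities well defined for state-action pairs never observed.
    """
    counts = np.full((n_states, n_actions, n_states), smoothing)
    for episode in episodes:
        for s, a, s2 in episode:
            counts[s, a, s2] += 1.0
    # Transition model: normalize counts over next states
    T = counts / counts.sum(axis=2, keepdims=True)
    # Doctors' policy: normalize state-action counts over actions
    sa_counts = counts.sum(axis=2)
    pi = sa_counts / sa_counts.sum(axis=1, keepdims=True)
    return T, pi
```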
State & action values under the estimated MDP
- Based on the estimated transition probabilities $T$ and policy $\pi$, they evaluated the patients' state values $V^{\pi}(s)$ and the doctors' action values $Q^{\pi}(s,a)$ (see the appendix for all state-action combinations).
State & action values under the "optimal policy"
- They obtained the optimal policy $\pi^{*}$ by value iteration on the MDP; a sketch follows below.
- For each state, the optimal action is the same as the top-ranked doctors' action in Table 4.
- The state values under the optimal policy are larger than those under the doctors' policy.
- They noted that this does not mean the optimal policy would perform better for actual patients.
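A standard value-iteration sketch consistent with the slide, assuming the tabular arrays defined earlier; the stopping tolerance is an arbitrary choice.

```python
import numpy as np

def value_iteration(T, r, gamma, tol=1e-8):
    """Value iteration for a tabular MDP; returns V* and a greedy policy."""
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = sum_{s2} T[s, a, s2] * (r[s, a, s2] + gamma * V[s2])
        Q = np.einsum('sat,sat->sa', T, r) + gamma * np.einsum('sat,t->sa', T, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # greedy (deterministic) policy
        V = V_new
```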
Evaluation of the goodness of the model
- One-step prediction of a patient's next state: They divided the data into training data (90%) and test data (10%), estimated the MDP probabilities from the training data, and then, for each state transition in the test episodes, evaluated the log-likelihood of the transition and averaged the values ($N_e$: number of action steps in episode $e$). The state prediction achieved an average log-likelihood of -1.09.
- Prediction of doctors' actions: They evaluated the average log-likelihood of the actions in the test episodes, with 38 candidate actions. The action prediction achieved an average log-likelihood of -3.63.
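A sketch of this evaluation, assuming a simple per-step average; the paper's exact averaging over episodes (via $N_e$) may differ.

```python
import numpy as np

def average_log_likelihood(test_episodes, T, pi):
    """Average per-step log-likelihood of transitions and of actions.

    `test_episodes` is a list of lists of (s, a, s2) index triples; T and
    pi are the transition model and policy estimated on the training data.
    """
    trans_ll, action_ll, n_steps = 0.0, 0.0, 0
    for episode in test_episodes:
        for s, a, s2 in episode:
            trans_ll += np.log(T[s, a, s2])   # next-state prediction
            action_ll += np.log(pi[s, a])     # doctors' action prediction
            n_steps += 1
    return trans_ll / n_steps, action_ll / n_steps
```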
Conclusion
- In this paper, they used a Markov decision process to model the long-term process of disease treatment.
- They estimated the parameters of the model from data extracted from patients' medical records.
- Using the model, they predicted the progression of the patients' states and evaluated the values of treatments.
APPENDIX
Doctors' action values
Figure: Action values for the "normal" (left) and "medium" (right) states.