A Temporal Visualization of Chronic Obstructive Pulmonary Disease Progression Using Deep Learning and Free-Text Clinical Reports
Meihan Wan, Chunlei Tang, Joseph M. Plasek, Haohan Zhang, Min Jeoung Kang, Haokai Sheng, Yun Xiong, David W. Bates, Li Zhou
Learning Objectives
After participating in this oral presentation, the learner should be better able to:
• Handle the time dimension in EHRs;
• Use irregular time-lapse segments to demonstrate disease progression.
DGIM Research Day 2019
Chronic Obstructive Pulmonary Disease
COPD is the third leading chronic disease in the United States and can take well over ten years to progress from mild to very severe. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) publishes the guidelines that physicians use to manage COPD.
Free-Text Clinical Documents and Time
• Irregular visits: The temporal granularity of a patient's record may vary significantly across time periods.
• Incomplete records: Clinical data may not be available for the entire course of COPD progression.
• Disease progression heterogeneity: There is no natural alignment between patients, because progression rates vary.
• Discrete observations: Although disease progression is a continuous-time process, a patient is observed only at discrete time points separated by varied intervals, and each observation covers a different span of time, for example:
  • 1 day for an office visit
  • a few days for a hospitalization
Yet constant-length time segments cannot capture the temporal autocorrelation that arises from the dynamics in the data.
A Four-Layer Deep Learning Model
We used a flatten layer to unfold the time segments, followed by a dense layer that combines them into a fully connected network.
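The flatten and dense steps can be illustrated with a minimal pure-Python sketch. It assumes the layer before the flatten layer emits a (time steps × hidden features) matrix, here filled with random numbers as a stand-in for LSTM outputs, and that the dense layer emits one sigmoid probability per time step. The sizes `T` and `H`, the random weights, and the helper names are hypothetical; this is not the authors' trained model.

```python
import math
import random

def flatten(x):
    """Flatten layer: unfold a (time_steps x features) matrix into one vector."""
    return [v for row in x for v in row]

def dense(vec, weights, bias):
    """Fully connected layer: every output unit sees every flattened input."""
    return [sum(w * v for w, v in zip(w_row, vec)) + b
            for w_row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(seq_outputs, weights, bias):
    """Hypothetical model tail: flatten -> dense -> sigmoid, one value per time step."""
    flat = flatten(seq_outputs)
    return [sigmoid(z) for z in dense(flat, weights, bias)]

# Hypothetical sizes: 5 time steps, 3 hidden features per step.
T, H = 5, 3
rng = random.Random(0)
# Random stand-in for the recurrent layer's per-step outputs.
seq_outputs = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(T)]
# Dense layer maps the T*H flattened inputs to T outputs.
weights = [[rng.uniform(-0.5, 0.5) for _ in range(T * H)] for _ in range(T)]
bias = [0.0] * T

probs = forward(seq_outputs, weights, bias)
print([round(p, 3) for p in probs])
```

Flattening lets the dense layer weigh every time step against every other, which is what "combining the time segments into a fully connected network" amounts to in this sketch.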
Capturing Time-Lapse Segments
We used a sigmoid activation function to output a binary {0,1} sequence, in which any run of two or more consecutive zeros or ones is treated as a time segment.
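The segmentation rule can be sketched as a run-length pass over the sequence. This assumes the sigmoid outputs are turned into {0,1} labels with a 0.5 threshold and that isolated single values (runs of length 1) are simply not reported as segments; both the threshold and the handling of singletons are assumptions, and `binarize` and `time_segments` are hypothetical helper names.

```python
from itertools import groupby

def binarize(probs, threshold=0.5):
    """Map sigmoid outputs to {0,1} labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]

def time_segments(bits, min_len=2):
    """Return (label, start, end) for each run of identical labels of
    length >= min_len; start and end are inclusive positions.
    Shorter runs are skipped (an assumption about singletons)."""
    segments = []
    pos = 0
    for label, run in groupby(bits):
        n = len(list(run))
        if n >= min_len:
            segments.append((label, pos, pos + n - 1))
        pos += n
    return segments

probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.6, 0.95]
bits = binarize(probs)
print(bits)                 # [0, 0, 1, 1, 1, 0, 1, 1]
print(time_segments(bits))  # [(0, 0, 1), (1, 2, 4), (1, 6, 7)]
```

Because segment boundaries come from where the label changes rather than from a fixed window, the resulting segments have irregular lengths.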
Datasets {P, R, C, M}
• Dataset P: the PHYSICIAN INTERPRETATION section of 78,489 pulmonary notes for 2,431 unique patients
• Dataset R: the two main sections, FINDINGS and IMPRESSION, of 1,893,498 chest X-ray radiology reports for 13,414 unique patients
• Dataset C: the ABNORMAL ECG section of 1,029,363 cardiology reports for 13,918 patients
• Dataset M: Datasets P, R, and C merged by a heuristic that inserts each note into its chronological place in a corpus initialized to represent the most prevalent domain
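A heuristic merger of this kind can be sketched as follows. It assumes "most prevalent domain" means the source with the most notes, that notes are ordered per patient by date, and that a note sharing a date with an existing note is placed after it. The `Note` record, its fields, and `merge_corpora` are hypothetical; the authors' exact heuristic is not described in the slides.

```python
import bisect
from dataclasses import dataclass
from datetime import date

@dataclass
class Note:
    patient_id: str
    day: date
    source: str  # "P", "R", or "C"
    text: str

def merge_corpora(datasets):
    """datasets: dict mapping source name -> list of Note.
    Start from the source with the most notes (assumed 'most prevalent'),
    then insert every other note at its chronological place per patient."""
    base = max(datasets, key=lambda s: len(datasets[s]))
    corpus = sorted(datasets[base], key=lambda n: (n.patient_id, n.day))
    keys = [(n.patient_id, n.day) for n in corpus]
    for source, notes in datasets.items():
        if source == base:
            continue
        for note in notes:
            key = (note.patient_id, note.day)
            # bisect_right keeps existing notes first when dates tie.
            pos = bisect.bisect_right(keys, key)
            keys.insert(pos, key)
            corpus.insert(pos, note)
    return corpus

# Toy example with made-up notes.
datasets = {
    "R": [Note("p1", date(2015, 1, 10), "R", "cxr a"),
          Note("p1", date(2015, 3, 1), "R", "cxr b"),
          Note("p2", date(2016, 5, 5), "R", "cxr c")],
    "P": [Note("p1", date(2015, 2, 1), "P", "pft")],
    "C": [Note("p1", date(2015, 3, 1), "C", "ecg")],
}
merged = merge_corpora(datasets)
print([n.source for n in merged])  # ['R', 'P', 'R', 'C', 'R']
```

List insertion is linear per note, which is fine for a sketch; at the scale of Dataset R a per-patient grouping followed by a single sorted merge would be cheaper.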
Results on Dataset M
High Prediction Accuracy
Our proposed model achieved an average prediction accuracy of 80% on Dataset M.
Results on Dataset P
All intervals are in days before death.
Segment | Regular time segment (pre-set time window + delta window) | COPD stage documented by the experts | LSTM irregular time segment
1 | [0,70] | IV | [0,65]
2 | [71,140] | III | [55,150]
3 | [141,210] | II-2 | [145,270]
4 | [211,280] | II-1 |
5 | [281,350] | | [262,360]
6 | [351,420] | | [337,484]
7 | [421,490] | | [450,552]
8 | [491,560] | |
9 | [560,630] | I | [449,630]