A Data Reconstruction Algorithm for Temporal Clinical Expressions

Zhikun Zhang, BS1,2, Chunlei Tang, PhD3,4,5, Meihan Wan, BS1,2, Joseph M. Plasek, PhD3, Yun Xiong, PhD1,2, Li Zhou, MD, PhD3,4, David W. Bates, MD, MSc3,4,5

Podium Abstract

Temporal expressions annotated in clinical notes pose challenges to downstream analytical activities. For example, a disease-centric knowledge graph often requires massive time aggregation operations to organize itself around the relationships among multiple chronic diseases (e.g., chronic obstructive pulmonary disease (COPD) and heart failure). We present a novel data reconstruction algorithm with three stages. First, it detects whether an expression has temporal intent. Second, it decomposes and rewrites the expression into non-temporal sub-expressions and temporal constraints. Finally, it clusters similar non-temporal sub-expressions using unsupervised sentence embeddings under the K-means paradigm. Experiments on a corpus of cardiology reports demonstrate that our method is feasible.

References

Jia Z, Abujabal A, Saha Roy R, et al. TEQUILA: Temporal Question Answering over Knowledge Bases. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018:1807-1810.
Arora S, Liang Y, Ma T. A simple but tough-to-beat baseline for sentence embeddings. Proceedings of the 5th International Conference on Learning Representations.

Introduction

Temporal information is crucial for many data analytic tasks; however, temporal expressions in narrative clinical notes are challenging to process and to use in downstream analytical activities. Representing and reasoning about temporal expressions in clinical notes is critical for clinical diagnosis and treatment[1]. TimeML[1-3] is used to annotate four types of temporal expressions (i.e., time stamping, relative ordering, context, and duration).
However, TimeML and similar markup languages still struggle to represent implicit temporal conditions, as well as complex expressions that require joining the results of their sub-expressions. Consider the following examples:

E1: “When compared with ECG of 18-JUL-YYYY 10:41, (unconfirmed) no significant change was found. Confirmed by X MD on 7/22/YYYY 17:18.”
E2: “Subsequently, a pharmacological stress test was performed with IV adenosine infusion after which sestamibi was injected IV at peak drug effect.”

TimeML’s recognition capacity on E1 is adequate, because its dates and times are explicit. In E2, no explicit date is mentioned, so the first challenge is detecting the temporal nature of E2 at all. The phrase “after which” refers to the time after an event (the IV adenosine infusion); TimeML can detect this phrase but does not properly resolve it to a normalized date. The temporal preposition “with” is a cue as well, and words like “subsequently” also signal a temporal context. The second challenge in E2 is to decompose the expression judiciously into sub-expressions. For example, E2 should be decomposed into: E2.1 “what the patient did before a pharmacological stress test;” E2.2 “when the patient was administered a pharmacological stress test;” and E2.3 “when the patient was administered sestamibi.”

Author Affiliations

Shanghai Key Laboratory of Data Science, Shanghai, CHN; School of Computer Science, Fudan University, Shanghai, CHN; Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Clinical and Quality Analysis, Partners HealthCare System, Boston, MA, USA.
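The first stage, detecting whether an expression such as E1 or E2 has temporal intent, can be approximated by combining explicit-date patterns with the implicit cue words discussed for E2 (“after which,” “with,” “subsequently”). The sketch below is hypothetical: the cue list, the regular expression, and the function name temporal_intent are illustrative assumptions, not the authors’ TimeML expansion.

```python
import re

# Hypothetical cue lexicon: cues named in the text ("after which", "with",
# "subsequently") plus a few common temporal connectives.
TEMPORAL_CUES = ("after which", "subsequently", "when", "before", "after", "with")

# Explicit dates in the de-identified notes, e.g. "18-JUL-YYYY" or "7/22/YYYY"
# (the year is masked as "YYYY"; real four-digit years are also accepted).
DATE_PATTERN = re.compile(
    r"\b\d{1,2}-[A-Z]{3}-(?:\d{4}|YYYY)\b"   # 18-JUL-YYYY
    r"|\b\d{1,2}/\d{1,2}/(?:\d{4}|YYYY)\b"   # 7/22/YYYY
)


def temporal_intent(sentence: str) -> dict:
    """Report explicit dates and implicit cue words found in a sentence."""
    lowered = sentence.lower()
    cues = [
        cue for cue in TEMPORAL_CUES
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered)
    ]
    dates = DATE_PATTERN.findall(sentence)
    return {
        "explicit_dates": dates,
        "implicit_cues": cues,
        "is_temporal": bool(dates or cues),
    }
```

On E1 this sketch finds the two explicit dates; on E2 it finds no date but flags “after which” and “subsequently.” A bare cue such as “with” will over-trigger on non-temporal sentences, so a practical detector would need context beyond a word list.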
Methods

We present a data reconstruction algorithm with three stages. First, it detects whether an expression has temporal intent. Second, it decomposes and rewrites the expression into non-temporal sub-expressions and temporal constraints. Third, it clusters similar non-temporal sub-expressions using unsupervised sentence embeddings[5] under the K-means paradigm. Specifically:

Temporal signals and relations. Our TimeML expansion annotates textual elements that denote explicit and implicit temporal expressions, including cues such as “after which” that point to a medical event mentioned only implicitly.

Decomposing and rewriting expressions. Each expression is split into non-temporal sub-expressions and temporal constraints.

Phrase/sentence embeddings under the K-means paradigm. Phrase embeddings are used to find similar phrases by comparing the distances between their vectors.

Materials

Our cohort consists of 15,500 COPD patients who received care in the Partners HealthCare network and died between 2011 and 2017. The clinical notes for this cohort were extracted from the Partners Research Patient Data Registry (RPDR)[4]. In this study, we extracted the ABNORMAL ECG section from free-text cardiology reports. This study was approved by the Partners Institutional Review Board (IRB).

Results

Figure 1 was reconstructed from 30,363 ECG notes in the window of 0-180 days before death. The reconstructed data produced by our algorithm are event data with fields such as “duration (days) between two ECGs,” “ECG diagnosis,” and “the probability at the same duration.” Taking E1 as an example, its three sub-expressions are E1.1 “When compared with ECG of 18-JUL-YYYY 10:41,” E1.2 “(unconfirmed) no significant change was found,” and E1.3 “Confirmed by X MD on 7/22/YYYY 17:18.” The duration is the time interval between the timestamps in E1.1 and E1.3.
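The decomposition of E1 and the resulting duration can be reproduced with a small clause splitter. This is a sketch under stated assumptions: clauses are split at commas and periods, any clause containing a timestamp is treated as a temporal constraint and the rest as non-temporal sub-expressions, and the masked year “YYYY” is replaced by a hypothetical ASSUMED_YEAR so the dates can be parsed. The helper names decompose, parse_timestamp, and duration_days are illustrative.

```python
import re
from datetime import datetime

# Hypothetical stand-in for the masked year "YYYY". Both timestamps are assumed
# to fall in the same year, which is only safe for short intervals.
ASSUMED_YEAR = "2016"

TIMESTAMP = re.compile(
    r"(\d{1,2}-[A-Z]{3}-(?:\d{4}|YYYY) \d{1,2}:\d{2})"   # e.g. 18-JUL-YYYY 10:41
    r"|(\d{1,2}/\d{1,2}/(?:\d{4}|YYYY) \d{1,2}:\d{2})"   # e.g. 7/22/YYYY 17:18
)


def parse_timestamp(match: re.Match) -> datetime:
    text = match.group(0).replace("YYYY", ASSUMED_YEAR)
    if match.group(1):
        # "18-JUL-2016 10:41" -> "18-Jul-2016 10:41" for the %b directive
        return datetime.strptime(text.title(), "%d-%b-%Y %H:%M")
    return datetime.strptime(text, "%m/%d/%Y %H:%M")


def decompose(note: str):
    """Split a note into clauses: timestamped clauses become temporal
    constraints, all other clauses are non-temporal sub-expressions."""
    clauses = [c.rstrip(",.") for c in re.split(r"(?<=[,.])\s+", note.strip())]
    sub_expressions, constraints = [], []
    for clause in clauses:
        m = TIMESTAMP.search(clause)
        if m:
            constraints.append((clause, parse_timestamp(m)))
        else:
            sub_expressions.append(clause)
    return sub_expressions, constraints


def duration_days(constraints) -> float:
    """Interval in days between the first and last timestamped clause."""
    if len(constraints) < 2:
        raise ValueError("need at least two timestamped clauses")
    first, last = constraints[0][1], constraints[-1][1]
    return (last - first).total_seconds() / 86400
```

For E1, decompose returns E1.2 as the only non-temporal sub-expression and E1.1 and E1.3 as constraints; round(duration_days(constraints), 2) gives 4.28 days (18-JUL 10:41 to 7/22 17:18). Splitting on periods would break on abbreviations such as “Dr.”, so real notes would need a proper sentence segmenter.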
After the similarity of non-temporal sub-expressions such as E1.2 has been calculated, the number of clusters of similar ECG diagnoses is easy to obtain. By comparing positive ECG diagnoses against negative controls, we can extend the time interval between two ECGs to about 20 days, based on the first “no significant change.”

Conclusions and Relevance

Our data reconstruction algorithm for temporal clinical expressions, which operates over phrase embeddings, proved feasible for addressing several gaps in natural language processing. This is a significant step toward supporting further analytical activities, such as building knowledge graphs that often require massive time aggregation operations.
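The third stage clusters non-temporal sub-expressions such as E1.2 by embedding them and running K-means. The sketch below is a simplified, hypothetical stand-in for that stage, not the authors’ pipeline: it uses the smooth inverse frequency weighting a/(a + p(w)) from Arora et al., but substitutes one-hot word vectors for pretrained embeddings, estimates p(w) from the phrases themselves, and omits that method’s common-component removal. Clustering is plain Lloyd’s K-means with deterministic farthest-first seeding; the example phrases are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def sif_vectors(phrases, a=1e-3):
    """SIF-style weighted bag-of-words vectors (simplified, hypothetical)."""
    tokens = [tokenize(p) for p in phrases]
    counts = Counter(w for toks in tokens for w in toks)
    total = sum(counts.values())
    index = {w: i for i, w in enumerate(sorted(counts))}
    vectors = []
    for toks in tokens:
        v = [0.0] * len(index)
        for w in toks:
            v[index[w]] += a / (a + counts[w] / total)  # frequent words weigh less
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])  # unit length
    return vectors


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest(v, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from every centroid chosen so far.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With the hypothetical phrases “no significant change was found,” “no significant change,” “no change was found,” “sinus bradycardia,” “marked sinus bradycardia,” and “sinus bradycardia present,” kmeans(sif_vectors(phrases), k=2) returns [0, 0, 0, 1, 1, 1]. Replacing the one-hot vectors with pretrained word embeddings would let phrases that share no words (e.g., synonyms) fall into the same cluster.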