1 Discovery of Temporal Patterns in Course-of-Disease Medical Data Jorge C. G. Ramirez Ph.D. Candidate Lynn L. Peterson and Diane J. Cook Supervising Professors
2 Overview Objective Contributions Approach TEMPADIS Summary and Conclusions
3 Objective Discover patterns that represent groups of patients who had a similar course of disease for a catastrophic or chronic illness Motivation –Medical –AI
4 Contributions Data Preprocessing –Normalization –Learning Missing Data –Learning Implicit Knowledge Exploratory Analysis –Event Set Sequence Approach
5 Contributions Domain Understanding –New perspective on mass of data –Identify groups of patients for further medical study
6 Approach Example Events – Laboratory Results 461 L WBC L HCT L PLT L CD4% L CD4A
7 Approach Example Events – Visits – Diagnoses – Pharmacy 468 C CV 468 D AIDS-RELATED COMPLEX, UNSPECIFIED 469 P CTM 60 CO-TRIMOXAZOLE DS 469 P AZT 200 ZIDOVUDINE 100MG
8 Approach Event Set Sequences –Events Value Event: laboratory test result, visit Duration Event: pharmacy, diagnosis –An Event Set is all Events that occur within a window of time –An Event Set Sequence is all Event Sets that occur over a long period of time
9 Approach Example Event Set 461 L WBC L HCT L PLT L CD4% L CD4A C CV 468 D AIDS-RELATED COMPLEX, UNSPECIFIED 469 P CTM 60 CO-TRIMOXAZOLE DS 469 P AZT 200 ZIDOVUDINE 100MG
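The grouping of events into event sets can be sketched as below. This is a minimal illustration, not the TEMPADIS implementation: the `Event` record layout (day number, one-letter type code, detail text) is inferred from the example events, and `window_days=14` is an assumed value, since the slides do not state the window length.

```python
from collections import namedtuple

# Hypothetical record layout inferred from the example events:
# a day number, a one-letter type code, and free-text detail.
Event = namedtuple("Event", ["day", "kind", "detail"])


def build_event_sets(events, window_days=14):
    """Group events into event sets, each covering at most window_days days.

    window_days is an assumed parameter; the slides do not give the window size.
    """
    event_sets = []
    current = []
    window_start = None
    for ev in sorted(events, key=lambda e: e.day):
        # Open a new window when this event falls outside the current one.
        if window_start is None or ev.day - window_start >= window_days:
            if current:
                event_sets.append(current)
            current = []
            window_start = ev.day
        current.append(ev)
    if current:
        event_sets.append(current)
    return event_sets


events = [
    Event(461, "L", "WBC"),
    Event(468, "C", "CV"),
    Event(469, "P", "AZT 200 ZIDOVUDINE 100MG"),
    Event(500, "L", "WBC"),
]
# The event set sequence is simply the ordered list of event sets.
sequence = build_event_sets(events)
```

With these inputs the first three events share one event set and the day-500 result starts a second one.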
10 Approach Normalization –Normal for each patient is different –Especially when affected by a catastrophic or chronic illness –Example: CD4A General Population Normal: Well HIV-positive patient: Severely immune-compromised patient:
11 Approach Normalization (continued) –Scale to -4…0…+4 0 is normal Each number represents a deviation from normal 1 and 2 are noticeable but not severe 3 is severe 4 is very severe
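One way to read the -4…0…+4 scale is as a clamped count of steps away from a patient-specific baseline. The sketch below assumes that reading; `baseline` and `step` are hypothetical parameters, because the slides do not give the thresholds actually used, and the numbers in the usage lines are illustrative rather than clinical values.

```python
def normalize(value, baseline, step):
    """Map a raw lab value to an integer on the -4..+4 scale.

    baseline: the value treated as normal for this patient (assumed input).
    step: how much raw change counts as one level of deviation (assumed input).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    deviation = (value - baseline) / step
    # Truncate toward zero, then clamp to the ends of the scale.
    level = int(deviation)
    return max(-4, min(4, level))


# Illustrative numbers only, not clinical reference ranges.
at_baseline = normalize(500, baseline=500, step=100)
moderately_low = normalize(250, baseline=500, step=100)
very_low = normalize(50, baseline=500, step=100)
```

Clamping keeps extreme values at -4 or +4, which matches the slide's description of 4 as the "very severe" end of the scale.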
12 Approach Replace Missing Data –Diagnosis data very incomplete –Learn severity of condition from pharmacy data –Induce decision tree to classify conditions
13 Approach Create Health Status Categories 1 = HIV-positive asymptomatic 2 = Asymptomatic, on anti-HIV therapy 3 = Immune-compromised, on prophylactic therapy 4 = Active illness 5 = Severe active illness
14 Approach Learn Implicit Knowledge –Need to augment explicit knowledge –Recovery time is expert’s implicit knowledge –Use neural network to learn recovery time function 0 = Nothing to recover from 1-4 = weeks to recover 5 = 5 or more weeks to recover
15 Approach Categorize Pharmacy Data –A myriad of drugs prescribed –Need to understand significance –Categorize by use
16 Approach Categories –Nucleoside Analogs –Protease Inhibitors –Prophylaxis Therapies –Intravenous antibiotics –Anti-virals –Anti-PCP/Toxoplasmosis –Anti-mycobacterials
17 Approach Categories (continued) – Anti-wasting syndrome – Anti-fungals – Chemotherapies
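Categorizing by use amounts to a lookup from drug code to category. The sketch below uses the two pharmacy codes that appear in the example events; the table itself is hypothetical. AZT (zidovudine) is a nucleoside analog, but the category chosen here for CTM (co-trimoxazole) is an assumption, since it could equally be filed under Prophylaxis Therapies depending on how it was prescribed.

```python
# Illustrative lookup only; this is not the table used by the authors.
DRUG_CATEGORIES = {
    "AZT": "Nucleoside Analogs",      # zidovudine is a nucleoside analog
    "CTM": "Anti-PCP/Toxoplasmosis",  # assumed; may belong under prophylaxis
}


def categorize(drug_code):
    """Return the category for a drug code, or 'Uncategorized' if unknown."""
    return DRUG_CATEGORIES.get(drug_code.strip().upper(), "Uncategorized")


def categories_on(drug_codes):
    """Return the sorted, distinct categories for the drugs in one event set."""
    return sorted({categorize(code) for code in drug_codes})


on_drugs = categories_on(["AZT", "CTM", "AZT"])
```

Reducing each event set to its distinct categories is what lets many individual prescriptions collapse into a short, comparable feature.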
18 Approach Result: Understandable representation of patient data 861 C : 30 38: H : 3 22: 1 35: H : 3 22: 1 35: : C C : 30 38: 50 39: : H : 2 22: 1 39: 12
19 Approach Result: Understandable representation of patient data 861 C – H – H – C C H –
20 Approach Result: Understandable representation of patient data < { (EV C)(HS 3)(RT 1)(WBC -4)(HCT -3)(PLT 0) (LMPH -1)(onD ) } { (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT -1) (LMPH -2)(onD ) } { (EV H)(HS 4)(RT 1)(WBC -2)(HCT -3)(PLT -1) (LMPH -4)(onD ) } { (EV C)(HS 4)(RT 3)(WBC -4)(HCT -1) (onD ) } { (EV C)(HS 4)(RT 2)(WBC -4)(HCT -2)(PLT -1) (LMPH 2)(onD ) } { (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT 0) (LMPH -2)(onD ) } >
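The bracketed notation above can be read into ordinary data structures with a small parser. This is a sketch for working with the slide notation, not part of TEMPADIS; the function name `parse_sequence` and the choice to store an empty value such as `(onD )` as `None` are assumptions.

```python
import re

# One "(KEY value)" pair; the value may be empty, as in "(onD )".
FEATURE = re.compile(r"\(\s*(\w+)\s*([^()]*?)\s*\)")
# One "{ ... }" event set.
EVENT_SET = re.compile(r"\{([^{}]*)\}")


def parse_sequence(text):
    """Parse '< { (K V)... } ... >' into a list of dicts, one per event set."""
    # The slide text mixes en dashes and hyphens for minus signs.
    text = text.replace("\u2013", "-")
    sequence = []
    for body in EVENT_SET.findall(text):
        features = {}
        for key, raw in FEATURE.findall(body):
            if raw == "":
                features[key] = None      # value missing in the source
            elif re.fullmatch(r"-?\d+", raw):
                features[key] = int(raw)  # normalized level or category number
            else:
                features[key] = raw       # e.g. event type code
        sequence.append(features)
    return sequence


seq = parse_sequence(
    "< { (EV C)(HS 3)(RT 1)(WBC -4)(LMPH \u20131)(onD ) } { (EV H)(HS 4) } >"
)
```

Each event set becomes a dictionary keyed by feature name, so a feature that is absent from an event set is simply absent from its dictionary.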
21 Approach Inexact Match –Use set difference Partial match, feature by feature Assumes default partial match for missing data –Use weakest-link/average-link Require minimum degree of match Require average degree of match
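The inexact match can be sketched as a per-feature score combined with two thresholds. This is one possible reading of the slide, not the authors' algorithm: the scoring formulas, the `missing_default` of 0.5, and the `weakest` and `average` thresholds are all assumed values chosen for illustration. Event sets are represented as dictionaries of feature name to value.

```python
def feature_match(a, b, scale=8, missing_default=0.5):
    """Degree of match in [0, 1] for a single feature.

    scale=8 is the width of the -4..+4 range; missing_default is assumed.
    """
    if a is None or b is None:
        return missing_default  # default partial match for missing data
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        union = a | b
        # Score from the symmetric set difference relative to the union.
        return 1.0 if not union else 1.0 - len(a ^ b) / len(union)
    if isinstance(a, int) and isinstance(b, int):
        return 1.0 - min(abs(a - b), scale) / scale
    return 1.0 if a == b else 0.0


def event_set_match(x, y, missing_default=0.5):
    """Average the feature-by-feature match over all features in either set."""
    keys = set(x) | set(y)
    if not keys:
        return 1.0
    scores = [
        feature_match(x.get(k), y.get(k), missing_default=missing_default)
        for k in keys
    ]
    return sum(scores) / len(scores)


def sequences_match(s1, s2, weakest=0.5, average=0.75):
    """Require a minimum (weakest-link) and a mean (average-link) degree of match."""
    if len(s1) != len(s2) or not s1:
        return False
    scores = [event_set_match(x, y) for x, y in zip(s1, s2)]
    return min(scores) >= weakest and sum(scores) / len(scores) >= average


a = [{"EV": "C", "HS": 3, "WBC": -4}, {"EV": "H", "HS": 4}]
b = [{"EV": "C", "HS": 3, "WBC": -2}, {"EV": "H", "HS": 4}]
similar = sequences_match(a, b)
```

The weakest-link test stops a single badly mismatched event set from being hidden by good matches elsewhere, while the average-link test controls overall similarity.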
22 TEMPADIS Raw Target Data Data Cleaning Data Normalization Normalized Database
23 TEMPADIS Normalized Database Decision Tree Neural Net Reduced, Knowledge-Added Data
24 TEMPADIS Knowledge-Added Database Sequence Builder Temporal Patterns
25 Results Validation –Results are temporal patterns that demonstrate that groups of patients had similar experiences during the course of disease –Only medical experts can assess the validity of discovered patterns –These results have been validated by the experts in the HIV Clinical Research Group
26 Results Given a database of patients followed for 4 to 9 years –Discovered interesting patterns –Interestingness has multiple dimensions Length Data that appears in the patterns Data that does not appear in the patterns
27 Results Advanced patients, subject to various OIs < { (EV C)(HS 3)(RT 0)(WBC 0)(HCT -1)(PLT 0)(LMPH -3) (onD ) } { (EV E)(HS 3)(RT 2)(WBC 3)(HCT -1)(PLT 1)(LMPH 4) (onD ) } { (EV C)(HS 3)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P -3) (CD4A -1)(LMPH 0)(onD ) } { (EV C)(HS 3)(RT 1)(WBC -1)(HCT -1)(PLT 1)(LMPH 2) (onD ) } { (EV E)(HS 3)(RT 1)(WBC 2)(HCT -1)(PLT 1)(LMPH 4) (onD ) } { (EV C)(HS 3)(RT 1)(WBC 1)(HCT 0)(PLT 0)(CD4P -3) (CD4A -2)(LMPH 0)(onD ) } >
28 Advanced patients, fairly stable < { (EV C)(HS 3)(RT 0)(WBC -1)(HCT -1)(PLT 1)(CD4P -4) (CD4A -4)(LMPH 0)(onD ) } { (EV C)(HS 3)(RT 0)(WBC 0)(HCT 0)(PLT -1)(CD4P -4) (CD4A -4)(LMPH 0)(onD ) } { (EV C)(HS 3)(RT 0)(onD ) } { (EV C)(HS 3)(RT 0)(WBC -2)(HCT 0)(PLT -1)(CD4P -4) (CD4A -4)(LMPH 0)(onD ) } { (EV C)(HS 4)(RT 1)(WBC 1)(HCT -4)(PLT 0)(CD4P -4) (CD4A -4)(LMPH -4)(onD ) } { (EV C)(HS 3)(RT 3)(onD ) } { (EV )(HS 3)(RT 1)(WBC 0)(HCT 0)(PLT 0)(LMPH 0) (onD ) } { (EV C)(HS 3)(RT 0)(CD4A -4)(onD ) } >
29 Asymptomatic period < { (EV C)(HS 1)(RT 0)(onD ) } { (EV C)(HS 1)(RT 0)(onD ) } { (EV C)(HS 1)(RT 1)(onD ) } { (EV C)(HS 1)(RT 0)(onD ) } { (EV E)(HS 1)(RT 0)(WBC -1)(HCT 0)(PLT 1)(CD4P -1) (CD4A -2)(LMPH 0)(onD ) } { (EV C)(HS 1)(RT 0)(onD ) } { (EV C)(HS 1)(RT 0)(CD4A 0)(onD ) } { (EV E)(HS 1)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P 0) (CD4A 0)(LMPH 0)(onD ) } { (EV C)(HS 1)(RT 0)(onD ) } { (EV C)(HS 1)(RT 0)(onD ) } >
30 Summary Nine Steps of KDD –Identify goal –Identify target data set –Data cleaning and preprocessing –Data reduction and projection –Identify data mining method
31 Summary Nine Steps of KDD –Exploratory Analysis –Data Mining –Interpretation of Mined Patterns –Acting on Discovered Knowledge
32 Conclusions Objective Met with Contributions –Patterns discovered representing groups of patients with similar experience in course of disease –This perspective on the data has not previously been produced –This kind of computation on this kind of data has not previously been performed
33 Future Work Improve discovery algorithm –Backtracking is a barrier to overcome Improve search control Develop heuristic for measuring interestingness Add ability to identify clinically identical/similar patterns
34 Future Work Move database to new Intelligent Systems in Medicine and Biology Lab Bring database up to date Include more domain data in Event Sets Explore impact of new developments in HIV treatment