Analysis of Death Causes in the STULONG Data Set
Jan Burian, Jan Rauch
EuroMISE – Cardio
University of Economics Prague
Discovery Challenge
DEATH CAUSE / PATIENTS / %
  myocardial infarction
  coronary heart disease
  stroke
  other causes
  sudden death
  unknown
  tumorous disease
  general atherosclerosis
  TOTAL
Data matrix ENTRY
  General characteristics: Marital status, Transport to a job, Physical activity in a job, Activity after a job, Education, Responsibility, Age, Weight, Height
  Examinations: Chest pain, Breathlessness, Cholesterol, Urine, Subscapular, Triceps
  Vices: Alcohol, Liquors, Beer 10, Beer 12, Wine, Smoking, Former smoker, Duration of smoking, Tea, Sugar, Coffee
Analytic questions
Are there strong relations concerning death cause?
  General characteristics(?) ≈ Death cause(?)
  Examinations(?) ≈ Death cause(?)
  Vices(?) ≈ Death cause(?)
  Combinations(?) ≈ Death cause(?)
Example of relation: founded implication
  A ⇒ 0.63;15 S
  A: Cholesterol & Coffee(3 and more cups)
  S: Death cause(tumorous disease)

        S     ¬S    total
  A     15    9     24
  ¬A

  63 % of patients satisfying A also satisfy S (15/24 = 0.625)
  there are 15 patients satisfying both A and S
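The founded implication on this slide can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the usual GUHA reading of the two parameters (the share a/(a+b) must reach p, and a must reach the base count). The function name `founded_implication` is hypothetical, and the counts 15 and 9 are read from row A of the slide's four-fold table.

```python
def founded_implication(a: int, b: int, p: float, base: int) -> bool:
    """Return True when a / (a + b) >= p and a >= base.

    a: patients satisfying both A and S
    b: patients satisfying A but not S
    """
    if a + b == 0:
        return False
    return a / (a + b) >= p and a >= base


# Counts from the slide: 15 patients satisfy A and S, 9 satisfy A only.
a, b = 15, 9
confidence = a / (a + b)

print(f"confidence = {confidence:.3f}")
print(founded_implication(a, b, p=0.6, base=15))
```

The unrounded share is 0.625, which the slide reports as 0.63. A threshold of exactly 0.63 would therefore reject this rule, so the sketch uses 0.6 for the check.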
Example of relation: above average
  A ⇒+ 0.76;15 S     (above average)
  A ⇒ 0.1;15 S       (founded implication, for comparison)
  A: Age( 65)
  S: Death cause(general atherosclerosis)

        S     ¬S
  A
  ¬A

  relative frequency of S: 22/389 = 0.057
  relative frequency of S if A: 15/151 = 0.099
  relative frequency of S if A is 76 per cent higher than the relative frequency of S
  there are 15 patients satisfying both A and S
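The arithmetic behind the above average relation can be reproduced from the four numbers given on the slide (15, 151, 22, 389). This is a sketch only: the function name `above_average` is hypothetical, and it assumes the usual GUHA definition, in which the relative frequency of S among patients satisfying A must be at least (1 + p) times the relative frequency of S among all patients, with at least `base` patients satisfying both.

```python
def above_average(a: int, b: int, c: int, d: int, p: float, base: int) -> bool:
    """Four-fold table check: a/(a+b) >= (1+p) * (a+c)/n and a >= base."""
    n = a + b + c + d
    if a + b == 0 or n == 0:
        return False
    return a / (a + b) >= (1 + p) * (a + c) / n and a >= base


# Numbers from the slide.
a = 15           # patients satisfying both A and S
a_total = 151    # patients satisfying A
s_total = 22     # patients satisfying S
n = 389          # all patients with a known death cause

# Remaining cells of the four-fold table follow by subtraction.
b = a_total - a          # A and not S
c = s_total - a          # S and not A
d = n - a - b - c        # neither A nor S

freq_s = s_total / n
freq_s_given_a = a / a_total
improvement = freq_s_given_a / freq_s - 1

print(f"freq(S) = {freq_s:.3f}, freq(S | A) = {freq_s_given_a:.3f}")
print(f"improvement = {improvement:.1%}")
print(above_average(a, b, c, d, p=0.55, base=15))
```

The unrounded improvement is slightly below 0.76, so the check above uses the 55 per cent threshold of the example task rather than the rounded value.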
Example of task
  Vices(?) ⇒+ 0.55;15 Death cause(?)
For which combinations of vices is the relative frequency of some death cause at least 55 per cent higher than the relative frequency of the same death cause among all patients?
We require at least 15 patients who satisfy the particular condition and have the particular death cause.
  Liquors(?) & Smoking(?) ⇒+ 0.55;15 Death cause(?)
  Alcohol(?) & Tea(?) ⇒+ 0.55;15 Death cause(?)
  Beer 12(?) & Wine(?) ⇒+ 0.55;15 Death cause(?)
  Liquors(?) & Smoking(?) & Coffee(?) & Beer 12(?) ⇒+ 0.55;15 Death cause(?)
  ????? ⇒+ 0.55;15 Death cause(?)
4ft-Miner application
  Vices(?) ⇒+ 0.55;15 Death cause(?)
  Antecedent = Vices(?)    ⇒+ 0.55;15    Death cause(?)
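To make the task concrete, the search over antecedents can be sketched as a naive enumeration. This is not 4ft-Miner's own algorithm or interface: the function `mine_above_average`, the `max_len` parameter, and the ten toy rows are all invented for illustration, and the thresholds are scaled down (base 3 instead of 15) so that the tiny data set yields results.

```python
from itertools import combinations, product


def mine_above_average(rows, antecedent_attrs, target, p, base, max_len=2):
    """Brute-force search for conditions whose target frequency is at least
    (1 + p) times the overall frequency, with at least `base` matching rows."""
    n = len(rows)
    target_values = sorted({r[target] for r in rows})
    results = []
    for size in range(1, max_len + 1):
        for attrs in combinations(antecedent_attrs, size):
            domains = [sorted({r[a] for r in rows}) for a in attrs]
            for values in product(*domains):
                cond = dict(zip(attrs, values))
                matching = [r for r in rows
                            if all(r[k] == v for k, v in cond.items())]
                if not matching:
                    continue
                for s in target_values:
                    a = sum(1 for r in matching if r[target] == s)
                    k = sum(1 for r in rows if r[target] == s)
                    if a >= base and a / len(matching) >= (1 + p) * k / n:
                        results.append((cond, s, a))
    return results


def make_rows(beer, wine, cause, count):
    """Build `count` identical toy patient records (invented data)."""
    return [{"Beer 12": beer, "Wine": wine, "Death cause": cause}
            for _ in range(count)]


# Invented toy data, NOT taken from STULONG.
rows = (make_rows("yes", "yes", "tumorous disease", 3)
        + make_rows("yes", "yes", "stroke", 1)
        + make_rows("no", "yes", "stroke", 3)
        + make_rows("no", "no", "tumorous disease", 1)
        + make_rows("yes", "no", "stroke", 2))

found = mine_above_average(rows, ["Beer 12", "Wine"], "Death cause",
                           p=0.55, base=3)
for cond, cause, count in found:
    print(cond, "->", cause, f"({count} patients)")
```

On real data the same loop would run with p = 0.55 and base = 15, as the task requires. The exhaustive enumeration is only practical here because the number of attributes is small.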
Dealing with attributes
An example – Age
  Predefined intervals length 10: Age<40,50), Age<50,60), …, Age<70,80)
  Predefined intervals length 5: Age<40,45), Age<45,50), …, Age<70,75)
  Sliding window length 10
  Sliding window length 5
  Sliding window length 2
Sliding window length 5
  44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, ..., 67, 68, 69, 70, 71, 72, 73, 74
  (figure: the window shown at two successive positions over the Age values)
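The two ways of coding Age can be generated mechanically. The sketch below is an illustration under assumptions: the helper names `predefined_intervals` and `sliding_windows` are hypothetical, and the window is assumed to move by one value at a time, which the slides do not state explicitly.

```python
def predefined_intervals(low: int, high: int, length: int):
    """Left-closed, right-open intervals <start, start + length) covering <low, high)."""
    return [(start, start + length) for start in range(low, high, length)]


def sliding_windows(values, length: int):
    """All runs of `length` consecutive values, shifted by one value each time."""
    ordered = sorted(values)
    return [tuple(ordered[i:i + length])
            for i in range(len(ordered) - length + 1)]


# Predefined intervals of length 10 and 5, with the bounds shown on the slide.
by_ten = predefined_intervals(40, 80, 10)
by_five = predefined_intervals(40, 75, 5)

# Sliding window of length 5 over the ages 44..74.
windows = sliding_windows(range(44, 75), 5)

print(by_ten)
print(by_five[:2], "...", by_five[-1])
print(windows[0], windows[1], "...", windows[-1])
```

Sliding windows produce many more candidate categories than predefined intervals (27 windows against 7 intervals of length 5 here), which enlarges the search space but avoids fixing the cut points in advance.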
Dealing with attributes
Another example – Marital status
  Marital status(divorced) – 39 patients (10.0 %)
  Marital status(single) – 28 patients (7.2 %)
  remaining shares in the chart: 81.5 %, 1.3 %
Dealing with attributes
Some further examples
  Predefined intervals, sliding windows: Cholesterol, Subscapular, Height, Weight, …
  Particular values: Activity after job, Physical activity in a job, Education, Transport, Responsibility, …
4ft-Miner result example
  Beer 12(yes) & Wine(yes)  ;15  Death cause(tumorous disease)
Tasks: Antecedent, Death cause(?)
Table columns: Antecedent / rules / verifications
  General characteristics (9 attributes)  0.5; ;
  Examinations (6 attributes)  0.5; + 0.5;
  Vices (5 attributes)  0.5; ;
  Combinations, 1 general + 1 other  0.5; ;
Solution time in all cases ≤ 8 sec
Intel Pentium at 3 GHz, 512 MB RAM
Conclusions
  Only 389 patients with death code
  Some potentially interesting rules
  Fast work with 4ft-Miner
  Possibility of tuning work with attributes: predefined intervals, sliding windows, …