Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
Comparing Association Rules and Decision Trees for Disease Prediction
Advisor: Dr. Hsu
Presenter: Yu-San Hsieh
Author: Carlos Ordonez
CIKM.17-24

Outline
─ Motivation
─ Objective
─ Method
─ Experiments
─ Conclusions

Motivation
Mining association rules in a medical data set raises several problems
─ Many rules are irrelevant
─ Most relevant rules appear only at low support
─ The number of discovered rules becomes large at low support
The large number of rules makes search slow and interpretation by the domain expert difficult.

Objective
We propose search constraints to find only medically significant association rules and make search more efficient.

Method
[Diagram: medical data set, transforming, search constraints; labels: support, confidence, Phase 1, Phase 2, AGE, LAD]
Search constraints
─ User-specified maximum itemset size κ
─ group : A→g, group(A_j) = g_j
    group(AGE) = 0: AGE is not group-constrained
    group(AL) = 1: AL is constrained to belong to group 1
─ group(attribute(a)) ≠ group(attribute(b)) for items a and b in the same itemset
    (-1.0 <= IL < 0.2) and (-1.0 <= LA < 0.2) cannot be in the same itemset
─ ac : A→C, ac(A_j) = c_j
    ac(AGE) = 1: AGE is in the antecedent
    ac(LAD) = 2: LAD is in the consequent
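
The three search constraints above can be sketched as simple checks on candidate itemsets and rules. This is a minimal illustration, not the author's implementation: the `GROUP` and `AC` tables are hypothetical (only group(AGE)=0, group(AL)=1, ac(AGE)=1 and ac(LAD)=2 come from the slide; placing IL and LA in group 1 is an assumption made so the slide's example applies), and the function names are made up for this sketch.

```python
from itertools import combinations

# Hypothetical constraint tables. Group 0 means "not group-constrained".
# Assumption: IL and LA share group 1 with AL.
GROUP = {"AGE": 0, "AL": 1, "IL": 1, "LA": 1, "LAD": 0}
# ac: 1 = antecedent only, 2 = consequent only.
AC = {"AGE": 1, "AL": 1, "IL": 1, "LA": 1, "LAD": 2}
KAPPA = 4  # maximum itemset size


def itemset_allowed(itemset, group=GROUP, kappa=KAPPA):
    """Check the itemset-size and group constraints.

    Each item is an (attribute, description) pair.
    """
    if len(itemset) > kappa:
        return False
    for (attr_a, _), (attr_b, _) in combinations(itemset, 2):
        g_a, g_b = group[attr_a], group[attr_b]
        # Two items from the same non-zero group may not co-occur.
        if g_a != 0 and g_a == g_b:
            return False
    return True


def rule_allowed(antecedent, consequent, ac=AC):
    """Check the antecedent/consequent (ac) constraint for a rule."""
    left_ok = all(ac[attr] != 2 for attr, _ in antecedent)
    right_ok = all(ac[attr] != 1 for attr, _ in consequent)
    return left_ok and right_ok


if __name__ == "__main__":
    il = ("IL", "-1.0 <= IL < 0.2")
    la = ("LA", "-1.0 <= LA < 0.2")
    age = ("AGE", "40 < age <= 60")
    lad = ("LAD", "not(0 <= LAD < 50)")
    print(itemset_allowed([il, la]))   # same group, so rejected
    print(itemset_allowed([age, il]))  # AGE is unconstrained
    print(rule_allowed([age, il], [lad]))
```

Filtering candidates this way before counting support is what keeps the number of itemsets small. Where exactly each check is applied inside the search is not specified here.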

Experiments
The medical data set
─ 655 patients and 25 attributes (numeric and categorical)
─ Three basic elements for analysis
    Perfusion defect
    Coronary stenosis
    Risk factors
─ Default parameter settings
    Maximum itemset size κ = 4
    Minimum support = 1%
    Minimum confidence = 70%
─ Negation, ac, and group constraints
─ Methods: association rules, decision tree
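
To make the default thresholds concrete: with 655 patients, a minimum support of 1% means a rule must cover at least ceil(6.55) = 7 patients. The sketch below computes that count and the support and confidence of a rule over a list of transactions. The function names and the toy rows are illustrative assumptions, not data or code from the study.

```python
import math

N_PATIENTS = 655
MIN_SUPPORT = 0.01
MIN_CONFIDENCE = 0.70


def min_support_count(n_rows, min_support):
    """Smallest number of rows that satisfies the support threshold."""
    return math.ceil(n_rows * min_support)


def support_and_confidence(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent.

    rows is a list of sets of items; antecedent and consequent are sets.
    """
    n_antecedent = sum(1 for row in rows if antecedent <= row)
    n_both = sum(1 for row in rows if (antecedent | consequent) <= row)
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    print(min_support_count(N_PATIENTS, MIN_SUPPORT))
    # Toy transactions, made up for illustration only.
    toy_rows = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
    support, confidence = support_and_confidence(toy_rows, {"a"}, {"b"})
    print(support >= MIN_SUPPORT, confidence >= MIN_CONFIDENCE)
```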

Conclusions
Decision trees are less effective than constrained association rules
─ Predicting disease with several related target attributes
─ Low confidence factor
─ Slight overfitting
─ Rule complexity
─ Data set fragmentation

My opinion
Advantage
─ Produces medically useful rules, reduces the number of discovered rules, and improves running time
Drawback
─ Lack of quantitative evaluation
─ Mostly analysis of individual rules
Application
─ Prediction
─ Classification

Method
Transforming to binary dimensions
─ Numerical data: age
    0 < age <= 40 and 40 < age <= 60
─ Categorical data: sex
    sex = Male and sex = Female
First constraint
─ An attribute has negation
    Additional items are created, one for each negated categorical value or negated interval
    Example: not(0 <= LM < 30), not(0 <= LAD < 50), not(0 <= LCX < 50)……
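
The transformation above can be sketched as a function that maps one patient record to a set of binary items. The age bins and the negated intervals are taken from the slide's examples. The `encode` function, the item-string format, and the sample record are assumptions for illustration only.

```python
# Age bins from the slide, read as lo < age <= hi.
AGE_BINS = [(0, 40), (40, 60)]
# Intervals that also get a negated item, read as lo <= x < hi.
NEGATED_INTERVALS = {"LM": (0, 30), "LAD": (0, 50), "LCX": (0, 50)}


def encode(record):
    """Map one record (a dict) to a set of binary items."""
    items = set()

    # Numeric attribute: at most one interval item is emitted.
    age = record["age"]
    for lo, hi in AGE_BINS:
        if lo < age <= hi:
            items.add(f"{lo}<age<={hi}")

    # Categorical attribute: one item per observed value.
    items.add(f"sex={record['sex']}")

    # Attributes with negation: emit the interval item or its negation.
    for attr, (lo, hi) in NEGATED_INTERVALS.items():
        label = f"{lo}<={attr}<{hi}"
        inside = lo <= record[attr] < hi
        items.add(label if inside else f"not({label})")

    return items


if __name__ == "__main__":
    # Made-up patient record, not from the data set.
    patient = {"age": 55, "sex": "Male", "LM": 10, "LAD": 70, "LCX": 20}
    print(sorted(encode(patient)))
```

In this sketch an age outside both bins produces no age item, since only the two bins shown on the slide are encoded.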

Experiments
Predictive association rules
[Figure: predictive rules for healthy and diseased arteries; labels: LCX, LAD, RCA]

Experiments
Predictive decision tree
─ Using the CN4.5 decision tree algorithm
─ Focused on predicting LAD disease (LAD ≥ 50 as the target class)
─ Result: maximal height = 3
[Figure: two decision trees, one with numeric dimensions and automatic splits, one with manually binned variables; annotations: "Confidence↓, not useful" and "Confidence↓"]