Lazy Associative Classification A. Veloso, W. Meira Jr., and M. J. Zaki, ICDM 2006 Advisor: Dr. Koh Jia-Ling Speaker: Liu Yu-Jiun Date: 2007/3/8
Outline Introduction Information Gain Decision Tree Eager Associative Classifier DT vs. EAC Lazy Associative Classifier LAC vs. EAC Experiment
Introduction
Classification problem
Models of classification: Decision Tree, Associative Classifier, Neural Network, Genetic Algorithm
Lazy associative classifier:
DT lacks a global view of feature correlations (local)
AC may generate too many rules (global)
LAC aims to keep AC's accuracy without generating too many rules
"Lazy" means the effort is focused on the useful features (those of the test instance)
Information gain
$S$: any subset of training instances.
$s_i$: the number of instances in $S$ with class $c_i$.
$|S|$: the total number of training instances.
$p_i = s_i / |S|$: the probability of class $c_i$ in $S$.
$E(S) = -\sum_i p_i \log_2 p_i$: the entropy of $S$.
$IG(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} E(S_v)$: the information gain of splitting $S$ on attribute $A$, where $S_v$ is the subset of $S$ with $A = v$.
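To make the definitions concrete, here is a minimal Python sketch of entropy and information gain; the toy rows, attribute names, and values below are illustrative assumptions, not data from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = -sum_i p_i * log2(p_i), with p_i = s_i / |S|."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG(S, A) = E(S) - sum_v (|S_v| / |S|) * E(S_v) over the values v of A."""
    n = len(labels)
    gain = entropy(labels)
    for v in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Hypothetical "play tennis"-style fragment, for illustration only.
rows = [{"outlook": "sunny", "windy": "false"},
        {"outlook": "sunny", "windy": "true"},
        {"outlook": "overcast", "windy": "false"},
        {"outlook": "rainy", "windy": "true"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0 bit on this toy data
```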
Decision Tree A DT is built using a greedy, recursive splitting strategy. Each internal node is split on the attribute with the highest information gain. One rule per leaf.
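A minimal sketch of that greedy strategy, reusing information_gain and Counter from the previous block; build_tree and its majority-class leaf handling are my assumptions, not the paper's code.

```python
def build_tree(rows, labels, attrs):
    """Greedily split on the attribute with the highest information gain,
    then recurse; a pure node (or one with no attributes left) becomes a
    leaf, so every root-to-leaf path corresponds to one rule."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {}
    for v in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        tree[(best, v)] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return tree
```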
Example
Decision Tree Classifier
Rule from the DT: {outlook=sunny and humidity=high → play=no}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
Eager Associative Classifier
CARs from EAC
{windy=false and temperature=cool → play=yes}
{outlook=sunny and humidity=high → play=no}
{outlook=sunny and temperature=cool → play=yes}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
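The eager classifier mines CARs once, from the whole training set. Below is a brute-force sketch under assumed thresholds (min_sup, min_conf) and antecedents of length at most 2 for brevity; mine_cars is a hypothetical name, not the authors' implementation.

```python
from itertools import combinations

def mine_cars(rows, labels, min_sup, min_conf):
    """Enumerate small feature-sets and keep CARs 'itemset -> class' whose
    support and confidence reach the thresholds; rank by confidence."""
    n = len(rows)
    items = sorted({(a, v) for row in rows for a, v in row.items()})
    cars = []
    for k in (1, 2):  # antecedents of size 1 and 2 only, for brevity
        for itemset in combinations(items, k):
            covered = [lab for row, lab in zip(rows, labels)
                       if all(row.get(a) == v for a, v in itemset)]
            if not covered:
                continue
            for c in set(covered):
                sup = covered.count(c) / n
                conf = covered.count(c) / len(covered)
                if sup >= min_sup and conf >= min_conf:
                    cars.append((itemset, c, conf))
    return sorted(cars, key=lambda car: -car[2])
```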
DT vs. EAC
Lazy Associative Classifier
Projected Training Data
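The lazy classifier defers mining to classification time: it projects the training data onto the features of the test instance, then mines CARs from that much smaller projection. A sketch building on mine_cars above; lac_predict and the majority-class fallback are assumptions of mine.

```python
def lac_predict(rows, labels, test, min_sup, min_conf):
    """Project each training row onto the test instance's feature-values,
    mine CARs from the projection, and answer with the top-ranked rule
    that matches the test instance (falling back to the majority class)."""
    projected = [{a: v for a, v in row.items() if test.get(a) == v}
                 for row in rows]
    for itemset, c, conf in mine_cars(projected, labels, min_sup, min_conf):
        if all(test.get(a) == v for a, v in itemset):
            return c
    return Counter(labels).most_common(1)[0][0]  # no matching CAR
```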
Prediction results of EAC and LAC
minsup = 40%; test instance: {o=overcast, t=hot, h=low, w=true}
CARs from EAC (none match the test instance):
{windy=false and humidity=normal → play=yes}
{windy=false and temperature=cool → play=yes}
{temperature=cool and humidity=normal → play=yes}
CARs from LAC (mined from the projected data, all match):
{outlook=overcast → play=yes}
{temperature=hot → play=yes}
{windy=true → play=no}
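As a usage illustration with the slide's test instance (attribute names spelled out; rows and labels would be the full weather table, which the slides do not reproduce, so the toy fragment above would not yield the slide's rules):

```python
test = {"outlook": "overcast", "temperature": "hot",
        "humidity": "low", "windy": "true"}
# minsup = 40% from this slide, min_conf = 50% from the experiment setup.
print(lac_predict(rows, labels, test, min_sup=0.4, min_conf=0.5))
```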
LAC vs. EAC
Two characteristics that favor LAC over EAC:
Missing CARs
Highly Disjunctive Spaces
Experiment
26 datasets from the UCI Machine Learning Repository
min_conf = 50%, min_sup = 1%
Linux-based PC, Intel Pentium III 1.0 GHz, 1 GB RAM
Error Rates
EAC with information gain always beats C4.5, while the other EAC variants do not.
CBA performs better on sparse data spaces, but on average EAC with information gain beats CBA.
CMAR does better still because it uses multiple rules when predicting the class, whereas EAC with information gain picks only the single top-ranked rule.
Rule-Set Utilization
Execution Times Cache size: 10,000 CARs