Classification by Association Rules: Use Minimum Set of Rules
Jianyu Yang
December 10, 2003
Classification System
Problem: (A, B, C) => y | n ?
– Decision tree learning, etc.
Association rules: X => c
– X: antecedent, c: consequent
– Support & Confidence (see the sketch below)
– Algorithms: Apriori
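The sketch below illustrates, under assumed data layout and function names (not code from this work), how support and confidence are computed for a class association rule X => c over a training set.

```python
def support_and_confidence(rule_items, rule_class, data):
    """data: list of (items, class_label) pairs; rule is X => c with X = rule_items."""
    antecedent_count = 0   # instances containing X
    rule_count = 0         # instances containing X and labeled c
    for items, label in data:
        if rule_items <= items:          # X is a subset of the instance
            antecedent_count += 1
            if label == rule_class:
                rule_count += 1
    support = rule_count / len(data)                                          # P(X and c)
    confidence = rule_count / antecedent_count if antecedent_count else 0.0   # P(c | X)
    return support, confidence

# Example: rule (A, B) => y on a toy dataset
data = [({"A", "B", "C"}, "y"), ({"A", "B"}, "y"), ({"A", "C"}, "n"), ({"B"}, "n")]
print(support_and_confidence({"A", "B"}, "y", data))   # (0.5, 1.0)
```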
Association Rules: Issues
Too many rules
– Inefficient
– Overfitting
Applying order matters
– Example: (A, B) => y, (C) => n (see the sketch below)
Minimum Support (minsup)
Minimum Confidence (minconf)
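A minimal sketch (not from this work; the rule representation is an assumption) of why the order of rule application matters: an instance matching both example rules gets a different label depending on which rule is tried first.

```python
rules = [({"A", "B"}, "y"), ({"C"}, "n")]   # ordered rule list

def classify(instance, ordered_rules, default="n"):
    """Return the class of the first rule whose antecedent is contained in the instance."""
    for antecedent, label in ordered_rules:
        if antecedent <= instance:
            return label
    return default

instance = {"A", "B", "C"}                         # matches both rules
print(classify(instance, rules))                   # 'y' -- (A, B) => y fires first
print(classify(instance, list(reversed(rules))))   # 'n' -- (C) => n fires first
```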
MSR Algorithm
Ideas:
No redundant rules
– (A, B) => y
– (A, B, C) => y
Total order of rules
– "Occam's razor": favor general rules
Pre-pruning
– (A, B) => y
– (A, B, D) => ?

L1 = {large 1-ruleitems};
CAR1 = genRules(L1);
pruneSet(L1);
for (k = 2; Lk-1 ≠ ∅; k++) do begin
  Ck = apriori-gen(Lk-1);
  forall training instances t ∈ D do begin
    Ct = subset(Ck, t);
    forall candidates c ∈ Ct do
      ci.count++ for class label i;
  end
  Lk = {c ∈ Ck | ci.count ≥ minsup for some class i};
  CARk = genRules(Lk);
  pruneSet(Lk);
end
CARs = UNIONk(CARk);
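A simplified Python sketch of the Apriori-style rule-generation loop outlined above; it is not the author's implementation, the data layout and helper names are assumptions, and the pruneSet step is omitted.

```python
from itertools import combinations
from collections import defaultdict

def generate_cars(data, minsup, minconf):
    """data: list of (items, label) pairs; returns class association rules (X, c, confidence)."""
    n = len(data)
    labels = {label for _, label in data}

    def count(candidates):
        # count condset and (condset, class) occurrences over all training instances
        counts = defaultdict(int)
        cond_counts = defaultdict(int)
        for items, label in data:
            for condset in candidates:
                if condset <= items:
                    cond_counts[condset] += 1
                    counts[(condset, label)] += 1
        return counts, cond_counts

    # C1: candidate 1-condsets drawn from all items seen in the data
    current = {frozenset([i]) for inst, _ in data for i in inst}
    cars = []
    while current:
        counts, cond_counts = count(current)
        # Lk: condsets whose rule support meets minsup for at least one class
        frequent = {cs for cs in current
                    if any(counts[(cs, c)] / n >= minsup for c in labels)}
        # genRules: emit rules that also meet the confidence threshold
        for cs in frequent:
            for c in labels:
                if cond_counts[cs] and counts[(cs, c)] / cond_counts[cs] >= minconf:
                    cars.append((cs, c, counts[(cs, c)] / cond_counts[cs]))
        # apriori-gen (simplified): join frequent k-condsets into (k+1)-condsets
        current = {a | b for a, b in combinations(frequent, 2)
                   if len(a | b) == len(a) + 1}
    return cars
```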
Results: error rate vs. minsup (plot)
Results: error rate vs. minconf (plot)
Results: Error Rate Comparison
Conclusions
A new algorithm was designed to build a classification system using a minimum set of association rules.
In general, low minsup and high minconf produce low error rates.
Experiments on 26 benchmark datasets showed lower error rates than C4.5 (R8) on 17 datasets and lower than CBA (v2.0) on 16.
The new algorithm does not always produce lower error rates than other algorithms.