Rapid Association Rule Mining
Amitabha Das, Wee-Keong Ng, Yew-Kwong Woon. Proc. of the 10th ACM International Conference on Information and Knowledge Management (CIKM ’01), 2001.
Adviser: Jia-Ling Koh
Speaker: Yu-ting Kung
Introduction
The paper proposes a new algorithm, Rapid Association Rule Mining (RARM), that aims to push the speed-up barrier in association rule mining.
RARM uses a tree structure, the Support-Ordered Trie Itemset (SOTrieIT), to hold pre-processed transactional data.
With the SOTrieIT, RARM quickly discovers large 1-itemsets and 2-itemsets without scanning the database and without generating candidate 2-itemsets.
A Complete TrieIT
Definition: Let I = {a_1, a_2, …, a_N} be the set of items, in lexicographic order.
A complete TrieIT is a set of tree nodes such that every tree node w is a 2-tuple (w_i, w_c), where w_i ∈ I is the label of the node and w_c is a support count.
A Complete TrieIT (Cont.)
Example: [Figure: complete TrieIT W_1 (item A), complete TrieIT W_2 (item B), complete TrieIT W_3 (item C), complete TrieIT W_4 (item D)]
※ Database D is stored as a set of complete TrieITs.
A Complete TrieIT (Cont.)
Insertion: [Table: transaction database with N = 4]
[Figure: the tree after transactions 100 to 300 have been inserted]
[Figure: the tree after transaction 400 has been inserted]
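
The node definition and the insertion step can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes that each transaction adds one to the support count of every non-empty subset of its items, with each subset stored as a path in lexicographic order. The names `Node` and `insert_transaction` and the four transactions are made up for illustration; the slide's own transaction table is not reproduced here.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    label: str                      # w_i: the item this node stands for
    count: int = 0                  # w_c: support count of the itemset on the path
    children: dict = field(default_factory=dict)


def insert_transaction(forest, transaction):
    """Add one transaction by bumping the count of each non-empty subset.

    Assumption: every subset of the transaction is stored as a path whose
    labels appear in lexicographic order, one tree per starting item.
    """
    items = sorted(set(transaction))
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            node = forest.setdefault(itemset[0], Node(itemset[0]))
            for label in itemset[1:]:
                node = node.children.setdefault(label, Node(label))
            node.count += 1


# Hypothetical transactions over N = 4 items (not the table from the slide).
forest = {}
for transaction in (["A", "C"], ["B", "C"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    insert_transaction(forest, transaction)
```

The number of subsets grows exponentially with transaction length, so a sketch like this is only practical for very small examples.
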
Support-Ordered Trie Itemset
Definition: A SOTrieIT is a complete TrieIT with a depth of 1; i.e., it consists only of
1. a root node w_i, and
2. some child nodes.
All nodes in the forest of SOTrieITs are sorted by support count in descending order from the left.
SOTrieIT (Cont.)
Example: [Figure: a SOTrieIT (item A), a SOTrieIT (item C), a SOTrieIT (item B)]
SOTrieIT (Cont.)
Insertion: [Table: transaction database with N = 4]
[Figure panels: Insert TID 100, Insert TID 200, Insert TID 300, Insert TID 400, Resultant SOTrieIT]
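
One way to picture SOTrieIT insertion is the Python sketch below. It is an illustration under stated assumptions, not the paper's code: because a SOTrieIT has depth 1, the sketch assumes each transaction updates only its 1-itemsets (roots) and 2-itemsets (children). It also sorts nodes on demand instead of keeping them ordered after every update, which is a simplification. The class name, function names, and the four transactions are hypothetical.

```python
from itertools import combinations


class SOTrieIT:
    """Depth-1 trie: one root item plus children for the 2-itemsets it starts."""

    def __init__(self, label):
        self.label = label    # root item
        self.count = 0        # support count of the 1-itemset {label}
        self.children = {}    # second item -> support count of {label, second}

    def ordered_children(self):
        """Children as (item, count), highest support first, ties by label."""
        return sorted(self.children.items(), key=lambda kv: (-kv[1], kv[0]))


def insert_transaction(forest, transaction):
    """Update the forest with the 1-itemsets and 2-itemsets of one transaction."""
    items = sorted(set(transaction))
    for item in items:
        if item not in forest:
            forest[item] = SOTrieIT(item)
        forest[item].count += 1
    for first, second in combinations(items, 2):
        children = forest[first].children
        children[second] = children.get(second, 0) + 1


def ordered_roots(forest):
    """Root nodes, highest support first, ties broken by label."""
    return sorted(forest.values(), key=lambda node: (-node.count, node.label))


# Made-up transactions keyed by TID (not the table from the slide).
transactions = {
    100: ["A", "C"],
    200: ["B", "C"],
    300: ["A", "B", "C"],
    400: ["A", "B", "C", "D"],
}
forest = {}
for tid in sorted(transactions):
    insert_transaction(forest, transactions[tid])
```

Each transaction of length k touches k roots and k(k-1)/2 children, so the update cost stays quadratic in transaction length rather than exponential.
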
Algorithm RARM
1. Pre-processing
2. Mining of large itemsets
Algorithm RARM (Cont.)
Example (support threshold is 80%): the total number of traversals is 3 and L1 = {{C}}.
[Figure: the order in which the SOTrieIT is traversed]
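
The benefit of the support ordering during mining can be sketched as follows. This is a hedged illustration rather than the algorithm as published: it assumes the forest stores counts for 1-itemsets and 2-itemsets only, and that a traversal can stop at the first node whose count falls below the threshold because every later node has an equal or smaller count. The function names and the percentage-based threshold parameter are assumptions made for this sketch, and it does not try to reproduce the traversal count quoted on the slide.

```python
from itertools import combinations


def build_forest(transactions):
    """Return {item: [count, {second_item: pair_count}]} from the transactions."""
    forest = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            forest.setdefault(item, [0, {}])[0] += 1
        for first, second in combinations(items, 2):
            children = forest[first][1]
            children[second] = children.get(second, 0) + 1
    return forest


def by_support(counts):
    """Sort (label, count) pairs by descending count, ties by label."""
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


def mine_l1_l2(forest, n_transactions, min_support_pct):
    """Collect large 1- and 2-itemsets, stopping early in each sorted list."""

    def is_large(count):
        # Integer arithmetic avoids floating-point rounding at the boundary.
        return count * 100 >= min_support_pct * n_transactions

    l1, l2 = [], []
    roots = by_support((label, node[0]) for label, node in forest.items())
    for label, count in roots:
        if not is_large(count):
            break  # every later root has an equal or smaller count
        l1.append({label})
        for second, pair_count in by_support(forest[label][1].items()):
            if not is_large(pair_count):
                break  # every later child has an equal or smaller count
            l2.append({label, second})
    return l1, l2
```

Skipping the children of a root that is not large loses nothing, since a 2-itemset cannot be large unless both of its items are.
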
Performance Evaluation
Definition of parameters: [Table: definition of parameters]
The experiments use two databases:
D1: T25.I10.N1K.D10K
D2: T25.I10.N10K.D100K
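
The database labels follow the usual naming convention for synthetic market-basket data: T is the average transaction size, I is the average size of the maximal potentially large itemsets, N is the number of items, and D is the number of transactions, with K standing for thousands. A small Python sketch that decodes such a label is given below; the function name `parse_label` and the dictionary keys are invented for illustration.

```python
import re

# Hypothetical, descriptive key names for the four letters in a label.
_FIELDS = {
    "T": "avg_transaction_size",
    "I": "avg_large_itemset_size",
    "N": "num_items",
    "D": "num_transactions",
}


def parse_label(label):
    """Decode a label such as 'T25.I10.N1K.D10K' into integer parameters."""
    params = {}
    for part in label.split("."):
        match = re.fullmatch(r"([TIND])(\d+)(K?)", part)
        if match is None:
            raise ValueError(f"unrecognised component: {part!r}")
        letter, digits, kilo = match.groups()
        params[_FIELDS[letter]] = int(digits) * (1000 if kilo else 1)
    return params
```

Read this way, D2 has ten times as many items and ten times as many transactions as D1, while the two average sizes are the same.
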
Performance Evaluation (Cont.)
Comparison of Apriori and RARM: execution time
For D1:
Performance Evaluation (Cont.)
For D2:
Performance Evaluation (Cont.)
Why does RARM achieve a much greater speed-up in D2 than in D1?
Conclusion
Experiments have shown that RARM outperforms Apriori at all support thresholds.