1 Mining the Smallest Association Rule Set for Predictions
Jiuyong Li, Hong Shen, and Rodney Topor
Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM'01), 29 Nov.–2 Dec. 2001, pp. 361–368.
J. Li, H. Shen, and R. Topor, "Mining the informative rule set for prediction," Journal of Intelligent Information Systems, 22:2, pp. 155–174, 2004, Kluwer Academic.
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
Department of Information & Computer Education, NTNU
2 Outline
– Introduction
– The informative rule set
– Algorithm
– Experimental results
– Conclusions
3 Introduction (1/2)
The key problems with association rule mining are
– the high cost of generating association rules, and
– the large number of rules that are normally generated.
4 Introduction (2/2)
– Mine the smallest association rule set efficiently for subsequent prediction.
– Informative rule set: the rule set necessary for prediction.
– A direct method that does not generate all association rules first.
5 The informative rule set (1/8)
The prediction for an itemset P from a rule set R is a sequence of items Q.
Ex: items {a, b, c, d}, P = {a, b}
R1: a, b => c, giving Q = {c}
R2: a, b => d, extending Q to {c, d}
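The prediction step on this slide can be sketched in Python (an illustrative helper, not code from the paper; the name predict and the (antecedent, consequent, confidence) rule representation are assumptions). Rules whose antecedent is contained in P fire, and their consequents are appended to Q in confidence order:

```python
def predict(P, rules):
    """Return the prediction sequence Q for itemset P.

    rules: list of (antecedent, consequent, confidence) triples,
    with antecedent and consequent as frozensets of items.
    """
    fired = [r for r in rules if r[0] <= P]   # antecedent is a subset of P
    fired.sort(key=lambda r: -r[2])           # highest confidence first
    Q = []
    for _, consequent, _ in fired:
        for item in consequent:
            if item not in P and item not in Q:
                Q.append(item)                # each predicted item once
    return Q

rules = [
    (frozenset("ab"), frozenset("c"), 0.75),  # R1: a, b => c
    (frozenset("ab"), frozenset("d"), 0.75),  # R2: a, b => d
]
print(predict(frozenset("ab"), rules))        # -> ['c', 'd']
```

With P = {a, b}, both rules fire and Q = {c, d}, matching the slide's example.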
6 The informative rule set (2/8)
Let R be an association rule set and R_S the set of single-target rules in R. A set R_I ⊆ R_S is informative over R if
– R_I ≠ ∅,
– for all r ∈ R_I, there does not exist r' ∈ R_I such that r' is more general than r and conf(r') ≥ conf(r), and
– for all r'' ∈ R_S \ R_I, there exists r ∈ R_I such that r is more general than r'' and conf(r) ≥ conf(r'').
(A rule r' is more general than r when both have the same consequent and the antecedent of r' is a proper subset of the antecedent of r.)
7 The informative rule set (3/8)
(X = not in the informative rule set, O = in the informative rule set)
Ex1:
– r = ac=>b (0.5, 0.75) X
– r' = a=>b (0.67, 0.8)
Ex2:
– r'' = ac=>b (0.5, 0.75) O
– r = a=>b (0.67, 0.8)
8 The informative rule set (4/8)
Transaction database (min_sup = 0.5, min_conf = 0.5):
TID | Items
100 | a, b, c
200 | a, b, c
300 | a, b, c
400 | a, b, d
500 | a, c, d
600 | b, c, d
There are 12 association rules:
a=>b (0.67, 0.8)   b=>a (0.67, 0.8)   ab=>c (0.5, 0.75)   a=>bc (0.5, 0.6)
a=>c (0.67, 0.8)   c=>a (0.67, 0.8)   ac=>b (0.5, 0.75)   b=>ac (0.5, 0.6)
b=>c (0.67, 0.8)   c=>b (0.67, 0.8)   bc=>a (0.5, 0.75)   c=>ab (0.5, 0.6)
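The 12 rules on this slide can be reproduced from the 6-transaction database with a small brute-force enumeration (illustrative only; the paper's algorithm is different, and the helper name sup is an assumption):

```python
from itertools import combinations

transactions = [set("abc"), set("abc"), set("abc"),
                set("abd"), set("acd"), set("bcd")]
n = len(transactions)
min_sup, min_conf = 0.5, 0.5

def sup(itemset):
    """Fraction of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions) / n

# Frequent itemsets: a, b, c, d, ab, ac, bc, abc (support >= 0.5).
items = sorted(set().union(*transactions))
frequent = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if sup(set(c)) >= min_sup]

# Every split of a frequent itemset into antecedent => consequent
# whose confidence clears min_conf is an association rule.
rules = []
for fs in frequent:
    if len(fs) < 2:
        continue
    for k in range(1, len(fs)):
        for ant in map(frozenset, combinations(sorted(fs), k)):
            conf = sup(fs) / sup(ant)
            if conf >= min_conf:
                rules.append((ant, fs - ant, sup(fs), conf))

print(len(rules))  # -> 12, matching the slide
```

The frequent itemsets are the four singletons plus ab, ac, bc, and abc; the six pair rules have confidence 0.8, and abc yields the six remaining rules with confidence 0.75 or 0.6.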
9 The informative rule set (5/8)
Ex:
(1) Every transaction identified by rule ab=>c is also identified by rule a=>c or b=>c, so ab=>c can be omitted from the informative rule set without losing predictive capability.
(2) Rules a=>b and a=>c provide the same predictions b and c as rule a=>bc, with higher confidence, so a=>bc can be omitted from the informative rule set without losing predictive capability.
Hence, only 6 rules remain in the informative rule set:
{ a=>b (0.67, 0.8), a=>c (0.67, 0.8), b=>c (0.67, 0.8), b=>a (0.67, 0.8), c=>a (0.67, 0.8), c=>b (0.67, 0.8) }
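The pruning on this slide can be sketched as a filter over the single-target rules (a minimal sketch, not the paper's algorithm): drop any rule that has a more general rule, i.e. one with a smaller antecedent and the same target, whose confidence is at least as high.

```python
# The nine single-target rules from the running example,
# keyed by (antecedent, target item) with their confidence.
rules = {
    (frozenset("a"), "b"): 0.8,  (frozenset("a"), "c"): 0.8,
    (frozenset("b"), "a"): 0.8,  (frozenset("b"), "c"): 0.8,
    (frozenset("c"), "a"): 0.8,  (frozenset("c"), "b"): 0.8,
    (frozenset("ab"), "c"): 0.75,
    (frozenset("ac"), "b"): 0.75,
    (frozenset("bc"), "a"): 0.75,
}

# Keep a rule only if no strictly more general rule (proper-subset
# antecedent, same target) predicts with at least the same confidence.
informative = {
    (ant, tgt): conf
    for (ant, tgt), conf in rules.items()
    if not any(ant2 < ant and tgt2 == tgt and conf2 >= conf
               for (ant2, tgt2), conf2 in rules.items())
}
print(len(informative))  # -> 6
```

The three two-item-antecedent rules are dropped because a=>c, a=>b, and b=>a (all with confidence 0.8) are more general, leaving the six rules shown on the slide.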
10 The informative rule set (6/8)
If sup(X¬Z) = sup(XY¬Z), then rule XY=>Z does not belong to the informative rule set (because conf(X=>Z) ≥ conf(XY=>Z): every transaction identified by XY=>Z is already identified by X=>Z).
Ex:
– X = a, Y = b, and Z = c (TID = {400})
– sup(a¬c) = sup(ab¬c), so ab=>c is excluded (X)
11 The informative rule set (7/8)
Upward-closed properties for generating informative rule sets:
– If sup(X¬Z) = sup(XY¬Z), then rule XY=>Z and all more specific rules do not occur in the informative rule set.
– If sup(X) = sup(XY), then for any Z, rule XY=>Z and all more specific rules do not occur in the informative rule set.
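The first pruning condition can be checked on the running example (an illustrative check; the helper sup with an optional excluded set is an assumption, not the paper's code):

```python
transactions = [set("abc"), set("abc"), set("abc"),
                set("abd"), set("acd"), set("bcd")]

def sup(inc, exc=None):
    """Fraction of transactions containing inc and, if exc is
    given, not containing all of exc (i.e. sup(inc ∧ ¬exc))."""
    hits = [t for t in transactions
            if inc <= t and (exc is None or not exc <= t)]
    return len(hits) / len(transactions)

X, Y, Z = {"a"}, {"b"}, {"c"}
# Only transaction 400 contains a (or ab) without c, so the
# condition sup(X¬Z) = sup(XY¬Z) holds.
assert sup(X, Z) == sup(X | Y, Z)
conf_general  = sup(X | Z) / sup(X)           # conf(a=>c)  = 0.8
conf_specific = sup(X | Y | Z) / sup(X | Y)   # conf(ab=>c) = 0.75
print(conf_general >= conf_specific)          # -> True: ab=>c and all
                                              #    more specific rules
                                              #    can be pruned
```

Because the condition holds, the more general rule a=>c cannot have lower confidence than ab=>c, so the whole subtree of more specific rules is safely skipped.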
12 The informative rule set (8/8)
Ex1: once a=>d or b=>d is in the informative rule set (O), ab=>d and all more specific rules ab…=>d are excluded (X).
13 Algorithm (1/2)
A fully expanded candidate tree over the set of items {1, 2, 3, 4}; each node stores an identity set and a label.
14 Algorithm (2/2)
15 Experimental Results (1/4)
Sizes of different rule sets
16 Experimental Results (2/4)
Generating time for different rule sets
17 Experimental Results (3/4)
The number of database scans
18 Experimental Results (4/4)
The number of candidate nodes
19 Conclusions
– The informative rule set reduces the number of rules needed for prediction.
– A direct algorithm efficiently mines the informative rule set without first generating all frequent itemsets.
– It requires fewer candidates and fewer database accesses.