Mining interesting association rules Loo Kin Kong 22 Feb 2002.

1 Mining interesting association rules Loo Kin Kong 22 Feb 2002

2 Plan: Motivation; Interestingness measures for association rules (objective measures, subjective measures); Research issues; Conclusion

3 Motivation: KDD aims at finding new and interesting knowledge from databases, but numerous association rules are generated in a mining process. Some of the mined rules may be trivial facts, e.g., “Pregnant” ⇒ “Female”, Supp=20%, Conf=100% ...

4 Motivation (Cont’d): ... while some other rules may be redundant, e.g., “Drive fast” ⇒ “Had an accident”, Supp=10%, Conf=40%, versus “Drive fast” and “Born in HK” ⇒ “Had an accident”, Supp=9%, Conf=42%. The study of “interestingness” of association rules aims at presenting only the rules that are interesting to the user. It is closely related to the study of “surprisingness” or “unexpectedness” of association rules.

5 Interestingness: two approaches: objective measures (data-driven) and subjective measures (user-driven)

6 Objective measures: Mined rules are ranked by a pre-defined ranking system, or mined rules are filtered by a set of pre-defined pruning rules.

7 Pruning rules: [Shah 99] proposed five pruning rules. For two rules r1 and r2 with similar strength:
(1) if r1 = A ⇒ C and r2 = A ∧ B ⇒ C, r2 is redundant;
(2) if r1 = A ⇒ C and r2 = B ⇒ C, while B ⇒ A holds but A ⇒ B does not, r2 is redundant;
(3) if r1 = A ⇒ C and r2 = B ⇒ C, while B ⇒ A and A ⇒ B both hold, both r1 and r2 are weak;
(4) if r1 = A ⇒ B and r2 = A ⇒ B ∧ C, r1 is redundant;
(5) if r1 = A ⇒ B and r2 = A ⇒ C, while B ⇒ C holds, r2 is redundant.
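The first pruning rule can be sketched in Python as follows. This is only an illustration, not the procedure from [Shah 99]: the slide does not define "similar strength", so the sketch assumes it means confidences within a hypothetical tolerance `tol`, and the names `Rule` and `prune_specializations` are made up for this example. The data are the two "Drive fast" rules from slide 4.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]   # antecedent items
    rhs: FrozenSet[str]   # consequent items
    conf: float           # confidence in [0, 1]


def prune_specializations(rules: List[Rule], tol: float = 0.05) -> List[Rule]:
    """Drop r2 = A and B => C when some r1 = A => C has similar confidence.

    Assumption: "similar strength" means |conf(r1) - conf(r2)| <= tol.
    """
    kept = []
    for r2 in rules:
        redundant = any(
            r1.rhs == r2.rhs
            and r1.lhs < r2.lhs                  # r1's LHS is a proper subset
            and abs(r1.conf - r2.conf) <= tol
            for r1 in rules
        )
        if not redundant:
            kept.append(r2)
    return kept


rules = [
    Rule(frozenset({"Drive fast"}), frozenset({"Had an accident"}), 0.40),
    Rule(frozenset({"Drive fast", "Born in HK"}),
         frozenset({"Had an accident"}), 0.42),
]
kept = prune_specializations(rules)
```

With this tolerance the more specific "Born in HK" rule is dropped and only the general rule is kept. A different notion of strength (for example, one that also compares support) would change which rules survive.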

8 Small disjuncts: A disjunct is a conjunctive set of conditions, e.g., each parenthesized term in (C_11 ∧ C_12 ∧ ... ∧ C_1m) ∨ (C_21 ∧ C_22 ∧ ... ∧ C_2n) ∨ ... The size of a disjunct is determined by the number of tuples covered by the disjunct. Small disjuncts may contain surprising knowledge, although they are prone to errors.

9 Small disjuncts (Cont’d): [Freitas 98] proposed that an association rule can be regarded as a disjunct. To decide how surprising a rule (disjunct) is:
for each rule r = (i_1 ∧ i_2 ∧ ... ∧ i_n) ⇒ I_c:
    count(r) = 0
    for each minimal generalization r’ of r:
        if RHS of r’ ≠ RHS of r: count(r)++
Rules are ranked according to their counts.
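The counting procedure above can be sketched in Python. Two things are assumed here because the slide does not state them: a minimal generalization is taken to be the rule with one LHS condition removed, and the RHS of a generalized rule is taken to be the majority class among the tuples its LHS covers. The dataset and the function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Tuple

Row = Tuple[FrozenSet[str], str]   # (items in the tuple, class label)


def majority_class(lhs: FrozenSet[str], data: List[Row]) -> Optional[str]:
    """Class predicted by the LHS: the most common label among covered tuples."""
    covered = [label for items, label in data if lhs <= items]
    if not covered:
        return None
    return Counter(covered).most_common(1)[0][0]


def surprise_count(lhs: FrozenSet[str], rhs: str, data: List[Row]) -> int:
    """Number of minimal generalizations whose predicted class differs from rhs."""
    count = 0
    for cond in lhs:
        general_lhs = lhs - {cond}             # drop exactly one condition
        predicted = majority_class(general_lhs, data)
        if predicted is not None and predicted != rhs:
            count += 1
    return count


# Toy data: "a" and "b" together predict "yes", but each alone mostly predicts "no".
data: List[Row] = (
    [(frozenset({"a", "b"}), "yes")] * 2
    + [(frozenset({"a"}), "no")] * 3
    + [(frozenset({"b"}), "no")] * 3
)
count_ab = surprise_count(frozenset({"a", "b"}), "yes", data)
count_a = surprise_count(frozenset({"a"}), "no", data)
```

In this toy data the rule a ∧ b ⇒ yes disagrees with both of its generalizations, so it gets the highest possible count for a two-condition rule, while a ⇒ no agrees with its only generalization and gets a count of zero.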

10 Subjective measures: Users are required to specify whether the mined rules are interesting, but it is impossible to do so rule by rule. Hence rules are handled collectively.

11 Rule templates: Interesting and uninteresting rules can be specified with templates [Klemettinen et al. 94]. A rule template specifies which attributes may occur in the LHS and RHS of a rule, e.g., any rule of the form “Pregnant” & (any number of conditions) ⇒ “Female” is uninteresting.
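A template of the kind in the example can be sketched in Python. This is a deliberately simplified model, not the template language of [Klemettinen et al. 94]: a template here is just a set of required LHS items plus a fixed RHS, and any extra LHS conditions are allowed. The names `Rule`, `Template`, `matches` and `drop_uninteresting` are made up for this example.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: str


class Template(NamedTuple):
    required_lhs: FrozenSet[str]   # items that must appear in the LHS
    rhs: str                       # required RHS item


def matches(rule: Rule, template: Template) -> bool:
    """True if the rule has the required LHS items (plus any others) and the RHS."""
    return template.required_lhs <= rule.lhs and rule.rhs == template.rhs


def drop_uninteresting(rules: List[Rule], templates: List[Template]) -> List[Rule]:
    """Keep only the rules that match none of the 'uninteresting' templates."""
    return [r for r in rules if not any(matches(r, t) for t in templates)]


uninteresting = [Template(frozenset({"Pregnant"}), "Female")]
rules = [
    Rule(frozenset({"Pregnant"}), "Female"),
    Rule(frozenset({"Pregnant", "Born in HK"}), "Female"),
    Rule(frozenset({"Drive fast"}), "Had an accident"),
]
interesting = drop_uninteresting(rules, uninteresting)
```

One template removes both "Pregnant" rules at once, which is the point of handling rules collectively rather than one by one.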

12 Eliminating uninteresting rule families: Proposed in [Sahar 99]. For a rule r = A ⇒ B, r’ = a ⇒ b is an ancestor rule if a is contained in A and b is contained in B; r’ is said to cover r. An ancestor rule can be classified as one of the following: True-Not-Interesting (TNI), Not-True-Interesting (NTI), Not-True-Not-Interesting (NTNI), True-Interesting (TI).

13 Eliminating uninteresting rule families (Cont’d): The algorithm: let Ω denote the set of association rules from data mining. Iteratively: (1) the ancestor rule r’ that covers the largest number of rules in Ω is presented to the user for classification; (2) r’ is classified as one of TNI, NTI, NTNI and TI; (3) Ω is pruned according to the classification of r’.
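The iterative loop can be sketched in Python under several labeled assumptions. The slides do not say how the rule set is pruned for each class, so this sketch assumes that a "not interesting" answer (TNI or NTNI) removes every rule the ancestor covers and that the other answers remove nothing. It also assumes that ancestor rules are single-item pairs (a, b) with a drawn from the LHS and b from the RHS, and that the user is modeled by a hypothetical `classify` callback.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]   # (LHS items, RHS items)
Ancestor = Tuple[str, str]                     # (single LHS item, single RHS item)


def ancestors(rule: Rule) -> Set[Ancestor]:
    """All single-item ancestor rules a => b that cover the given rule."""
    lhs, rhs = rule
    return {(a, b) for a in lhs for b in rhs}


def eliminate(rules: List[Rule],
              classify: Callable[[Ancestor], str],
              max_questions: int = 10) -> List[Rule]:
    remaining = list(rules)
    asked: Set[Ancestor] = set()
    for _ in range(max_questions):
        # Count how many remaining rules each not-yet-asked ancestor covers.
        cover = Counter(anc for r in remaining
                        for anc in ancestors(r) if anc not in asked)
        if not cover:
            break
        # Largest cover first; ties broken by the ancestor tuple for determinism.
        best = max(cover.items(), key=lambda kv: (kv[1], kv[0]))[0]
        asked.add(best)
        # Assumed pruning policy: "not interesting" removes the covered rules.
        if classify(best) in ("TNI", "NTNI"):
            remaining = [r for r in remaining if best not in ancestors(r)]
    return remaining


rules: List[Rule] = [
    (frozenset({"Pregnant"}), frozenset({"Female"})),
    (frozenset({"Pregnant", "Born in HK"}), frozenset({"Female"})),
    (frozenset({"Drive fast"}), frozenset({"Had an accident"})),
]
answers = {("Pregnant", "Female"): "TNI"}      # simulated user answers
remaining = eliminate(rules, lambda anc: answers.get(anc, "TI"))
```

The first question asked is about “Pregnant” ⇒ “Female”, since it covers two of the three rules; answering TNI removes both in one step, so the user answers far fewer questions than there are rules.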

14 Research issues: The problem of rule interestingness is difficult because it requires domain knowledge and/or user interaction [Sahar 99]. Possible research directions: machine learning on interesting rules; how interestingness information is used in a data mining process.

15 Conclusion: Association rule interestingness is an important, but difficult, problem. Measures of rule interestingness include subjective and objective ones. Objective interestingness measures are data-driven. Subjective interestingness measures require users to specify whether a rule is interesting.

16 References
[Dong et al. 01] Guozhu Dong and Kaustubh Deshpande. Efficient Mining of Niches and Set Routines. PAKDD01.
[Freitas 98] Alex A. Freitas. On Objective Measures of Rule Surprisingness. PKDD98.
[Klemettinen et al. 94] Mika Klemettinen et al. Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM94.
[Sahar 99] Sigal Sahar. Interestingness Via What Is Not Interesting. KDD99.
[Shah 99] Devavrat Shah. Interestingness and Pruning of Mined Patterns. DMKD99.

17 Discussion

