1
Association Rule Mining
2
Mining Association Rules in Large Databases
Association rule mining
Algorithms: Apriori and FP-Growth
Max and closed patterns
Mining various kinds of association/correlation rules
3
Max-patterns & Closed-patterns
If there are frequent patterns with many items, enumerating all of them is costly.
We may be interested in finding only the "boundary" frequent patterns.
Two types …
4
Max-patterns
A frequent pattern {a_1, …, a_100} has $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ frequent sub-patterns!
Max-pattern: a frequent pattern that has no proper frequent super-pattern.
Example (Min_sup = 2):
Tid | Items
10 | A, B, C, D, E
20 | B, C, D, E
30 | A, C, D, F
BCDE and ACD are max-patterns.
BCD is not a max-pattern (its super-pattern BCDE is frequent).
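The max-pattern definition can be checked by brute force on the three-transaction example. The following is a minimal sketch, not an efficient miner: it enumerates every itemset, which is only feasible for tiny data. The names `DB`, `support`, `frequent_itemsets` and `max_patterns` are illustrative assumptions, not part of the slides.

```python
from itertools import combinations

# Transaction database from the slide (Tid -> items); Min_sup = 2.
DB = {10: set("ABCDE"), 20: set("BCDE"), 30: set("ACDF")}
MIN_SUP = 2

def support(itemset, db=DB):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if set(itemset) <= items)

def frequent_itemsets(db=DB, min_sup=MIN_SUP):
    """Enumerate all frequent itemsets by brute force (tiny data only)."""
    universe = sorted(set().union(*db.values()))
    return [
        frozenset(c)
        for k in range(1, len(universe) + 1)
        for c in combinations(universe, k)
        if support(c, db) >= min_sup
    ]

def max_patterns(db=DB, min_sup=MIN_SUP):
    """Frequent itemsets that have no proper frequent superset."""
    freq = frequent_itemsets(db, min_sup)
    return [p for p in freq if not any(p < q for q in freq)]

result = sorted("".join(sorted(p)) for p in max_patterns())
print(result)  # ['ACD', 'BCDE']
```

BCD is frequent (support 2) but is absent from the result because BCDE is also frequent.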
5
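MaxMiner: Mining Max-patterns
Idea: generate the complete set-enumeration tree one level at a time, pruning wherever applicable.
Set-enumeration tree over {A, B, C, D}; each node is written as head (tail):
Level 0: (ABCD)
Level 1: A (BCD), B (CD), C (D), D ()
Level 2: AB (CD), AC (D), AD (), BC (D), BD (), CD ()
Level 3: ABC (D), ABD (), ACD (), BCD ()
Level 4: ABCD ()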
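The level-by-level construction of the set-enumeration tree can be sketched as follows. This is an illustrative helper (the name `enumeration_levels` is an assumption) that only builds the unpruned tree. Each node is a (head, tail) pair, and the child for tail item i keeps only the tail items that come after i.

```python
def enumeration_levels(items):
    """Return the set-enumeration tree as a list of levels of (head, tail) pairs."""
    levels = [[("", items)]]  # root: empty head, all items in the tail
    while any(tail for _, tail in levels[-1]):
        next_level = []
        for head, tail in levels[-1]:
            for k, item in enumerate(tail):
                # Child head gains `item`; child tail keeps the later items only.
                next_level.append((head + item, tail[k + 1:]))
        levels.append(next_level)
    return levels

levels = enumeration_levels("ABCD")
for depth, nodes in enumerate(levels):
    print(depth, ", ".join(f"{h} ({t})" for h, t in nodes))
```

For four items the levels contain 1, 4, 6, 4 and 1 nodes, one node per subset of {A, B, C, D}.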
6
Local Pruning Techniques (e.g. at node A)
Check the frequency of ABCD and of AB, AC, AD.
If ABCD is frequent, prune the whole sub-tree.
If AC is NOT frequent, remove C from the tail (the part in parentheses) before expanding.
(Figure: the set-enumeration tree over {A, B, C, D} from slide 5.)
7
Algorithm MaxMiner
Initially, generate one node N = (ABCD), where h(N) = ∅ and t(N) = {A, B, C, D}.
When considering whether to expand N:
If h(N) ∪ t(N) is frequent, do not expand N.
If, for some i ∈ t(N), h(N) ∪ {i} is NOT frequent, remove i from t(N) before expanding N.
Apply global pruning techniques …
8
Global Pruning Technique (across sub-trees)
When a max pattern is identified (e.g. ABCD), prune all nodes N (e.g. B, C and D) where h(N) ∪ t(N) is a subset of it (e.g. of ABCD).
(Figure: the set-enumeration tree over {A, B, C, D} from slide 5.)
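The local and global pruning rules can be combined into a small breadth-first search. This is a simplified sketch in the spirit of MaxMiner, not the published algorithm: it recounts support by scanning the database, and it applies local tail pruning before the superset check, so a node may be resolved one level earlier than in a hand trace that checks the superset first. The names `max_miner`, `DB` and `support` are assumptions for illustration; the data is the three-transaction example with Min_sup = 2.

```python
from collections import deque

DB = [set("ABCDE"), set("BCDE"), set("ACDF")]
MIN_SUP = 2

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in DB if itemset <= t)

def max_miner(items, min_sup=MIN_SUP):
    """Breadth-first walk over the set-enumeration tree with pruning."""
    max_pats = []
    queue = deque([(frozenset(), list(items))])  # node = (h(N), t(N))
    while queue:
        head, tail = queue.popleft()
        # Local pruning: drop tail items i for which h(N) ∪ {i} is infrequent.
        tail = [i for i in tail if support(head | {i}) >= min_sup]
        full = head | frozenset(tail)
        # Global pruning: h(N) ∪ t(N) is covered by a known max pattern.
        if any(full <= m for m in max_pats):
            continue
        # If h(N) ∪ t(N) is frequent, record it and do not expand N.
        if full and support(full) >= min_sup:
            max_pats = [m for m in max_pats if not m < full] + [full]
            continue
        # Expand: the child for item i keeps only the tail items after i.
        for k, i in enumerate(tail):
            queue.append((head | {i}, tail[k + 1:]))
    return max_pats

found = sorted("".join(sorted(p)) for p in max_miner("ABCDEF"))
print(found)  # ['ACD', 'BCDE']
```

On this data only the root and the nodes A and B require a superset count; nodes C, D and E are removed by global pruning because their head-plus-tail sets are subsets of BCDE.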
9
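Example
Min_sup = 2
Tid | Items
10 | A, B, C, D, E
20 | B, C, D, E
30 | A, C, D, F
Root node: (ABCDEF)
Items | Frequency
ABCDEF | 0
A | 2
B | 2
C | 3
D | 3
E | 2
F | 1
F is infrequent, so it is removed from the tail. Child nodes: A (BCDE), B (CDE), C (DE), D (E), E ()
Max patterns: (none yet)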
10
Example: Node A
Same transactions and Min_sup = 2 as on slide 9.
Items | Frequency
ABCDE | 1
AB | 1
AC | 2
AD | 2
AE | 1
Tree: (ABCDEF) with children A (BCDE), B (CDE), C (DE), D (E), E (); node A expands to AC (D), AD ()
Max patterns: (none yet)
11
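Example: Node B
Same transactions and Min_sup = 2 as on slide 9.
Items | Frequency
BCDE | 2
BC, BD, BE | (no counts given)
BCDE is frequent, so node B is not expanded.
Tree: (ABCDEF) with children A (BCDE), B (CDE), C (DE), D (E), E (); node A expands to AC (D), AD ()
Max patterns: BCDE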
12
Example: Node AC
Same transactions and Min_sup = 2 as on slide 9.
Items | Frequency
ACD | 2
Tree: (ABCDEF) with children A (BCDE), B (CDE), C (DE), D (E), E (); node A expands to AC (D), AD (); node AC expands to ACD ()
Max patterns: BCDE, ACD
13
Frequent Closed Patterns
For a frequent itemset X, if there exists no item y (not in X) such that every transaction containing X also contains y, then X is a frequent closed pattern.
Example (Min_sup = 2):
TID | Items
10 | a, b, c
20 | a, b, c
30 | a, b, d
40 | a, b, d
50 | e, f
"ab" is a frequent closed pattern.
Closed patterns are a concise representation of frequent patterns.
They reduce the number of patterns and rules.
N. Pasquier et al., ICDT'99.
14
Max Pattern vs. Frequent Closed Pattern
max pattern => closed pattern: if itemset X is a max pattern, adding any item to it yields an itemset that is not frequent; thus there exists no item y such that every transaction containing X also contains y.
closed pattern does not imply max pattern: "ab" is a closed pattern, but not a max pattern.
Example (Min_sup = 2):
TID | Items
10 | a, b, c
20 | a, b, c
30 | a, b, d
40 | a, b, d
50 | e, f
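The relationship between the two pattern types can be checked on the five-transaction example. The sketch below is a brute-force illustration, not a mining algorithm. It uses the equivalent formulation that a frequent itemset is closed when no proper superset has the same support, and maximal when no proper superset is frequent. The names `DB`, `closed`, `maximal` and `show` are assumptions.

```python
from itertools import combinations

DB = [set("abc"), set("abc"), set("abd"), set("abd"), set("ef")]
MIN_SUP = 2

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in DB if itemset <= t)

items = sorted(set().union(*DB))
# All frequent itemsets with their supports (brute force, tiny data only).
frequent = {
    frozenset(c): support(frozenset(c))
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(frozenset(c)) >= MIN_SUP
}

# Closed: no proper superset has the same support.
closed = {p for p, s in frequent.items()
          if not any(p < q and frequent[q] == s for q in frequent)}
# Max: no proper superset is frequent at all.
maximal = {p for p in frequent if not any(p < q for q in frequent)}

def show(patterns):
    return sorted("".join(sorted(p)) for p in patterns)

print(show(closed))   # ['ab', 'abc', 'abd']
print(show(maximal))  # ['abc', 'abd']
```

Every max pattern appears among the closed patterns, while "ab" (support 4) is closed without being maximal.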
15
Mining Frequent Closed Patterns: CLOSET
Example (Min_sup = 2):
TID | Items
10 | a, c, d, e, f
20 | a, b, e
30 | c, e, f
40 | a, c, d, f
50 | c, e, f
Flist: the list of all frequent items in support-ascending order. Here, Flist: d-a-f-e-c.
Divide the search space: patterns having d; patterns having a but not d; etc.
Find frequent closed patterns recursively: among the transactions having d, cfa is frequent closed => cfad is a frequent closed pattern.
J. Pei, J. Han & R. Mao. "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", DMKD'00.
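The first two CLOSET steps on this example, building the Flist and examining the transactions that contain d, can be sketched as follows. This is only an illustration of those two steps, not the CLOSET algorithm itself. The tie-breaking rule among items of equal support (reverse alphabetical) is an assumption chosen purely so that the output matches the ordering d-a-f-e-c shown on the slide.

```python
from collections import Counter

DB = {10: "acdef", 20: "abe", 30: "cef", 40: "acdf", 50: "cef"}
MIN_SUP = 2

# Count item supports and keep the frequent items.
counts = Counter(item for items in DB.values() for item in items)
frequent = {i: c for i, c in counts.items() if c >= MIN_SUP}

# Flist: support-ascending order. Tie-break (assumed): reverse alphabetical.
flist = sorted(frequent, key=lambda i: (frequent[i], -ord(i)))
print("-".join(flist))  # d-a-f-e-c

# d-conditional database: transactions containing d, with d removed.
d_transactions = [set(items) - {"d"} for items in DB.values() if "d" in items]
# Items present in every d-transaction always co-occur with d.
common = set.intersection(*d_transactions)
closed_with_d = "".join(sorted(common | {"d"}))
print(closed_with_d, len(d_transactions))  # acdf 2
```

The itemset {a, c, d, f} is the slide's "cfad" with support 2; item b is dropped from the Flist because its support is 1.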
16
Multiple-Level Association Rules
Items often form a hierarchy.
Items at lower levels are expected to have lower support.
Rules regarding itemsets at appropriate levels can be quite useful.
A transactional database can be encoded based on dimensions and levels.
We can explore shared multi-level mining.
(Figure: concept hierarchy with root Food; children bread and milk; milk types skim and 2% fat; bread types white and wheat; brand-level nodes such as Garelick and Wonder ....)
17
Mining Multi-Level Associations
A top-down, progressive deepening approach:
First find high-level strong rules: milk => bread [20%, 60%].
Then find their lower-level "weaker" rules: 2% fat milk => wheat bread [6%, 50%].
Variations in mining multiple-level association rules:
Level-crossed association rules: skim milk => Wonder wheat bread
Association rules with multiple, alternative hierarchies: full fat milk => Wonder bread
18
Multi-level Association: Uniform Support vs. Reduced Support
Uniform support: the same minimum support for all levels.
+ Only one minimum support threshold is needed. There is no need to examine itemsets containing any item whose ancestors do not have minimum support.
– Lower-level items do not occur as frequently. If the support threshold is too high, low-level associations are missed; if it is too low, too many high-level associations are generated.
19
Multi-level Association: Uniform Support vs. Reduced Support
Reduced support: reduced minimum support at lower levels.
There are 4 search strategies:
Level-by-level independent: independent search at all levels (no misses).
Level-cross filtering by k-itemset: prune a k-pattern if the corresponding k-pattern at the upper level is infrequent.
Level-cross filtering by single item: prune an item if its parent node is infrequent.
Controlled level-cross filtering by single item: consider "subfrequent" items that pass a passage threshold.
20
Uniform Support
Multi-level mining with uniform support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 5%): Full Fat Milk [support = 6%], Skim Milk [support = 4%] X (below min_sup)
21
Reduced Support
Multi-level mining with reduced support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 3%): Full Fat Milk [support = 6%], Skim Milk [support = 4%]
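The difference between the two threshold schemes on the milk example can be sketched in a few lines. The dictionaries `SUPPORT` and `LEVEL` and the helper `frequent_items` are illustrative assumptions that simply encode the numbers from the two slides.

```python
# Supports and hierarchy levels taken from the milk example.
SUPPORT = {"milk": 0.10, "full fat milk": 0.06, "skim milk": 0.04}
LEVEL = {"milk": 1, "full fat milk": 2, "skim milk": 2}

def frequent_items(min_sup_by_level):
    """Items whose support meets the threshold of their own level."""
    return sorted(
        item for item, sup in SUPPORT.items()
        if sup >= min_sup_by_level[LEVEL[item]]
    )

uniform = frequent_items({1: 0.05, 2: 0.05})  # same threshold on both levels
reduced = frequent_items({1: 0.05, 2: 0.03})  # lower threshold on level 2

print(uniform)  # ['full fat milk', 'milk']
print(reduced)  # ['full fat milk', 'milk', 'skim milk']
```

With a uniform 5% threshold skim milk (4%) is lost; lowering the level-2 threshold to 3% keeps it.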
22
Interestingness Measurements
Objective measures. Two popular measurements: (1) support; and (2) confidence.
Subjective measures. A rule (pattern) is interesting if (1) it is unexpected (surprising to the user); and/or (2) it is actionable (the user can do something with it).
23
Criticism of Support and Confidence
Example 1: among 5000 students,
3000 play basketball,
3750 eat cereal,
2000 both play basketball and eat cereal.
play basketball => eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
play basketball => not eat cereal [20%, 33.3%] is far more accurate, although it has lower support and confidence.
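The support and confidence figures in Example 1 follow directly from the four counts. The sketch below recomputes them and compares the rule's confidence with the baseline share of cereal eaters; the variable and function names are assumptions for illustration.

```python
N = 5000
n_basketball, n_cereal, n_both = 3000, 3750, 2000

def support_confidence(n_antecedent, n_joint, n_total=N):
    """Support = P(A and B); confidence = P(B | A)."""
    return n_joint / n_total, n_joint / n_antecedent

# play basketball => eat cereal
sup_c, conf_c = support_confidence(n_basketball, n_both)
# play basketball => not eat cereal
sup_nc, conf_nc = support_confidence(n_basketball, n_basketball - n_both)
p_cereal = n_cereal / N

print(f"{sup_c:.1%} {conf_c:.1%}")    # 40.0% 66.7%
print(f"{sup_nc:.1%} {conf_nc:.1%}")  # 20.0% 33.3%
print(f"{p_cereal:.1%}")              # 75.0%
```

The confidence of 66.7% is below the 75% baseline, so playing basketball actually makes eating cereal less likely than it is for a randomly chosen student.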
24
Criticism of Support and Confidence (Cont.)
Example 2: X and Y are positively correlated, and X and Z are negatively related, yet the support and confidence of X => Z dominate.
We need a measure of dependent or correlated events.
P(B|A)/P(B) is also called the lift of the rule A => B.
25
Other Interestingness Measures: Interest
Interest (correlation, lift): $\frac{P(A \wedge B)}{P(A)\,P(B)}$, taking both P(A) and P(B) into consideration.
$P(A \wedge B) = P(A)\,P(B)$ if A and B are independent events, so the value is then 1.
A and B are negatively correlated if the value is less than 1, and positively correlated if it is greater than 1.
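As a small worked illustration, the interest measure P(A and B) / (P(A) P(B)) can be applied to the basketball and cereal counts from Example 1 (5000 students, 3000 play basketball, 3750 eat cereal, 2000 do both). The function name `interest` is an assumption.

```python
def interest(p_ab, p_a, p_b):
    """Interest (lift): P(A and B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

N = 5000
p_basketball = 3000 / N
p_cereal = 3750 / N
p_both = 2000 / N
p_basketball_no_cereal = (3000 - 2000) / N

# basketball with cereal, then basketball with "not cereal"
lift_cereal = interest(p_both, p_basketball, p_cereal)
lift_no_cereal = interest(p_basketball_no_cereal, p_basketball, 1 - p_cereal)

print(round(lift_cereal, 3))     # 0.889
print(round(lift_no_cereal, 3))  # 1.333
```

A value below 1 for basketball and cereal indicates negative correlation, and a value above 1 for basketball and "not cereal" indicates positive correlation, which matches the criticism made in Example 1.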