Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.

Association Rule Mining

Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns  Mining various kinds of association/correlation rules

Max-patterns & Close-patterns  If there are frequent patterns with many items, enumerating all of them is costly.  We may be interested in finding the ‘ boundary ’ frequent patterns.  Two types …

Max-patterns  Frequent pattern {a 1, …, a 100 }  ( 100 1 ) + ( 100 2 ) + … + ( 1 1 0 0 0 0 ) = 2 100 -1 = 1.27*10 30 frequent sub-patterns!  Max-pattern: frequent patterns without proper frequent super pattern BCDE, ACD are max-patterns BCD is not a max-pattern TidItems 10A,B,C,D,E 20B,C,D,E, 30A,C,D,F Min_sup=2

MaxMiner: Mining Max-patterns  Idea: generate the complete set- enumeration tree one level at a time, while prune if applicable.  (ABCD) A (BCD) B (CD) C (D)D () AB (CD)AC (D)AD () BC (D)BD () CD ()ABC (C) ABCD () ABD ()ACD ()BCD ()

Local Pruning Techniques (e.g. at node A) Check the frequency of ABCD and AB, AC, AD.  If ABCD is frequent, prune the whole sub-tree.  If AC is NOT frequent, remove C from the parenthesis before expanding.  (ABCD) A (BCD) B (CD) C (D)D () AB (CD)AC (D)AD () BC (D)BD () CD ()ABC (C) ABCD () ABD ()ACD ()BCD ()

Algorithm MaxMiner  Initially, generate one node N=, where h(N)= and t(N)={A,B,C,D}.  Consider expanding N, If h(N)t(N) is frequent, do not expand N. If for some it(N), h(N){i} is NOT frequent, remove i from t(N) before expanding N.  Apply global pruning techniques …  (ABCD)

Global Pruning Technique (across sub-trees)  When a max pattern is identified (e.g. ABCD), prune all nodes (e.g. B, C and D) where h(N)t(N) is a sub-set of it (e.g. ABCD).  (ABCD) A (BCD) B (CD) C (D)D () AB (CD)AC (D)AD () BC (D)BD () CD ()ABC (C) ABCD () ABD ()ACD ()BCD ()

Example TidItems 10A,B,C,D,E 20B,C,D,E, 30A,C,D,F  (ABCDEF) ItemsFrequency ABCDEF0 A2 B2 C3 D3 E2 F1 Min_sup=2 Max patterns: A (BCDE) B (CDE)C (DE)E ()D (E)

Example TidItems 10A,B,C,D,E 20B,C,D,E, 30A,C,D,F  (ABCDEF) ItemsFrequency ABCDE1 AB1 AC2 AD2 AE1 Min_sup=2 A (BCDE) B (CDE)C (DE)E ()D (E) AC (D)AD () Max patterns: Node A

Example TidItems 10A,B,C,D,E 20B,C,D,E, 30A,C,D,F  (ABCDEF) ItemsFrequency BCDE2 BC BD BE Min_sup=2 A (BCDE) B (CDE)C (DE)E ()D (E) AC (D)AD () Max patterns: BCDE Node B

Example TidItems 10A,B,C,D,E 20B,C,D,E, 30A,C,D,F  (ABCDEF) ItemsFrequency ACD2 Min_sup=2 A (BCDE) B (CDE)C (DE)E ()D (E) AC (D)AD () Max patterns: BCDE ACD () ACD Node AC

Frequent Closed Patterns  For frequent itemset X, if there exists no item y s.t. every transaction containing X also contains y, then X is a frequent closed pattern “ ab ” is a frequent closed pattern  Concise rep. of freq pats  Reduce # of patterns and rules  N. Pasquier et al. In ICDT ’ 99 TIDItems 10a, b, c 20a, b, c 30a, b, d 40a, b, d 50e, f Min_sup=2

Max Pattern vs. Frequent Closed Pattern  max pattern  closed pattern if itemset X is a max pattern, adding any item to it would not be a frequent pattern; thus there exists no item y s.t. every transaction containing X also contains y.  closed pattern  max pattern “ ab ” is a closed pattern, but not max TIDItems 10a, b, c 20a, b, c 30a, b, d 40a, b, d 50e, f Min_sup=2

Mining Frequent Closed Patterns: CLOSET  Flist: list of all frequent items in support ascending order Flist: d-a-f-e-c  Divide search space Patterns having d Patterns having a but not d, etc.  Find frequent closed pattern recursively Among the transactions having d, cfa is frequent closed  cfad is a frequent closed pattern  J. Pei, J. Han & R. Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", DMKD'00. TIDItems 10a, c, d, e, f 20a, b, e 30c, e, f 40a, c, d, f 50c, e, f Min_sup=2

Multiple-Level Association Rules  Items often form hierarchy.  Items at the lower level are expected to have lower support.  Rules regarding itemsets at appropriate levels could be quite useful.  A transactional database can be encoded based on dimensions and levels  We can explore shared multi- level mining Food bread milk skim Garelick 2% fatwhite wheat Wonder....

Mining Multi-Level Associations  A top_down, progressive deepening approach: First find high-level strong rules: milk  bread [20%, 60%]. Then find their lower-level “weaker” rules: 2% fat milk  wheat bread [6%, 50%].  Variations at mining multiple-level association rules. Level-crossed association rules: skim milk  Wonder wheat bread Association rules with multiple, alternative hierarchies: full fat milk  Wonder bread

Multi-level Association: Uniform Support vs. Reduced Support  Uniform Support: the same minimum support for all levels + One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support. – Lower level items do not occur as frequently. If support threshold  too high  miss low level associations  too low  generate too many high level associations

Multi-level Association: Uniform Support vs. Reduced Support  Reduced Support: reduced minimum support at lower levels There are 4 search strategies:  Level-by-level independent  Independent search at all levels (no misses)  Level-cross filtering by k-itemset  Prune a k-pattern if the corresponding k-pattern at the upper level is infrequent  Level-cross filtering by single item  Prune an item if its parent node is infrequent  Controlled level-cross filtering by single item  Consider ‘subfrequent’ items that pass a passage threshold

Uniform Support Multi-level mining with uniform support Milk [support = 10%] full fat Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 5% X

Reduced Support Multi-level mining with reduced support full fat Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 3% Milk [support = 10%]

Interestingness Measurements  Objective measures Two popular measurements: ¶ support; and · confidence  Subjective measures A rule (pattern) is interesting if ¶ it is unexpected (surprising to the user); and/or · actionable (the user can do something with it)

Criticism to Support and Confidence  Example 1: Among 5000 students  3000 play basketball  3750 eat cereal  2000 both play basket ball and eat cereal play basketball  eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75% which is higher than 66.7%. play basketball  not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence

Criticism to Support and Confidence (Cont.)  Example 2: X and Y: positively correlated, X and Z, negatively related support and confidence of X=>Z dominates  We need a measure of dependent or correlated events  P(B|A)/P(B) is also called the lift of rule A => B

Other Interestingness Measures: Interest  Interest (correlation, lift) taking both P(A) and P(B) in consideration P(AB)=P(B)*P(A), if A and B are independent events A and B negatively correlated, if the value is less than 1; otherwise A and B positively correlated

Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.

Similar presentations

Presentation on theme: "Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.

Similar presentations

Presentation on theme: "Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns."— Presentation transcript:

Similar presentations

About project

Feedback