1 Mining Association Rules with Constraints Wei Ning Joon Wong COSC 6412 Presentation
2 Outline Introduction Summary of Approach Algorithm CAP Performance Analysis Conclusion References
4 Introduction Recall association rule mining: it finds interesting associations or correlations among a large set of data items.
5 Some problems we met while mining association rules: Overwhelming results? Not what you want? Waiting too long? In short: lack of focus.
6 Introduction (cont.) Example at Walmart: suppose a manager wants to find which shoes are the most popular in winter.
8 Mining frequent itemsets vs. mining association rules: the two are almost the same problem, because once the frequent itemsets are known, the rules can be generated from them directly.
9 Constrained Mining A naive solution: first find all frequent sets, then test them for constraint satisfaction. Our approach: analyze the properties of constraints comprehensively and push them as deeply as possible inside the frequent pattern computation.
10 Frequent Itemsets & Constraints
Given a transaction database TDB (min_sup = 2):
TID | Transaction
10 | a, b, c
20 | b, c, d, f
30 | a, c
Item values:
Item | Value
a | 40
b | 10
c | -20
d | 10
e | -30
Frequent itemset: a set of items that frequently appear together in transactions, e.g. {a, c}.
Constraint: a predicate over itemsets, e.g. C(I): sum(I) > 50, so C(abd) = true.
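A minimal sketch of the two notions on this slide, assuming the database and item values are stored in plain dictionaries. The names `TDB`, `VALUE`, `support`, and `constraint_sum_gt_50` are illustrative, not part of the presentation.

```python
# Transaction database and item values copied from the slide.
TDB = {10: {"a", "b", "c"}, 20: {"b", "c", "d", "f"}, 30: {"a", "c"}}
VALUE = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}
MIN_SUP = 2


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in TDB.values() if set(itemset) <= items)


def is_frequent(itemset):
    """An itemset is frequent when its support reaches min_sup."""
    return support(itemset) >= MIN_SUP


def constraint_sum_gt_50(itemset):
    """The constraint C(I): sum(I) > 50, evaluated on the item values."""
    return sum(VALUE[i] for i in itemset) > 50


print(support({"a", "c"}), is_frequent({"a", "c"}))
print(constraint_sum_gt_50({"a", "b", "d"}))
```

Note that {a, b, d} satisfies the constraint but appears in no transaction, so a constrained miner has to check both conditions.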
11 Mining Frequent Itemsets With Constraints Given: a transaction database TDB, a support threshold min_sup, and a constraint C. Find: the complete set of frequent itemsets satisfying the constraint. Use the constraint to express the user's focus and to improve both effectiveness and efficiency.
12 Classification of Constraints We have the following classification of constraints: anti-monotone; monotone; succinct; convertible (convertible anti-monotone, convertible monotone, strongly convertible); inconvertible.
13 Anti-Monotone Definition 1 (Anti-Monotone): A 1-var constraint C is anti-monotone if, for all sets S, S′: S ⊇ S′ and S satisfies C ⇒ S′ satisfies C. Put simply, when an itemset S violates the constraint, so does every superset of S.
14 Is min(S) ≥ v anti-monotone? Take S = {5, 10, 14} and v = 7, so the constraint is min(S) ≥ 7. {5} violates it. The supersets of {5} within S are {5, 10}, {5, 14}, and {5, 10, 14}, and each of them violates it too. So min(S) ≥ v is anti-monotone.
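The example can be replayed by brute force. This is only a sketch for the slide's numbers, assuming the constraint is min(S) ≥ v; the helper names are illustrative.

```python
from itertools import combinations


def satisfies(itemset, v):
    """The constraint min(S) >= v."""
    return min(itemset) >= v


def supersets_within(subset, universe):
    """Yield every superset of `subset` that stays inside `universe`."""
    rest = sorted(set(universe) - set(subset))
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            yield set(subset) | set(extra)


S = {5, 10, 14}
v = 7

all_supersets = list(supersets_within({5}, S))
violating = [sup for sup in all_supersets if not satisfies(sup, v)]

# {5} violates the constraint, and so does every superset of {5}.
print(len(all_supersets), len(violating))
```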
15 Succinct Definition 2 (Succinct): (1) I ⊆ Item is a succinct set if it can be expressed as σ_p(Item) for some selection predicate p. (2) SP ⊆ 2^Item is a succinct powerset if there is a fixed number of succinct sets Item1, …, Itemk ⊆ Item such that SP can be expressed in terms of the strict powersets of Item1, …, Itemk, using union and minus. (3) Finally, a 1-var constraint C is succinct provided SAT_C(Item) is a succinct powerset.
16 Succinct General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint. If a constraint is succinct, we can directly generate precisely the sets that satisfy it.
17 Succinct examples: itemsets containing a or b; itemsets containing some item with value more than 30.
18 Succinct example C1: Item.Price ≥ 100. Let Item1 = σ_{Item.Price ≥ 100}(Item) = {a, b}. Then 2^Item1 = { {a}, {b}, {a, b} } and SAT_C1 = { {a}, {b}, {a, b} }, so SAT_C1 = 2^Item1 and C1 is succinct.
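A small sketch of why succinctness helps: the satisfying sets can be generated directly instead of being tested one by one. It assumes the comparison in C1 is Price ≥ 100 applied to every item of the set, and the price table is hypothetical (chosen so that only a and b are selected).

```python
from itertools import combinations

# Hypothetical prices; only a and b reach 100.
PRICE = {"a": 120, "b": 100, "c": 30, "d": 80}


def strict_powerset(items):
    """All non-empty subsets of `items`."""
    items = sorted(items)
    return [
        frozenset(combo)
        for r in range(1, len(items) + 1)
        for combo in combinations(items, r)
    ]


def c1(itemset):
    """Assumed reading of C1: every item in the set has Price >= 100."""
    return all(PRICE[i] >= 100 for i in itemset)


# Selection step: Item1 is the set of items picked by the predicate.
item1 = {i for i, p in PRICE.items() if p >= 100}

# Generate the satisfying sets directly from Item1 ...
generated = set(strict_powerset(item1))
# ... and compare with testing every non-empty itemset.
brute_force = {s for s in strict_powerset(PRICE) if c1(s)}

print(generated == brute_force)
```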
19 Convertible Convert tough constraints into anti-monotone or monotone ones by ordering the items properly.
20 Convertible Definition: let R be an order of items. A constraint is convertible anti-monotone if, whenever an itemset X satisfies the constraint, so does every prefix of X w.r.t. R.
21 Convertible example
Constraint C: avg(X) ≥ 25
Item | Value
a | 40
b | 0
c | -20
d | 10
e | -30
f | 30
g | 20
h | -10
Order the items in value-descending order: a (40), f (30), g (20), d (10), b (0), h (-10), c (-20), e (-30).
Itemset afd satisfies C (avg ≈ 26.7). So do its prefixes a (avg 40) and af (avg 35).
Thus, under this order, C becomes anti-monotone!
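A sketch of the prefix property on the slide's values, assuming the constraint is avg(X) ≥ 25. The helper names are illustrative. The last check shows why the order matters: a plain subset such as {d} can fail even though afd satisfies C.

```python
from itertools import combinations

VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

# Order R: items sorted by descending value.
ORDER = sorted(VALUE, key=lambda item: -VALUE[item])


def avg_at_least(itemset, threshold=25):
    """The constraint avg(X) >= threshold."""
    values = [VALUE[i] for i in itemset]
    return sum(values) / len(values) >= threshold


def prefixes(itemset):
    """Prefixes of `itemset` once its items are listed in the order R."""
    ordered = sorted(itemset, key=ORDER.index)
    return [ordered[:k] for k in range(1, len(ordered) + 1)]


afd_prefixes = prefixes({"a", "f", "d"})

# Exhaustive check: whenever X satisfies C, every prefix of X does too.
prefix_property_holds = all(
    avg_at_least(p)
    for r in range(1, len(VALUE) + 1)
    for combo in combinations(VALUE, r)
    if avg_at_least(combo)
    for p in prefixes(combo)
)

print(ORDER, afd_prefixes, prefix_property_holds)
```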
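22 Commonly Used Constraints: A General Picture
Constraint | Antimonotone | Monotone | Succinct
v ∈ S | no | yes | yes
S ⊇ V | no | yes | yes
S ⊆ V | yes | no | yes
min(S) ≤ v | no | yes | yes
min(S) ≥ v | yes | no | yes
max(S) ≤ v | yes | no | yes
max(S) ≥ v | no | yes | yes
count(S) ≤ v | yes | no | weakly
count(S) ≥ v | no | yes | weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0) | yes | no | no
sum(S) ≥ v (∀a ∈ S, a ≥ 0) | no | yes | no
range(S) ≤ v | yes | no | no
range(S) ≥ v | no | yes | no
avg(S) θ v, θ ∈ {=, ≤, ≥} | convertible | convertible | no
support(S) ≥ ξ | yes | no | no
support(S) ≤ ξ | no | yes | no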
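A few rows of the table can be sanity-checked by brute force. This is a sketch with assumptions: the universe of non-negative item values and the threshold are hypothetical, and the constraints are the ≤ / ≥ forms of min, max, and sum. A check on one small universe can refute a property but cannot prove it in general.

```python
from itertools import combinations

UNIVERSE = [1, 3, 5, 8]  # hypothetical non-negative item values
V = 4                    # hypothetical threshold


def nonempty_subsets(items):
    """All non-empty subsets of `items` as frozensets."""
    return [
        frozenset(combo)
        for r in range(1, len(items) + 1)
        for combo in combinations(items, r)
    ]


def is_anti_monotone(constraint, sets):
    """If S violates the constraint, every superset of S violates it."""
    return all(
        not constraint(t)
        for s in sets if not constraint(s)
        for t in sets if s <= t
    )


def is_monotone(constraint, sets):
    """If S satisfies the constraint, every superset of S satisfies it."""
    return all(
        constraint(t)
        for s in sets if constraint(s)
        for t in sets if s <= t
    )


SETS = nonempty_subsets(UNIVERSE)
ROWS = {
    "min(S) <= v": lambda s: min(s) <= V,
    "min(S) >= v": lambda s: min(s) >= V,
    "max(S) <= v": lambda s: max(s) <= V,
    "max(S) >= v": lambda s: max(s) >= V,
    "sum(S) <= v": lambda s: sum(s) <= V,
    "sum(S) >= v": lambda s: sum(s) >= V,
}

# Each entry is (anti-monotone?, monotone?) on this universe.
result = {
    name: (is_anti_monotone(c, SETS), is_monotone(c, SETS))
    for name, c in ROWS.items()
}
for name, flags in result.items():
    print(name, flags)
```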
23 Optional: proof that min(S) ≥ v is anti-monotone. According to the table, min(S) ≥ v is both anti-monotone and succinct. I only prove anti-monotonicity here due to time limitations. Something special…
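One way the argument can go, written out as a short sketch (the slide itself does not spell out the steps, so this is a reconstruction under the reading min(S) ≥ v):

```latex
\textbf{Claim.} The constraint $C(S) \equiv \min(S) \ge v$ is anti-monotone.

\textbf{Proof sketch.} Let $S \subseteq S'$ and suppose $S$ violates $C$,
that is, $\min(S) < v$. Every element of $S$ also belongs to $S'$, so
\[
  \min(S') \le \min(S) < v ,
\]
and therefore $S'$ violates $C$ as well. Equivalently, if $S'$ satisfies
$C$, then so does every non-empty subset $S \subseteq S'$. \qed
```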
24 Constraint Classification (diagram): antimonotone, monotone, succinct, convertible anti-monotone, convertible monotone, strongly convertible, inconvertible.
25 Summary of Approach: Recapitulation. We covered the basic idea of mining frequent itemsets with constraints and introduced several important classes of constraints.
27 Algorithms There are many algorithms for constraint-based association rule mining: Algorithm Direct; Algorithm MultiJoins & Reorder; Algorithm Apriori†; Algorithm Hybrid(m); Algorithm CAP (main focus).
28 Design of Algorithm Sound: an algorithm is sound provided it only finds frequent sets that satisfy the given constraints. Complete: an algorithm is complete provided all frequent sets satisfying the given constraints are found.
29 Algorithm Apriori† Main idea: use the Apriori algorithm to get the frequent itemsets, then apply the constraints to the itemsets found. Step 1) Run Apriori with C_freq (the frequency constraint). Step 2) Apply the remaining constraints, C – C_freq, to get the final Ans.
30 Algorithm Apriori† (Pseudocode)
1. C1 consists of sets of size 1; k = 1; Ans = ∅;
2. while (Ck not empty) {
   2.1 conduct db scan to form Lk from Ck;
   2.2 form Ck+1 from Lk based on C_freq; k++; }
3. for each set S in some Lk: add S to Ans if S satisfies (C – C_freq).
The Apriori† Algorithm: An Example
Database TDB:
Tid | Items
10 | A, C, D
20 | B, C, E
30 | A, B, C, E
40 | B, E
1st scan, C1 with support: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
L1: {A} 2, {B} 3, {C} 3, {E} 3
C2: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan, C2 with support: {A, B} 1, {A, C} 2, {A, E} 1, {B, C} 2, {B, E} 3, {C, E} 2
L2: {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2
C3: {B, C, E}
3rd scan, L3: {B, C, E} 2
The Apriori† Algorithm: An Example (cont.)
L1: {A} 2, {B} 3, {C} 3, {E} 3
L2: {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2
L3: {B, C, E} 2
Constraint: {A, C, E} ⊇ T.Item
Ans: {A}, {C}, {E}, {A, C}, {C, E}
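The trace above can be reproduced with a short sketch of Apriori†. Assumptions: candidate generation is done by brute force over the items of Lk (keeping a (k+1)-set only when all its k-subsets are frequent), which yields the same candidates as the usual join-and-prune step on this small example; the constraint is read as "the itemset is a subset of {A, C, E}"; all function names are illustrative.

```python
from itertools import combinations

TDB = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_SUP = 2


def count_support(candidates, tdb):
    """One database scan: support count of every candidate."""
    return {c: sum(1 for t in tdb.values() if c <= t) for c in candidates}


def gen_candidates(frequent_k, k):
    """(k+1)-sets all of whose k-subsets are frequent."""
    items = sorted(set().union(*frequent_k))
    return {
        frozenset(combo)
        for combo in combinations(items, k + 1)
        if all(frozenset(sub) in frequent_k for sub in combinations(combo, k))
    }


def apriori_dagger(tdb, min_sup, constraint):
    """Mine all frequent sets first, then filter by the constraint."""
    candidates = {frozenset([i]) for t in tdb.values() for i in t}
    levels, k = [], 1
    while candidates:
        counts = count_support(candidates, tdb)
        frequent = {c for c, n in counts.items() if n >= min_sup}
        levels.append(frequent)
        candidates = gen_candidates(frequent, k)
        k += 1
    answer = {s for level in levels for s in level if constraint(s)}
    return levels, answer


levels, ans = apriori_dagger(TDB, MIN_SUP, lambda s: s <= {"A", "C", "E"})
print([len(level) for level in levels])
print(sorted("".join(sorted(s)) for s in ans))
```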
33 Algorithm CAP Case 1: succinct and anti-monotone. Strategy I: replace C1 in the Apriori algorithm by C1^C (the size-1 candidates that satisfy C). Case 2: anti-monotone but non-succinct. Strategy II: define Ck as in the Apriori algorithm, but drop a set S ∈ Ck from counting if S fails C, i.e., constraint satisfaction is tested before counting is done.
34 Algorithm CAP (cont.) Case 3: succinct but non-anti-monotone. Strategy III: too complicated; to be discussed later… Case 4: non-succinct and non-anti-monotone. Strategy IV: induce any weaker constraint C1 from C. Depending on whether C1 is anti-monotone and/or succinct, use one of Strategies I-III above for the generation of frequent sets.
35 Algorithm CAP (Pseudocode)
Notation: C_sam = succinct and anti-monotone constraints; C_am = anti-monotone but non-succinct; C_suc = succinct but non-anti-monotone; C_none = neither.
1. if C_sam ∪ C_suc ∪ C_none is non-empty, prepare C1 as indicated in Strategies I, III, and IV; k = 1;
2. if C_suc is non-empty {
   2.1 conduct db scan to form L1 as indicated in Strategy III;
   2.2 form C2 as indicated in Strategy III; k = 2; }
3. while (Ck not empty) {
   3.1 conduct db scan to form Lk from Ck;
   3.2 form Ck+1 from Lk based on Strategy III if C_suc is non-empty, and Strategy II for constraints in C_am; }
4. if C_none is empty, Ans = ∪ Lk. Otherwise, for each set S in some Lk, add S to Ans iff S satisfies C_none.
The Algorithm CAP: An Example
Database TDB:
Tid | Items
10 | A, C, D
20 | B, C, E
30 | A, B, C, E
40 | B, E
Constraints: {A, C, E} ⊇ T.Item and min support count = 2
Question: which strategy should we apply?
The Algorithm CAP: An Example (Cont.)
Apply Strategy I!!!
C1: {A}, {C}, {E}
1st scan, C1 with support: {A} 2, {C} 3, {E} 3
L1: {A} 2, {C} 3, {E} 3
C2: {A, C}, {A, E}, {C, E}
2nd scan, C2 with support: {A, C} 2, {A, E} 1, {C, E} 2
L2: {A, C} 2, {C, E} 2
C3: {} (because {A, E} is pruned earlier)
Ans: {A}, {C}, {E}, {A, C}, {C, E}
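A sketch of Strategy I on the same database, under the assumption that the constraint means "the itemset is a subset of {A, C, E}", so the constraint can be pushed into C1 by keeping only the allowed items. Candidate generation is again a brute-force stand-in for the Apriori join and prune, and the names are illustrative.

```python
from itertools import combinations

TDB = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_SUP = 2
ALLOWED = {"A", "C", "E"}


def cap_strategy_one(tdb, min_sup, allowed):
    """Apriori whose C1 is restricted to the items satisfying the constraint."""
    # Strategy I: constraint checking happens once, when C1 is built.
    candidates = {frozenset([i]) for t in tdb.values() for i in t if i in allowed}
    answer, counted, k = set(), 0, 1
    while candidates:
        counted += len(candidates)  # candidates whose support is counted
        frequent = {
            c for c in candidates
            if sum(1 for t in tdb.values() if c <= t) >= min_sup
        }
        answer |= frequent
        items = sorted(set().union(*frequent))
        candidates = {
            frozenset(combo)
            for combo in combinations(items, k + 1)
            if all(frozenset(sub) in frequent for sub in combinations(combo, k))
        }
        k += 1
    return answer, counted


ans, counted = cap_strategy_one(TDB, MIN_SUP, ALLOWED)
print(sorted("".join(sorted(s)) for s in ans), counted)
```

In this run only 6 candidates are counted (3 in C1 and 3 in C2), whereas the Apriori† trace counts 12 (5 + 6 + 1) before filtering, and both arrive at the same Ans.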
38 Case 3: Succinct but not anti-monotone. Revisit…
Items: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10}
Constraint: min(S) < 5
Size-1 sets satisfying the constraint: {1} {2} {3} {4}
Apriori then generates only: {1,2} {2,3} … {3,4} … {1,2,3,4}
Some possible frequent sets may be lost, e.g. {1,8}, {1,2,10}.
**Information extracted from past presentation.
39 Case 3: Succinct but not anti-monotone. Continue… Algorithm Direct. Idea: play it safe. Generate C^c_{k+1} from L^c_k × F, where F is the set of all frequent items. Algorithm MultiJoins. Algorithm Reorder.
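A rough sketch of the idea behind Algorithm Direct, not the algorithm as published. Assumptions: items 1 to 10 are all frequent, every generated set is treated as frequent (no support counting is shown), and the constraint is min(S) < 5. It contrasts pairing only the constrained size-1 sets with extending each constrained set by any frequent item.

```python
F = set(range(1, 11))  # assumption: all ten items are frequent


def constraint(itemset):
    """The constraint min(S) < 5."""
    return min(itemset) < 5


# Constrained level 1: frequent items that satisfy the constraint.
level1 = {frozenset([i]) for i in F if constraint({i})}


def naive_pairs(level):
    """Pairs built only from items that already appear in the constrained level."""
    items = sorted(set().union(*level))
    return {frozenset([a, b]) for a in items for b in items if a < b}


def direct_extend(level, frequent_items):
    """L^c_k x F: extend every constrained set by any other frequent item."""
    return {s | {i} for s in level for i in frequent_items if i not in s}


naive2 = naive_pairs(level1)
direct2 = direct_extend(level1, F)
direct3 = direct_extend(direct2, F)

print(frozenset({1, 8}) in naive2, frozenset({1, 8}) in direct2)
print(frozenset({1, 2, 10}) in direct3)
```

The price of playing it safe is a larger candidate set: 30 pairs here instead of 6.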
41 Performance Analysis (Specification) Programs written in C. Transactional databases generated using the program from IBM Almaden Research Center. 100,000 records, domain of 1,000 items. Page size 4KB. SPARC-10 environment.
42 Performance Analysis (Terminology) Speedup: a comparison (ratio) of the execution times of two algorithms. Item selectivity: x% of the items satisfy the constraints. Support threshold: a low support threshold means more frequent sets to process.
43 Performance Analysis Note: Support threshold set at 0.5%. For 10% selectivity, CAP runs 80 times faster than Apriori † ! For 30% selectivity, the speedup is about 10 times.
44 Performance Analysis Note: item selectivity is fixed at 30%. As the support threshold goes up, the number of frequent itemsets goes down and Apriori† improves. CAP is still at least 8 times faster.
45 Performance Analysis
Each entry is of the form a/b, where a is the number of frequent sets satisfying the constraint and b is the total number of frequent sets. For L4 with support of 0.2%, Apriori† finds 1250 frequent sets, of which only 8 are found by CAP.
Support | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8
0.2% | 174/582 | 79/969 | 29/1140 | 8/1250 | 1/934 | 0/451 | 0/132 | 0/20
0.6% | 98/313 | 1/12 | 0/1 | 0 | 0 | 0 | 0 | 0
46 Conclusion The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the paper. Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.
47 References
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97.
R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD'98.
J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD'00.