1 Mining Association Rules with Constraints Wei Ning Joon Wong COSC 6412 Presentation.

Presentation transcript:

1 Mining Association Rules with Constraints Wei Ning Joon Wong COSC 6412 Presentation

2 Outline Introduction Summary of Approach Algorithm CAP Performance Analysis Conclusion References

3 Outline Introduction Summary of Approach Algorithm CAP Performance Analysis Conclusion References

4 Introduction Recall: association rule mining finds interesting associations or correlations among a large set of data items.

5 Some problems met when mining association rules: Overwhelming output? Not what you want? Waiting too long? Lack of focus.

6 Introduction (cont.) Example at Walmart: suppose a manager wants to find which shoes are the most popular in winter.

7 Outline Introduction Summary of Approach Algorithm CAP Performance Analysis Conclusion References

8 Mining frequent itemsets vs. mining association rules Mining frequent itemsets is the core of mining association rules: once the frequent itemsets are known, the rules are generated from them, so the rest of the talk focuses on frequent itemsets.

9 Constrained Mining A naive solution: first find all frequent sets, and then test them for constraint satisfaction. Our approach: analyze the properties of constraints comprehensively and push them as deeply as possible inside the frequent pattern computation.

10 Frequent Itemsets & Constraints Given a transaction database. Frequent itemset: a set of items that frequently appear together in transactions, e.g. {a, c}. Constraint: a predicate over itemsets, e.g. C(I): sum(I) > 50, for which C(abd) = true (40 + 10 + 10 = 60 > 50).
TDB (min_sup = 2)
TID   Transaction
10    a, b, c
20    b, c, d, f
30    a, c
Item   Value
a      40
b      10
c      -20
d      10
e      -30
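
To make "a constraint is a predicate over itemsets" concrete, here is a minimal sketch that evaluates C(I): sum(I) > 50 with the item values from the slide. The function name is an assumption chosen for illustration; item f is left out because the slide gives no value for it.

```python
# Item values copied from the slide's Item/Value table.
VALUE = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}

def satisfies_sum_constraint(itemset, threshold=50):
    """Return True when the summed item values exceed the threshold (C(I): sum(I) > 50)."""
    return sum(VALUE[item] for item in itemset) > threshold

c_abd = satisfies_sum_constraint("abd")  # 40 + 10 + 10 = 60
c_ac = satisfies_sum_constraint("ac")    # 40 - 20 = 20
print(c_abd, c_ac)
```

Here {a, c} is frequent (it occurs in transactions 10 and 30) but fails the constraint, which is exactly the kind of set a constraint-aware miner would like to avoid reporting.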

11 Mining Frequent Itemsets With Constraints Given: a transaction database TDB, a support threshold min_sup, and a constraint C. Find: the complete set of frequent itemsets satisfying the constraint. Use the constraint to express the user’s focus and to improve both effectiveness and efficiency.

12 Classification of Constraints We have the following classification of constraints: anti-monotone; monotone; succinct; convertible (convertible anti-monotone, convertible monotone, strongly convertible); inconvertible.

13 Anti-Monotone Definition 1 (Anti-Monotone): A 1-var constraint C is anti-monotone if for all sets S, S′: S ⊇ S′ and S satisfies C ⇒ S′ satisfies C. Simply put, when an itemset S violates the constraint, so does every one of its supersets.

14 Is min(S) ≥ v anti-monotone? Take S = {5, 10, 14} and v = 7, so the constraint is min(S) ≥ 7. {5} violates it. Supersets of {5}: {5, 10}, {5, 14}, {5, 10, 14}. So do {5, 10}, {5, 14}, {5, 10, 14}, since each still contains 5. Hence min(S) ≥ v is anti-monotone.
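
The check on this slide can be replayed mechanically. The sketch below enumerates every superset of {5} inside {5, 10, 14} and confirms that each one still violates min(S) ≥ 7; the helper names are assumptions made for this illustration.

```python
from itertools import combinations

def violates_min_constraint(itemset, v):
    """True when the itemset fails the constraint min(S) >= v."""
    return min(itemset) < v

def proper_supersets(base, universe):
    """Yield every proper superset of `base` drawn from `universe`."""
    extra = sorted(set(universe) - set(base))
    for size in range(1, len(extra) + 1):
        for added in combinations(extra, size):
            yield frozenset(base) | frozenset(added)

universe = {5, 10, 14}
v = 7
base = frozenset({5})
supersets = list(proper_supersets(base, universe))
all_violate = violates_min_constraint(base, v) and all(
    violates_min_constraint(s, v) for s in supersets
)
print(sorted(sorted(s) for s in supersets), all_violate)
```

An exhaustive check on one example is not a proof, but it shows the pruning consequence: once {5} fails, none of its supersets needs to be generated or counted.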

15 Succinct Definition 2 (Succinct): I ⊆ Item is a succinct set if it can be expressed as σ_p(Item) for some selection predicate p. SP ⊆ 2^Item is a succinct powerset if there is a fixed number of succinct sets Item_1, …, Item_k ⊆ Item such that SP can be expressed in terms of the strict powersets of Item_1, …, Item_k, using union and minus. Finally, a 1-var constraint C is succinct provided SAT_C(Item) is a succinct powerset.

16 Succinct General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint. If a constraint is succinct, we can directly generate precisely the sets that satisfy it.

17 Succinct example Itemset containing a or b Itemset containing some item with value more than 30

18 Succinct example C1 ≡ Item.Price ≥ 100. Item_1 = σ_{Item.Price ≥ 100}(Item) = {a, b}. 2^{Item_1} = { {a}, {b}, {a, b} } (the strict powerset of Item_1). SAT_C1 = { {a}, {b}, {a, b} }, so SAT_C1 = 2^{Item_1} and C1 is succinct.

19 Convertible Convert tough constraints into anti-monotone or monotone ones by properly ordering the items.

20 Convertible Definition: let R be an order of items. Convertible anti-monotone: whenever itemset X satisfies the constraint, so does every prefix of X w.r.t. R.

21 Convertible example Constraint C: avg(X) ≥ 25. Order the items in value-descending order. Itemset afd satisfies C (avg = 80/3 ≈ 26.7). So do its prefixes a (avg 40) and af (avg 35). Thus, it becomes anti-monotone!
Original order (Item: Value): a: 40, b: 0, c: -20, d: 10, e: -30, f: 30, g: 20, h: -10
Value-descending order (Item: Value): a: 40, f: 30, g: 20, d: 10, b: 0, h: -10, c: -20, e: -30
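
The point of succinctness is that the satisfying sets can be generated directly rather than tested one by one. A minimal sketch of that idea follows; the price table is made up, and the selection predicate (price of 100 or more) is an assumption chosen so that the selected items are {a, b} as on the slide.

```python
from itertools import combinations

# Hypothetical prices, chosen so that exactly a and b pass the selection.
PRICE = {"a": 120, "b": 150, "c": 40, "d": 80}

def select(items, predicate):
    """Selection operator: keep the items whose price passes the predicate."""
    return {item for item in items if predicate(PRICE[item])}

def strict_powerset(items):
    """All non-empty subsets of `items`."""
    ordered = sorted(items)
    return [
        frozenset(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]

item1 = select(PRICE, lambda price: price >= 100)
sat_c1 = strict_powerset(item1)
print(sorted(item1), [sorted(s) for s in sat_c1])
```

No itemset containing c or d is ever built, which is what "enumerate all and only those sets that satisfy the constraint" amounts to in practice.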
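
The ordering trick can be checked numerically. This sketch sorts the slide's items by descending value, then lists the average of each prefix of afd under that order; the function names are assumptions made for illustration.

```python
# Item values copied from the slide.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

# The order R: items sorted by descending value.
ORDER = sorted(VALUE, key=lambda item: VALUE[item], reverse=True)

def average(itemset):
    """Mean value of the items in the itemset."""
    return sum(VALUE[item] for item in itemset) / len(itemset)

def prefixes(itemset):
    """Prefixes of the itemset once its items are arranged according to ORDER."""
    arranged = sorted(itemset, key=ORDER.index)
    return [arranged[:length] for length in range(1, len(arranged) + 1)]

prefix_averages = [("".join(p), average(p)) for p in prefixes("afd")]
print("".join(ORDER), prefix_averages)
```

Because each appended item is no larger than the ones before it, the running average can only stay level or fall, so a prefix that fails avg(X) ≥ 25 can never be repaired by extending it.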

22 Commonly Used Constraints: A General Picture
Constraint                      Antimonotone   Monotone      Succinct
v ∈ S                           no             yes           yes
S ⊇ V                           no             yes           yes
S ⊆ V                           yes            no            yes
min(S) ≤ v                      no             yes           yes
min(S) ≥ v                      yes            no            yes
max(S) ≤ v                      yes            no            yes
max(S) ≥ v                      no             yes           yes
count(S) ≤ v                    yes            no            weakly
count(S) ≥ v                    no             yes           weakly
sum(S) ≤ v (a ∈ S, a ≥ 0)       yes            no            no
sum(S) ≥ v (a ∈ S, a ≥ 0)       no             yes           no
range(S) ≤ v                    yes            no            no
range(S) ≥ v                    no             yes           no
avg(S) θ v, θ ∈ {=, ≤, ≥}       convertible    convertible   no
support(S) ≥ ξ                  yes            no            no
support(S) ≤ ξ                  no             yes           no

23 Optional Proof that min(S) ≥ v is Anti-monotone According to the table, min(S) ≥ v is both anti-monotone and succinct. I only prove anti-monotonicity here due to time limitations. Something special…

24 Constraint Classification [Figure: diagram of how the constraint classes relate. Labels: Antimonotone, Monotone, Succinct, Convertible anti-monotone, Convertible monotone, Strongly convertible, Inconvertible.]

25 Summary of Approach Recapitulation: the basic idea of mining frequent itemsets with constraints, and an introduction to several important classes of constraints.
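
The transcript does not include the proof itself. A short sketch of the standard argument, written for the constraint min(S) ≥ v (the direction used in the worked example with S = {5, 10, 14} and v = 7), could read:

```latex
\begin{align*}
&\text{Let } S \subseteq S' \text{ and suppose } S \text{ violates the constraint, i.e. } \min(S) < v.\\
&\text{Every element of } S \text{ also belongs to } S', \text{ so } \min(S') \le \min(S).\\
&\text{Hence } \min(S') \le \min(S) < v, \text{ and } S' \text{ violates the constraint as well.}
\end{align*}
```

Read contrapositively, if a set satisfies min(S) ≥ v then so does every subset of it, which is the form used in Definition 1.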

26 Outline Introduction Summary of Approach Algorithm CAP Performance Analysis Conclusion References

27 Algorithms There are many algorithms for constraint-based association rule mining: Algorithm Direct; Algorithm MultiJoins & Reorder; Algorithm Apriori †; Algorithm Hybrid(m); Algorithm CAP (main focus).

28 Design of Algorithm Sound: an algorithm is sound provided it only finds frequent sets that satisfy the given constraints. Complete: an algorithm is complete provided all frequent sets satisfying the given constraints are found.

29 Algorithm Apriori † Main idea: use the Apriori Algorithm to get the frequent itemsets, then apply the constraints to the itemsets found. Step 1) Run Apriori with C_freq. Step 2) Apply C – C_freq to get the final Ans.

30 Algorithm Apriori † (Pseudocode)
1. C_1 consists of sets of size 1; k = 1; Ans = ∅;
2. while (C_k not empty) {
   2.1 conduct db scan to form L_k from C_k;
   2.2 form C_{k+1} from L_k based on C_freq; k++; }
3. for each set S in some L_k: add S to Ans if S satisfies (C – C_freq).

31 The Apriori † Algorithm: An Example
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
1st scan, C1 (itemset: sup): {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
L1 (itemset: sup): {A}: 2, {B}: 3, {C}: 3, {E}: 3
C2: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan, C2 (itemset: sup): {A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
L2 (itemset: sup): {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2
C3: {B, C, E}
3rd scan, L3 (itemset: sup): {B, C, E}: 2

32 The Apriori † Algorithm: An Example (cont.)
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
L1 (itemset: sup): {A}: 2, {B}: 3, {C}: 3, {E}: 3
L2 (itemset: sup): {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2
L3 (itemset: sup): {B, C, E}: 2
Constraint: {A, C, E} ⊇ T.Item
Ans: {A}, {C}, {E}, {A, C}, {C, E}
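
The two-step idea behind Apriori † (mine everything, filter afterwards) can be sketched in a few lines. This is an illustrative in-memory version under stated assumptions, not the implementation evaluated in the talk: the function names are made up, the database is the four-transaction TDB from the example with a minimum support count of 2, and the constraint is read as "the itemset is a subset of {A, C, E}".

```python
from itertools import combinations

TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"}, 30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
MIN_SUP = 2

def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)

def next_candidates(frequent, k):
    """Join frequent k-sets into (k+1)-sets whose k-subsets are all frequent."""
    joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
    return {c for c in joined if all(frozenset(s) in frequent for s in combinations(c, k))}

def apriori_dagger(db, min_sup, constraint):
    """Step 1: plain Apriori. Step 2: keep only the frequent sets that satisfy the constraint."""
    candidates = {frozenset([item]) for items in db.values() for item in items}
    levels, k = [], 1
    while candidates:
        level = {c for c in candidates if support(c, db) >= min_sup}
        levels.append(level)
        candidates = next_candidates(level, k)
        k += 1
    answer = {s for level in levels for s in level if constraint(s)}
    return levels, answer

levels, answer = apriori_dagger(TDB, MIN_SUP, lambda s: s <= {"A", "C", "E"})
print([len(level) for level in levels], sorted(sorted(s) for s in answer))
```

The constraint plays no part until the last line of `apriori_dagger`, so sets such as {B, E} and {B, C, E} are counted in full even though they can never appear in the answer.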

33 Algorithm CAP Succinct and anti-monotone: Strategy I: replace C_1 in the Apriori Algorithm by C_1^C (the size-1 candidates that satisfy C). Anti-monotone but non-succinct: Strategy II: define C_k as in the Apriori Algorithm; drop a set S ∈ C_k from counting if S fails C, i.e., constraint satisfaction is tested before counting is done.

34 Algorithm CAP (cont.) Succinct but non-anti-monotone: Strategy III: too complicated; to be discussed later … Non-succinct and non-anti-monotone: Strategy IV: induce any weaker constraint C1 from C. Depending on whether C1 is anti-monotone and/or succinct, use one of the strategies I-III above for the generation of frequent sets.

35 Algorithm CAP (Pseudocode)
(C_sam: succinct and anti-monotone constraints; C_am: anti-monotone but non-succinct; C_suc: succinct but non-anti-monotone; C_none: neither.)
1. if C_sam ∪ C_suc ∪ C_none is non-empty, prepare C_1 as indicated in Strategies I, III, and IV; k = 1;
2. if C_suc is non-empty {
   2.1 conduct db scan to form L_1 as indicated in Strategy III;
   2.2 form C_2 as indicated in Strategy III; k = 2; }
3. while (C_k not empty) {
   3.1 conduct db scan to form L_k from C_k;
   3.2 form C_{k+1} from L_k based on Strategy III if C_suc is non-empty, and Strategy II for constraints in C_am; k++; }
4. if C_none is empty, Ans = ∪ L_k. Otherwise, for each set S in some L_k, add S to Ans iff S satisfies C_none.

36 The Algorithm CAP: An Example
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
Constraints: {A, C, E} ⊇ T.Item and min support count = 2
Question: which strategy should we apply?

37 The Algorithm CAP: An Example (Cont.)
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
1st scan, C1 (itemset: sup): {A}: 2, {C}: 3, {E}: 3
L1 (itemset: sup): {A}: 2, {C}: 3, {E}: 3
C2: {A, C}, {A, E}, {C, E}
2nd scan, C2 (itemset: sup): {A, C}: 2, {A, E}: 1, {C, E}: 2
L2 (itemset: sup): {A, C}: 2, {C, E}: 2
C3: {} (because {A, E} is pruned earlier)
Ans: {A}, {C}, {E}, {A, C}, {C, E}
Apply Strategy I!!!
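
Strategy I on this example amounts to restricting the size-1 candidates to the allowed items and then running the usual level-wise loop. The sketch below does exactly that and also tallies how many candidate sets get counted; it is a simplified illustration with made-up function names, covering only the succinct and anti-monotone case, not the full CAP algorithm.

```python
from itertools import combinations

TDB = {10: {"A", "C", "D"}, 20: {"B", "C", "E"}, 30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
MIN_SUP = 2

def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)

def next_candidates(frequent, k):
    """Join frequent k-sets into (k+1)-sets whose k-subsets are all frequent."""
    joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
    return {c for c in joined if all(frozenset(s) in frequent for s in combinations(c, k))}

def cap_strategy_one(db, min_sup, allowed_items):
    """Level-wise mining that starts only from the items permitted by the constraint."""
    candidates = {
        frozenset([item]) for items in db.values() for item in items if item in allowed_items
    }
    answer, counted, k = set(), 0, 1
    while candidates:
        counted += len(candidates)  # candidate sets whose support is counted in this scan
        level = {c for c in candidates if support(c, db) >= min_sup}
        answer |= level
        candidates = next_candidates(level, k)
        k += 1
    return answer, counted

answer, counted = cap_strategy_one(TDB, MIN_SUP, {"A", "C", "E"})
print(sorted(sorted(s) for s in answer), counted)
```

On this database the sketch counts support for 6 candidate sets over two scans, whereas the Apriori † walkthrough counts 12 (5 + 6 + 1) over three scans to reach the same answer.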

38 Case 3: Succinct but not anti-monotone. Revisit…
Items: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10}
Constraint min(S) < 5 selects: {1} {2} {3} {4}
Apriori on the selected items only: {1} {2} {3} {4} {1,2} {2,3} ……… {3,4} ……… {1,2,3,4}
Some possible frequent sets may be lost, e.g. {1,8}, {1,2,10}.
**Information extracted from past presentation.

39 Case 3: Succinct but not anti-monotone. Continue… Algorithm Direct. Idea: play it safe. Generate C^c_{k+1} by using L^c_k × F, where F is the set of all frequent items. Algorithm MultiJoins. Algorithm Reorder.
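
The "play it safe" candidate generation of Algorithm Direct can be illustrated on the min(S) < 5 example with items 1 to 10. The sketch assumes, purely for illustration, that all ten items are frequent and that every generated 2-set is frequent too; the function name is made up, and support counting is omitted.

```python
def direct_candidates(valid_frequent_sets, frequent_items):
    """Extend each valid frequent k-set by one frequent item (L^c_k x F)."""
    return {
        s | {item}
        for s in valid_frequent_sets
        for item in frequent_items
        if item not in s
    }

F = set(range(1, 11))                         # assumed: all ten items are frequent
L1_c = {frozenset([i]) for i in F if i < 5}   # 1-sets satisfying min(S) < 5
C2_c = direct_candidates(L1_c, F)
C3_c = direct_candidates(C2_c, F)             # assumed: every set in C2_c is frequent
print(len(C2_c), frozenset({1, 8}) in C2_c, frozenset({1, 2, 10}) in C3_c)
```

Extending by any frequent item, rather than only by items that themselves satisfy the constraint, is what recovers sets such as {1, 8} and {1, 2, 10}; the price is a larger candidate pool than the anti-monotone strategies need.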

40 Outline Introduction Summary of Approach Algorithm CAP Performance Analysis Conclusion References

41 Performance Analysis (Specification) Programs written in C. Transactional databases generated using the program from IBM Almaden Research Center. 100,000 records, domain of 1,000 items. Page size 4KB. SPARC-10 environment.

42 Performance Analysis (Terminology) Speedup: comparison of execution time between two algorithms. Item selectivity: x% of the items satisfy the constraints. Support threshold: *a low support threshold means more frequent sets to process.

43 Performance Analysis Note: Support threshold set at 0.5%. For 10% selectivity, CAP runs 80 times faster than Apriori † ! For 30% selectivity, the speedup is about 10 times.

44 Performance Analysis Note: Item selectivity fixed at 30%. As the support threshold goes up, the number of frequent itemsets goes down and Apriori † improves. CAP is still at least 8 times faster.

45 Performance Analysis Each entry is of the form a/b, where a is the number of frequent sets satisfying the constraint and b is the total number of frequent sets. For L4 with a support of 0.2%, Apriori † finds 1250 frequent sets, of which 8 are found by CAP.
Support   L1        L2       L3        L4       L5      L6      L7      L8
0.2%      174/582   79/969   29/1140   8/1250   1/934   0/451   0/132   0/20
0.6%      98/313    1/12     0/1       0        0       0       0       0

46 Conclusion The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the papers. Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.

47 Reference R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD’97. R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD’98. J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD’00.