Mining Quantitative Association Rules in Large Relational Tables ACM SIGMOD Conference 1996 Authors: R. Srikant, and R. Agrawal Presented by: Sasi Sekhar.

Mining Quantitative Association Rules in Large Relational Tables ACM SIGMOD Conference 1996 Authors: R. Srikant, and R. Agrawal Presented by: Sasi Sekhar Kunta March 23, 2008 (slides modified slightly from Biyu Liang’s version)

2 Outline  Review of Association Analysis  Introducing Quantitative AR Problem  Partitioning Quantitative Attributes  Identifying the Interesting Rules  Extending the Apriori Algorithm  Conclusions

3 Association Rule  Itemsets X and Y  Rule X => Y  Support = P(X and Y)  Confidence = P(Y|X) = P(X and Y)/P(X)  Find rules with at least MinSup support and MinConf confidence
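To make these definitions concrete, here is a small illustrative sketch (not from the slides; the transactions and items are made-up) of computing support and confidence:

```python
# Illustrative sketch with hypothetical data: support and confidence
# of a rule X => Y over a set of transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    # P(Y | X) = P(X and Y) / P(X)
    return support(x | y, transactions) / support(x, transactions)

print(support({"bread", "milk"}, transactions))        # 0.5
print(confidence({"bread"}, {"milk"}, transactions))   # 2/3
```

A rule {bread} => {milk} here would pass MinSup = 0.4 and MinConf = 0.6, for example.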

4 Boolean Association Rules

TID | Item1 | Item2 | Item3 | …
(0/1 entries; the table's values were lost in transcription)

 An attribute has the value "1" if the transaction contains the corresponding item, and "0" otherwise.

5 Outline  Review of Association Analysis  Introducing Quantitative AR Problem  Partitioning Quantitative Attributes  Identifying the Interesting Rules  Extending the Apriori Algorithm  Conclusions

6 Quantitative Association Rules  ⟨Age: 30..39⟩ and ⟨Married: Yes⟩ => ⟨NumCars: 2⟩  Support = 40%, Conf = 100%

RecordID | Age | Married | NumCars
100 | 23 | No  | 1
200 | 25 | Yes | 1
300 | 29 | No  | 0
400 | 34 | Yes | 2
500 | 38 | Yes | 2

7 Mapping to the Boolean Association Rules Problem  Use each ⟨attribute: interval⟩ or ⟨attribute: value⟩ pair as a new attribute, which has only boolean values

Record ID | Age: 20..29 | Age: 30..39 | Married: Yes | Married: No | NumCars: 0 | NumCars: …
(0/1 entries; the table's values were lost in transcription)
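A minimal sketch of this mapping (the records and interval boundaries below are illustrative assumptions, not the paper's exact table): each ⟨attribute: interval⟩ or ⟨attribute: value⟩ pair becomes one boolean column.

```python
# Map quantitative/categorical records to boolean attributes.
# The records and interval boundaries here are illustrative assumptions.
records = [
    {"Age": 23, "Married": "No",  "NumCars": 1},
    {"Age": 34, "Married": "Yes", "NumCars": 2},
]
age_intervals = [(20, 29), (30, 39)]

def to_boolean(rec):
    row = {}
    # One boolean column per ⟨Age: lo..hi⟩ interval.
    for lo, hi in age_intervals:
        row[f"Age:{lo}..{hi}"] = int(lo <= rec["Age"] <= hi)
    # One boolean column per categorical/quantitative value.
    for v in ("Yes", "No"):
        row[f"Married:{v}"] = int(rec["Married"] == v)
    for v in (0, 1, 2):
        row[f"NumCars:{v}"] = int(rec["NumCars"] == v)
    return row

print(to_boolean(records[0]))
```

After this mapping, a boolean association-rule algorithm such as Apriori applies unchanged.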

8 Problems with Direct Mapping  MinSup: If the number of intervals is large, the support of a single interval can fall below MinSup  MinConf: Information is lost when values are partitioned into intervals; confidence can fall below MinConf when the number of intervals is too small (i.e., the intervals are too wide)

9 The Tradeoff  Increase the number of intervals (to reduce information loss) while combining adjacent ones (to increase support)  ExecTime: execution time blows up as the number of items per record increases  ManyRules: the number of rules also blows up, and many of them are not interesting

10 The Proposed Approach  Partition quantitative attribute values and combine adjacent partitions as necessary  Partial completeness measure for deciding the partitions  Interest measure (pruning) to address the "ManyRules" problem  Extend the Apriori algorithm

11 5 Steps of the Proposed Approach 1. Determine the number of partitions for each quantitative attribute 2. Map values/ranges to consecutive integers such that the order is preserved 3. Find the support of each value of each attribute, combining adjacent values while support is less than MaxSup; then find the frequent itemsets, whose support is larger than MinSup 4. Use the frequent itemsets to generate association rules 5. Prune uninteresting rules
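Steps 1–3 can be sketched in code (a simplified illustration with made-up values, not the paper's implementation): equi-depth partitioning, consecutive integer labels for the partitions, and combining adjacent partitions while the combined support stays within MaxSup.

```python
# Simplified sketch of steps 1-3 for a single quantitative attribute.
values = [23, 25, 29, 34, 38, 41, 45, 52]  # hypothetical attribute values
num_parts = 4                               # step 1: chosen number of partitions

# Steps 1-2: equi-depth partitions, labelled by consecutive integers 0..3.
sorted_vals = sorted(values)
size = len(sorted_vals) // num_parts
partitions = [sorted_vals[i * size:(i + 1) * size] for i in range(num_parts)]

# Step 3: each item is a range of adjacent partitions (lo..hi); extend a
# range only while its combined support stays within MaxSup.
max_sup = 0.5
items = []  # each item: (lo_label, hi_label, support)
for lo in range(num_parts):
    count = 0
    for hi in range(lo, num_parts):
        count += len(partitions[hi])
        sup = count / len(sorted_vals)
        if sup > max_sup and hi > lo:   # base partitions are always kept
            break
        items.append((lo, hi, sup))
print(items)
```

Items whose support exceeds MinSup would then feed the frequent-itemset search of step 3.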

13 Outline  Review of Association Analysis  Introducing Quantitative AR Problem  Partitioning Quantitative Attributes  Identifying the Interesting Rules  Extending the Apriori Algorithm  Conclusions

14 Partial Completeness  R: rules obtained before partitioning  R': rules obtained after partitioning  Partial completeness measures the maximum distance between a rule in R and its closest generalization in R'  X̂ is a generalization of itemset X if they contain the same attributes and, for each quantitative attribute, the interval in X is contained in the corresponding interval in X̂  The distance is defined by the ratio of supports

15 K-Complete  C: the set of frequent itemsets  For any K ≥ 1, P is K-complete w.r.t. C if:  P ⊆ C  For any itemset X (or any of its subsets) in C, there exists a generalization in P whose support is no more than K times that of X (or its subset)  The smaller K is, the less information is lost

16 Theoretical Results  Lemma 1: If P is a K-complete set w.r.t. C, then any rule R obtained from C has a generalization R' from P such that conf(R') is bounded by [conf(R)/K, K × conf(R)]  For a given partial completeness level K, equi-depth partitioning satisfies the completeness level with the minimum number of intervals, 2n/(m(K−1)), where n is the number of quantitative attributes and m is the minimum support; the MaxSup for each interval is m(K−1)/(2n)
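Plugging numbers into the interval formula (an illustrative computation; rounding up to a whole number of intervals is my assumption):

```python
import math

def num_intervals(n, m, K):
    # Slide's formula: number of intervals = 2n / (m * (K - 1)),
    # rounded up to a whole number (the rounding is an assumption here).
    # n: number of quantitative attributes
    # m: minimum support, as a fraction
    # K: partial completeness level
    return math.ceil(2 * n / (m * (K - 1)))

# e.g., 3 quantitative attributes, 70% minimum support, K = 1.5:
print(num_intervals(3, 0.7, 1.5))  # 18  (6 / 0.35 ≈ 17.14, rounded up)
```

These are exactly the numbers in the final-exam question at the end of this deck.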

17 Outline  Review of Association Analysis  Introducing Quantitative AR Problem  Partitioning Quantitative Attributes  Identifying the Interesting Rules  Extending the Apriori Algorithm  Conclusions

18 Example of an Uninteresting Rule  Suppose a quarter of the people in age group 20..30 are in age group 20..25  ⟨Age: 20..30⟩ => ⟨Cars: 1..2⟩, with 8% sup, 70% conf  ⟨Age: 20..25⟩ => ⟨Cars: 1..2⟩, with 2% sup, 70% conf  The second rule gives no additional information and is less general than the first rule

19 Expected Values Based on Generalization  Itemset Z = {⟨z1⟩, …, ⟨zn⟩}, with generalization Ẑ = {⟨ẑ1⟩, …, ⟨ẑn⟩}  The expected support of Z based on the support of its generalization Ẑ is defined as E[Pr(Z)] = Pr(z1)/Pr(ẑ1) × … × Pr(zn)/Pr(ẑn) × Pr(Ẑ)

20 Expected Values Based on Generalization  The expected confidence of the rule X => Y, where Y = {⟨y1⟩, …, ⟨yn⟩}, based on the confidence of its generalization X̂ => Ŷ is defined as E[Conf(X => Y)] = Pr(y1)/Pr(ŷ1) × … × Pr(yn)/Pr(ŷn) × Conf(X̂ => Ŷ)
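In code form (an illustrative sketch; the support figures are made-up numbers), the expected support scales the generalization's support by the per-item support ratios:

```python
# Expected support of Z = {z1,...,zn} based on its generalization
# Zhat = {zhat1,...,zhatn}:
#   E[Pr(Z)] = Pr(z1)/Pr(zhat1) * ... * Pr(zn)/Pr(zhatn) * Pr(Zhat)
def expected_support(item_sups, gen_item_sups, gen_sup):
    e = gen_sup
    for s, g in zip(item_sups, gen_item_sups):
        e *= s / g
    return e

# Hypothetical numbers in the spirit of the uninteresting-rule example:
# the specialized Age item has a quarter of the support of its
# generalization (0.1 vs 0.4), the Cars item is unchanged (0.3), and the
# general itemset has 8% support.
print(expected_support([0.1, 0.3], [0.4, 0.3], 0.08))  # ~0.02
```

The specialized itemset's actual 2% support matches the expectation exactly, which is why the specialized rule carries no new information.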

21 Interest Measure  Itemset X is R-interesting w.r.t. its generalization X̂ if:  the support of X is no less than R times the expected support based on X̂, and  for any specialization X′ of X̂ with X′ ⊂ X, the itemset X − X′ is R-interesting w.r.t. its generalization  Rule X => Y is R-interesting w.r.t. its generalization X̂ => Ŷ if the support or confidence is R times that of X̂ => Ŷ, and the itemset X ∪ Y is R-interesting w.r.t. X̂ ∪ Ŷ
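The support half of this check can be sketched as follows (illustrative only; it reuses the expected-support formula from slide 19, and the numbers are made up):

```python
def expected_support(item_sups, gen_item_sups, gen_sup):
    # E[Pr(Z)] = Pr(z1)/Pr(zhat1) * ... * Pr(zn)/Pr(zhatn) * Pr(Zhat)
    e = gen_sup
    for s, g in zip(item_sups, gen_item_sups):
        e *= s / g
    return e

def r_interesting_support(actual_sup, item_sups, gen_item_sups, gen_sup, R):
    # Support condition only: actual support must be at least R times
    # the expected support based on the generalization.
    return actual_sup >= R * expected_support(item_sups, gen_item_sups,
                                              gen_sup)

# With the hypothetical slide-18 style numbers, actual 2% support equals
# the expected 2%, so the specialization is not 1.1-interesting:
print(r_interesting_support(0.02, [0.1, 0.3], [0.4, 0.3], 0.08, 1.1))
```

A specialization whose support were, say, 5% against the same 2% expectation would pass the check.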

22 Outline  Review of Association Analysis  Introducing Quantitative AR Problem  Partitioning Quantitative Attributes  Identifying the Interesting Rules  Extending the Apriori Algorithm  Conclusions

23 Candidate Generation  Given the set L k-1 of all frequent (k-1)-itemsets, generate the set C k of candidate k-itemsets  The process has three parts:  Join Phase  Subset Prune Phase  Interest Prune Phase

24 Join Phase  L k-1 joined with itself  Join condition: k-2 items are the same, and the remaining items have different attributes  Example, L 2 : { ⟨Married: Yes⟩⟨Age: 20..24⟩, ⟨Married: Yes⟩⟨Age: 20..29⟩, ⟨Married: Yes⟩⟨NumCars: 0..1⟩, ⟨Age: 20..29⟩⟨NumCars: 0..1⟩ }  Result of self-join, C 3 : { ⟨Married: Yes⟩⟨Age: 20..24⟩⟨NumCars: 0..1⟩, ⟨Married: Yes⟩⟨Age: 20..29⟩⟨NumCars: 0..1⟩ }

25 Subset Prune Phase  Make sure every (k-1)-subset of a candidate is in L k-1  Example, L 2 : { ⟨Married: Yes⟩⟨Age: 20..24⟩, ⟨Married: Yes⟩⟨Age: 20..29⟩, ⟨Married: Yes⟩⟨NumCars: 0..1⟩, ⟨Age: 20..29⟩⟨NumCars: 0..1⟩ }  Result of self-join, C 3 : { ⟨Married: Yes⟩⟨Age: 20..24⟩⟨NumCars: 0..1⟩, ⟨Married: Yes⟩⟨Age: 20..29⟩⟨NumCars: 0..1⟩ }  Delete the first itemset in C 3 since its subset ⟨Age: 20..24⟩⟨NumCars: 0..1⟩ is not in L 2
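A compact sketch of the join and subset-prune phases (items here are (attribute, range) pairs; the itemsets follow the Married/Age/NumCars example from the original paper, which I assume the garbled slide example was based on; the join condition is simplified to "any two itemsets sharing k−2 items", not the paper's ordered formulation):

```python
from itertools import combinations

# Frequent 2-itemsets; each item is an (attribute, interval) pair.
L2 = [
    frozenset({("Married", "Yes"), ("Age", "20..24")}),
    frozenset({("Married", "Yes"), ("Age", "20..29")}),
    frozenset({("Married", "Yes"), ("NumCars", "0..1")}),
    frozenset({("Age", "20..29"), ("NumCars", "0..1")}),
]

def join(Lk_1, k):
    # Join phase: two (k-1)-itemsets sharing k-2 items, whose remaining
    # items lie on different attributes, yield a k-candidate.
    cands = set()
    for a, b in combinations(Lk_1, 2):
        u = a | b
        if len(u) == k and len({attr for attr, _ in u}) == k:
            cands.add(u)
    return cands

def subset_prune(cands, Lk_1, k):
    # Subset prune phase: every (k-1)-subset of a candidate must be frequent.
    prev = set(Lk_1)
    return {c for c in cands
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}

C3 = subset_prune(join(L2, 3), L2, 3)
print(C3)  # only the candidate whose 2-subsets are all in L2 survives
```

Only {⟨Married: Yes⟩, ⟨Age: 20..29⟩, ⟨NumCars: 0..1⟩} remains, since the other candidate's subset ⟨Age: 20..24⟩⟨NumCars: 0..1⟩ is not in L2.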

26 Interest Prune Phase  Given a user-specified interest level R  Delete any itemset that contains an item with support greater than 1/R  Lemma 5 guarantees that such itemsets cannot be R-interesting w.r.t. their generalizations

27 Outline  Review of Association Analysis  Introducing Quantitative AR Problem  Partitioning Quantitative Attributes  Identifying the Interesting Rules  Extending the Apriori Algorithm  Conclusions

28 Conclusions  This paper introduced the problem of mining quantitative association rules in large relational tables  It deals with quantitative attributes by fine-partitioning the values and combining adjacent partitions as necessary  Partial completeness quantifies the information lost and helps decide the partitions  An interest measure is used to identify the interesting rules

Thanks! Questions?

30 Final Exam Questions  What is Partial Completeness? (p.14-15)  Determine the number of intervals when there are 3 quantitative attributes, 0.70 minimum support, and a 1.5 partial completeness level. (p.16)  If intervals are too large, rules may not have MinConf; if they are too small, rules may not have MinSup. How do you go about solving this catch-22 problem? (p.8-9)