Efficient Mining of Both Positive and Negative Association Rules. Xindong Wu (*), Chengqi Zhang (+), and Shichao Zhang (+). (*) University of Vermont, USA; (+) University of Technology Sydney, Australia.

Similar presentations
Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.

Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Data Mining Techniques Association Rule
Association rules and frequent itemsets mining
Data Mining (Apriori Algorithm)DCS 802, Spring DCS 802 Data Mining Apriori Algorithm Spring of 2002 Prof. Sung-Hyuk Cha School of Computer Science.
Frequent Closed Pattern Search By Row and Feature Enumeration
LOGO Association Rule Lecturer: Dr. Bo Yuan
IT 433 Data Warehousing and Data Mining Association Rules Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Association rules The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence.
FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.
Sampling Large Databases for Association Rules ( Toivenon’s Approach, 1996) Farzaneh Mirzazadeh Fall 2007.
Data Mining Association Analysis: Basic Concepts and Algorithms
Mining both Positive and Negative Association Rules Xindong Wu (*), Chengqi Zhang (+), and Shichao Zhang (+) (*) University of Vermont, USA (+) University.
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Mining Frequent Itemsets from Uncertain Data Presented by Chun-Kit Chui, Ben Kao, Edward Hung Department of Computer Science, The University of Hong Kong.
Chapter 5: Mining Frequent Patterns, Association and Correlations
Learning Fuzzy Association Rules and Associative Classification Rules Jianchao Han Computer Science Department California State University Dominguez Hills.
Association Rule Mining Part 2 (under construction!) Introduction to Data Mining with Case Studies Author: G. K. Gupta Prentice Hall India, 2006.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, D. W. Cheung, B. Kao Department of Computer Science.
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
DATA MINING -ASSOCIATION RULES-
ICML-2002Xindong Wu, University of Vermont, USA 1 Mining both Positive and Negative Association Rules Xindong Wu (*), Chengqi Zhang (+), and Shichao Zhang.
Association Rule Mining - MaxMiner. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and.
2/8/00CSE 711 data mining: Apriori Algorithm by S. Cha 1 CSE 711 Seminar on Data Mining: Apriori Algorithm By Sung-Hyuk Cha.
Fast Algorithms for Association Rule Mining
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee David W. Cheung Ben Kao The University of Hong Kong.
Mining Association Rules
Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.
1 Synthesizing High-Frequency Rules from Different Data Sources Xindong Wu and Shichao Zhang IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL.
Performance and Scalability: Apriori Implementation.
GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland.
Association Rules. 2 Customer buying habits by finding associations and correlations between the different items that customers place in their “shopping.
Statistical Analysis Professor Lynne Stokes Department of Statistical Science Lecture 14 Sequential Experimentation, Screening Designs, Fold-Over Designs.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
2006/12/06Chen Yi-Chun1 Mining Positive and Negative Association Rules: An Approach for Confined Rules Maria-Luiza Antonie, Osmar R. Zaiane PKDD2004.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Outline Introduction – Frequent patterns and the Rare Item Problem – Multiple Minimum Support Framework – Issues with Multiple Minimum Support Framework.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
1 On Mining General Temporal Association Rules in a Publication Database Chang-Hung Lee, Cheng-Ru Lin and Ming-Syan Chen, Proceedings of the 2001 IEEE.
Association Rule Mining
Measuring Association Rules Shan “Maggie” Duanmu Project for CSCI 765 Dec 9 th 2002.
CS Data Mining1 Data Mining The Extraction of useful information from data The automated extraction of hidden predictive information from (large)
Finding frequent and interesting triples in text Janez Brank, Dunja Mladenić, Marko Grobelnik Jožef Stefan Institute, Ljubljana, Slovenia.
Association Rules presented by Zbigniew W. Ras *,#) *) University of North Carolina – Charlotte #) ICS, Polish Academy of Sciences.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining ARM: Improvements March 10, 2009 Slide.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
Optimization of Association Rules Extraction Through Exploitation of Context Dependent Constraints Arianna Gallo, Roberto Esposito, Rosa Meo, Marco Botta.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Reducing Number of Candidates
Data Mining: Concepts and Techniques
Association Rules Repoussis Panagiotis.
Frequent Pattern Mining
Association Rules.
Chang-Hung Lee, Jian Chih Ou, and Ming Syan Chen, Proc
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
DIRECT HASHING AND PRUNING (DHP) ALGORITHM
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Amer Zaheer PC Mohammad Ali Jinnah University, Islamabad
Market Basket Analysis and Association Rules
AB AC AD AE AF 5 ways If you used AB, then, there would be 4 remaining ODD vertices (C, D, E and F) CD CE CF 3 ways If you used CD, then, there.
Association Analysis: Basic Concepts
Presentation transcript:

Efficient Mining of Both Positive and Negative Association Rules. Xindong Wu (*), Chengqi Zhang (+), and Shichao Zhang (+). (*) University of Vermont, USA; (+) University of Technology Sydney, Australia. Presenter: Mike Tripp

Outline
- Association Analysis
- Exceptions
- Problems
- Rules
- Examples
- Pruning Strategy
- Frequent Items of Potential Interest
- Infrequent Items of Potential Interest
- Procedure AllItemsOfInterest
- Extracting Positive and Negative Rules
- CPIR
- PositiveAndNegativeAssociations
- Effectiveness and Efficiency
- Experimental Results
- Related Work
- Exam Questions

Exceptions of Rules An exception is a deviational pattern to a well-known fact; it exhibits unexpectedness and is also known as a surprising pattern. Example: while birds(x) => flies(x), an exception is bird(x), penguin(x) => −flies(x). Interesting fact: A => B being a valid rule does not imply that −B => −A is a valid rule.

Key Problems in Negative Association Rule Mining
- How to effectively search for interesting itemsets.
- How to effectively identify negative association rules of interest.

Association Analysis
1. Generate all large itemsets: all itemsets with support greater than or equal to the user-specified minimum support are generated.
2. Generate all rules that have minimum confidence, in the following naive way: for every large itemset X and any B ⊂ X, let A = X − B. If the rule A => B has minimum confidence (i.e., supp(X)/supp(A) ≥ mc), then it is a valid rule.
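A minimal Python sketch of this naive generation step (the names are illustrative, and `supp` is assumed to be a precomputed dictionary mapping sorted item tuples to their supports):

```python
from itertools import combinations

def naive_rules(large_itemsets, supp, mc):
    """For every large itemset X and every non-empty proper subset B of X,
    let A = X - B and emit A => B when supp(X)/supp(A) >= mc."""
    rules = []
    for X in large_itemsets:
        X = tuple(sorted(X))
        for r in range(1, len(X)):
            for B in combinations(X, r):
                A = tuple(sorted(set(X) - set(B)))
                if supp[X] / supp[A] >= mc:  # confidence test
                    rules.append((A, B))
    return rules
```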

Negation/Types of Rules The negation of an itemset A is denoted −A, and supp(−A) = 1 − supp(A). In particular, for an itemset i1 −i2 i3, its support is supp(i1 −i2 i3) = supp(i1 i3) − supp(i1 i2 i3). Positive rule: A => B. Negative rules: A => −B, −A => B, −A => −B.

Negative Association Rules Still difficult: the number of infrequent itemsets grows exponentially. For TD = {(A,B,D); (B,C,D); (B,D); (B,C,D,E); (A,B,D,F)}, even this simple database contains 49 infrequent itemsets.

Defining Negative Association Rules Two cases: (1) if both A and B are frequent but A U B is infrequent, is A => −B a valid rule? (2) If A is frequent and B is infrequent, is A => −B a valid rule? Maybe, but it is not of interest here. Heuristic: only if both A and B are frequent will A => −B be considered.

Negative Association Example Consider supp(c) = 0.6, supp(t) = 0.4, supp(t U c) = 0.05, and mc = 0.52. The confidence of t => c is supp(t U c)/supp(t) = 0.05/0.4 = 0.125, which is below mc = 0.52, and supp(t U c) = 0.05 is low. This indicates that t U c is an infrequent itemset and that t => c cannot be a valid rule. However, supp(t U −c) = supp(t) − supp(t U c) = 0.4 − 0.05 = 0.35, which is high, and the confidence of t => −c is supp(t U −c)/supp(t) = 0.35/0.4 = 0.875 > mc. Therefore, t => −c is a valid rule.

Identifying Interesting Itemsets Because the number of infrequent itemsets in a database is exponential, pruning is critical to searching efficiently for interesting itemsets.

Pruning Strategy We want to find interesting itemsets when pruning. Define an interestingness function interest(X, Y) = |supp(X U Y) − supp(X)supp(Y)| and a threshold mi. If interest(X, Y) ≥ mi, then the rule X => Y is of potential interest, and X U Y is referred to as a potentially interesting itemset. This yields an effective pruning strategy for efficiently identifying all frequent itemsets of potential interest in a database, and it lets us use an Apriori-like algorithm that generates infrequent k-itemsets from frequent (k−1)-itemsets.
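As code, this test is a one-liner (a sketch with invented argument names):

```python
def interest(supp_xy, supp_x, supp_y):
    """interest(X, Y) = |supp(X u Y) - supp(X) * supp(Y)|"""
    return abs(supp_xy - supp_x * supp_y)

def potentially_interesting(supp_xy, supp_x, supp_y, mi):
    """X => Y is of potential interest when interest(X, Y) >= mi."""
    return interest(supp_xy, supp_x, supp_y) >= mi
```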

Frequent Itemsets of Potential Interest The fipi condition is defined through a constraint function f() concerning the support, confidence, and interestingness of X => Y.

Infrequent Itemsets of Potential Interest The iipi condition is defined through a constraint function g(), which combines f() with the support, confidence, and interestingness of X => Y.
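The defining equations for fipi and iipi were images on the original slides. A plausible reconstruction of the itemset-level conditions, inferred from the heuristic above and the "Association Rules of Interest" slide below (the paper's exact f() and g(), which also fold in the confidence threshold mc, may differ):

```latex
\begin{align*}
\textit{fipi}:\;& supp(X \cup Y) \ge ms \ \wedge\ interest(X, Y) \ge mi\\
\textit{iipi}:\;& supp(X \cup Y) < ms \ \wedge\ supp(X) \ge ms \ \wedge\
                  supp(Y) \ge ms \ \wedge\ interest(X, Y) \ge mi
\end{align*}
```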

Bringing them together Using the fipi and iipi mechanisms for both positive and negative rule discovery, the search is constrained to seeking interesting rules on certain measures, and pruning removes all uninteresting branches that cannot lead to an interesting rule satisfying those constraints.

Procedure AllItemsOfInterest Input: D (a database), minsupp, mininterest. Output: PL (frequent itemsets) and NL (infrequent itemsets).

Procedure AllItemsOfInterest
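The procedure body on this slide was an image. Below is a runnable Python sketch of the level-wise search it describes, with one simplification: a candidate is kept when at least one single-item split passes the interest threshold, rather than the full fipi/iipi tests. On the example database of the next slide (ms = 0.3, mi = 0.05), this sketch reproduces the L2, N2, and 3-itemset results shown on the later slides.

```python
from itertools import combinations

def all_items_of_interest(db, ms, mi):
    """Apriori-like search: frequent itemsets of interest go to PL,
    infrequent-but-interesting candidates go to NL.
    db is a list of transactions, each a set of items."""
    n = len(db)
    def supp(items):
        s = set(items)
        return sum(1 for t in db if s <= t) / n

    items = sorted({i for t in db for i in t})
    pl = [frozenset([i]) for i in items if supp([i]) >= ms]
    nl = []
    level = list(pl)
    while level:
        # classic Apriori join: merge (k-1)-itemsets sharing a prefix
        lev = sorted(tuple(sorted(x)) for x in level)
        candidates = {frozenset(a) | frozenset(b)
                      for a, b in combinations(lev, 2) if a[:-1] == b[:-1]}
        next_level = []
        for c in candidates:
            # simplified interest pruning: keep c only if some split
            # (c - {j}, {j}) is potentially interesting
            if not any(abs(supp(c) - supp(c - {j}) * supp([j])) >= mi
                       for j in c):
                continue
            (next_level if supp(c) >= ms else nl).append(c)
        pl.extend(next_level)
        level = next_level
    return pl, nl

# Example run on the database from the next slide:
# db = [{"A","B","D"}, {"A","B","C","D"}, {"B","D"}, {"B","C","D","E"},
#       {"A","E"}, {"B","D","F"}, {"A","E","F"}, {"C","F"},
#       {"B","C","F"}, {"A","B","C","D","F"}]
# PL, NL = all_items_of_interest(db, ms=0.3, mi=0.05)
```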

Procedure AllItemsOfInterest Example run of the algorithm (ms = 0.3, mi = 0.05) on the following transaction database:

TID   Items bought
T1    {A,B,D}
T2    {A,B,C,D}
T3    {B,D}
T4    {B,C,D,E}
T5    {A,E}
T6    {B,D,F}
T7    {A,E,F}
T8    {C,F}
T9    {B,C,F}
T10   {A,B,C,D,F}

Procedure AllItemsOfInterest Generate the frequent and infrequent 2-itemsets of interest. When ms = 0.3: L2 = {AB, AD, BC, BD, BF, CD, CF} and N2 = {AC, AE, AF, BE, CE, DE, DF, EF}. Then use the interest measure to prune.

Procedure AllItemsOfInterest Since AD and CD are not of potential interest, they are removed from L2.
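The supporting computation was an image on the slide, but it can be redone from the example database (supp(A) = supp(C) = 0.5, supp(D) = 0.6, supp(AD) = supp(CD) = 0.3):

```latex
\begin{align*}
interest(A,D) &= |supp(AD) - supp(A)\,supp(D)| = |0.3 - 0.5 \times 0.6| = 0 < mi = 0.05\\
interest(C,D) &= |supp(CD) - supp(C)\,supp(D)| = |0.3 - 0.5 \times 0.6| = 0 < mi = 0.05
\end{align*}
```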

Procedure AllItemsOfInterest The resulting frequent 2-itemsets are L2 = {AB, BC, BD, BF, CF}.

Procedure AllItemsOfInterest Generate the infrequent 2-itemsets using the iipi measure; the process is very similar to that for frequent 2-itemsets.

Extracting Positive and Negative Rules Continuing in this way yields all the itemsets of interest for the same database:

TID   Items bought
T1    {A,B,D}
T2    {A,B,C,D}
T3    {B,D}
T4    {B,C,D,E}
T5    {A,E}
T6    {B,D,F}
T7    {A,E,F}
T8    {C,F}
T9    {B,C,F}
T10   {A,B,C,D,F}

Algorithm iterations:
Frequent 1-itemsets:    A, B, C, D, E, F
Frequent 2-itemsets:    AB, BC, BD, BF, CF
Infrequent 2-itemsets:  AC, AE, AF, BE, CE, DE, DF, EF
Frequent 3-itemsets:    BCD
Infrequent 3-itemsets:  BCF, BDF

Extracting Positive and Negative Rules Pruning strategy for rule generation: Piatetsky-Shapiro's argument (here Dependence(X,Y) denotes p(Y|X)/p(Y)).
- If Dependence(X,Y) = 1, X and Y are independent.
- If Dependence(X,Y) > 1, Y is positively dependent on X. The bigger the ratio (p(Y|X) − p(Y))/(1 − p(Y)), the higher the positive dependence.
- If Dependence(X,Y) < 1, Y is negatively dependent on X (−Y is positively dependent on X). The bigger the ratio (p(Y|X) − p(Y))/(−p(Y)), the higher the negative dependence.

Extracting Both Types of Rules The conditional probability increment ratio (CPIR) is used to measure the correlation between X and Y. When CPIR(X|Y) = 0, X and Y are independent; when it is 1, they are perfectly correlated; when it is −1, they are perfectly negatively correlated.
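The CPIR formula on this slide was an image. Combining the two dependence ratios from the previous slide with the stated range [−1, 1] suggests the following piecewise definition (a reconstruction, not a quotation of the paper):

```latex
CPIR(Y \mid X) =
\begin{cases}
\dfrac{p(Y \mid X) - p(Y)}{1 - p(Y)}, & p(Y \mid X) \ge p(Y),\ p(Y) \ne 1,\\[1ex]
\dfrac{p(Y \mid X) - p(Y)}{p(Y)},     & p(Y \mid X) < p(Y),\ p(Y) \ne 0.
\end{cases}
```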

Extracting Both Types of Rules Because p(−A) = 1 − p(A), only the first branch of the previous equation is needed. This value is used as the confidence value of the rule.
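The two expressions joined by "or" on the original slide are plausibly the two equivalent forms of this first branch (again a reconstruction):

```latex
CPIR(B \mid A) = \frac{p(B \mid A) - p(B)}{1 - p(B)}
\quad\text{or, equivalently,}\quad
CPIR(B \mid A) = \frac{supp(A \cup B) - supp(A)\,supp(B)}{supp(A)\,(1 - supp(B))}
```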

Association Rules of Interest Let I be the set of items in a database D, let i = A U B ⊆ I be an itemset with A ∩ B = Ø, supp(A) ≠ 0, supp(B) ≠ 0, and let ms, mc, and mi > 0 be given by the user. Then (see the sketch after this list):
- If supp(A U B) ≥ ms, interest(A, B) ≥ mi, and CPIR(B|A) ≥ mc, then A => B is a positive rule of interest.
- If supp(A U −B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(A, −B) ≥ mi, and CPIR(−B|A) ≥ mc, then A => −B is a negative rule of interest.
- If supp(−A U B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(−A, B) ≥ mi, and CPIR(B|−A) ≥ mc, then −A => B is a negative rule of interest.
- If supp(−A U −B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(−A, −B) ≥ mi, and CPIR(−B|−A) ≥ mc, then −A => −B is a negative rule of interest.
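A small Python sketch of these tests, built on the reconstructed CPIR above (conditions 1 and 2 shown; conditions 3 and 4 are symmetric):

```python
def cpir(p_xy, p_x, p_y):
    """CPIR(Y|X), using the piecewise form reconstructed above
    (an assumption; requires p_x > 0 and 0 < p_y < 1)."""
    p_y_given_x = p_xy / p_x
    if p_y_given_x >= p_y:
        return (p_y_given_x - p_y) / (1 - p_y)  # positive-dependence branch
    return (p_y_given_x - p_y) / p_y            # negative-dependence branch

def positive_rule_of_interest(supp_ab, supp_a, supp_b, ms, mc, mi):
    """Condition 1: A => B."""
    return (supp_ab >= ms
            and abs(supp_ab - supp_a * supp_b) >= mi
            and cpir(supp_ab, supp_a, supp_b) >= mc)

def negative_rule_of_interest(supp_ab, supp_a, supp_b, ms, mc, mi):
    """Condition 2: A => -B, using supp(A u -B) = supp(A) - supp(A u B)."""
    supp_a_nb = supp_a - supp_ab              # supp(A u -B)
    return (supp_a_nb >= ms and supp_a >= ms and supp_b >= ms
            and abs(supp_a_nb - supp_a * (1 - supp_b)) >= mi
            and cpir(supp_a_nb, supp_a, 1 - supp_b) >= mc)
```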

Example with CPIR For the itemset B U D in PL, B => D can be a valid positive rule of interest.
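The computation on this slide was an image, but it can be redone from the example database using the reconstructed CPIR: supp(B) = 0.7, supp(D) = 0.6, and supp(B U D) = 0.6, so

```latex
CPIR(D \mid B) = \frac{supp(BD) - supp(B)\,supp(D)}{supp(B)\,(1 - supp(D))}
               = \frac{0.6 - 0.42}{0.7 \times 0.4} \approx 0.64
```

Together with supp(B U D) = 0.6 ≥ ms and interest(B, D) = 0.18 ≥ mi, this makes B => D a positive rule of interest for any mc up to roughly 0.64.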

Extracting rules One snapshot of an iteration of the algorithm: the result is that B => −E is a valid rule.
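The snapshot was an image, but the numbers can be recovered from the example database (supp(B) = 0.7, supp(E) = 0.3, supp(B U E) = 0.1); with the reconstructed CPIR and, for illustration, the mc = 0.52 used in the earlier example:

```latex
\begin{align*}
supp(B \cup -E) &= supp(B) - supp(B \cup E) = 0.7 - 0.1 = 0.6 \ge ms\\
interest(B, -E) &= |0.6 - 0.7 \times 0.7| = 0.11 \ge mi\\
CPIR(-E \mid B) &= \frac{0.11}{0.7 \times (1 - 0.7)} \approx 0.524 \ge mc
\end{align*}
```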

Algorithm Design
1. Generate the set PL of frequent itemsets and the set NL of infrequent itemsets.
2. Extract positive rules of the form A => B from PL, and negative rules of the forms A => −B, −A => B, and −A => −B from NL.

PositiveAndNegativeAssociations

PositiveAndNegativeAssociations
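The two slides above held the pseudocode as images. A compact, illustrative reconstruction of the top level, reusing all_items_of_interest() and cpir() from the sketches above (a sketch of the paper's design, not its exact code):

```python
from itertools import combinations

def positive_and_negative_associations(db, ms, mc, mi):
    """1) Build PL and NL; 2) mine A => B from PL and A => -B from NL.
    The -A => B and -A => -B forms are tested analogously and omitted
    here for brevity. Assumes all_items_of_interest() and cpir() from
    the earlier sketches are in scope."""
    n = len(db)
    supp = lambda s: sum(1 for t in db if set(s) <= t) / n
    pl, nl = all_items_of_interest(db, ms, mi)
    positive, negative = [], []
    for itemset in (x for x in pl if len(x) > 1):
        for r in range(1, len(itemset)):
            for a in combinations(sorted(itemset), r):
                b = tuple(sorted(itemset - set(a)))
                pa, pb, pab = supp(a), supp(b), supp(itemset)
                if abs(pab - pa * pb) >= mi and cpir(pab, pa, pb) >= mc:
                    positive.append((a, b))
    for itemset in (x for x in nl if len(x) > 1):
        for r in range(1, len(itemset)):
            for a in combinations(sorted(itemset), r):
                b = tuple(sorted(itemset - set(a)))
                pa, pb, pab = supp(a), supp(b), supp(itemset)
                pa_nb = pa - pab                  # supp(A u -B)
                if (pa >= ms and pb >= ms and pa_nb >= ms
                        and abs(pa_nb - pa * (1 - pb)) >= mi
                        and cpir(pa_nb, pa, 1 - pb) >= mc):
                    negative.append((a, "not", b))
    return positive, negative
```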

Effectiveness and Efficiency
- Aggregated test data: used for KDD Cup 2000; data and questions can be found:
- Implemented on: Dell Workstation PWS650 with a 2GHz CPU and 2GB of memory
- Language: C++

Experimental Results (1) A comparison with an Apriori-like algorithm without pruning (MBP = Mining By Pruning, MNP = Mining with No-Pruning).

Experimental Results (2) A comparison with no-pruning.

Experimental Results Effectiveness of pruning (PII = Positive Items of Interest, NII = Negative Items of Interest).

Related Work
- Negative relationships between frequent itemsets, but not how to find negative rules (Brin, Motwani, and Silverstein 1997).
- Strong negative association mining using domain knowledge (Savasere, Omiecinski, and Navathe 1998).

Conclusions
- Negative rules are useful.
- Pruning is essential for finding frequent and infrequent itemsets.
- Pruning is important for finding negative association rules.
- Different mining conditions could yield more negative association rules.

Exam Questions What are the three types of dependency when Dependence(X, Y) is equal to, greater than, or less than 1?
- If Dependence(X,Y) = 1, X and Y are independent.
- If Dependence(X,Y) > 1, Y is positively dependent on X.
- If Dependence(X,Y) < 1, Y is negatively dependent on X (−Y is positively dependent on X).

Exam Questions Give an example of a rule exception, or surprising pattern. While birds(x) => flies(x), an exception is bird(x), penguin(x) => −flies(x).

Exam Questions What does CPIR(X|Y) tell us? When CPIR(X|Y) = 0, X and Y are independent; when it is 1, they are perfectly correlated; when it is −1, they are perfectly negatively correlated.