Evaluation of Association Patterns

Evaluation of Association Patterns

Association analysis algorithms have the potential to generate a large number of patterns. In real commercial databases we could easily end up with thousands or even millions of patterns, many of which might not be interesting. It is therefore very important to establish a set of well-accepted criteria for evaluating the quality of association patterns. A first set of criteria can be established through statistical arguments; a second set can be established through subjective arguments.

Subjective Arguments

A pattern is considered subjectively uninteresting unless it reveals unexpected information about the data. For example, the rule {Butter} → {Bread} is not interesting, despite having high support and confidence values. On the other hand, the rule {Diapers} → {Beer} is interesting because the relationship is quite unexpected and may suggest a new cross-selling opportunity for retailers. Drawback: incorporating subjective knowledge into pattern evaluation is difficult because it requires a considerable amount of prior information from domain experts.

Computing Interestingness Measures

Given a rule X → Y, the information needed to compute rule interestingness can be obtained from a contingency table.

Contingency table for X → Y:

              Y       ¬Y
    X        f11     f10     f1+
    ¬X       f01     f00     f0+
             f+1     f+0     |T|

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y

These counts are used to define various measures.
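
To make the counts concrete, here is a minimal Python sketch (not from the slides) that tallies f11, f10, f01, f00 for a candidate rule from a list of transactions; the toy transactions and item names are invented for illustration:

```python
# Tally the contingency table f11, f10, f01, f00 for a rule X -> Y.
def contingency_table(transactions, X, Y):
    f11 = f10 = f01 = f00 = 0
    for t in transactions:
        has_x, has_y = X <= t, Y <= t   # subset tests on sets
        if has_x and has_y:
            f11 += 1    # X and Y
        elif has_x:
            f10 += 1    # X and not Y
        elif has_y:
            f01 += 1    # not X and Y
        else:
            f00 += 1    # neither X nor Y
    return f11, f10, f01, f00

# Invented toy transactions:
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(contingency_table(transactions, {"bread"}, {"butter"}))  # (2, 1, 0, 1)
```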

Pitfall of Confidence

             Coffee   ¬Coffee
    Tea        150        50      200
    ¬Tea       750       150      900
               900       200     1100

The pitfall of confidence can be traced to the fact that the measure ignores the support of the itemset in the rule consequent. Consider the association rule Tea → Coffee:

Confidence = P(Coffee, Tea) / P(Tea) = P(Coffee | Tea) = 150/200 = 0.75, which seems quite high. But P(Coffee) = 900/1100 ≈ 0.82. Thus, knowing that a person is a tea drinker actually decreases his or her probability of being a coffee drinker from about 82% to 75%! Although the confidence is high, the rule is misleading. In fact, P(Coffee | ¬Tea) = P(Coffee, ¬Tea) / P(¬Tea) = 750/900 ≈ 0.83.
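
As a quick check, a small Python sketch (assuming the counts in the table above) reproduces the comparison between the rule's confidence and the unconditional probability of the consequent:

```python
# Counts from the table above: rows (Tea, no Tea), columns (Coffee, no Coffee).
f11, f10 = 150, 50     # Tea with Coffee / Tea without Coffee
f01, f00 = 750, 150    # Coffee without Tea / neither
N = f11 + f10 + f01 + f00              # 1100 transactions

confidence = f11 / (f11 + f10)         # P(Coffee | Tea) = 150/200 = 0.75
p_coffee = (f11 + f01) / N             # P(Coffee) = 900/1100 ~ 0.82
print(confidence, round(p_coffee, 2))  # 0.75 0.82 -> the rule is misleading
```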

Statistical Independence

Population of 1000 students:
- 600 students know how to swim (S)
- 700 students know how to bike (B)
- 420 students know how to swim and bike (S, B)

P(S|B) = P(S∧B)/P(B) = 0.42/0.7 = 0.6 = P(S)

- P(S∧B) = P(S) × P(B) ⇒ statistical independence
- P(S∧B) > P(S) × P(B) ⇒ positively correlated, i.e., if someone knows how to swim, it is more probable that he or she knows how to bike, and vice versa
- P(S∧B) < P(S) × P(B) ⇒ negatively correlated, i.e., if someone knows how to swim, it is less probable that he or she knows how to bike, and vice versa
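
The same check can be done numerically. A minimal sketch using the student counts above, comparing the observed co-occurrence count with the count expected under independence:

```python
# Student counts from the slide.
N, n_swim, n_bike, n_both = 1000, 600, 700, 420

# Count of (swim and bike) expected under independence: N * P(S) * P(B).
expected = n_swim * n_bike / N       # 600 * 700 / 1000 = 420.0
print(n_both, expected)              # 420 420.0 -> S and B are independent
```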

Interest Factor

A measure that takes statistical dependence into account. The interest factor compares the frequency of a pattern against a baseline frequency computed under the statistical independence assumption. The baseline frequency for a pair of mutually independent variables is:

f11 / N = (f1+ / N) × (f+1 / N)

or equivalently

f11 = (f1+ × f+1) / N

Interest Equation

I(A, B) = (N × f11) / (f1+ × f+1)

The fraction f11/N is an estimate for the joint probability P(A, B), while f1+/N and f+1/N are estimates for P(A) and P(B), respectively. If A and B are statistically independent, then P(A, B) = P(A) × P(B), and thus the interest factor is 1.

Example: Interest

             Coffee   ¬Coffee
    Tea        150        50      200
    ¬Tea       750       150      900
               900       200     1100

Association rule: Tea → Coffee

Interest = (150 × 1100) / (200 × 900) ≈ 0.92 (< 1, therefore Tea and Coffee are negatively correlated)
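
A small Python sketch (using the contingency counts from the table above; the function name is my own) reproduces this computation:

```python
# Interest factor I = (N * f11) / (f1+ * f+1).
def interest(f11, f1_plus, f_plus_1, N):
    return (N * f11) / (f1_plus * f_plus_1)

# Tea -> Coffee: f11 = 150, f1+ = 200 (Tea row), f+1 = 900 (Coffee column).
print(round(interest(150, 200, 900, 1100), 2))  # 0.92 < 1 -> negative correlation
```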

Simpson’s Paradox

Some other example…

Consider customers who buy high-definition televisions (HDTV) and exercise machines. Pooled over all customers (counts reconstructed from the confidences below):

                  Exercise=Yes   Exercise=No
    HDTV=Yes           99             81        180
    HDTV=No            54             66        120

What is the confidence of the following rules?
(rule 1) {HDTV=Yes} → {Exercise machine=Yes}
(rule 2) {HDTV=No} → {Exercise machine=Yes}

Confidence of rule 1 = 99/180 = 55%
Confidence of rule 2 = 54/120 = 45%

So, "customers who buy high-definition televisions are more likely to buy exercise machines than those who don't buy high-definition televisions." Right? Well, maybe not…

Stratification: Simpson's Paradox

Consider this more detailed table, stratified by customer group (counts reconstructed from the confidences below):

    College students:   Exercise=Yes   Exercise=No
      HDTV=Yes                1              9          10
      HDTV=No                 4             30          34

    Working adults:     Exercise=Yes   Exercise=No
      HDTV=Yes               98             72         170
      HDTV=No                50             36          86

What is the confidence of the rules within each stratum?
(rule 1) {HDTV=Yes} → {Exercise machine=Yes}
(rule 2) {HDTV=No} → {Exercise machine=Yes}

College students: confidence of rule 1 = 1/10 = 10%; confidence of rule 2 = 4/34 ≈ 11.8%
Working adults: confidence of rule 1 = 98/170 ≈ 57.7%; confidence of rule 2 = 50/86 ≈ 58.1%

The rules suggest that, within each group, customers who don't buy HDTVs are more likely to buy exercise machines, which contradicts the conclusion reached when the data from the two customer groups are pooled together.
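
The reversal is easy to verify numerically. A minimal Python sketch using only the counts quoted above (the variable names are my own):

```python
# (exercise-machine buyers, group size) for each stratum and HDTV status.
strata = {
    "college students": {"HDTV=Yes": (1, 10),   "HDTV=No": (4, 34)},
    "working adults":   {"HDTV=Yes": (98, 170), "HDTV=No": (50, 86)},
}

# Within each stratum, HDTV=No customers are slightly MORE likely to buy.
for group, rules in strata.items():
    for antecedent, (hits, total) in rules.items():
        print(f"{group}, {antecedent}: {hits}/{total} = {hits / total:.1%}")

# Pooling the strata reverses the comparison (Simpson's paradox).
pooled_yes = (1 + 98) / (10 + 170)   # 99/180 = 55%
pooled_no = (4 + 50) / (34 + 86)     # 54/120 = 45%
print(f"pooled HDTV=Yes: {pooled_yes:.0%}, pooled HDTV=No: {pooled_no:.0%}")
```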

Importance of Stratification

The lesson here is that proper stratification is needed to avoid generating spurious patterns resulting from Simpson's paradox. For example, market-basket data from a major supermarket chain should be stratified according to store location, while medical records from various patients should be stratified according to confounding factors such as age and gender.

Effect of Support Distribution

Many real data sets have a skewed support distribution, where most items have relatively low to moderate frequencies but a small number of them have very high frequencies.

Skewed Distribution

It is tricky to choose the right support threshold for mining such data sets. If we set the threshold too high (e.g., 20%), we may miss many interesting patterns involving the items from the low-support group G1. Such low-support items may correspond to expensive products (such as jewelry) that are seldom bought by customers, but whose patterns are still interesting to retailers. Conversely, when the threshold is set too low, there is the risk of generating spurious patterns that relate a high-frequency item such as milk to a low-frequency item such as caviar.

Cross-Support Patterns

Cross-support patterns are those that relate a high-frequency item such as milk to a low-frequency item such as caviar. They are likely to be spurious because their correlations tend to be weak. For example, the confidence of {caviar} → {milk} is likely to be high, yet the pattern is still spurious, since there is probably no real correlation between caviar and milk. However, we do not want to use the interest factor during the computation of frequent itemsets, because it does not have the anti-monotone property; the interest factor is instead used as a post-processing step. So, we want to detect cross-support patterns by looking at some anti-monotone property.

Cross-Support Patterns: Definition

A cross-support pattern is an itemset X = {i1, i2, …, ik} whose support ratio

r(X) = min{s(i1), s(i2), …, s(ik)} / max{s(i1), s(i2), …, s(ik)}

is less than a user-specified threshold hc.

Example: suppose the support for milk is 70%, while the support for sugar is 10% and for caviar 0.04%. Given hc = 0.01, the frequent itemset {milk, sugar, caviar} is a cross-support pattern because its support ratio is

r = min{0.7, 0.1, 0.0004} / max{0.7, 0.1, 0.0004} = 0.0004 / 0.7 ≈ 0.00057 < 0.01
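
A minimal Python sketch of this check, using the slide's example supports (the function name is my own):

```python
# Support ratio r(X) = min item support / max item support.
def support_ratio(item_supports):
    return min(item_supports) / max(item_supports)

supports = [0.7, 0.1, 0.0004]        # milk, sugar, caviar
r = support_ratio(supports)
print(round(r, 5), r < 0.01)         # 0.00057 True -> cross-support pattern
```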

Detecting Cross-Support Patterns

For example, assuming hc = 0.3, the itemsets {p,q}, {p,r}, and {p,q,r} are cross-support patterns, because their support ratios, equal to 0.2, are less than the threshold hc. We could apply a high support threshold, say 20%, to eliminate the cross-support patterns, but this may come at the expense of discarding other interesting patterns, such as the strongly correlated itemset {q,r}, which has support equal to 16.7%.

Detecting Cross-Support Patterns

Confidence pruning does not help either. The confidence of {q} → {p} is 80% even though {p,q} is a cross-support pattern. Meanwhile, the rule {q} → {r} also has high confidence even though {q,r} is not a cross-support pattern. These examples demonstrate the difficulty of using the confidence measure to distinguish between rules extracted from cross-support patterns and those extracted from non-cross-support patterns.

Lowest-Confidence Rule

Notice that the rule {p} → {q} has very low confidence, because most of the transactions that contain p do not contain q. This observation suggests that cross-support patterns can be detected by examining the lowest-confidence rule that can be extracted from a given itemset.

Finding the Lowest Confidence

Recall the anti-monotone property of confidence:

conf({i1, i2} → {i3, i4, …, ik}) ≤ conf({i1, i2, i3} → {i4, …, ik})

This property says that confidence never increases as we shift items from the left-hand side to the right-hand side of an association rule. Hence, the lowest-confidence rule that can be extracted from a frequent itemset contains only one item on its left-hand side.

Finding the Lowest Confidence

Given a frequent itemset {i1, i2, …, ik}, the rule

{ij} → {i1, …, ij−1, ij+1, …, ik}

has the lowest confidence if s(ij) = max{s(i1), s(i2), …, s(ik)}. This follows directly from the definition of confidence as the ratio between the rule's support and the support of the rule's antecedent.

Finding the Lowest Confidence

Summarizing, the lowest confidence attainable from a frequent itemset {i1, i2, …, ik} is

s({i1, i2, …, ik}) / max{s(i1), s(i2), …, s(ik)}

This quantity is also known as the h-confidence measure or all-confidence measure.
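
A minimal Python sketch (names are my own) that computes h-confidence from an itemset's support and its items' individual supports. The example reuses the milk/sugar/caviar supports, with the itemset support taken at its upper bound of 0.04% (the minimum item support), which is an assumption:

```python
# h-confidence (all-confidence) of an itemset: s(X) / max item support.
def h_confidence(itemset_support, item_supports):
    return itemset_support / max(item_supports)

# Assumed itemset support of 0.0004 (its upper bound: the minimum item support).
h = h_confidence(0.0004, [0.7, 0.1, 0.0004])   # milk, sugar, caviar
print(round(h, 5))   # 0.00057 -> below any reasonable hc, so the pattern is pruned
```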

h-Confidence

Clearly, cross-support patterns can be eliminated by ensuring that the h-confidence values of the patterns exceed some threshold hc. Observe that the measure is also anti-monotone, i.e.,

h-confidence({i1, i2, …, ik}) ≥ h-confidence({i1, i2, …, ik+1})

and thus can be incorporated directly into the mining algorithm.