Effect of Support Distribution
- Many real data sets have a skewed support distribution.
[Figure: support distribution of a retail data set]

Effect of Support Distribution
- How to set the appropriate minsup threshold?
  - If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products).
  - If minsup is set too low, mining is computationally expensive and the number of itemsets is very large.
- Using a single minimum support threshold may not be effective.

Multiple Minimum Support
- How to apply multiple minimum supports?
  - MS(i): minimum support for item i
  - e.g.: MS(Milk) = 5%, MS(Coke) = 3%, MS(Broccoli) = 0.1%, MS(Salmon) = 0.5%
  - MS({Milk, Broccoli}) = min(MS(Milk), MS(Broccoli)) = 0.1%
  - Challenge: support is no longer anti-monotone (see the sketch below)
    - Suppose Support(Milk, Coke) = 1.5% and Support(Milk, Coke, Broccoli) = 0.5%
    - Then {Milk, Coke} is infrequent but {Milk, Coke, Broccoli} is frequent
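
To make the loss of anti-monotonicity concrete, here is a minimal Python sketch using the thresholds and supports from this slide (the helper names are illustrative, not from any particular implementation):

    # Minimal sketch of itemset-level minimum support under multiple minimum supports.
    # Thresholds and support values are the ones from the slide; names are illustrative.

    MS = {"Milk": 0.05, "Coke": 0.03, "Broccoli": 0.001, "Salmon": 0.005}

    def itemset_minsup(itemset):
        """MS(X) = minimum of the minimum supports of the items in X."""
        return min(MS[item] for item in itemset)

    support = {
        frozenset({"Milk", "Coke"}): 0.015,
        frozenset({"Milk", "Coke", "Broccoli"}): 0.005,
    }

    def is_frequent(itemset):
        return support[frozenset(itemset)] >= itemset_minsup(itemset)

    print(is_frequent({"Milk", "Coke"}))              # False: 1.5% < min(5%, 3%) = 3%
    print(is_frequent({"Milk", "Coke", "Broccoli"}))  # True:  0.5% >= min(5%, 3%, 0.1%) = 0.1%

The superset passes while its subset fails, so the usual Apriori pruning argument no longer applies.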

Multiple Minimum Support

Multiple Minimum Support (Liu 1999)
- Order the items according to their minimum support (in ascending order)
  - e.g.: MS(Milk) = 5%, MS(Coke) = 3%, MS(Broccoli) = 0.1%, MS(Salmon) = 0.5%
  - Ordering: Broccoli, Salmon, Coke, Milk
- Need to modify Apriori such that:
  - L1: set of frequent items
  - F1: set of items whose support is >= MS(1), where MS(1) is min_i(MS(i))
  - C2: candidate itemsets of size 2 are generated from F1 instead of L1

Multiple Minimum Support (Liu 1999)
- Modifications to Apriori:
  - In traditional Apriori:
    - A candidate (k+1)-itemset is generated by merging two frequent itemsets of size k.
    - The candidate is pruned if it contains any infrequent subset of size k.
  - The pruning step has to be modified (see the sketch below):
    - Prune only if the subset contains the first item.
    - e.g.: Candidate = {Broccoli, Coke, Milk} (ordered according to minimum support)
    - {Broccoli, Coke} and {Broccoli, Milk} are frequent, but {Coke, Milk} is infrequent.
    - The candidate is not pruned, because {Coke, Milk} does not contain the first item, i.e., Broccoli.
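
A small sketch of the modified pruning check described above (a simplified rendering of the rule, not the full MS-Apriori algorithm; names are illustrative):

    from itertools import combinations

    def should_prune(candidate, frequent_k_itemsets):
        """Modified Apriori pruning under multiple minimum supports.

        `candidate` is a tuple of items already ordered by ascending minimum support.
        A size-k subset may only trigger pruning if it contains the candidate's
        first item; otherwise its infrequency is tolerated, because support is
        not anti-monotone under per-item thresholds.
        """
        k = len(candidate) - 1
        for subset in combinations(candidate, k):
            if candidate[0] not in subset:
                continue  # subset without the first item: do not use it for pruning
            if frozenset(subset) not in frequent_k_itemsets:
                return True
        return False

    # Example from the slide: {Coke, Milk} is infrequent but does not contain Broccoli,
    # so the candidate {Broccoli, Coke, Milk} survives.
    frequent_2 = {frozenset({"Broccoli", "Coke"}), frozenset({"Broccoli", "Milk"})}
    print(should_prune(("Broccoli", "Coke", "Milk"), frequent_2))  # False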

Mining Various Kinds of Association Rules
- Mining multilevel associations
- Mining multidimensional associations
- Mining quantitative associations
- Mining interesting correlation patterns

Mining Multiple-Level Association Rules
- Items often form hierarchies.
- Flexible support settings: items at the lower level are expected to have lower support.
- Exploration of shared multi-level mining (Agrawal & Srikant, Han & Fu)
- Example hierarchy: Milk [support = 10%], with children 2% Milk [support = 6%] and Skim Milk [support = 4%]
  - Uniform support: Level 1 min_sup = 5%, Level 2 min_sup = 5%
  - Reduced support: Level 1 min_sup = 5%, Level 2 min_sup = 3%
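
A tiny sketch of the uniform vs. reduced support settings applied to the example hierarchy above (the per-level threshold lookup is illustrative):

    # Support values and thresholds are the ones shown on the slide.
    supports = {("Milk", 1): 0.10, ("2% Milk", 2): 0.06, ("Skim Milk", 2): 0.04}

    def frequent_items(min_sup_by_level):
        return [item for (item, level), sup in supports.items()
                if sup >= min_sup_by_level[level]]

    print(frequent_items({1: 0.05, 2: 0.05}))  # uniform support: Skim Milk (4%) is missed
    print(frequent_items({1: 0.05, 2: 0.03}))  # reduced support: all three items survive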

Multi-level Association: Redundancy Filtering
- Some rules may be redundant due to "ancestor" relationships between items.
- Example:
  - milk => wheat bread [support = 8%, confidence = 70%]
  - 2% milk => wheat bread [support = 2%, confidence = 72%]
- We say the first rule is an ancestor of the second rule.
- A rule is redundant if its support is close to the "expected" value, based on the rule's ancestor.
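
A worked illustration of the "expected value" test; the 25% share of 2% milk within milk used below is a hypothetical figure for illustration, not taken from the slide:

    # ASSUMPTION: suppose 2% milk accounts for 25% of all milk transactions.
    ancestor_support = 0.08      # milk => wheat bread
    share_of_2pct_milk = 0.25    # hypothetical share of "2% milk" within "milk"

    expected_support = ancestor_support * share_of_2pct_milk   # 0.02
    observed_support = 0.02                                    # 2% milk => wheat bread

    # Observed support matches the expectation derived from the ancestor rule,
    # so the more specific rule adds little information and is flagged as redundant.
    print(abs(observed_support - expected_support) < 0.005)    # True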

Mining Multi-Dimensional Association
- Single-dimensional rules: buys(X, "milk") => buys(X, "bread")
- Multi-dimensional rules: >= 2 dimensions or predicates
  - Inter-dimension association rules (no repeated predicates):
    age(X, "19-25") ∧ occupation(X, "student") => buys(X, "coke")
  - Hybrid-dimension association rules (repeated predicates):
    age(X, "19-25") ∧ buys(X, "popcorn") => buys(X, "coke")
- Categorical attributes: finite number of possible values, no ordering among values; data cube approach.
- Quantitative attributes: numeric, implicit ordering among values; discretization, clustering, and gradient approaches.

Mining Quantitative Associations
Techniques can be categorized by how numerical attributes, such as age or salary, are treated:
1. Static discretization based on predefined concept hierarchies (data cube methods)
2. Dynamic discretization based on data distribution (quantitative rules, e.g., Agrawal & Srikant)
3. Clustering: distance-based association (e.g., Yang & Miller); one-dimensional clustering, then association
4. Deviation (such as Aumann and Lindell): Sex = female => Wage: mean = $7/hr (overall mean = $9)

Static Discretization of Quantitative Attributes
- Discretized prior to mining using concept hierarchies.
- Numeric values are replaced by ranges.
- In a relational database, finding all frequent k-predicate sets requires k or k+1 table scans.
- A data cube is well suited for mining:
  - The cells of an n-dimensional cuboid correspond to the predicate sets.
  - Mining from data cubes can be much faster.
[Figure: lattice of cuboids over (), (age), (income), (buys), (age, income), (age, buys), (income, buys), (age, income, buys)]

Quantitative Association Rules
- Example: age(X, "34-35") ∧ income(X, "30-50K") => buys(X, "high resolution TV")
- Proposed by Lent, Swami, and Widom, ICDE'97
- Numeric attributes are dynamically discretized such that the confidence or compactness of the rules mined is maximized.
- 2-D quantitative association rules: A_quan1 ∧ A_quan2 => A_cat
- Cluster adjacent association rules to form general rules using a 2-D grid.

Mining Other Interesting Patterns
- Flexible support constraints (Wang et al., VLDB'02)
  - Some items (e.g., diamond) may occur rarely but are valuable.
  - Customized min_sup specification and application.
- Top-K closed frequent patterns (Han et al., ICDM'02)
  - Hard to specify min_sup, but top-k with a minimum length is more desirable.
  - Dynamically raise min_sup during FP-tree construction and mining, and select the most promising path to mine.

Pattern Evaluation
- Association rule algorithms tend to produce too many rules:
  - Many of them are uninteresting or redundant.
  - Redundant if {A,B,C} => {D} and {A,B} => {D} have the same support and confidence.
- Interestingness measures can be used to prune/rank the derived patterns.
- In the original formulation of association rules, support and confidence are the only measures used.

Application of Interestingness Measure
[Figure: Interestingness Measures]

Computing Interestingness Measure
- Given a rule X => Y, the information needed to compute rule interestingness can be obtained from a contingency table.

Contingency table for X => Y:

            Y      ~Y
    X      f11    f10    f1+
   ~X      f01    f00    f0+
           f+1    f+0    |T|

- f11: support of X and Y
- f10: support of X and ~Y
- f01: support of ~X and Y
- f00: support of ~X and ~Y
- Used to define various measures: support, confidence, lift, Gini, J-measure, etc.
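
For concreteness, a small sketch of how a few of these measures are computed from the four counts (the function name is illustrative; the example numbers are the Tea/Coffee table on the next slide):

    def measures_from_contingency(f11, f10, f01, f00):
        """Compute a few standard measures from the contingency table of X => Y."""
        n = f11 + f10 + f01 + f00           # |T|
        support = f11 / n                   # P(X and Y)
        confidence = f11 / (f11 + f10)      # P(Y | X)
        p_y = (f11 + f01) / n               # P(Y)
        lift = confidence / p_y             # P(Y | X) / P(Y)
        return support, confidence, lift

    # Tea/Coffee example from the next slide: f11=15, f10=5, f01=75, f00=5.
    print(measures_from_contingency(15, 5, 75, 5))  # (0.15, 0.75, 0.8333...)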

Drawback of Confidence

              Coffee   ~Coffee
    Tea           15         5      20
   ~Tea           75         5      80
                  90        10     100

Association rule: Tea => Coffee
Confidence = P(Coffee | Tea) = 15/20 = 0.75, but P(Coffee) = 0.9.
Although confidence is high, the rule is misleading: P(Coffee | ~Tea) = 75/80 = 0.9375.

Statistical Independence
- Population of 1000 students
  - 600 students know how to swim (S)
  - 700 students know how to bike (B)
  - 420 students know how to swim and bike (S, B)
  - P(S ∧ B) = 420/1000 = 0.42
  - P(S) × P(B) = 0.6 × 0.7 = 0.42
- P(S ∧ B) = P(S) × P(B) => statistical independence
- P(S ∧ B) > P(S) × P(B) => positively correlated
- P(S ∧ B) < P(S) × P(B) => negatively correlated

Statistical-based Measures
- Measures that take into account statistical dependence.
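
The measure formulas on this slide were shown as an image; assuming the intended measures are the usual statistical-based ones (lift, interest, PS, and the φ-coefficient), they are:

    Lift(X => Y)   = P(Y | X) / P(Y)
    Interest(X, Y) = P(X, Y) / (P(X) P(Y))
    PS(X, Y)       = P(X, Y) - P(X) P(Y)
    φ(X, Y)        = (P(X, Y) - P(X) P(Y)) / sqrt(P(X) (1 - P(X)) P(Y) (1 - P(Y)))

Note that lift and interest coincide for a single rule, since P(Y | X) / P(Y) = P(X, Y) / (P(X) P(Y)).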

Example: Lift/Interest

              Coffee   ~Coffee
    Tea           15         5      20
   ~Tea           75         5      80
                  90        10     100

Association rule: Tea => Coffee
Confidence = P(Coffee | Tea) = 0.75, but P(Coffee) = 0.9.
Lift = 0.75/0.9 = 0.833 (< 1, therefore Tea and Coffee are negatively associated).

Drawback of Lift & Interest

            Y    ~Y                          Y    ~Y
    X      10     0     10          X       90     0     90
   ~X       0    90     90         ~X        0    10     10
           10    90    100                  90    10    100

   Lift = 0.1/(0.1 × 0.1) = 10      Lift = 0.9/(0.9 × 0.9) = 1.11

The rarer pair (left) receives a much higher lift than the pair that co-occurs in 90% of the transactions (right).
Statistical independence: if P(X,Y) = P(X) P(Y), then Lift = 1.

Interestingness Measure: Correlations (Lift)
- play basketball => eat cereal [40%, 66.7%] is misleading: the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
- play basketball => not eat cereal [20%, 33.3%] is more accurate, although it has lower support and confidence.
- Measure of dependent/correlated events: lift.

                  Basketball   Not basketball   Sum (row)
    Cereal              2000             1750        3750
    Not cereal          1000              250        1250
    Sum (col.)          3000             2000        5000

(Counts scaled to a population of 5000 students, consistent with the percentages above.)
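
Using only the figures quoted above (support 40% and confidence 66.7% imply P(basketball) = 0.40 / 0.667 = 0.6, and P(cereal) = 0.75), the lift values work out as:

    lift(basketball, cereal)     = 0.40 / (0.60 × 0.75) ≈ 0.89   (< 1, negatively correlated)
    lift(basketball, not cereal) = 0.20 / (0.60 × 0.25) ≈ 1.33   (> 1, positively correlated)

This is why the second, lower-confidence rule is the more accurate description of the data.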

Are lift and χ² Good Measures of Correlation?
- "Buy walnuts => buy milk [1%, 80%]" is misleading if 85% of customers buy milk.
- Support and confidence are not good at representing correlations.
- So many interestingness measures (Tan, Kumar, Srivastava)

                  Milk      No milk     Sum (row)
    Coffee        m, c      ~m, c       c
    No coffee     m, ~c     ~m, ~c      ~c
    Sum (col.)    m         ~m          Σ

[Table: four example datasets A1-A4 with counts for m,c / ~m,c / m,~c / ~m,~c and the resulting lift, all-conf, coh, and χ² values]

Which Measures Should Be Used?
- Lift and χ² are not good measures for correlations in large transactional databases.
- All-confidence or coherence could be good measures.
- Both all-confidence and coherence have the downward closure property.
- Efficient algorithms can be derived for mining (Lee et al.).

- There are lots of measures proposed in the literature.
- Some measures are good for certain applications, but not for others.
- What criteria should we use to determine whether a measure is good or bad?
- What about Apriori-style support-based pruning? How does it affect these measures?

Support-based Pruning
- Most association rule mining algorithms use the support measure to prune rules and itemsets.
- Study the effect of support pruning on the correlation of itemsets (sketched below):
  - Generate random contingency tables.
  - Compute support and pairwise correlation for each table.
  - Apply support-based pruning and examine the tables that are removed.
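
A minimal sketch of that experiment, assuming the correlation in question is the φ-coefficient and using an illustrative 5% support threshold:

    import random

    def random_contingency(n=1000):
        """Generate a random 2x2 contingency table with n transactions."""
        cuts = sorted(random.sample(range(1, n), 3))
        f11, f10, f01 = cuts[0], cuts[1] - cuts[0], cuts[2] - cuts[1]
        f00 = n - cuts[2]
        return f11, f10, f01, f00

    def phi(f11, f10, f01, f00):
        """Phi-coefficient (correlation) of a 2x2 table."""
        n = f11 + f10 + f01 + f00
        px, py, pxy = (f11 + f10) / n, (f11 + f01) / n, f11 / n
        denom = (px * (1 - px) * py * (1 - py)) ** 0.5
        return (pxy - px * py) / denom

    tables = [random_contingency() for _ in range(10000)]
    removed = [t for t in tables if t[0] / sum(t) < 0.05]   # support(X,Y) < 5% -> pruned
    if removed:
        neg = sum(phi(*t) < 0 for t in removed) / len(removed)
        print(f"{len(removed)} tables pruned; {neg:.0%} of them negatively correlated")

Running this kind of simulation is what motivates the observation on the next slides that support pruning removes mostly negatively correlated itemsets.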

Effect of Support-based Pruning

Support-based pruning eliminates mostly negatively correlated itemsets

Effect of Support-based Pruning
- Investigate how support-based pruning affects other measures.
- Steps (sketched below):
  - Generate contingency tables.
  - Rank each table according to the different measures.
  - Compute the pairwise correlation between the measures.
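
A sketch of the ranking comparison, comparing how two measures (φ and lift, chosen for illustration) rank the same tables; the hand-rolled rank correlation is illustrative:

    # Assumes random_contingency() and phi() from the earlier pruning sketch are in scope.

    def lift(f11, f10, f01, f00):
        n = f11 + f10 + f01 + f00
        return (f11 / n) / (((f11 + f10) / n) * ((f11 + f01) / n))

    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy)

    tables = [random_contingency() for _ in range(5000)]
    rank_corr = pearson(ranks([phi(*t) for t in tables]),
                        ranks([lift(*t) for t in tables]))
    print(rank_corr)  # correlation between the phi ranking and the lift ranking

Repeating this for every pair of measures, with and without support pruning, produces the percentages quoted on the following slides.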

Effect of Support-based Pruning
- Without support pruning (all pairs):
  - Red cells indicate correlation between the pair of measures > 0.85.
  - 40.14% of pairs have correlation > 0.85.
[Figure: scatter plot between correlation and the Jaccard measure]

Effect of Support-based Pruning
- 0.5% ≤ support ≤ 50%:
  - 61.45% of pairs have correlation > 0.85.
[Figure: scatter plot between correlation and the Jaccard measure]

Effect of Support-based Pruning
- 0.5% ≤ support ≤ 30%:
  - 76.42% of pairs have correlation > 0.85.
[Figure: scatter plot between correlation and the Jaccard measure]

Subjective Interestingness Measure
- Objective measure:
  - Rank patterns based on statistics computed from data.
  - e.g., 21 measures of association (support, confidence, Laplace, Gini, mutual information, Jaccard, etc.).
- Subjective measure:
  - Rank patterns according to the user's interpretation.
  - A pattern is subjectively interesting if it contradicts the expectation of a user (Silberschatz & Tuzhilin).
  - A pattern is subjectively interesting if it is actionable (Silberschatz & Tuzhilin).

Interestingness via Unexpectedness
- Need to model the expectation of users (domain knowledge).
- Need to combine the expectation of users with evidence from the data (i.e., extracted patterns):
  - A pattern expected to be frequent and found to be frequent (or expected to be infrequent and found to be infrequent) is an expected pattern.
  - A pattern expected to be frequent but found to be infrequent (or expected to be infrequent but found to be frequent) is an unexpected pattern.

Interestingness via Unexpectedness
- Web data (Cooley et al., 2001):
  - Domain knowledge in the form of site structure.
  - Given an itemset F = {X1, X2, ..., Xk} (Xi: web pages):
    - L: number of links connecting the pages
    - lfactor = L / (k × (k-1))
    - cfactor = 1 if the graph formed by the pages is connected, 0 otherwise
  - Structure evidence = cfactor × lfactor
  - Usage evidence
  - Use Dempster-Shafer theory to combine domain knowledge and evidence from the data.
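
A minimal sketch of the structure-evidence computation described above (the graph representation and helper names are my own; only lfactor, cfactor, and their product come from the slide):

    from itertools import combinations

    def structure_evidence(pages, links):
        """Structure evidence for an itemset of web pages.

        pages: list of page identifiers (the itemset F = {X1, ..., Xk})
        links: set of links between pages, each as a frozenset of two pages
        """
        k = len(pages)
        # L: number of links connecting pages in the itemset
        L = sum(1 for a, b in combinations(pages, 2) if frozenset({a, b}) in links)
        lfactor = L / (k * (k - 1))
        cfactor = 1 if _connected(pages, links) else 0
        return cfactor * lfactor

    def _connected(pages, links):
        """Simple reachability check over the page graph."""
        seen, frontier = {pages[0]}, [pages[0]]
        while frontier:
            node = frontier.pop()
            for other in pages:
                if other not in seen and frozenset({node, other}) in links:
                    seen.add(other)
                    frontier.append(other)
        return len(seen) == len(pages)

    # Hypothetical 3-page itemset with 2 connecting links.
    links = {frozenset({"A", "B"}), frozenset({"B", "C"})}
    print(structure_evidence(["A", "B", "C"], links))  # 1 * 2/(3*2) = 0.333...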