Association Rule Mining
Mining Association Rules in Large Databases
- Association rule mining
- Algorithms Apriori and FP-Growth
- Max and closed patterns
- Mining various kinds of association/correlation rules

Max-patterns & Closed Patterns
- If there are frequent patterns with many items, enumerating all of them is costly.
- We may be interested in finding only the "boundary" frequent patterns.
- Two types: max-patterns and closed patterns.

Max-patterns  Frequent pattern {a 1, …, a 100 }  ( ) + ( ) + … + ( ) = = 1.27*10 30 frequent sub-patterns!  Max-pattern: frequent patterns without proper frequent super pattern BCDE, ACD are max-patterns BCD is not a max-pattern TidItems 10A,B,C,D,E 20B,C,D,E, 30A,C,D,F Min_sup=2

MaxMiner: Mining Max-patterns
- Idea: generate the complete set-enumeration tree one level at a time, pruning where applicable.

  Set-enumeration tree over {A,B,C,D}, with each node's tail items in parentheses:

  Root:    (ABCD)
  Level 1: A (BCD), B (CD), C (D), D ()
  Level 2: AB (CD), AC (D), AD (), BC (D), BD (), CD ()
  Level 3: ABC (D), ABD (), ACD (), BCD ()
  Level 4: ABCD ()

Local Pruning Techniques (e.g. at node A)
Check the frequency of ABCD and of AB, AC, AD:
- If ABCD is frequent, prune the whole sub-tree rooted at A.
- If AC is NOT frequent, remove C from the parenthesis (the tail) before expanding.

Algorithm MaxMiner
- Initially, generate one node N = (ABCD), where h(N) = ∅ (the head) and t(N) = {A,B,C,D} (the tail).
- When considering expanding N:
  - If h(N) ∪ t(N) is frequent, do not expand N.
  - If for some i ∈ t(N), h(N) ∪ {i} is NOT frequent, remove i from t(N) before expanding N.
- Apply global pruning techniques as well.

Global Pruning Technique (across sub-trees)
- When a max-pattern is identified (e.g. ABCD), prune all nodes N (e.g. B, C and D) where h(N) ∪ t(N) is a subset of it (e.g. of ABCD).
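The search and the two local pruning rules can be sketched as follows. This is an illustrative reimplementation in the slide's h/t notation, with the slides' global pruning replaced by a final maximality filter; the real MaxMiner also batches support counting, which is omitted here.

```python
def support(itemset, transactions):
    """Count transactions (sets) containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def maxminer(transactions, min_sup):
    """Set-enumeration search for max-patterns (sketch)."""
    items = sorted({i for t in transactions for i in t
                    if support([i], transactions) >= min_sup})
    candidates = []
    stack = [(tuple(), items)]          # nodes are (head, tail) pairs
    while stack:
        head, tail = stack.pop()
        # Pruning 1: if head ∪ tail is frequent, record it and skip the sub-tree.
        if tail and support(head + tuple(tail), transactions) >= min_sup:
            candidates.append(frozenset(head + tuple(tail)))
            continue
        # Pruning 2: drop tail items i where head ∪ {i} is infrequent.
        tail = [i for i in tail
                if support(head + (i,), transactions) >= min_sup]
        if not tail and head:           # leaf: head itself is a candidate
            candidates.append(frozenset(head))
        for k, i in enumerate(tail):    # expand remaining children
            stack.append((head + (i,), tail[k + 1:]))
    # Keep only candidates with no frequent proper superset among the results.
    return {p for p in candidates if not any(p < q for q in candidates)}
```

On the slides' example database this returns the two max-patterns BCDE and ACD:

```python
db = [{'A','B','C','D','E'}, {'B','C','D','E'}, {'A','C','D','F'}]
print(maxminer(db, 2))  # {frozenset({'B','C','D','E'}), frozenset({'A','C','D'})}
```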

Example (min_sup = 2)

  Tid | Items
  ----+-----------
  10  | A,B,C,D,E
  20  | B,C,D,E
  30  | A,C,D,F

Root node (ABCDEF). Item frequencies:

  Itemset | Frequency
  --------+----------
  ABCDEF  | 0
  A       | 2
  B       | 2
  C       | 3
  D       | 3
  E       | 2
  F       | 1

F is infrequent, so it is removed from every tail. Children of the root:
A (BCDE), B (CDE), C (DE), D (E), E ()
Max patterns so far: (none)

Example (cont.): node A

  Itemset | Frequency
  --------+----------
  ABCDE   | 1
  AB      | 1
  AC      | 2
  AD      | 2
  AE      | 1

ABCDE is infrequent, so A must be expanded; AB and AE are infrequent, so B and E are removed from A's tail. Children of A: AC (D), AD ()
Max patterns so far: (none)

Example (cont.): node B

  Itemset | Frequency
  --------+----------
  BCDE    | 2

BCDE = h(B) ∪ t(B) is frequent, so the whole sub-tree under B is pruned (BC, BD, BE need not even be counted).
Max patterns so far: BCDE

Example (cont.): node AC

  Itemset | Frequency
  --------+----------
  ACD     | 2

ACD = h(AC) ∪ t(AC) is frequent, so AC is not expanded further: ACD ().
Max patterns: BCDE, ACD
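The worked example can be cross-checked by brute force: enumerate every frequent itemset and keep those with no frequent proper superset. This is exponential in the number of items, so it is only a check for tiny data, not an algorithm.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """All frequent itemsets, by exhaustive enumeration."""
    items = sorted({i for t in transactions for i in t})
    return [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if sum(1 for t in transactions if set(c) <= t) >= min_sup]

def max_patterns(transactions, min_sup):
    """Frequent itemsets with no frequent proper superset."""
    freq = frequent_itemsets(transactions, min_sup)
    return {x for x in freq if not any(x < y for y in freq)}

db = [{'A','B','C','D','E'}, {'B','C','D','E'}, {'A','C','D','F'}]
print(max_patterns(db, 2))  # the two max patterns: ACD and BCDE
```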

Frequent Closed Patterns
- For a frequent itemset X, if there exists no item y such that every transaction containing X also contains y, then X is a frequent closed pattern.
- Closed patterns are a concise representation of the frequent patterns: they reduce the number of patterns and rules.
- N. Pasquier et al., ICDT '99.

Example (min_sup = 2):

  TID | Items
  ----+---------
  10  | a, b, c
  20  | a, b, c
  30  | a, b, d
  40  | a, b, d
  50  | e, f

  "ab" is a frequent closed pattern.
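Equivalently, X is closed iff no proper superset of X has the same support (if some item y appeared in every transaction containing X, then X ∪ {y} would have X's support). A brute-force sketch using that characterization, for illustration only:

```python
from itertools import combinations

def closed_patterns(transactions, min_sup):
    """Frequent itemsets with no proper superset of equal support.
    Exhaustive enumeration; only suitable for tiny examples."""
    items = sorted({i for t in transactions for i in t})
    sup = {}
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            s = sum(1 for t in transactions if set(c) <= t)
            if s >= min_sup:
                sup[frozenset(c)] = s
    # A superset with equal support is itself frequent, so checking
    # within `sup` is sufficient.
    return {x for x, s in sup.items()
            if not any(x < y and sy == s for y, sy in sup.items())}

db = [{'a','b','c'}, {'a','b','c'}, {'a','b','d'}, {'a','b','d'}, {'e','f'}]
print(closed_patterns(db, 2))  # ab, abc, and abd are the closed patterns
```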

Max Pattern vs. Frequent Closed Pattern
- max pattern => closed pattern: if itemset X is a max pattern, adding any item to it would not yield a frequent pattern; thus there exists no item y such that every transaction containing X also contains y.
- closed pattern =/=> max pattern: in the table below, "ab" is a closed pattern, but not a max pattern.

  TID | Items
  ----+---------
  10  | a, b, c
  20  | a, b, c
  30  | a, b, d
  40  | a, b, d
  50  | e, f

  (min_sup = 2)

Mining Frequent Closed Patterns: CLOSET
- F-list: list of all frequent items in support-ascending order. Here F-list = d-a-f-e-c.
- Divide the search space:
  - patterns having d
  - patterns having a but not d, etc.
- Find frequent closed patterns recursively:
  - among the transactions having d, cfa is frequent closed, so cfad is a frequent closed pattern.
- J. Pei, J. Han & R. Mao, "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", DMKD '00.

Example (min_sup = 2):

  TID | Items
  ----+---------------
  10  | a, c, d, e, f
  20  | a, b, e
  30  | c, e, f
  40  | a, c, d, f
  50  | c, e, f

Multiple-Level Association Rules
- Items often form a hierarchy.
- Items at the lower level are expected to have lower support.
- Rules on itemsets at the appropriate levels can be quite useful.
- A transactional database can be encoded based on dimensions and levels.
- We can explore shared multi-level mining.

  (Figure: a concept hierarchy with food at the root; milk and bread below it; 2% fat and skim milk, and wheat and white bread, below those; brands such as Garelick and Wonder at the leaves.)

Mining Multi-Level Associations
- A top-down, progressive deepening approach:
  - First find high-level strong rules: milk => bread [20%, 60%].
  - Then find their lower-level "weaker" rules: 2% fat milk => wheat bread [6%, 50%].
- Variations in mining multiple-level association rules:
  - Level-crossed association rules: skim milk => Wonder wheat bread
  - Association rules with multiple, alternative hierarchies: full fat milk => Wonder bread

Multi-level Association: Uniform Support vs. Reduced Support
- Uniform support: the same minimum support for all levels.
  + Only one minimum support threshold; no need to examine itemsets containing any item whose ancestors do not have minimum support.
  - But lower-level items do not occur as frequently, so if the support threshold is
    - too high, we miss low-level associations;
    - too low, we generate too many high-level associations.

Multi-level Association: Uniform Support vs. Reduced Support (cont.)
- Reduced support: reduced minimum support at lower levels. There are 4 search strategies:
  - Level-by-level independent: independent search at all levels (no misses).
  - Level-cross filtering by k-itemset: prune a k-pattern if the corresponding k-pattern at the upper level is infrequent.
  - Level-cross filtering by single item: prune an item if its parent node is infrequent.
  - Controlled level-cross filtering by single item: also consider 'subfrequent' items that pass a lower 'passage' threshold.
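The third strategy (level-cross filtering by single item) can be sketched as a simple filter. All names here are illustrative assumptions, not from any library:

```python
def level_cross_filter(level2_counts, parent, level1_frequent, min_sup):
    """Level-cross filtering by single item (sketch): examine a
    lower-level item only if its parent at the level above was frequent,
    then apply the lower level's own support threshold.
    `parent` maps each level-2 item to its level-1 ancestor."""
    return {item: n for item, n in level2_counts.items()
            if parent[item] in level1_frequent and n >= min_sup}

# Hypothetical counts over 1000 transactions:
parent = {'skim milk': 'milk', '2% milk': 'milk',
          'wheat bread': 'bread', 'herbal tea': 'tea'}
level1_frequent = {'milk', 'bread'}     # 'tea' failed the level-1 threshold
counts = {'skim milk': 40, '2% milk': 60, 'wheat bread': 30, 'herbal tea': 80}

print(level_cross_filter(counts, parent, level1_frequent, 50))
# 'herbal tea' is pruned despite its high count, because its parent is infrequent
```

Note the strategy's weakness from the slide: an item like 'herbal tea' may be frequent at level 2 even though its parent missed the level-1 threshold, which is what the controlled variant's passage threshold addresses.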

Uniform Support
Multi-level mining with uniform support (min_sup = 5% at both levels):

  Level 1 (min_sup = 5%): Milk [support = 10%]          -> frequent
  Level 2 (min_sup = 5%): full fat Milk [support = 6%]  -> frequent
                          Skim Milk [support = 4%]      -> pruned (X)

Reduced Support
Multi-level mining with reduced support:

  Level 1 (min_sup = 5%): Milk [support = 10%]          -> frequent
  Level 2 (min_sup = 3%): full fat Milk [support = 6%]  -> frequent
                          Skim Milk [support = 4%]      -> frequent

Interestingness Measurements
- Objective measures; two popular measurements:
  1. support
  2. confidence
- Subjective measures: a rule (pattern) is interesting if
  1. it is unexpected (surprising to the user), and/or
  2. actionable (the user can do something with it).

Criticism of Support and Confidence
- Example 1: among 5000 students,
  - 3000 play basketball
  - 3750 eat cereal
  - 2000 both play basketball and eat cereal
- play basketball => eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
- play basketball => not eat cereal [20%, 33.3%] is far more accurate, although it has lower support and confidence.

Criticism of Support and Confidence (cont.)
- Example 2: X and Y are positively correlated, while X and Z are negatively correlated, yet the support and confidence of X => Z dominate those of X => Y.
- We need a measure of dependent or correlated events.
- P(B|A)/P(B) is also called the lift of rule A => B.

Other Interestingness Measures: Interest
- Interest (also called correlation, or lift) takes both P(A) and P(B) into consideration:

  interest(A, B) = P(AB) / (P(A) * P(B))

- P(AB) = P(A) * P(B) if A and B are independent events, giving interest 1.
- A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated.
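Applying this measure to the basketball/cereal numbers from the earlier slide shows why confidence alone was misleading:

```python
def lift(p_a, p_b, p_ab):
    """Lift (interest) of rule A => B: P(AB) / (P(A) * P(B)).
    1 means independence; < 1 negative, > 1 positive correlation."""
    return p_ab / (p_a * p_b)

# 5000 students: 3000 play basketball, 3750 eat cereal, 2000 do both.
print(round(lift(3000/5000, 3750/5000, 2000/5000), 3))
# 0.889 -> basketball and cereal are negatively correlated

# basketball => not cereal: 1250 don't eat cereal, 1000 of them play basketball.
print(round(lift(3000/5000, 1250/5000, 1000/5000), 3))
# 1.333 -> positively correlated, matching the slide's conclusion
```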