Published by Megan Lee. Modified over 9 years ago.
1
Mining Frequent Patterns, Associations, and Correlations
Compiled by: Umair Yaqub, Lecturer, Govt. Murray College Sialkot
2
Frequent Pattern Mining - Basic Concepts
- Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set.
- Goal: find frequent associations or correlations among sets of items or objects in transaction databases, relational databases, and other information repositories.
- Let I = {i_1, i_2, …, i_m} be a set of items, and let D be a database of transactions, where each transaction T is a set of items from I (those purchased by a customer in one visit).
- An association rule is an implication of the form A → B, where A and B are subsets of I and A ∩ B = Ø.
- [Figure: customers who buy A (computer), customers who buy B (software), and customers who buy both.]
3
Association Mining - Basic Concepts (contd.)
- Find all rules A → B that meet a minimum support and a minimum confidence.
  - Support, s: the probability that a transaction contains both A and B.
  - Confidence, c: the conditional probability that a transaction containing A also contains B.
- Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong.
- A set of items is referred to as an itemset. An itemset containing k items is a k-itemset.
- The occurrence frequency of an itemset is the number of transactions that contain it (also called its frequency, support count, or count).
- An itemset that satisfies minimum support (count) is a frequent itemset. The set of frequent k-itemsets is commonly denoted L_k.
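The definitions above can be sketched in a few lines of Python. The transactions below are invented toy data, and the function names (`support_count`, `support`, `confidence`) are illustrative choices, not part of the original slides.

```python
# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer", "software"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(a, b, transactions):
    """Support of A -> B: fraction of transactions containing both A and B."""
    both = frozenset(a) | frozenset(b)
    return support_count(both, transactions) / len(transactions)


def confidence(a, b, transactions):
    """Confidence of A -> B: of the transactions containing A, the fraction that also contain B."""
    count_a = support_count(a, transactions)
    if count_a == 0:
        raise ValueError("antecedent does not occur in any transaction")
    both = frozenset(a) | frozenset(b)
    return support_count(both, transactions) / count_a


s = support({"computer"}, {"software"}, transactions)
c = confidence({"computer"}, {"software"}, transactions)
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

With this toy data, two of the four transactions contain both items and three contain "computer", so the rule computer → software has support 2/4 and confidence 2/3. It would count as strong under thresholds of 50% for both measures.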
4
Association Mining - Basic Concepts (contd.)
Association rule mining is a two-step process:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
Overall performance is determined by the first step.
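Step 2 is cheap once step 1 has produced the support counts. The sketch below assumes the frequent itemsets are already available as a dictionary from itemset to support count; the function name `generate_rules` and the sample counts are hypothetical.

```python
from itertools import combinations


def generate_rules(support_counts, min_conf):
    """Return (antecedent, consequent, confidence) for every rule meeting min_conf.

    support_counts maps each frequent itemset (a frozenset) to its support
    count. Every nonempty subset of a frequent itemset is itself frequent,
    so its count is assumed to be present in the same dictionary.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = count / support_counts[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical support counts, as step 1 might produce them.
counts = {
    frozenset({"A"}): 3,
    frozenset({"B"}): 2,
    frozenset({"A", "B"}): 2,
}
rules = generate_rules(counts, min_conf=0.7)
print(rules)
```

Here A → B has confidence 2/3 and is rejected, while B → A has confidence 2/2 and is kept. No database scan is needed at this stage, which is why the first step dominates the running time.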
5
Association Rule Mining: A Road Map
- Based on the completeness of mined patterns
  - Complete set of frequent itemsets, constrained frequent itemsets
- Based on levels of abstraction
  - Single-level vs. multiple-level analysis
  - age(x, "30..39") → buys(x, "computer")
  - age(x, "30..39") → buys(x, "laptop")
- Based on number of data dimensions
  - Single-dimensional vs. multidimensional associations
- Based on the types of values handled
  - Boolean vs. quantitative associations
  - buys(x, "SQLServer") ∧ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]
  - age(x, "30..39") ∧ income(x, "42..48K") → buys(x, "PC") [1%, 75%]
- Based on kinds of rules to be mined
  - Association rules, correlation rules
- Based on the kinds of patterns to be mined
  - Frequent itemset mining, sequential pattern mining, structured pattern mining
6
Mining Association Rules: An Example
Min. support: 50%
Min. confidence: 50%
7
The Apriori Algorithm
Method:
- Initially, scan the DB once to get the frequent 1-itemsets.
- Generate length-(k+1) candidate itemsets from the length-k frequent itemsets.
- Test the candidates against the DB.
- Terminate when no frequent or candidate set can be generated.
- Use the frequent itemsets to generate association rules.
The Apriori principle: all nonempty subsets of a frequent itemset must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so they need not be generated or tested.
8
The Apriori Algorithm: Example
[Figure: worked example showing database D and the tables C_1, L_1, C_2, L_2, C_3, L_3 produced by successive scans of D.]
9
The Apriori Algorithm
Pseudo-code:
    C_k: candidate itemsets of size k
    L_k: frequent itemsets of size k

    L_1 = {frequent items};
    for (k = 1; L_k != Ø; k++) do begin
        C_{k+1} = candidates generated from L_k;
        for each transaction t in database do
            increment the count of all candidates in C_{k+1} that are contained in t
        L_{k+1} = candidates in C_{k+1} with min_support
    end
    return ∪_k L_k;
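The pseudo-code above can be turned into a short runnable sketch. This is one possible Python rendering, not the slides' own implementation: the function name `apriori`, the use of an absolute `min_count` instead of a percentage, and the four-transaction database are all assumptions made for illustration. Candidates are formed here by taking unions of pairs of frequent k-itemsets, a simpler variant of the ordered self-join that yields the same candidates after pruning.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # L_1: one scan of the database to count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        # C_{k+1}: unions of two frequent k-itemsets that have exactly k+1
        # items, kept only if every k-subset is itself frequent (pruning).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)

        # One scan of the database to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # L_{k+1}: candidates that meet the minimum support count.
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical four-transaction database; min support 50% means count >= 2.
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(db, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this toy database the loop finds four frequent 1-itemsets, four frequent 2-itemsets, and the single frequent 3-itemset {2, 3, 5}, then stops because no 4-item candidate can be formed.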
10
Important Details of Apriori
- How to generate candidates?
  - Step 1: self-joining L_k
  - Step 2: pruning
- How to count supports of candidates?
- Example of candidate generation
  - L_3 = {abc, abd, acd, ace, bcd}
  - Self-joining: L_3 * L_3
    - abcd from abc and abd
    - acde from acd and ace
  - Pruning: acde is removed because ade is not in L_3
  - C_4 = {abcd}
11
How to Generate Candidates?
Suppose the items in L_{k-1} are listed in an order.
Step 1: self-joining L_{k-1}
    insert into C_k
    select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
    from L_{k-1} p, L_{k-1} q
    where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
Step 2: pruning
    forall itemsets c in C_k do
        forall (k-1)-subsets s of c do
            if (s is not in L_{k-1}) then delete c from C_k
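The join and prune steps above can be sketched directly in Python. The function name `generate_candidates` is hypothetical, and the sketch assumes each itemset is stored as a tuple whose items are already in sorted order, which is what makes the prefix comparison valid. The input is the L_3 from the candidate-generation example in these slides.

```python
from itertools import combinations


def generate_candidates(prev_frequent):
    """Build C_k from L_{k-1}, where each itemset is a sorted tuple of items."""
    prev = sorted(set(prev_frequent))
    prev_set = set(prev)
    size = len(prev[0]) if prev else 0  # this is k-1

    candidates = []
    for p in prev:
        for q in prev:
            # Step 1 (self-join): the first k-2 items agree and the last
            # item of p is smaller than the last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Step 2 (prune): keep c only if every (k-1)-subset of c
                # is a member of L_{k-1}.
                if all(s in prev_set for s in combinations(c, size)):
                    candidates.append(c)
    return candidates


L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
C4 = generate_candidates(L3)
print(C4)  # [('a', 'b', 'c', 'd')]
```

The join produces abcd (from abc and abd) and acde (from acd and ace). The prune step then discards acde because its subset ade is not in L_3, leaving C_4 = {abcd}.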
12
Example – Transaction DB
13
Example – Finding Frequent Patterns (1)
Adapted from slides by Han and Kamber, http://www-faculty.cs.uiuc.edu/~hanj/bk2/
14
Example – Finding Frequent Patterns (2)