Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.

2 Frequent Pattern Mining - Basic Concepts
- Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
- Finding frequent associations or correlations among sets of items or objects in transaction databases, relational databases, and other information repositories
- Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items, and let D be a database of transactions, where each transaction T is a set of items with $T \subseteq I$ (for example, the items purchased by a customer in one visit). An association rule is an implication of the form A → B, where $A \subseteq I$, $B \subseteq I$, and $A \cap B = \emptyset$.
[Figure: Venn diagram of customers who buy A (computer), customers who buy B (software), and customers who buy both.]
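The definition above can be expressed directly in code. This is a minimal sketch, assuming itemsets are represented as Python `frozenset`s of item names; `Rule` and `is_valid_rule` are hypothetical names, and requiring A and B to be nonempty is an assumed (commonly used) extra condition that this slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An association rule A -> B; A and B are itemsets (frozensets of items)."""
    antecedent: frozenset  # A
    consequent: frozenset  # B


def is_valid_rule(rule, items):
    """Check A ⊆ I, B ⊆ I and A ∩ B = Ø.

    Non-emptiness of A and B is an assumed extra condition, not from the slide.
    """
    a, b = rule.antecedent, rule.consequent
    return bool(a) and bool(b) and a <= items and b <= items and not (a & b)


items = frozenset({"computer", "software", "printer"})  # hypothetical item universe I
print(is_valid_rule(Rule(frozenset({"computer"}), frozenset({"software"})), items))  # True
```

A rule such as {computer} → {computer, software} fails the check because A and B overlap.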

3 Association Mining-Basic Concepts (contd…)
- Find all the rules A → B that satisfy a minimum support and a minimum confidence:
    support, s: the probability that a transaction contains both A and B, $s = P(A \cup B)$, i.e., the fraction of transactions that contain the itemset $A \cup B$
    confidence, c: the conditional probability that a transaction containing A also contains B, $c = P(B \mid A) = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)}$
- Rules satisfying both a minimum support threshold and a minimum confidence threshold are called strong.
- A set of items is referred to as an itemset. An itemset containing k items is a k-itemset.
- The occurrence frequency of an itemset is the number of transactions that contain the itemset (also called its frequency, support count, or count).
- An itemset satisfying minimum support (count) is a frequent itemset; the set of frequent k-itemsets is commonly denoted by $L_k$.
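The support and confidence formulas can be checked on a small example. The sketch below assumes a hypothetical four-transaction database of Python sets; the function names are illustrative, and returning 0.0 confidence when A never occurs is an assumed convention.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(a, b, transactions):
    """s(A -> B) = P(A ∪ B): fraction of transactions containing all items of A and B."""
    return support_count(set(a) | set(b), transactions) / len(transactions)


def confidence(a, b, transactions):
    """c(A -> B) = P(B | A) = support_count(A ∪ B) / support_count(A)."""
    count_a = support_count(a, transactions)
    if count_a == 0:
        return 0.0  # assumed convention: A never occurs, so no evidence for the rule
    return support_count(set(a) | set(b), transactions) / count_a


def is_strong(a, b, transactions, min_sup, min_conf):
    """A rule is strong if it meets both the minimum support and minimum confidence."""
    return (support(a, b, transactions) >= min_sup
            and confidence(a, b, transactions) >= min_conf)


transactions = [  # hypothetical toy database
    {"computer", "software"},
    {"computer", "printer"},
    {"computer", "software", "printer"},
    {"software"},
]
print(support({"computer"}, {"software"}, transactions))     # 0.5
print(confidence({"computer"}, {"software"}, transactions))  # 0.6666666666666666
```

Here computer → software is strong at 50% minimum support and 50% minimum confidence, but not at 70% minimum confidence.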

4 Association Mining-Basic Concepts (contd…)
- Association rule mining is a two-step process:
    1. Find all frequent itemsets
    2. Generate strong association rules from the frequent itemsets
- The overall performance of association rule mining is determined by the first step.
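Step 2 can be sketched as follows, assuming the first step has already produced a dictionary that maps each frequent itemset (a `frozenset`) to its support count. Because every subset of a frequent itemset is itself frequent, the count of any antecedent A can be looked up directly, so computing confidence needs no further database scan. `generate_rules` and its parameters are illustrative names, not from the slides.

```python
from itertools import combinations


def generate_rules(freq_counts, n_transactions, min_conf):
    """Derive strong rules A -> B from frequent itemsets.

    freq_counts: {frozenset itemset: support count} for ALL frequent itemsets.
    Returns (A, B, support, confidence) tuples. Every rule already meets
    minimum support, because A ∪ B is a frequent itemset.
    """
    rules = []
    for itemset, count in freq_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty A and a nonempty B
        for r in range(1, len(itemset)):
            for a in combinations(sorted(itemset), r):
                a = frozenset(a)
                conf = count / freq_counts[a]  # support_count(A ∪ B) / support_count(A)
                if conf >= min_conf:
                    rules.append((a, itemset - a, count / n_transactions, conf))
    return rules


# Hypothetical frequent itemsets (minimum support count 2, out of 4 transactions)
freq = {
    frozenset({"computer"}): 3,
    frozenset({"software"}): 3,
    frozenset({"printer"}): 2,
    frozenset({"computer", "software"}): 2,
    frozenset({"computer", "printer"}): 2,
}
rules = generate_rules(freq, n_transactions=4, min_conf=0.7)
```

With `min_conf=0.7`, only printer → computer (support 0.5, confidence 1.0) is kept; lowering `min_conf` to 0.5 yields four rules.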

5 Association Rule Mining: A Road Map
- Based on the completeness of mined patterns
    Complete set of frequent itemsets, constrained frequent itemsets
- Based on levels of abstraction
    Single-level vs. multiple-level analysis
    age(x, “30..39”) → buys(x, “computer”)
    age(x, “30..39”) → buys(x, “laptop”)
- Based on the number of data dimensions
    Single-dimensional vs. multidimensional associations
- Based on the types of values handled
    Boolean vs. quantitative associations (bracketed values are [support, confidence])
    buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]
    age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]
- Based on the kinds of rules to be mined
    Association rules, correlation rules
- Based on the kinds of patterns to be mined
    Frequent itemset mining, sequential pattern mining, structured pattern mining

6 Mining Association Rules - An Example
- Min. support: 50%
- Min. confidence: 50%

7 The Apriori Algorithm  Method: Initially, scan DB once to get frequent 1-itemset Generate length (k+1) candidate itemsets from length k frequent itemsets Test the candidates against DB Terminate when no frequent or candidate set can be generated  Use the frequent itemsets to generate association rules. The Apriori principle: All nonempty subsets of a frequent itemset must be frequent

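8 The Apriori Algorithm - Example
[Figure: database D is scanned to count candidate set C1, giving L1; C2 is generated from L1 and counted in a scan of D, giving L2; C3 is generated from L2 and counted, giving L3.]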

9 The Apriori Algorithm  Pseudo-code: C k : Candidate itemset of size k L k : frequent itemset of size k L 1 = {frequent items}; for (k = 1; L k !=  ; k++) do begin C k+1 = candidates generated from L k ; for each transaction t in database do increment the count of all candidates in C k+1 that are contained in t L k+1 = candidates in C k+1 with min_support end return  k L k ;

10 Important Details of Apriori  How to generate candidates? Step 1: self-joining L k Step 2: pruning  How to count supports of candidates?  Example of Candidate-generation L 3 ={abc, abd, acd, ace, bcd} Self-joining: L 3 *L 3  abcd from abc and abd  acde from acd and ace Pruning:  acde is removed because ade is not in L 3 C 4 ={abcd}

11 How to Generate Candidates?  Suppose the items in L k-1 are listed in an order  Step 1: self-joining L k-1 insert into C k select p.item 1, p.item 2, …, p.item k-1, q.item k-1 from L k-1 p, L k-1 q where p.item 1 =q.item 1, …, p.item k-2 =q.item k-2, p.item k-1 < q.item k-1  Step 2: pruning forall itemsets c in C k do forall (k-1)-subsets s of c do if (s is not in L k-1 ) then delete c from C k

12 Example – Transaction DB

13 Example – Finding Frequent Patterns (1)
(Adapted from slides by Han and Kamber, http://www-faculty.cs.uiuc.edu/~hanj/bk2/)

14 Example – Finding Frequent Patterns (2)