Huffman Codes and Association Rules (II) Prof. Sin-Min Lee, Department of Computer Science


Huffman Code Example Given: A B C D E with frequencies 3, 1, 2, 4, 6. Sorting in increasing order of frequency changes this to: B C A D E with frequencies 1, 2, 3, 4, 6.

Huffman Code Example – Step 1 Because B (1) and C (2) have the lowest frequencies, they are merged into a single node BC. The new node's weight is 1 + 2 = 3.

Huffman Code Example – Step 2 Reorder the nodes by increasing weight again. This gives us: BC A D E with weights 3, 3, 4, 6.

Huffman Code Example – Step 3 Merging the two lowest-weight nodes, BC (3) and A (3), gives a new node BCA with weight 3 + 3 = 6.

Huffman Code Example – Step 4 The remaining nodes are D (4), E (6) and BCA (6). Because E and BCA are tied at weight 6, several equivalent orderings are possible, for example D E BCA or D BCA E, each with weights 4, 6, 6.

Huffman Code Example – Step 5 Merging the two lowest-weight nodes combines D (4) with one of the weight-6 nodes. Breaking the tie one way gives DE with weight 10, leaving the ordering BCA DE with weights 6, 10; breaking it the other way gives DBCA with weight 10, leaving E DBCA with weights 6, 10. Either choice leads to an optimal code.

Huffman Code Example – Step 6 Merging the last two nodes gives the root, which contains all of A, B, C, D, E and has weight 6 + 10 = 16. The tree is now complete.

Huffman Code Example – Step 7 Finally, we map a 1 to each right branch and a 0 to each left branch; each symbol's code is read along the path from the root to its leaf. With BCA as the left subtree and DE as the right, one valid assignment of codes is: A = 01, B = 000, C = 001, D = 10, E = 11.
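This greedy merge-and-relabel procedure is conveniently expressed with a binary heap. Below is a minimal, self-contained Python sketch (the function name huffman_codes is ours; ties between equal weights may be broken differently than in the slides, which yields a different but equally optimal code):

    import heapq

    def huffman_codes(freqs):
        """Build a Huffman code from a {symbol: frequency} dict."""
        # Heap entries are (weight, tie_breaker, tree); leaves are symbols,
        # internal nodes are (left, right) pairs.
        heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            w1, _, left = heapq.heappop(heap)    # take the two lowest weights
            w2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, count, (left, right)))  # merge them
            count += 1
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):          # internal node
                walk(node[0], prefix + "0")      # 0 on each left branch
                walk(node[1], prefix + "1")      # 1 on each right branch
            else:
                codes[node] = prefix or "0"      # lone-symbol edge case
        walk(heap[0][2], "")
        return codes

    # Frequencies from the example: B=1, C=2, A=3, D=4, E=6.
    print(huffman_codes({"A": 3, "B": 1, "C": 2, "D": 4, "E": 6}))
    # e.g. {'A': '00', 'B': '010', 'C': '011', 'D': '10', 'E': '11'}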

Example Items = {milk, coke, pepsi, beer, juice}. Minimum support = 3 baskets.
B1 = {m, c, b}  B2 = {m, p, j}  B3 = {m, b}  B4 = {c, j}
B5 = {m, p, b}  B6 = {m, c, b, j}  B7 = {c, b, j}  B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}.
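These frequent itemsets can be checked by brute force. A small Python sketch over the same baskets (counting 1- and 2-itemsets is enough here, since no 3-itemset reaches the threshold):

    from itertools import combinations
    from collections import Counter

    baskets = [{"m","c","b"}, {"m","p","j"}, {"m","b"}, {"c","j"},
               {"m","p","b"}, {"m","c","b","j"}, {"c","b","j"}, {"b","c"}]
    MINSUP = 3

    counts = Counter()
    for basket in baskets:
        for size in (1, 2):                      # count every 1- and 2-itemset
            counts.update(combinations(sorted(basket), size))

    frequent = {s: n for s, n in counts.items() if n >= MINSUP}
    print(frequent)
    # {('m',): 5, ('c',): 5, ('b',): 6, ('j',): 4,
    #  ('b', 'm'): 4, ('b', 'c'): 4, ('c', 'j'): 3}   (order may vary)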

Association Rules An association rule R : Itemset1 => Itemset2
– Itemset1 and Itemset2 are disjoint, and Itemset2 is non-empty
– Meaning: if a transaction includes Itemset1, then it also includes Itemset2
Examples:
– A, B => E, C
– A => B, C

Example B1 = {m, c, b}  B2 = {m, p, j}  B3 = {m, b}  B4 = {c, j}
B5 = {m, p, b}  B6 = {m, c, b, j}  B7 = {c, b, j}  B8 = {b, c}
An association rule: {m, b} → c. Four baskets contain both m and b (B1, B3, B5, B6), and of those, two (B1 and B6) also contain c:
– Confidence = 2/4 = 50%.

From Frequent Itemsets to Association Rules Q: Given the frequent set {A, B, E}, what are the possible association rules?
– A => B, E
– A, B => E
– A, E => B
– B => A, E
– B, E => A
– E => A, B
– __ => A, B, E (the empty rule, i.e. true => A, B, E)

Classification vs Association Rules
Classification rules:
– focus on one target field
– specify a class in all cases
– measure: accuracy
Association rules:
– allow many target fields
– apply in only some cases
– measures: support, confidence, lift

Rule Support and Confidence Suppose R : I => J is an association rule:
– sup(R) = sup(I ∪ J) is the support of R: the support count of the itemset I ∪ J, i.e. the number of transactions containing all the items of I and J together
– conf(R) = sup(I ∪ J) / sup(I) is the confidence of R: the fraction of the transactions containing I that also contain J
Association rules meeting minimum support and minimum confidence thresholds are sometimes called “strong” rules.
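These two definitions translate directly into code. A minimal sketch, reusing the baskets from the earlier example (sup and conf are our illustrative names):

    def sup(itemset, baskets):
        """Support count: the number of baskets containing every item."""
        return sum(1 for b in baskets if itemset <= b)

    def conf(I, J, baskets):
        """Confidence of I => J: sup(I ∪ J) / sup(I)."""
        return sup(I | J, baskets) / sup(I, baskets)

    baskets = [{"m","c","b"}, {"m","p","j"}, {"m","b"}, {"c","j"},
               {"m","p","b"}, {"m","c","b","j"}, {"c","b","j"}, {"b","c"}]
    print(conf({"m", "b"}, {"c"}, baskets))      # 0.5, matching {m, b} → c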

Association Rules Example: Q: Given the frequent set {A, B, E}, which association rules have minsup = 2 and minconf = 50%?
Qualify:
– A, B => E : conf = 2/4 = 50%
– A, E => B : conf = 2/2 = 100%
– B, E => A : conf = 2/2 = 100%
– E => A, B : conf = 2/2 = 100%
Don’t qualify:
– A => B, E : conf = 2/6 = 33% < 50%
– B => A, E : conf = 2/7 = 28% < 50%
– __ => A, B, E : conf = 2/9 = 22% < 50%

Find Strong Association Rules A rule has the parameters minsup and minconf:
– sup(R) >= minsup and conf(R) >= minconf
Problem: find all association rules with the given minsup and minconf.
First, find all frequent itemsets.

Finding Frequent Itemsets Start by finding one-item sets (easy). Q: How? A: Simply count the frequencies of all items.

Finding itemsets: next level Apriori algorithm (Agrawal & Srikant) Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, …
– If (A, B) is a frequent itemset, then (A) and (B) have to be frequent itemsets as well!
– In general: if X is a frequent k-itemset, then all (k-1)-item subsets of X are also frequent
⇒ Compute candidate k-itemsets by merging frequent (k-1)-itemsets
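A sketch of this candidate-generation step in Python, the join-and-prune idea with itemsets represented as sorted tuples (apriori_gen is our illustrative name):

    from itertools import combinations

    def apriori_gen(frequent_prev):
        """Generate candidate k-itemsets from frequent (k-1)-itemsets.

        frequent_prev: a set of sorted tuples, each of length k-1.
        """
        candidates = set()
        for a in frequent_prev:
            for b in frequent_prev:
                # Join: merge two (k-1)-itemsets sharing their first k-2 items.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    candidate = a + (b[-1],)
                    # Prune: every (k-1)-subset must itself be frequent.
                    if all(tuple(s) in frequent_prev
                           for s in combinations(candidate, len(candidate) - 1)):
                        candidates.add(candidate)
        return candidates

    L2 = {("1", "3"), ("2", "3"), ("2", "5"), ("3", "5")}
    print(apriori_gen(L2))   # {('2', '3', '5')}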

Finding Association Rules A typical question: “find all association rules with support ≥ s and confidence ≥ c.”
– Note: the “support” of an association rule is the support of the set of items it mentions.
Hard part: finding the high-support (frequent) itemsets.
– Checking the confidence of association rules involving those sets is relatively easy.

Naïve Algorithm A simple way to find frequent pairs:
– Read the file once, counting in main memory the occurrences of each pair: expand each basket of n items into its n(n-1)/2 pairs.
This fails if the number of items squared exceeds main memory.
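The naïve counting pass itself, sketched over the baskets from the earlier example:

    from itertools import combinations
    from collections import Counter

    baskets = [{"m","c","b"}, {"m","p","j"}, {"m","b"}, {"c","j"},
               {"m","p","b"}, {"m","c","b","j"}, {"c","b","j"}, {"b","c"}]

    pair_counts = Counter()
    for basket in baskets:
        # Expand each basket of n items into its n(n-1)/2 pairs.
        pair_counts.update(combinations(sorted(basket), 2))

    print(pair_counts.most_common(3))
    # [(('b', 'c'), 4), (('b', 'm'), 4), (('c', 'j'), 3)]  (tie order may vary)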

[Diagram: the Apriori passes. The first pass filters the candidate set C1 into the frequent set L1 and constructs C2; the second pass filters C2 into L2 and constructs C3.]

Fast Algorithms for Mining Association Rules, by Rakesh Agrawal and Ramakrishnan Srikant, IBM Almaden Research Center [Agrawal, Srikant 94]

Database D:
TID 100: 1, 3, 4
TID 200: 2, 3, 5
TID 300: 1, 2, 3, 5
TID 400: 2, 5

Ĉ1 (candidate 1-itemsets present in each transaction):
TID 100: { {1}, {3}, {4} }   TID 200: { {2}, {3}, {5} }
TID 300: { {1}, {2}, {3}, {5} }   TID 400: { {2}, {5} }

L1 (itemset: support): {1}: 2, {2}: 3, {3}: 3, {5}: 3

C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Ĉ2: TID 100: { {1 3} }   TID 200: { {2 3}, {2 5}, {3 5} }
TID 300: { {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5} }   TID 400: { {2 5} }

L2 (itemset: support): {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

C3: {2 3 5}

Ĉ3: TID 200: { {2 3 5} }   TID 300: { {2 3 5} }

L3 (itemset: support): {2 3 5}: 2
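The full trace above can be reproduced with a compact Apriori loop. A Python sketch (it folds the per-transaction Ĉk bookkeeping into plain support counting; apriori is our illustrative name):

    from itertools import combinations
    from collections import Counter

    def apriori(transactions, minsup):
        """Return {itemset (sorted tuple): support} for all frequent itemsets."""
        transactions = [frozenset(t) for t in transactions]
        # L1: frequent single items.
        counts = Counter(item for t in transactions for item in t)
        frequent = {(item,): n for item, n in counts.items() if n >= minsup}
        result, k = dict(frequent), 2
        while frequent:
            prev = set(frequent)
            # Join: merge (k-1)-itemsets sharing their first k-2 items;
            # prune candidates that have an infrequent (k-1)-subset.
            candidates = {a + (b[-1],) for a in prev for b in prev
                          if a[:-1] == b[:-1] and a[-1] < b[-1]
                          and all(s in prev
                                  for s in combinations(a + (b[-1],), k - 1))}
            counts = Counter(c for t in transactions for c in candidates
                             if frozenset(c) <= t)
            frequent = {c: n for c, n in counts.items() if n >= minsup}
            result.update(frequent)
            k += 1
        return result

    db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
    print(apriori(db, minsup=2))
    # {(1,): 2, (2,): 3, (3,): 3, (5,): 3, (1, 3): 2, (2, 3): 2,
    #  (2, 5): 3, (3, 5): 2, (2, 3, 5): 2}   (order may vary)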

Dynamic Programming Approach
We want a proof of the principle of optimality and of overlapping subproblems:
– Principle of optimality: the optimal solution for Lk includes the optimal solution for Lk-1. Proof by contradiction.
– Overlapping subproblems: lemma: every subset of a frequent itemset is a frequent itemset. Proof by contradiction.

The Apriori Algorithm: Example Consider a database, D, consisting of 9 transactions. Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 ≈ 22%) and let the minimum confidence required be 70%. We first find the frequent itemsets using the Apriori algorithm; then association rules are generated using the minimum support and minimum confidence.

TID    List of Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Step 1: Generating the 1-itemset Frequent Pattern In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. Scan D for the count of each candidate and compare each candidate's support count with the minimum support count. The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support. Here every candidate qualifies, so L1 = C1:

Itemset   Sup. Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
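This first pass, sketched over the 9-transaction database above:

    from collections import Counter

    D = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"},
         {"I1","I3"}, {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"},
         {"I1","I2","I3"}]
    MIN_SUP = 2

    C1 = Counter(item for t in D for item in t)       # one scan of D
    L1 = {item: n for item, n in C1.items() if n >= MIN_SUP}
    print(sorted(L1.items()))
    # [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]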

Step 2: Generating the 2-itemset Frequent Pattern Generate the candidate set C2 from L1, scan D for the count of each candidate, then compare the candidate support counts with the minimum support count:

C2 with support counts:
{I1, I2}: 4   {I1, I3}: 4   {I1, I4}: 1   {I1, I5}: 2   {I2, I3}: 4
{I2, I4}: 2   {I2, I5}: 2   {I3, I4}: 0   {I3, I5}: 1   {I4, I5}: 0

L2 (candidates meeting minimum support):
{I1, I2}: 4   {I1, I3}: 4   {I1, I5}: 2   {I2, I3}: 4   {I2, I4}: 2   {I2, I5}: 2

Step 2: Generating the 2-itemset Frequent Pattern [Cont.] To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 join L1 to generate the candidate set of 2-itemsets, C2. Next, the transactions in D are scanned and the support count for each candidate itemset in C2 is accumulated (as shown above). The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support. Note: we haven't used the Apriori property yet.

Step 3: Generating the 3-itemset Frequent Pattern The generation of the set of candidate 3-itemsets, C3, involves the use of the Apriori property. To find C3, we compute L2 join L2:

L2 join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}

The join step is now complete; the prune step will be used to reduce the size of C3, which helps avoid heavy computation due to a large Ck. After pruning (next slide) and scanning D for the count of each remaining candidate:

C3 = L3:
{I1, I2, I3}: 2
{I1, I2, I5}: 2

Step 3: Generating the 3-itemset Frequent Pattern [Cont.] Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the four latter candidates cannot possibly be frequent. How?
For example, take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3.
Now take {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}. BUT {I3, I5} is not a member of L2, hence it is not frequent, violating the Apriori property. Thus we remove {I2, I3, I5} from C3.
Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of the join result for pruning. The transactions in D are then scanned to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support. The prune step is mechanical, as the sketch below shows.
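A sketch of the prune step over the L2 and join result computed above:

    from itertools import combinations

    L2 = {("I1","I2"), ("I1","I3"), ("I1","I5"),
          ("I2","I3"), ("I2","I4"), ("I2","I5")}
    joined = [("I1","I2","I3"), ("I1","I2","I5"), ("I1","I3","I5"),
              ("I2","I3","I4"), ("I2","I3","I5"), ("I2","I4","I5")]

    # Keep a candidate only if all of its 2-item subsets are frequent.
    C3 = [c for c in joined
          if all(s in L2 for s in combinations(c, 2))]
    print(C3)   # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]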

Step 4: Generating the 4-itemset Frequent Pattern The algorithm uses L3 join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned, since its subset {I2, I3, I5} is not frequent. Thus C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori algorithm. What's next? These frequent itemsets will be used to generate strong association rules (where strong association rules satisfy both minimum support and minimum confidence).

Step 5: Generating Association Rules from Frequent Itemsets Procedure: for each frequent itemset l, generate all nonempty proper subsets of l. For every nonempty proper subset s of l, output the rule s ⇒ (l - s) if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold.
Back to the example: we had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, {I1, I2, I3}, {I1, I2, I5}}.
– Let's take l = {I1, I2, I5}.
– Its nonempty proper subsets are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}.
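This procedure, sketched for the itemset l = {I1, I2, I5} with the support counts from the tables above (rules_from is our illustrative name):

    from itertools import combinations

    # Support counts taken from the worked example above.
    sc = {frozenset(s): n for s, n in [
        ({"I1"}, 6), ({"I2"}, 7), ({"I5"}, 2),
        ({"I1","I2"}, 4), ({"I1","I5"}, 2), ({"I2","I5"}, 2),
        ({"I1","I2","I5"}, 2)]}

    def rules_from(l, min_conf):
        """Yield each rule s => l-s (with its confidence) meeting min_conf."""
        l = frozenset(l)
        for size in range(1, len(l)):                # nonempty proper subsets
            for s in map(frozenset, combinations(sorted(l), size)):
                c = sc[l] / sc[s]
                if c >= min_conf:
                    yield sorted(s), sorted(l - s), c

    for lhs, rhs, c in rules_from({"I1", "I2", "I5"}, min_conf=0.7):
        print(lhs, "=>", rhs, f"conf={c:.0%}")
    # ['I5'] => ['I1', 'I2'] conf=100%
    # ['I1', 'I5'] => ['I2'] conf=100%
    # ['I2', 'I5'] => ['I1'] conf=100%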

Step 5: Generating Association Rules from Frequent Itemsets [Cont.] Let the minimum confidence threshold be, say, 70%. The resulting association rules are shown below, each listed with its confidence.
– R1: I1 ^ I2 ⇒ I5, confidence = sc{I1, I2, I5} / sc{I1, I2} = 2/4 = 50%. R1 is rejected.
– R2: I1 ^ I5 ⇒ I2, confidence = sc{I1, I2, I5} / sc{I1, I5} = 2/2 = 100%. R2 is selected.
– R3: I2 ^ I5 ⇒ I1, confidence = sc{I1, I2, I5} / sc{I2, I5} = 2/2 = 100%. R3 is selected.

Step 5: Generating Association Rules from Frequent Itemsets [Cont.]
– R4: I1 ⇒ I2 ^ I5, confidence = sc{I1, I2, I5} / sc{I1} = 2/6 = 33%. R4 is rejected.
– R5: I2 ⇒ I1 ^ I5, confidence = sc{I1, I2, I5} / sc{I2} = 2/7 = 29%. R5 is rejected.
– R6: I5 ⇒ I1 ^ I2, confidence = sc{I1, I2, I5} / sc{I5} = 2/2 = 100%. R6 is selected.
In this way, we have found three strong association rules.

Example Large itemset: ABCDE. Rules with minconf.
Simple algorithm: separately check every candidate rule over the itemset, e.g.
ACDE ⇒ B, ABCE ⇒ D, ACD ⇒ BE, ADE ⇒ BC, CDE ⇒ AB, ACE ⇒ BD, BCE ⇒ AD, ABE ⇒ CD, ABC ⇒ DE, …
Fast algorithm: start from the rules with one-item consequents, e.g. ACDE ⇒ B and ABCE ⇒ D, and only if both hold, merge their consequents to form the candidate rule ACE ⇒ BD. Rules with larger consequents are generated only from confident rules with smaller consequents.
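A sketch of this consequent-growing scheme, in the spirit of the ap-genrules procedure from the Agrawal-Srikant paper, applied to the earlier worked example (ap_genrules is our illustrative name):

    def ap_genrules(l, sc, min_conf):
        """Grow rule consequents: a consequent of size m+1 is tried only if
        its size-m sub-consequents all produced confident rules."""
        l = frozenset(l)
        rules = []
        # Start with every confident one-item consequent.
        consequents = [frozenset([c]) for c in sorted(l)
                       if sc[l] / sc[l - {c}] >= min_conf]
        while consequents:
            rules += [(l - h, h) for h in consequents]
            # Merge consequents that differ in exactly one item.
            merged = {a | b for a in consequents for b in consequents
                      if len(a | b) == len(a) + 1}
            consequents = [h for h in merged
                           if len(h) < len(l) and sc[l] / sc[l - h] >= min_conf]
        return rules

    sc = {frozenset(s): n for s, n in [
        ({"I1"}, 6), ({"I2"}, 7), ({"I5"}, 2),
        ({"I1","I2"}, 4), ({"I1","I5"}, 2), ({"I2","I5"}, 2),
        ({"I1","I2","I5"}, 2)]}
    for lhs, rhs in ap_genrules({"I1", "I2", "I5"}, sc, min_conf=0.7):
        print(sorted(lhs), "=>", sorted(rhs))
    # ['I2', 'I5'] => ['I1']
    # ['I1', 'I5'] => ['I2']
    # ['I5'] => ['I1', 'I2']   (the same three strong rules, R2, R3 and R6)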