Data Mining: Association Rules Mining, Frequent Itemset Mining, Support and Confidence, the Apriori Approach

Initial Definition of Association Rules (ARs) Mining

Association rules define relationships of the form:

A → B

Read as "A implies B", where A and B are sets of binary-valued attributes represented in a data set. Association Rule Mining (ARM) is then the process of finding all the ARs in a given database.

Association Rules: Basic Concepts

Given: (1) a database of transactions, where (2) each transaction is a list of items (e.g. purchased by a customer in one visit).
Find: all rules that correlate the presence of one set of items with that of another set of items.
- E.g., 98% of students who study Databases and C++ also study Algorithms.
Applications:
- Home Electronics → * (what other products should the store stock up on?)
- Attached mailings in direct marketing.
- Web page navigation in search engines (first page a → page b).
- Text mining (e.g. documents that mention "IT companies" also mention "Microsoft").

Some Notation

D = a data set comprising n records and m binary-valued attributes.
I = the set of m attributes, {i1, i2, …, im}, represented in D.
Itemset = some subset of I. Each record in D is an itemset.

Example DB

I = {a,b,c,d,e}
D = {{a,b,c},{a,b,d},{a,b,e},{a,c,d},{a,c,e},{a,d,e},{b,c,d},{b,c,e},{b,d,e},{c,d,e}}

TID  Atts
1    a b c
2    a b d
3    a b e
4    a c d
5    a c e
6    a d e
7    b c d
8    b c e
9    b d e
10   c d e

Given attributes which are not binary valued (i.e. either nominal or ranged), the attributes can be "discretised" so that they are represented by a number of binary-valued attributes.
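As a concrete reference, the example DB can be written out in Python; the sketches accompanying later slides reuse these two variables (the names I and D simply mirror the notation above):

    # The example data set: 10 records (itemsets) over 5 binary-valued attributes.
    I = {'a', 'b', 'c', 'd', 'e'}
    D = [
        {'a', 'b', 'c'}, {'a', 'b', 'd'}, {'a', 'b', 'e'},
        {'a', 'c', 'd'}, {'a', 'c', 'e'}, {'a', 'd', 'e'},
        {'b', 'c', 'd'}, {'b', 'c', 'e'}, {'b', 'd', 'e'},
        {'c', 'd', 'e'},
    ]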

In-Depth Definition of ARs Mining

Association rules define relationships of the form:

A → B

Read as "A implies B", such that A ⊂ I, B ⊂ I, A ∩ B = ∅ (A and B are disjoint) and A ∪ B ⊆ I. In other words, an AR is made up of an itemset of cardinality 2 or more.

ARM Problem Definition (1)

Given a database D, we wish to find (mine) all the itemsets of cardinality 2 or more contained in D, and then use these itemsets to create association rules of the form A → B.

The number of potential itemsets of cardinality 2 or more is 2^m - m - 1:
- If m = 5, #potential itemsets = 26.
- If m = 20, #potential itemsets = 1,048,555.

So now we do not want to find all the itemsets of cardinality 2 or more contained in D; we only want to find the interesting itemsets of cardinality 2 or more contained in D.

Association Rules Measurement

The most commonly used "interestingness" measures are:
1. Support
2. Confidence

Itemset Support

Support: a measure of the frequency with which an itemset occurs in a DB:

supp(A) = (# records that contain A) / n

where n is the number of records in D. If an itemset has support higher than some specified threshold, we say that the itemset is supported or frequent (some authors use the term large). The support threshold is normally set reasonably low, say, 1%.

Confidence

Confidence: a measure, expressed as a ratio, of the support for an AR compared to the support of its antecedent:

conf(A → B) = supp(A ∪ B) / supp(A)

We say that we are confident in a rule if its confidence exceeds some threshold (normally set reasonably high, say, 80%).
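A minimal Python sketch of both measures, applied to the example DB defined earlier (the function names supp and conf are just the abbreviations from the formulas, not a library API):

    # supp(A): fraction of records in D that contain itemset A.
    def supp(A, D):
        return sum(1 for record in D if A <= record) / len(D)

    # conf(A -> B) = supp(A u B) / supp(A).
    def conf(A, B, D):
        return supp(A | B, D) / supp(A, D)

    print(supp({'a', 'b'}, D))    # 0.3: {a,b} occurs in 3 of the 10 records
    print(conf({'a'}, {'b'}, D))  # 0.5: 3 of the 6 records containing a also contain b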

Rule Measures: Support and Confidence

Find all the rules X & Y ⇒ Z with minimum confidence and support:
- support, s: probability that a transaction contains {X ∪ Y ∪ Z}.
- confidence, c: conditional probability that a transaction having {X ∪ Y} also contains Z.

With minimum support 50% and minimum confidence 50%, we have:
- A ⇒ C (50%, 66.6%)
- C ⇒ A (50%, 100%)

[Illustrated in the slide by a Venn diagram of customers who buy Bread, customers who buy Butter, and the overlap of customers who buy both.]

ARM Problem Definition (2)

Given a database D, we wish to find all the frequent itemsets (F) and then use this knowledge to produce high-confidence association rules.

Note: finding F is the most computationally expensive part; once we have the frequent sets, generating ARs is straightforward, as the sketch below illustrates.
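A hedged sketch of that second step, assuming F is a dictionary mapping each frequent itemset (as a frozenset) to its support; because every subset of a frequent itemset is also frequent, F already contains every antecedent's support:

    from itertools import combinations

    def gen_rules(F, min_conf):
        # Split each frequent itemset into antecedent A and consequent B,
        # keeping the splits whose confidence clears the threshold.
        rules = []
        for itemset, s in F.items():
            if len(itemset) < 2:
                continue
            for k in range(1, len(itemset)):
                for A in map(frozenset, combinations(itemset, k)):
                    c = s / F[A]    # conf(A -> B) = supp(A u B) / supp(A)
                    if c >= min_conf:
                        rules.append((A, itemset - A, s, c))
        return rules

With the apriori sketch given later in these notes, gen_rules(apriori(D, 0.15), 0.5) returns rules such as a → b with confidence 50%.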

BRUTE FORCE

List all possible combinations in an array. For each record:
1. Find all combinations.
2. For each combination, index into the array and increment its support by 1.
Then generate rules.

The array for the example DB (31 combinations of the 5 attributes):

a 6     b 6     ab 3    c 6     ac 3    bc 3    abc 1
d 6     ad 3    bd 3    abd 1   cd 3    acd 1   bcd 1   abcd 0
e 6     ae 3    be 3    abe 1   ce 3    ace 1   bce 1   abce 0
de 3    ade 1   bde 1   abde 0  cde 1   acde 0  bcde 0  abcde 0
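A sketch of this brute-force scheme in Python, reusing I and D from the example DB; indexing the array by a bitmask over the attributes is an implementation choice, not something the slide prescribes:

    # One counter for each of the 2^m possible itemsets, indexed by bitmask.
    attrs = sorted(I)                        # ['a', 'b', 'c', 'd', 'e']
    pos = {x: i for i, x in enumerate(attrs)}
    counts = [0] * (2 ** len(attrs))

    for record in D:
        mask = sum(1 << pos[x] for x in record)
        sub = mask
        while sub:                           # visit every non-empty subset
            counts[sub] += 1
            sub = (sub - 1) & mask           # next sub-bitmask of the record

    ab = (1 << pos['a']) | (1 << pos['b'])
    print(counts[ab])                        # 3, matching ab in the array above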

BRUTE FORCE (continued)

Using the support counts in the array above, with a support threshold of 15% (a count of 1.5, so an itemset must appear in at least 2 of the 10 records):

Frequent sets (F): ab(3) ac(3) bc(3) ad(3) bd(3) cd(3) ae(3) be(3) ce(3) de(3)

(Only itemsets of cardinality 2 or more are listed, since only these generate rules; the 3-itemsets, each with a count of 1, fall below the threshold.)

Rules:
a → b, conf = 3/6 = 50%
b → a, conf = 3/6 = 50%
etc.

BRUTE FORCE: Advantages and Disadvantages

Advantages:
1) Very efficient for data sets with small numbers of attributes (< 20).

Disadvantages:
1) Given 20 attributes, the number of combinations is 2^20 - 21 = 1,048,555; with 4-byte counters, the array storage requirement is about 4.2MB.
2) Given a data set with (say) 100 attributes, it is likely that many combinations will not be present in the data set at all --- therefore store only those combinations present in the data set (sketched below)!
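A sketch of that remedy: count with a dictionary keyed only by the combinations that actually occur, instead of reserving a 2^m array:

    from itertools import combinations

    counts = {}
    for record in D:
        items = sorted(record)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):     # only subsets that occur
                counts[combo] = counts.get(combo, 0) + 1

    # 25 distinct itemsets occur in the example DB, versus 2^5 = 32 array
    # slots; the saving grows dramatically for sparse, high-dimensional data.
    print(len(counts))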

Association Rule Mining: A Road Map

Boolean vs. quantitative associations (based on the types of values handled):
- buys(x, "SQLServer") ^ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]
- age(x, "30..39") ^ income(x, "42..48K") → buys(x, "PC") [1%, 75%]

Mining Association Rules: An Example

(Min. support 50%, min. confidence 50%.)

For the rule A ⇒ C:
support = support(A ∪ C) = 50%
confidence = support(A ∪ C) / support(A) = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent.

Mining Frequent Itemsets: The Key Step

Find the frequent itemsets: the sets of items that have minimum support.
- Any subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets.
- Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets).
Then use the frequent itemsets to generate association rules.

The Apriori Algorithm: Example

[Diagram: the database D is scanned to count candidate 1-itemsets C1, which are pruned to the frequent 1-itemsets L1; joining L1 gives candidates C2, a scan of D gives L2; joining L2 gives C3, and a final scan gives L3.]

The Apriori Algorithm

Pseudo-code:

C_k: candidate itemsets of size k
L_k: frequent itemsets of size k

L_1 = {frequent items};
for (k = 1; L_k != ∅; k++) do begin
    C_{k+1} = candidates generated from L_k;
    for each transaction t in database do
        increment the count of all candidates in C_{k+1} that are contained in t;
    L_{k+1} = candidates in C_{k+1} with min_support;
end
return ∪_k L_k;
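A compact Python rendering of this pseudo-code, again reusing D from the example DB; it returns every frequent itemset with its support. This is a teaching sketch (one database pass per level, no hash-tree candidate counting), not an optimised implementation:

    from itertools import combinations

    def apriori(D, min_support):
        n = len(D)
        # L1: frequent 1-itemsets.
        counts = {}
        for t in D:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        L = {s: c / n for s, c in counts.items() if c / n >= min_support}
        F = dict(L)
        k = 1
        while L:
            # Generate C_{k+1} from L_k: self-join, then Apriori pruning.
            prev = list(L)
            C = {a | b for a in prev for b in prev if len(a | b) == k + 1}
            C = {c for c in C
                 if all(frozenset(s) in L for s in combinations(c, k))}
            # One scan of the database counts all surviving candidates.
            counts = {c: 0 for c in C}
            for t in D:
                for c in C:
                    if c <= t:
                        counts[c] += 1
            L = {s: c / n for s, c in counts.items() if c / n >= min_support}
            F.update(L)
            k += 1
        return F

    F = apriori(D, 0.15)   # all 5 single items and all 10 pairs are frequent;
                           # the triples (support 10%) fall below the threshold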

Important Details of Apriori

How to generate candidates?
- Step 1: self-joining L_k
- Step 2: pruning
How to count the supports of candidates?

Example of candidate generation:
- L_3 = {abc, abd, acd, ace, bcd}
- Self-joining: L_3 * L_3 gives abcd (from abc and abd) and acde (from acd and ace)
- Pruning: acde is removed because ade is not in L_3
- C_4 = {abcd}
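The same example as a Python sketch. Note that the set-union join used here is a simplification of the canonical "share the first k-1 items" join, so it can produce extra unions (abce below), but the pruning step removes them just as it removes acde:

    from itertools import combinations

    L3 = {frozenset(s) for s in ('abc', 'abd', 'acd', 'ace', 'bcd')}

    def gen_candidates(L, k):
        # Self-join L_k with itself, keeping unions of size k+1 ...
        joined = {a | b for a in L for b in L if len(a | b) == k + 1}
        # ... then prune any candidate with an infrequent k-subset.
        return {c for c in joined
                if all(frozenset(s) in L for s in combinations(c, k))}

    C4 = gen_candidates(L3, 3)
    print(sorted(''.join(sorted(c)) for c in C4))   # ['abcd']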