1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Association Rule Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.

Presentation transcript:

1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Association Rule Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business

2 Main Expectations
– Knowledge pattern in focus
– Definitions and examples
– A basic method
– How to tune the method
– Decision support applications
– When to use association rule mining
– Reading – T2, pp

3 Association
Under a given condition, a set of objects → (implies) another set of objects.
Examples:
– Retail items purchased together
– Services subscribed to by the same customer
– Web pages a user accesses in a session
– Courses taken by the same student
– Medications prescribed by a doctor for a patient visit
– Genes that are expressed at the same level

4 Decision Support Applications
– Customer relationship management
– Retail merchandise placement
– Online retail catalog design
– Website link re-organization
– Fraud detection
– Gene analysis for cancer prevention

5 Preliminary
Set Theory
– A set is a collection of objects. E.g., set A = {3,5} and set B = {1,3,5}
– Elements of a set are the objects that belong to it. E.g., 3 ∈ {3,5}, 3 ∈ {1,3,5}, 3 ∈ A and 3 ∈ B
– Set X is a subset of set Y if every element in X belongs to Y, denoted as X ⊆ Y. E.g., A ⊆ B or {3,5} ⊆ {1,3,5}

6 Preliminary
Two properties of sets
– An element in a set is counted only once. E.g., {1,3,5} = {1,3,3,5}
– There is no order of elements in a set. E.g., {3,1,5} = {1,3,5}
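
The set notions above map directly onto Python's built-in set type. The following is a small illustrative sketch, not part of the original slides; the variable names A and B simply mirror the sets used in the examples.

```python
# Sets A and B from the examples
A = {3, 5}
B = {1, 3, 5}

# Membership: 3 is an element of both sets
print(3 in A, 3 in B)

# Subset: every element of A also belongs to B
print(A <= B)
print(A.issubset(B))

# An element is counted only once, so duplicates collapse
print({1, 3, 5} == {1, 3, 3, 5})

# Elements have no order
print({3, 1, 5} == {1, 3, 5})
```

Every line prints True, which is why itemsets are naturally modelled as sets rather than lists.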

7 Association Rules
Given: a database of transactions
Examples of transactions:
– a customer’s visit to a grocery store
– an online purchase at a virtual store such as ‘Amazon.com’
Format of transactions:
date    transaction ID    customer ID    item
1/1/                                     egg
1/1/                                     milk

8 Association Rules
Find: patterns in the form of association rules
Association rules: correlate the presence of one set of items (X) with the presence of another set of items (Y), denoted as X → Y
Example: {purchase egg, milk} → {bread}
How to measure correlations in association rules?

9 Association Rules
Itemset: a set of items, e.g., {egg, milk}
Size of an itemset: the number of items in that itemset.
Support of an itemset: the ratio of the number of transactions that contain all items in the itemset to the total number of transactions.

10 Association Rules
Example:
TID   CID   Item        Price   Date
            Computer    1500    1/4/
            MS Office   300     1/4/
            MCSE Book   100     1/4/
            Hard disk   500     1/8/
            MCSE Book   100     1/8/
            Computer    1500    1/21/
            Hard disk   500     1/21/
            MCSE Book   100     1/21/99

11 Association Rules
In this example:
The support of the 2-itemset {Computer, Hard disk} is 1/3 = 33.3%.
What is the support of the 1-itemset {Computer}?

12 Association Rules
Two important metrics for association rules:
If two itemsets X and Y co-exist in a transaction database, the association rule X → Y holds with
– support s, which is the ratio of the number of transactions containing both X and Y to the total number of transactions
– confidence c, which is the ratio of the number of transactions containing both X and Y to the number of transactions containing X.
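
The two definitions can also be written as formulas. The notation is introduced here for illustration and does not come from the slides: N is the total number of transactions and count(Z) is the number of transactions that contain every item of itemset Z.

```latex
s(X \rightarrow Y) = \frac{\operatorname{count}(X \cup Y)}{N},
\qquad
c(X \rightarrow Y) = \frac{\operatorname{count}(X \cup Y)}{\operatorname{count}(X)}
```

Both ratios share the same numerator; only the denominator differs, so the confidence of a rule is never smaller than its support.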

13 Association Rules
Association rule: {Computer} → {Hard disk}
Support: 1/3 = 33.3%
Confidence: 1/2 = 50%
How about
{Computer} → {MCSE book}
{Computer, MCSE book} → {Hard disk}?
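
The figures above can be checked with a few lines of Python. This is a sketch under one assumption: the rows of the example table are grouped into three transactions by purchase date, which is consistent with the stated support of 1/3. The helper names count, support and confidence are chosen here for illustration.

```python
from fractions import Fraction

# Assumed grouping of the example rows into three transactions (by date).
transactions = [
    {"Computer", "MS Office", "MCSE Book"},
    {"Hard disk", "MCSE Book"},
    {"Computer", "Hard disk", "MCSE Book"},
]

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y):
    """Share of all transactions that contain both X and Y."""
    return Fraction(count(x | y), len(transactions))

def confidence(x, y):
    """Share of the transactions containing X that also contain Y."""
    return Fraction(count(x | y), count(x))

rules = [
    ({"Computer"}, {"Hard disk"}),
    ({"Computer"}, {"MCSE Book"}),
    ({"Computer", "MCSE Book"}, {"Hard disk"}),
]
for x, y in rules:
    print(sorted(x), "->", sorted(y),
          "support", support(x, y), "confidence", confidence(x, y))
```

Using Fraction keeps the ratios exact (1/3, 1/2) instead of rounding them to floating-point percentages.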

14 Association Rule Mining
Association rule mining: find all association rules in a database with support no less than a user-specified minimum support and confidence no less than a user-specified minimum confidence.
For small problems, the process of mining association rules is not that complex. How about a transaction database with 1 billion transactions and 1 million different items? An efficient algorithm is needed!

15 Association Rules
Two steps in association rule mining:
1. Find all large (frequent) itemsets, i.e., itemsets whose support is no less than the user-specified minimum support.
2. For each large itemset L, find all association rules of the form a → (L-a), where a and (L-a) are non-empty subsets of L, that meet the minimum confidence.
Example: find all association rules in the example with minimum support 60% and minimum confidence 80%.
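
The two steps can be sketched in Python for the small computer-store example. Two assumptions apply: the table rows are grouped into three transactions by date, and step 1 is done by brute-force enumeration of every itemset, which is only viable for tiny data and is not the Apriori algorithm. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Assumed grouping of the example rows into three transactions (by date).
transactions = [
    {"Computer", "MS Office", "MCSE Book"},
    {"Hard disk", "MCSE Book"},
    {"Computer", "Hard disk", "MCSE Book"},
]
MIN_SUPPORT = Fraction(60, 100)
MIN_CONFIDENCE = Fraction(80, 100)

def support(itemset):
    """Share of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

# Step 1 (brute force): keep every itemset that meets the minimum support.
items = sorted(set().union(*transactions))
large = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(frozenset(c)) >= MIN_SUPPORT
]

# Step 2: for each large itemset L, try every non-empty proper subset a
# as the left-hand side of a rule a -> (L - a).
rules = []
for L in large:
    for k in range(1, len(L)):
        for a in map(frozenset, combinations(sorted(L), k)):
            conf = support(L) / support(a)
            if conf >= MIN_CONFIDENCE:
                rules.append((a, L - a, support(L), conf))

for a, rest, s, c in rules:
    print(sorted(a), "->", sorted(rest), "support", s, "confidence", c)
```

Under these assumptions only two rules survive, {Computer} → {MCSE Book} and {Hard disk} → {MCSE Book}, each with support 2/3 and confidence 1.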

16 Association Rule Mining
Step 2 is trivial compared to step 1, which is hard because of:
– Exponential search space
– Size of the transaction database
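
To make the exponential search space concrete: with d distinct items (d is a symbol introduced here, not taken from the slides), every non-empty subset of the items is a potential itemset, so the number of itemsets step 1 may have to consider is

```latex
\sum_{k=1}^{d} \binom{d}{k} = 2^{d} - 1
```

For d = 5 that is only 31 itemsets, but the count roughly doubles with every additional item, which rules out exhaustive enumeration for databases with many items.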

17 Apriori Algorithm
Apriori is an efficient algorithm to discover all large itemsets from a huge database with a large number of items.
Apriori was developed by two researchers from the IBM Almaden Research Lab.

18 Apriori Algorithm
The Apriori algorithm is based on the Apriori property.
The Apriori property: any subset of a large itemset must be large. Equivalently, if an itemset is not large, none of its supersets can be large.

19 Apriori Algorithm
Step 1: Scan the DB once to find all large 1-itemsets.
Step 2: Generate candidate k-itemsets from large (k-1)-itemsets.
Step 3: Find all large k-itemsets from the candidate k-itemsets by scanning the DB once.
Repeat steps 2 and 3 until no candidate itemsets can be generated.

20 Apriori Algorithm
Step 2
– Candidate k-itemsets are k-itemsets that could be large.
– Why generate candidate k-itemsets only from large (k-1)-itemsets?
– How to generate?
Step 2-1: Join: Two large (k-1)-itemsets, L1 and L2, with their items listed in sorted order, are joinable if they satisfy the following conditions:
– L1(1)=L2(1) and L1(2)=L2(2) and … and L1(k-2)=L2(k-2)
– L1(k-1)<L2(k-1)
Step 2-2: Prune: remove itemsets generated in step 2-1 that have a subset that is not large.
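
The join and prune steps can be sketched as a single Python function. This is an illustrative sketch rather than the original authors' code: the function name generate_candidates is made up here, itemsets are assumed to be represented as sorted tuples, and the sample input is the list of large 2-itemsets from this deck's worked example.

```python
from itertools import combinations

def generate_candidates(large_prev):
    """Build candidate k-itemsets from large (k-1)-itemsets.

    `large_prev` is a collection of sorted tuples, all of length k-1.
    """
    large_prev = sorted(large_prev)
    large_set = set(large_prev)
    candidates = []
    for i, l1 in enumerate(large_prev):
        for l2 in large_prev[i + 1:]:
            # Step 2-1 (join): first k-2 items equal, last item of l1 smaller
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                candidate = l1 + (l2[-1],)
                # Step 2-2 (prune): every (k-1)-subset must itself be large
                if all(sub in large_set
                       for sub in combinations(candidate, len(l1))):
                    candidates.append(candidate)
    return candidates

large_2 = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 5)]
print(generate_candidates(large_2))  # [(2, 3, 5)]
```

For this input the join alone also produces (1, 3, 4); the prune step drops it because its subset (3, 4) is not among the large 2-itemsets.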

21 Apriori Algorithm
Minimum support = 40%
Minimum confidence = 70%
Transaction ID   Items
100              1, 3, 4, 6
200              2, 3, 5, 7
300              1, 2, 3, 5, 8
400              2, 5, 9, …
500              1, 4

22 Association Rule Mining
Minimum support: 40%
TID   Items
100   1, 3, 4, 6
200   2, 3, 5, 7
300   1, 2, 3, 5, 8
400   2, 5, 9, …
500   1, 4
Large 1-itemsets:
{1} support = 3/5 = 60%
{2} support = 3/5 = 60%
{3} support = 3/5 = 60%
{4} support = 2/5 = 40%
{5} support = 3/5 = 60%

23 Association Rule Mining
Large 1-itemsets:
{1} support = 3/5 = 60%
{2} support = 3/5 = 60%
{3} support = 3/5 = 60%
{4} support = 2/5 = 40%
{5} support = 3/5 = 60%
Candidate 2-itemsets:
{1, 2} {1, 3} {1, 4} {1, 5}
{2, 3} {2, 4} {2, 5}
{3, 4} {3, 5}
{4, 5}

24 Association Rule Mining
Candidate 2-itemsets:
{1, 2} {1, 3} {1, 4} {1, 5}
{2, 3} {2, 4} {2, 5}
{3, 4} {3, 5}
{4, 5}
Large 2-itemsets:
{1, 3} support = 2/5 = 40%
{1, 4} support = 2/5 = 40%
{2, 3} support = 2/5 = 40%
{2, 5} support = 3/5 = 60%
{3, 5} support = 2/5 = 40%

25 Association Rule Mining
Large 2-itemsets:
{1, 3} support = 2/5 = 40%
{1, 4} support = 2/5 = 40%
{2, 3} support = 2/5 = 40%
{2, 5} support = 3/5 = 60%
{3, 5} support = 2/5 = 40%
Candidate 3-itemsets:
{1, 3, 4} {2, 3, 5}

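26 Association Rule Mining
Candidate 3-itemsets:
{1, 3, 4} {2, 3, 5}
Large 3-itemset:
{2, 3, 5} support = 2/5 = 40%
({1, 3, 4} is not large: its subset {3, 4} is not large, so the prune step removes it, and its support is only 1/5.)
Candidate 4-itemsets:
No candidate 4-itemset. Stop.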
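
The whole worked example can be reproduced with a compact level-wise loop. This is a minimal sketch under stated assumptions, not a reference implementation: the function name apriori and the tuple representation are choices made here, and the fourth item of transaction 400, which is missing in the source, is simply left out. Since that item is not part of any large itemset, the results do not depend on it.

```python
from fractions import Fraction
from itertools import combinations

# Transactions from the worked example; the unreadable fourth item of
# transaction 400 is omitted (assumption: it is not a large item).
transactions = {
    100: {1, 3, 4, 6},
    200: {2, 3, 5, 7},
    300: {1, 2, 3, 5, 8},
    400: {2, 5, 9},
    500: {1, 4},
}
MIN_SUPPORT = Fraction(40, 100)

def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): transaction count} for large itemsets."""
    baskets = list(transactions.values())
    min_count = min_support * len(baskets)

    def count(itemset):
        return sum(1 for b in baskets if set(itemset) <= b)

    # Step 1: one scan to find the large 1-itemsets
    items = sorted(set().union(*baskets))
    large = {(i,): count((i,)) for i in items if count((i,)) >= min_count}
    all_large = dict(large)

    while large:
        prev = sorted(large)
        # Step 2: join itemsets that share their first k-2 items, then prune
        joined = [
            a + (b[-1],)
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if a[:-1] == b[:-1]
        ]
        candidates = [
            c for c in joined
            if all(s in large for s in combinations(c, len(c) - 1))
        ]
        # Step 3: one scan to keep candidates that meet the minimum support
        large = {c: count(c) for c in candidates if count(c) >= min_count}
        all_large.update(large)
    return all_large

result = apriori(transactions, MIN_SUPPORT)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, f"support = {n}/{len(transactions)}")
```

The loop ends as soon as a level yields no large itemsets, which here happens after the single large 3-itemset (2, 3, 5) fails to produce any candidate 4-itemset.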