Chapter 13 – Association Rules DM for Business Intelligence.


What are Association Rules?
- Study of “what goes with what”
- “Customers who bought X also bought Y”
- What symptoms go with what diagnosis
- Transaction-based or event-based
- Also called “market basket analysis” and “affinity analysis”
- Originated with the study of customer transaction databases to determine associations among items purchased
- An example of unsupervised learning

Used in many recommender systems

Generating Rules

Terms
- “IF” part = antecedent, represented by “(a)”
- “THEN” part = consequent, represented by “(c)”
- “Item set” = the items (e.g., products) comprising the antecedent and consequent, represented by “(a U c)”
- Antecedent and consequent are disjoint (i.e., they have no items in common)

Tiny Example: Phone Cases/Faceplates

Many Rules are Possible
For example, Transaction 1 supports several rules, such as:
- “If red, then white” (“If a red faceplate is purchased, then so is a white one”)
- “If white, then red”
- “If red and white, then green”
- plus several more

Frequent Item Sets
- Ideally, we want to create all possible combinations of items
- Problem: the number of combinations, and with it computation time, grows exponentially as the number of items increases
- Solution: consider only “frequent item sets”
- Criterion for frequent: support
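To make the growth problem concrete, the short sketch below counts how many non-empty item sets can be formed from n distinct items (2^n - 1). The function name and the sample values of n are illustrative choices, not part of the original slides.

```python
def count_item_sets(n_items: int) -> int:
    """Number of non-empty item sets that can be formed from n_items distinct items."""
    return 2 ** n_items - 1


# The count roughly doubles with every item added to the catalogue.
for n in (5, 10, 20, 30):
    print(f"{n} items -> {count_item_sets(n)} possible item sets")
```

Even a modest catalogue therefore rules out exhaustive enumeration, which is why the search is restricted to frequent item sets.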

Support
- Support = count (or percent) of total transactions that include an item or item set
- Written Support(a), Support(c), Support(a U c)
- Example: support for the item set {red, white} is 4 out of 10 transactions, or 40%

Tiny Example: Phone Cases/Faceplates
Support = 4/10, or 40%
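A minimal sketch of the support calculation follows. The faceplate table itself did not survive in this transcript, so the ten transactions below are an assumption, chosen only so that {red, white} appears in 4 of 10 transactions as stated above. The names TRANSACTIONS, support_count and support are likewise illustrative.

```python
# Illustrative transactions (an assumption; the slide's table is not in the transcript).
TRANSACTIONS = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]


def support_count(item_set, transactions):
    """Count the transactions that contain every item in item_set."""
    item_set = set(item_set)
    return sum(1 for t in transactions if item_set <= t)


def support(item_set, transactions):
    """Support expressed as a fraction of all transactions."""
    return support_count(item_set, transactions) / len(transactions)


print(support_count({"red", "white"}, TRANSACTIONS))  # count of matching transactions
print(support({"red", "white"}, TRANSACTIONS))        # same value as a fraction
```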

Apriori Algorithm

Generating Frequent Item Sets
For k products:
1. User sets a minimum support criterion
2. Generate the list of one-item sets that meet the support criterion
3. Use the list of one-item sets to generate the list of two-item sets that meet the support criterion
4. Use the list of two-item sets to generate the list of three-item sets
5. Continue up through k-item sets
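The steps above can be sketched as a level-wise search. This is a simplified illustration of the Apriori idea, not XLMiner's implementation: the function name, the candidate join, and the ten transactions are all assumptions made for the example (the slide's own table is not in the transcript).

```python
from itertools import combinations

# Illustrative transactions (an assumption; not taken from the slides).
TRANSACTIONS = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]


def frequent_item_sets(transactions, min_count):
    """Return {item set: support count} for all item sets meeting min_count."""
    def count(item_set):
        return sum(1 for t in transactions if item_set <= t)

    # Step 2: one-item sets that meet the support criterion.
    current = {}
    for item in sorted({i for t in transactions for i in t}):
        single = frozenset([item])
        n = count(single)
        if n >= min_count:
            current[single] = n

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Steps 3-5: join frequent (k-1)-item sets into k-item candidates ...
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # ... and keep only candidates whose (k-1)-item subsets are all frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        next_level = {}
        for c in candidates:
            n = count(c)
            if n >= min_count:
                next_level[c] = n
        current = next_level
    return frequent


result = frequent_item_sets(TRANSACTIONS, min_count=2)
for item_set, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(item_set), n)
```

The pruning step relies on the fact that an item set cannot be frequent unless every one of its subsets is frequent, so most large combinations are never counted at all.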

Measures of Performance
- Support = count or % of total transactions that include an item or item set, e.g., Support(a), Support(c), or Support(a U c)
- Confidence = % of antecedent (IF) transactions that also have the consequent (THEN) item set (a “within sets” measure: transactions with both IF and THEN / transactions with IF), or Support(a U c) / Support(a)
- Benchmark Confidence = transactions with the consequent as a % of all transactions (THENs / total records in data), or Support(c) / Total Records
- Lift = Confidence / Benchmark Confidence
- Lift > 1 indicates a rule that is useful in finding consequent item sets (i.e., more useful than just selecting transactions randomly)

Tiny Example: Phone Cases/Faceplates
- Confidence = 4/5, or 80%
- Benchmark Confidence = 4/10, or 40%
- Lift = 80% / 40% = 2 (greater than 1, so good lift)
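The arithmetic in this example can be checked with a few lines of code. The sketch below plugs in the counts implied by the slide (antecedent support 5, joint support 4, consequent support 4, 10 records); the function and variable names are illustrative.

```python
def confidence(support_a_and_c: int, support_a: int) -> float:
    """Support(a U c) / Support(a), both given as counts."""
    return support_a_and_c / support_a


def benchmark_confidence(support_c: int, total_records: int) -> float:
    """Support(c) / Total Records."""
    return support_c / total_records


def lift(conf: float, benchmark: float) -> float:
    """Confidence / Benchmark Confidence."""
    return conf / benchmark


conf = confidence(support_a_and_c=4, support_a=5)
bench = benchmark_confidence(support_c=4, total_records=10)
print(conf, bench, lift(conf, bench))
```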

Select Rules with Confidence
- Generate all rules that meet the specified support and confidence
- Find frequent item sets (those with sufficient support; see above)
- From these item sets, generate rules with sufficient confidence

Alternate Data Format: Binary Matrix
- Transactions matching “IF Red and White, THEN Green” = 2
- Transactions matching “IF Red and White” = 4
- Confidence = 2 / 4, or 50%
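As a rough sketch of the binary-matrix format, the code below turns a list of transactions into 0/1 rows (one column per item) and counts rows where the relevant columns are all 1. The transactions and column order are assumptions for illustration, since the slide's matrix is not in the transcript; they were picked to reproduce the counts 2 and 4 quoted above.

```python
# Illustrative transactions and column order (assumptions; not from the slides).
TRANSACTIONS = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]
ITEMS = ["red", "white", "blue", "orange", "green", "yellow"]

# One row per transaction, one 0/1 column per item.
matrix = [[1 if item in t else 0 for item in ITEMS] for t in TRANSACTIONS]


def rows_with_all(matrix, items, columns):
    """Count rows in which every listed item's column equals 1."""
    idx = [columns.index(i) for i in items]
    return sum(1 for row in matrix if all(row[j] == 1 for j in idx))


if_count = rows_with_all(matrix, ["red", "white"], ITEMS)
if_then_count = rows_with_all(matrix, ["red", "white", "green"], ITEMS)
print(if_then_count, if_count, if_then_count / if_count)
```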

Example: Rules from {red, white, green}
1. {red, white} => {green} with confidence = 2/4 = 50% [(support {red, white, green}) / (support {red, white})]
2. {red, green} => {white} with confidence = 2/2 = 100% [(support {red, white, green}) / (support {red, green})]
Plus 4 more, with confidence of 100%, 33%, 29% and 100%
If the confidence criterion is 70%, report only rules 2, 3 and 6
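A sketch of rule generation from a single frequent item set follows. Only the counts for {red, white}, {red, green} and {red, white, green} are stated explicitly above; the remaining support counts are assumptions inferred from the quoted confidences (100% = 2/2, 33% ≈ 2/6, 29% ≈ 2/7). The function name and data structure are illustrative.

```python
from itertools import combinations

# Support counts; entries marked "inferred" are assumptions, not stated in the slides.
SUPPORT = {
    frozenset({"red", "white", "green"}): 2,
    frozenset({"red", "white"}): 4,
    frozenset({"red", "green"}): 2,
    frozenset({"white", "green"}): 2,   # inferred from 100% confidence
    frozenset({"red"}): 6,              # inferred from 33% confidence
    frozenset({"white"}): 7,            # inferred from 29% confidence
    frozenset({"green"}): 2,            # inferred from 100% confidence
}


def rules_from(item_set, support, min_confidence):
    """Split item_set into every antecedent/consequent pair and keep confident rules."""
    item_set = frozenset(item_set)
    rules = []
    for size in range(len(item_set) - 1, 0, -1):
        for antecedent in combinations(sorted(item_set), size):
            a = frozenset(antecedent)
            conf = support[item_set] / support[a]
            if conf >= min_confidence:
                rules.append((a, item_set - a, conf))
    return rules


kept = rules_from({"red", "white", "green"}, SUPPORT, min_confidence=0.7)
for a, c, conf in kept:
    print(sorted(a), "=>", sorted(c), f"confidence = {conf:.0%}")
```

Under these assumed counts, the three rules that survive the 70% threshold are the ones with antecedents {red, green}, {white, green} and {green}.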

All Rules (XLMiner Output)
“IF” (antecedent)      “THEN” (consequent)
Red, White         =>  Green
Red, Green         =>  White
White, Green       =>  Red
Red                =>  White, Green
White              =>  Red, Green
Green              =>  Red, White
Confidence = Support(a U c) / Support(a)
Benchmark Confidence = Support(c) / Total Records (e.g., 10 records)
Lift Ratio = Confidence / Benchmark Confidence

Interpretation
- Lift ratio shows how effective the rule is in finding consequents (useful if finding particular consequents is important)
- Confidence shows the rate at which consequents will be found (useful in learning costs of promotion)
- Support measures overall impact

Caution: The Role of Chance
- Random data can generate apparently interesting association rules
- The more rules you produce, the greater this danger
- Rules based on large numbers of records are less subject to this danger
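A small simulation can illustrate this caution. The sketch below builds purely random, independent purchase data (so no real associations exist) and reports the largest lift found among all one-item-to-one-item rules. The function name, item count, purchase probability and seed are arbitrary assumptions for illustration.

```python
import random
from itertools import permutations


def max_lift_in_random_data(n_transactions, n_items=20, p=0.3, seed=0):
    """Largest lift among single-item rules in data with no real associations."""
    rng = random.Random(seed)
    # Each item is bought independently with probability p.
    data = [[rng.random() < p for _ in range(n_items)] for _ in range(n_transactions)]
    counts = [sum(row[j] for row in data) for j in range(n_items)]
    best = 0.0
    for a, c in permutations(range(n_items), 2):
        if counts[a] == 0 or counts[c] == 0:
            continue  # rule undefined if either item never occurs
        both = sum(1 for row in data if row[a] and row[c])
        confidence = both / counts[a]
        benchmark = counts[c] / n_transactions
        best = max(best, confidence / benchmark)
    return best


small = max_lift_in_random_data(n_transactions=50)
large = max_lift_in_random_data(n_transactions=2000)
print(f"max lift with 50 records:   {small:.2f}")
print(f"max lift with 2000 records: {large:.2f}")
```

With few records, some rule will typically show a lift well above 1 by chance alone; with many records, the best lift found in random data should sit much closer to 1.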

Example: Charles Book Club
Row 1, for example, is a transaction in which books were bought in the following categories: Youth, Do It Yourself, Geography

XLMiner Output
- Rules are arrayed in order of lift
- Information can be compressed, e.g., rules 2 and 7 have the same trio of books
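One simple way to spot this kind of redundancy is to group rules by the full item set they draw on (antecedent plus consequent). The sketch below is an illustration only: it uses the six faceplate rules from the earlier example plus one extra made-up rule, because the Charles Book Club rule list is not in the transcript, and the function name is hypothetical.

```python
from collections import defaultdict

# (antecedent, consequent) pairs; the last rule is an illustrative extra.
RULES = [
    ({"red", "white"}, {"green"}),
    ({"red", "green"}, {"white"}),
    ({"white", "green"}, {"red"}),
    ({"red"}, {"white", "green"}),
    ({"white"}, {"red", "green"}),
    ({"green"}, {"red", "white"}),
    ({"red"}, {"white"}),  # not from the slides; added to show a second group
]


def group_by_item_set(rules):
    """Map each underlying item set to the rules built from it."""
    groups = defaultdict(list)
    for antecedent, consequent in rules:
        groups[frozenset(antecedent | consequent)].append((antecedent, consequent))
    return dict(groups)


groups = group_by_item_set(RULES)
for item_set, members in groups.items():
    print(sorted(item_set), "->", len(members), "rule(s)")
```

Rules that land in the same group describe the same combination of items, so a reviewer can often keep the one or two most useful and set the rest aside.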

Summary
- Association rules (also called affinity analysis or market basket analysis) describe associations among items in a database of transactions
- They are a form of unsupervised learning
- Widely used in recommender systems
- The most popular method is the Apriori algorithm
- To reduce computation, we consider only “frequent” item sets
- Performance is measured by confidence and lift
- The method can produce too many rules; review is required to identify useful rules and to reduce redundancy