Chapter 14 – Association Rules and Collaborative Filtering
© Galit Shmueli and Peter Bruce 2016
Data Mining for Business Analytics (3rd ed.), Shmueli, Bruce & Patel



What are Association Rules?
Study of “what goes with what”: “Customers who bought X also bought Y”; what symptoms go with what diagnosis
Transaction-based or event-based
Also called “market basket analysis” and “affinity analysis”
Originated with the study of customer transaction databases to determine associations among items purchased

Used in many recommender systems

Generating Rules

Terms
“IF” part = antecedent
“THEN” part = consequent
“Item set” = the items (e.g., products) comprising the antecedent or consequent
Antecedent and consequent are disjoint (i.e., have no items in common)

Tiny Example: Phone Faceplates

Many Rules are Possible
For example, Transaction 1 supports several rules, such as:
“If red, then white” (“If a red faceplate is purchased, then so is a white one”)
“If white, then red”
“If red and white, then green”
+ several more

Frequent Item Sets
Ideally, we want to consider all possible combinations of items
Problem: computation time grows exponentially as the number of items increases
Solution: consider only “frequent item sets”
Criterion for frequent: support
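To make the combinatorial problem concrete: k distinct items can form 2**k - 1 non-empty item sets, so exhaustive enumeration is hopeless for even modest catalogs. A quick illustration (not from the book):

```python
# Number of non-empty item sets that k items can form: 2**k - 1.
# Even a modest product catalog makes exhaustive enumeration infeasible.
for k in (10, 20, 30):
    print(f"{k} items -> {2**k - 1:,} possible item sets")
```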

Support
Support = # (or percent) of transactions that include both the antecedent and the consequent
Example: support for the item set {red, white} is 4 out of 10 transactions, or 40%
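As a sketch, support can be computed directly from a list of transactions. The ten faceplate transactions below are reconstructed to be consistent with the slides' numbers (support of {red, white} = 4/10 = 40%); treat them as illustrative:

```python
# Ten phone-faceplate transactions, each a set of colors purchased together.
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]

def support(item_set, transactions):
    """Fraction of transactions that contain every item in item_set."""
    item_set = set(item_set)
    hits = sum(1 for t in transactions if item_set <= t)
    return hits / len(transactions)

print(support({"red", "white"}, transactions))  # 4 of 10 transactions -> 0.4
```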

Apriori Algorithm

Generating Frequent Item Sets
For k products:
1. User sets a minimum support criterion
2. Next, generate the list of one-item sets that meet the support criterion
3. Use the list of one-item sets to generate the list of two-item sets that meet the support criterion
4. Use the list of two-item sets to generate the list of three-item sets
5. Continue up through k-item sets
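A minimal Python sketch of this level-wise procedure (illustrative data and helper names of our own; this is not the book's XLMiner implementation, and it omits Apriori's subset-pruning refinement, relying on the support check alone):

```python
# Faceplate-style transactions, reconstructed to match the slides' numbers.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def frequent_item_sets(transactions, min_support):
    """Level-wise generation of all item sets meeting min_support."""
    n = len(transactions)

    def sup(item_set):
        return sum(1 for t in transactions if item_set <= t) / n

    items = {i for t in transactions for i in t}
    # Step 2: one-item sets that meet the support criterion
    current = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]
    frequent = list(current)
    k = 2
    # Steps 3-5: build k-item candidates from frequent (k-1)-item sets
    while current:
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates if sup(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

result = frequent_item_sets(transactions, min_support=0.2)
```

With a 20% support criterion this yields five frequent one-item sets, six pairs, and two triples; {yellow}, purchased only once, is dropped at the first level.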

Measures of Performance
Confidence: the % of antecedent transactions that also have the consequent item set
Lift = confidence / (benchmark confidence)
Benchmark confidence = transactions with the consequent as % of all transactions
Lift > 1 indicates a rule that is useful in finding consequent item sets (i.e., more useful than selecting transactions at random)
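Both measures follow directly from support. A sketch using the same transaction-list representation (data reconstructed to match the slides: confidence of {red, white} → {green} is 50%, the benchmark confidence of {green} is 20%, so lift = 2.5):

```python
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def support(item_set, transactions):
    item_set = set(item_set)
    return sum(1 for t in transactions if item_set <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # % of antecedent transactions that also contain the consequent
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    # confidence relative to the benchmark confidence of the consequent
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print(confidence({"red", "white"}, {"green"}, transactions))  # 0.5
print(lift({"red", "white"}, {"green"}, transactions))        # 2.5
```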

Alternate Data Format: Binary Matrix
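The binary-matrix format is just a 0/1 recoding of the transaction list: one row per transaction, one column per item, with 1 marking the item's presence. A small sketch (illustrative data and helper name of our own):

```python
transactions = [{"red", "white", "green"}, {"white", "orange"}, {"red", "blue"}]

def to_binary_matrix(transactions):
    """Recode a list of item sets as a 0/1 matrix with one column per item."""
    items = sorted({i for t in transactions for i in t})
    rows = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, rows

items, rows = to_binary_matrix(transactions)
print(items)    # ['blue', 'green', 'orange', 'red', 'white']
print(rows[0])  # [0, 1, 0, 1, 1] -- first transaction as 0/1 indicators
```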

Process of Rule Selection
Goal: generate all rules that meet specified support & confidence
1. Find frequent item sets (those with sufficient support; see above)
2. From these item sets, generate rules with sufficient confidence

Example: Rules from {red, white, green}
{red, white} → {green} with confidence = 2/4 = 50% [(support {red, white, green}) / (support {red, white})]
{red, green} → {white} with confidence = 2/2 = 100% [(support {red, white, green}) / (support {red, green})]
Plus 4 more, with confidence of 100%, 33%, 29% & 100%
If the confidence criterion is 70%, report only rules 2, 3 and 6

All Rules (XLMiner Output)

Interpretation
Lift ratio shows how effective the rule is in finding consequents (useful if finding particular consequents is important)
Confidence shows the rate at which consequents will be found (useful in learning costs of promotion)
Support measures overall impact

Caution: The Role of Chance
Random data can generate apparently interesting association rules
The more rules you produce, the greater this danger
Rules based on large numbers of records are less subject to this danger

Example: Charles Book Club
Row 1, e.g., is a transaction in which books were bought in the following categories: Youth, Do It Yourself, Geography

XLMiner Output
Rules are arrayed in order of lift
Information can be compressed, e.g., rules 2 and 7 involve the same trio of books

Summary – Association Rules
Association rules (or affinity analysis, or market basket analysis) produce rules on associations between items from a database of transactions
Widely used in recommender systems
The most popular method is the Apriori algorithm
To reduce computation, we consider only “frequent” item sets (those meeting a support criterion)
Performance is measured by confidence and lift
Can produce a profusion of rules; review is required to identify useful rules and to reduce redundancy

Collaborative Filtering
User-based methods
Item-based methods

Item-user matrix
Cells are user preferences, r_ij, for items
Preferences can be ratings, or binary (buy, click, like)

More efficient to store as rows of triplets
Each row has the user ID, the item ID, and the user’s rating of that item: (U_u, I_i, r_ui)
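A sketch of the two storage formats (user and item IDs are hypothetical): a dense item-user matrix wastes space on its many empty cells, while triplet rows store only the observed ratings.

```python
# Observed ratings, user -> {item: rating}; most (user, item) cells are empty.
users, items = ["u1", "u2", "u3"], ["i1", "i2", "i3", "i4"]
dense = {
    "u1": {"i1": 4, "i3": 5},
    "u2": {"i1": 2},
    "u3": {"i2": 1, "i4": 3},
}

# Triplet format: one (user_id, item_id, rating) row per observed rating.
triplets = [(u, i, r) for u in users for i, r in sorted(dense.get(u, {}).items())]
print(triplets)  # 5 stored rows instead of 3 x 4 = 12 matrix cells
```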

User-based Collaborative Filtering
Start with a single user who will be the target of the recommendations
Find other users who are most similar, based on comparing preference vectors

Measuring Proximity
Like the nearest-neighbor algorithm, but Euclidean distance does not do well
Correlation proximity (Pearson) does better
For each user pair, find the co-rated items, then calculate the correlation between the vectors of their ratings for those items
Note that the average ratings for each user are computed across all products, not just the co-rated ones
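A sketch of this correlation-based proximity (helper names are ours). Note the detail from the slide: each user's mean is taken over all items that user rated, while the sums run only over the co-rated items.

```python
from math import sqrt

def pearson_proximity(ratings_a, ratings_b):
    """Correlation between two users' ratings on their co-rated items.
    Means are computed over ALL items each user rated, not just co-rated ones."""
    co_rated = set(ratings_a) & set(ratings_b)
    if not co_rated:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in co_rated)
    den = (sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in co_rated))
           * sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in co_rated)))
    return num / den if den else 0.0

# Hypothetical rating dicts: item -> rating
a = {1: 4, 2: 5, 3: 3}
b = {1: 2, 2: 3, 4: 5}
print(pearson_proximity(a, b))  # negative: their co-rated tastes diverge
```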

Cosine Similarity
Like the correlation coefficient, except we do not subtract the means
“Cold start” problem: for users with just one co-rated item, or items with just one rating in common, neither cosine similarity nor correlation produces a useful metric
Binary matrix? Must use all the data, not just the co-rated items. This can add useful info: in the Netflix contest, information about which movies users chose to rate was informative.
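A sketch of cosine similarity over the co-rated items, using raw ratings with no mean-centering (helper names are ours); with nonnegative ratings it ranges from 0 to 1:

```python
from math import sqrt

def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between the two users' raw rating vectors,
    restricted to the items both have rated."""
    co_rated = set(ratings_a) & set(ratings_b)
    if not co_rated:
        return 0.0
    num = sum(ratings_a[i] * ratings_b[i] for i in co_rated)
    den = (sqrt(sum(ratings_a[i] ** 2 for i in co_rated))
           * sqrt(sum(ratings_b[i] ** 2 for i in co_rated)))
    return num / den if den else 0.0

a = {1: 4, 2: 5}  # hypothetical item -> rating dicts
b = {1: 2, 2: 3}
print(cosine_similarity(a, b))
```

Note that with a single co-rated item the cosine is always 1 whatever the two ratings are, which illustrates the cold-start caveat above.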

Example – Tiny Netflix subset
Consider a pair of users from the ratings matrix

Correlation between the two users
First find the average ratings for each user
Then find the correlation using departures from the average ratings for the co-rated movies (movies 1, 28 and 30)

Find cosine similarity for the same users
Use raw ratings instead of departures from averages
Ranges from 0 (no similarity) to 1 (perfect match)

Using the similarity info to make recommendations
Given a new user, identify the k nearest users
Consider all the items they rated/purchased, except for the co-rated ones
Among these other items, what is the best one? “Best” could be: most purchased, highest rated, most rated
That “best” item is the recommendation for the new user
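Putting the pieces together, a sketch of the user-based recommendation step (hypothetical data and helper names; “best” here means highest average neighbor rating, one of the options listed above):

```python
from math import sqrt

def cosine(ra, rb):
    """Cosine similarity over co-rated items, raw ratings."""
    co = set(ra) & set(rb)
    if not co:
        return 0.0
    num = sum(ra[i] * rb[i] for i in co)
    den = sqrt(sum(ra[i] ** 2 for i in co)) * sqrt(sum(rb[i] ** 2 for i in co))
    return num / den if den else 0.0

def recommend(target, all_ratings, k=1):
    # 1. Identify the k nearest users by similarity to the target
    neighbors = sorted((u for u in all_ratings if u != target),
                       key=lambda u: cosine(all_ratings[target], all_ratings[u]),
                       reverse=True)[:k]
    # 2. Pool the neighbors' items that the target has not already rated
    seen = set(all_ratings[target])
    candidates = {}
    for u in neighbors:
        for item, r in all_ratings[u].items():
            if item not in seen:
                candidates.setdefault(item, []).append(r)
    # 3. "Best" item = highest average neighbor rating
    return max(candidates, key=lambda i: sum(candidates[i]) / len(candidates[i]))

all_ratings = {
    "new_user": {"m1": 5, "m2": 4},
    "userA": {"m1": 5, "m2": 4, "m3": 5},
    "userB": {"m1": 1, "m2": 2, "m4": 4},
}
print(recommend("new_user", all_ratings, k=1))  # userA is nearest -> "m3"
```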

Item-based collaborative filtering
When the number of users is huge, user-based calculations pose an obstacle (similarity measures cannot be calculated until the user shows up)
Alternative: when a user purchases an item, focus on similar items
1. Find co-rated (co-purchased) items (rated or purchased together by any user)
2. Recommend the most popular or most correlated item
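A sketch of the item-based variant (hypothetical data and names). Because the item-item similarities depend only on the stored ratings, they can be computed offline, before any new user appears:

```python
from math import sqrt
from itertools import combinations

ratings = {  # user -> {item: rating}; hypothetical
    "u1": {"i1": 5, "i2": 5, "i3": 1},
    "u2": {"i1": 4, "i2": 4, "i3": 2},
    "u3": {"i3": 5, "i4": 4},
}

# Flip the matrix: item -> {user: rating}, so items become the vectors
item_vecs = {}
for u, user_items in ratings.items():
    for i, r in user_items.items():
        item_vecs.setdefault(i, {})[u] = r

def cosine(va, vb):
    co = set(va) & set(vb)  # users who rated both items
    if not co:
        return 0.0
    num = sum(va[u] * vb[u] for u in co)
    den = sqrt(sum(va[u] ** 2 for u in co)) * sqrt(sum(vb[u] ** 2 for u in co))
    return num / den if den else 0.0

# Precompute all item-item similarities offline
sims = {frozenset(p): cosine(item_vecs[p[0]], item_vecs[p[1]])
        for p in combinations(item_vecs, 2)}

def most_similar(item):
    """At recommendation time: a cheap lookup, no user-similarity work."""
    others = [i for i in item_vecs if i != item]
    return max(others, key=lambda i: sims[frozenset((item, i))])

print(most_similar("i1"))  # "i2": bought by the same users with matching ratings
```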

Summary – Collaborative Filtering
User-based: for a new user, find other users who share his/her preferences, and recommend the highest-rated item that the new user does not have. User-user correlations cannot be calculated until the new user appears on the scene, so it is slow if there are lots of users.
Item-based: for a new user considering an item, find the other item that is most similar in terms of user preferences. The ability to calculate item-item correlations in advance greatly speeds up the algorithm.

Association Rules vs. Collaborative Filtering
AR: focus entirely on frequent (popular) item combinations. Data rows are single transactions. Ignores the user dimension. Often used in displays (what goes with what).
CF: focus is on user preferences. Data rows are a user's purchases or ratings over time. Can capture the “long tail” of user preferences; useful for recommendations involving unusual items.