Recommender Systems. Collaborative Filtering Process.


Recommender Systems

Collaborative Filtering Process

Challenge - Sparsity

Active users may have purchased well under 1% of the items (1% of 2 million books is still 20,000 books). Solution: use a sparse representation of the rating matrix.
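As a minimal sketch (the user and item names here are illustrative, not from the dataset below), a dict-of-dicts stores only the ratings that actually exist, while a dense matrix would reserve a cell for every user-item pair:

# Sparse: only observed ratings are stored.
ratings = {
    'alice': {'item42': 4.0},
    'bob': {'item7': 2.5, 'item42': 3.0},
}

# Dense equivalent: one row per user, one column per item,
# with a placeholder (here None) for every unrated pair.
items = ['item7', 'item42']
dense = [[None, 4.0],
         [2.5,  3.0]]

The hashtable of ratings on the next slides uses exactly this sparse dict-of-dicts layout.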

Ratings in a hashtable

critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                     'The Night Listener': 4.5, 'Superman Returns': 4.0,
                     'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                     'Just My Luck': 2.0, 'Superman Returns': 3.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                      'Superman Returns': 5.0, 'The Night Listener': 3.0,
                      'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
             'You, Me and Dupree': 1.0}
}
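A quick usage sketch: a rating is looked up by two nested dictionary accesses, and each inner dict lists only the movies that critic actually rated.

>>> critics['Lisa Rose']['Lady in the Water']
2.5
>>> critics['Toby']
{'Snakes on a Plane': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 1.0}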

Finding Similar Users

A simple way to calculate a similarity score is to use Euclidean distance over the items that two people have both ranked. [Figure: people plotted in preference space]

Computing Euclidean Distance

from math import sqrt

def sim_distance(prefs, person1, person2):
    # Get the list of shared items
    si = []
    for item in prefs[person1]:
        if item in prefs[person2]:
            si += [item]
    # If they have no ratings in common, return 0
    if len(si) == 0:
        return 0
    # Add up the squares of all the differences
    sum_of_squares = sum([(prefs[person1][item] - prefs[person2][item])**2
                          for item in si])
    return 1/(1 + sqrt(sum_of_squares))
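For example (value worked out by hand from the critics data above, assuming the dict and function are defined in the same session): Lisa Rose and Gene Seymour share six rated movies, and their squared rating differences sum to 5.75, so the score is 1/(1 + √5.75) ≈ 0.294.

>>> sim_distance(critics, 'Lisa Rose', 'Gene Seymour')
0.294...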

Pearson Correlation Score

The correlation coefficient is a measure of how well two sets of data fit on a straight line. [Figure: two critics' ratings with the best-fit line]

Pearson Correlation Score

Corrects for "grade inflation": e.g., Jack Matthews tends to give higher scores than Lisa Rose, but the line still fits well because they have relatively similar preferences. The Euclidean distance score, by contrast, would say they are quite dissimilar. [Figure: two critics with a high correlation score]

Pearson Correlation Formula
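In the computational form used by the sim_pearson code below, where the sums run over the n items co-rated by users x and y:

$$ r = \frac{\sum_i x_i y_i - \frac{(\sum_i x_i)(\sum_i y_i)}{n}}{\sqrt{\left(\sum_i x_i^2 - \frac{(\sum_i x_i)^2}{n}\right)\left(\sum_i y_i^2 - \frac{(\sum_i y_i)^2}{n}\right)}} $$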

Geometric Interpretation

For centered data (i.e., data which have been shifted by the sample mean so as to have an average of zero), the correlation coefficient can also be viewed as the cosine of the angle between the two vectors. E.g., suppose a critic rated five movies 1, 2, 3, 5, and 8, respectively, and another critic rated those movies 0.11, 0.12, 0.13, 0.15, and 0.18. These data are perfectly correlated: y = 0.10 + 0.01x, so the Pearson correlation coefficient must be exactly one. Centering the data (shifting x by E(x) = 3.8 and y by E(y) = 0.138) yields x = (−2.8, −1.8, −0.8, 1.2, 4.2) and y = (−0.028, −0.018, −0.008, 0.012, 0.042), from which the cosine of the angle between the two vectors, and hence the correlation, is exactly one, as expected.
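A small sketch verifying this numerically (variable names are illustrative): the cosine of the angle between the two centered vectors comes out as 1.0, matching the Pearson coefficient.

from math import sqrt

x = [1, 2, 3, 5, 8]
y = [0.11, 0.12, 0.13, 0.15, 0.18]

# Center each vector by subtracting its mean
mx, my = sum(x)/len(x), sum(y)/len(y)
xc = [v - mx for v in x]
yc = [v - my for v in y]

# Cosine of the angle = dot product / (product of the norms)
dot = sum(a*b for a, b in zip(xc, yc))
cos = dot / (sqrt(sum(a*a for a in xc)) * sqrt(sum(b*b for b in yc)))
print(cos)   # 1.0, up to floating-point rounding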

Pearson Correlation Code

def sim_pearson(prefs, person1, person2):
    # Get the list of mutually rated items
    si = []
    for item in prefs[person1]:
        if item in prefs[person2]:
            si += [item]
    n = len(si)
    if n == 0:
        return 0
    # Add up all the preferences
    sum1 = sum([prefs[person1][item] for item in si])
    sum2 = sum([prefs[person2][item] for item in si])
    # Sum up the squares
    sum1Sq = sum([prefs[person1][item]**2 for item in si])
    sum2Sq = sum([prefs[person2][item]**2 for item in si])
    # Sum up the products
    pSum = sum([prefs[person1][item] * prefs[person2][item] for item in si])
    # Calculate the Pearson score
    numerator = pSum - (sum1*sum2/n)
    denominator = sqrt((sum1Sq - sum1**2/n) * (sum2Sq - sum2**2/n))
    if denominator == 0:
        return 0
    return numerator/denominator
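For instance, for Lisa Rose and Gene Seymour the numerator works out to 1.0 and the denominator to √6.375 ≈ 2.525 (computed by hand from the critics data above), so:

>>> sim_pearson(critics, 'Lisa Rose', 'Gene Seymour')
0.396...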

Top Matches

def topMatches(critics, person, n=5, similarity=sim_pearson):
    scores = [(similarity(critics, person, other), other)
              for other in critics if other != person]
    scores.sort()
    scores.reverse()
    return scores[0:n]

>>> recommendations.topMatches(recommendations.critics, 'Toby', n=3)
[(0.9912, 'Lisa Rose'), (0.9245, 'Mick LaSalle'), (0.8934, 'Claudia Puig')]

Recommending Items

def getRecommendations(prefs, person, similarity=sim_pearson):
    totals = {}
    simSums = {}
    for other in prefs:
        # Don't compare me to myself
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        # Ignore scores of zero or lower
        if sim <= 0:
            continue
        for item in prefs[other]:
            # Only score movies I haven't seen yet
            if item not in prefs[person]:
                # Similarity * score
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim
    # Create the normalized list
    rankings = [(total/simSums[item], item) for item, total in totals.items()]
    # Return the sorted list
    rankings.sort()
    rankings.reverse()
    return rankings

>>> recommendations.getRecommendations(recommendations.critics, 'Toby')
[(3.348, 'The Night Listener'), (2.833, 'Lady in the Water'), (2.531, 'Just My Luck')]
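In other words, the predicted rating is a similarity-weighted average of the other critics' ratings. Writing sim(u, v) for the similarity between users u and v, and r_{v,i} for v's rating of item i:

$$ \hat{r}_{u,i} = \frac{\sum_{v} \operatorname{sim}(u,v)\, r_{v,i}}{\sum_{v} \operatorname{sim}(u,v)} $$

where the sums run over the users v who rated item i and have positive similarity to u.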

Matching Products

Recall Amazon…

Transform the data

From:

{'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
 'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5}}

to:

{'Lady in the Water': {'Lisa Rose': 2.5, 'Gene Seymour': 3.0},
 'Snakes on a Plane': {'Lisa Rose': 3.5, 'Gene Seymour': 3.5}}

etc.

def transformPrefs(prefs):
    result = {}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item, {})
            # Flip item and person
            result[item][person] = prefs[person][item]
    return result

Getting Similar Items

>>> movies = recommendations.transformPrefs(recommendations.critics)
>>> recommendations.topMatches(movies, 'Superman Returns')
[(0.657, 'You, Me and Dupree'), (0.487, 'Lady in the Water'),
 (0.111, 'Snakes on a Plane'), (-0.179, 'The Night Listener'),
 (-0.422, 'Just My Luck')]

Whom to invite to a premiere?

>>> recommendations.getRecommendations(movies, 'Just My Luck')
[(4.0, 'Michael Phillips'), (3.0, 'Jack Matthews')]

Reversing products and people in this way would also let an online retailer find the people who are most likely to buy certain products.

Building a Cache

def calculateSimilarItems(prefs, n=10):
    # Create a dictionary of items showing which other items
    # they are most similar to
    result = {}
    # Invert the preference matrix to be item-centric
    itemPrefs = transformPrefs(prefs)
    for item in itemPrefs:
        # Find the most similar items to this one
        scores = topMatches(itemPrefs, item, n=n, similarity=sim_distance)
        result[item] = scores
    return result
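A usage sketch, assuming the functions above are in the same session. The similarity values follow from the critics data and sim_distance; e.g., 'Lady in the Water' and 'You, Me and Dupree' share four critics whose squared rating differences sum to 1.5, giving 1/(1 + √1.5) ≈ 0.449, which makes Dupree the top match:

>>> itemsim = calculateSimilarItems(critics)
>>> itemsim['Lady in the Water'][0]
(0.449..., 'You, Me and Dupree')

Because item-item similarities change far less often than user-user ones, this table only needs to be rebuilt occasionally and can then be reused across many recommendation requests.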