Top-N Recommendation Algorithm Based on Item-Graph

Top-N Recommendation Algorithm Based on Item-Graph
Allen, Zhenjiang LIN, CSE, CUHK, Nov 13, 2007

Good afternoon, everyone. My topic today is “Top-N Recommendation Algorithm Based on Item-Graph”.

Outline
1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
   - Item-Graph Model
   - Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work

We focus on the top-N recommendation problem and the algorithms that address it. First, I’ll give some background. After that, I’ll present the Item-Graph model and the Generalized Conditional Probability (GCP)-based recommendation algorithm built on that model. Preliminary experimental results then show the performance of the algorithm. The last part covers the conclusion and future work.

1. Top-N Recommendation Problem

The Top-N Recommendation Problem
- Given the preference information of users, recommend a set of N items to a certain user that he might be interested in, based on the items he has selected.
- E-commerce system example: Amazon.com, customers vs. products.

[Figure: User-Item matrix with rows User 1, User 2, …, User n and columns Item 1, Item 2, Item 3, …, Item m; an entry of 1 marks a purchase. A further row for the new (active) user, whose basket is known, is marked “?”.]

The top-N recommendation problem is common in e-commerce systems such as Amazon.com. Here’s the definition: … On Amazon.com, users and items correspond to customers and products, respectively. Usually, an online shopping system holds the transaction records of its users, which are represented by a User-Item matrix with n users and m items. This matrix can be binary or multi-valued. We consider only the binary case in this talk, where 1 indicates that a user bought an item and 0 that he did not. When a new user (called the active user) is shopping on the system, recommending products based on his cart, or basket, is very useful. This problem can be regarded as a prediction problem: predicting how much the active user will like a certain item, based on his basket.

Example: Amazon.com

[Figure: Amazon.com screenshot with the active user, the basket, and the recommendations labeled.]

Let’s look at the Amazon.com online shopping system as an example. Here’s the active user; that’s me. There are two books in my shopping cart. Amazon recommends a list of related books to me, and these recommendations are ranked. This is a typical application of a top-N recommendation algorithm.

1. Top-N Recommendation Problem

Challenges in E-commerce Systems
- Huge amounts of data: millions of users and/or items.
- The result set must be returned in real time.
- Limited preference information for new users.
- Volatile user preference information.

There are many challenges in e-commerce systems. First, the amount of data is usually huge. Second, the system must respond in real time. Third, the preference information of a new user is usually very limited. Fourth, users’ preferences may change frequently. These challenges motivate our work. We first propose the Item-Graph model, which is simple and incremental, to reflect the relationships among items. We then develop the Generalized Conditional Probability (GCP)-based top-N recommendation algorithm, which is item-centric and built on the Item-Graph model.

2. Top-N Recommendation Algorithm

Two major approaches
- Content-based: recommend items based on the content (textual information) of items.
  Examples: Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
- Collaborative Filtering (CF): recommend items by collecting taste information from other users.
  Uses collaborative (correlation) information between users.
  More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items.
  Examples: Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].

I’d like to briefly introduce some background on recommender systems first. There are two major paradigms of top-N recommendation algorithms: content-based and Collaborative Filtering. Content-based approaches recommend items based on their content, which is usually textual information. Fab and Syskill & Webert are two systems that use this paradigm. Collaborative Filtering approaches, in contrast, recommend items by collecting taste information from other users. That is, they use the collaborative information between users and/or items. This information is, in effect, link-based. Many systems use the CF paradigm.

2. Top-N Recommendation Algorithm

CF algorithms classified by strategy of using data
- Memory-based: make recommendations based on the entire collection of user preferences.
  No pre-computing is needed, but these algorithms suffer from serious scalability problems.
  E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
- Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
  The model is built off-line, so these algorithms are more scalable.
  E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].

CF algorithms can be classified by their strategy of using data: memory-based or model-based. Memory-based algorithms make recommendations based on the entire collection of user preferences. Their advantage is that they need no pre-computing, but they usually suffer from serious scalability problems when the amount of data grows very large. Model-based algorithms use the collection of user preferences to learn a model, which is then used to make recommendations. They need to build a model off-line, but they are more scalable.

2. Top-N Recommendation Algorithm

CF algorithms classified by strategy of using objects
- User-centric: look for similar (like-minded) users first and then make recommendations.
  Similarity between users is relatively dynamic.
  Pre-computing user neighborhoods may lead to poor predictions.
- Item-centric: look for similar (or related) items first and then make recommendations.
  Similarity between items is relatively static.
  This enables pre-computing of item-item similarity, which is more scalable.

CF algorithms can also be classified by their strategy of using objects. User-centric algorithms look for similar (like-minded) users first and then make recommendations. However, since the similarity between users is relatively dynamic, pre-computing user neighborhoods may lead to poor predictions. Item-centric algorithms look for similar (or related) items first and then make recommendations. Because the similarity between items is relatively static, item-item similarities can be pre-computed, so these algorithms are usually more scalable. The aim of our work is to develop a model-based and item-centric CF top-N recommendation algorithm.

2. Top-N Recommendation Algorithm

Notations
- Item set: $I = \{I_1, I_2, \ldots, I_m\}$.
- User set: $U = \{U_1, U_2, \ldots, U_n\}$.
- User-Item (binary) matrix: $D$, of size $n \times m$.
- Basket of the active user: $B \subseteq I$.
- Similarity score of x and y: $\mathrm{sim}(x, y)$.

Formal definition of the top-N recommendation problem
- Given a user-item matrix $D$ and a set of items $B$ that have been selected by the active user, identify an ordered set of items $X$ such that $|X| \le N$ and $X \cap B = \emptyset$.

Here are some notations. The item set has m elements. The user set has n elements. The user-item matrix D is an n by m binary matrix. B denotes the basket of the active user, and the similarity score of x and y is denoted by sim(x, y). Here’s the formal definition of the top-N recommendation problem: given a user-item matrix D and a set of items B that have been purchased by the active user, identify an ordered set of items X such that X contains at most N items and shares no item with B.
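The two constraints in the formal definition can be illustrated with a small Python sketch. It assumes some scoring function has already assigned a score to each item; the function name `top_n` and the toy scores are hypothetical and not part of the talk.

```python
def top_n(scores, basket, n):
    """Return an ordered list X of at most n items, none of which is in the basket.

    scores: dict mapping item id -> ranking score (higher is better).
    basket: set of item ids already selected by the active user (B).
    """
    # Enforce X ∩ B = ∅ by dropping basket items before ranking.
    candidates = [(item, s) for item, s in scores.items() if item not in basket]
    # Sort by descending score; break ties by item id so the order is deterministic.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    # Enforce |X| <= N by truncating the ranked list.
    return [item for item, _ in candidates[:n]]


example_scores = {"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.1}
example_basket = {"a"}
recommended = top_n(example_scores, example_basket, 2)
```

With these toy values, `recommended` contains the two best-scoring items outside the basket, in ranked order.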

2. Top-N Recommendation Algorithm

Two classical item-item similarity measures
- Cosine-based (symmetric): $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$   (1)
- Conditional Probability (CP)-based (asymmetric): $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) \approx \dfrac{\mathrm{Freq}(I_i I_j)}{\mathrm{Freq}(I_i)}$   (2)
  Freq(X): the number of users who have purchased the item set X.

The ranking score for item x
- $\mathrm{RS}(x) = \sum_{b \in B} \mathrm{sim}(b, x)$   (3)
  (the sum of the similarity scores between the items in the basket B and x)

Now I introduce two classical item-centric algorithms, both of which are based on the similarity between items. In the cosine-based algorithm, the similarity between two items i and j is the cosine of the angle between the i-th and j-th columns of the user-item matrix. In the CP-based algorithm, sim(i, j) is the conditional probability that the active user purchases item j, given that item i is in his basket. In Eq. (2), Freq(X) is the function that returns the number of customers who have purchased every item in the item set X. The ranking score for item x is then defined as the sum of the similarity scores between the items in the basket and x.
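Eqs. (1) to (3) can be sketched in plain Python as follows. This is a minimal illustration on a toy binary matrix, not the implementation used in the talk; items are addressed by zero-based column index, and the function names are hypothetical. For binary columns, the cosine in Eq. (1) reduces to Freq(I_i I_j) / sqrt(Freq(I_i) · Freq(I_j)).

```python
from math import sqrt


def freq(D, items):
    """Freq(X): number of users (rows of D) who purchased every item in `items`."""
    return sum(1 for row in D if all(row[i] for i in items))


def cosine_sim(D, i, j):
    """Eq. (1): cosine of columns i and j of the binary matrix D."""
    fi, fj = freq(D, [i]), freq(D, [j])
    if fi == 0 or fj == 0:
        return 0.0
    return freq(D, [i, j]) / sqrt(fi * fj)


def cp_sim(D, i, j):
    """Eq. (2): P(I_j | I_i), estimated as Freq(I_i I_j) / Freq(I_i)."""
    fi = freq(D, [i])
    return freq(D, [i, j]) / fi if fi else 0.0


def ranking_scores(D, basket, sim):
    """Eq. (3): RS(x) = sum of sim(b, x) over basket items b, for x outside the basket."""
    m = len(D[0])
    return {x: sum(sim(D, b, x) for b in basket) for x in range(m) if x not in basket}


# Toy user-item matrix: 4 users (rows) by 4 items (columns).
D = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
scores = ranking_scores(D, {2}, cp_sim)
```

In this toy matrix everyone who bought item 2 also bought item 0, so `cp_sim(D, 2, 0)` is 1.0 while `cp_sim(D, 0, 2)` is 2/3, which shows the asymmetry of the CP-based measure; `cosine_sim` gives the same value in both directions.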

4. Preliminary Experimental Results

Dataset: MovieLens (http://www.grouplens.org/data)
- A web-based movie recommender system.
- Contains multi-valued ratings that indicate how much each user liked a particular movie.
- Each user has rated at least 20 movies.
- We treat the ratings as an indication of whether a user has seen a movie (nonzero) or not (zero).

Table 1: The characteristics of the MovieLens dataset

# of Users    # of Items    Density (1)    Average Basket Size
943           1682          6.31%          106.04

(1) Density: the percentage of nonzero entries in the user-item matrix.

The dataset we used in the experiments comes from MovieLens, which is a web-based movie recommender system. MovieLens contains multi-valued ratings that indicate how much each user liked a particular movie, but we treat the ratings only as an indication of whether a user has seen a movie (nonzero) or not (zero). The characteristics of the MovieLens dataset we used are listed in Table 1.
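The preprocessing and the two derived statistics in Table 1 can be sketched as below. The toy rating matrix and the function names are assumptions for illustration only; loading the real MovieLens files is not shown.

```python
def binarize(ratings):
    """Map multi-valued ratings to the binary case: nonzero -> 1 (seen), zero -> 0."""
    return [[1 if r else 0 for r in row] for row in ratings]


def density(D):
    """Fraction of nonzero entries in the user-item matrix."""
    n, m = len(D), len(D[0])
    return sum(map(sum, D)) / (n * m)


def average_basket_size(D):
    """Average number of items per user."""
    return sum(map(sum, D)) / len(D)


# Toy ratings: 3 users by 3 movies, 0 meaning "not rated".
toy_ratings = [
    [5, 0, 3],
    [0, 0, 1],
    [4, 2, 0],
]
toy_D = binarize(toy_ratings)
toy_density = density(toy_D)
toy_basket = average_basket_size(toy_D)
```

Here five of the nine entries are nonzero, so the density is 5/9 and the average basket size is 5/3.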

4. Preliminary Experimental Results-1

Evaluation Design
- Split the dataset into training and test sets by randomly selecting one rated movie of each user for the test set and using the remaining rated movies for training.
- Compare the Cosine (COS)-based, CP-based, and GCP-based methods, averaged over 10 runs.

Evaluation Metrics
- Hit-Rate (HR): $\mathrm{HR} = \dfrac{\#\text{ of hits}}{n}$   (6)
- Average Reciprocal Hit-Rate (ARHR): $\mathrm{ARHR} = \dfrac{1}{n} \sum_{i=1}^{h} \dfrac{1}{p_i}$   (7)
- # of hits: the number of items in the test set that were also in the top-N lists.
- h is the number of hits, which occurred at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).

We split the dataset into a training set and a test set by randomly selecting one rated movie of each user for the test set and using the remaining rated movies for training. We tested the cosine-based, CP-based, and GCP-based methods, running each 10 times and averaging the results. We adopted Hit-Rate and Average Reciprocal Hit-Rate as the metrics. The number of hits is the number of items in the test set that were also in the top-N lists. Since each of the n users contributes exactly one test item, HR measures the fraction of test items that also appear in the top-N recommendation lists. ARHR additionally takes into account the position of each hit in the top-N list, so a hit near the top counts more than a hit near the bottom. In this metric, h is the number of hits, which occurred at positions p1, p2, …, ph within the top-N lists.
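The split and the two metrics in Eqs. (6) and (7) can be sketched as follows. This is an illustrative sketch, assuming every user has at least one purchased item (true here, since each user rated at least 20 movies); the function names are hypothetical.

```python
import random


def leave_one_out(D, rng):
    """Hold out one randomly chosen purchased item per user.

    Returns the training matrix and the list of held-out item indices (one per user).
    """
    train, held_out = [], []
    for row in D:
        purchased = [i for i, v in enumerate(row) if v]
        item = rng.choice(purchased)
        new_row = list(row)
        new_row[item] = 0  # hide the test item from the training data
        train.append(new_row)
        held_out.append(item)
    return train, held_out


def hit_rate_and_arhr(top_n_lists, held_out):
    """Eq. (6) and Eq. (7): HR = hits / n, ARHR = (sum of 1 / p_i over hits) / n."""
    n = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for recs, item in zip(top_n_lists, held_out):
        if item in recs:
            hits += 1
            reciprocal_sum += 1.0 / (recs.index(item) + 1)  # positions are 1-based
    return hits / n, reciprocal_sum / n


# Three users: hits at position 2 and position 1, and one miss.
example_lists = [[3, 1, 2], [0, 4, 5], [7, 8, 9]]
example_held_out = [1, 0, 6]
hr, arhr = hit_rate_and_arhr(example_lists, example_held_out)
```

In the example, two of three test items are hit, so HR is 2/3, and ARHR is (1/2 + 1/1) / 3 = 0.5.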

4. Preliminary Experimental Results-1

Performance of Top-N Recommendation Algorithms

[Figure: two panels. Left, HR: x-axis is top-N items, y-axis is the hit-rate over all users. Right, ARHR: x-axis is top-N items, y-axis is the average reciprocal hit-rate over all users. For the GCP-based method, d = 2.]

This is the preliminary result for the cosine-based, CP-based, and GCP-based methods. The x-axis represents N, and the y-axis represents the HR and ARHR values of the methods, where N varies from 10 to 100 in steps of 10. We can see that on both the HR and ARHR measures, the GCP-based method outperforms the other two methods.

4. Preliminary Experimental Results-2

Testing the Parameter d in the GCP Method
- Test the effect of d (d = 1, 2, 3).

Evaluation: Online Shopping Simulation
- Randomly select part of the user records as the training set; use the remaining user records as the test set.
- STEP 0: Construct the item-graph from the training set.
- STEP 1: For each user in the test set:
  - randomly remove one item from the user’s basket and make a recommendation based on the remaining items in the basket;
  - compute the position of the removed item in the recommendation list;
  - update the item-graph.
- STEP 2: Compute the HR and ARHR metrics.
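The simulation loop can be sketched as below, under strong assumptions. The transcript does not define the item-graph or the GCP score, so this sketch substitutes a plain co-occurrence count table (the hypothetical class `CooccurrenceCounts`) and the CP-based ranking score of Eq. (3); it also assumes that STEP 1 iterates over users whose records are not yet reflected in the counts. Only the control flow of STEP 0 to STEP 2 is meant to carry over.

```python
from collections import defaultdict
from itertools import combinations
import random


class CooccurrenceCounts:
    """Hypothetical stand-in for the item-graph: item and item-pair purchase counts."""

    def __init__(self):
        self.item = defaultdict(int)
        self.pair = defaultdict(int)

    def update(self, basket):
        """Incrementally add one user's basket (a set of item ids)."""
        for i in basket:
            self.item[i] += 1
        for i, j in combinations(sorted(basket), 2):
            self.pair[(i, j)] += 1

    def cp(self, i, j):
        """P(j | i), estimated as Freq(i, j) / Freq(i)."""
        if self.item[i] == 0:
            return 0.0
        key = (i, j) if i < j else (j, i)
        return self.pair[key] / self.item[i]


def simulate(train_baskets, test_baskets, all_items, n_top, rng):
    counts = CooccurrenceCounts()
    for basket in train_baskets:  # STEP 0: build the model from the training records
        counts.update(basket)
    hits, reciprocal_sum = 0, 0.0
    for basket in test_baskets:  # STEP 1: replay each remaining user
        hidden = rng.choice(sorted(basket))
        rest = basket - {hidden}
        scores = {x: sum(counts.cp(b, x) for b in rest)
                  for x in all_items if x not in rest}
        ranked = sorted(scores, key=lambda x: (-scores[x], x))[:n_top]
        if hidden in ranked:
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(hidden) + 1)
        counts.update(basket)  # incremental update with the full basket
    n = len(test_baskets)
    return hits / n, reciprocal_sum / n  # STEP 2: HR and ARHR


sim_hr, sim_arhr = simulate(
    train_baskets=[{"a", "b"}, {"a", "b"}, {"a", "c"}],
    test_baskets=[{"a", "b"}],
    all_items={"a", "b", "c"},
    n_top=1,
    rng=random.Random(0),
)
```

In this toy run, whichever item is hidden from the test basket is ranked first by the remaining item, so both metrics equal 1.0. The per-user `update` call is what makes the evaluation resemble an online setting: later users are scored against a model that already includes earlier ones.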

4. Preliminary Experimental Results-2

Performance of Top-N Recommendation Algorithms

[Figure: two panels. Left, HR: x-axis is top-N items, y-axis is the hit-rate over all users. Right, ARHR: x-axis is top-N items, y-axis is the average reciprocal hit-rate over all users.]

This is the preliminary result for the cosine-based, CP-based, and GCP-based methods. The x-axis represents N, and the y-axis represents the HR and ARHR values of the methods, where N varies from 10 to 100 in steps of 10. We can see that on both the HR and ARHR measures, the GCP-based method outperforms the other two methods.

5. Conclusion and Future Work

Conclusion
- Top-N recommendation problem and item-centric algorithms
  - Cosine-based, conditional probability-based
- Item-Graph model
  - Visualizes the relationships among items.
  - Easy to update.
- Generalized Conditional Probability-based top-N recommendation algorithm
  - Item-centric and based on the Item-Graph model

Future Work
- Clustering items and measuring item-item similarities based on the Item-Graph model.
- Speeding up the GCP method.

Here’s the conclusion. First, I briefly introduced the top-N recommendation problem and two item-centric algorithms: the cosine-based and the conditional probability-based algorithms. After that, I presented the Item-Graph model and the GCP-based recommendation algorithm. The Item-Graph model is proposed to visualize the relationships among items, and it is easy to update. The Generalized Conditional Probability-based top-N method is item-centric and based on the Item-Graph model. Our next work will focus on clustering items and measuring item-item similarities based on the Item-Graph model. Improving the efficiency of the GCP method is further future work.

References
[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
[Breese98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
[Linden03] G. Linden, B. Smith, and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.

These are some useful references. Thank you!