1
Top-N Recommendation Algorithm Based on Item-Graph
Allen, Zhenjiang LIN
CSE, CUHK
Nov 13, 2007

Good afternoon, everyone. My topic today is "Top-N Recommendation Algorithm Based on Item-Graph".
2
Outline
1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
   Item-Graph Model
   Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work

We focus on the top-N recommendation problem and its algorithms. First, I'll give a basic introduction to the background. After that, I'll present the Item-Graph model and the Generalized Conditional Probability-based recommendation algorithm, which is built on this model. Preliminary experimental results are then given to show the performance of the algorithm. The last part covers the conclusion and future work.
3
1. Top-N Recommendation Problem
The Top-N Recommendation Problem
Given the preference information of users, recommend to a certain user a set of N items that he might be interested in, based on the items he has selected.
E-commerce system example: Amazon.com, customers vs. products.
User-Item matrix: rows User 1, User 2, …, User n plus the new (active) user; columns Item 1, Item 2, Item 3, …, Item m. An entry of 1 marks a purchase; the entries of the new user's row are the unknowns (?) to be predicted from his basket.

The top-N recommendation problem is common in e-commerce systems such as Amazon.com. Here's the definition: … On Amazon.com, users and items correspond to customers and products respectively. Usually, in an online shopping system, we have the transaction records of users, which are represented by the User-Item matrix with n users and m items. This matrix can be binary or multi-valued. We only consider the binary case in this talk, where 1 indicates that a certain user bought a certain item, and 0 otherwise. When a new user (called the active user) is shopping on the system, recommending products to him based on his cart or basket is very useful. This problem can be regarded as a prediction problem: predicting how much the active user will like a certain item, based on his basket.
4
Example: the Amazon.com
[Screenshot of Amazon.com with three labelled regions: Active User, Basket, Recommendations]

Let's see an example from the Amazon.com online shopping system. Here's the active user, that's me. In my shopping cart, there are two books. The Amazon system recommends a list of related books to me. Obviously, these recommendations have been ranked. This is a typical application of a top-N recommendation algorithm.
5
1. Top-N Recommendation Problem
Challenges in E-commerce Systems
Huge amounts of data: millions of users and/or items;
Results must be returned in real time;
Limited preference information for new users;
Volatile user preference information.

There are many challenges in e-commerce systems. First, the amount of data is usually huge. Second, the system must respond in real time. Third, the preference information of a new user is usually very limited, and users' preferences may change frequently. These challenges motivate our work. In this work, we first propose the Item-Graph model, which is simple and incremental, to reflect the relationships among items. Second, we develop the Generalized Conditional Probability-based top-N recommendation algorithm, which is item-centric and based on the Item-Graph model.
6
2. Top-N Recommendation Algorithm
Two major approaches
Content-based: recommend items based on the content (textual information) of items.
  Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
Collaborative Filtering (CF): recommend items by collecting taste information from other users.
  Uses collaborative (correlation) information between users.
  More popular than content-based recommendation, since in many domains (such as music, restaurants) it is hard to extract useful features from items.
  Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].

I'd like to briefly introduce some background on recommender systems first. There are two major paradigms of top-N recommendation algorithms: content-based and Collaborative Filtering. The content-based approaches recommend items based on their content, which is usually textual information. Fab and Syskill & Webert are two systems using this paradigm. The Collaborative Filtering approaches, in contrast, recommend items by collecting taste information from other users. That is, they use the collaborative information between users and/or items. This information is essentially link-based. Many systems use the CF paradigm.
7
2. Top-N Recommendation Algorithm
CF algorithms classified by strategy of using data
Memory-based: make recommendations based on the entire collection of preferences of the users.
  No pre-computing is needed, but they suffer from a serious scalability problem.
  E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
  The model is built off-line; more scalable.
  E.g., Cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].

CF algorithms can be classified by how they use the data: memory-based and model-based. The memory-based algorithms make recommendations based on the entire collection of user preferences. Their advantage is that they need no pre-computing, but they usually suffer from a serious scalability problem when the amount of data grows very large. The model-based algorithms use the collection of user preferences to learn a model, which is then used to make recommendations. They need to build a model off-line, but they are more scalable.
8
2. Top-N Recommendation Algorithm
CF algorithms classified by strategy of using objects
User-centric: look for similar (like-minded) users first and then make recommendations.
  Similarity between users is relatively dynamic.
  Pre-computing user neighborhoods may lead to poor predictions.
Item-centric: look for similar (or related) items first and then make recommendations.
  Similarity between items is relatively static.
  Enables pre-computing of item-item similarity. More scalable.

CF algorithms can also be classified by the objects they focus on. The user-centric algorithms look for similar (like-minded) users first and then make recommendations. However, since the similarity between users is relatively dynamic, pre-computing user neighborhoods may lead to poor predictions. The item-centric algorithms look for similar (or related) items first and then make recommendations. Because the similarity between items is relatively static, item-item similarity can be pre-computed, so these algorithms are usually more scalable. The aim of our work is to develop a model-based and item-centric CF top-N recommendation algorithm.
9
2. Top-N Recommendation Algorithm
Notations
Item set $I = \{I_1, I_2, \dots, I_m\}$.
User set $U = \{U_1, U_2, \dots, U_n\}$.
User-Item (binary) matrix $D$ of size $n \times m$.
Basket of the active user $B \subseteq I$.
Similarity score of x and y: sim(x, y).

Formal definition of the top-N recommendation problem
Given a user-item matrix $D$ and a set of items $B$ that have been selected by the active user, identify an ordered set of items $X$ such that $|X| \le N$ and $X \cap B = \emptyset$.

Here is some notation. The item set has m elements. The user set has n elements. The user-item matrix D is an n by m binary matrix. B denotes the basket of the active user. The similarity score of x and y is denoted by sim(x, y). Here's the formal definition of the top-N recommendation problem: given a user-item matrix D and a set of items B that have been purchased by the active user, identify an ordered set of items X such that X has at most N elements and X and B are disjoint.
10
2. Top-N Recommendation Algorithm
Two classical item-item similarity measures
Cosine-based (symmetric):
  $sim(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$   (1)
Conditional Probability (CP)-based (asymmetric):
  $sim(I_i, I_j) = P(I_j \mid I_i) \approx \mathrm{Freq}(I_i I_j) / \mathrm{Freq}(I_i)$   (2)
  Freq(X): the number of users who have purchased the item set X.
The ranking score for item x:
  $RS(x) = \sum_{b \in B} sim(b, x)$   (3)
  (the sum of the similarity scores between x and the items in the basket B)

Now I'll introduce two classical item-centric algorithms, both of which are based on the similarity between items. In the cosine-based algorithm, the similarity between two items i and j is the cosine of the angle between the i-th and j-th columns of the user-item matrix. In the CP-based algorithm, sim(I_i, I_j) is the conditional probability of the active user purchasing item j, given that item i is in his basket. In Eq. (2), Freq(X) is the function that returns the number of customers who have purchased the item set X. In both cases, the ranking score for item x is the sum of the similarity scores between the items in the basket and x.
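As an illustration of Eqs. (1) to (3), here is a minimal Python sketch on a made-up 4-user, 4-item binary matrix. The function names (`freq`, `sim_cos`, `sim_cp`, `top_n`) and the toy data are assumptions for this example and do not come from the talk; ties in the ranking score are broken by item index, which is also an arbitrary choice.

```python
from math import sqrt

# Toy binary user-item matrix D (rows = users, columns = items); values are made up.
D = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

def freq(D, items):
    """Freq(X): number of users who purchased every item in `items`."""
    return sum(all(row[i] for i in items) for row in D)

def sim_cos(D, i, j):
    """Eq. (1): cosine of columns i and j of the binary matrix."""
    dot = freq(D, [i, j])                       # co-purchases = dot product for 0/1 columns
    norm = sqrt(freq(D, [i])) * sqrt(freq(D, [j]))
    return dot / norm if norm else 0.0

def sim_cp(D, i, j):
    """Eq. (2): P(I_j | I_i) estimated as Freq(I_i I_j) / Freq(I_i)."""
    fi = freq(D, [i])
    return freq(D, [i, j]) / fi if fi else 0.0

def top_n(D, basket, n, sim):
    """Eq. (3): rank items outside the basket by RS(x) = sum of sim(b, x) over b in B."""
    m = len(D[0])
    scores = {x: sum(sim(D, b, x) for b in basket)
              for x in range(m) if x not in basket}
    return sorted(scores, key=lambda x: (-scores[x], x))[:n]

print(sim_cp(D, 0, 3), sim_cp(D, 3, 0))   # asymmetric: the two values differ
print(top_n(D, [0, 1], 2, sim_cp))
```

On this toy matrix the CP score is visibly asymmetric (item 3 is bought only together with item 0, but not the other way round), while the cosine score gives the same value in both directions.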
11
4. Preliminary Experimental Results
Dataset
MovieLens:
  A web-based movie recommender system;
  Contains multi-valued ratings that indicate how much each user liked a particular movie;
  Each user has rated at least 20 movies.
We treat the ratings as an indication that the users have seen the movies (nonzero) or not (zero).

Table 1: The characteristics of the MovieLens dataset
  # of Users   # of Items   Density [1]   Average Basket Size
  943          1682         6.31%         106.04
[1] Density: the percentage of nonzero entries in the user-item matrix.

The dataset we used in the experiments comes from MovieLens, which is a web-based movie recommender system. MovieLens contains multi-valued ratings that indicate how much each user liked a particular movie. However, we treat the ratings only as an indication of whether the users have seen the movies (nonzero) or not (zero). The characteristics of the MovieLens dataset we used are listed in Table 1. The density is the percentage of nonzero entries in the user-item matrix.
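The binarisation step and the two summary statistics of Table 1 can be sketched as follows. This is only an illustration on a tiny made-up rating matrix, not the MovieLens data; the helper names `binarize`, `density`, and `average_basket_size` are hypothetical.

```python
def binarize(ratings):
    """Map multi-valued ratings to 0/1: nonzero means the user has seen the movie."""
    return [[1 if r else 0 for r in row] for row in ratings]

def density(D):
    """Fraction of nonzero entries in the user-item matrix."""
    n, m = len(D), len(D[0])
    return sum(map(sum, D)) / (n * m)

def average_basket_size(D):
    """Average number of items per user."""
    return sum(map(sum, D)) / len(D)

# Made-up ratings for 3 users and 4 movies (0 = not rated).
ratings = [
    [5, 0, 3, 0],
    [0, 0, 4, 0],
    [1, 2, 0, 0],
]
D = binarize(ratings)
print(D)
print(f"density = {density(D):.2%}, average basket size = {average_basket_size(D):.2f}")
```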
12
4. Preliminary Experimental Results-1
Evaluation Design
Split the dataset into training and test sets by randomly selecting one rated movie of each user to be part of the test set, and use the remaining rated movies for training.
Cosine (COS)-based, CP-based, and GCP-based methods; average over 10 runs.

Evaluation Metrics
Hit-Rate (HR):
  $HR = \dfrac{\#\text{ of hits}}{n}$   (6)
Average Reciprocal Hit-Rate (ARHR):
  $ARHR = \dfrac{1}{n} \sum_{i=1}^{h} \dfrac{1}{p_i}$   (7)
# of hits: the number of items in the test set that were also in the top-N lists.
h is the number of hits, which occurred at positions $p_1, p_2, \dots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).

We split the dataset into a training set and a test set by randomly selecting one rated movie of each user to be part of the test set, and we use the remaining rated movies for training. We tested the cosine-based, CP-based, and GCP-based methods, running each 10 times and averaging the results. We adopted Hit-Rate and Average Reciprocal Hit-Rate as the metrics. The number of hits is the number of items in the test set that were also in the top-N lists, so HR measures the fraction of test items that also appear in the top-N recommendation lists. ARHR additionally takes into account the position of each hit in the top-N recommendation list. In this metric, h is the number of hits, which occurred at positions p_1, p_2, …, p_h within the top-N lists.
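A small Python sketch of this protocol, under stated assumptions: `leave_one_out` and `hr_arhr` are hypothetical helper names, baskets are given as a dict from user to a set of items, and n in Eqs. (6) and (7) is taken to be the number of users (one held-out item per user). The recommendation lists in the example are made up.

```python
import random

def leave_one_out(baskets, seed=0):
    """Hold out one randomly chosen item per user; the rest is training data."""
    rng = random.Random(seed)
    train, held_out = {}, {}
    for user, items in baskets.items():
        test_item = rng.choice(sorted(items))   # sorted() keeps the draw reproducible
        held_out[user] = test_item
        train[user] = set(items) - {test_item}
    return train, held_out

def hr_arhr(top_n_lists, held_out):
    """Eq. (6) and Eq. (7): hit-rate and average reciprocal hit-rate."""
    n = len(held_out)
    hits, reciprocal_sum = 0, 0.0
    for user, item in held_out.items():
        recs = top_n_lists.get(user, [])
        if item in recs:
            hits += 1
            reciprocal_sum += 1.0 / (recs.index(item) + 1)   # positions are 1-based
    return hits / n, reciprocal_sum / n

# Made-up example: four users, top-3 lists, one held-out item each.
held_out = {"u1": "a", "u2": "b", "u3": "c", "u4": "d"}
top3 = {
    "u1": ["a", "x", "y"],   # hit at position 1
    "u2": ["x", "b", "y"],   # hit at position 2
    "u3": ["x", "y", "z"],   # miss
    "u4": ["x", "y", "d"],   # hit at position 3
}
hr, arhr = hr_arhr(top3, held_out)
print(hr, arhr)
```

Here three of the four held-out items are recovered, so HR is 3/4, while ARHR is (1 + 1/2 + 1/3) / 4 because later positions count for less.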
13
4. Preliminary Experimental Results-1
Performance of Top-N Recommendation Algorithms
[Figure: two plots. HR (left): x-axis: top-N items, y-axis: hit-rate of all users. ARHR (right): x-axis: top-N items, y-axis: average reciprocal hit-rate of all users.]
(For the GCP-based method, set d = 2.)

These are the preliminary results for the cosine-based, CP-based, and GCP-based methods. The x-axis represents N, and the y-axis represents the HR and ARHR values of the methods, where N varies from 10 to 100 in steps of 10. We can see that on both the HR and ARHR measures, the GCP-based method outperforms the other two methods.
14
4. Preliminary Experimental Results-2
Testing the Parameter d in GCP Method
Testing the effect of d (d = 1, 2, 3).
Evaluation: Online Shopping Simulation
Randomly select part of the user records to be the training set; use the remaining user records for testing.
STEP 0: Construct the item-graph based on the training set.
STEP 1: For each user in the test set:
  randomly move one item out of the user's basket and make recommendations based on the remaining items in the basket;
  compute the position of this item in the recommendation list;
  update the item-graph.
STEP 2: Compute the HR and ARHR metrics.
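The simulation loop can be sketched in Python as below. Important caveats: the Item-Graph model and the GCP scoring rule are not defined in this part of the transcript, so the sketch substitutes a plain co-occurrence counter and the CP-based score as stand-ins; the class `CooccurrenceGraph` and the function `simulate` are hypothetical names. The sketch also assumes that the replayed users are those held out of the initial graph, and that the graph is updated with the user's full basket after each recommendation.

```python
import random
from collections import Counter
from itertools import permutations

class CooccurrenceGraph:
    """Stand-in for the item-graph (an assumption, not the talk's model):
    node[i] counts baskets containing i, edge[(i, j)] counts baskets containing both."""

    def __init__(self):
        self.node = Counter()
        self.edge = Counter()

    def add_basket(self, basket):
        """Incremental update with one user's basket."""
        for i in basket:
            self.node[i] += 1
        for i, j in permutations(basket, 2):
            self.edge[(i, j)] += 1

    def rank(self, basket):
        """Rank items outside the basket by the CP-based score (placeholder for GCP)."""
        scores = {}
        for b in basket:
            if self.node[b] == 0:          # item never seen before: no evidence
                continue
            for x in self.node:
                co = self.edge[(b, x)]
                if x not in basket and co:
                    scores[x] = scores.get(x, 0.0) + co / self.node[b]
        return sorted(scores, key=lambda x: (-scores[x], x))

def simulate(train_baskets, test_baskets, n, seed=0):
    rng = random.Random(seed)
    graph = CooccurrenceGraph()
    for basket in train_baskets:                   # STEP 0: build the graph
        graph.add_basket(basket)
    hits, reciprocal_sum = 0, 0.0
    for basket in test_baskets:                    # STEP 1: replay each held-out user
        hidden = rng.choice(sorted(basket))
        recs = graph.rank(set(basket) - {hidden})[:n]
        if hidden in recs:
            hits += 1
            reciprocal_sum += 1.0 / (recs.index(hidden) + 1)
        graph.add_basket(basket)                   # assumed: update with the full basket
    users = len(test_baskets)
    return hits / users, reciprocal_sum / users    # STEP 2: HR and ARHR

train = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
test = [{"a", "b"}]
print(simulate(train, test, n=2))
```

Swapping in the actual GCP score would only require replacing the body of `rank`; the surrounding loop follows STEP 0 to STEP 2 as listed on the slide.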
15
4. Preliminary Experimental Results-2
Performance of Top-N Recommendation Algorithms
[Figure: two plots. HR (left): x-axis: top-N items, y-axis: hit-rate of all users. ARHR (right): x-axis: top-N items, y-axis: average reciprocal hit-rate of all users.]

These are the preliminary results for the cosine-based, CP-based, and GCP-based methods. The x-axis represents N, and the y-axis represents the HR and ARHR values of the methods, where N varies from 10 to 100 in steps of 10. We can see that on both the HR and ARHR measures, the GCP-based method outperforms the other two methods.
16
5. Conclusion and Future Work
Top-N Recommendation Problem and item-centric Algorithms
  Cosine-based, conditional probability-based
Item-Graph model
  Visualizing the relationship among items.
  Easy to update.
Generalized Conditional Probability-based top-N recommendation algorithm
  Item-centric and based on the Item-Graph model
Future Work
  Clustering items and measuring item-item similarities based on the Item-Graph model
  Speeding up the GCP method.

Here's the conclusion. First, I briefly introduced the top-N recommendation problem and two item-centric algorithms: the cosine-based and the conditional probability-based algorithms. After that, I presented the Item-Graph model and the GCP-based recommendation algorithm. The Item-Graph model is proposed to visualize the relationships among items, and it is easy to update. The Generalized Conditional Probability-based top-N method is item-centric and based on the Item-Graph model. Our next work will focus on clustering items and measuring item-item similarities based on the Item-Graph model. Improving the efficiency of the GCP method is another task for the future.
17
References
[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
[Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1): , 2004.
[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
[Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages , 1994.

These are some useful references. Thank you!