Published by Noreen Evans. Modified over 8 years ago.
1
Top-N Recommendation Algorithm Based on Item-Graph
Allen, Zhenjiang LIN
CSE, CUHK
June 7, 2007
2
Outline
1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
   Item-Graph Model
   Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work
3
1. Top-N Recommendation Problem
The Top-N Recommendation Problem
  Given the preference information of users, recommend a set of N items that a given user might be interested in, based on the items that user has already selected.
  E-commerce system example: Amazon.com, customers vs. products.
User-Item matrix:
              Item 1   Item 2   Item 3   …   Item m
  User 1        1        0        1      …     0
  User 2        1        1        0      …     0
  …
  User n        0        1        0      …     1
  New User      1        ?        1      ?     ?
The new user is the active user; the items that user has already selected form the basket.
4
Example: Amazon.com
[Figure: an Amazon.com page, labelled with the active user, the basket, and the recommendations.]
5
1. Top-N Recommendation Problem
Challenges in E-commerce Systems
  Huge amounts of data: millions of users and/or items.
  The result set must be returned in real time.
  Limited preference information for new users.
  Volatile user preference information.
Contributions
  Propose the Item-Graph model: simple and incremental, reflecting the relationships among items.
  Develop the Generalized Conditional Probability-based top-N recommendation algorithm: item-centric and based on the Item-Graph model.
6
2. Top-N Recommendation Algorithm
Two main paradigms
Content-based: recommend items based on the content (textual information) of items.
  Examples: Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
Collaborative Filtering (CF): recommend items by collecting taste information from other users.
  Collaboration between users (link information).
  More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items.
  Examples: Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].
7
2. Top-N Recommendation Algorithm
CF algorithms classified by how they use the data
Memory-based: make recommendations based on the entire collection of user preferences.
  No pre-computation is needed, but these methods suffer from serious scalability problems.
  E.g., Correlation-based [Resnick94], Cosine-based [Breese98].
Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations.
  The model is built off-line, so these methods are more scalable.
  E.g., cluster models [Ungar98], Bayesian network model [Breese98], Association Rule Mining approach [Lin00].
8
2. Top-N Recommendation Algorithm
CF algorithms classified by the objects they compare
User-centric: look for similar (like-minded) users first, then make recommendations.
  Similarity between users is relatively dynamic, so a pre-computed user neighborhood may lead to poor predictions.
Item-centric: look for similar (or related) items first, then make recommendations.
  Similarity between items is relatively static, which enables pre-computing item-item similarity and is therefore more scalable.
The aim of our work
  A model-based, item-centric CF top-N recommendation algorithm.
9
2. Top-N Recommendation Algorithm
Notations
  Item set $I = \{I_1, I_2, \ldots, I_m\}$.
  User set $U = \{U_1, U_2, \ldots, U_n\}$.
  User-Item matrix $D = (D_{n,m})$.
  Basket of the active user $B \subseteq I$.
  Similarity score of x and y: sim(x, y).
Formal definition of the top-N recommendation problem
  Given a user-item matrix D and a set of items B that have been purchased by the active user, identify an ordered set of items X such that $|X| \le N$ and $X \cap B = \emptyset$.
10
2. Top-N Recommendation Algorithm
Two classical item-item similarity measures
Cosine-based (symmetric):
  $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$ (1)
Conditional Probability (CP)-based (asymmetric):
  $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) \approx \mathrm{Freq}(I_i I_j) / \mathrm{Freq}(I_i)$ (2)
  Freq(X): the number of customers who have purchased the item set X.
The ranking score for item x:
  $RS(x) = \sum_{b \in B} \mathrm{sim}(b, x)$ (3)
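The two similarity measures and the ranking score of Eqs. (1) to (3) can be sketched in a few lines. This is only an illustration: the function names (`cosine_sim`, `cp_sim`, `top_n`) and the 0-based item indices are assumptions made here, and the toy matrix reuses the three user rows shown on the User-Item matrix slide.

```python
import numpy as np

# Toy binary user-item matrix (rows: users, columns: items).
D = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
])

def cosine_sim(D, i, j):
    """Eq. (1): cosine between item columns i and j (symmetric)."""
    a, b = D[:, i], D[:, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def cp_sim(D, i, j):
    """Eq. (2): P(I_j | I_i) ~ Freq(I_i I_j) / Freq(I_i) (asymmetric)."""
    freq_i = int(D[:, i].sum())
    freq_ij = int((D[:, i] * D[:, j]).sum())
    return freq_ij / freq_i if freq_i else 0.0

def top_n(D, basket, n, sim):
    """Eq. (3): rank items outside the basket by summed similarity."""
    candidates = [x for x in range(D.shape[1]) if x not in basket]
    scores = {x: sum(sim(D, b, x) for b in basket) for x in candidates}
    ranked = sorted(candidates, key=lambda x: scores[x], reverse=True)
    return ranked[:n], scores

# Active user with items 0 and 2 in the basket.
recommended, scores = top_n(D, {0, 2}, 1, cp_sim)
```

The asymmetry of the CP-based measure shows up directly: `cp_sim(D, 0, 2)` and `cp_sim(D, 2, 0)` differ because the two items have different purchase counts.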
11
3. Item-Graph Model & GCP-based Method
Intuitions behind the Item-Graph
  The similarity between two items is proportional to the number of times they are co-purchased.
  The similarity of item pairs is transmissible.
  E.g. (figure): items a, b, c with edge weights 2 and 1.
Definition of the Item-Graph
  Given a dataset $D = (D_{n,m})$, the Item-Graph is a weighted, undirected graph G(V, E, W), where V is the item set I.
  An edge (x, y) ∈ E if and only if items x and y have been co-purchased.
  The weight of edge (x, y) is the number of co-purchases of items x and y.
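One way to read the definition: for a binary user-item matrix D, the co-purchase count of every item pair is an off-diagonal entry of D^T D. The sketch below builds the weight matrix that way. The function name `item_graph_from_matrix` is hypothetical, and the toy data assumes the figure is read as a chain with a-b weight 2 and b-c weight 1.

```python
import numpy as np

def item_graph_from_matrix(D):
    """Return W where W[x, y] is the number of users who bought both x and y.

    An edge (x, y) exists exactly when W[x, y] > 0.
    Assumes D is a binary (0/1) user-item matrix.
    """
    D = np.asarray(D)
    W = D.T @ D              # co-purchase counts for every item pair
    np.fill_diagonal(W, 0)   # the diagonal holds single-item counts, not edges
    return W

# Items a, b, c as columns 0, 1, 2; two users bought {a, b}, one bought {b, c}.
D = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]
W = item_graph_from_matrix(D)
```

The result is symmetric, matching the undirected graph in the definition, and the missing a-c edge appears as a zero weight.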
12
3. Item-Graph Model & GCP-based Method
Updating the Item-Graph is easy
  Adding a new user’s preference information T to the graph needs $O(|T|^2)$ operations, including adding edges and/or increasing the weights of edges.
  E.g. (figure): the graph on items a, b, c with edge weights 2 and 1 becomes, after adding the basket (a, b, c), a graph with edge weights 3, 2 and 1.
Potential direct applications of the Item-Graph
  Clustering the items.
  Measuring item-item similarity.
  Measuring the importance of items.
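The incremental update can be sketched with a dictionary keyed by unordered item pairs. The representation and the name `add_basket` are assumptions for illustration; the starting weights assume the figure shows a-b with weight 2 and b-c with weight 1.

```python
from itertools import combinations

def add_basket(graph, basket):
    """Add one user's basket T to the graph in place.

    Every unordered pair in T is touched once, so the cost is O(|T|^2):
    a missing edge is created with weight 1, an existing edge gains 1.
    """
    for x, y in combinations(sorted(set(basket)), 2):
        edge = frozenset((x, y))
        graph[edge] = graph.get(edge, 0) + 1
    return graph

graph = {frozenset(("a", "b")): 2, frozenset(("b", "c")): 1}
add_basket(graph, ["a", "b", "c"])
```

After the update the a-b and b-c weights each grow by one and a new a-c edge appears with weight 1, so no pass over the earlier user records is needed.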
13
3. Item-Graph Model & GCP-based Method
Ideas in the Generalized Conditional Probability-based method
  According to the definition of the top-N recommendation problem, for any $x \in I \setminus B$ we just need to compute the “basket-based” conditional probability $P(x \mid B) = \mathrm{Freq}(xB) / \mathrm{Freq}(B)$.
  However, Freq(xB) or Freq(B) may be zero (no user has purchased all of those items), or too small to be meaningful.
  The CP-based method considers the sum of “1-item”-based conditional probabilities $P(x \mid y)$ instead, where $x \in I \setminus B$ and $y \in B$.
  However, the “multi-item”-based conditional probabilities may also contribute to the recommendation. E.g., suppose the ranking scores of x and y computed by the CP-based method are equal, and we also know $P(x \mid B) > P(y \mid B)$. Which one should be ranked higher, x or y?
14
3. Item-Graph Model & GCP-based Method
The Generalized Conditional Probability (GCP)-based recommendation algorithm
  The ranking score of item x is defined as the sum of all possible “multi-item”-based conditional probabilities, that is,
    $\mathrm{GCP}(x \mid B) = \sum_{S \subseteq B} P(x \mid S) \approx \sum_{S \subseteq B} \mathrm{Freq}(xS) / \mathrm{Freq}(S)$. (4)
  However, the number of subsets of B is $2^{|B|}$, so the full sum is too expensive to compute. Use $\mathrm{GCP}_d(x \mid B)$ instead (d = 2 in the following experiments):
    $\mathrm{GCP}_d(x \mid B) = \sum_{S \subseteq B,\ |S| \le d} P(x \mid S)$. (5)
  Freq(xS) and Freq(S) can be extracted approximately from the Item-Graph.
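A minimal sketch of the truncated score GCP_d follows. Two assumptions are made here that the slides do not settle: the empty subset is skipped, and subsets with Freq(S) = 0 contribute nothing. To keep the example self-contained, frequencies are counted exactly from a small hypothetical list of baskets rather than extracted from the Item-Graph; the names `gcp_score` and `exact_freq` are illustrative only.

```python
from itertools import combinations

def gcp_score(x, basket, freq, d=2):
    """Sum of P(x | S) ~ Freq(xS) / Freq(S) over subsets S of the basket
    with 1 <= |S| <= d. `freq` maps a frozenset of items to a count."""
    score = 0.0
    for size in range(1, d + 1):
        for S in combinations(sorted(basket), size):
            S = frozenset(S)
            freq_s = freq(S)
            if freq_s > 0:                      # skip unsupported subsets
                score += freq(S | {x}) / freq_s
    return score

# Hypothetical purchase records, used only to provide Freq values.
baskets = [{"a", "b"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]

def exact_freq(items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for t in baskets if items <= t)

score_d1 = gcp_score("c", {"a", "b"}, exact_freq, d=1)  # P(c|a) + P(c|b)
score_d2 = gcp_score("c", {"a", "b"}, exact_freq, d=2)  # adds P(c|ab)
```

With d = 1 the sum contains only the single-item terms, so it coincides with the CP-based ranking score of Eq. (3); d = 2 adds the pair terms, at a cost that grows with the number of subsets of size at most d.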
15
3. Item-Graph Model & GCP-based Method
Extracting Freq(A) from the Item-Graph approximately
  For an item set A, the exact Freq(A) may not be obtainable from the Item-Graph, so an approximate value is extracted instead:
    Find the complete sub-graph on A (denoted by CSG(A)) in the Item-Graph; running time $O(|A|^2)$.
    $\mathrm{Freq}(A) \approx$ the minimum edge weight in CSG(A).
  E.g. (figure: items a, b, c with edge weights 3, 2 and 1):
    for A = {a, b}, Freq(A) ≈ 3;
    for B = {a, b, c}, Freq(B) ≈ 1;
    $P(c \mid ab) \approx \mathrm{Freq}(abc) / \mathrm{Freq}(ab) \approx 1/3$.
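The minimum-edge-weight approximation can be sketched as below. The dictionary representation and the name `approx_freq` are assumptions. The slides do not say how Freq of a single item is obtained (the graph as defined carries weights only on edges), so this sketch simply rejects item sets with fewer than two items; a missing edge is treated as weight 0, meaning no complete sub-graph exists. The a-b weight of 3 and the overall minimum of 1 follow the slide; which of the other two edges carries weight 2 is assumed here.

```python
from itertools import combinations

def approx_freq(graph, items):
    """Approximate Freq(A) by the minimum edge weight in the complete
    sub-graph on A. Checks all |A|(|A|-1)/2 pairs, i.e. O(|A|^2)."""
    items = sorted(set(items))
    if len(items) < 2:
        raise ValueError("edge weights only cover sets of two or more items")
    return min(graph.get(frozenset(pair), 0)
               for pair in combinations(items, 2))

graph = {
    frozenset(("a", "b")): 3,
    frozenset(("b", "c")): 2,
    frozenset(("a", "c")): 1,
}

freq_ab = approx_freq(graph, ["a", "b"])
freq_abc = approx_freq(graph, ["a", "b", "c"])
p_c_given_ab = freq_abc / freq_ab
```

The values reproduce the slide's example: Freq({a, b}) ≈ 3, Freq({a, b, c}) ≈ 1 and P(c | ab) ≈ 1/3.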
16
4. Preliminary Experimental Results
Dataset
MovieLens (http://www.grouplens.org/data)
  A web-based movie recommender system.
  Contains multi-valued ratings that indicate how much each user liked a particular movie.
  Each user has rated at least 20 movies.
  We treat each rating as an indication of whether the user has seen the movie (nonzero) or not (zero).
Table 1: The characteristics of the MovieLens dataset
  # of Users   # of Items   Density [1]   Average Basket Size
  943          1682         6.31%         106.04
[1] Density: the percentage of nonzero entries in the user-item matrix.
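The preprocessing and the two statistics in Table 1 can be sketched as follows. The function names and the tiny ratings matrix are made up for illustration; this is not the MovieLens data.

```python
import numpy as np

def binarize(ratings):
    """Treat any nonzero rating as 'seen' (1) and zero as 'not seen' (0)."""
    return (np.asarray(ratings) > 0).astype(int)

def density_and_basket_size(D):
    """Return (fraction of nonzero entries, average number of items per user)."""
    n_users, n_items = D.shape
    nonzero = int(D.sum())
    return nonzero / (n_users * n_items), nonzero / n_users

# Hypothetical multi-valued ratings for 3 users and 4 movies.
ratings = [[5, 0, 3, 0],
           [4, 2, 0, 0],
           [0, 1, 0, 5]]
D = binarize(ratings)
density, avg_basket = density_and_basket_size(D)
```

Since the average basket size equals the density times the number of items, the two figures in Table 1 can be checked against each other: 6.31% of 1682 items is roughly 106.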
17
4. Preliminary Experimental Results-1
Evaluation Design
  Split the dataset into a training set and a test set by randomly selecting one rated movie of each user for the test set and using the remaining rated movies for training.
  Compare the Cosine (COS)-based, CP-based and GCP-based methods, averaged over 10 runs.
Evaluation Metrics
  Hit-Rate (HR):
    HR = (# of hits) / n (6)
  Average Reciprocal Hit-Rate (ARHR):
    $\mathrm{ARHR} = \frac{1}{n} \sum_{i=1}^{h} \frac{1}{p_i}$ (7)
  # of hits: the number of items in the test set that also appear in the top-N lists.
  h is the number of hits, which occurred at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
  n is the number of users; each user contributes one test item.
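Both metrics of Eqs. (6) and (7) can be computed in one pass over the users. The sketch assumes one held-out item per user, as in the evaluation design; the function name and the toy lists are hypothetical.

```python
def hit_rate_metrics(top_n_lists, held_out):
    """Return (HR, ARHR).

    top_n_lists: user -> ranked list of recommended items (best first)
    held_out:    user -> the single test item withheld for that user
    """
    n = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for user, target in held_out.items():
        ranked = top_n_lists.get(user, [])
        if target in ranked:
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(target) + 1)  # 1-based position
    return hits / n, reciprocal_sum / n

top_n_lists = {
    "u1": ["m7", "m2", "m9", "m4"],   # hit at position 1
    "u2": ["m1", "m5", "m8", "m3"],   # hit at position 4
    "u3": ["m6", "m2", "m1", "m9"],   # miss
}
held_out = {"u1": "m7", "u2": "m3", "u3": "m5"}
hr, arhr = hit_rate_metrics(top_n_lists, held_out)
```

HR counts both hits equally, while ARHR rewards the hit at position 1 four times as much as the hit at position 4, so it is the more sensitive of the two to ranking quality inside the top-N list.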
18
4. Preliminary Experimental Results-1
Performance of Top-N Recommendation Algorithms
[Figure: two plots. Left: HR; x-axis: top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: top-N items, y-axis: average reciprocal hit-rate over all users.]
(For the GCP-based method, d = 2.)
19
4. Preliminary Experimental Results-2
Testing the Parameter d in the GCP Method
  Test the effect of d (d = 1, 2, 3).
Evaluation: Online Shopping Simulation
  Randomly select part of the user records as the training set; use the remaining user records for testing.
  STEP 0: Construct the item-graph from the training set.
  STEP 1: For each user in the test set:
    randomly remove one item from the user’s basket and make a recommendation based on the remaining items in the basket;
    compute the position of the removed item in the recommendation list;
    update the item-graph.
  STEP 2: Compute the HR and ARHR metrics.
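The simulation loop can be sketched as below, under several assumptions not stated on the slide: the users processed in STEP 1 are those outside the training set, the graph is updated with the user's full basket (including the removed item), and the recommender is passed in as a callable. The `neighbour_weight_recommender` is a deliberately simple stand-in that ranks items by summed edge weight to the basket; it is not the GCP method.

```python
import random
from itertools import combinations

def add_basket(graph, basket):
    """Increase the weight of every item pair in the basket by one."""
    for x, y in combinations(sorted(set(basket)), 2):
        edge = frozenset((x, y))
        graph[edge] = graph.get(edge, 0) + 1

def neighbour_weight_recommender(graph, basket, n):
    """Hypothetical stand-in scorer: sum of edge weights to basket items."""
    scores = {}
    for edge, weight in graph.items():
        x, y = tuple(edge)
        for inside, outside in ((x, y), (y, x)):
            if inside in basket and outside not in basket:
                scores[outside] = scores.get(outside, 0) + weight
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

def simulate(train, stream, recommend, n, seed=0):
    rng = random.Random(seed)
    graph = {}
    for basket in train:                          # STEP 0: build the graph
        add_basket(graph, basket)
    ranks = []
    for basket in stream:                         # STEP 1: one user at a time
        held_out = rng.choice(sorted(basket))
        rest = set(basket) - {held_out}
        ranked = recommend(graph, rest, n)
        ranks.append(ranked.index(held_out) + 1 if held_out in ranked else None)
        add_basket(graph, basket)                 # assumed: full basket is added
    hits = [r for r in ranks if r is not None]    # STEP 2: HR and ARHR
    hr = len(hits) / len(ranks)
    arhr = sum(1.0 / r for r in hits) / len(ranks)
    return hr, arhr, graph

train = [["a", "b"], ["a", "b"], ["a", "b"]]
stream = [["a", "b"]]
hr, arhr, graph = simulate(train, stream, neighbour_weight_recommender, n=5)
```

Passing the recommender as a parameter keeps the loop independent of the scoring method, so the same harness could be run for d = 1, 2, 3 by swapping in different scorers.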
20
4. Preliminary Experimental Results-2
Performance of Top-N Recommendation Algorithms
[Figure: two plots. Left: HR; x-axis: top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: top-N items, y-axis: average reciprocal hit-rate over all users.]
21
5. Conclusion and Future Work
Conclusion
  Top-N recommendation problem and item-centric algorithms: cosine-based, conditional probability-based.
  Item-Graph model: visualizes the relationships among items; easy to update.
  Generalized Conditional Probability-based top-N recommendation algorithm: item-centric and based on the Item-Graph model.
Future Work
  Clustering items and measuring item-item similarities based on the Item-Graph model.
  Speeding up the GCP method.
22
References
[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
[Breese98] J. S. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
[Linden03] G. Linden, B. Smith and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.