
1 Top-N Recommendation Algorithm Based on Item-Graph Allen, Zhenjiang LIN CSE, CUHK June 7, 2007.


1 Top-N Recommendation Algorithm Based on Item-Graph
Allen, Zhenjiang LIN
CSE, CUHK
June 7, 2007

2 Outline
1. Top-N Recommendation Problem
2. Top-N Recommendation Algorithm
3. Item-Graph Model and GCP-based Method
   - Item-Graph Model
   - Generalized Conditional Probability (GCP)-based Recommendation Algorithm
4. Preliminary Experimental Results
5. Conclusion and Future Work

3 1. Top-N Recommendation Problem
The Top-N Recommendation Problem
- Given the preference information of users, recommend to an active user a set of N items that he or she might be interested in, based on the items the user has already selected.
E-commerce system example: Amazon.com, customers vs. products.
User-Item matrix:
            Item 1  Item 2  Item 3    …    Item m
User 1        1       0       1               0
User 2        1       1       0               0
…
User n        0       1       0               1
New User      1       ?       1       ?       ?
The New User is the active user; the items this user has already selected form the basket.

4 Example: Amazon.com
[Figure: Amazon.com page with the Basket, the Active User, and the Recommendations labeled]

5 1. Top-N Recommendation Problem
Challenges in E-commerce Systems
- Huge amounts of data: millions of users and/or items.
- Results must be returned in real time.
- Limited preference information for new users.
- Volatile user preference information.
Contributions
- Propose the Item-Graph model: a simple, incrementally updatable model that reflects the relationships among items.
- Develop the Generalized Conditional Probability-based top-N recommendation algorithm: item-centric and based on the Item-Graph model.

6 2. Top-N Recommendation Algorithm
Two main paradigms
- Content-based: recommend items based on the content (textual information) of items. Examples: Fab system [Balabanovic97], Syskill & Webert system [Pazzani97].
- Collaborative Filtering (CF): recommend items by collecting taste information from other users, i.e., collaboration between users (link information). More popular than content-based recommendation, since in many domains (such as music and restaurants) it is hard to extract useful features from items. Examples: Tapestry system [Goldberg92], Video Recommender [Hill95], Ringo [Shardanand95], GroupLens [Konstan97], Jester system [Goldberg01], Amazon [Linden03].

7 2. Top-N Recommendation Algorithm
CF algorithms classified by how they use the data
- Memory-based: make recommendations based on the entire collection of user preferences. No pre-computation is needed, but these methods suffer from serious scalability problems. E.g., correlation-based [Resnick94], cosine-based [Breese98].
- Model-based: use the collection of user preferences to learn a model, which is then used to make recommendations. The model is built off-line, so these methods are more scalable. E.g., cluster models [Ungar98], Bayesian network model [Breese98], association rule mining approach [Lin00].

8 2. Top-N Recommendation Algorithm
CF algorithms classified by the objects they compare
- User-centric: look for similar (like-minded) users first, then make recommendations. Similarity between users is relatively dynamic, so pre-computing user neighborhoods may lead to poor predictions.
- Item-centric: look for similar (or related) items first, then make recommendations. Similarity between items is relatively static, which enables pre-computing item-item similarity and therefore scales better.
The aim of our work
- A model-based, item-centric CF top-N recommendation algorithm.

9 2. Top-N Recommendation Algorithm
Notations
- Item set $I = \{I_1, I_2, \ldots, I_m\}$.
- User set $U = \{U_1, U_2, \ldots, U_n\}$.
- User-Item matrix $D = (D_{n,m})$.
- Basket of the active user $B \subseteq I$.
- Similarity score of x and y: $\mathrm{sim}(x, y)$.
Formal definition of the top-N recommendation problem
- Given a user-item matrix D and a set of items B that have been purchased by the active user, identify an ordered set of items X such that $|X| \le N$ and $X \cap B = \emptyset$.

10 2. Top-N Recommendation Algorithm
Two classical item-item similarity measures
- Cosine-based (symmetric): $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$  (1)
- Conditional Probability (CP)-based (asymmetric): $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) \approx \mathrm{Freq}(I_i I_j) / \mathrm{Freq}(I_i)$  (2)
  Freq(X): the number of customers who have purchased the item set X.
The ranking score for item x
- $\mathrm{RS}(x) = \sum_{b \in B} \mathrm{sim}(b, x)$  (3)
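The two similarity measures and the ranking score of Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matrix `D` is a made-up toy example, and the function names (`cosine_sim`, `cp_sim`, `freq`, `top_n`) are hypothetical, not taken from the slides.

```python
from math import sqrt

# Hypothetical toy user-item matrix D (rows: users, columns: items); not from the slides.
D = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]

def column(D, j):
    """Return column j of D, i.e. the purchase vector of item j."""
    return [row[j] for row in D]

def cosine_sim(D, i, j):
    """Eq. (1): cosine between the columns of items i and j (symmetric)."""
    ci, cj = column(D, i), column(D, j)
    dot = sum(a * b for a, b in zip(ci, cj))
    norm = sqrt(sum(a * a for a in ci)) * sqrt(sum(b * b for b in cj))
    return dot / norm if norm else 0.0

def freq(D, items):
    """Freq(X): number of users whose row contains every item in X."""
    return sum(1 for row in D if all(row[j] for j in items))

def cp_sim(D, i, j):
    """Eq. (2): P(I_j | I_i) approximated by Freq(I_i I_j) / Freq(I_i) (asymmetric)."""
    fi = freq(D, [i])
    return freq(D, [i, j]) / fi if fi else 0.0

def top_n(D, basket, n, sim):
    """Eq. (3): rank items outside the basket by RS(x) = sum over b in B of sim(b, x)."""
    m = len(D[0])
    scores = {x: sum(sim(D, b, x) for b in basket)
              for x in range(m) if x not in basket}
    # Ties are broken by item index so the ordering is deterministic.
    return sorted(scores, key=lambda x: (-scores[x], x))[:n]
```

On this toy matrix, `cp_sim(D, 0, 2)` and `cp_sim(D, 2, 0)` differ, which illustrates the asymmetry noted on the slide, while `cosine_sim` gives the same value in both directions.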

11 3. Item-Graph Model & GCP-based Method
Intuitions behind the Item-Graph
- The similarity between two items is proportional to the number of times they are co-purchased.
- The similarity of item pairs is transitive.
- E.g., [Figure: example Item-Graph on items a, b, c with edge weights 2 and 1]
Definition of the Item-Graph
- Given a dataset $D = (D_{n,m})$, the Item-Graph is a weighted, undirected graph G(V, E, W), where V is the item set I. An edge $(x, y) \in E$ exists if and only if items x and y have been co-purchased. The weight of edge (x, y) is the number of co-purchases of items x and y.

12 3. Item-Graph Model & GCP-based Method
Updating the Item-Graph is easy
- Adding a new user's preference information T to the graph takes $O(|T|^2)$ operations: adding edges and/or increasing edge weights.
- E.g., [Figure: the Item-Graph on items a, b, c with edge weights 2 and 1; after adding a user with basket (a, b, c), the edge weights become 3, 2, 1]
Potential direct applications of the Item-Graph
- Clustering the items.
- Measuring item-item similarity.
- Measuring the importance of items.
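A minimal sketch of the Item-Graph and its incremental update, assuming the graph is stored as a dictionary of edge weights keyed by unordered item pairs. The class name `ItemGraph`, its methods, and the example baskets are hypothetical choices made for illustration; the slides do not prescribe a data structure.

```python
from collections import defaultdict
from itertools import combinations

class ItemGraph:
    """Weighted, undirected graph: weight(x, y) = number of co-purchases of x and y."""

    def __init__(self):
        # Edge weights keyed by frozenset({x, y}) so that (x, y) and (y, x) coincide.
        self.weight = defaultdict(int)

    def add_basket(self, basket):
        """Add one user's item set T.

        Every unordered pair in T is touched once, i.e. |T|(|T|-1)/2 edge
        insertions or weight increments, which is O(|T|^2).
        """
        for x, y in combinations(sorted(set(basket)), 2):
            self.weight[frozenset((x, y))] += 1

    def w(self, x, y):
        """Weight of edge (x, y); 0 means the items were never co-purchased."""
        return self.weight.get(frozenset((x, y)), 0)

# Hypothetical usage: two users, then a third whose basket adds a new edge.
g = ItemGraph()
g.add_basket(["a", "b"])
g.add_basket(["a", "b", "c"])
```

Building the graph from a whole training set is then just a loop over the users' baskets, and a new user is handled by one more `add_basket` call without recomputing anything else.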

13 3. Item-Graph Model & GCP-based Method
Ideas behind the Generalized Conditional Probability-based method
- According to the definition of the top-N recommendation problem, for any x in $I - B$ we only need to compute the "basket-based" conditional probability $P(x \mid B) = \mathrm{Freq}(xB) / \mathrm{Freq}(B)$. However, Freq(xB) or Freq(B) may not exist, or may be too small to be meaningful.
- The CP-based method instead considers the sum of "1-item"-based conditional probabilities $P(x \mid y)$, where $x \in I - B$, $y \in B$.
- However, the "multi-item"-based conditional probabilities may also contribute to the recommendation.
- E.g., suppose the ranking scores of x and y computed by the CP-based method are equal, and we also know $P(x \mid B) > P(y \mid B)$. Which one should be ranked higher, x or y?

14 3. Item-Graph Model & GCP-based Method
The Generalized Conditional Probability (GCP)-based recommendation algorithm
- The ranking score of item x is defined as the sum of all possible "multi-item"-based conditional probabilities, that is,
  $\mathrm{GCP}(x \mid B) = \sum_{S \subseteq B} P(x \mid S) \approx \sum_{S \subseteq B} \mathrm{Freq}(xS) / \mathrm{Freq}(S)$.  (4)
- However, the number of subsets of B is $2^{|B|}$.
- Use $\mathrm{GCP}_d(x \mid B)$ instead (d = 2 in the following experiments):
  $\mathrm{GCP}_d(x \mid B) = \sum_{S \subseteq B,\ |S| \le d} P(x \mid S)$.  (5)
- Freq(xS) and Freq(S) can be extracted approximately from the Item-Graph.
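As a small worked instance of Eq. (5), take a hypothetical basket B = {a, b, c} and d = 2, and assume the sum runs over non-empty subsets S only (the slide does not state this explicitly):

```latex
\begin{align*}
\mathrm{GCP}_2(x \mid B)
  &= P(x \mid a) + P(x \mid b) + P(x \mid c) \\
  &\quad + P(x \mid ab) + P(x \mid ac) + P(x \mid bc)
\end{align*}
```

The first row is exactly the CP-based ranking score of Eq. (3); the second row adds the 2-item terms. In general, truncating at d = 2 leaves $|B| + \binom{|B|}{2} = |B|(|B|+1)/2$ terms instead of the $2^{|B|} - 1$ non-empty subsets in Eq. (4).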

15 3. Item-Graph Model & GCP-based Method
Extracting Freq(A) approximately from the Item-Graph
- For an item set A, it may not be possible to obtain the exact Freq(A) from the Item-Graph.
- Extract an approximate Freq(A) from the Item-Graph instead:
  - Find the complete sub-graph of A (denoted by CSG(A)) in the Item-Graph; running time $O(|A|^2)$.
  - Freq(A) ≈ minimal weight of the edges in CSG(A).
- E.g., [Figure: Item-Graph on items a, b, c with edge weights 3, 2, 1]
  - for A = {a, b}, Freq(A) ≈ 3.
  - for B = {a, b, c}, Freq(B) ≈ 1.
  - P(c|ab) ≈ Freq(abc) / Freq(ab) ≈ 1 / 3.
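The approximation above, combined with Eq. (5), can be sketched as follows. Several things here are assumptions rather than content of the slides: the edge weights 3, 2, 1 echo the slide's example, but which pair carries 2 and which carries 1 is guessed; the slides do not say how Freq of a single item is obtained, so this sketch keeps a separate, made-up per-item count (`COUNTS`); and the names `approx_freq` and `gcp` are hypothetical.

```python
from itertools import combinations

# Toy Item-Graph. Freq({a, b}) ~ 3 matches the slide; the assignment of the
# weights 2 and 1 to the pairs (b, c) and (a, c) is an assumption.
EDGES = {frozenset("ab"): 3, frozenset("bc"): 2, frozenset("ac"): 1}
# Assumed per-item purchase counts (not on the slides), used when |A| = 1.
COUNTS = {"a": 4, "b": 3, "c": 2}

def approx_freq(items, edges=EDGES, counts=COUNTS):
    """Freq(A) ~ minimal edge weight in the complete sub-graph on A.

    A missing edge counts as weight 0, so item sets that do not form a
    complete sub-graph get frequency 0. Checking all pairs costs O(|A|^2).
    """
    items = set(items)
    if len(items) == 1:
        return counts.get(next(iter(items)), 0)
    return min(edges.get(frozenset(pair), 0) for pair in combinations(items, 2))

def gcp(x, basket, d=2):
    """GCP_d(x | B): sum of Freq(xS) / Freq(S) over non-empty S in B with |S| <= d."""
    total = 0.0
    for size in range(1, min(d, len(basket)) + 1):
        for s in combinations(sorted(basket), size):
            denom = approx_freq(s)
            if denom:  # skip subsets that were never (co-)purchased
                total += approx_freq(set(s) | {x}) / denom
    return total
```

With these toy numbers, `approx_freq("ab")` and `approx_freq("abc")` reproduce the slide's values 3 and 1, and `gcp("c", {"a", "b"}, d=2)` adds the 2-item term P(c|ab) ≈ 1/3 on top of the two 1-item terms used by the CP-based score.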

16 4. Preliminary Experimental Results
Dataset
- MovieLens (http://www.grouplens.org/data)
  - A web-based movie recommender system.
  - Contains multi-valued ratings that indicate how much each user liked a particular movie.
  - Each user has rated at least 20 movies.
  - We treat the ratings as an indication of whether the user has seen the movie (nonzero) or not (zero).
Table 1: The characteristics of the MovieLens dataset
# of Users   # of Items   Density [1]   Average Basket Size
943          1682         6.31%         106.04
[1] Density: the percentage of nonzero entries in the user-item matrix.

17 4. Preliminary Experimental Results-1
Evaluation Design
- Split the dataset into a training set and a test set by randomly selecting one rated movie of each user for the test set and using the remaining rated movies for training.
- Compare the Cosine (COS)-based, CP-based, and GCP-based methods; results are averaged over 10 runs.
Evaluation Metrics
- Hit-Rate (HR): $\mathrm{HR} = \text{\# of hits} / n$  (6)
- Average Reciprocal Hit-Rate (ARHR): $\mathrm{ARHR} = \frac{1}{n} \sum_{i=1}^{h} \frac{1}{p_i}$  (7)
- # of hits: the number of items in the test set that also appear in the top-N lists.
- n is the number of users (each user contributes one test item).
- h is the number of hits, which occurred at positions $p_1, p_2, \ldots, p_h$ within the top-N lists (i.e., $1 \le p_i \le N$).
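Both metrics of Eqs. (6) and (7) can be computed in one pass over the test users. The sketch below assumes one held-out item per user, as in the evaluation design above; the function name `hit_rate_and_arhr` and the dictionary-based inputs are hypothetical.

```python
def hit_rate_and_arhr(top_n_lists, held_out):
    """Compute HR (Eq. 6) and ARHR (Eq. 7).

    top_n_lists[u]: ranked top-N recommendation list for user u.
    held_out[u]:    the single test item withheld from user u.
    """
    n = len(held_out)
    hits, reciprocal_sum = 0, 0.0
    for user, item in held_out.items():
        ranked = top_n_lists.get(user, [])
        if item in ranked:
            hits += 1
            # Positions are 1-based: a hit at the top of the list contributes 1.
            reciprocal_sum += 1.0 / (ranked.index(item) + 1)
    return hits / n, reciprocal_sum / n
```

HR treats every hit equally, whereas ARHR rewards hits near the top of the list: a hit at position 1 contributes 1, a hit at position 2 only 1/2.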

18 4. Preliminary Experimental Results-1
Performance of Top-N Recommendation Algorithms
[Figure: two plots. Left: HR; x-axis: top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: top-N items, y-axis: average reciprocal hit-rate over all users. For the GCP-based method, d = 2.]

19 4. Preliminary Experimental Results-2
Testing the Parameter d in the GCP Method
- Testing the effect of d (d = 1, 2, 3).
Evaluation: Online Shopping Simulation
- Randomly select part of the user records to be the training set.
- Use the remaining user records for testing.
- STEP 0: Construct the item-graph from the training set.
- STEP 1: For each user in the test set, randomly remove one item from the user's basket and make a recommendation based on the remaining items in the basket; compute the position of the removed item in the recommendation list; update the item-graph.
- STEP 2: Compute the HR and ARHR metrics.
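The simulation loop can be sketched independently of the recommender. In the sketch below, `recommend` and `update` are hypothetical callbacks standing in for a top-N method and for the item-graph update, and `simulate_online` is an illustrative name; the slides do not specify an interface.

```python
import random

def simulate_online(baskets, recommend, update, n, seed=0):
    """Replay user baskets one at a time, as in STEP 1.

    For each basket: hold out one random item, ask `recommend(remaining, n)`
    for a ranked list, record the 1-based position of the held-out item
    (None on a miss), then pass the full basket to `update`.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    positions = []
    for basket in baskets:
        items = sorted(basket)
        held = rng.choice(items)
        remaining = [item for item in items if item != held]
        ranked = recommend(remaining, n)
        positions.append(ranked.index(held) + 1 if held in ranked else None)
        update(basket)
    return positions
```

From the returned positions, HR is the fraction of entries that are not `None`, and ARHR is the sum of the reciprocals of the recorded positions divided by the number of simulated users.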

20 4. Preliminary Experimental Results-2
Performance of Top-N Recommendation Algorithms
[Figure: two plots. Left: HR; x-axis: top-N items, y-axis: hit-rate over all users. Right: ARHR; x-axis: top-N items, y-axis: average reciprocal hit-rate over all users.]

21 5. Conclusion and Future Work
Conclusion
- Top-N recommendation problem and item-centric algorithms: cosine-based, conditional probability-based.
- Item-Graph model: visualizes the relationships among items; easy to update.
- Generalized Conditional Probability-based top-N recommendation algorithm: item-centric and based on the Item-Graph model.
Future Work
- Clustering items and measuring item-item similarities based on the Item-Graph model.
- Speeding up the GCP method.

22 References
[Balabanovic97] M. Balabanovic and Y. Shoham. Fab: Content-based, Collaborative Recommendation. Commun. ACM, 40(3):66-72, 1997.
[Breese98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43-52, San Francisco, 1998.
[Deshpande04] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[Lin00] W. Lin. Association Rule Mining for Collaborative Recommender Systems. Thesis submitted for the Degree of M.S. in Computer Science.
[Linden03] G. Linden, B. Smith, and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[Resnick94] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proc. Computer Supported Cooperative Work Conf., pages 175-186, 1994.

