Page 1 A Random Walk Method for Alleviating the Sparsity Problem in Collaborative Filtering Hilmi Yıldırım and Mukkai S. Krishnamoorthy Rensselaer Polytechnic Institute Computer Science Department Troy, New York 2011 Spring Seminar Presented by Sangkeun Lee
Page 2 Introduction
Collaborative Filtering (CF)
–Recommendation methods that rely on the past behavior (ratings, purchase history, time spent) of the users.
–User-based CF (finding similar users) vs. item-based CF (finding similar items)
Item-oriented collaborative filtering methods came into prominence
–They are more scalable than user-oriented methods (similarity between items is more stable than similarity between users)
–They mitigate the user cold-start problem (new user problem)
In this paper, the authors propose a novel item-oriented algorithm
–Based on a finite number of random-walk steps on the item similarity graph
–Especially useful when training data is less than plentiful
–Enhances similarity matrices under sparse data
Page 3 Historical Review – User-based CF
Neighbours are people who have similar tastes to the active user
Aggregation function: often a weighted sum
Weight depends on similarity
Reference: lecture slide from ‘
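A minimal sketch of the weighted-sum aggregation for user-based CF, assuming ratings are stored as nested dictionaries and neighbour similarities are already computed. The function name `predict_user_based` and the data layout are illustrative choices, not taken from the slides.

```python
def predict_user_based(ratings, sims, active, item):
    """Predict the active user's rating of `item` as a similarity-weighted
    average of the neighbours' ratings.

    ratings: dict user -> dict item -> rating (hypothetical layout)
    sims:    dict user -> similarity to the active user (assumed precomputed)
    """
    numerator = 0.0
    denominator = 0.0
    for user, sim in sims.items():
        # Skip the active user and neighbours who have not rated the item.
        if user == active or item not in ratings.get(user, {}):
            continue
        numerator += sim * ratings[user][item]
        denominator += abs(sim)
    # No usable neighbour: no prediction.
    return numerator / denominator if denominator else None


ratings = {"a": {"x": 5}, "b": {"x": 1}, "me": {}}
sims = {"a": 0.75, "b": 0.25}
prediction = predict_user_based(ratings, sims, "me", "x")
```

With these toy numbers the more similar neighbour "a" dominates, so the prediction lands closer to 5 than to 1.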
Page 4 Historical Review – Item-based CF
Aggregation function: often a weighted sum
Weight depends on similarity
[Figure: user-item rating matrix, users (… User m) by items (Item 1 to Item 9), with sample ratings 5, 3, 2, 1, 4 and an unknown rating "?" to be predicted; Items 5, 4, 2 and 1 are called out]
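The item-based variant can be sketched the same way: the unknown rating is a weighted sum over the items the user has already rated, weighted by their similarity to the target item. The name `predict_item_based` and the dictionary layout are assumptions for illustration.

```python
def predict_item_based(user_ratings, item_sims, target):
    """Predict one user's rating of `target` from that user's own ratings.

    user_ratings: dict item -> rating given by this user
    item_sims:    dict target item -> dict other item -> similarity
                  (assumed precomputed, e.g. by cosine similarity)
    """
    numerator = 0.0
    denominator = 0.0
    for item, sim in item_sims.get(target, {}).items():
        # Only items this user has actually rated contribute.
        if item not in user_ratings:
            continue
        numerator += sim * user_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


user_ratings = {"item1": 4, "item2": 2}
item_sims = {"item5": {"item1": 0.5, "item2": 0.5, "item9": 0.9}}
prediction = predict_item_based(user_ratings, item_sims, "item5")
```

Here "item9" is ignored because the user has not rated it, and the two rated items contribute equally.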
Page 5 PageRank & A New Approach
Page 6 The Model
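Page 7 The Model
① Construct the matrix P whose entries give the probability of moving from node i to node j
② Compute the probability that user u is at item j at step k
③ Combine these to compute the overall probability that user u is at item j
④ The final item ranks are expressed as a simple matrix product
Note that various similarity measures can be used
Similar Item? Or uniform distribution?
Scale Rank to Ratings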
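A rough sketch of the four steps on this slide, using plain Python lists. The slide does not give the formulas, so several details are assumptions: P is obtained by row-normalizing an item similarity matrix, the walk starts from the user's rating vector, and step k is weighted by α^k. The names `row_normalize`, `vec_mat` and `random_walk_scores` are hypothetical.

```python
def row_normalize(S):
    """Step 1: turn a similarity matrix into transition probabilities."""
    P = []
    for row in S:
        total = sum(row)
        P.append([v / total if total else 0.0 for v in row])
    return P


def vec_mat(v, M):
    """Multiply a row vector by a matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def random_walk_scores(user_ratings, S, alpha=0.5, steps=3):
    """Steps 2 to 4: walk a finite number of steps and aggregate the states.

    Assumption: the contribution of step k is weighted by alpha**k.
    """
    P = row_normalize(S)
    state = list(user_ratings)      # assumed starting distribution
    scores = [0.0] * len(state)
    weight = 1.0
    for _ in range(steps):
        state = vec_mat(state, P)   # where the walker is after one more step
        weight *= alpha
        scores = [s + weight * x for s, x in zip(scores, state)]
    return scores


S = [[0.0, 1.0], [1.0, 0.0]]        # two items that are similar only to each other
scores = random_walk_scores([1.0, 0.0], S, alpha=0.5, steps=2)
```

In this toy graph the walker alternates between the two items, so the unrated item collects the first-step weight and the rated item collects the second-step weight.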
Page 8 More about the model
Computing Similarities
–Cosine Similarity
–Adjusted Cosine Similarity
Interpreting Rank Scores
–Basically the score is for top-K recommendation
–But for rating prediction, the authors linearly scaled up each row of values so that the maximum of each row corresponds to 5. I doubt it!
Computational Cost
–Computing the similarity matrix is O(m^2 n)
–Vector-matrix multiplication has complexity O(m^2). I doubt it, too: the cost of computing the matrix inverse is not considered
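The two similarity measures and the row scaling can be sketched as follows. The adjusted cosine here follows the common textbook definition (subtract each user's mean rating and use only co-rating users); whether the paper uses exactly this form is not stated on the slide. Function names and the `None`-for-missing convention are assumptions.

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two item rating columns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjusted_cosine(col_i, col_j, user_means):
    """Cosine after subtracting each user's mean rating.

    Entries are None when the user has not rated the item; only users
    who rated both items are used.
    """
    pairs = [(ri - m, rj - m)
             for ri, rj, m in zip(col_i, col_j, user_means)
             if ri is not None and rj is not None]
    dot = sum(x * y for x, y in pairs)
    norm_i = math.sqrt(sum(x * x for x, _ in pairs))
    norm_j = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def scale_row_to_max(row, top=5.0):
    """Linearly scale one user's rank scores so the largest becomes `top`."""
    largest = max(row)
    return [top * v / largest for v in row] if largest > 0 else list(row)


scaled = scale_row_to_max([1.0, 2.0, 4.0])
```

The scaling makes the presenter's doubt concrete: every user's best-ranked item is mapped to a rating of 5, regardless of how strongly it was actually recommended.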
Page 9 Experiments
MovieLens
–This data set contains 1,000,209 ratings from 6,040 anonymous MovieLens users on 3,952 movies
Page 10 Discussion
Summary
–Presented and experimentally evaluated a model-based item-oriented collaborative filtering algorithm.
–Outperforms a slightly modified version of the item-based top-N algorithm in all test cases, since top-N is a special case of the Random Walk Recommender.
–Better than the top-N algorithm, especially when training data is sparse.
–For extremely sparse data sets the optimal α value approaches 1, whereas it approaches 0 as the data gets denser.
–Random Walk Recommender captures some transitive associations between items.
Questions?
–A few doubts about the paper: time complexity, linear scaling up?
–Interesting application of random walk for recommendation
–RWWR vs. finite steps of random walk