Google News Personalization: Scalable Online Collaborative Filtering

Slides:



Advertisements
Similar presentations
Recommender Systems & Collaborative Filtering
Advertisements

Google News Personalization: Scalable Online Collaborative Filtering
Google News Personalization Scalable Online Collaborative Filtering
Algorithms of Google News An Analysis of Google News Personalization Scalable Online Collaborative Filtering 1.
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Item Based Collaborative Filtering Recommendation Algorithms
Suleyman Cetintas 1, Monica Rogati 2, Luo Si 1, Yi Fang 1 Identifying Similar People in Professional Social Networks with Discriminative Probabilistic.
Introduction to Information Retrieval
Collaborative QoS Prediction in Cloud Computing Department of Computer Science & Engineering The Chinese University of Hong Kong Hong Kong, China Rocky.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
Oct 14, 2014 Lirong Xia Recommender systems acknowledgment: Li Zhang, UCSC.
Recommender Systems Aalap Kohojkar Yang Liu Zhan Shi March 31, 2008.
Rubi’s Motivation for CF  Find a PhD problem  Find “real life” PhD problem  Find an interesting PhD problem  Make Money!
Memory-Based Recommender Systems : A Comparative Study Aaron John Mani Srinivasan Ramani CSCI 572 PROJECT RECOMPARATOR.
Probability based Recommendation System Course : ECE541 Chetan Tonde Vrajesh Vyas Ashwin Revo Under the guidance of Prof. R. D. Yates.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
Recommender systems Ram Akella November 26 th 2008.
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
Group Recommendations with Rank Aggregation and Collaborative Filtering Linas Baltrunas, Tadas Makcinskas, Francesco Ricci Free University of Bozen-Bolzano.
1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Google News Personalization: Scalable Online Collaborative Filtering
Learning Geographical Preferences for Point-of-Interest Recommendation Author(s): Bin Liu Yanjie Fu, Zijun Yao, Hui Xiong [KDD-2013]
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
EigenRank: A ranking oriented approach to collaborative filtering By Nathan N. Liu and Qiang Yang Presented by Zachary 1.
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Google News Personalization Big Data reading group November 12, 2007 Presented by Babu Pillai.
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
Collaborative Filtering - Pooja Hegde. The Problem : OVERLOAD Too much stuff!!!! Too many books! Too many journals! Too many movies! Too much content!
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Author(s): Rahul Sami, 2009 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Noncommercial.
Book web site:
Announcements Paper presentation Project meet with me ASAP
Topic Modeling for Short Texts with Auxiliary Word Embeddings
Queensland University of Technology
Data Mining: Concepts and Techniques
Recommender Systems & Collaborative Filtering
Recommending Forum Posts to Designated Experts
WSRec: A Collaborative Filtering Based Web Service Recommender System
Methods and Metrics for Cold-Start Recommendations
Machine Learning With Python Sreejith.S Jaganadh.G.
Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie1, Shenglin Zhao2, Zibin Zheng3, Jieming Zhu2 and Michael.
Author(s): Rahul Sami, 2009 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Noncommercial.
Adopted from Bin UIC Recommender Systems Adopted from Bin UIC.
Recommender Systems.
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Collaborative Filtering Nearest Neighbor Approach
M.Sc. Project Doron Harlev Supervisor: Dr. Dana Ron
Q4 : How does Netflix recommend movies?
Movie Recommendation System
Probabilistic Latent Preference Analysis
Collaborative Filtering Non-negative Matrix Factorization
A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 22, Feb, 2010 Department of Computer.
Recommendation Systems
Recommender Systems Group 6 Javier Velasco Anusha Sama
WSExpress: A QoS-Aware Search Engine for Web Services
GhostLink: Latent Network Inference for Influence-aware Recommendation
Presentation transcript:

Google News Personalization: Scalable Online Collaborative Filtering Abhinandan Das, Mayur Datar, Ashutosh Garg, Shyam Rajaram Google Inc, University of Illinois at Urbana Paper Review By Archana Bhattarai Introduction to Data Mining Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Outline Background Introduction Motivation Method System Algorithms Result Conclusion Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Paper: Introduction As the topic suggests, this paper talks about a special case of a “Recommender System” specific to Google News scenario for generating personalized recommendations for users of Google News. The basic research problem that is addressed by this paper is the challenge of matching the right content to the right user. Based on user profile, the system recommends top K stories that user might be interested in. Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Background Information overflow with the advent of technologies like Internet. People are drowning in data pool without getting right information they want. Challenge: To find right information. Right Information: Something that will answer users’ query. Something that user would love to read, listen or see. Solution: Search Engines Solve the first requirement What if user does not know what to look for ? Google News Personalization: Scalable Online Collaborative Filtering

Introduction: Collaborative Filtering It is a technology that aims to learn user preferences and make recommendations based on user and community data. Example: Amazon: User’s past shopping history is used to make recommendations for new products. Netflix, movie recommender Recommendations for clubs, cosmetics, travel locations. Personalized Google News Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Motivation Google News is visited by several millions in a period of few days. There are lots of articles being created each day. Scalability is a big issue for such personalized system. Moreover, since it is a news based system, the items cannot be static as the articles are changing very fast. Existing recommender system thus unsuitable for such need. Need for a novel scalable algorithm. Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Google News System Google news will record the search queries and clicks on news stories. Makes previously read articles easily accessible. Recommends top stories based on past click history. Recommendations based on: Click history. Click history of the community. User’s click on an article is treated as positive vote. Could be noisy No negative votes Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Problem statement Given a click history of N users, U = {u1, u2, u3, u4, u5………….’ uN } And M items S = {s1, s2, ………….’ sM } User u with click history set Cu consisting of stories {si1, si2, ………….’ sCu } System is to recommend K stories that user might be interested in. Incorporate user feedback instantly. Google News Personalization: Scalable Online Collaborative Filtering

Related Work :Architectures and algorithm Algorithms Memory-based algorithms Predictions made based on past ratings of the user. Weighted average of ratings given by other users Weight is the similarity of users ( Pearson correlation coefficient, cosine similarity) Model-based algorithms A model of the user developed based on their past ratings. Use the models to predict unseen items.(Bayesian, clustering etc) Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Proposed System Mixture of Model based algorithms Probabilistic Latent Semantic Indexing MinHash Memory based algorithms Item co-visitation The scores given by each algorithm is combined as ∑wa rs where wa is the weight given to algorithm ‘a’ and rs is its rank. Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Algorithms MinHash A probabilistic clustering method that assigns a pair of users to the same cluster with probability proportional to the overlap between the set of items that these users have voted for. User U is represented by a set of items that she has clicked, Cu. The similarity between their item-sets is given be : S(ui, uj) = | Cui, ∩ Cuj | (Jaccard Coeffient) | Cui U Cuj | Similarity of a user with all other users can be calculated. Not scalable in real time Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering MinHash: Example User u1 clicks on the items: S1, S2, S5, S6, S9 Similarly, user u2 clicks on the items: S1, S2, S3, S4, S5 Jaccard Coefficient : 3/7 S1, S2, S5 S3, S4 S6, S9 User:U2 User: U1 Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Algorithms Min-Hashing Each hash bucket corresponds to a cluster, that puts two users together in the same cluster with probability equal to their item-set overlap similarity S( ui, uj ). Randomly permute a set of items(S) and for each user uu, compute its hash value h(u) as the index of the first item under the permutation that belongs to the user’s item set Cu For a random permutation, chosen uniformly over the set of all permutations over S, the probability of two users having same hash value is Jaccard coefficient. MapReduce is used for MinHash clustering over large clusters of machines. MapReduce is a simple model of computation over large clusters of machines. Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Algorithms Probabilistic Latent Semantic Indexing[PLSI] With users U and items S, the relationship between users and items is learned by modeling the joint distribution of users and items as a mixture distribution. A hidden variable Z is introduced to capture this relationship, which can be thought of as representing user communities(like minded users) and item communities(like items) Mathematically, P(s/u) = ∑ Lz=1 p(z/u) p(s/z) like users like items The conditional probabilities p(z/u) and p(s/z) are learned from the training data using Expectation maximization algorithm. Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering PLSI: Concept User/ News S1 S2 S3 S4 S5 S6 U1 C11 C12 C13 C14 C15 C16 U2 C21 C22 C23 C24 C25 C26 U3 C31 C32 C33 C34 C35 C36 S1 … ….. …. S2 S3 S4 S5 S6 Decompose Matrix as, C = UZS New term ‘Z’ is introduced. Matrix decomposed using Singular Value decomposition User/ News U1 .. … U2 U3 *Z* is a diagonal matrix Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Algorithms Co-visitation An event in which two stories are clicked by the same user within a certain time interval. For a user u, covisitation based recommendation score is generated for a candidate item s For every item si in the user’s click history, a lookup for the entry pair si, s is gotten. The value stored in the entry is added and then normalized by the sum of all entries for si. S1 S2 Sn ……………………………. Time period Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Data stored User Table: Cluster information (MinHash and PLSI) Click history Story Table: Cluster Statistics: How many times was the story S clicked on by users from each cluster C. Co-visitation: How many times was story S co-visited with each story S’ Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering System Components NFE: News Front End NPS: News Personalization Server NSS: News Statistics Server UT: User Table ST: Story Table Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Evaluation Results Google News Personalization: Scalable Online Collaborative Filtering

Conclusion and Future Work Algorithms for scalable real time recommendation engines presented. Presented a novel approach to cluster dynamic datasets using MinHash and PLSI. Scalability and quality have a tradeoff. The system is content independent and thus easily extendible to other domains. As a future work, suitable algorithm can be explored to determine how to combine scores from different algorithms. Google News Personalization: Scalable Online Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering Analysis The paper has successfully addressed the problem of scalability for large recommender systems. It has only looked at the content independent features of articles. Thus the content dependent features are out of scope for the paper. Evaluation based on content could be an open research problem. It can be argued that instead of only considering user click for clustering similar users, content based clustering of the stories could open up more similarity metrics for the recommendation system. The precision lies around 30% for the current system showing that more study needs to be done in the field. Google News Personalization: Scalable Online Collaborative Filtering

Thank You!!! Any questions ? Google News Personalization: Scalable Online Collaborative Filtering