Google News Personalization Scalable Online Collaborative Filtering

Presentation transcript:

Google News Personalization: Scalable Online Collaborative Filtering
Abhinandan Das - abhinandan@google.com
Mayur Datar - mayur@google.com
Ashutosh Garg - ashutosh@google.com
Shyam Rajaram - rajaram1@ifp.uiuc.edu
Presented by: Aniket Zamwar - zamwar@usc.edu

Already Studied in Class
MapReduce
Collaborative Filtering
Content-Based Recommendation
Clustering Techniques - Pros/Cons

Problem Statement
The scale of operation is very large: on the order of several million news stories, changing dynamically at a high rate.
Given the click history for N users ( U = {u1, u2, ..., uN} ) over M items ( S = {s1, s2, ..., sM} ), and a specific user u whose click history set Cu consists of the stories {si1, ..., si|Cu|}, recommend K stories to that user.
The recommendation engine must generate recommendations under strict timing constraints.
4/18/13 Google News Personalization

Approaches
Collaborative Clustering
Probabilistic Latent Semantic Indexing
Covisitation Counts

Problem Setting
Record user queries and clicks.
Recommend news using the user's own click history and the click history of the community.

Recommender System
Content-based systems
Collaborative filtering systems
  Memory-based algorithms: the prediction is calculated as a weighted average of the ratings given by other users, where each weight is proportional to the "similarity" between users.
  Model-based algorithms: model the users based on their past ratings and use these models to predict ratings of unseen items.
Mix of memory-based and model-based systems
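The memory-based prediction described above can be sketched as a similarity-weighted average. This is only an illustration: the nested-dict layout of `ratings` and the `similarity` callback are assumptions made for the example, not something the slides specify.

```python
def predict_rating(target, item, ratings, similarity):
    """Similarity-weighted average of other users' ratings for `item`.

    ratings:    dict user -> dict item -> rating (hypothetical layout)
    similarity: function (user_a, user_b) -> non-negative weight (assumed)
    Returns None when no other user with positive weight rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == target or item not in other_ratings:
            continue
        w = similarity(target, other)
        weighted_sum += w * other_ratings[item]
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None
```

With click data, a rating can simply be 1 for a clicked story, in which case the numerator alone already acts as a vote count weighted by similarity.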

Algorithms
Model-based approach
  Clustering techniques: Probabilistic Latent Semantic Indexing (PLSI) and MinHash
Memory-based approach
  Item covisitation

Min Hashing
A probabilistic clustering technique: it assigns a pair of users to the same cluster with probability proportional to the overlap between the sets of items the two users have voted for.
Similarity is calculated using the Jaccard coefficient.
To do: given user u-i, compute the similarity S(u-i, u-j) for all users u-j, and recommend to u-i the stories voted for by u-j, with weight equal to S(u-i, u-j).
Issues: doing this in real time does not scale; using a hash table to look up votes for a specific user is also not feasible; offline computation is not feasible either.
Locality Sensitive Hashing (LSH) comes to the rescue.
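A minimal sketch of the naive "To do" step above, which is exactly the computation that does not scale. It assumes click histories are held in a dict named `clicks` mapping each user to a set of story ids; that layout and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two item sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_scores(target, clicks):
    """Score unseen stories by summing S(target, other) over users who clicked them.

    clicks: dict user -> set of story ids (hypothetical layout).
    This loops over every other user, which is the cost the slide calls infeasible.
    """
    seen = set(clicks[target])
    scores = {}
    for other, stories in clicks.items():
        if other == target:
            continue
        s = jaccard(seen, stories)
        if s == 0.0:
            continue
        for story in set(stories) - seen:
            scores[story] = scores.get(story, 0.0) + s
    return scores
```

Each request touches every user's click set, so the work grows with the number of users times the size of their histories.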

Locality Sensitive Hashing
Key idea: hash the data points using several hash functions such that, for each hash function, the probability of collision is much higher for objects that are close to each other.
Min-hashing randomly permutes the set of items (S) and, for each user u-i, computes the hash value h(u-i) as the index of the first item under the permutation that belongs to the user's item set Cu-i.
Min-hashing is therefore a probabilistic clustering in which each hash bucket corresponds to a cluster: it puts two users in the same cluster with probability equal to their item-set similarity S(u-i, u-j).
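The min-hash step above can be sketched as follows. Materialising each permutation as an explicit rank table, seeding it with `random.Random`, and concatenating several hash values into one cluster key are all assumptions made for illustration; at the scale described, an explicit permutation of all items would not be practical.

```python
import random


def make_permutation(universe, seed):
    """Random permutation of the item universe, returned as item -> rank."""
    order = sorted(universe)
    random.Random(seed).shuffle(order)
    return {item: rank for rank, item in enumerate(order)}


def minhash(items, permutation_rank):
    """Rank of the first item of `items` under the permutation (items must be non-empty)."""
    return min(permutation_rank[i] for i in items)


def cluster_id(items, permutations):
    """Concatenate several min-hash values into one cluster key (assumed refinement)."""
    return tuple(minhash(items, p) for p in permutations)
```

For a single permutation, two users collide exactly when the first item of the union of their sets, in permuted order, lies in the intersection, which happens with probability equal to the Jaccard coefficient. Concatenating more hash values makes clusters smaller and more precise.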

PLSI
Probabilistic Latent Semantic Indexing models users and items as random variables. The relationship between users and items is learned by modeling the joint distribution of users and items as a mixture distribution.
A hidden variable Z captures this relationship; it represents user communities and item communities.
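The slide gives no formula, so the following uses the standard PLSI notation as an assumption: u is a user, s a story, and z ranges over the values of the hidden variable Z. Under that notation, the probability that user u clicks story s is a mixture over the latent communities:

```latex
p(s \mid u) = \sum_{z \in Z} p(s \mid z)\, p(z \mid u)
```

Here p(z | u) can be read as the user's soft membership in community z, and p(s | z) as how strongly that community favors story s. Given z, the user and the story are treated as independent.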

Covisitation
A covisitation is an event in which two stories are clicked by the same user within a certain time interval.
The data is kept as a graph whose nodes represent items and whose weighted edges represent the time-discounted number of covisitation instances.
On each user click, the adjacency list representing the graph is updated: in the entry for each item in the user's history, a new entry for the clicked item is added if it is not already there; if it is, its age-discounted count is updated.
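The update rule above can be sketched as below. The slide does not say how counts are discounted with age, so the exponential decay with a half-life, the class name, and the in-memory dict standing in for the stored adjacency lists are all assumptions.

```python
import math
from collections import defaultdict


class CovisitationGraph:
    """Adjacency lists with age-discounted covisitation counts (illustrative only)."""

    def __init__(self, half_life_seconds=3600.0):
        # Assumed discounting scheme: a count halves every `half_life_seconds`.
        self.decay = math.log(2) / half_life_seconds
        # adjacency[item][clicked] = (discounted_count, time_of_last_update)
        self.adjacency = defaultdict(dict)

    def record_click(self, recent_history, clicked, now):
        """Update the entry of every item in the user's recent history."""
        for item in recent_history:
            if item == clicked:
                continue
            count, last = self.adjacency[item].get(clicked, (0.0, now))
            aged = count * math.exp(-self.decay * (now - last))
            self.adjacency[item][clicked] = (aged + 1.0, now)
```

Storing the last update time next to each count lets the discount be applied lazily at write time instead of sweeping the whole graph.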

Covisitation-based Recommendation
Fetch user u-i's recent click history, limited to the past few hours or days.
For each item s-i in the user's click history, look up the entry for the pair (s-i, s) in the adjacency list for s-i stored in Bigtable.
The value stored in that entry, normalized by the sum of all entries for s-i, is added to the recommendation score.
The recommendation score is normalized to a value between 0 and 1 by linear scaling.
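The scoring steps above can be sketched as follows. The plain dict standing in for the Bigtable adjacency lists is hypothetical, and because the slide does not define "linear scaling", min-max scaling over the candidate set is used here as one assumed reading.

```python
def covisitation_scores(recent_clicks, adjacency, candidates):
    """Score each candidate story s from the user's recent clicks.

    adjacency: dict item -> dict neighbor -> discounted count (hypothetical layout).
    Returns dict candidate -> score in [0, 1].
    """
    raw = {}
    for s in candidates:
        total = 0.0
        for s_i in recent_clicks:
            neighbors = adjacency.get(s_i, {})
            norm = sum(neighbors.values())
            if norm > 0:
                # Entry for (s_i, s) normalized by the sum of all entries for s_i.
                total += neighbors.get(s, 0.0) / norm
        raw[s] = total
    if not raw:
        return {}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {s: 0.0 for s in raw}
    # Assumed form of "linear scaling": min-max over the candidates.
    return {s: (v - lo) / (hi - lo) for s, v in raw.items()}
```

Normalizing each row by its own sum keeps very popular items, which have large counts with everything, from dominating the score.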

System Setup
Three components:
  An offline component that clusters users based on their click history
  Online servers that update story and user statistics each time a user clicks on a news story
  Online servers that generate news story recommendations when a user requests them
Two data tables:
  User Table (UT): indexed by user-id; stores the user's click history and clustering information.
  Story Table (ST): indexed by story-id; stores real-time click counts for every story-story and story-cluster pair.
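As a rough illustration of the two tables, the sketch below models each as an in-memory dict keyed by id. The row classes, field names, and the `record_click` helper are hypothetical; the slides only state what each table is indexed by and what it stores.

```python
from dataclasses import dataclass, field


@dataclass
class UserRow:
    """One User Table row: click history plus cluster memberships."""
    click_history: list = field(default_factory=list)
    cluster_ids: list = field(default_factory=list)


@dataclass
class StoryRow:
    """One Story Table row: click counts per story-story and story-cluster pair."""
    story_counts: dict = field(default_factory=dict)    # other story-id -> count
    cluster_counts: dict = field(default_factory=dict)  # cluster-id -> count


def record_click(user_table, story_table, user_id, story_id):
    """Update both tables when a user clicks a story (undiscounted counts)."""
    user = user_table.setdefault(user_id, UserRow())
    story = story_table.setdefault(story_id, StoryRow())
    for cluster in user.cluster_ids:
        story.cluster_counts[cluster] = story.cluster_counts.get(cluster, 0) + 1
    for earlier in user.click_history:
        if earlier != story_id:
            row = story_table.setdefault(earlier, StoryRow())
            row.story_counts[story_id] = row.story_counts.get(story_id, 0) + 1
    user.click_history.append(story_id)
```

Keying counts by story-id means a recommendation request can fetch everything it needs for a candidate story with a single row lookup.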

System Components
[Diagram: data flow between NFE, NSS, NPS, UT, and ST, with offline log analysis (Min Hashing, PLSI) producing user clusters from user click histories.]
NFE: News Front End
NSS: News Statistics Server
NPS: News Personalization Server
UT: User Table
ST: Story Table

Pros
A scalable recommendation system that combines content-based and collaborative clustering approaches using the Min Hashing and PLSI algorithms
The algorithms are scaled using MapReduce and a Bigtable representation of the data
Click events are used as votes for news stories
The latest news likely to suit the user's interests is provided dynamically

Cons
Depends heavily on user clicks
A user click is always treated as a positive vote
Says nothing about negative votes