Memory-Based Recommender Systems: A Comparative Study. Aaron John Mani, Srinivasan Ramani. CSCI 572 Project: RECOMPARATOR.

Similar presentations
Google News Personalization: Scalable Online Collaborative Filtering

Item Based Collaborative Filtering Recommendation Algorithms
Book Recommender System Guided By: Prof. Ellis Horowitz Kaijian Xu Group 3 Ameet Nanda Bhaskar Upadhyay Bhavana Parekh.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Collaborative QoS Prediction in Cloud Computing Department of Computer Science & Engineering The Chinese University of Hong Kong Hong Kong, China Rocky.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
1 RegionKNN: A Scalable Hybrid Collaborative Filtering Algorithm for Personalized Web Service Recommendation Xi Chen, Xudong Liu, Zicheng Huang, and Hailong.
Learning Objectives Explain similarities and differences among algorithms, programs, and heuristic solutions List the five essential properties of an algorithm.
Active Learning and Collaborative Filtering
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
The Wisdom of the Few A Collaborative Filtering Approach Based on Expert Opinions from the Web Xavier Amatriain Telefonica Research Nuria Oliver Telefonica.
Memory-Based Recommender Systems : A Comparative Study Aaron John Mani Srinivas Ramani CSCI 572 PROJECT RECOMPARATOR.
Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems (Sarwar et al.) Presented by Sameer Saproo.
1 Collaborative Filtering and Pagerank in a Network Qiang Yang HKUST Thanks: Sonny Chee.
Scaling Personalized Web Search Glen Jeh, Jennifer Widom Stanford University Presented by Li-Tal Mashiach Search Engine Technology course (236620) Technion.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
Recommender systems Ram Akella November 26 th 2008.
Item-based Collaborative Filtering Recommendation Algorithms
Performance of Recommender Algorithms on Top-N Recommendation Tasks
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
DEXA 2005 Quality-Aware Replication of Multimedia Data Yicheng Tu, Jingfeng Yan and Sunil Prabhakar Department of Computer Sciences, Purdue University.
Performance of Recommender Algorithms on Top-N Recommendation Tasks RecSys 2010 Intelligent Database Systems Lab. School of Computer Science & Engineering.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
MINING RELATED QUERIES FROM SEARCH ENGINE QUERY LOGS Xiaodong Shi and Christopher C. Yang Definitions: Query Record: A query record represents the submission.
Understanding and Predicting Graded Search Satisfaction Tang Yuk Yu 1.
Group Recommendations with Rank Aggregation and Collaborative Filtering Linas Baltrunas, Tadas Makcinskas, Francesco Ricci Free University of Bozen-Bolzano.
Clustering-based Collaborative filtering for web page recommendation CSCE 561 project Proposal Mohammad Amir Sharif
Bayesian Sets Zoubin Ghahramani and Katherine A. Heller NIPS 2005 Presented by Qi An Mar. 17th, 2006.
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Classical Music for Rock Fans?: Novel Recommendations for Expanding User Interests Makoto Nakatsuji, Yasuhiro Fujiwara, Akimichi Tanaka, Toshio Uchiyama,
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Google News Personalization: Scalable Online Collaborative Filtering
Online Learning for Collaborative Filtering
RecBench: Benchmarks for Evaluating Performance of Recommender System Architectures Justin Levandoski Michael D. Ekstrand Michael J. Ludwig Ahmed Eldawy.
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Temporal Diversity in Recommender Systems Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain SIGIR 2010 April 6, 2011 Hyunwoo Kim.
A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
EigenRank: A ranking oriented approach to collaborative filtering By Nathan N. Liu and Qiang Yang Presented by Zachary 1.
Chapter 6: Analyzing and Interpreting Quantitative Data
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
CS378 Final Project The Netflix Data Set Class Project Ideas and Guidelines.
ApproxHadoop Bringing Approximations to MapReduce Frameworks
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Company LOGO MovieMiner A collaborative filtering system for predicting Netflix user’s movie ratings [ECS289G Data Mining] Team Spelunker: Justin Becker,
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Recommender Systems Based Rajaraman and Ullman: Mining Massive Data Sets & Francesco Ricci et al. Recommender Systems Handbook.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
Spring Staff Lecturer: Prof. Sara Cohen Graders: Igor Lifshits, Arbel Moshe 2.
The Wisdom of the Few Xavier Amatriain, Neal Lathia, Josep M. Pujol SIGIR’09 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Collaborative Filtering - Pooja Hegde. The Problem : OVERLOAD Too much stuff!!!! Too many books! Too many journals! Too many movies! Too much content!
Recommendation Systems ARGEDOR. Introduction Sample Data Tools Cases.
Slope One Predictors for Online Rating-Based Collaborative Filtering Daniel Lemire, Anna Maclachlan In SIAM Data Mining (SDM’05), Newport Beach, California,
Item-Based Collaborative Filtering Recommendation Algorithms
Chapter 14 – Association Rules and Collaborative Filtering © Galit Shmueli and Peter Bruce 2016 Data Mining for Business Analytics (3rd ed.) Shmueli, Bruce.
Collaborative Filtering With Decoupled Models for Preferences and Ratings Rong Jin 1, Luo Si 1, ChengXiang Zhai 2 and Jamie Callan 1 Language Technology.
Recommender Systems Session I
Application Level Fault Tolerance and Detection
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
Collaborative Filtering Nearest Neighbor Approach
M.Sc. Project Doron Harlev Supervisor: Dr. Dana Ron
Google News Personalization: Scalable Online Collaborative Filtering
Movie Recommendation System
ITEM BASED COLLABORATIVE FILTERING RECOMMENDATION ALGORITHMS
Presentation transcript:

Memory-Based Recommender Systems: A Comparative Study. Aaron John Mani, Srinivasan Ramani. CSCI 572 Project: RECOMPARATOR

Problem definition
This project is a comparative study of two movie recommendation systems based on collaborative filtering: User-User rating versus Item-Item rating.
Slope One algorithm – prediction engine.
Pearson's correlation – calculates the similarity of users/items.
The aim of the experiment is to study the accuracy of the two algorithms when applied to the same dataset under similar conditions.
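As an illustration of the similarity step named above, here is a minimal Java sketch of Pearson's correlation between two users over their co-rated movies. The map-based signature and movie-ID keys are illustrative assumptions rather than the project's actual code; the item-item variant would compare two movies' rating vectors in the same way.

```java
import java.util.Map;

/** Pearson correlation between two users, computed over their co-rated movies. */
public final class PearsonSimilarity {

    /** ratingsA / ratingsB map movieId -> rating for each user (assumed structure). */
    public static double similarity(Map<Integer, Double> ratingsA,
                                    Map<Integer, Double> ratingsB) {
        // First pass: means over the co-rated movies only.
        double sumA = 0.0, sumB = 0.0;
        int n = 0;
        for (Map.Entry<Integer, Double> e : ratingsA.entrySet()) {
            Double rb = ratingsB.get(e.getKey());
            if (rb != null) {
                sumA += e.getValue();
                sumB += rb;
                n++;
            }
        }
        if (n == 0) return 0.0;                     // no overlap, no evidence of similarity
        double meanA = sumA / n, meanB = sumB / n;

        // Second pass: correlation of the mean-centered co-ratings.
        double num = 0.0, denA = 0.0, denB = 0.0;
        for (Map.Entry<Integer, Double> e : ratingsA.entrySet()) {
            Double rb = ratingsB.get(e.getKey());
            if (rb == null) continue;
            double da = e.getValue() - meanA;
            double db = rb - meanB;
            num  += da * db;
            denA += da * da;
            denB += db * db;
        }
        if (denA == 0.0 || denB == 0.0) return 0.0; // constant ratings: correlation undefined
        return num / Math.sqrt(denA * denB);
    }
}
```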

S/W, Language used

  S/W, Language      Purpose
  NetFlix            Dataset
  Java/J2EE          Main programming language
  HTML/JavaScript    Front end / GUI
  JavaScript         Scraping / RegEx
  MySQL              Back-end database
  Java               Scripts for importing the dataset

Plan of Action

  No.  Task                            Responsibility   Checkpoint (week ending)
  1.   Scripts to import dataset       AJ               25th March
  2.   Similarity ranking              SR               1st April
  3.   Prediction engine               AJ               1st April
  4.   UI design                       AJ               25th March
  5.   Results form                    SR               8th April
  6.   Graphs/metrics data plot        AJ, SR           15th April
  7.   NetFlix scraping                SR               8th April
  8.   Unit/incremental testing, QC    AJ, SR           22nd April

Sample Proposed Screenshot [Recommendation Page]

Actual Screenshot [Recommendation Page]

Actual Screenshot [Movie Search Page]

Sample graphs showing the data to be collected and how it will be presented.
Mean Absolute Error (MAE) – sample error difference over approximately 100 users. MAE is a standard metric used to measure how much a particular algorithm's predictions deviate from the original ratings.
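Concretely, for a test set of N (user, movie) pairs with actual rating r_i and predicted rating \hat{r}_i, the metric described above is:

$$ \mathrm{MAE} \;=\; \frac{1}{N}\sum_{i=1}^{N}\bigl|\hat{r}_i - r_i\bigr| $$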

Results – MAE
Fifty movie combinations were chosen at random as the sample set for calculating the MAE. The results are presented in the form of a trend chart on the next slide. The Mean Absolute Error was calculated as the average of the absolute difference between the actual and predicted ratings for both algorithms.
The Slope One user-user algorithm performs better than the Slope One item-item algorithm. We presume that this is mostly because of the slice of the NetFlix dataset that was used in the calculations and the fact that the available user data is far greater than the item data in that slice. Consequently, predictions generated from user similarity are more focused than predictions generated from item similarity.

  Algorithm    MAE
  Item-Item    0.9
  User-User    0.3
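For context, below is a minimal Java sketch of the weighted Slope One prediction step, assuming precomputed pairwise rating deviations and co-rating counts between movies. The map-based structures and names are illustrative assumptions rather than the project's implementation (which pushes this computation into MySQL), and the user-user variant would swap the roles of users and items.

```java
import java.util.Map;

/** Weighted Slope One: predict one user's rating for a target movie from
 *  precomputed average rating differences (deviations) between movie pairs. */
public final class SlopeOnePredictor {

    /** dev.get(target).get(other) = average of (rating(target) - rating(other)) over co-raters. */
    private final Map<Integer, Map<Integer, Double>> dev;
    /** count.get(target).get(other) = number of users who rated both movies. */
    private final Map<Integer, Map<Integer, Integer>> count;

    public SlopeOnePredictor(Map<Integer, Map<Integer, Double>> dev,
                             Map<Integer, Map<Integer, Integer>> count) {
        this.dev = dev;
        this.count = count;
    }

    /** userRatings maps movieId -> rating for the movies this user has already rated. */
    public double predict(Map<Integer, Double> userRatings, int targetMovie) {
        Map<Integer, Double> devRow = dev.get(targetMovie);
        Map<Integer, Integer> cntRow = count.get(targetMovie);
        double weightedSum = 0.0;
        int totalCount = 0;
        if (devRow != null && cntRow != null) {
            for (Map.Entry<Integer, Double> e : userRatings.entrySet()) {
                Double d = devRow.get(e.getKey());
                Integer c = cntRow.get(e.getKey());
                if (d == null || c == null || c == 0) continue;
                weightedSum += (d + e.getValue()) * c;   // shift the known rating by the deviation
                totalCount += c;
            }
        }
        if (totalCount == 0) {                            // nothing to go on: fall back to the user's mean
            return userRatings.values().stream()
                              .mapToDouble(Double::doubleValue).average().orElse(0.0);
        }
        return weightedSum / totalCount;
    }
}
```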

Trend Chart – Actual Ratings of a sample set versus Predicted Ratings

New User Problem – Analysis and Results
New User Problem – Conduct a survey among 10 human testers to gauge how relevant the top-n predictions are to the selected movie and rate their accuracy on a numeric scale. These users will be new user rows in the User-Item matrix with a single rating. The mean of this test data will provide a human perspective on the precision of machine-generated suggestions for new users introduced into the system.
Results – We found that for users with very few ratings, the predictions are mostly inaccurate in the case of the User-User algorithm. The Item-Item algorithm, however, provides reasonable predictions, as expected. The following slide shows an example of a new-user prediction. Note the difference between the User-User recommendations for a new user and the recommendations for a user who has rated a considerable number of movies, as shown in the previous screenshot.

Screenshot - New User Problem

Average Precision – Analysis and Results
Average Precision Analysis – Create test conditions similar to those above. Each human tester logs the relevancy of the top-n predictions of each algorithm to the selected movie. The average across each category of algorithm should provide some insight into the number of relevant predictions generated compared to the total number of predictions generated.

[Table: per-tester precision (%) of the User-User and Item-Item algorithms, User 1 through User 10; individual values are not legible in this transcript.]
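For reference, the per-tester figures correspond to the usual top-n precision, the fraction of the n recommendations judged relevant:

$$ P@n \;=\; \frac{\lvert\,\text{relevant recommendations among the top } n\,\rvert}{n} $$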

Average Precision Analysis
We conclude that the recommendations generated are not very precise compared to the recommendations from popular websites such as NetFlix and IMDb. This is directly related to a user's perception of precision. The metric is not entirely reliable because we ran the tests on a slice of the NetFlix dataset rather than the full data, and there are bound to be anomalies in the result set due to the lack of similarity among certain items and users. Average precision analysis cannot be used to make a judgment on the quality of the recommendations.

PROBLEMS FACED AND SOLUTIONS
Loading the dataset – The dataset was too large to load into the database directly.
Solution – The Java import code was optimized to handle memory-overflow problems caused by the enormous number of files in the dataset (a sketch of this batch-loading approach follows this slide).
Selecting a workable dataset size – After loading the entire dataset, we found that it was too large to handle and queries were taking too long to run.
Solution – We reduced the size of the dataset and indexed the main tables.
UI – User input led to various exceptions.
Solution – Created a popup that lets the user select a movie from a list.
Query optimization – The underlying prediction engine initially returned extremely vague results.
Solution – The back-end logic was tweaked to improve recommendation accuracy.
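A minimal sketch of such a batch-loading script is shown below. It assumes a Netflix-prize-style ratings file (movie ID followed by ":" on the first line, then "userId,rating,date" lines) and an illustrative ratings table; the connection URL, table, and column names are placeholders, not the project's actual schema.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

/** Loads one Netflix-style ratings file into MySQL in fixed-size batches
 *  so the whole file never has to be held in memory at once. */
public class RatingsLoader {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/recomparator", "user", "password");
             BufferedReader in = new BufferedReader(new FileReader(args[0]));
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO ratings (movie_id, user_id, rating) VALUES (?, ?, ?)")) {

            conn.setAutoCommit(false);
            // First line of each file: "<movieId>:"
            int movieId = Integer.parseInt(in.readLine().replace(":", "").trim());

            String line;
            int pending = 0;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",");        // userId,rating,date
                ps.setInt(1, movieId);
                ps.setInt(2, Integer.parseInt(parts[0]));
                ps.setInt(3, Integer.parseInt(parts[1]));
                ps.addBatch();
                if (++pending % 10000 == 0) {            // flush periodically instead of at the end
                    ps.executeBatch();
                    conn.commit();
                }
            }
            ps.executeBatch();                            // flush the remainder
            conn.commit();
        }
    }
}
```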

ACCOMPLISHMENTS AND CONCLUSION
GOAL – We successfully implemented and compared two algorithms from the Slope One family. We optimized the system to a certain extent by moving the computation into the database instead of the server/servlet layer.
RESULT – Our analysis shows that the User-User algorithm performs better than the Item-Item algorithm on this specific dataset.
NEW USER PROBLEM – The Item-Item algorithm provides better recommendations when a new user is taken into account.
FUTURE WORK – Efficiency can be improved by optimizing the back-end database. The existing static user selection can be made dynamic. A distributed system can be considered to handle the scalability of the data.