A Shot at the Netflix Challenge: A Hybrid Recommendation System. Priyank Chodisetti.


Problem and Approach
A data set of 240,000 users and their movie ratings is provided. Given a user ‘p’ and a movie ‘m’, we should predict how highly ‘p’ will rate ‘m’.
My idea: take two entirely different approaches and merge their results.
–Applied Latent Semantic Analysis and collaborative filtering techniques to the dataset independently
–Through LSI, mapped the dataset to a lower-dimensional space and tried to extract relations between different movies
–Through collaborative filtering, tried to capture a user’s tastes by comparing them with similar users
Major problems:
–Computationally heavy; for example, one solution of mine ran for 14 hours with most disappointing results
–The matrix is ~99% sparse, i.e., ~99% of the values are missing
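To make the sparsity problem concrete, here is a minimal sketch with made-up toy ratings (the real matrix has 240,000 users, so a dense representation is out of the question): storing ratings as a dict-of-dicts means missing cells cost nothing, and the fill rate falls out directly.

```python
# Toy illustration of the sparsity problem (hypothetical ratings, not the real data).
# Ratings are stored sparsely as {user: {movie: rating}}, so missing cells cost nothing.
ratings = {
    "u1": {"m1": 4, "m3": 5},
    "u2": {"m2": 1},
    "u3": {"m1": 2, "m2": 3, "m4": 5},
}
n_users = len(ratings)
movies = {m for user_ratings in ratings.values() for m in user_ratings}
n_cells = n_users * len(movies)                 # size of the full user x movie matrix
n_known = sum(len(r) for r in ratings.values())  # ratings we actually have
sparsity = 1 - n_known / n_cells                 # fraction of missing values
print(f"{n_known} of {n_cells} cells known; {sparsity:.0%} missing")
```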

Handling Major Problems
Missing values are generally handled by substituting the average rating given by the user, or the overall average rating across all users. But I believe the $1,000,000 winner will be the one who handles the missing values well. Adopted the method described in [2], which aptly fits the current situation.
LSI:
–Apply SVD to the matrix and retain the ‘k’ largest singular values. This gives us a space of ‘k’ dimensions, the best rank-k approximation
–But how many dimensions? Experiment
–To predict person p’s rating for movie m, take the mth row of U, matrix-multiply it with S, and matrix-multiply that with the pth column of V^T
Collaborative Filtering:
–Find the k nearest neighbours and produce a predicted rating
–Euclidean distance suffers as a distance measure in such a high-dimensional space, so use the Pearson correlation coefficient instead
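The rank-k prediction step above can be sketched as follows. This is a toy matrix with missing values already imputed (the imputation of [2] is not shown); rows are movies and columns are users, matching the slide's use of U for movies and V for users.

```python
import numpy as np

# Sketch of the rank-k SVD prediction step (toy data, missing values pre-imputed).
# Rows = movies, columns = users, as in the slides' U (movies) and V (users).
R = np.array([[4.0, 3.0, 5.0],
              [4.0, 2.0, 5.0],
              [1.0, 5.0, 2.0],
              [1.0, 4.0, 1.0]])      # 4 movies x 3 users

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]   # best rank-k approximation

def predict(m, p):
    """Predicted rating of user p for movie m: (mth row of U) * S * (pth column of V^T)."""
    return float(Uk[m] @ Sk @ Vtk[:, p])

print(round(predict(0, 0), 2))   # close to R[0, 0] when R is nearly rank-k
```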

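The Pearson coefficient used for the collaborative-filtering side can be sketched like this (toy ratings; in the real data the set of co-rated movies is tiny because ~99% of values are missing). Unlike Euclidean distance, it compares deviations from each user's own mean, so two users with the same taste but different rating scales still come out perfectly correlated.

```python
from math import sqrt

# Sketch of user-user similarity: Pearson correlation over co-rated movies.
def pearson(a, b):
    """Pearson correlation between two users' rating dicts, over movies both rated."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0                      # not enough overlap to compare tastes
    ma = sum(a[m] for m in common) / len(common)
    mb = sum(b[m] for m in common) / len(common)
    num = sum((a[m] - ma) * (b[m] - mb) for m in common)
    den = sqrt(sum((a[m] - ma) ** 2 for m in common)
               * sum((b[m] - mb) ** 2 for m in common))
    return num / den if den else 0.0

u = {"m1": 5, "m2": 3, "m3": 4}
v = {"m1": 4, "m2": 2, "m3": 3}        # same taste, shifted down one star
print(pearson(u, v))                    # perfectly correlated despite different means
```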
Implementation
Mixing LSI and Collaborative Filtering:
–Find the k nearest neighbours in the reduced-dimensional space, using Euclidean distance as the distance measure
Used SVDLIBC, which uses the Lanczos method for Singular Value Decomposition.
Computational challenges:
–All the files in the training set are merged into one single large file, to reduce disk accesses and improve response time
–Converted the whole data set into a sparse text format
–Also generated a large data set in user: (movie, rating) format, in contrast to the given movie: (user, rating) format
–Implemented in C++
Future extensions this winter:
–Plan to implement the Generalized Hebbian Algorithm, to reduce computation time and handle missing values more easily
–Interested and motivated friends can join me this winter
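The hybrid step above can be sketched as follows: embed each user in the k-dimensional SVD space, then find their nearest neighbours there by Euclidean distance. Toy data again (rows are users here, for convenience), and the SVD is NumPy's rather than SVDLIBC's Lanczos implementation.

```python
import numpy as np

# Sketch of the hybrid step: kNN by Euclidean distance in the reduced SVD space.
R = np.array([[5.0, 4.0, 1.0, 1.0],
              [4.0, 5.0, 2.0, 1.0],
              [1.0, 1.0, 5.0, 4.0],
              [2.0, 1.0, 4.0, 5.0]])   # users x movies, missing values pre-imputed

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_coords = U[:, :k] * s[:k]         # each row: one user in the k-dim space

def knn(p, n=1):
    """Indices of the n users closest to user p in the reduced space."""
    d = np.linalg.norm(user_coords - user_coords[p], axis=1)
    d[p] = np.inf                      # exclude the user themselves
    return list(np.argsort(d)[:n])

print(knn(0))                          # users 0 and 1 share a taste cluster
```

A prediction for user p would then average (or similarity-weight) the neighbours' ratings for the target movie.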

References
1. M. Brand. Fast Online SVD Revisions for Lightweight Recommender Systems. In Proc. SIAM International Conference on Data Mining, 2003.
2. M. Brand. Incremental Singular Value Decomposition of Uncertain Data with Missing Values. In Proc. European Conference on Computer Vision (ECCV), 2002.
3. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of Dimensionality Reduction in Recommender Systems: A Case Study. In ACM WebKDD Workshop, 2000.