Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems (Sarwar et al.) Presented by Sameer Saproo.

Similar presentations
Item Based Collaborative Filtering Recommendation Algorithms

Dimensionality Reduction PCA -- SVD
CUBELSI : AN EFFECTIVE AND EFFICIENT METHOD FOR SEARCHING RESOURCES IN SOCIAL TAGGING SYSTEMS Bin Bi, Sau Dan Lee, Ben Kao, Reynold Cheng The University.
A Constraint Generation Approach to Learning Stable Linear Dynamical Systems Sajid M. Siddiqi Byron Boots Geoffrey J. Gordon Carnegie Mellon University.
What is missing? Reasons that ideal effectiveness hard to achieve: 1. Users’ inability to describe queries precisely. 2. Document representation loses.
Hinrich Schütze and Christina Lioma
Memory-Based Recommender Systems : A Comparative Study Aaron John Mani Srinivasan Ramani CSCI 572 PROJECT RECOMPARATOR.
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
EigenTaste: A Constant Time Collaborative Filtering Algorithm Ken Goldberg Students: Theresa Roeder, Dhruv Gupta, Chris Perkins Industrial Engineering.
Principal Component Analysis
A shot at Netflix Challenge Hybrid Recommendation System Priyank Chodisetti.
Latent Semantic Indexing via a Semi-discrete Matrix Decomposition.
Lecture 19 Quadratic Shapes and Symmetric Positive Definite Matrices Shang-Hua Teng.
1 Latent Semantic Indexing Jieping Ye Department of Computer Science & Engineering Arizona State University
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 4 March 30, 2005
Information Retrieval in Text Part III Reference: Michael W. Berry and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval.
1/ 30. Problems for classical IR models Introduction & Background(LSI,SVD,..etc) Example Standard query method Analysis standard query method Seeking.
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 18: Latent Semantic Indexing 1.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 6 May 7, 2006
Sparsity, Scalability and Distribution in Recommender Systems
A Constraint Generation Approach to Learning Stable Linear Dynamical Systems Sajid M. Siddiqi Byron Boots Geoffrey J. Gordon Carnegie Mellon University.
Lecture 20 SVD and Its Applications Shang-Hua Teng.
Analysis of Recommendation Algorithms for E-Commerce Badrul M. Sarwar, George Karypis*, Joseph A. Konstan, and John T. Riedl GroupLens Research/*Army HPCRC.
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
CS276A Text Retrieval and Mining Lecture 15 Thanks to Thomas Hoffman, Brown University for sharing many of these slides.
CS 277: Data Mining Recommender Systems
Chapter 12 (Section 12.4) : Recommender Systems Second edition of the book, coming soon.
Item-based Collaborative Filtering Recommendation Algorithms
Performance of Recommender Algorithms on Top-N Recommendation Tasks
Cao et al. ICML 2010 Presented by Danushka Bollegala.
CS246 Topic-Based Models. Motivation  Q: For query “car”, will a document with the word “automobile” be returned as a result under the TF-IDF vector.
Performance of Recommender Algorithms on Top-N Recommendation Tasks RecSys 2010 Intelligent Database Systems Lab. School of Computer Science & Engineering.
Item Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl (UMN) p.s.: slides adapted from:
Group Recommendations with Rank Aggregation and Collaborative Filtering Linas Baltrunas, Tadas Makcinskas, Francesco Ricci Free University of Bozen-Bolzano.
Introduction to Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar.
Matrix Factorization and Latent Semantic Indexing 1 Lecture 13: Matrix Factorization and Latent Semantic Indexing Web Search and Mining.
Introduction to Information Retrieval Lecture 19 LSI Thanks to Thomas Hofmann for some slides.
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
1 Social Networks and Collaborative Filtering Qiang Yang HKUST Thanks: Sonny Chee.
Authors: Rosario Sotomayor, Joe Carthy and John Dunnion Speaker: Rosario Sotomayor Intelligent Information Retrieval Group (IIRG) UCD School of Computer.
Text Categorization Moshe Koppel Lecture 12:Latent Semantic Indexing Adapted from slides by Prabhaker Raghavan, Chris Manning and TK Prasad.
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
Latent Semantic Indexing
The Effect of Dimensionality Reduction in Recommendation Systems
Progress Report (Concept Extraction) Presented by: Mohsen Kamyar.
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION
Modern information retrieval Chapter. 02: Modeling (Latent Semantic Indexing)
1 Latent Concepts and the Number Orthogonal Factors in Latent Semantic Analysis Georges Dupret
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Google News Personalization Big Data reading group November 12, 2007 Presented by Babu Pillai.
Recommendation Algorithms for E-Commerce. Introduction Millions of products are sold over the web. Choosing among so many options is proving challenging.
ITCS 6265 Information Retrieval & Web Mining Lecture 16 Latent semantic indexing Thanks to Thomas Hofmann for some slides.
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Web Search and Data Mining Lecture 4 Adapted from Manning, Raghavan and Schuetze.
DATA MINING LECTURE 8 Sequence Segmentation Dimensionality Reduction.
DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values Lei Li joint work with Christos Faloutsos, James McCann, Nancy.
Matrix Factorization & Singular Value Decomposition Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Collaborative Filtering - Pooja Hegde. The Problem : OVERLOAD Too much stuff!!!! Too many books! Too many journals! Too many movies! Too much content!
Image Retrieval and Ranking using L.S.I and Cross View Learning Sumit Kumar Vivek Gupta
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Document Clustering Based on Non-negative Matrix Factorization
School of Computer Science & Engineering
Recommender Systems Adopted from Bin UIC.
Parallelism in High-Performance Computing Applications
Asymmetric Transitivity Preserving Graph Embedding
Lecture 13: Singular Value Decomposition (SVD)
Restructuring Sparse High Dimensional Data for Effective Retrieval
Presentation transcript:

Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems (Sarwar et al.) Presented by Sameer Saproo

Recommender Systems Apply Knowledge Discovery in Databases (KDD) techniques to make personalized product recommendations during live customer interaction. Offline vs. online. Not Google!

CF-Based Recommender Systems Suggest new products, or the utility of a certain product for a particular customer, based on the customer's previous likings and the opinions of other like-minded customers. Example ratings matrix (x = not rated):

         Matrix   Pi   AI
Alice       5      3    x
Bob         x      3    5
Carol       5      x    x

Challenges Quality of recommendation (Q). Scalability of CF algorithms (S). SVD-based Latent Semantic Indexing offers an approach to CF-based recommendation, but stumbles on scalability. The paper presents an algorithm that improves the scalability of SVD-based CF at the cost of a small loss in accuracy.

In a Nutshell Problem: the matrix factorization step in SVD is computationally very expensive. Solution: start from a small pre-computed SVD model and build on it incrementally using inexpensive techniques.

Singular Value Decomposition A matrix factorization technique for producing low-rank approximations: an m × n matrix A is factored as A = U·S·V^T, where U is m × m, S is m × n, and V is n × n. (The original slide shows this as a block diagram of the four matrices with their dimensions.)
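To make the factorization concrete, here is a minimal NumPy sketch (not from the paper; the toy matrix and variable names are mine) that decomposes a small ratings matrix and verifies A = U·S·V^T:

```python
import numpy as np

# Toy customer-product matrix (m = 3 customers, n = 3 products).
A = np.array([[5.0, 3.0, 0.0],
              [0.0, 3.0, 5.0],
              [5.0, 0.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros_like(A)
np.fill_diagonal(S, s)             # singular values on the diagonal of S

print(U.shape, S.shape, Vt.shape)  # (3, 3) (3, 3) (3, 3)
print(np.allclose(A, U @ S @ Vt))  # True: A = U S V^T
```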

Low-Rank Approximation (U S V^T) U and V are orthogonal matrices and S is a diagonal matrix. S has r non-zero entries for a rank-r matrix A. The diagonal entries (S_1, S_2, S_3, ..., S_r) have the property that S_1 ≥ S_2 ≥ S_3 ≥ ... ≥ S_r. SVD provides the best low-rank linear approximation of the original matrix A: if A_k = U_k S_k V_k^T is formed by keeping only the k largest singular values, then A_k is the rank-k matrix closest to A in the Frobenius norm (Eckart-Young).
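A short sketch of that rank-k truncation, assuming NumPy; the function name low_rank and the random test matrix are illustrative only:

```python
import numpy as np

def low_rank(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(943, 1682)    # same shape as the MovieLens matrix used later
A_14 = low_rank(A, 14)           # k = 14, the rank found optimal below
print(np.linalg.norm(A - A_14))  # Frobenius-norm error of the approximation
```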

Contd. A low-rank approximation of the original space is better than the original space itself, because the small singular values that introduce noise into the customer-product matrix are filtered out. SVD produces uncorrelated eigenvectors, and each customer/product is represented by its own eigenvector. This dimensionality reduction helps customers with similar tastes map into the space spanned by the same eigenvectors.
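As an illustration of "similar customers map close together" (my own sketch, not from the paper), each customer can be represented by the corresponding row of U_k·√S_k and compared by cosine similarity in the reduced space:

```python
import numpy as np

def customer_factors(A, k):
    """Each customer becomes the corresponding row of U_k * sqrt(S_k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * np.sqrt(s[:k])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

F = customer_factors(np.random.rand(50, 20), k=5)
print(cosine(F[0], F[1]))   # taste similarity of customers 0 and 1
```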

Prediction Generation Formally, the predicted rating of customer c for product p is the customer's mean rating plus a dot product of the reduced factors, computed on the mean-centered ratings matrix: P(c, p) = r̄_c + (U_k·√S_k)(c) · (√S_k·V_k^T)(p)
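A hedged sketch of this prediction rule, assuming the mean-centered SVD formulation of Sarwar et al.; all variable names are mine:

```python
import numpy as np

def svd_predict(A, k):
    """Predict all ratings: customer mean + dot product of reduced factors."""
    means = A.mean(axis=1, keepdims=True)      # per-customer mean rating
    U, s, Vt = np.linalg.svd(A - means, full_matrices=False)
    CF = U[:, :k] * np.sqrt(s[:k])             # customer factors U_k sqrt(S_k)
    PF = np.sqrt(s[:k])[:, None] * Vt[:k, :]   # product factors sqrt(S_k) V_k^T
    return means + CF @ PF                     # P[c, p] = mean_c + CF(c) . PF(p)

A = np.array([[5.0, 3.0, 1.0],
              [1.0, 3.0, 5.0],
              [5.0, 4.0, 1.0]])
print(svd_predict(A, k=2).round(2))
```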

Challenges of Dimensionality Reduction Offline step (also known as model building): user-user similarity computation and neighborhood formation, i.e. the SVD decomposition; time-consuming and infrequent; O(m^3) for an m × n matrix A. Online step (also known as the execution step): actual prediction generation; O(1).

Incremental SVD Algorithms Borrowed from the LSI world to handle dynamic databases. Projection of additional users provides a good approximation to the complete model. The authors first build a suitably sized model, then use projections to grow it incrementally (see the sketch below). Errors are induced because the projected space is no longer orthogonal.
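The projection itself is a one-line computation. The sketch below uses the standard LSI-style folding-in rule u_k = u·V_k·S_k^(-1) that the slides allude to; the matrix sizes mirror the experiments later, but the code and names are mine, not the authors':

```python
import numpy as np

def build_model(A, k):
    """Pre-computed reduced model U_k, S_k, V_k^T of the ratings matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in_user(u, Vt_k, s_k):
    """Project a new ratings row u into the existing space: u . V_k . S_k^-1."""
    return (u @ Vt_k.T) / s_k

A = np.random.rand(600, 1682)          # model built from 600 users
U_k, s_k, Vt_k = build_model(A, k=14)
new_user = np.random.rand(1682)
U_k = np.vstack([U_k, fold_in_user(new_user, Vt_k, s_k)])
print(U_k.shape)                       # (601, 14): model grown without re-running SVD
```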

Folding-In [Figure: folding-in, as depicted in the paper and as found in Reference [1]]

Experimental Evaluation Dataset: about 100,000 ratings. User-movie matrix: 943 users and 1682 movies. Training-test ratio: 80%. Evaluation metric: Mean Absolute Error (MAE) = (1/N) Σ |p_i − q_i|, where ⟨p_i, q_i⟩ is a ratings-prediction pair and N is the number of test pairs.
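For completeness, a tiny sketch of the metric (the test values are illustrative, not from the paper):

```python
import numpy as np

def mae(ratings, predictions):
    """Mean Absolute Error: average |p_i - q_i| over the N test pairs."""
    return np.mean(np.abs(np.asarray(ratings) - np.asarray(predictions)))

print(mae([5, 3, 4], [4.5, 2.0, 4.2]))   # (0.5 + 1.0 + 0.2) / 3 ≈ 0.57
```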

Model Size The optimal reduced rank k = 14 was found empirically. The remaining (943 − model size) users are projected into the model using folding-in.

Results [Figures: Quality and Performance vs. model size] For a model size of 600, the quality loss was 1.22% whereas the performance gain was 81.63%.