Matrix Factorization & Singular Value Decomposition
Bamshad Mobasher, DePaul University

Matrix Decomposition

- Matrix D is m x n
  - e.g., a ratings matrix with m customers and n items
  - e.g., a term-document matrix with m terms and n documents
- Typically:
  - D is sparse, e.g., less than 1% of entries have ratings
  - n is large, e.g., movies (Netflix), millions of docs, etc.
  - So finding matches to less popular items will be difficult
- Basic idea: compress the columns (items) into a lower-dimensional representation

Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

Singular Value Decomposition (SVD)

D = U Σ Vᵀ, where D is m x n, U is m x n, Σ is n x n, and Vᵀ is n x n (assuming m > n and rank(D) = n)

- rows of Vᵀ are the eigenvectors of DᵀD = the basis functions
- Σ is diagonal, with Σᵢᵢ = sqrt(λᵢ), the square root of the ith eigenvalue
- rows of U are the coefficients for the basis functions in Vᵀ

(a small NumPy sketch follows below)
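
As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch with a small made-up matrix; it checks that D is recovered from U, Σ, and Vᵀ and that the diagonal of Σ holds the square roots of the eigenvalues of DᵀD.

```python
import numpy as np

# A small made-up m x n matrix (m > n), purely for illustration
D = np.array([[2.0, 4.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 2.0],
              [3.0, 5.0, 2.0]])

# Thin SVD: U is m x n, s holds the n singular values, Vt is n x n
U, s, Vt = np.linalg.svd(D, full_matrices=False)
Sigma = np.diag(s)

# D is recovered exactly (up to floating point) as U @ Sigma @ Vt
assert np.allclose(D, U @ Sigma @ Vt)

# The singular values are the square roots of the eigenvalues of D^T D
eigvals = np.linalg.eigvalsh(D.T @ D)[::-1]        # sorted descending
assert np.allclose(s, np.sqrt(np.clip(eigvals, 0, None)))
```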

SVD Example

- Data D = (the matrix values are not preserved in this transcript)
- Note the pattern in the data: the center column values are typically about twice the 1st and 3rd column values
- So there is redundancy in the columns, i.e., the column values are correlated

Applying the SVD, D = U Σ Vᵀ (the U, Σ, and Vᵀ values are likewise not preserved in this transcript):

- Note that the first singular value is much larger than the others
- The first basis function (or eigenvector) carries most of the information and "discovers" the pattern of column dependence

(a small reproduction of this effect with made-up data follows below)
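
The slide's matrix values are not preserved here, but the effect is easy to reproduce. This sketch uses made-up data whose middle column is roughly twice the other two and shows that the first singular value dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.uniform(5, 15, size=(6, 1))
# Middle column is about twice the 1st and 3rd columns, plus a little noise
D = np.hstack([base, 2 * base, base]) + rng.normal(0, 0.1, size=(6, 3))

s = np.linalg.svd(D, compute_uv=False)
print(s)                  # the first singular value is far larger than the others
print(s[0] / s.sum())     # fraction of the total "weight" carried by the first one
```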

Rows in D = Weighted Sums of Basis Vectors

- 1st row of D (values not preserved in this transcript)
- Since D = U Σ Vᵀ, then D[0,:] = U[0,:] * Σ * Vᵀ
- Here D[0,:] = 24.5 v1 + 0.2 v2 + 0.22 v3, where v1, v2, v3 are the rows of Vᵀ and are our basis vectors
- Thus, [24.5, 0.2, 0.22] are the weights that characterize row 1 of D
- In general, the ith row of U * Σ is the set of weights for the ith row of D (see the sketch below)
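
A small sketch of the same idea, again with made-up data since the slide's matrix is not available: the rows of U Σ are the weights that combine the rows of Vᵀ (the basis vectors) back into the rows of D.

```python
import numpy as np

# Made-up data with correlated columns (the slide's actual matrix is not available)
D = np.array([[10.0, 21.0, 10.0],
              [ 2.0,  4.0,  2.0],
              [ 8.0, 17.0,  8.0]])

U, s, Vt = np.linalg.svd(D, full_matrices=False)
weights = U @ np.diag(s)          # ith row = the weights for the ith row of D

# Row 0 of D as a weighted sum of the basis vectors (the rows of Vt)
row0 = sum(weights[0, j] * Vt[j, :] for j in range(Vt.shape[0]))
assert np.allclose(row0, D[0, :])
```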

Summary of SVD Representation

D = U Σ Vᵀ

- Data matrix D: rows = data vectors
- U Σ matrix: rows = weights for the rows of D
- Vᵀ matrix: rows = our basis functions

How Do We Compute U, Σ, and V?

- The SVD is a standard eigenvector/eigenvalue problem
  - The eigenvectors of DᵀD = the rows of Vᵀ (columns of V)
  - The eigenvectors of D Dᵀ = the columns of U
  - The diagonal elements of Σ are the square roots of the eigenvalues of DᵀD
- So finding U, Σ, and V is equivalent to finding the eigenvectors of DᵀD
- Solving the eigenvalue problem reduces to solving sets of linear equations; the time complexity is O(m n² + n³)

(a sketch of this eigenvector route is shown below)
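
For illustration only, the eigenvector route can be checked numerically against NumPy's built-in SVD on a made-up matrix; this is a sketch of the relationship, not how production SVD routines work internally.

```python
import numpy as np

# Made-up m x n data with m > n, just to check the relationships numerically
D = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0],
              [2.0, 2.0, 0.0]])

# Eigen-decompose D^T D: eigenvectors give V, square roots of eigenvalues give Sigma
eigvals, V = np.linalg.eigh(D.T @ D)          # returned in ascending order
order = np.argsort(eigvals)[::-1]             # re-sort descending
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(np.clip(eigvals, 0, None))

# U then follows from U = D V Sigma^{-1} (valid here because D has full column rank)
U = D @ V / sigma

# The factors reconstruct D, and the singular values match the library routine
assert np.allclose(U @ np.diag(sigma) @ V.T, D)
assert np.allclose(sigma, np.linalg.svd(D, compute_uv=False))
```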

Matrix Approximation with SVD

D ≈ U Σ Vᵀ, where D is m x n, U is m x k, Σ is k x k, and Vᵀ is k x n

- columns of V are the first k eigenvectors of DᵀD
- Σ is diagonal with the k largest eigenvalues
- rows of U are the coefficients in the reduced-dimension V-space
- This gives the best rank-k approximation to D in a least-squares sense (this is also known as principal component analysis); a sketch follows below
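
A minimal sketch of the rank-k truncation on a made-up matrix with k = 1; it also checks the least-squares (Frobenius-norm) optimality property via the discarded singular values. For large sparse matrices, routines such as scipy.sparse.linalg.svds would be used instead of the dense SVD.

```python
import numpy as np

# Made-up matrix with strongly correlated columns
D = np.array([[10.0, 20.0, 10.0],
              [ 2.0,  5.0,  2.0],
              [ 8.0, 17.0,  8.0],
              [ 9.0, 20.0, 10.0]])

U, s, Vt = np.linalg.svd(D, full_matrices=False)

k = 1
D_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of D

# The Frobenius-norm error equals the root of the sum of the discarded squared singular values
err = np.linalg.norm(D - D_k, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```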

Collaborative Filtering & Matrix Factorization

- The Netflix Prize data: 17,700 movies, 480,000 users
- The $1 million question

User-Based Collaborative Filtering

- Ratings table of users x items (Item 1 through Item 6), with a final column giving each user's correlation with Alice; Alice has rated Items 1-4 (5, 2, 3, 3) and the target Item 5 is unknown ("?") (the other users' ratings and correlations are not preserved in this transcript)
- The best match (the user most highly correlated with Alice) supplies the prediction for Alice's unrated item
- Using k-nearest neighbor with k = 1 (a worked sketch with made-up ratings follows below)
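
A hedged sketch of the k = 1 neighbor scheme with entirely made-up ratings (the slide's table values are not preserved); in this simplest form the prediction is just the best-matching user's rating for the target item. Real systems typically mean-center the ratings and use more neighbors.

```python
import numpy as np

# Made-up ratings (0 = not rated); rows are users, columns are items.
# Row 0 is "Alice", who has not rated the last item, which we want to predict.
R = np.array([[5, 2, 3, 3, 0],
              [4, 2, 3, 3, 4],
              [1, 5, 4, 2, 1],
              [5, 1, 2, 4, 5]], dtype=float)

def pearson(u, v):
    """Correlation computed over the items both users have rated."""
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return 0.0
    return float(np.corrcoef(u[mask], v[mask])[0, 1])

alice = R[0]
target = 4                                         # index of the item to predict
sims = [pearson(alice, R[i]) for i in range(1, R.shape[0])]

# k-nearest-neighbor prediction with k = 1: take the best match's rating
best = int(np.argmax(sims)) + 1                    # +1 because Alice was skipped
prediction = R[best, target]
print(sims, best, prediction)
```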

Item-Based Collaborative Filtering

- Same ratings table (users x Item 1 through Item 6, Alice's target item marked "?"), but similarities are now computed between item columns rather than user rows, and the best-matching item supplies the prediction
- Item-item similarities are usually computed using the cosine similarity measure (a sketch with made-up ratings follows below)
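
A corresponding item-based sketch, again with made-up ratings: item-item cosine similarities are computed over the users who co-rated each pair of items, and Alice's prediction is the similarity-weighted average of her own ratings on the other items.

```python
import numpy as np

# The same made-up ratings matrix (0 = not rated); rows are users, columns items
R = np.array([[5, 2, 3, 3, 0],
              [4, 2, 3, 3, 4],
              [1, 5, 4, 2, 1],
              [5, 1, 2, 4, 5]], dtype=float)

def cosine(a, b):
    mask = (a > 0) & (b > 0)           # users who rated both items
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = 4                              # the item Alice (user 0) has not rated
sims = [cosine(R[:, target], R[:, j]) for j in range(R.shape[1] - 1)]

# Predict Alice's rating as the similarity-weighted average of her other ratings
alice = R[0, :-1]
prediction = float(np.dot(sims, alice) / np.sum(sims))
print(sims, prediction)
```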

Matrix Factorization of Ratings Data

- Based on the idea of latent factor analysis
  - Identify latent (unobserved) factors that "explain" the observations in the data
  - In this case, the observations are user ratings of movies
  - The factors may represent combinations of features or characteristics of movies and users that result in the ratings
- R (m users x n movies) ≈ P (m users x f factors) x Qᵀ (f factors x n movies), so an individual rating is approximated as r_ui ≈ p_u · q_i, the dot product of the user's and the item's factor vectors (a tiny sketch follows below)
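
A tiny sketch of the factor-model prediction with made-up factor matrices; the only point is that the full predicted ratings matrix is P Qᵀ and an individual prediction is the dot product of two length-f vectors.

```python
import numpy as np

m, n, f = 4, 6, 2                      # made-up sizes: users, movies, latent factors
rng = np.random.default_rng(1)
P = rng.normal(size=(m, f))            # user factor matrix (m x f)
Q = rng.normal(size=(n, f))            # item factor matrix (n x f)

R_hat = P @ Q.T                        # full m x n matrix of predicted ratings

# An individual prediction is the dot product of the two factor vectors
u, i = 0, 3
assert np.isclose(R_hat[u, i], P[u] @ Q[i])
```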

Matrix Factorization (Example)

- P_k: the user factor matrix, with one row per user (Alice, Bob, Mary, Sue) and columns Dim1, Dim2
- Q_kᵀ: the item factor matrix, with the same two dimensions (the numeric values of both matrices are not preserved in this transcript)
- Prediction: the dot product of a user's row of P_k with an item's column of Q_kᵀ
- Note: the factorization can also be done via Singular Value Decomposition (SVD); a sketch follows below
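
One way to realize the SVD note above, assuming (unrealistically) a fully observed, made-up ratings matrix: truncate the SVD to k dimensions and fold the singular values into both factor matrices. Real ratings matrices are sparse, so this is only illustrative.

```python
import numpy as np

# Made-up, fully observed ratings matrix (4 users x 5 items)
R = np.array([[5, 3, 4, 4, 2],
              [4, 3, 4, 3, 2],
              [1, 5, 2, 1, 5],
              [2, 5, 3, 2, 4]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2
P_k = U[:, :k] * np.sqrt(s[:k])        # user factors: one (Dim1, Dim2) row per user
Q_k = Vt[:k, :].T * np.sqrt(s[:k])     # item factors: one (Dim1, Dim2) row per item

R_hat = P_k @ Q_k.T                    # rank-k reconstruction of the ratings
print(np.round(R_hat, 2))
```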

Lower Dimensional Feature Space

(figure: Alice, Bob, Mary, and Sue plotted as points in the two-dimensional latent feature space; the plot itself is not preserved in this transcript)

Learning the Factor Matrices

- Need to learn the user and item feature vectors from training data
- Approach: minimize the error on the known ratings
  - Typically, regularization terms and user and item bias parameters are added
- Done via stochastic gradient descent (SGD) or other optimization approaches (a minimal SGD sketch follows below)
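
A minimal SGD sketch under the assumptions above (squared error on the known ratings, L2 regularization, user/item bias terms); the hyperparameters and training triples are made up, and production implementations would add validation, early stopping, and learning-rate schedules.

```python
import numpy as np

def train_mf(ratings, m, n, f=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Learn user/item factors and biases from (user, item, rating) triples via SGD."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(m, f))     # user factor vectors
    Q = rng.normal(scale=0.1, size=(n, f))     # item factor vectors
    bu, bi = np.zeros(m), np.zeros(n)          # user and item biases
    mu = np.mean([r for _, _, r in ratings])   # global mean rating

    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + P[u] @ Q[i]
            e = r - pred
            # Gradient steps on the squared error, with L2 regularization
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                          Q[i] + lr * (e * P[u] - reg * Q[i]))
    return P, Q, bu, bi, mu

# Made-up training triples: (user index, item index, rating)
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q, bu, bi, mu = train_mf(ratings, m=3, n=3)
print(mu + bu[0] + bi[2] + P[0] @ Q[2])        # predicted rating for an unseen pair
```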