Matrix Factorization & Singular Value Decomposition
Bamshad Mobasher, DePaul University

Matrix Decomposition
Matrix D = m x n
- e.g., a ratings matrix with m customers and n items
- e.g., a term-document matrix with m terms and n documents
Typically D is sparse (e.g., less than 1% of entries have ratings) and n is large (e.g., 18,000 movies for Netflix, millions of documents), so finding matches to less popular items will be difficult.
Basic idea: compress the columns (items) into a lower-dimensional representation.
Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine.

Singular Value Decomposition (SVD)
D = U Σ V^t, where D is m x n, U is m x n, Σ is n x n, and V^t is n x n (here we assume that m > n and rank(D) = n):
- the rows of V^t are the eigenvectors of D^t D = the basis functions
- Σ is diagonal, with Σ_ii = sqrt(λ_i), the square root of the ith eigenvalue
- the rows of U Σ are the coefficients for the basis functions in V^t

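To make the decomposition concrete, here is a minimal sketch (assuming Python with NumPy, which the slides do not use) that computes a thin SVD of the example matrix from the following slides and verifies that the factors reproduce it:

```python
import numpy as np

# The 5 x 3 example data matrix used on the next slides
D = np.array([[10, 20, 10],
              [ 2,  5,  2],
              [ 8, 17,  7],
              [ 9, 20, 10],
              [12, 22, 11]], dtype=float)

# Thin SVD: U is m x n, s holds the n singular values, Vt is n x n
U, s, Vt = np.linalg.svd(D, full_matrices=False)

Sigma = np.diag(s)                      # diagonal matrix of singular values
print(np.allclose(D, U @ Sigma @ Vt))   # True: U Sigma V^t reproduces D
```
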
SVD Example
Data D =
  10  20  10
   2   5   2
   8  17   7
   9  20  10
  12  22  11
Note the pattern in the data above: the center column values are typically about twice the 1st and 3rd column values, so there is redundancy in the columns, i.e., the column values are correlated.

The SVD of this matrix is D = U Σ V^t, where
U =
  0.50  0.14 -0.19
  0.12 -0.35  0.07
  0.41 -0.54  0.66
  0.49 -0.35 -0.67
  0.56  0.66  0.27
Σ =
  48.6   0     0
   0     1.5   0
   0     0     1.2
V^t =
  0.41  0.82  0.40
  0.73 -0.56  0.41
  0.55  0.12 -0.82
Note that the first singular value is much larger than the others: the first basis function (eigenvector) carries most of the information, and it "discovers" the pattern of column dependence.

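The dominance of the first singular value can be checked with a short sketch (NumPy assumed): the single leading component already reconstructs D almost exactly.

```python
import numpy as np

D = np.array([[10, 20, 10], [2, 5, 2], [8, 17, 7],
              [9, 20, 10], [12, 22, 11]], dtype=float)
U, s, Vt = np.linalg.svd(D, full_matrices=False)

print(s.round(1))                          # roughly [48.6, 1.5, 1.2]: the first value dominates

# Rank-1 reconstruction from the first singular value and vectors only
D_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.round(D_rank1, 1))                # already very close to D
```
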
Rows in D = weighted sums of basis vectors
The 1st row of D is [10 20 10].
Since D = U Σ V^t, we have D[0,:] = U[0,:] Σ V^t = [24.5, 0.2, -0.22] V^t, where
V^t =
  0.41  0.82  0.40
  0.73 -0.56  0.41
  0.55  0.12 -0.82
So D[0,:] = 24.5 v1 + 0.2 v2 - 0.22 v3, where v1, v2, v3 are the rows of V^t, i.e., our basis vectors.
Thus, [24.5, 0.2, -0.22] are the weights that characterize row 1 of D.
In general, the ith row of U Σ is the set of weights for the ith row of D.

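A quick sketch (NumPy assumed, not part of the slides) reproduces these weights numerically:

```python
import numpy as np

D = np.array([[10, 20, 10], [2, 5, 2], [8, 17, 7],
              [9, 20, 10], [12, 22, 11]], dtype=float)
U, s, Vt = np.linalg.svd(D, full_matrices=False)

weights = U @ np.diag(s)      # rows of U Sigma: one weight vector per row of D
print(weights[0].round(2))    # roughly [24.5, 0.2, -0.22] (signs can flip between SVD implementations)
print(weights[0] @ Vt)        # recovers the first row [10, 20, 10]
```
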
Summary of SVD Representation
D = U Σ V^t
- data matrix D: rows = data vectors
- U Σ matrix: rows = weights for the rows of D
- V^t matrix: rows = our basis functions

How do we compute U, Σ, and V?
SVD decomposition is a standard eigenvector/eigenvalue problem:
- the eigenvectors of D^t D are the columns of V (the rows of V^t)
- the eigenvectors of D D^t are the columns of U
- the diagonal elements of Σ are the square roots of the eigenvalues of D^t D
So finding U, Σ, and V is equivalent to finding the eigenvectors of D^t D. Solving eigenvalue problems amounts to solving a set of linear equations; the time complexity is O(m n^2 + n^3).

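The following sketch (NumPy assumed; purely illustrative, not how production SVD routines work internally) checks this relationship between the eigen-decomposition of D^t D and the singular values:

```python
import numpy as np

D = np.array([[10, 20, 10], [2, 5, 2], [8, 17, 7],
              [9, 20, 10], [12, 22, 11]], dtype=float)

# Eigen-decomposition of the symmetric n x n matrix D^t D (eigh returns ascending eigenvalues)
eigvals, eigvecs = np.linalg.eigh(D.T @ D)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder to match SVD's descending order

_, s, _ = np.linalg.svd(D)
print(np.allclose(np.sqrt(eigvals), s))              # True: singular values = sqrt(eigenvalues of D^t D)
```
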
Matrix Approximation with SVD
D ≈ U Σ V^t, where D is m x n, U is m x k, Σ is k x k, and V^t is k x n:
- the columns of V are the first k eigenvectors of D^t D
- Σ is diagonal with the square roots of the k largest eigenvalues (the k largest singular values)
- the rows of U Σ are the coefficients in the reduced-dimension V-space
This gives the best rank-k approximation to the matrix D in a least-squares sense (this is also known as principal components analysis).

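A minimal sketch of this truncation (NumPy assumed; the helper name is ours, and k = 1 is chosen only for illustration):

```python
import numpy as np

def rank_k_approximation(D, k):
    """Best rank-k approximation of D in the least-squares sense."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

D = np.array([[10, 20, 10], [2, 5, 2], [8, 17, 7],
              [9, 20, 10], [12, 22, 11]], dtype=float)
print(rank_k_approximation(D, k=1).round(1))   # close to D, since the first singular value dominates
```
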
Collaborative Filtering & Matrix Factorization
The $1 Million Question: a ratings matrix with roughly 480,000 users and 17,700 movies, with only a small fraction of the entries filled in.
[Figure: sparse user-by-movie ratings matrix with a few known ratings on a 1-5 scale scattered among mostly empty cells]

User-Based Collaborative Filtering
[Table: ratings (1-5 scale) by Alice and Users 1-7 on Items 1-6, with many entries missing; Alice's rating for one item is unknown and must be predicted. The last column gives each user's correlation with Alice: User 2 = 0.33, User 3 = 0.90, User 4 = 0.19, User 6 = 0.65. User 3, with the highest correlation, is the best match.]
The prediction is made using k-nearest neighbors with k = 1, i.e., from the best-matching user.

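Since the cell-by-cell layout of the table does not survive the slide export, here is a minimal user-based CF sketch with hypothetical rating vectors (Python/NumPy assumed; the numbers below are illustrative, not the values from the slide):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation over the items that both users have rated (NaN = missing)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    a, b = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

nan = np.nan
alice = np.array([5, 2, 3, 3, nan, 4])                 # hypothetical ratings; item 5 is unknown
neighbors = {"User A": np.array([4, 2, 3, 2, 1, 4]),   # hypothetical neighbor profiles
             "User B": np.array([5, 3, 1, 3, 2, 4])}

# k-NN with k = 1: pick the most correlated neighbor and use their rating of the unknown item
best = max(neighbors, key=lambda u: pearson(alice, neighbors[u]))
print(best, "predicts", neighbors[best][4])
```
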
Item-Based Collaborative Filtering
[Table: the same ratings by Alice and Users 1-7 on Items 1-6, now without the correlation column. The bottom row gives the similarity of each other item to the item whose rating is being predicted for Alice: 0.76, 0.79, 0.60, 0.71, 0.75; the item with similarity 0.79 is the best match used for the prediction.]
Item-item similarities are usually computed using the cosine similarity measure.

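A minimal cosine-similarity sketch (NumPy assumed; the two item rating columns below are hypothetical, not taken from the table):

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two item rating vectors (one entry per user)."""
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

item_a = np.array([5, 2, 4, 3, 3, 5, 5])   # hypothetical ratings of item A by seven users
item_b = np.array([4, 1, 4, 2, 2, 5, 4])   # hypothetical ratings of item B by the same users
print(round(cosine_similarity(item_a, item_b), 2))
```
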
Matrix Factorization of Ratings Data
Based on the idea of latent factor analysis:
- identify latent (unobserved) factors that "explain" the observations in the data
- in this case, the observations are user ratings of movies
- the factors may represent combinations of features or characteristics of movies and users that result in the ratings
The m x n ratings matrix R (m users, n movies) is approximated by the product of two low-rank factor matrices with f latent dimensions: R ≈ P x Q^T, where P is m x f (user factors) and Q is n x f (movie factors), so that each rating is approximated as r_ui ≈ p_u q_i^T.

Matrix Factorization
Q_k^T (item factors, one column per item):
  Dim1  -0.44  -0.57   0.06   0.38   0.57
  Dim2   0.58  -0.66   0.26   0.18  -0.36
P_k (user factors):
         Dim1   Dim2
  Alice  0.47  -0.30
  Bob   -0.44   0.23
  Mary   0.70  -0.06
  Sue    0.31   0.93
Prediction: r_ui ≈ p_u q_i^T, the dot product of the user's and the item's factor vectors.
Note: the factorization can also be obtained via Singular Value Decomposition (SVD), R = U Σ V^t, keeping only the top k singular values.

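As a sketch of how a prediction is read off these factors (NumPy assumed; the choice of the fifth item as the target is arbitrary, and in practice the resulting score would still be mapped back to the rating scale, e.g., by adding back rating averages):

```python
import numpy as np

# Item factors Q_k^T (2 latent dimensions x 5 items) and user factors P_k (4 users x 2 dimensions)
Qt = np.array([[-0.44, -0.57, 0.06, 0.38,  0.57],
               [ 0.58, -0.66, 0.26, 0.18, -0.36]])
P  = np.array([[ 0.47, -0.30],    # Alice
               [-0.44,  0.23],    # Bob
               [ 0.70, -0.06],    # Mary
               [ 0.31,  0.93]])   # Sue

R_hat = P @ Qt                 # predicted scores for every (user, item) pair
alice_item5 = P[0] @ Qt[:, 4]  # p_Alice . q_item5
print(round(alice_item5, 3))
```
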
Lower Dimensional Feature Space
[Figure: Alice, Bob, Mary, and Sue plotted as points in the two-dimensional latent feature space (Dim1 vs. Dim2)]

Learning the Factor Matrices
We need to learn the user and item feature vectors from the training data.
Approach: minimize the error on the known ratings. Typically, regularization terms and user and item bias parameters are added.
This is done via stochastic gradient descent (SGD) or other optimization approaches; a sketch follows below.

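A minimal SGD factorization sketch (Python/NumPy assumed; hyperparameters are illustrative, and the bias terms mentioned above are omitted for brevity):

```python
import numpy as np

def factorize(ratings, n_users, n_items, f=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Learn user/item factors by SGD on the regularized squared error of known ratings.

    ratings: list of (user_index, item_index, rating) triples for the known entries.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, f))    # user factors
    Q = 0.1 * rng.standard_normal((n_items, f))    # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            pu = P[u].copy()
            err = r - pu @ Q[i]                    # error on one known rating
            P[u] += lr * (err * Q[i] - reg * pu)   # gradient step with L2 regularization
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Tiny hypothetical example: 3 users, 3 items, six known ratings
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 2)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(float(P[0] @ Q[2]), 2))   # predicted rating of user 0 for item 2
```
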