1/11 Tea Talk: Weighted Low Rank Approximations
Ben Marlin
Machine Learning Group, Department of Computer Science, University of Toronto
April 30, 2003
2/11 Paper Details:
Authors: Nathan Srebro, Tommi Jaakkola (MIT)
Title: Weighted Low Rank Approximations
URL:
Submitted: ICML 2003
3/11 Motivation:
Missing Data: Weighted LRA naturally handles data matrices with missing elements by using a 0/1 weight matrix.
Noisy Data: Weighted LRA naturally handles data matrices with a separate noise variance estimate σ_ij for each element of the matrix by setting W_ij = 1/σ_ij.
4/11 The Problem:
Given an n×m data matrix D and an n×m weight matrix W, construct a rank-K approximation X = UV' to D that minimizes the error in the weighted Frobenius norm:

E_WF(X) = Σ_ij W_ij (D_ij − X_ij)²

[Figure: D ≈ X = UV', with D, W, X n×m, U n×K, V m×K]
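The objective above is easy to state concretely; here is a minimal NumPy sketch (not from the slides) of the weighted Frobenius error, with a 0/1 weight matrix marking one entry as missing:

```python
import numpy as np

def weighted_frobenius_error(D, W, X):
    """Weighted squared error E_WF = sum_ij W_ij * (D_ij - X_ij)^2."""
    return float(np.sum(W * (D - X) ** 2))

D = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([[1.0, 0.0], [1.0, 1.0]])  # 0/1 weights: D[0,1] is "missing"
X = np.zeros_like(D)

# Only the observed entries contribute: 1 + 9 + 16 = 26
print(weighted_frobenius_error(D, W, X))  # -> 26.0
```

With W set to all ones this reduces to the ordinary squared Frobenius distance minimized by the standard SVD.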
5/11 Relationship to standard SVD:
- Critical points of E_WF can be local minima that are not global minima.
- wSVD does not admit a solution based on the eigenvectors of the data matrix D.
- Adding the requirement that U and V be orthogonal results in a weighted low rank approximation analogous to the SVD.
6/11 Optimization Approach:
Main Idea: For a given V, the optimal U*_V can be calculated analytically, as can the gradient of the projected objective function E*_WF(V) = E_WF(U*_V, V). Thus, perform gradient descent on E*_WF(V).

Row by row, the optimal U is a weighted least-squares solution:

U*_i = D_i d(W_i) V (V' d(W_i) V)^{-1}

where d(W_i) is the m×m matrix with the i-th row of W along the diagonal and D_i is the i-th row of D.
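The analytic solve for U given V is just a weighted least-squares problem per row; a NumPy sketch (my own illustration, not the paper's code) of that step:

```python
import numpy as np

def optimal_U(D, W, V):
    """For fixed V, solve min_U sum_ij W_ij (D_ij - (U V')_ij)^2 row by row.

    Each row is U_i = D_i d(W_i) V (V' d(W_i) V)^{-1}, where d(W_i)
    puts the i-th row of W on the diagonal.
    """
    n, m = D.shape
    K = V.shape[1]
    U = np.zeros((n, K))
    for i in range(n):
        dW = np.diag(W[i])              # d(W_i): m x m diagonal weight matrix
        A = V.T @ dW @ V                # K x K normal-equations matrix
        b = V.T @ dW @ D[i]             # right-hand side
        U[i] = np.linalg.solve(A, b)    # i-th row of the optimal U
    return U

# Sanity check: with all weights 1 this is ordinary least squares.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 4))
V = rng.standard_normal((4, 2))
U = optimal_U(D, np.ones_like(D), V)
```

Because the objective is quadratic in U for fixed V, this inner solve is exact, and only V needs iterative optimization.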
7/11 Missing Value Approach:
Main Idea: Consider a model of the data matrix given by D = X + Z, where Z is white Gaussian noise. The weighted cost of X is then equivalent to the log-likelihood of the observed variables. This suggests an EM approach: in the E-step the missing values in D are filled in according to the values in X, creating a matrix F; in the M-step X is re-estimated as the rank-K SVD of F.
8/11 Missing Value Approach: Extension to General Weights:
Consider a system with several data matrices D_n = X + Z_n, where the Z_n are independent Gaussian white noise. The maximum likelihood X in this case is found by taking the rank-K SVD of the mean of the F_n's. Now consider a weighted rank-K approximation problem with W_ij = w_ij/N and w_ij ∈ {0, 1, …, N}. Such a problem can be converted to the type described above by observing D_ij in w_ij of the N matrices D_n. For any N, the mean of the N filled matrices F_n is given by:

F = W.*D + (1−W).*X   (elementwise products)
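A quick numerical check of this reduction (a NumPy sketch of my own, not from the slides): build the N copies explicitly, observing D_ij in w_ij of them and filling the rest with X_ij, and compare the average to the closed form W∘D + (1 − W)∘X with W = w/N:

```python
import numpy as np

N = 4
w = np.array([[4, 2], [0, 3]])          # integer observation counts in {0,...,N}
D = np.array([[1.0, 2.0], [3.0, 4.0]])
X = np.array([[0.5, 0.5], [0.5, 0.5]])  # current model values

# Build the N filled matrices explicitly and average them.
Fs = []
used = np.zeros_like(w)
for n in range(N):
    F_n = np.where(used < w, D, X)      # use an observation of D while any remain
    used += (used < w).astype(int)
    Fs.append(F_n)
F_mean = np.mean(Fs, axis=0)

W = w / N
assert np.allclose(F_mean, W * D + (1 - W) * X)
```

So the E-step for general (rational) weights never needs the N copies; the single elementwise blend suffices.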
9/11 Missing Value Approach: EM Algorithm:
This approach yields an extremely simple EM algorithm:
E-Step: F = W.*D + (1−W).*X
M-Step: Obtain U, S, V from the SVD of F, zero all but the top K singular values, and set X = USV'

function X=wsvd(D,W,K)
% EM for weighted low rank approximation
X=zeros(size(D));
Xold=inf*ones(size(D));
while(sum(sum((X-Xold).^2))>eps)
    Xold=X;
    % E-step: blend observed data with the current model
    [U,S,V]=svd(W.*D+(1-W).*X);
    % M-step: keep only the top K singular values
    S(K+1:end,K+1:end)=0;
    X=U*S*V';
end
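For reference, a NumPy port of the MATLAB wsvd above (a sketch under the same 0/1-weight setup; tolerance and iteration cap are my own choices), tried on a synthetic rank-2 matrix with some entries held out:

```python
import numpy as np

def wsvd(D, W, K, tol=1e-12, max_iter=2000):
    """EM for weighted low rank approximation: alternate fill-in and rank-K SVD."""
    X = np.zeros_like(D)
    for _ in range(max_iter):
        F = W * D + (1 - W) * X                 # E-step: fill unobserved entries
        U, s, Vt = np.linalg.svd(F, full_matrices=False)
        s[K:] = 0                               # M-step: truncate to rank K
        X_new = (U * s) @ Vt
        if np.sum((X_new - X) ** 2) < tol:
            return X_new
        X = X_new
    return X

# Synthetic rank-2 data with ~20% of entries hidden by a 0/1 weight matrix.
rng = np.random.default_rng(1)
D = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
W = (rng.random(D.shape) > 0.2).astype(float)
X = wsvd(D, W, 2)
```

Each iteration is one SVD of a dense matrix, and the weighted error is non-increasing, as EM guarantees.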
10/11 Example:
Synthetic rank-2 matrix.
[Figure: Data, Weights, and the rank-K wSVD reconstruction]
11/11 The End