Recitation: SVD and dimensionality reduction Zhenzhen Kou Thursday, April 21, 2005
SVD Intuition: find the axis that shows the greatest variation, and project all points onto this axis. [Figure: scatter of points with two pairs of axes, labeled f1, f2 and e1, e2]
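SVD: Mathematical Background
The reconstructed matrix $X_k = U_k S_k V_k^T$ is the closest rank-k matrix to the original matrix X (closest in the least-squares, i.e. Frobenius-norm, sense).
Full decomposition: X (m × n) = U (m × r) · S (r × r) · $V^T$ (r × n)
Rank-k reconstruction: $X_k$ (m × n) = $U_k$ (m × k) · $S_k$ (k × k) · $V_k^T$ (k × n)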
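A minimal sketch of the rank-k reconstruction, assuming NumPy and random illustrative data (the helper name `rank_k_approximation` is hypothetical, not from the slides). It builds $X_k$ from the truncated factors and checks that the Frobenius error equals the square root of the sum of the squared singular values that were dropped.

```python
import numpy as np

def rank_k_approximation(X, k):
    """Return U_k S_k V_k^T, the truncated-SVD reconstruction of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)      # illustrative data, not from the slides
X = rng.normal(size=(6, 4))
k = 2
Xk = rank_k_approximation(X, k)

s = np.linalg.svd(X, compute_uv=False)     # singular values, descending
frob_error = np.linalg.norm(X - Xk)        # Frobenius norm for 2-D arrays
expected_error = np.sqrt(np.sum(s[k:] ** 2))

print("rank of Xk:", np.linalg.matrix_rank(Xk))
print("error matches dropped singular values:",
      np.isclose(frob_error, expected_error))
```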
SVD: The mathematical formulation
Let X be the m × n matrix of m n-dimensional points. The SVD is $X = U S V^T$, where:
U (m × m) is orthogonal: $U^T U = I$. The columns of U are the orthonormal eigenvectors of $X X^T$, called the left singular vectors of X.
V (n × n) is orthogonal: $V^T V = I$. The columns of V are the orthonormal eigenvectors of $X^T X$, called the right singular vectors of X.
S (m × n) is a diagonal matrix with r non-zero values in descending order. These values are the square roots of the non-zero eigenvalues of $X X^T$ (equivalently, of $X^T X$) and are called the singular values of X. r is the rank of X, which is also the rank of the symmetric matrices $X X^T$ and $X^T X$.
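The properties above can be checked numerically. This is a sketch assuming NumPy and a small random matrix; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))          # 5 points in 3 dimensions (illustrative)

# Full SVD: U is 5x5, Vt is 3x3, s holds the singular values (descending)
U, s, Vt = np.linalg.svd(X, full_matrices=True)
S = np.zeros((5, 3))
S[:3, :3] = np.diag(s)               # embed singular values in a 5x3 matrix

orthogonal_U = np.allclose(U.T @ U, np.eye(5))
orthogonal_V = np.allclose(Vt @ Vt.T, np.eye(3))
reconstructs = np.allclose(U @ S @ Vt, X)

# eigvalsh returns eigenvalues in ascending order, so reverse them;
# clip guards against tiny negative values from round-off
eigvals = np.clip(np.linalg.eigvalsh(X.T @ X)[::-1], 0.0, None)
sqrt_eig_match = np.allclose(s, np.sqrt(eigvals))

print(orthogonal_U, orthogonal_V, reconstructs, sqrt_eig_match)
```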
SVD - Interpretation
$X = U S V^T$, example [matrix figure omitted]:
v1, the first right singular vector, gives the direction of the first projection axis.
The first singular value reflects the variance ('spread') along the v1 axis (for centered data, the variance is proportional to its square).
U S gives the coordinates of the points on the projection axes.
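To make the interpretation concrete, here is a sketch assuming NumPy and synthetic 2-D points stretched along one direction (the data are made up for illustration). It checks that the sum of squared coordinates along v1 equals the squared first singular value, that these coordinates are the first column of U S, and that a random unit direction does not capture more spread.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic points, stretched mostly along one direction (illustrative)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                              # first right singular vector

coords_v1 = X @ v1                      # coordinate of each point on the v1 axis
spread_v1 = np.sum(coords_v1 ** 2)      # total squared spread along v1
coords_from_US = (U * s)[:, 0]          # first column of U S

# Compare against an arbitrary unit direction
d = rng.normal(size=2)
d /= np.linalg.norm(d)
spread_d = np.sum((X @ d) ** 2)

print(np.isclose(spread_v1, s[0] ** 2),
      np.allclose(coords_v1, coords_from_US),
      spread_d <= spread_v1 + 1e-9)
```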
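Dimensionality reduction: set the smallest singular values to zero. The product then only approximates X, and the columns of U and rows of $V^T$ that multiply the zeroed values can be dropped, leaving $X \approx U_k S_k V_k^T$. [Figure: sequence of matrix diagrams showing the factors shrinking as singular values are zeroed]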
Dimensionality reduction: equivalent 'spectral decomposition' of the matrix. Write $u_i$ and $v_i$ for the i-th columns of U and V, and $\lambda_i$ for the i-th singular value (the i-th diagonal entry of S). Then X is a sum of r rank-one terms:
$$X = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots + u_r \lambda_r v_r^T$$
Here X is m × n, each $u_i$ is m × 1, and each $v_i^T$ is 1 × n.
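The sum of rank-one terms can be written out directly. This sketch assumes NumPy and random illustrative data; `lam` stands for the singular values that the slide writes as l1, l2, and so on.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 4))             # illustrative data
U, lam, Vt = np.linalg.svd(X, full_matrices=False)

# One rank-one term u_i * lam_i * v_i^T per singular value
terms = [lam[i] * np.outer(U[:, i], Vt[i]) for i in range(len(lam))]
full_sum = sum(terms)

term_ranks = [int(np.linalg.matrix_rank(t)) for t in terms]

print("sum of terms reproduces X:", np.allclose(full_sum, X))
print("rank of each term:", term_ranks)
```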
Dimensionality reduction: approximation / dim. reduction by keeping the first few terms (Q: how many?) of
$$X = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots$$
assuming $\lambda_1 \ge \lambda_2 \ge \dots$
A heuristic: keep 80-90% of the 'energy' (= sum of squares of the $\lambda_i$'s).
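One way to apply the energy heuristic is sketched below, assuming NumPy. The function name `choose_k` and the default threshold of 0.9 are illustrative choices, not part of the slides.

```python
import numpy as np

def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose first k singular values hold the requested energy.

    Energy is the sum of squared singular values; the input is assumed
    to be sorted in descending order.
    """
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / np.sum(energy)
    # First index where the cumulative share reaches the threshold
    k = int(np.searchsorted(cumulative, energy_fraction)) + 1
    return min(k, len(energy))

# Energies are 100, 25, 4, 1 (total 130): one term holds about 77%,
# two terms hold about 96%, so a 90% threshold keeps 2 terms.
k_90 = choose_k([10.0, 5.0, 2.0, 1.0], energy_fraction=0.9)
k_75 = choose_k([10.0, 5.0, 2.0, 1.0], energy_fraction=0.75)
print(k_90, k_75)
```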
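Another example: Eigenface. This is the PCA problem in HW5. For face data X, the eigenvectors associated with the first few largest eigenvalues of $X X^T$ look like faces when displayed as images (for the eigenvectors of $X X^T$ to be image-shaped, each face must be stored as a column of X).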
Dimensionality reduction: the matrix V in the SVD ($X = U S V^T$) is used to transform the data. $XV$ (= $US$) defines the transformed dataset. For a new data element x (a row vector), $xV$ defines the transformed data. Keeping the first k (k < n) dimensions amounts to keeping only the first k columns of V.
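The transformation can be sketched as follows, assuming NumPy and random illustrative data; `V_k` and `x_new` are hypothetical names for the truncated basis and a new data element.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))            # 50 points in n = 5 dimensions
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

k = 2
V_k = V[:, :k]                          # keep only the first k columns of V

X_reduced = X @ V_k                     # transformed dataset, 50 x k
matches_US = np.allclose(X_reduced, (U * s)[:, :k])   # same as first k columns of U S

x_new = rng.normal(size=5)              # a new data element (row vector)
x_new_reduced = x_new @ V_k             # its k-dimensional representation

print(X_reduced.shape, x_new_reduced.shape, matches_US)
```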
Principal Components Analysis (PCA)
Center the dataset by subtracting the mean of each dimension; let the matrix X be the result.
Compute the matrix $X^T X$, which is the covariance matrix up to a constant factor (1/(m-1) for the sample covariance of m points).
Project the dataset along a subset of the eigenvectors of $X^T X$.
The matrix V in the SVD ($X = U S V^T$) contains the eigenvectors of $X^T X$.
PCA is also known as the Karhunen-Loève (K-L) transform.
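The PCA steps above can be sketched as follows, assuming NumPy and synthetic data with well-separated variances (the function name `pca` and the data are illustrative). The sketch also cross-checks that the eigenvectors of $X^T X$ agree, up to sign, with the right singular vectors from the SVD.

```python
import numpy as np

def pca(data, k):
    """Project data onto the top-k eigenvectors of X^T X (X = centered data)."""
    X = data - data.mean(axis=0)                  # step 1: center
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)    # step 2: symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]             # largest eigenvalues first
    components = eigvecs[:, order[:k]]
    return X @ components, components             # step 3: project

rng = np.random.default_rng(5)
# Synthetic data: three dimensions with clearly different spreads, shifted by 10
data = rng.normal(size=(100, 3)) * np.array([5.0, 2.0, 0.5]) + 10.0
projected, components = pca(data, k=2)

X = data - data.mean(axis=0)
# X^T X equals the sample covariance up to the factor 1/(m-1)
cov_match = np.allclose(X.T @ X / (len(X) - 1), np.cov(data, rowvar=False))

# Eigenvectors agree with the rows of V^T up to sign
_, _, Vt = np.linalg.svd(X, full_matrices=False)
same_axes = np.allclose(np.abs(components.T @ Vt[:2].T), np.eye(2), atol=1e-6)

print(projected.shape, cov_match, same_axes)
```

The sign of each eigenvector is arbitrary, which is why the comparison uses absolute values of the dot products.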