Dimensionality Reduction
High-dimensional == many features
Find concepts/topics/genres:
– Documents: the features are thousands of words, millions of word pairs
– Surveys
– Netflix: 480k users x 17.7k movies
Slides by Jure Leskovec
Dimensionality Reduction
Compress / reduce dimensionality:
– $10^6$ rows; $10^3$ columns; no updates
– Random access to any cell(s); small error: OK
Dimensionality Reduction
Assumption: the data lies on or near a low d-dimensional subspace
The axes of this subspace are an effective representation of the data
Why Reduce Dimensions?
Discover hidden correlations/topics
– Words that commonly occur together
Remove redundant and noisy features
– Not all words are useful
Interpretation and visualization
Easier storage and processing of the data
SVD - Definition
$A_{[m \times n]} = U_{[m \times r]} \, \Sigma_{[r \times r]} \, (V_{[n \times r]})^T$
A: Input data matrix
– m x n matrix (e.g., m documents, n terms)
U: Left singular vectors
– m x r matrix (m documents, r concepts)
$\Sigma$: Singular values
– r x r diagonal matrix (strength of each ‘concept’); r is the rank of the matrix A
V: Right singular vectors
– n x r matrix (n terms, r concepts)
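As a rough illustration of the definition, the sketch below factors a small matrix with NumPy and checks the shapes of the three factors. The matrix values are invented for the example, and the rank cutoff used to pick r is an assumption of this sketch, not something the slides prescribe.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (m = 4 documents, n = 3 terms); values are illustrative only.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [2.0, 0.0, 4.0],
              [0.0, 1.0, 0.0]])

# full_matrices=False returns the "thin" factors: U is m x min(m, n), Vt is min(m, n) x n.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r singular values that are numerically nonzero (r = rank of A).
# The relative tolerance 1e-10 is an arbitrary choice for this sketch.
r = int(np.sum(s > 1e-10 * s[0]))
U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]

Sigma = np.diag(s_r)            # r x r diagonal matrix of singular values
A_rebuilt = U_r @ Sigma @ Vt_r  # (m x r)(r x r)(r x n) = m x n

print("rank r =", r)
print("shapes:", U_r.shape, Sigma.shape, Vt_r.shape)
```

Rows 3 and 4 of this matrix are multiples of rows 1 and 2, so only two ‘concepts’ are needed to rebuild it.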
SVD
[Figure: the m x n matrix A drawn as the product of U, $\Sigma$ and $V^T$]
SVD
$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$
$\sigma_i$: scalar; $u_i$: vector; $v_i$: vector
SVD - Properties
It is always possible to decompose a real matrix A into $A = U \Sigma V^T$, where
– $\Sigma$ is unique; U and V are unique up to sign flips of matching columns when the singular values are distinct
– U, V: column orthonormal: $U^T U = I$; $V^T V = I$ (I: identity matrix; the columns are orthogonal unit vectors)
– $\Sigma$: diagonal; its entries (the singular values) are positive and sorted in decreasing order ($\sigma_1 \ge \sigma_2 \ge \dots \ge 0$)
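These properties can be checked numerically. The sketch below is a minimal check on a seeded random matrix (the matrix and its size are assumptions made for the example); it also shows that negating a matching pair of singular vectors leaves the product unchanged, so the vectors themselves are only determined up to sign.

```python
import numpy as np

# Hypothetical 6 x 4 matrix from a seeded generator, so runs are repeatable.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Column orthonormality: U^T U = I and V^T V = I.
cols_orthonormal = bool(np.allclose(U.T @ U, np.eye(4)) and np.allclose(V.T @ V, np.eye(4)))

# Singular values are non-negative and sorted in decreasing order.
sorted_desc = bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0))

# Negating u_1 and v_1 together gives a different but equally valid factorization.
U_flipped, V_flipped = U.copy(), V.copy()
U_flipped[:, 0] *= -1
V_flipped[:, 0] *= -1
same_product = bool(np.allclose(U_flipped @ np.diag(s) @ V_flipped.T, A))

print(cols_orthonormal, sorted_desc, same_product)
```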
SVD – Example: Users-to-Movies
$A = U \Sigma V^T$, example:
[Figure: users-by-movies rating matrix A factored as U x $\Sigma$ x $V^T$. Columns: Matrix, Alien, Serenity, Casablanca, Amelie. Rows: SciFi users, then Romance users. The factors expose a SciFi-concept and a Romance-concept.]
– U is the “user-to-concept” similarity matrix
– The corresponding diagonal entry of $\Sigma$ is the ‘strength’ of the SciFi-concept
– V is the “movie-to-concept” similarity matrix
SVD - Interpretation #1
‘movies’, ‘users’ and ‘concepts’:
– U: user-to-concept similarity matrix
– V: movie-to-concept similarity matrix
– $\Sigma$: its diagonal elements give the ‘strength’ of each concept
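To make the three roles concrete, here is a small sketch on a made-up ratings matrix. The numbers are hypothetical (the original figure's values did not survive extraction); only the five movie titles come from the slides.

```python
import numpy as np

movies = ["Matrix", "Alien", "Serenity", "Casablanca", "Amelie"]

# Hypothetical ratings (rows = users, columns = movies); 0 means "not rated".
# The first four users rate only the sci-fi titles, the last three mostly the romance titles.
ratings = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

def top_movies(concept_row, count):
    """Names of the `count` movies with the largest absolute loading on one concept."""
    order = np.argsort(-np.abs(concept_row))[:count]
    return sorted(movies[i] for i in order)

# Rows of Vt are concepts: which movies does each of the two strongest concepts load on?
first_concept_movies = top_movies(Vt[0], 3)
second_concept_movies = top_movies(Vt[1], 2)

# Rows of U are users: which of the two strongest concepts does each user load on most?
user_concept = np.argmax(np.abs(U[:, :2]), axis=1).tolist()

print("concept strengths:", np.round(s[:3], 2))
print("concept 1 movies:", first_concept_movies)
print("concept 2 movies:", second_concept_movies)
print("strongest concept per user:", user_concept)
```

Absolute values are used because the sign of each singular vector is arbitrary; what matters is which movies and users load heavily on the same concept.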
SVD - Interpretation #2
SVD gives the best axis to project on:
– ‘best’ = minimum sum of squares of projection errors
– i.e., minimum reconstruction error
[Figure: scatter plot of users with axes Movie 1 rating and Movie 2 rating; the line $v_1$ is the first right singular vector]
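The ‘best axis’ claim can be tried out on a handful of points. In the sketch below the rating pairs are invented, and `projection_error` is a helper defined only for this example; it measures the sum of squared distances to a line through the origin, which is the quantity the first right singular vector minimizes.

```python
import numpy as np

# Hypothetical data: each row is one user's (Movie 1 rating, Movie 2 rating).
points = np.array([[1.0, 1.2],
                   [2.0, 1.9],
                   [3.0, 3.2],
                   [4.0, 3.8],
                   [5.0, 5.1]])

def projection_error(data, direction):
    """Sum of squared distances from each row to the line through the origin along `direction`."""
    w = direction / np.linalg.norm(direction)
    projected = np.outer(data @ w, w)  # each row replaced by its projection onto w
    return float(np.sum((data - projected) ** 2))

_, s, Vt = np.linalg.svd(points, full_matrices=False)
v1 = Vt[0]  # first right singular vector

err_v1 = projection_error(points, v1)
err_movie1_axis = projection_error(points, np.array([1.0, 0.0]))
err_movie2_axis = projection_error(points, np.array([0.0, 1.0]))

print(err_v1, err_movie1_axis, err_movie2_axis)
```

With two columns, the error left after projecting onto $v_1$ equals $\sigma_2^2$, the energy of the one discarded direction.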
SVD - Interpretation #2
$A = U \Sigma V^T$, example:
[Figure: the factored example next to the Movie 1 rating / Movie 2 rating plot; $v_1$ is the first right singular vector, and a callout marks the variance (‘spread’) on the $v_1$ axis]
SVD - Interpretation #2
More details
Q: How exactly is dimensionality reduction done?
A: Set the smallest singular values to zero
SVD - Interpretation #2
More details
Setting the smallest singular values to zero turns A into an approximation B, with $A \approx B$ in the following sense:
Frobenius norm: $\|M\|_F = \sqrt{\sum_{ij} M_{ij}^2}$
$\|A - B\|_F = \sqrt{\sum_{ij} (A_{ij} - B_{ij})^2}$ is “small”
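A minimal sketch of this truncation step, assuming a made-up ratings matrix and the hypothetical helper names `truncate_svd` and `frobenius_norm`: it zeroes all but the k largest singular values and measures how far the result B is from A.

```python
import numpy as np

def truncate_svd(A, k):
    """Rank-k approximation of A: keep the k largest singular values, drop the rest."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return B, s

def frobenius_norm(M):
    """Square root of the sum of squared entries of M."""
    return float(np.sqrt(np.sum(M ** 2)))

# Hypothetical users-by-movies ratings; values are illustrative only.
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

k = 2
B, s = truncate_svd(A, k)
error = frobenius_norm(A - B)

print("||A - B||_F =", round(error, 3), "out of ||A||_F =", round(frobenius_norm(A), 3))
```

The error matches the square root of the sum of the squared singular values that were dropped, which is why dropping the smallest ones costs the least.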
[Figure: $A = U \Sigma V^T$ and $B = U S V^T$, where S is $\Sigma$ with the smallest singular values set to zero; B is approximately A]
SVD – Best Low Rank Approx.
We apply:
– P column orthonormal
– R row orthonormal
– Q is diagonal
SVD – Best Low Rank Approx.
$U \Sigma V^T - U S V^T = U (\Sigma - S) V^T$
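One way to read this identity, assuming $B = U S V^T$ and that S is $\Sigma$ with every singular value after the k-th set to zero (an assumption; the extracted slide text does not define S): because U and V have orthonormal columns, multiplying by them does not change the Frobenius norm, so the approximation error depends only on the dropped singular values.

```latex
\begin{aligned}
\|A - B\|_F &= \|U \Sigma V^T - U S V^T\|_F \\
            &= \|U (\Sigma - S) V^T\|_F \\
            &= \|\Sigma - S\|_F \\
            &= \sqrt{\sum_{i > k} \sigma_i^2}
\end{aligned}
```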
SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix:
$A = [\,u_1 \; u_2 \; \cdots\,] \; \mathrm{diag}(\sigma_1, \sigma_2, \dots) \; [\,v_1 \; v_2 \; \cdots\,]^T$
SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix:
$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$ (k terms; A is m x n, each $u_i$ is m x 1, each $v_i^T$ is 1 x n)
Assume: $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots \ge 0$
Why is setting the small $\sigma_i$ to zero the right thing to do?
The vectors $u_i$ and $v_i$ are unit length, so $\sigma_i$ scales them. Zeroing the smallest $\sigma_i$ therefore introduces the least error.
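The sum-of-terms view can be sketched directly: build A back up one rank-one term at a time and watch the error shrink. The matrix below is hypothetical and chosen only to keep the example small.

```python
import numpy as np

# Hypothetical 4 x 3 matrix; values are illustrative only.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Add the terms sigma_i * u_i * v_i^T in order of decreasing sigma_i and
# record the Frobenius error of each partial sum.
partial = np.zeros_like(A)
errors = []
for sigma_i, u_i, v_i in zip(s, U.T, Vt):
    partial += sigma_i * np.outer(u_i, v_i)  # (m x 1)(1 x n) rank-one matrix
    errors.append(float(np.linalg.norm(A - partial)))  # default matrix norm is Frobenius

print("error after each term:", np.round(errors, 4))
```

Each term adds a matrix whose Frobenius norm is exactly $\sigma_i$, since $u_i$ and $v_i$ are unit vectors, so the later terms contribute the least.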
SVD - Interpretation #2
Q: How many $\sigma_i$ to keep?
A: Rule of thumb: keep 80-90% of the ‘energy’ ($= \sum_i \sigma_i^2$)
Assume: $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots$
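The rule of thumb can be turned into a few lines of code. This is a sketch only: `choose_k` is a hypothetical helper name, the singular values in the usage lines are invented, and the input is assumed to be sorted in decreasing order.

```python
def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose leading singular values retain `energy_fraction` of the total energy.

    Energy is the sum of squared singular values. The input is assumed to be
    sorted in decreasing order, as an SVD routine would return it.
    """
    energies = [s * s for s in singular_values]
    total = sum(energies)
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running >= energy_fraction * total:
            return k
    return len(energies)

# Hypothetical singular values for a matrix with two strong concepts and one weak one.
sigmas = [12.4, 9.5, 1.3]
k_90 = choose_k(sigmas, 0.9)
k_60 = choose_k(sigmas, 0.6)
print(k_90, k_60)
```

Here the first value alone carries a bit over 60% of the energy, and the first two together carry more than 99%, so the third concept can be dropped under either end of the 80-90% range.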
SVD - Complexity
To compute SVD:
– $O(nm^2)$ or $O(n^2 m)$ (whichever is less)
But:
– Less work, if we just want singular values
– or if we want the first k singular vectors
– or if the matrix is sparse
Implemented in linear algebra packages like
– LINPACK, Matlab, SPlus, Mathematica...
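As an illustration of why the first few singular vectors can be cheaper than a full decomposition, here is a sketch of power iteration for the leading triplet only. Power iteration is a technique substituted in for the example; it is not claimed to be what the packages listed above use. The function name, the iteration count, and the ratings matrix are all assumptions.

```python
import numpy as np

def top_singular_triplet(A, iterations=500):
    """Estimate (sigma_1, u_1, v_1) by power iteration on A^T A.

    Each step costs two matrix-vector products, and A^T A is never formed.
    Assumes A is nonzero and the start vector is not orthogonal to v_1.
    """
    n = A.shape[1]
    v = np.ones(n) / np.sqrt(n)   # fixed, deterministic start vector
    for _ in range(iterations):
        w = A.T @ (A @ v)
        v = w / np.linalg.norm(w)
    Av = A @ v
    sigma = float(np.linalg.norm(Av))
    return sigma, Av / sigma, v

# Hypothetical users-by-movies ratings; values are illustrative only.
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

sigma1, u1, v1 = top_singular_triplet(A)

# Compare against the full decomposition.
_, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)
print(sigma1, s_full[0])
```

Because only matrix-vector products are needed, the same loop would also work with a sparse matrix type, which is one reason sparsity reduces the cost.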
SVD - Conclusions so far
SVD: $A = U \Sigma V^T$: unique ($\Sigma$ always; U and V up to sign when the singular values are distinct)
– U: user-to-concept similarities
– V: movie-to-concept similarities
– $\Sigma$: strength of each concept
Dimensionality reduction:
– keep the few largest singular values (80-90% of ‘energy’)
– SVD: picks up linear correlations
Case study: How to query?
Q: Find users that like ‘Matrix’ and ‘Alien’
[Figure: the factored users-by-movies example; columns Matrix, Alien, Serenity, Casablanca, Amelie; rows SciFi and Romance users]
Case study: How to query?
Q: Find users that like ‘Matrix’
A: Map the query vector q into ‘concept space’ – how?
Project into concept space: take the inner product of q with each ‘concept’ vector $v_i$
[Figure: q is a row vector over Matrix, Alien, Serenity, Casablanca, Amelie; it is plotted on the Matrix and Alien axes together with $v_1$ and $v_2$, and the projection $q \cdot v_1$ is marked on $v_1$]
Case study: How to query?
Compactly, we have: $q_{concept} = q V$
[Figure: example in which the row vector q (over Matrix, Alien, Serenity, Casablanca, Amelie) is multiplied by the movie-to-concept similarities V, giving its SciFi-concept coordinate]
Case study: How to query?
How would the user d that rated (‘Alien’, ‘Serenity’) be handled?
$d_{concept} = d V$
[Figure: example in which the row vector d (over Matrix, Alien, Serenity, Casablanca, Amelie) is multiplied by the movie-to-concept similarities V, giving its SciFi-concept coordinate]
Case study: How to query?
Observation: User d that rated (‘Alien’, ‘Serenity’) will be similar to query “user” q that rated (‘Matrix’), although d did not rate ‘Matrix’!
– In the original movie space, q and d share no rated movie: similarity = 0
– In concept space, both load on the SciFi-concept: similarity ≠ 0
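The whole query case study fits in a short sketch. The ratings matrix and the rating values inside q and d are made up for illustration, cosine similarity is an assumed choice of similarity measure, and keeping two concepts is likewise an assumption.

```python
import numpy as np

# Columns: Matrix, Alien, Serenity, Casablanca, Amelie. Hypothetical ratings, 0 = not rated.
ratings = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

_, _, Vt = np.linalg.svd(ratings, full_matrices=False)
V = Vt[:2].T  # movie-to-concept matrix for the two strongest concepts (5 x 2)

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

q = np.array([5.0, 0.0, 0.0, 0.0, 0.0])  # query "user": rated only Matrix
d = np.array([0.0, 4.0, 5.0, 0.0, 0.0])  # user d: rated Alien and Serenity

q_concept = q @ V  # inner product of q with each concept vector
d_concept = d @ V

sim_original = cosine(q, d)
sim_concept = cosine(q_concept, d_concept)
print("movie space:", sim_original, "concept space:", round(sim_concept, 3))
```

The sign of each concept vector is arbitrary, but q and d are projected with the same V, so their similarity in concept space is unaffected.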
SVD: Drawbacks
+ Optimal low-rank approximation: in Frobenius norm
- Interpretability problem:
  – A singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
  – Singular vectors are dense!