1
Latent Semantic Indexing (mapping onto a smaller space of latent concepts)
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Reading 18
2
Speeding up cosine computation
What if we could take our vectors and "pack" them into fewer dimensions (say 50,000 → 100) while preserving distances?
Now: O(nm) to compute cos(d,q) for all d.
Then: O(km + kn), where k << n, m.
Two methods: "Latent semantic indexing" and random projection.
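As a rough illustration of the cost argument, here is a minimal numpy sketch (mine, not from the slides; the sizes and the projection matrix are purely illustrative, and smaller than the 50,000-term example) comparing scoring in the full space with scoring after a generic k-dimensional projection:

```python
import numpy as np

m, n, k = 5_000, 500, 100                 # terms, docs, reduced dimensions (illustrative)
A = np.random.rand(m, n)                  # term-doc matrix, one column per doc
q = np.random.rand(m)                     # query vector

# Full space: one pass over the whole matrix, O(n*m) per query
scores_full = (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))

# Reduced space: project docs once and the query in O(k*m), then score in O(k*n)
P_kt = np.random.rand(k, m)               # stands in for the k x m projection matrix
A_red, q_red = P_kt @ A, P_kt @ q
scores_red = (A_red.T @ q_red) / (np.linalg.norm(A_red, axis=0) * np.linalg.norm(q_red))
```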
3
Briefly
LSI is data-dependent: create a k-dim subspace by eliminating redundant axes, pulling together "related" axes (hopefully car and automobile).
Random projection is data-independent: choose a k-dim subspace that guarantees good stretching properties, with high probability, between any pair of points.
What about polysemy?
4
Notions from linear algebra
Matrix A, vector v
Matrix transpose (A^t)
Matrix product
Rank
Eigenvalue λ and eigenvector v: Av = λv
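A minimal numpy sketch (mine, not from the slides) checking the eigenvalue definition Av = λv on a small symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eigh(A)   # eigh handles symmetric matrices

lam = eigvals[0]                       # an eigenvalue
v = eigvecs[:, 0]                      # its eigenvector (a column of the result)
print(np.allclose(A @ v, lam * v))     # True: Av = lambda * v
```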
5
Overview of LSI
Pre-process docs using a technique from linear algebra called Singular Value Decomposition.
Create a new (smaller) vector space.
Queries are handled (faster) in this new space.
6
Singular-Value Decomposition
Recall the m × n matrix A of terms × docs. A has rank r ≤ m, n.
Define the term-term correlation matrix T = AA^t. T is a square, symmetric m × m matrix. Let P be the m × r matrix of eigenvectors of T.
Define the doc-doc correlation matrix D = A^t A. D is a square, symmetric n × n matrix. Let R be the n × r matrix of eigenvectors of D.
7
A's decomposition
Given P (for T, m × r) and R (for D, n × r), formed by orthonormal columns (unit dot-product), it turns out that A = P Σ R^t, where Σ is a diagonal matrix holding the singular values of A (the square roots of the eigenvalues of T = AA^t) in decreasing order.
Sizes: A (m × n) = P (m × r) · Σ (r × r) · R^t (r × n).
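A minimal numpy sketch (assumed usage, not part of the slides) computing the decomposition on a small random matrix; numpy's svd returns the three factors, here named P, Σ and R^t to match the slide:

```python
import numpy as np

m, n = 6, 4
A = np.random.rand(m, n)

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)  # P: m x r, sigma: r values, Rt: r x n
Sigma = np.diag(sigma)                                 # diagonal, values in decreasing order

print(np.allclose(A, P @ Sigma @ Rt))                  # True: A = P Sigma R^t
print(np.allclose(P.T @ P, np.eye(P.shape[1])))        # columns of P are orthonormal
```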
8
Dimensionality reduction
For some k << r, zero out all but the k biggest values in Σ [the choice of k is crucial]. Denote by Σ_k this new version of Σ, having rank k. Typically k is about 100, while r (A's rank) is > 10,000.
A_k (m × n) = P (m × r) · Σ_k (r × r) · R^t (r × n). The 0-columns/0-rows of Σ_k make the corresponding columns of P and rows of R^t useless, so the useful part shrinks to an (m × k) · (k × n) product.
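A minimal sketch of the truncation step (numpy, not from the slides), zeroing out all but the k biggest values of Σ:

```python
import numpy as np

A = np.random.rand(6, 4)
P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

k = 2
sigma_k = sigma.copy()
sigma_k[k:] = 0.0                        # keep only the k biggest values
A_k = P @ np.diag(sigma_k) @ Rt          # rank-k approximation A_k = P Sigma_k R^t

print(np.linalg.matrix_rank(A_k))        # k
```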
9
Guarantee
A_k is a pretty good approximation to A: relative distances are (approximately) preserved.
Of all m × n matrices of rank k, A_k is the best approximation to A w.r.t. the following measures:
min_{B, rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}
min_{B, rank(B)=k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + … + σ_r^2
where the Frobenius norm is ||A||_F^2 = σ_1^2 + σ_2^2 + … + σ_r^2.
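A small self-contained numpy check (mine, not from the slides) of the two optimality measures:

```python
import numpy as np

A = np.random.rand(6, 4)
P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = P[:, :k] @ np.diag(sigma[:k]) @ Rt[:k, :]              # rank-k truncation

print(np.isclose(np.linalg.norm(A - A_k, 2), sigma[k]))      # ||A - A_k||_2 = sigma_{k+1}
print(np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2,
                 np.sum(sigma[k:] ** 2)))                    # sum of the discarded sigma_i^2
```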
10
Reduction
X_k = Σ_k R^t is the doc-matrix, k × n, hence reduced to k dimensions.
Since we are interested in doc/query correlation, we consider D = A^t A = (P Σ R^t)^t (P Σ R^t) = (Σ R^t)^t (Σ R^t).
Approximating Σ with Σ_k, we get A^t A ≈ X_k^t X_k (both are n × n matrices).
We use X_k to define how to project A and q: in X_k = Σ_k R^t, substitute R^t = Σ^{-1} P^t A, so we get X_k = P_k^t A. In fact, Σ_k Σ^{-1} P^t = P_k^t, which is a k × m matrix.
This means that to reduce a doc/query vector it is enough to multiply it by P_k^t, thus paying O(km) per doc/query.
The cost of sim(q,d), for all d, is O(kn + km) instead of O(mn).
(R, P are formed by orthonormal eigenvectors of the matrices D, T.)
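A minimal sketch (numpy, not from the slides; the query q is purely illustrative) of the projection with P_k^t and of scoring in the reduced space:

```python
import numpy as np

m, n, k = 6, 4, 2
A = np.random.rand(m, n)                  # term-doc matrix
P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

P_kt = P[:, :k].T                         # P_k^t, a k x m matrix
X_k = P_kt @ A                            # reduced doc-matrix, k x n (O(k*m) per doc)

q = np.random.rand(m)                     # a query in the original m-dim term space
q_k = P_kt @ q                            # reduced query, O(k*m)

# cosine similarity of q against all docs, O(k*n) instead of O(m*n)
scores = (X_k.T @ q_k) / (np.linalg.norm(X_k, axis=0) * np.linalg.norm(q_k))
print(scores)
```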
11
Which are the concepts?
The c-th concept = the c-th row of P_k^t (which is k × m). Denote it by P_k^t[c], whose size is m = #terms.
P_k^t[c][i] = strength of association between the c-th concept and the i-th term.
Projected document: d'_j = P_k^t d_j, where d'_j[c] = strength of concept c in d_j.
Projected query: q' = P_k^t q, where q'[c] = strength of concept c in q.
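A minimal sketch (numpy; the term names are hypothetical placeholders, not from the slides) reading off the concept/term associations and the strength of a concept in a projected document:

```python
import numpy as np

m, n, k = 6, 4, 2
A = np.random.rand(m, n)
P, _, _ = np.linalg.svd(A, full_matrices=False)
P_kt = P[:, :k].T                               # P_k^t, k x m

terms = [f"term{i}" for i in range(m)]          # placeholder vocabulary of m terms
c = 0                                           # look at the first concept
concept = P_kt[c]                               # P_k^t[c]: one weight per term
top = np.argsort(-np.abs(concept))[:3]          # terms most associated with concept c
print([(terms[i], round(float(concept[i]), 3)) for i in top])

d_proj = P_kt @ A[:, 0]                         # projected first document d'_0
print(d_proj[c])                                # strength of concept c in that document
```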
12
Random Projections
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Slides only!
13
An interesting math result
Lemma (Johnson-Lindenstrauss, '82). Let P be a set of n distinct points in m dimensions. Given ε > 0, there exists a function f : P → R^k such that for every pair of points u, v in P it holds:
(1 − ε) ||u − v||^2 ≤ ||f(u) − f(v)||^2 ≤ (1 + ε) ||u − v||^2
where k = O(ε^{-2} log n).
f() is called a JL-embedding. Setting v = 0 we also get a bound on f(u)'s stretching!
14
What about the cosine-distance?
[Derivation on the slide: apply the stretching bounds to f(u) and f(v), substituting the formula above for ||u − v||^2.]
15
How to compute a JL-embedding?
Set R = (r_{i,j}) to be a random m × k matrix, whose components are independent random variables, each with E[r_{i,j}] = 0 and Var[r_{i,j}] = 1, drawn from one of the distributions listed on the slide.
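A minimal numpy sketch (mine, not from the slides). Two common choices satisfying the mean-0/variance-1 requirement are the standard Gaussian N(0,1) and ±1 with equal probability; the 1/√k scaling, which makes squared distances preserved in expectation, is also my assumption since the transcript does not show the original distributions or scaling:

```python
import numpy as np

m, k, n_pts = 10_000, 500, 100
rng = np.random.default_rng(0)

points = rng.random((n_pts, m))                 # n points in m dimensions
R = rng.standard_normal((m, k))                 # random m x k matrix, E[r_ij]=0, Var[r_ij]=1
# alternative with the same moments: R = rng.choice([-1.0, 1.0], size=(m, k))

f = points @ R / np.sqrt(k)                     # JL embedding: f(u) = (1/sqrt(k)) R^t u

u2 = np.linalg.norm(points[0] - points[1]) ** 2
f2 = np.linalg.norm(f[0] - f[1]) ** 2
print(f2 / u2)                                  # close to 1, i.e. within 1 ± epsilon
```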
16
Finally...
Random projections hide large constants: k ≈ (1/ε)^2 · log n (n = number of points), so it may be large. But they are simple and fast to compute.
LSI is intuitive and may scale to any k. It is optimal under various metrics, but costly to compute; good libraries now exist, though.