Packing to fewer dimensions
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Speeding up cosine computation
What if we could take our vectors and "pack" them into fewer dimensions (say from 50,000 down to 100) while preserving distances?
Now: O(nm) to compute cos(d,q) for all n docs.
Then: O(km + kn), where k << n, m.
Two methods:
- "Latent semantic indexing" (LSI)
- Random projection
Briefly
LSI is data-dependent:
- Create a k-dim subspace by eliminating redundant axes
- Pull together "related" axes – hopefully car and automobile
Random projection is data-independent:
- Choose a k-dim subspace that guarantees good stretching properties, with high probability, between any pair of points.
What about polysemy?
Latent Semantic Indexing
Sec. 18.4 – Courtesy of Susan Dumais
Notions from linear algebra
Matrix A, vector v
Matrix transpose (Aᵗ)
Matrix product
Rank
Eigenvalue λ and eigenvector v: Av = λv
Example:
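The slide's example figure is not recoverable here; in its place, a minimal numpy sketch (the matrix is chosen arbitrarily) that checks the defining relation Av = λv:

```python
import numpy as np

# A small symmetric matrix, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigen-decomposition: w holds the eigenvalues, the columns of V the eigenvectors.
w, V = np.linalg.eigh(A)

for lam, v in zip(w, V.T):
    assert np.allclose(A @ v, lam * v)      # check Av = lambda v
    print(f"lambda = {lam:.3f}, eigenvector = {v}")
```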
Overview of LSI
- Pre-process docs using a technique from linear algebra called Singular Value Decomposition (SVD)
- Create a new (smaller) vector space
- Queries are handled (faster) in this new space
Singular-Value Decomposition
Recall the m × n matrix of terms × docs, A. A has rank r ≤ m, n.
Define the term-term correlation matrix T = AAᵗ:
- T is a square, symmetric m × m matrix
- Let U be the m × r matrix of the r eigenvectors of T
Define the doc-doc correlation matrix D = AᵗA:
- D is a square, symmetric n × n matrix
- Let V be the n × r matrix of the r eigenvectors of D
A's decomposition
Given U (for T, m × r) and V (for D, n × r), both formed by orthonormal columns (unit dot-product), it turns out that
A = U Σ Vᵗ
where Σ is an r × r diagonal matrix whose entries, the singular values of A (i.e. the square roots of the eigenvalues of T = AAᵗ), appear in decreasing order.
[Figure: A (m × n) = U (m × r) · Σ (r × r) · Vᵗ (r × n)]
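A minimal numpy sketch of this decomposition (a random toy term-doc matrix with hypothetical sizes): it computes the SVD and checks both that A = U Σ Vᵗ and that the squared singular values are the eigenvalues of T = AAᵗ.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                        # hypothetical numbers of terms and docs
A = rng.random((m, n))             # toy term-doc matrix

# Thin SVD: U is m x r, S holds the r singular values (decreasing), Vt is r x n.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# A is recovered exactly from its decomposition.
assert np.allclose(A, U @ np.diag(S) @ Vt)

# The squared singular values coincide with the (nonzero) eigenvalues of T = A A^t,
# whose eigenvectors are the columns of U (and symmetrically for D = A^t A and V).
T = A @ A.T
eig_T = np.sort(np.linalg.eigvalsh(T))[::-1][:len(S)]
assert np.allclose(S**2, eig_T)
print("singular values:", S)
```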
Dimensionality reduction
Fix some k << r and zero out all but the k biggest singular values in Σ [the choice of k is crucial].
Denote by Σ_k this new version of Σ, having rank k.
Typically k is about 100, while r (A's rank) is > 10,000.
[Figure: A_k (m × n) = U (m × k, the remaining r − k columns being useless due to the 0-cols/0-rows of Σ_k) · Σ_k (k × k) · Vᵗ (k × n)]
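A sketch of the truncation step on the same toy matrix (k is an arbitrary illustrative choice): zeroing out all but the k biggest singular values is equivalent to keeping only the first k columns/rows of U, Σ and Vᵗ.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))                       # toy term-doc matrix
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                        # illustrative; in practice k is about 100

# Option 1: zero out all but the k biggest singular values.
S_k = np.diag(np.where(np.arange(len(S)) < k, S, 0.0))
A_k = U @ S_k @ Vt

# Option 2 (equivalent): drop the useless 0-cols/0-rows, keeping m x k, k x k, k x n blocks.
A_k2 = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

assert np.allclose(A_k, A_k2)
print("rank of A_k:", np.linalg.matrix_rank(A_k))   # == k
```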
A running example
[The worked example is shown only in the figures of the original slides.]
Guarantee
A_k is a pretty good approximation to A:
- Relative distances are (approximately) preserved
- Of all m × n matrices of rank k, A_k is the best approximation to A w.r.t. the following measures:
  min_{B : rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}
  min_{B : rank(B)=k} ||A − B||_F² = ||A − A_k||_F² = σ_{k+1}² + σ_{k+2}² + … + σ_r²
- Frobenius norm: ||A||_F² = σ_1² + σ_2² + … + σ_r²
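A numerical check of these guarantees on the toy matrix (the values themselves are meaningless; only the identities matter): the spectral-norm error equals σ_{k+1} and the squared Frobenius error equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# ||A - A_k||_2 = sigma_{k+1}   (S[k] is the (k+1)-th singular value, 0-indexed)
assert np.isclose(np.linalg.norm(A - A_k, 2), S[k])
# ||A - A_k||_F^2 = sigma_{k+1}^2 + ... + sigma_r^2
assert np.isclose(np.linalg.norm(A - A_k, 'fro')**2, np.sum(S[k:]**2))
# ||A||_F^2 = sigma_1^2 + ... + sigma_r^2
assert np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(S**2))
print("Eckart-Young checks passed")
```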
Reduction
U and V are formed by the orthonormal eigenvectors of the matrices T and D, respectively.
Since we are interested in doc/doc correlation, we consider:
D = AᵗA = (U Σ Vᵗ)ᵗ (U Σ Vᵗ) = (Σ Vᵗ)ᵗ (Σ Vᵗ)
Hence X = Σ Vᵗ, an r × n matrix, may play the role of A.
To reduce its size we set X_k = Σ_k Vᵗ, a k × n matrix, and thus get AᵗA ≈ X_kᵗ X_k (both are n × n matrices).
We use X_k to define how to project A:
- Since X_k = Σ_k V_kᵗ, we have X_k = U_kᵗ A (use the definition of the SVD of A: from A = U Σ Vᵗ we get Aᵗ = V Σ Uᵗ and Aᵗ U Σ⁻¹ = V).
- Since X_k may play the role of A, its columns are the projected docs.
- Similarly, a query q can be interpreted as a new column of A, so it is enough to multiply U_kᵗ by q to get the projected query, in O(km) time.
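A sketch of these identities on the toy matrix (the query q is a made-up term vector): X_k = Σ_k V_kᵗ coincides with U_kᵗ A, and a query is projected with the same U_kᵗ in O(km) time.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.random((m, n))                       # toy term-doc matrix
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(S[:k]), Vt[:k, :]

# X_k = Sigma_k V_k^t is the k x n matrix of projected documents...
X_k = S_k @ Vt_k
# ...and it equals U_k^t A: each document column is projected by U_k^t.
assert np.allclose(X_k, U_k.T @ A)

# A query is treated as one extra column of A: project it the same way, O(km) time.
q = rng.random(m)                            # made-up query term vector
q_proj = U_k.T @ q                           # the k-dimensional projected query
print("projected query:", q_proj)
```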
Which are the concepts?
- The c-th concept = the c-th column of U_k (which is m × k).
- U_k[i][c] = strength of association between the c-th concept and the i-th term.
- V_kᵗ[c][j] = strength of association between the c-th concept and the j-th document.
- Projected document: d'_j = U_kᵗ d_j, where d'_j[c] = strength of concept c in d_j.
- Projected query: q' = U_kᵗ q, where q'[c] = strength of concept c in q.
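A small usage sketch in the same toy setting (documents and query are made up): rank documents by the cosine of their concept vectors against the projected query, at O(kn) cost instead of O(mn).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 4, 2
A = rng.random((m, n))                       # toy term-doc matrix
U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

docs_proj = U_k.T @ A                        # k x n: column j is d'_j, the concept strengths of doc j
q = rng.random(m)                            # made-up query term vector
q_proj = U_k.T @ q                           # q'[c] = strength of concept c in q

# Cosine similarity in the k-dimensional concept space.
cos = (docs_proj.T @ q_proj) / (
    np.linalg.norm(docs_proj, axis=0) * np.linalg.norm(q_proj) + 1e-12)
print("docs ranked by cosine:", np.argsort(-cos))
```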
Random Projections
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa (slides only!)
An interesting math result
Lemma (Johnson-Lindenstrauss, '82). Let P be a set of n distinct points in m dimensions. Given ε > 0, there exists a function f : P → ℝᵏ such that for every pair of points u, v in P it holds:
(1 − ε) ||u − v||² ≤ ||f(u) − f(v)||² ≤ (1 + ε) ||u − v||²
where k = O(ε⁻² log n).
f() is called a JL-embedding.
Setting v = 0 we also get a bound on f(u)'s stretching!
What about the cosine-distance?
Bound f(u)'s and f(v)'s stretching, and substitute the formula above for ||u − v||²:
2 f(u)·f(v) = ||f(u)||² + ||f(v)||² − ||f(u) − f(v)||²
            ≤ (1 + ε) ||u||² + (1 + ε) ||v||² − (1 − ε) ||u − v||²
            = (1 + ε) (||u||² + ||v||²) − (1 − ε) (||u||² + ||v||² − 2 u·v)
            = 2ε (||u||² + ||v||²) + (1 − ε) (2 u·v)
How to compute a JL-embedding?
Take the projection matrix P = (p_{i,j}) to be a random m × k matrix whose components are independent random variables drawn from one of two distributions (typically a standard Gaussian N(0,1), or ±1 with equal probability), both satisfying:
E[p_{i,j}] = 0 and Var[p_{i,j}] = 1
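A minimal sketch of this construction (sizes, ε and the constant in k are all illustrative), using a Gaussian P and the usual 1/√k scaling so that squared distances are preserved in expectation; the final loop measures the distortion over all pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10_000, 50                 # original dimension and number of points (illustrative)
eps = 0.3
# k = O(eps^-2 log n); this constant follows the usual analysis and is deliberately generous.
k = int(8 / (eps**2 / 2 - eps**3 / 3) * np.log(n))

X = rng.random((n, m))            # n points in m dimensions

# Random m x k matrix with i.i.d. entries having E[p_ij] = 0 and Var[p_ij] = 1
# (a standard Gaussian here; +/-1 with equal probability works as well).
P = rng.standard_normal((m, k))

# JL-embedding f(u) = P^t u / sqrt(k); project all points at once.
Y = X @ P / np.sqrt(k)

# Empirical distortion over all pairs: it should fall within [1 - eps, 1 + eps].
ratios = [np.sum((Y[i] - Y[j])**2) / np.sum((X[i] - X[j])**2)
          for i in range(n) for j in range(i + 1, n)]
print(f"k = {k}, distortion range: [{min(ratios):.3f}, {max(ratios):.3f}]")
```

Note how large k comes out even for 50 points: this is exactly the "hidden constants" caveat of the next slide.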
Finally...
Random projections hide large constants: k ≈ (1/ε)² · log n, so k may be large… but the projection is simple and fast to compute.
LSI is intuitive and may scale to any k; it is optimal under various metrics, but costly to compute (good libraries do exist).