Latent Semantic Indexing via a Semi-discrete Matrix Decomposition
Papers from the same authors with similar topics
1. Kolda, T.G. & O'Leary, D.P. A semidiscrete matrix decomposition for latent semantic indexing information retrieval. ACM Trans. Inf. Syst., 1998, 16, 322-346.
2. Kolda, T.G. & O'Leary, D.P. Latent semantic indexing via a semi-discrete matrix decomposition. In Cybenko, G. et al. (eds.), Springer-Verlag, 1999, 107, 73-80.
3. Kolda, T.G. & O'Leary, D.P. Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Trans. Math. Softw., 2000, 26, 415-435.
Vector Space Framework
Documents are the columns of an m x n term-by-document matrix A; a query is an m-vector q. Documents are ranked by the similarity (e.g., cosine) between q and each column of A. (The slide's original query formula is not preserved.)
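A minimal sketch of this framework, assuming cosine-similarity ranking (the function name and the toy matrix are my own, not from the slides):

```python
import numpy as np

def rank_documents(A, q):
    # Documents are columns of A; score each by cosine similarity to q,
    # then return document indices from best match to worst.
    doc_norms = np.linalg.norm(A, axis=0)
    q_norm = np.linalg.norm(q)
    scores = (A.T @ q) / (doc_norms * q_norm)
    return np.argsort(-scores)

# 3 terms x 2 documents; the query uses terms 0 and 1, so document 0 wins.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
q = np.array([1.0, 1.0, 0.0])
print(rank_documents(A, q))
```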
Weight of a term in a document
Each entry a_ij of the term-by-document matrix A is the weight of term i in document j. (The slide's specific weighting formula is not preserved.)
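Since the slide's exact formula is lost, here is an illustrative sketch using the common local-times-global weighting pattern a_ij = tf_ij * idf_i (plain tf-idf), one frequent choice in LSI work; it is not necessarily the scheme the authors used:

```python
import numpy as np

def tfidf(counts):
    # counts: m x n matrix of raw term counts (terms x documents).
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)     # document frequency per term
    idf = np.log(n_docs / np.maximum(df, 1))  # global weight
    return counts * idf[:, None]              # local weight tf times idf

counts = np.array([[2, 0, 1],
                   [1, 1, 1]])
W = tfidf(counts)
# Term 1 appears in every document, so its idf (and weight) is zero.
```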
Motivation for using SDD
The Singular Value Decomposition (SVD) is used in Latent Semantic Indexing (LSI) to estimate the structure of word usage across documents. Using the Semi-discrete Decomposition (SDD) instead of the SVD for LSI saves storage space and retrieval time.
Why?
Claim: the SVD has nice theoretical properties, but it encodes a lot of information, probably more than is necessary for this application.
SVD vs SDD
SVD: A ≈ U_k Σ_k V_k^T, where U_k and V_k are dense real matrices with orthonormal columns and Σ_k is diagonal with positive entries.
SDD: A ≈ X_k D_k Y_k^T, where the entries of X_k and Y_k are restricted to {-1, 0, 1} and D_k is diagonal with positive entries.
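The contrast can be made concrete on a toy matrix. The SDD factors below are hand-picked for illustration, not fitted by the authors' algorithm:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])

# SVD: dense real factors with orthonormal columns.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 1
A_svd = U[:, :k] * s[:k] @ Vt[:k, :]   # best rank-1 approximation

# SDD: X, Y entries restricted to {-1, 0, 1}, D a positive diagonal;
# each column is far cheaper to store than an SVD column.
X = np.array([[1, 1], [1, -1], [1, -1]])
d = np.array([2.0, 1.0])
Y = np.array([[1, 1], [1, -1]])
A_sdd = (X * d) @ Y.T                  # rough approximation of A
```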
SDD is an approximate representation of the matrix: even without discarding anything, the repackaging need not reproduce the original matrix exactly. Convergence theorems say that as the number of terms k tends to infinity, the approximation converges (slowly) to the original matrix. The speed of convergence depends on the starting estimate used to initialize the iterative decomposition algorithm.
Result: Storage Space

                                SVD                      SDD
Approx. comparative storage     1                        0.05   (for the same rank k)
Size per element                double word (64 bits)    2 bits
Size per scalar value           double word (64 bits)    single word (32 bits)
TOTAL (bytes)                   8k(m + n + 1)            4k + ¼ k(m + n)
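The TOTAL row follows directly from the per-value sizes: the SVD stores k(m + n) doubles plus k singular values, while the SDD stores k singles plus 2 bits per entry of X and Y. A quick check (the m, n, k values below are hypothetical, not from the slides):

```python
def svd_bytes(m, n, k):
    # k(m + n + 1) double-word (8-byte) values.
    return 8 * k * (m + n + 1)

def sdd_bytes(m, n, k):
    # k single-word (4-byte) diagonal values, plus 2 bits (1/4 byte)
    # per entry of X (m x k) and Y (n x k).
    return 4 * k + k * (m + n) / 4

m, n, k = 10000, 1000, 100
ratio = sdd_bytes(m, n, k) / svd_bytes(m, n, k)
# For large m + n the ratio approaches 1/32 ~ 0.03, the same order of
# magnitude as the slide's comparative figure of 0.05.
```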
Medline test case
Results on Medline test case
Method for SDD
Greedy algorithm: iteratively construct the k-th triplet (d_k, x_k, y_k).
Metrics in those papers
The metrics appear in the three Kolda & O'Leary papers listed above. NOTE: all y's above are fixed; x and y are alternately fixed in each algorithm.
Greedy Algorithm
Notes on the algorithm
- Starting vector y: every 100th element is 1 and all the others are 0.
- A_k → A as k → ∞.
- Finding the minimum Frobenius norm of the residual can be simplified to finding an optimal x.
- The improvement threshold may be 0.01, where improvement = |new − old| / old.
Finding x and d
1. Fix y.
2. Find the optimal d.
3. Remove d by plugging it in.
4. Solve for x.
5. Use x and y to find d.
This is simplified to (and used in the algorithm):
1. Fix y.
2. Find the optimal x over the m candidate x-vectors.
3. Given x and y, find d*.
There are m possible values for the index set J; thus we only need to check m candidate x-vectors to determine the optimal solution.
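The greedy construction above can be sketched in NumPy. This is my own reading of the published scheme, not the authors' released Algorithm 805 code; function and variable names are mine, and degenerate residuals are handled only minimally:

```python
import numpy as np

def best_discrete(s):
    # For a fixed partner vector, the optimal {-1, 0, 1} vector keeps the
    # j largest-magnitude entries of s (with their signs); only len(s)
    # candidate index sets J need checking, as noted above.
    order = np.argsort(-np.abs(s))
    best_val, best_j, partial = -np.inf, 1, 0.0
    for j, i in enumerate(order, start=1):
        partial += abs(s[i])
        if partial ** 2 / j > best_val:
            best_val, best_j = partial ** 2 / j, j
    x = np.zeros(len(s))
    keep = order[:best_j]
    x[keep] = np.sign(s[keep])
    return x

def sdd(A, k, max_inner=100, tol=0.01):
    n = A.shape[1]
    R = A.astype(float).copy()
    X, Y, D = [], [], []
    for _ in range(k):
        y = np.zeros(n)
        y[::100] = 1.0                       # starting y from the slide
        if not np.any(R @ y):                # fall back to the largest column
            y = np.zeros(n)
            y[np.argmax(np.linalg.norm(R, axis=0))] = 1.0
        old = 0.0
        for _ in range(max_inner):
            x = best_discrete(R @ y)         # fix y, pick best discrete x
            y = best_discrete(R.T @ x)       # fix x, pick best discrete y
            d = (x @ R @ y) / ((x @ x) * (y @ y))
            new = d * d * (x @ x) * (y @ y)  # decrease in ||R||_F^2
            if old > 0 and abs(new - old) / old < tol:
                break
            old = new
        R -= d * np.outer(x, y)              # peel off the new triplet
        X.append(x); D.append(d); Y.append(y)
    return np.array(X).T, np.array(D), np.array(Y).T

A = np.array([[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
X, d, Y = sdd(A, 2)
```

On this toy matrix two triplets reconstruct A exactly; in general A_k only approaches A as k grows.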