Published by Amy Poole. Modified over 9 years ago.
1
2010 © University of Michigan
Latent Semantic Indexing
SI650: Information Retrieval, Winter 2010
School of Information, University of Michigan
2
…
Latent semantic indexing
Singular value decomposition
…
3
Problems with lexical semantics
Polysemy
  – bar, bank, jaguar, hot
  – tends to reduce precision
Synonymy
  – building/edifice, large/big, spicy/hot
  – tends to reduce recall
Relatedness
  – doctor/patient/nurse/treatment
Sparse matrix
Need: dimensionality reduction
4
Problem in Retrieval
Query = “information retrieval”
Document 1 = “inverted index precision recall”
Document 2 = “welcome to ann arbor”
Which one should we rank higher?
Query vocabulary & doc vocabulary mismatch!
Smoothing won’t help here…
If only we could represent documents/queries by topics!
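The mismatch can be made concrete with a minimal sketch in plain Python. The helper name `overlap_score` is hypothetical; it simply counts the distinct query terms that also occur in a document, which is the signal that any purely lexical ranking starts from.

```python
def overlap_score(query, document):
    """Count distinct query terms that also appear in the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))

query = "information retrieval"
doc1 = "inverted index precision recall"
doc2 = "welcome to ann arbor"

scores = {
    "Document 1": overlap_score(query, doc1),
    "Document 2": overlap_score(query, doc2),
}
# Both documents share no term with the query, so term matching alone
# cannot rank the topically relevant Document 1 above Document 2.
print(scores)
```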
5
Latent Semantic Indexing
Motivation
  – Query vocabulary & doc vocabulary mismatch
  – Need to match/index based on concepts (or topics)
Main idea:
  – Projects queries and documents into a space with “latent” semantic dimensions
  – Dimensionality reduction: the latent semantic space has fewer dimensions (semantic concepts)
  – Exploits co-occurrence: co-occurring terms are projected onto the same dimensions
6
Example of “Semantic Concepts”
(Slide from C. Faloutsos’s talk)
7
Concept Space = Dimension Reduction
The number of concepts (K) is always smaller than the number of words (N) or the number of documents (M).
If we represent a document as an N-dimensional vector and the corpus as an M x N matrix, the goal is to reduce the dimension from N to K.
But how can we do that?
8
Techniques for dimensionality reduction
Based on matrix decomposition (goal: preserve clusters, explain away variance)
A quick review of matrices
  – Vectors
  – Matrices
  – Matrix multiplication
9
Eigenvectors and eigenvalues
An eigenvector is an implicit “direction” for a matrix: $A v = \lambda v$, where v (eigenvector) is non-zero, though λ (eigenvalue) can be any complex number in principle
Computing eigenvalues (det = determinant): $\det(A - \lambda I) = 0$; if A is square (N x N), this equation has r distinct solutions, where 1 <= r <= N
For each λ found, you can find v by $A v = \lambda v$, or $(A - \lambda I) v = 0$
10
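Eigenvectors and eigenvalues
Example: $\det(A - \lambda I) = (-1 - \lambda)(-\lambda) - 3 \cdot 2 = 0$
Then: $\lambda^2 + \lambda - 6 = 0$; $\lambda_1 = 2$; $\lambda_2 = -3$
For $\lambda_1 = 2$, solving $(A - 2I) x = 0$ gives the solutions $x_1 = x_2$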
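The arithmetic of this example can be checked with a short sketch. The matrix itself did not survive in the transcript, so the `A` below is an assumption: it is the 2 x 2 matrix whose characteristic polynomial is (-1 - λ)(-λ) - 3·2 and whose eigenvector for λ = 2 satisfies x1 = x2. The helper names are hypothetical.

```python
import math

# Assumed matrix (not preserved in the slide text); it reproduces
# det(A - lambda*I) = (-1 - lambda)(-lambda) - 3*2 and the solution x1 = x2.
A = [[-1.0, 3.0],
     [2.0, 0.0]]

def eigenvalues_2x2(m):
    """Roots of lambda^2 - trace*lambda + det = 0 (real roots assumed)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)                 # 2.0 -3.0
print(matvec(A, [1.0, 1.0]))      # [2.0, 2.0], i.e. 2 * (1, 1)
print(matvec(A, [650.0, 650.0]))  # [1300.0, 1300.0]: any multiple of (1, 1) also works
```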
11
Eigenvectors and eigenvalues
Wait, that means there are many eigenvectors for the same eigenvalue…
$v = (x_1, x_2)^T$; $x_1 = x_2$ corresponds to many vectors, e.g., $(1, 1)^T$, $(2, 2)^T$, $(650, 650)^T$ …
Not surprising … if v is an eigenvector of A, v’ = cv is also an eigenvector (c is any non-zero constant)
12
Matrix Decomposition
If A is a square (N x N) matrix and it has N linearly independent eigenvectors, it can be decomposed into $U \Lambda U^{-1}$, where
U: matrix of eigenvectors (every column)
Λ: diagonal matrix of eigenvalues
$A U = U \Lambda$
$U^{-1} A U = \Lambda$
$A = U \Lambda U^{-1}$
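As a worked instance, assume the 2 x 2 matrix A = [-1 3; 2 0]. This matrix is an assumption (it is not given in the slide text); it has eigenvalues 2 and -3, matching the earlier eigenvalue example, with eigenvectors (1, 1)^T and (3, -2)^T. Multiplying the three factors back together recovers A:

```latex
\[
U = \begin{pmatrix} 1 & 3 \\ 1 & -2 \end{pmatrix},\quad
\Lambda = \begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix},\quad
U^{-1} = \frac{1}{5}\begin{pmatrix} 2 & 3 \\ 1 & -1 \end{pmatrix}
\]
\[
U \Lambda U^{-1}
= \begin{pmatrix} 2 & -9 \\ 2 & 6 \end{pmatrix}
  \cdot \frac{1}{5}\begin{pmatrix} 2 & 3 \\ 1 & -1 \end{pmatrix}
= \frac{1}{5}\begin{pmatrix} -5 & 15 \\ 10 & 0 \end{pmatrix}
= \begin{pmatrix} -1 & 3 \\ 2 & 0 \end{pmatrix} = A
\]
```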
13
Example
14
Example
Eigenvalues are 3, 2, 0
x is an arbitrary vector, yet Sx depends on the eigenvalues and eigenvectors
15
What about an arbitrary matrix?
A: n x m matrix (n documents, m terms)
$A = U \Sigma V^T$ (as opposed to $A = U \Lambda U^{-1}$)
U: n x n matrix; V: m x m matrix
Σ: n x m diagonal matrix; only values on the diagonal can be non-zero
$U U^T = I$; $V V^T = I$
16
SVD: Singular Value Decomposition
$A = U \Sigma V^T$
U is the matrix of orthogonal eigenvectors of $A A^T$
V is the matrix of orthogonal eigenvectors of $A^T A$
The diagonal components of Σ (the singular values) are the square roots of the eigenvalues of $A^T A$
This decomposition exists for all matrices, dense or sparse
If A has 5 rows and 3 columns, then U will be 5x5 and V will be 3x3
In Matlab, use [U,S,V] = svd(A)
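For readers without Matlab, a rough NumPy counterpart might look like the sketch below. The 5 x 3 matrix is hypothetical and chosen only for illustration. Unlike Matlab's `svd`, `np.linalg.svd` returns the singular values as a vector and returns V already transposed.

```python
import numpy as np

# Hypothetical 5x3 matrix (5 rows, 3 columns), for illustration only.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# full_matrices=True gives the square U (5x5) and V^T (3x3).
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)  # (5, 5) (3,) (3, 3)

# Rebuild the 5x3 "diagonal" matrix Sigma and check A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
reconstructed = U @ Sigma @ Vt

# Squared singular values should equal the eigenvalues of A^T A
# (eigvalsh returns them in ascending order, so reverse to match s).
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(reconstructed, A), np.allclose(s ** 2, eigvals))
```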
17
Term matrix normalization
[Table with columns D1, D2, D3, D4, D5; values not preserved]
18
Example (Berry and Browne)
Terms:
T1: baby
T2: child
T3: guide
T4: health
T5: home
T6: infant
T7: proofing
T8: safety
T9: toddler
Documents:
D1: infant & toddler first aid
D2: babies & children’s room (for your home)
D3: child safety at home
D4: your baby’s health and safety: from infant to toddler
D5: baby proofing basics
D6: your guide to easy rust proofing
D7: beanie babies collector’s guide
19
Document term matrix
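The matrix itself is not preserved here, but it can plausibly be rebuilt from the Berry and Browne terms and titles. The sketch below is a reconstruction under stated assumptions, not the slide's own table: rows are terms T1..T9, columns are documents D1..D7, each column is scaled to unit Euclidean length, and word variants are mapped by a small hand-written table (`VARIANTS`, a hypothetical stand-in for a real stemmer).

```python
import math

TERMS = ["baby", "child", "guide", "health", "home",
         "infant", "proofing", "safety", "toddler"]
DOCS = [
    "infant & toddler first aid",
    "babies & children's room (for your home)",
    "child safety at home",
    "your baby's health and safety: from infant to toddler",
    "baby proofing basics",
    "your guide to easy rust proofing",
    "beanie babies collector's guide",
]
# Hand-written mapping of word variants (an assumption, not a real stemmer).
VARIANTS = {"babies": "baby", "baby's": "baby", "children's": "child"}

def tokenize(title):
    """Lowercase, strip simple punctuation, and map variants to index terms."""
    for ch in "():&":
        title = title.replace(ch, " ")
    return [VARIANTS.get(w, w) for w in title.lower().split()]

def term_document_matrix(terms, docs):
    """Rows = terms, columns = documents, columns scaled to unit length."""
    counts = [[0.0] * len(docs) for _ in terms]
    for j, title in enumerate(docs):
        for word in tokenize(title):
            if word in terms:
                counts[terms.index(word)][j] += 1.0
    for j in range(len(docs)):
        norm = math.sqrt(sum(row[j] ** 2 for row in counts))
        if norm > 0:
            for row in counts:
                row[j] /= norm
    return counts

A = term_document_matrix(TERMS, DOCS)
for term, row in zip(TERMS, A):
    print(f"{term:9s}", " ".join(f"{x:6.4f}" for x in row))
```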
20
Decomposition
u =
-0.6976 -0.0945  0.0174 -0.6950  0.0000  0.0153  0.1442 -0.0000  0
-0.2622  0.2946  0.4693  0.1968 -0.0000 -0.2467 -0.1571 -0.6356  0.3098
-0.3519 -0.4495 -0.1026  0.4014  0.7071 -0.0065 -0.0493 -0.0000  0.0000
-0.1127  0.1416 -0.1478 -0.0734  0.0000  0.4842 -0.8400  0.0000 -0.0000
-0.2622  0.2946  0.4693  0.1968  0.0000 -0.2467 -0.1571  0.6356 -0.3098
-0.1883  0.3756 -0.5035  0.1273 -0.0000 -0.2293  0.0339 -0.3098 -0.6356
-0.3519 -0.4495 -0.1026  0.4014 -0.7071 -0.0065 -0.0493  0.0000 -0.0000
-0.2112  0.3334  0.0962  0.2819 -0.0000  0.7338  0.4659 -0.0000  0.0000
-0.1883  0.3756 -0.5035  0.1273 -0.0000 -0.2293  0.0339  0.3098  0.6356
v =
-0.1687  0.4192 -0.5986  0.2261  0      -0.5720  0.2433
-0.4472  0.2255  0.4641 -0.2187  0.0000 -0.4871 -0.4987
-0.2692  0.4206  0.5024  0.4900 -0.0000  0.2450  0.4451
-0.3970  0.4003 -0.3923 -0.1305  0       0.6124 -0.3690
-0.4702 -0.3037 -0.0507 -0.2607 -0.7071  0.0110  0.3407
-0.3153 -0.5018 -0.1220  0.7128 -0.0000 -0.0162 -0.3544
-0.4702 -0.3037 -0.0507 -0.2607  0.7071  0.0110  0.3407
21
Decomposition
Σ =
1.5849 0      0      0      0      0      0
0      1.2721 0      0      0      0      0
0      0      1.1946 0      0      0      0
0      0      0      0.7996 0      0      0
0      0      0      0      0.7100 0      0
0      0      0      0      0      0.5692 0
0      0      0      0      0      0      0.1977
0      0      0      0      0      0      0
(Annotation: spread on the v1 axis)
22
What does this have to do with dimension reduction?
Low rank matrix approximation
SVD: $A_{[m \times n]} = U_{[m \times m]} \Sigma_{[m \times n]} (V_{[n \times n]})^T$
Remember that Σ is a diagonal matrix of singular values
If we only keep the largest r singular values:
$A \approx U_{[m \times r]} \Sigma_{[r \times r]} (V_{[n \times r]})^T$
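A minimal NumPy sketch of this truncation follows. The 4 x 3 matrix and the function name `low_rank_approximation` are hypothetical. The last lines check a standard property of the truncated SVD: the Frobenius-norm error equals the square root of the sum of the squared singular values that were dropped.

```python
import numpy as np

def low_rank_approximation(A, r):
    """Return U_r Sigma_r V_r^T (rank at most r) and all singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    return A_r, s

# Hypothetical 4x3 matrix, for illustration only.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])
A_2, s = low_rank_approximation(A, 2)

error = np.linalg.norm(A - A_2)        # Frobenius norm of the residual
dropped = np.sqrt(np.sum(s[2:] ** 2))  # energy in the discarded singular values
print(np.linalg.matrix_rank(A_2), error, dropped)
```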
23
Rank-4 approximation
s4 =
1.5849 0      0      0      0 0 0
0      1.2721 0      0      0 0 0
0      0      1.1946 0      0 0 0
0      0      0      0.7996 0 0 0
0      0      0      0      0 0 0
24
Rank-4 approximation
u*s4*v'
-0.0019  0.5985 -0.0148  0.4552  0.7002  0.0102  0.7002
-0.0728  0.4961  0.6282  0.0745  0.0121 -0.0133  0.0121
 0.0003 -0.0067  0.0052 -0.0013  0.3584  0.7065  0.3584
 0.1980  0.0514  0.0064  0.2199  0.0535 -0.0544  0.0535
-0.0728  0.4961  0.6282  0.0745  0.0121 -0.0133  0.0121
 0.6337 -0.0602  0.0290  0.5324 -0.0008  0.0003 -0.0008
 0.0003 -0.0067  0.0052 -0.0013  0.3584  0.7065  0.3584
 0.2165  0.2494  0.4367  0.2282 -0.0360  0.0394 -0.0360
 0.6337 -0.0602  0.0290  0.5324 -0.0008  0.0003 -0.0008
25
Rank-4 approximation
u*s4: word vector representation of the concepts/topics
-1.1056 -0.1203  0.0207 -0.5558 0 0 0
-0.4155  0.3748  0.5606  0.1573 0 0 0
-0.5576 -0.5719 -0.1226  0.3210 0 0 0
-0.1786  0.1801 -0.1765 -0.0587 0 0 0
-0.4155  0.3748  0.5606  0.1573 0 0 0
-0.2984  0.4778 -0.6015  0.1018 0 0 0
-0.5576 -0.5719 -0.1226  0.3210 0 0 0
-0.3348  0.4241  0.1149  0.2255 0 0 0
-0.2984  0.4778 -0.6015  0.1018 0 0 0
26
Rank-4 approximation
s4*v': new (concept/topic) representation of documents
-0.2674 -0.7087 -0.4266 -0.6292 -0.7451 -0.4996 -0.7451
 0.5333  0.2869  0.5351  0.5092 -0.3863 -0.6384 -0.3863
-0.7150  0.5544  0.6001 -0.4686 -0.0605 -0.1457 -0.0605
 0.1808 -0.1749  0.3918 -0.1043 -0.2085  0.5700 -0.2085
 0       0       0       0       0       0       0
27
Rank-2 approximation
s2 =
1.5849 0      0 0 0 0 0
0      1.2721 0 0 0 0 0
0      0      0 0 0 0 0
28
Rank-2 approximation
u*s2*v'
 0.1361  0.4673  0.2470  0.3908  0.5563  0.4089  0.5563
 0.2272  0.2703  0.2695  0.3150  0.0815 -0.0571  0.0815
-0.1457  0.1204 -0.0904 -0.0075  0.4358  0.4628  0.4358
 0.1057  0.1205  0.1239  0.1430  0.0293 -0.0341  0.0293
 0.2272  0.2703  0.2695  0.3150  0.0815 -0.0571  0.0815
 0.2507  0.2412  0.2813  0.3097 -0.0048 -0.1457 -0.0048
-0.1457  0.1204 -0.0904 -0.0075  0.4358  0.4628  0.4358
 0.2343  0.2454  0.2685  0.3027  0.0286 -0.1073  0.0286
 0.2507  0.2412  0.2813  0.3097 -0.0048 -0.1457 -0.0048
29
Rank-2 approximation
u*s2: word vector representation of the concepts/topics
-1.1056 -0.1203 0 0 0 0 0
-0.4155  0.3748 0 0 0 0 0
-0.5576 -0.5719 0 0 0 0 0
-0.1786  0.1801 0 0 0 0 0
-0.4155  0.3748 0 0 0 0 0
-0.2984  0.4778 0 0 0 0 0
-0.5576 -0.5719 0 0 0 0 0
-0.3348  0.4241 0 0 0 0 0
-0.2984  0.4778 0 0 0 0 0
30
Rank-2 approximation
s2*v': new (concept/topic) representation of documents
-0.2674 -0.7087 -0.4266 -0.6292 -0.7451 -0.4996 -0.7451
 0.5333  0.2869  0.5351  0.5092 -0.3863 -0.6384 -0.3863
 0       0       0       0       0       0       0
31
Latent Semantic Indexing
$A_{[n \times m]} \approx U_{[n \times r]} \Sigma_{[r \times r]} (V_{[m \times r]})^T$
A: n x m matrix (n documents, m terms)
U: n x r matrix (n documents, r concepts)
Σ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
V: m x r matrix (m terms, r concepts)
32
Latent semantic indexing (LSI)
Dimensionality reduction = identification of hidden (latent) concepts
Query matching in latent space
LSI matches documents even if they don’t have words in common
  – if they share frequently co-occurring terms
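This last point can be illustrated on a toy corpus. Everything in the sketch below is hypothetical: a 4-term, 6-document matrix in which documents d0 ("data") and d1 ("retrieval") share no word, but both terms co-occur in d2. In the original term space their cosine similarity is zero; after projecting onto r = 2 latent dimensions they land on the same concept.

```python
import numpy as np

terms = ["data", "retrieval", "brain", "lung"]
# Columns are documents of a hypothetical toy corpus:
# d0={data}, d1={retrieval}, d2={data, retrieval},
# d3={brain}, d4={lung}, d5={brain, lung}
A = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 0, 1, 1]], dtype=float)

def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2
# One row per document, r concept coordinates (rows of V_r scaled by Sigma_r).
docs_latent = (np.diag(s[:r]) @ Vt[:r, :]).T

raw = cosine(A[:, 0], A[:, 1])                   # no shared terms
latent = cosine(docs_latent[0], docs_latent[1])  # same latent concept
cross = cosine(docs_latent[0], docs_latent[3])   # different latent concepts
print(raw, latent, cross)
```

In this toy case the two topics are perfectly separated, which is why the effect is so clean; real collections give a softer version of the same behaviour.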
33
Back to the CS-MED example
(Slide from C. Faloutsos’s talk)
34
Example of LSI
$A = U \Sigma V^T$
[Figure: matrix A with term columns data, inf, retrieval, brain, lung and document groups CS and MD, shown as the product of three matrices. Annotations: CS-concept, MD-concept, term rep of concept, strength of CS-concept, dim. reduction]
(Slide adapted from C. Faloutsos’s talk)
35
How to Map Query/Doc to the Same Concept Space?
$q_{concept}^T = q^T V$
$d_{concept}^T = d^T V$
Terms: data, inf., retrieval, brain, lung
$d^T = (0\ 1\ 1\ 0\ 0)$ maps to $d_{concept}^T = (1.16\ \ 0)$ (annotation: similarity with CS-concept)
[Values of $q^T$ and $V$ not preserved]
(Slide adapted from C. Faloutsos’s talk)
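The projection is a single vector-matrix product. In the sketch below, the matrix `V` and the query `q` are assumptions introduced for illustration (their values are not given in the slide text); `V` was picked so that the document d = (0 1 1 0 0) maps to (1.16, 0), the value that does appear on the slide. The helper name `to_concept_space` is hypothetical.

```python
TERMS = ["data", "inf", "retrieval", "brain", "lung"]

# Assumed term-to-concept matrix V (rows: terms; columns: CS-concept, MD-concept).
# These entries are hypothetical, not taken from the slide.
V = [[0.58, 0.00],
     [0.58, 0.00],
     [0.58, 0.00],
     [0.00, 0.71],
     [0.00, 0.71]]

def to_concept_space(vec, V):
    """Compute vec^T V: one coordinate per concept."""
    return [sum(x * row[k] for x, row in zip(vec, V)) for k in range(len(V[0]))]

q = [1, 0, 0, 0, 0]  # hypothetical query containing only "data"
d = [0, 1, 1, 0, 0]  # document containing "inf" and "retrieval"

q_concept = to_concept_space(q, V)
d_concept = to_concept_space(d, V)
# q and d share no term, yet both have a non-zero CS-concept coordinate.
print(q_concept, d_concept)
```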
36
Useful pointers
http://lsa.colorado.edu
http://lsi.research.telcordia.com
http://www.cs.utk.edu/~lsi
37
Readings
MRS18
MRS17, MRS19
MRS20
38
Problem of LSI
Concepts/topics are hard to interpret
New document/query vectors could have negative values
Lack of statistical interpretation
Probabilistic latent semantic indexing…
39
General Idea of Probabilistic Topic Models
Modeling a topic/subtopic/theme with a multinomial distribution (unigram LM)
Modeling text data with a mixture model involving multinomial distributions
  – A document is “generated” by sampling words from some multinomial distribution
  – Each time, a word may be generated from a different distribution
  – Many variations of how these multinomial distributions are mixed
Topic mining = fitting the probabilistic model to text
Answer topic-related questions by computing various kinds of conditional probabilities based on the estimated model (e.g., p(time | topic), p(time | topic, location))
40
Document as a Sample of Mixed Topics
Topic 1: government 0.3, response 0.2, ...
Topic 2: donate 0.1, relief 0.05, help 0.02, ...
…
Topic k: city 0.2, new 0.1, orleans 0.05, ...
Background B: is 0.05, the 0.04, a 0.03, ...
Sample text (brackets mark segments drawn from different topics): [ Criticism of government response to the hurricane primarily consisted of criticism of its response to the approach of the storm and its aftermath, specifically in the delayed response ] to the [ flooding of New Orleans. … 80% of the 1.3 million residents of the greater New Orleans metropolitan area evacuated ] …[ Over seventy countries pledged monetary donations or other assistance]. …
Applications of topic models:
  – Summarize themes/aspects
  – Facilitate navigation/browsing
  – Retrieve documents
  – Segment documents
  – Many others
How can we discover these topic word distributions?
41
Probabilistic Latent Semantic Analysis/Indexing (PLSA/PLSI) [Hofmann 99]
Mix k multinomial distributions to generate a document
Each document has a potentially different set of mixing weights, which captures the topic coverage
When generating words in a document, each word may be generated using a DIFFERENT multinomial distribution (this is in contrast with the document clustering model where, once a multinomial distribution is chosen, all the words in a document would be generated using the same model)
We may add a background distribution to “attract” background words
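The generative story can be sketched in a few lines of standard-library Python. The topic names, word probabilities and mixing weights below are all hypothetical; the point is only that a topic is drawn afresh for every word, so one document mixes words from several distributions, including the background one.

```python
import random

# Hypothetical topic word distributions p(w | topic); each sums to 1.
TOPICS = {
    "government": {"government": 0.5, "response": 0.3, "criticism": 0.2},
    "donation":   {"donate": 0.5, "relief": 0.3, "help": 0.2},
    "background": {"the": 0.5, "is": 0.3, "a": 0.2},
}

def generate_document(mixing_weights, length, rng):
    """Sample each word by first picking a topic, then a word from that topic."""
    topic_names = list(mixing_weights)
    topic_probs = [mixing_weights[t] for t in topic_names]
    words = []
    for _ in range(length):
        topic = rng.choices(topic_names, weights=topic_probs)[0]
        dist = TOPICS[topic]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return words

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = {"government": 0.6, "donation": 0.2, "background": 0.2}
doc = generate_document(weights, 12, rng)
print(" ".join(doc))
```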
42
PLSI (a.k.a. Aspect Model)
Every document is a mixture of K underlying (latent) aspects (topics) with mixture weights p(z|d)
  – How is this related to LSI?
Each aspect is represented by a distribution of words p(w|z)
Estimate p(z|d) and p(w|z) using the EM algorithm
43
PLSI as a Mixture Model
“Generating” word w in doc d in the collection
Topic $z_1$ (weight $p(z_1|d)$): warning 0.3, system 0.2, ..
Topic $z_2$ (weight $p(z_2|d)$): aid 0.1, donation 0.05, support 0.02, ..
…
Topic $z_k$ (weight $p(z_k|d)$): statistics 0.2, loss 0.1, dead 0.05, ..
Background B: is 0.05, the 0.04, a 0.03, ..
The word w is drawn from the background B with probability $\lambda_B$ and from the topic mixture with probability $1 - \lambda_B$
Parameters: $\lambda_B$ = noise-level (manually set); p(z|d) and p(w|z) are estimated with Maximum Likelihood (the unknown values are marked “?” in the figure)
44
Parameter Estimation using EM Algorithm
We have the equation for the log-likelihood function from the PLSI model, which we want to maximize:
[Equation not preserved]
Maximizing likelihood using Expectation Maximization
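The slide's own equation is missing from this text. For orientation, the standard PLSI log-likelihood without the background component is sketched below; the notation c(w, d) for the count of word w in document d is an assumption, while p(z|d) and p(w|z) follow the slides.

```latex
\[
\log L \;=\; \sum_{d} \sum_{w} c(w, d)\,
\log \sum_{z=1}^{K} p(z \mid d)\, p(w \mid z)
\]
```

With a background distribution, the inner sum would become $\lambda_B\, p(w \mid B) + (1 - \lambda_B) \sum_{z} p(z \mid d)\, p(w \mid z)$; which of the two forms the slide showed is not known.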
45
EM Steps
E-Step
  – Expectation step, where the expectation of the likelihood function is calculated with the current parameter values
M-Step
  – Update the parameters with the calculated posterior probabilities
  – Find the parameters that maximize the likelihood function
46
E Step
[Equation for p(z|d,w) not preserved]
p(z|d,w) is the probability that a word w occurring in a document d is explained by topic z
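The E-step equation is not preserved here. In the standard formulation without a background component (an assumption about which variant the slide used), the posterior follows from Bayes' rule applied to the current estimates of p(z|d) and p(w|z):

```latex
\[
p(z \mid d, w) \;=\;
\frac{p(z \mid d)\, p(w \mid z)}
     {\sum_{z'} p(z' \mid d)\, p(w \mid z')}
\]
```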
47
M Step
All these equations use p(z|d,w) calculated in the E Step
Converges to a local maximum of the likelihood function
We will see more when we talk about topic modeling
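Putting the two steps together, a compact sketch of the EM loop in plain Python follows. It assumes the basic model without a background distribution, and the function name `plsi_em`, the toy count matrix and the vocabulary are all hypothetical. The M-step uses the standard updates: p(w|z) is proportional to the sum over documents of c(w,d) p(z|d,w), and p(z|d) is proportional to the sum over words of c(w,d) p(z|d,w).

```python
import math
import random

def plsi_em(counts, num_topics, iterations=50, seed=0):
    """Fit p(z|d) and p(w|z) by EM; counts[d][w] = occurrences of word w in doc d."""
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])

    def normalized(values):
        total = sum(values)
        if total == 0.0:  # guard against an empty row
            return [1.0 / len(values)] * len(values)
        return [v / total for v in values]

    # Random positive initialisation of p(z|d) and p(w|z).
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(num_topics)])
             for _ in range(num_docs)]
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(num_words)])
             for _ in range(num_topics)]
    history = []  # log-likelihood under the parameters at the start of each iteration

    for _ in range(iterations):
        new_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        new_w_z = [[0.0] * num_words for _ in range(num_topics)]
        log_likelihood = 0.0
        for d in range(num_docs):
            for w in range(num_words):
                if counts[d][w] == 0:
                    continue
                # E-step: p(z|d,w) is proportional to p(z|d) * p(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(num_topics)]
                total = sum(joint)
                log_likelihood += counts[d][w] * math.log(total)
                for z in range(num_topics):
                    expected = counts[d][w] * joint[z] / total
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        # M-step: renormalise the expected counts into probabilities.
        p_z_d = [normalized(row) for row in new_z_d]
        p_w_z = [normalized(row) for row in new_w_z]
        history.append(log_likelihood)
    return p_z_d, p_w_z, history

# Hypothetical toy corpus over the vocabulary
# [data, retrieval, index, brain, lung]; rows are documents.
counts = [[2, 1, 1, 0, 0],
          [1, 2, 0, 0, 0],
          [0, 0, 0, 2, 1],
          [0, 0, 0, 1, 2]]
p_z_d, p_w_z, history = plsi_em(counts, num_topics=2)
print(history[0], history[-1])  # the log-likelihood should not decrease
```

Because EM only reaches a local maximum, a different `seed` may give a different solution; running several restarts and keeping the best log-likelihood is a common remedy.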
48
Example of PLSI
49
Topics represented as word distributions
Example of topics found from blog articles about “Hurricane Katrina”
Topics are interpretable!