2010 © University of Michigan Latent Semantic Indexing SI650: Information Retrieval Winter 2010 School of Information University of Michigan 1.


1 2010 © University of Michigan Latent Semantic Indexing SI650: Information Retrieval Winter 2010 School of Information University of Michigan 1

2 … Latent semantic indexing Singular value decomposition …

3 Problems with lexical semantics
Polysemy
–bar, bank, jaguar, hot
–tend to reduce precision
Synonymy
–building/edifice, Large/big, Spicy/hot
–tend to reduce recall
Relatedness
–doctor/patient/nurse/treatment
Sparse matrix
Need: dimensionality reduction

4 Problem in Retrieval
Query = “information retrieval”
Document 1 = “inverted index precision recall”
Document 2 = “welcome to ann arbor”
Which one should we rank higher? Query vocabulary & doc vocabulary mismatch! Smoothing won’t help here… If only we could represent documents/queries by topics!
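A minimal sketch of the mismatch, assuming plain whitespace tokenization and raw term counts (the `cosine` helper is illustrative, not part of the lecture). The query shares no word with either document, so both scores are zero and term matching cannot rank them.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts using raw term counts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "information retrieval"
doc1 = "inverted index precision recall"
doc2 = "welcome to ann arbor"

score1 = cosine(query, doc1)
score2 = cosine(query, doc2)
print(score1, score2)  # no shared terms, so both similarities are zero
```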

5 Latent Semantic Indexing
Motivation
–Query vocabulary & doc vocabulary mismatch
–Need to match/index based on concepts (or topics)
Main idea:
–Projects queries and documents into a space with “latent” semantic dimensions
–Dimensionality reduction: the latent semantic space has fewer dimensions (semantic concepts)
–Exploits co-occurrence: co-occurring terms are projected onto the same dimensions

6 Example of “Semantic Concepts” (Slide from C. Faloutsos’s talk)

7 Concept Space = Dimension Reduction
The number of concepts (K) is always smaller than the number of words (N) or the number of documents (M). If we represent a document as an N-dimensional vector and the corpus as an M×N matrix, the goal is to reduce the dimension from N to K. But how can we do that?

8 Techniques for dimensionality reduction
Based on matrix decomposition (goal: preserve clusters, explain away variance)
A quick review of matrices
–Vectors
–Matrices
–Matrix multiplication

9 Eigenvectors and eigenvalues
An eigenvector is an implicit “direction” for a matrix: $A v = \lambda v$, where v (eigenvector) is non-zero, though λ (eigenvalue) can be any complex number in principle
Computing eigenvalues (det = determinant): if A is square (N x N), $\det(A - \lambda I) = 0$ has r distinct solutions, where 1 <= r <= N
For each λ found, you can find v by $A v = \lambda v$, or $(A - \lambda I) v = 0$

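10 Eigenvectors and eigenvalues
Example: $A = \begin{pmatrix} -1 & 3 \\ 2 & 0 \end{pmatrix}$
$\det(A - \lambda I) = (-1 - \lambda)(-\lambda) - 3 \cdot 2 = 0$
Then: $\lambda^2 + \lambda - 6 = 0$; $\lambda_1 = 2$; $\lambda_2 = -3$
For $\lambda_1 = 2$: $(A - 2I)x = 0$, i.e. $-3x_1 + 3x_2 = 0$ and $2x_1 - 2x_2 = 0$
Solutions: $x_1 = x_2$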

11 Eigenvectors and eigenvalues
Wait, that means there are many eigenvectors for the same eigenvalue…
$v = (x_1, x_2)^T$; $x_1 = x_2$ corresponds to many vectors, e.g., $(1, 1)^T$, $(2, 2)^T$, $(650, 650)^T$ …
Not surprising … if v is an eigenvector of A, v′ = c v is also an eigenvector (c is any non-zero constant)
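A quick numerical check of the example, as a sketch only. The matrix `A` below is an assumption: the transcript lost it, and it is inferred here from the determinant expansion (-1-λ)(-λ) - 3*2 together with the solution x1 = x2 for λ = 2. NumPy returns unit-length eigenvectors, which is just one choice among the infinitely many scalings mentioned above.

```python
import numpy as np

# Assumed example matrix, inferred from the determinant expansion in the example above.
A = np.array([[-1.0, 3.0],
              [2.0, 0.0]])

# Columns of `eigenvectors` are eigenvectors scaled to unit length.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)

# Any non-zero multiple of (1, 1)^T is an eigenvector for the eigenvalue 2.
v = np.array([1.0, 1.0])
scaled_ok = [np.allclose(A @ (c * v), 2.0 * (c * v)) for c in (1.0, 2.0, 650.0)]
print(scaled_ok)
```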

12 Matrix Decomposition
If A is a square (N x N) matrix and it has N linearly independent eigenvectors, it can be decomposed into $U \Lambda U^{-1}$, where
U: matrix of eigenvectors (every column)
Λ: diagonal matrix of eigenvalues
$A U = U \Lambda$
$U^{-1} A U = \Lambda$
$A = U \Lambda U^{-1}$
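The three identities can be verified numerically. This is a sketch on an assumed 2x2 matrix with two distinct eigenvalues (so its eigenvectors are linearly independent); the variable names are illustrative.

```python
import numpy as np

# Assumed 2x2 matrix with distinct eigenvalues 2 and -3, hence diagonalizable.
A = np.array([[-1.0, 3.0],
              [2.0, 0.0]])

eigenvalues, U = np.linalg.eig(A)   # columns of U are eigenvectors
Lam = np.diag(eigenvalues)          # diagonal matrix of eigenvalues
U_inv = np.linalg.inv(U)

A_rebuilt = U @ Lam @ U_inv         # A = U Λ U^-1
Lam_rebuilt = U_inv @ A @ U         # U^-1 A U = Λ
print(np.round(A_rebuilt, 6))
```

If the eigenvectors were not linearly independent, `np.linalg.inv(U)` would fail or be numerically unreliable, which is why the slide states that condition.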

13 Example

14 Example
Eigenvalues are 3, 2, 0
x is an arbitrary vector, yet Sx depends on the eigenvalues and eigenvectors

15 What about an arbitrary matrix?
A: n x m matrix (n documents, m terms)
$A = U \Sigma V^T$ (as opposed to $A = U \Lambda U^{-1}$)
U: n x n matrix; V: m x m matrix
Σ: n x m diagonal matrix; only values on the diagonal can be non-zero.
$U U^T = I$; $V V^T = I$

16 SVD: Singular Value Decomposition
$A = U \Sigma V^T$
U is the matrix of orthonormal eigenvectors of $A A^T$
V is the matrix of orthonormal eigenvectors of $A^T A$
The diagonal components of Σ (the singular values) are the square roots of the eigenvalues of $A^T A$
This decomposition exists for all matrices, dense or sparse
If A has 5 rows and 3 columns, then U will be 5x5 and V will be 3x3
In Matlab, use [U,S,V] = svd (A)
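A NumPy counterpart to the Matlab call, as a sketch on an arbitrary random 5x3 matrix (the data is made up). Note two differences from Matlab: `np.linalg.svd` returns the singular values as a vector rather than a matrix, and it returns V transposed.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 3))                      # 5 rows, 3 columns; arbitrary values

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)           # U is 5x5, s holds 3 singular values, V^T is 3x3

# Rebuild the rectangular 5x3 Sigma from the vector of singular values.
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
A_rebuilt = U @ Sigma @ Vt

# Singular values squared equal the eigenvalues of A^T A (sorted in descending order here).
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
```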

17 Term matrix normalization [figure not preserved; columns D1 D2 D3 D4 D5]

18 Example (Berry and Browne)
T1: baby
T2: child
T3: guide
T4: health
T5: home
T6: infant
T7: proofing
T8: safety
T9: toddler
D1: infant & toddler first aid
D2: babies & children’s room (for your home)
D3: child safety at home
D4: your baby’s health and safety: from infant to toddler
D5: baby proofing basics
D6: your guide to easy rust proofing
D7: beanie babies collector’s guide

19 Document term matrix
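The matrix itself did not survive in the transcript. The sketch below rebuilds a 9x7 term-by-document matrix from the term and title lists of the Berry and Browne example, under two assumptions: the titles are mapped to index terms by hand (e.g. "babies" to "baby", "children's" to "child"), and each document column is scaled to unit length. The singular values printed on the following slides look consistent with such a matrix whose entries were rounded to two decimals, so an exact normalization gives close but not identical numbers.

```python
import numpy as np

terms = ["baby", "child", "guide", "health", "home",
         "infant", "proofing", "safety", "toddler"]

# Index terms per title, stemmed by hand; this mapping is an assumption of the sketch.
docs = {
    "D1": ["infant", "toddler"],
    "D2": ["baby", "child", "home"],
    "D3": ["child", "safety", "home"],
    "D4": ["baby", "health", "safety", "infant", "toddler"],
    "D5": ["baby", "proofing"],
    "D6": ["guide", "proofing"],
    "D7": ["baby", "guide"],
}

# Binary term-by-document matrix: rows are terms T1..T9, columns are documents D1..D7.
A = np.zeros((len(terms), len(docs)))
for j, words in enumerate(docs.values()):
    for w in words:
        A[terms.index(w), j] = 1.0

A /= np.linalg.norm(A, axis=0)   # scale every document column to unit length
print(np.round(A, 4))
```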

20 Decomposition
u =
-0.6976 -0.0945 0.0174 -0.6950 0.0000 0.0153 0.1442 -0.0000 0
-0.2622 0.2946 0.4693 0.1968 -0.0000 -0.2467 -0.1571 -0.6356 0.3098
-0.3519 -0.4495 -0.1026 0.4014 0.7071 -0.0065 -0.0493 -0.0000 0.0000
-0.1127 0.1416 -0.1478 -0.0734 0.0000 0.4842 -0.8400 0.0000 -0.0000
-0.2622 0.2946 0.4693 0.1968 0.0000 -0.2467 -0.1571 0.6356 -0.3098
-0.1883 0.3756 -0.5035 0.1273 -0.0000 -0.2293 0.0339 -0.3098 -0.6356
-0.3519 -0.4495 -0.1026 0.4014 -0.7071 -0.0065 -0.0493 0.0000 -0.0000
-0.2112 0.3334 0.0962 0.2819 -0.0000 0.7338 0.4659 -0.0000 0.0000
-0.1883 0.3756 -0.5035 0.1273 -0.0000 -0.2293 0.0339 0.3098 0.6356
v =
-0.1687 0.4192 -0.5986 0.2261 0 -0.5720 0.2433
-0.4472 0.2255 0.4641 -0.2187 0.0000 -0.4871 -0.4987
-0.2692 0.4206 0.5024 0.4900 -0.0000 0.2450 0.4451
-0.3970 0.4003 -0.3923 -0.1305 0 0.6124 -0.3690
-0.4702 -0.3037 -0.0507 -0.2607 -0.7071 0.0110 0.3407
-0.3153 -0.5018 -0.1220 0.7128 -0.0000 -0.0162 -0.3544
-0.4702 -0.3037 -0.0507 -0.2607 0.7071 0.0110 0.3407

21 Decomposition
Σ =
1.5849 0 0 0 0 0 0
0 1.2721 0 0 0 0 0
0 0 1.1946 0 0 0 0
0 0 0 0.7996 0 0 0
0 0 0 0 0.7100 0 0
0 0 0 0 0 0.5692 0
0 0 0 0 0 0 0.1977
0 0 0 0 0 0 0
Spread on the v1 axis

22 What does this have to do with dimension reduction?
Low rank matrix approximation
SVD: $A_{[m \times n]} = U_{[m \times m]} \, \Sigma_{[m \times n]} \, (V_{[n \times n]})^T$
Remember that Σ is a diagonal matrix of singular values
If we only keep the largest r singular values:
$A \approx U_{[m \times r]} \, \Sigma_{[r \times r]} \, (V_{[n \times r]})^T$
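A minimal sketch of the truncation step, using a random 9x7 matrix as stand-in data (the helper name `low_rank` is illustrative). It keeps the r largest singular values and the matching columns of U and V, and shows that the Frobenius error of the approximation equals the root of the sum of the squared singular values that were dropped.

```python
import numpy as np

def low_rank(A: np.ndarray, r: int) -> np.ndarray:
    """Rank-r approximation of A built from its r largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(1)
A = rng.random((9, 7))                       # stand-in data with the same shape as the example

A2 = low_rank(A, 2)
A4 = low_rank(A, 4)
err2 = np.linalg.norm(A - A2)                # Frobenius norm of the residual
err4 = np.linalg.norm(A - A4)

s = np.linalg.svd(A, compute_uv=False)
print(err2, np.sqrt(np.sum(s[2:] ** 2)))     # the two numbers agree
print(err4, np.sqrt(np.sum(s[4:] ** 2)))
```

Keeping more singular values can only lower the error, which is why the rank-4 reconstruction on the following slides is closer to the original matrix than the rank-2 one.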

23 Rank-4 approximation
s4 =
1.5849 0 0 0 0 0 0
0 1.2721 0 0 0 0 0
0 0 1.1946 0 0 0 0
0 0 0 0.7996 0 0 0
0 0 0 0 0 0 0

24 Rank-4 approximation
u*s4*v'
-0.0019 0.5985 -0.0148 0.4552 0.7002 0.0102 0.7002
-0.0728 0.4961 0.6282 0.0745 0.0121 -0.0133 0.0121
0.0003 -0.0067 0.0052 -0.0013 0.3584 0.7065 0.3584
0.1980 0.0514 0.0064 0.2199 0.0535 -0.0544 0.0535
-0.0728 0.4961 0.6282 0.0745 0.0121 -0.0133 0.0121
0.6337 -0.0602 0.0290 0.5324 -0.0008 0.0003 -0.0008
0.0003 -0.0067 0.0052 -0.0013 0.3584 0.7065 0.3584
0.2165 0.2494 0.4367 0.2282 -0.0360 0.0394 -0.0360
0.6337 -0.0602 0.0290 0.5324 -0.0008 0.0003 -0.0008

25 Rank-4 approximation
u*s4: word vector representation of the concepts/topics
-1.1056 -0.1203 0.0207 -0.5558 0 0 0
-0.4155 0.3748 0.5606 0.1573 0 0 0
-0.5576 -0.5719 -0.1226 0.3210 0 0 0
-0.1786 0.1801 -0.1765 -0.0587 0 0 0
-0.4155 0.3748 0.5606 0.1573 0 0 0
-0.2984 0.4778 -0.6015 0.1018 0 0 0
-0.5576 -0.5719 -0.1226 0.3210 0 0 0
-0.3348 0.4241 0.1149 0.2255 0 0 0
-0.2984 0.4778 -0.6015 0.1018 0 0 0

26 Rank-4 approximation
s4*v': new (concept/topic) representation of documents
-0.2674 -0.7087 -0.4266 -0.6292 -0.7451 -0.4996 -0.7451
0.5333 0.2869 0.5351 0.5092 -0.3863 -0.6384 -0.3863
-0.7150 0.5544 0.6001 -0.4686 -0.0605 -0.1457 -0.0605
0.1808 -0.1749 0.3918 -0.1043 -0.2085 0.5700 -0.2085
0 0 0 0 0 0 0

27 Rank-2 approximation
s2 =
1.5849 0 0 0 0 0 0
0 1.2721 0 0 0 0 0
0 0 0 0 0 0 0

28 Rank-2 approximation
u*s2*v'
0.1361 0.4673 0.2470 0.3908 0.5563 0.4089 0.5563
0.2272 0.2703 0.2695 0.3150 0.0815 -0.0571 0.0815
-0.1457 0.1204 -0.0904 -0.0075 0.4358 0.4628 0.4358
0.1057 0.1205 0.1239 0.1430 0.0293 -0.0341 0.0293
0.2272 0.2703 0.2695 0.3150 0.0815 -0.0571 0.0815
0.2507 0.2412 0.2813 0.3097 -0.0048 -0.1457 -0.0048
-0.1457 0.1204 -0.0904 -0.0075 0.4358 0.4628 0.4358
0.2343 0.2454 0.2685 0.3027 0.0286 -0.1073 0.0286
0.2507 0.2412 0.2813 0.3097 -0.0048 -0.1457 -0.0048

29 Rank-2 approximation
u*s2: word vector representation of the concepts/topics
-1.1056 -0.1203 0 0 0 0 0
-0.4155 0.3748 0 0 0 0 0
-0.5576 -0.5719 0 0 0 0 0
-0.1786 0.1801 0 0 0 0 0
-0.4155 0.3748 0 0 0 0 0
-0.2984 0.4778 0 0 0 0 0
-0.5576 -0.5719 0 0 0 0 0
-0.3348 0.4241 0 0 0 0 0
-0.2984 0.4778 0 0 0 0 0

30 Rank-2 approximation
s2*v': new (concept/topic) representation of documents
-0.2674 -0.7087 -0.4266 -0.6292 -0.7451 -0.4996 -0.7451
0.5333 0.2869 0.5351 0.5092 -0.3863 -0.6384 -0.3863
0 0 0 0 0 0 0

31 Latent Semantic Indexing
$A_{[n \times m]} \approx U_{[n \times r]} \, \Sigma_{[r \times r]} \, (V_{[m \times r]})^T$
A: n x m matrix (n documents, m terms)
U: n x r matrix (n documents, r concepts)
Σ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
V: m x r matrix (m terms, r concepts)

32 Latent semantic indexing (LSI)
Dimensionality reduction = identification of hidden (latent) concepts
Query matching in latent space
LSI matches documents even if they don’t have words in common
–if they share frequently co-occurring terms
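The last point can be illustrated on the Berry and Browne titles. In this sketch (which assumes hand-stemmed index terms and unit-length document columns), D1 "infant & toddler first aid" and D2 "babies & children's room" share no index term, so their raw cosine similarity is zero, yet their rank-2 representations point in similar directions because their terms co-occur with shared terms in other titles.

```python
import numpy as np

terms = ["baby", "child", "guide", "health", "home",
         "infant", "proofing", "safety", "toddler"]
# Hand-stemmed index terms for D1..D7; an assumption of this sketch.
docs = [
    ["infant", "toddler"],
    ["baby", "child", "home"],
    ["child", "safety", "home"],
    ["baby", "health", "safety", "infant", "toddler"],
    ["baby", "proofing"],
    ["guide", "proofing"],
    ["baby", "guide"],
]

A = np.zeros((len(terms), len(docs)))
for j, words in enumerate(docs):
    for w in words:
        A[terms.index(w), j] = 1.0
A /= np.linalg.norm(A, axis=0)              # unit-length document columns

def cos(x: np.ndarray, y: np.ndarray) -> float:
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = np.diag(s[:2]) @ Vt[:2, :]        # rank-2 coordinates, one column per document

raw = cos(A[:, 0], A[:, 1])                 # D1 vs D2 in term space
latent = cos(docs_2d[:, 0], docs_2d[:, 1])  # D1 vs D2 in the 2-concept space
print(raw, latent)
```

The sign of each singular vector is arbitrary, but cosine similarity between two documents is unaffected because a sign flip changes both vectors in the same coordinate.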

33 Back to the CS-MED example (Slide from C. Faloutsos’s talk)

34 Example of LSI [figure not preserved: $A = U \Sigma V^T$ for CS and MD documents over the terms data, inf, retrieval, brain, lung; annotations: CS-concept, MD-concept, term rep of concept, strength of CS-concept, dim. reduction] (Slide adapted from C. Faloutsos’s talk)

35 How to Map Query/Doc to the Same Concept Space?
$q^T_{concept} = q^T V$
$d^T_{concept} = d^T V$
Terms: data, inf., retrieval, brain, lung
$d^T = (0\ 1\ 1\ 0\ 0)$, $d^T_{concept} = (1.16\ 0)$; the first component is the similarity with the CS-concept
(Slide adapted from C. Faloutsos’s talk)
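A sketch of this mapping. The document-term matrix `A` below is assumed for illustration (CS documents use only data/inf/retrieval, MD documents use only brain/lung); the transcript does not preserve the slide's matrix. With this data the document vector (0 1 1 0 0) maps to about 1.155 on the CS-concept, which matches the slide's 1.16 if the entries of V are rounded to 0.58. The query `q` ("data") is also an assumption.

```python
import numpy as np

terms = ["data", "inf", "retrieval", "brain", "lung"]

# Assumed document-term matrix: four CS documents followed by three MD documents.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2, :].T                             # 5 terms x 2 concepts (CS-concept, MD-concept)

d = np.array([0, 1, 1, 0, 0], dtype=float)  # document containing "inf" and "retrieval"
q = np.array([1, 0, 0, 0, 0], dtype=float)  # query containing only "data"

d_concept = d @ V                           # d^T V
q_concept = q @ V                           # q^T V

cos = float(q_concept @ d_concept /
            (np.linalg.norm(q_concept) * np.linalg.norm(d_concept)))
print(np.round(np.abs(d_concept), 4), round(cos, 4))
```

The query and the document have no term in common, yet both land on the CS-concept axis, so their similarity in concept space is 1. The sign of each concept axis is arbitrary, hence the absolute value in the printout.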

36 Useful pointers http://lsa.colorado.edu http://lsi.research.telcordia.com http://www.cs.utk.edu/~lsi

37 Readings MRS18 MRS17, MRS19 MRS20

38 Problem of LSI
Concepts/Topics are hard to interpret
New document/query vectors could have negative values
Lack of statistical interpretation
Probabilistic latent semantic indexing…

39 General Idea of Probabilistic Topic Models
Modeling a topic/subtopic/theme with a multinomial distribution (unigram LM)
Modeling text data with a mixture model involving multinomial distributions
–A document is “generated” by sampling words from some multinomial distribution
–Each time, a word may be generated from a different distribution
–Many variations of how these multinomial distributions are mixed
Topic mining = Fitting the probabilistic model to text
Answer topic-related questions by computing various kinds of conditional probabilities based on the estimated model (e.g., p(time | topic), p(time | topic, location))

40 Document as a Sample of Mixed Topics
Topic θ1: government 0.3, response 0.2, ...
Topic θ2: donate 0.1, relief 0.05, help 0.02, ...
…
Topic θk: city 0.2, new 0.1, orleans 0.05, ...
Background θB: is 0.05, the 0.04, a 0.03, ...
[ Criticism of government response to the hurricane primarily consisted of criticism of its response to the approach of the storm and its aftermath, specifically in the delayed response ] to the [ flooding of New Orleans. … 80% of the 1.3 million residents of the greater New Orleans metropolitan area evacuated ] …[ Over seventy countries pledged monetary donations or other assistance]. …
Applications of topic models:
–Summarize themes/aspects
–Facilitate navigation/browsing
–Retrieve documents
–Segment documents
–Many others
How can we discover these topic word distributions?

41 Probabilistic Latent Semantic Analysis/Indexing (PLSA/PLSI) [Hofmann 99]
Mix k multinomial distributions to generate a document
Each document has a potentially different set of mixing weights, which captures the topic coverage
When generating words in a document, each word may be generated using a DIFFERENT multinomial distribution (this is in contrast with the document clustering model where, once a multinomial distribution is chosen, all the words in a document would be generated using the same model)
We may add a background distribution to “attract” background words
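The generative story can be sketched as a sampling loop. Everything numeric here is hypothetical: the vocabulary, the two topic distributions, the background distribution and the document's mixing weights are made up for illustration. The point is only that a distribution is chosen anew for every word, rather than once per document as in the clustering model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["government", "response", "donate", "relief", "city", "orleans", "the", "is"]

# Hypothetical word distributions; each row sums to 1.
word_dists = np.array([
    [0.45, 0.35, 0.02, 0.02, 0.05, 0.05, 0.03, 0.03],  # topic 1: government response
    [0.02, 0.03, 0.45, 0.35, 0.05, 0.04, 0.03, 0.03],  # topic 2: donations
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.40, 0.30],  # background words
])

# This document's mixing weights over (topic 1, topic 2, background).
mixing = np.array([0.5, 0.3, 0.2])

def generate(n_words: int) -> list:
    """Sample a document word by word from the mixture."""
    words = []
    for _ in range(n_words):
        z = rng.choice(len(mixing), p=mixing)          # choose a distribution for this word
        w = rng.choice(len(vocab), p=word_dists[z])    # sample the word from that distribution
        words.append(vocab[w])
    return words

doc = generate(20)
print(" ".join(doc))
```

A different document would reuse the same `word_dists` but have its own `mixing` vector, which is what "topic coverage" refers to.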

42 PLSI (a.k.a. Aspect Model)
Every document is a mixture of K underlying (latent) aspects (topics) with mixture weights p(z|d)
–How is this related to LSI?
Each aspect is represented by a distribution of words p(w|z)
Estimate p(z|d) and p(w|z) using the EM algorithm

43 PLSI as a Mixture Model
“Generating” word w in doc d in the collection
Topic z1 (θ1): warning 0.3, system 0.2, ..
Topic z2 (θ2): aid 0.1, donation 0.05, support 0.02, ..
…
Topic zk (θk): statistics 0.2, loss 0.1, dead 0.05, ..
Background B (θB): is 0.05, the 0.04, a 0.03, ..
With probability λB the word w is drawn from the background θB; with probability 1 - λB it is drawn from one of the topics, chosen with weights p(z1|d), p(z2|d), …, p(zk|d)
Parameters: λB = noise level (manually set); p(z|d) and p(w|z) are unknown (marked “?” on the slide) and are estimated with Maximum Likelihood

44 Parameter Estimation using EM Algorithm
We have the equation for the log-likelihood function from the PLSI model, which we want to maximize:
Maximizing likelihood using Expectation Maximization
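The slide's equation is not preserved in the transcript. For orientation, the log-likelihood of the basic PLSI model, without the background component, is commonly written as follows; the count notation $c(w, d)$ for the number of times word $w$ occurs in document $d$ is an assumption, since the slide's own symbols are unknown.

```latex
\log L \;=\; \sum_{d} \sum_{w} c(w, d)\, \log \sum_{z} p(z \mid d)\, p(w \mid z)
```

The sum over $z$ sits inside the logarithm, which is what prevents a closed-form maximum and motivates the EM algorithm.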

45 EM Steps
E-Step
–Expectation step, where the expectation of the likelihood function is calculated with the current parameter values
M-Step
–Update the parameters with the calculated posterior probabilities
–Find the parameters that maximize the likelihood function

46 E Step
p(z|d,w) is the probability that a word w occurring in a document d is explained by topic z
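The slide's formula is missing from the transcript. In the basic PLSI model without the background component, this posterior follows from Bayes' rule applied to the current estimates of $p(z \mid d)$ and $p(w \mid z)$; this is the standard form and may differ in notation from the original slide.

```latex
p(z \mid d, w) \;=\; \frac{p(z \mid d)\, p(w \mid z)}{\sum_{z'} p(z' \mid d)\, p(w \mid z')}
```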

47 M Step
All these equations use p(z|d,w) calculated in the E Step
Converges to a local maximum of the likelihood function
We will see more when we talk about topic modeling
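A compact sketch of the whole EM loop for basic PLSI, under stated assumptions: no background component, random initialization, a fixed number of iterations, and a tiny made-up count matrix. The function name `plsi` and all variable names are illustrative. The M-step re-estimates p(z|d) and p(w|z) from the counts weighted by the E-step posterior p(z|d,w), and the recorded log-likelihood never decreases from one iteration to the next.

```python
import numpy as np

def plsi(counts: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """EM for basic PLSI. counts is a documents x words matrix with no empty rows.

    Returns p(z|d) (docs x k), p(w|z) (k x words) and the log-likelihood per iteration.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, k))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((k, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    trace = []
    for _ in range(n_iter):
        # E-step: p(z|d,w) is proportional to p(z|d) p(w|z); shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        p_w_d = joint.sum(axis=1)                          # p(w|d), shape (docs, words)
        mask = counts > 0
        trace.append(float((counts[mask] * np.log(p_w_d[mask])).sum()))
        post = joint / np.maximum(p_w_d, 1e-300)[:, None, :]
        # M-step: re-estimate both distributions from posterior-weighted counts.
        weighted = counts[:, None, :] * post               # c(w,d) * p(z|d,w)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z, trace

# Made-up counts: two documents using the first three words, two using the last three.
counts = np.array([
    [2, 3, 1, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 1, 2, 2],
], dtype=float)

p_z_d, p_w_z, trace = plsi(counts, k=2)
print(np.round(p_z_d, 3))
print(np.round(p_w_z, 3))
```

Because EM only finds a local maximum, a different `seed` may give a different solution; running several restarts and keeping the run with the highest final log-likelihood is a common workaround.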

48 Example of PLSI

49 Topics represented as word distributions
Topics are interpretable!
Example of topics found from blog articles about “Hurricane Katrina”

