2010 © University of Michigan Latent Semantic Indexing SI650: Information Retrieval Winter 2010 School of Information University of Michigan

2010 © University of Michigan … Latent semantic indexing Singular value decomposition …

2010 © University of Michigan Problems with lexical semantics
Polysemy – bar, bank, jaguar, hot – tends to reduce precision
Synonymy – building/edifice, large/big, spicy/hot – tends to reduce recall
Relatedness – doctor/patient/nurse/treatment
Sparse matrix
Need: dimensionality reduction

2010 © University of Michigan Problem in Retrieval
Query = “information retrieval”
Document 1 = “inverted index precision recall”
Document 2 = “welcome to ann arbor”
Which one should we rank higher? Query vocabulary & doc vocabulary mismatch! Smoothing won’t help here… If only we could represent documents/queries by topics!

2010 © University of Michigan Latent Semantic Indexing
Motivation:
– Query vocabulary & doc vocabulary mismatch
– Need to match/index based on concepts (or topics)
Main idea:
– Project queries and documents into a space with “latent” semantic dimensions
– Dimensionality reduction: the latent semantic space has fewer dimensions (semantic concepts)
– Exploit co-occurrence: co-occurring terms are projected onto the same dimensions

2010 © University of Michigan Example of “Semantic Concepts” (Slide from C. Faloutsos’s talk)

2010 © University of Michigan Concept Space = Dimension Reduction
The number of concepts (K) is always smaller than the number of words (N) or the number of documents (M). If we represent a document as an N-dimensional vector, and the corpus as an M*N matrix, the goal is to reduce the dimension from N to K. But how can we do that?

2010 © University of Michigan Techniques for dimensionality reduction Based on matrix decomposition (goal: preserve clusters, explain away variance) A quick review of matrices –Vectors –Matrices –Matrix multiplication

2010 © University of Michigan Eigenvectors and eigenvalues
An eigenvector is an implicit “direction” for a matrix: Av = λv, where v (the eigenvector) is non-zero, though λ (the eigenvalue) can be any complex number in principle.
Computing eigenvalues: solve det(A − λI) = 0 (det = determinant); if A is square (N x N), this has r distinct solutions, where 1 <= r <= N.
For each λ found, you can find v by solving (A − λI)v = 0, or equivalently Av = λv.

2010 © University of Michigan Eigenvectors and eigenvalues
Example: det(A − λI) = (−1 − λ)(−λ) − 3·2 = 0
Then: λ² + λ − 6 = 0; λ1 = 2; λ2 = −3
For λ1 = 2, the solutions satisfy x1 = x2.

2010 © University of Michigan Eigenvectors and eigenvalues
Wait, that means there are many eigenvectors for the same eigenvalue… v = (x1, x2)^T with x1 = x2 corresponds to many vectors, e.g., (1, 1)^T, (2, 2)^T, (650, 650)^T …
Not surprising: if v is an eigenvector of A, then v′ = cv is also an eigenvector (for any non-zero constant c).

2010 © University of Michigan Matrix Decomposition
If A is a square (N x N) matrix and it has N linearly independent eigenvectors, it can be decomposed into A = U Λ U^-1, where
U: matrix whose columns are the eigenvectors
Λ: diagonal matrix of eigenvalues
Derivation: AU = UΛ, so U^-1 A U = Λ, i.e., A = U Λ U^-1
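A quick numerical check of this decomposition in Matlab/Octave (a sketch; the 2x2 matrix below is an assumption, reconstructed from the determinant expansion in the earlier example):

  A = [-1 3; 2 0];            % assumed example matrix, consistent with det(A - lambda*I) = (-1-lambda)(-lambda) - 3*2
  [U, L] = eig(A);            % columns of U are eigenvectors, L is the diagonal matrix of eigenvalues
  diag(L)                     % should show the eigenvalues 2 and -3
  norm(A - U * L * inv(U))    % should be (numerically) zero: A = U*Lambda*U^-1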

2010 © University of Michigan Example

2010 © University of Michigan Example Eigenvalues are 3, 2, 0 x is an arbitrary vector, yet Sx depends on the eigenvalues and eigenvectors

2010 © University of Michigan What about an arbitrary matrix?
A: n x m matrix (n documents, m terms)
A = U Σ V^T (as opposed to A = U Λ U^-1)
U: n x n matrix; V: m x m matrix; Σ: n x m diagonal matrix (only values on the diagonal can be non-zero)
U U^T = I; V V^T = I

2010 © University of Michigan SVD: Singular Value Decomposition
A = U Σ V^T
U is the matrix of orthogonal eigenvectors of A A^T
V is the matrix of orthogonal eigenvectors of A^T A
The diagonal entries of Σ (the singular values) are the square roots of the eigenvalues of A^T A
This decomposition exists for all matrices, dense or sparse
If A has 5 columns and 3 rows, then U will be 3x3 and V will be 5x5
In Matlab, use [U,S,V] = svd(A)
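A minimal sketch of that Matlab call on a toy 2x3 matrix (the matrix values are made up):

  A = [3 2 2; 2 3 -2];        % 2 "documents" x 3 "terms" (toy values)
  [U, S, V] = svd(A);         % U is 2x2, S is 2x3, V is 3x3
  norm(A - U * S * V')        % should be (numerically) zero
  norm(U * U' - eye(2))       % U and V are orthogonal
  norm(V * V' - eye(3))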

2010 © University of Michigan Term matrix normalization (example matrix with documents D1–D5 shown on the slide)
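A minimal sketch of one common normalization, assuming the slide scales each document (column) vector to unit Euclidean length (the slide’s exact weighting scheme is not reproduced here):

  A = [1 0 1; 0 2 1; 1 1 0];                 % toy term-document matrix (made-up values)
  lens = sqrt(sum(A .^ 2, 1));               % Euclidean length of each document column
  lens(lens == 0) = 1;                       % guard against empty documents
  Anorm = A ./ repmat(lens, size(A, 1), 1);  % every column now has unit length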

2010 © University of Michigan Example (Berry and Browne)
Terms: T1: baby, T2: child, T3: guide, T4: health, T5: home, T6: infant, T7: proofing, T8: safety, T9: toddler
D1: infant & toddler first aid
D2: babies & children’s room (for your home)
D3: child safety at home
D4: your baby’s health and safety: from infant to toddler
D5: baby proofing basics
D6: your guide to easy rust proofing
D7: beanie babies collector’s guide

2010 © University of Michigan Document term matrix
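The matrix itself is shown on the slide; below is a sketch of what it looks like under simple binary term occurrence (an assumption; the slide may use normalized or weighted entries). Rows are the terms T1..T9, columns are the documents D1..D7:

  A = [0 1 0 1 1 0 1;   % T1 baby
       0 1 1 0 0 0 0;   % T2 child
       0 0 0 0 0 1 1;   % T3 guide
       0 0 0 1 0 0 0;   % T4 health
       0 1 1 0 0 0 0;   % T5 home
       1 0 0 1 0 0 0;   % T6 infant
       0 0 0 0 1 1 0;   % T7 proofing
       0 0 1 1 0 0 0;   % T8 safety
       1 0 0 1 0 0 0];  % T9 toddler
  [u, s, v] = svd(A);   % the decomposition referred to on the following slides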

2010 © University of Michigan Decomposition: the factor matrices u and v (values shown on the slide)

2010 © University of Michigan Decomposition: Σ (diagonal of singular values, shown on the slide); the first singular value corresponds to the spread on the v1 axis

2010 © University of Michigan What does this have to do with dimension reduction?
Low-rank matrix approximation
SVD: A[m x n] = U[m x m] Σ[m x n] (V[n x n])^T
Remember that Σ is a diagonal matrix of singular values
If we only keep the largest r singular values…
A ≈ U[m x r] Σ[r x r] (V[n x r])^T
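In Matlab/Octave, the rank-r approximation is just a truncation of the three factors (a sketch, reusing A, u, s, v from the example above; the connection to the s4 on the next slides is an assumption):

  r = 4;                                        % e.g., the rank-4 approximation on the next slides
  Ar = u(:, 1:r) * s(1:r, 1:r) * v(:, 1:r)';    % best rank-r approximation of A
  s4 = s; s4(r+1:end, r+1:end) = 0;             % equivalently, zero out all but the r largest singular values
  norm(Ar - u * s4 * v')                        % the two forms agree (numerically zero)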

2010 © University of Michigan Rank-4 approximation s4 =

2010 © University of Michigan Rank-4 approximation u*s4*v'

2010 © University of Michigan Rank-4 approximation u*s4: word vector representation of the concepts/topics

2010 © University of Michigan Rank-4 approximation s4*v': new (concept/topic) representation of documents

2010 © University of Michigan Rank-2 approximation s2 =

2010 © University of Michigan Rank-2 approximation u*s2*v'

2010 © University of Michigan Rank-2 approximation u*s2: word vector representation of the concepts/topics

2010 © University of Michigan Rank-2 approximation s2*v': new (concept/topic) representation of documents

2010 © University of Michigan Latent Semantic Indexing
A[n x m] ≈ U[n x r] Σ[r x r] (V[m x r])^T
A: n x m matrix (n documents, m terms)
U: n x r matrix (n documents, r concepts)
Σ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
V: m x r matrix (m terms, r concepts)

2010 © University of Michigan Latent semantic indexing (LSI)
Dimensionality reduction = identification of hidden (latent) concepts
Query matching in latent space
LSI matches documents even if they don’t have words in common, provided they share frequently co-occurring terms

2010 © University of Michigan Back to the CS-MED example (Slide from C. Faloutsos’s talk)

2010 © University of Michigan Example of LSI data (Slide adapted from C. Faloutsos’s talk)
The slide decomposes a small matrix over the terms data, inf., retrieval, brain, lung, with CS and MD documents, as A = U Σ V^T. The dimension reduction exposes a CS-concept and an MD-concept: V gives the term representation of each concept, and the diagonal of Σ gives the strength of each concept (e.g., the strength of the CS-concept).

2010 © University of Michigan How to Map Query/Doc to the Same Concept Space? (Slide adapted from C. Faloutsos’s talk)
q^T_concept = q^T V
d^T_concept = d^T V
The slide’s example maps a query vector and a document vector over the terms data, inf., retrieval, brain, lung into concept space, where each coordinate gives the similarity with a concept (e.g., the CS-concept).
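A sketch of this folding-in step. Note the slide’s formula assumes a document-by-term matrix, so V maps terms to concepts; in the term-by-document sketch built earlier, the term-to-concept matrix is u instead (q below is a hypothetical query):

  k = 2;                           % number of concepts kept
  q = [1 0 0 0 0 1 0 0 0]';        % hypothetical query containing "baby" and "infant" (T1 and T6)
  q_concept = q' * u(:, 1:k);      % 1-by-k representation of the query in concept space
  D_concept = A' * u(:, 1:k);      % 7-by-k: each document (row) in the same concept space
  % documents can now be ranked by cosine similarity between q_concept and the rows of D_concept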

2010 © University of Michigan Useful pointers

2010 © University of Michigan Readings MRS18 MRS17, MRS19 MRS20

2010 © University of Michigan Problems with LSI
Concepts/topics are hard to interpret
New document/query vectors could have negative values
Lack of statistical interpretation
Probabilistic latent semantic indexing…

2010 © University of Michigan General Idea of Probabilistic Topic Models
Model a topic/subtopic/theme with a multinomial distribution (unigram LM)
Model text data with a mixture model involving multinomial distributions:
– A document is “generated” by sampling words from some multinomial distribution (see the sketch below)
– Each time, a word may be generated from a different distribution
– Many variations of how these multinomial distributions are mixed
Topic mining = fitting the probabilistic model to text
Answer topic-related questions by computing various kinds of conditional probabilities based on the estimated model (e.g., p(time | topic), p(time | topic, location))
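A toy sketch of that generation process in Matlab/Octave (the vocabulary, topic distributions, and mixing weights below are all made up for illustration):

  vocab  = {'government', 'response', 'donate', 'relief', 'help', 'city', 'new', 'orleans'};
  theta1 = [0.3 0.2 0.2 0.15 0.15 0 0 0];   % hypothetical topic 1: government response
  theta2 = [0 0 0 0 0 0.4 0.3 0.3];         % hypothetical topic 2: New Orleans
  pi_d   = [0.6 0.4];                       % document-specific topic mixing weights
  for i = 1:10                              % "generate" a 10-word document
      if rand < pi_d(1), p = theta1; else p = theta2; end   % pick a topic for this word
      w = find(rand < cumsum(p), 1);                        % sample a word from that topic
      fprintf('%s ', vocab{w});
  end
  fprintf('\n');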

2010 © University of Michigan Document as a Sample of Mixed Topics
Applications of topic models:
– Summarize themes/aspects
– Facilitate navigation/browsing
– Retrieve documents
– Segment documents
– Many others
How can we discover these topic word distributions?
The slide shows example topic word distributions θ1 … θk (e.g., one with government 0.3, response …, one with donate 0.1, relief 0.05, help …, one with city 0.2, new 0.1, orleans …) plus a background distribution θB (is 0.05, the 0.04, a …), and a passage whose segments are attributed to these topics: “[Criticism of government response to the hurricane primarily consisted of criticism of its response to the approach of the storm and its aftermath, specifically in the delayed response] to the [flooding of New Orleans. … 80% of the 1.3 million residents of the greater New Orleans metropolitan area evacuated] … [Over seventy countries pledged monetary donations or other assistance]. …”

2010 © University of Michigan Probabilistic Latent Semantic Analysis/Indexing (PLSA/PLSI) [Hofmann 99]
Mix k multinomial distributions to generate a document
Each document has a potentially different set of mixing weights, which captures the topic coverage
When generating words in a document, each word may be generated using a DIFFERENT multinomial distribution (this is in contrast with the document clustering model where, once a multinomial distribution is chosen, all the words in a document would be generated using the same model)
We may add a background distribution to “attract” background words

2010 © University of Michigan PLSI (a.k.a. Aspect Model)
Every document is a mixture of K underlying (latent) aspects (topics) with mixture weights p(z|d) – how is this related to LSI?
Each aspect is represented by a distribution of words p(w|z)
Estimate p(z|d) and p(w|z) using the EM algorithm

2010 © University of Michigan PLSI as a Mixture Model
The slide shows the process for “generating” a word w in document d in the collection: with probability λB the word is drawn from a background distribution θB (is 0.05, the 0.04, a …); with probability 1 − λB a topic z is chosen according to p(z1|d), p(z2|d), …, p(zk|d) and the word is drawn from that topic’s distribution θz (example topics on the slide: warning 0.3, system …; aid 0.1, donation 0.05, support …; statistics 0.2, loss 0.1, dead …).
Parameters: λB = noise level (manually set); p(z|d) and p(w|z) are estimated with Maximum Likelihood (the ‘?’ values in the figure).

2010 © University of Michigan Parameter Estimation using the EM Algorithm
We want to maximize the log-likelihood of the collection under the PLSI model:
L = Σ_d Σ_w c(w,d) log Σ_z p(w|z) p(z|d)
where c(w,d) is the count of word w in document d.
We maximize this likelihood using Expectation Maximization.

2010 © University of Michigan EM Steps
E-Step – Expectation step: the expectation of the likelihood function is calculated with the current parameter values
M-Step – Update the parameters using the calculated posterior probabilities; find the parameters that maximize the likelihood function

2010 © University of Michigan E Step
Compute the posterior probability that an occurrence of word w in document d is explained by topic z:
p(z|d,w) = p(w|z) p(z|d) / Σ_z' p(w|z') p(z'|d)

2010 © University of Michigan M Step
Re-estimate the parameters using the p(z|d,w) calculated in the E step:
p(w|z) ∝ Σ_d c(w,d) p(z|d,w)
p(z|d) ∝ Σ_w c(w,d) p(z|d,w)
(each distribution is normalized to sum to 1)
EM converges to a local maximum of the likelihood function. We will see more when we talk about topic modeling.
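Putting the two steps together, a minimal Matlab/Octave sketch of the EM loop (an illustration only; it omits the background distribution λB from the earlier slide, and the function and variable names are made up):

  function [Pwz, Pzd] = plsa_em(C, K, iters)
    % C: N-by-K term-document count matrix? No: C is N-by-M (N words, M documents); K = number of topics
    [N, M] = size(C);
    Pwz = rand(N, K); Pwz = Pwz ./ repmat(sum(Pwz, 1), N, 1);   % p(w|z), columns sum to 1
    Pzd = rand(K, M); Pzd = Pzd ./ repmat(sum(Pzd, 1), K, 1);   % p(z|d), columns sum to 1
    for it = 1:iters
      Pwz_new = zeros(N, K); Pzd_new = zeros(K, M);
      for d = 1:M
        % E step: p(z|d,w) is proportional to p(w|z) p(z|d), normalized over topics z
        joint = Pwz .* repmat(Pzd(:, d)', N, 1);       % N-by-K
        denom = sum(joint, 2); denom(denom == 0) = 1;
        post = joint ./ repmat(denom, 1, K);           % p(z|d,w) for this document
        % M step accumulators, weighted by the word counts c(w,d)
        weighted = repmat(C(:, d), 1, K) .* post;      % c(w,d) * p(z|d,w)
        Pwz_new = Pwz_new + weighted;
        Pzd_new(:, d) = sum(weighted, 1)';
      end
      Pwz = Pwz_new ./ repmat(sum(Pwz_new, 1), N, 1);  % re-normalize p(w|z)
      Pzd = Pzd_new ./ repmat(sum(Pzd_new, 1), K, 1);  % re-normalize p(z|d)
    end
  end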

2010 © University of Michigan Example of PLSI

2010 © University of Michigan Topics represented as word distributions – topics are interpretable!
Example of topics found from blog articles about “Hurricane Katrina” (word distributions shown on the slide)