An Introduction to Latent Semantic Analysis

Matrix Decompositions
Definition: the factorization of a matrix M into two or more matrices M_1, M_2, ..., M_n such that M = M_1 M_2 ... M_n. Many decompositions exist:
– QR decomposition (orthogonal and triangular): linear least squares, eigenvalue algorithms
– LU decomposition (lower and upper triangular): solving systems and finding determinants
– etc.
One is special…
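To make the factorizations concrete, here is a minimal Python/NumPy sketch (the matrix M is an arbitrary example, not taken from the slides) verifying that the QR and LU factors multiply back to the original matrix:

import numpy as np
from scipy.linalg import lu   # LU factorization with partial pivoting

M = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# QR: M = Q R, with Q orthogonal and R upper triangular
Q, R = np.linalg.qr(M)
print(np.allclose(M, Q @ R))       # True

# LU: M = P L U, with P a permutation, L lower triangular, U upper triangular
P, L, U = lu(M)
print(np.allclose(M, P @ L @ U))   # True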

Singular Value Decomposition
[Strang]: any m by n matrix A may be factored as A = U Σ V^T
– U: m by m, orthogonal; its columns are the eigenvectors of A A^T
– V: n by n, orthogonal; its columns are the eigenvectors of A^T A
– Σ: m by n, diagonal; its r singular values are the square roots of the eigenvalues of both A A^T and A^T A
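A small NumPy check of these claims (a minimal sketch with an arbitrary matrix, not the example from the slides):

import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)                       # full SVD: A = U Sigma V^T
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))             # True

# The singular values are the square roots of the eigenvalues of A A^T (and A^T A)
eigvals = np.linalg.eigvalsh(A @ A.T)             # returned in ascending order
print(np.allclose(np.sort(s**2), eigvals))        # True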

SVD Example
From [Strang]:

SVD Properties
U and V give us orthonormal bases for the subspaces of A:
– first r columns of U: column space of A
– last m - r columns of U: left nullspace of A
– first r columns of V: row space of A
– last n - r columns of V: nullspace of A
Implication: rank(A) = r
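For instance (a small sketch with an arbitrary rank-deficient matrix):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # rank 1: the second row is twice the first

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))           # rank = number of nonzero singular values
print(r)                             # 1

# The last n - r columns of V (rows of Vt) span the nullspace of A
null_basis = Vt[r:].T
print(np.allclose(A @ null_basis, 0.0))   # True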

Application: Pseudoinverse
Given y = Ax, x = A^+ y
– For square, invertible A: A^+ = A^-1
– For any A: A^+ = V Σ^+ U^T, where Σ^+ replaces each nonzero singular value with its reciprocal (Σ^-1 when Σ is square and invertible)
A^+ is called the pseudoinverse of A, and x = A^+ y is the least-squares solution of y = Ax.
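A minimal NumPy illustration of the least-squares property (the overdetermined system here is made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))          # tall matrix: more equations than unknowns
y = rng.normal(size=5)

# Pseudoinverse built from the SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

x = A_pinv @ y
print(np.allclose(x, np.linalg.lstsq(A, y, rcond=None)[0]))   # True: least-squares solution
print(np.allclose(A_pinv, np.linalg.pinv(A)))                 # True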

Rank One Decomposition
Given an m by n matrix A with singular values {s_1, ..., s_r} and SVD A = U Σ V^T, write the columns as U = [u_1 | u_2 | ... | u_m] and V = [v_1 | v_2 | ... | v_n]. Then A may be expressed as the sum of r rank-one matrices:
A = s_1 u_1 v_1^T + s_2 u_2 v_2^T + ... + s_r u_r v_r^T
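Checking the rank-one expansion numerically (a sketch; the matrix is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A equals the sum of r rank-one matrices s_i * u_i * v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))     # True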

Matrix Approximation
Let A be an m by n matrix with rank(A) = r. If s_1 ≥ s_2 ≥ ... ≥ s_r are the singular values of A, then the rank-q approximation B of A that minimizes ||A - B||_F is obtained by keeping the q largest terms of the rank-one expansion:
B = s_1 u_1 v_1^T + ... + s_q u_q v_q^T
Proof: S. J. Leon, Linear Algebra with Applications, 5th Edition, p. 414 [Will]
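A quick numerical check of this best-approximation property, comparing the truncated SVD against another rank-q matrix (random data, assumed for illustration):

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 5))
q = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B_best = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]     # keep the q largest singular values

# Any other rank-q matrix does no better in the Frobenius norm
B_other = rng.normal(size=(6, q)) @ rng.normal(size=(q, 5))
print(np.linalg.norm(A - B_best, 'fro') <= np.linalg.norm(A - B_other, 'fro'))   # True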

Application: Image Compression
Uncompressed m by n pixel image: m × n numbers
Rank-q approximation of the image stores only:
– the q singular values
– the first q columns of U (m-vectors)
– the first q columns of V (n-vectors)
Total: q × (m + n + 1) numbers
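A small sketch of the storage arithmetic and the reconstruction (the image here is a random stand-in; any grayscale array in [0, 1] would do):

import numpy as np

m, n, q = 256, 264, 81
print(m * n)               # 67584 numbers, uncompressed
print(q * (m + n + 1))     # 42201 numbers for the rank-81 approximation

img = np.random.default_rng(3).random((m, n))      # stand-in for a grayscale image in [0, 1]
U, s, Vt = np.linalg.svd(img, full_matrices=False)
img_q = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]      # the compressed (rank-q) image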

Example: Yogi (Uncompressed)
Source: [Will]. Yogi: a rock photographed by the Sojourner Mars mission.
256 × 264 grayscale bitmap → 256 × 264 matrix M, pixel values in [0, 1]
Uncompressed: 256 × 264 = 67,584 (~68,000) numbers

Example: Yogi (Compressed)
M has 256 singular values.
Rank-81 approximation of M: 81 × (256 + 264 + 1) = 42,201 (~42,000) numbers

Example: Yogi (Both)

Application: Noise Filtering
Data compression: the image is deliberately degraded to reduce its size. Noise filtering: a lower-rank approximation is used to improve the data.
– Noise effects manifest primarily in the terms corresponding to the smaller singular values.
– Setting these singular values to zero removes much of the noise.
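A minimal denoising sketch, assuming the clean signal is low-rank (synthetic data, not from the slides):

import numpy as np

rng = np.random.default_rng(4)
clean = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))    # rank-3 "signal"
noisy = clean + 0.05 * rng.normal(size=clean.shape)            # small additive noise

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 3                                                          # keep only the dominant singular values
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The truncated reconstruction is typically much closer to the clean signal than the noisy data
print(np.linalg.norm(clean - denoised), np.linalg.norm(clean - noisy))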

Example: Microarrays
Source: [Holter]. Expression profiles for yeast cell-cycle data, reconstructed from characteristic nodes (singular values). There are 14 characteristic nodes; left to right, the microarrays are reconstructed from 1, 2, 3, 4, 5, and all characteristic nodes, respectively.

Research Directions
– Latent Semantic Indexing [Berry]: SVD used to approximate document retrieval matrices.
– Pseudoinverse: applications to bioinformatics via Support Vector Machines and microarrays.

The Problem
Information retrieval in the 1980s:
– given a collection of documents, retrieve the documents that are relevant to a given query
– match terms in the documents to terms in the query
– the vector space method

The Problem
The vector space method:
– term (rows) by document (columns) matrix, based on occurrence counts
– each document is translated into a vector in a vector space (one vector per document)
– the cosine measures the distance between vectors (documents):
  small angle = large cosine = similar
  large angle = small cosine = dissimilar
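A tiny sketch of the vector space method (the toy terms, documents, and query are assumptions for illustration):

import numpy as np

# Term-by-document matrix: rows = terms, columns = documents (occurrence counts)
terms = ["car", "automobile", "engine", "python"]
X = np.array([[1, 0, 0],      # car
              [0, 1, 0],      # automobile
              [1, 1, 0],      # engine
              [0, 0, 2]])     # python

query = np.array([1, 0, 0, 0])   # the query contains only "car"

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print([round(float(cosine(query, X[:, j])), 2) for j in range(X.shape[1])])
# [0.71, 0.0, 0.0]: document 2, which says "automobile", is missed entirely (the synonymy problem)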

The Problem
A quick diversion: standard evaluation measures in IR
– Precision: the portion of the items the system selected that are actually relevant
– Recall: the portion of the relevant (target) items that the system selected
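For concreteness, a minimal computation on made-up numbers:

# Hypothetical run: 10 relevant documents exist; the system returns 8, of which 6 are relevant
n_relevant, n_returned, n_relevant_returned = 10, 8, 6
precision = n_relevant_returned / n_returned    # 0.75
recall = n_relevant_returned / n_relevant       # 0.6
print(precision, recall)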

The Problem
Two problems arose when using the vector space model:
– synonymy: there are many ways to refer to the same object (e.g. car and automobile); this leads to poor recall
– polysemy: most words have more than one distinct meaning (e.g. model, python, chip); this leads to poor precision

The Problem
Example (from Lillian Lee): three short documents in the vector space model
– doc 1: auto engine bonnet tyres lorry boot
– doc 2: car emissions hood make model trunk
– doc 3: make hidden Markov model emissions normalize
Synonymy: docs 1 and 2 will have a small cosine but are related.
Polysemy: docs 2 and 3 will have a large cosine but are not truly related.

The Problem
Latent Semantic Indexing was proposed to address these two problems with the vector space model for information retrieval.

Some History
Latent Semantic Indexing was developed at Bellcore (now Telcordia) in the late 1980s (1988). It was patented in 1989.

LSA
But first: what is the difference between LSI and LSA?
– LSI refers to using the technique for indexing or information retrieval.
– LSA refers to everything else.

LSA
Idea (Deerwester et al.): “We would like a representation in which a set of terms, which by itself is incomplete and unreliable evidence of the relevance of a given document, is replaced by some other set of entities which are more reliable indicants. We take advantage of the implicit higher-order (or latent) structure in the association of terms and documents to reveal such relationships.”

LSA
Implementation: four basic steps
– Build a term-by-document matrix (more generally, term-by-context); these matrices tend to be sparse.
– Convert the matrix entries to weights, typically a local weight times a global weight, L(i, j) * G(i):
  a_ij -> log(freq(a_ij)) divided by the entropy of row i, where the entropy is - sum_j p_ij log p_ij over the (normalized) entries p_ij of the row
  – the local weight reflects the term's estimated importance in the passage
  – the global weight is inversely related to the degree to which knowing that the word occurred provides information about which passage it appeared in
(a weighting sketch follows below)
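A hedged sketch of one common log-entropy weighting variant (local weight log(count + 1); global weight 1 + sum_j p_ij log p_ij / log n). The slide describes dividing by the row entropy, which is closely related, but the exact formula below is an assumption, not necessarily the authors' recipe:

import numpy as np

def log_entropy_weights(counts):
    """Weight a term-by-document count matrix (rows = terms, columns = documents, more than one document)."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    local = np.log(counts + 1.0)                              # local weight L(i, j)
    p = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)       # global weight G(i): low for evenly spread terms
    return local * global_w[:, None]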

LSA
Four basic steps (continued)
– Perform a rank-reduced Singular Value Decomposition (SVD) on the matrix:
  all but the k highest singular values are set to 0,
  producing a k-dimensional approximation of the original matrix (in the least-squares sense);
  this is the “semantic space”
– Compute similarities between entities in the semantic space (usually with the cosine)
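Putting the steps together in a minimal sketch (the function names and usage here are assumptions for illustration; any weighted term-by-document matrix can be passed in):

import numpy as np

def lsa(weighted, k):
    """Return k-dimensional term and document vectors from a weighted term-by-document matrix."""
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]       # terms in the semantic space
    doc_vecs = Vt[:k, :].T * s[:k]     # documents in the semantic space
    return term_vecs, doc_vecs

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# usage sketch: T, D = lsa(X, k=2); cosine(D[0], D[1]) compares two documents in the semantic space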

LSA
SVD:
– a unique mathematical decomposition of a matrix into the product of three matrices: two with orthonormal columns, one with the singular values on the diagonal
– a tool for dimensionality reduction
– a similarity measure based on co-occurrence
– finds the optimal projection into a low-dimensional space

LSA
SVD:
– can be viewed as a method for rotating the axes of the n-dimensional space so that the first axis runs along the direction of largest variation among the documents, the second along the direction of second-largest variation, and so on
– a generalized least-squares method

A Small Example
Technical memo titles:
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey
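The next slides report term-term correlations before and after dimension reduction. A sketch that reproduces the experiment, assuming the standard 12-term by 9-document count matrix from Deerwester et al. (the matrix below comes from that paper, not from the slide transcript itself):

import numpy as np

terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],   # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],   # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],   # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],   # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],   # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],   # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],   # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],   # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],   # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],   # minors
], dtype=float)                    # columns: c1 c2 c3 c4 c5 m1 m2 m3 m4

def corr(X, a, b):
    i, j = terms.index(a), terms.index(b)
    return np.corrcoef(X[i], X[j])[0, 1]

# Raw data: human/user and human/minors look unrelated (roughly -.38 and -.29)
print(corr(A, "human", "user"), corr(A, "human", "minors"))

# Rank-2 reconstruction: the latent structure emerges (roughly .94 and -.83)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
print(corr(A2, "human", "user"), corr(A2, "human", "minors"))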

A Small Example – 2
Correlations in the raw data:
r(human, user) = -.38    r(human, minors) = -.29

A Small Example – 3
Singular Value Decomposition: A = U S V^T
Dimension reduction: ~A ≈ ~U ~S ~V^T (keep only the largest singular values)

A Small Example – 4
{U} = (the term matrix of left singular vectors; values shown on the slide)

A Small Example – 5
{S} = (the diagonal matrix of singular values; values shown on the slide)

A Small Example – 6
{V} = (the document matrix of right singular vectors; values shown on the slide)

A Small Example – 7
Correlations after dimension reduction:
r(human, user) = .94    r(human, minors) = -.83

A Small Example – 2 (reprise)
Correlations in the raw data:
r(human, user) = -.38    r(human, minors) = -.29

Correlation
Raw data (correlation table shown on the slide)

Summary
Some issues: the SVD
– algorithm complexity is O(n^2 k^3), where n = number of terms and k = number of dimensions in the semantic space (typically small, ~50 to 350)
– for a stable document collection, the SVD only has to be run once
– for dynamic document collections, the SVD might need to be rerun, but new documents can also be “folded in” (see the sketch below)
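A hedged sketch of the fold-in step (d_hat = S_k^{-1} U_k^T d is the commonly described projection for a new document; the variable names are assumptions):

import numpy as np

def fold_in(new_doc, U_k, s_k):
    """Project a new document's (weighted) term vector into the existing k-dimensional semantic space."""
    return (U_k.T @ new_doc) / s_k

# usage sketch: with U, s, Vt = np.linalg.svd(X, full_matrices=False) computed once for the original
# collection, take U_k = U[:, :k] and s_k = s[:k]; the folded-in vector lives in the same space as the
# existing document vectors (rows of Vt[:k, :].T) and can be compared with the cosine, without rerunning the SVD.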

Summary
Some issues: finding the optimal dimension for the semantic space
– precision and recall improve as the dimension is increased until they hit an optimum, then slowly decrease until they reach the level of the standard vector model
– run the SVD once with a large dimension, say k = 1000, and then test dimensions <= k
– in many tasks this works well, but there is still room for research

Summary
Some issues: the SVD assumes normally distributed data
– term occurrence is not normally distributed
– but the matrix entries are weights, not counts, and the weights may be normally distributed even when the counts are not

Summary
LSA has proved to be a valuable tool in many areas of NLP as well as IR:
– summarization
– cross-language IR
– topic segmentation
– text classification
– question answering
– and more

Summary
Ongoing research and extensions include:
– bioinformatics
– security
– search engines
– probabilistic LSA (Hofmann)
– iterative scaling (Ando and Lee)
– psychology: models of semantic knowledge representation and of semantic word learning