Information Retrieval Latent Semantic Indexing

Speeding up cosine computation What if we could take our vectors and “pack” them into fewer dimensions (say 50,000 → 100) while preserving distances? Two methods: “Latent semantic indexing” and random projection.

Two approaches. LSI is data-dependent: it creates a k-dim subspace by eliminating redundant axes, pulling together “related” axes (hopefully car and automobile). Random projection is data-independent: it chooses a k-dim subspace that guarantees probable stretching properties between pairs of points.

Notions from linear algebra: matrix A, vector v; matrix transpose A^t; matrix product; rank; eigenvalue λ and eigenvector v, satisfying Av = λv.
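
To make the Av = λv definition concrete, here is a minimal NumPy sketch (the 2×2 matrix is a made-up example):

```python
# Verify Av = lambda * v for a small symmetric matrix (made-up example).
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)   # eigh: eigendecomposition for symmetric matrices

lam, v = eigvals[0], eigvecs[:, 0]
print(np.allclose(A @ v, lam * v))     # True: Av = lambda * v
```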

Overview of LSI: pre-process docs using a technique from linear algebra called Singular Value Decomposition; create a new (smaller) vector space; queries are handled in this new vector space.
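
A minimal sketch of that pipeline, assuming a tiny made-up term-document count matrix, k = 2, and plain NumPy:

```python
# Minimal LSI sketch (hypothetical toy data): SVD of the term-doc matrix,
# truncation to k dimensions, and query folding into the reduced space.
import numpy as np

# Toy term-document matrix A (m terms x n docs); the counts are made up.
A = np.array([
    [1, 0, 1, 0, 0],   # car
    [0, 1, 0, 1, 0],   # automobile
    [1, 1, 0, 0, 0],   # tires
    [0, 0, 1, 1, 1],   # flower
], dtype=float)

k = 2
P, sigma, Rt = np.linalg.svd(A, full_matrices=False)   # A = P @ diag(sigma) @ Rt
P_k = P[:, :k]                                         # m x k

# Docs and queries live in the k-dim concept space: multiply by P_k^t.
docs_k = P_k.T @ A                                     # k x n
q = np.array([1, 0, 0, 0], dtype=float)                # query: "car"
q_k = P_k.T @ q                                        # k-dim query vector

# Cosine similarity computed in the reduced space.
sims = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k) + 1e-12)
print(sims)
```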

Example: a term-document matrix with 16 terms and 17 docs.

Intuition (contd) More than dimension reduction: derive a set of new uncorrelated features (roughly, artificial concepts), one per dimension. Docs with lots of overlapping terms stay together, and terms also get pulled together onto the same dimension. Each term or document is then characterized by a vector of weights indicating its strength of association with each of these underlying concepts. Ex. car and automobile get pulled together, since they co-occur in docs with tires, radiator, cylinder, … Here comes “semantic”!!!

Singular-Value Decomposition Recall the m × n matrix of terms × docs, A. A has rank r ≤ m, n. Define the term-term correlation matrix T = AA^t; T is a square, symmetric m × m matrix. Let P be the m × r matrix of eigenvectors of T. Define the doc-doc correlation matrix D = A^t A; D is a square, symmetric n × n matrix. Let R be the n × r matrix of eigenvectors of D.
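
The connection between these correlation matrices and the SVD can be checked numerically; a small sketch, using a random matrix as a stand-in for A:

```python
# Check that T = A A^t and D = A^t A share their nonzero eigenvalues,
# which are the squared singular values of A.
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

T = A @ A.T            # m x m term-term correlation matrix
D = A.T @ A            # n x n doc-doc correlation matrix

eigvals_T = np.sort(np.linalg.eigvalsh(T))[::-1][:len(sigma)]
eigvals_D = np.sort(np.linalg.eigvalsh(D))[::-1][:len(sigma)]

print(np.allclose(eigvals_T, sigma**2))   # True: eigenvalues of T are sigma_i^2
print(np.allclose(eigvals_D, sigma**2))   # True: D has the same nonzero spectrum
```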

A’s decomposition There exist matrices P (for T, m × r) and R (for D, n × r) formed by orthonormal columns (unit dot products) such that A = P Σ R^t, where Σ is an r × r diagonal matrix holding the singular values of A (the square roots of the eigenvalues of T = AA^t) in decreasing order. Dimensions: A is m × n, P is m × r, Σ is r × r, R^t is r × n.

Dimensionality reduction For some k << r, zero out all but the k biggest singular values in Σ [the choice of k is crucial]. Denote by Σ_k this new version of Σ, having rank k. Typically k is about 100, while r (A’s rank) is > 10,000. Then A_k = P Σ_k R^t; because of the zero rows/columns of Σ_k, only the first k columns of P and the first k rows of R^t matter, so A_k is effectively the product of an m × k, a k × k and a k × n matrix.
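
A sketch of the truncation step on a random stand-in for A (sizes are made up):

```python
# Rank-k approximation A_k = P Sigma_k R^t, keeping only the k largest
# singular values; only the first k columns of P and rows of R^t are used.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
k = 2

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
A_k = P[:, :k] @ np.diag(sigma[:k]) @ Rt[:k, :]   # (m x k)(k x k)(k x n)

print(np.linalg.matrix_rank(A_k))   # k
```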

Guarantee A_k is a pretty good approximation to A: relative distances are (approximately) preserved. Of all m × n matrices of rank k, A_k is the best approximation to A w.r.t. the following measures: min_{B, rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}, and min_{B, rank(B)=k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + … + σ_r^2, where the Frobenius norm is ||A||_F^2 = σ_1^2 + σ_2^2 + … + σ_r^2.
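
These identities can be verified numerically; a small sketch, again on a random stand-in for A:

```python
# Numerical check of the approximation guarantees:
#   ||A - A_k||_2   = sigma_{k+1}
#   ||A - A_k||_F^2 = sigma_{k+1}^2 + ... + sigma_r^2
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 7))
k = 3

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
A_k = P[:, :k] @ np.diag(sigma[:k]) @ Rt[:k, :]

print(np.isclose(np.linalg.norm(A - A_k, 2), sigma[k]))                      # spectral norm
print(np.isclose(np.linalg.norm(A - A_k, 'fro')**2, np.sum(sigma[k:]**2)))   # Frobenius norm
```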

Reduction X_k = Σ_k R^t is the doc matrix reduced to k < n dimensions. Take the doc-correlation matrix: D = A^t A = (P Σ R^t)^t (P Σ R^t) = (Σ R^t)^t (Σ R^t), since P’s columns are orthonormal (P^t P = I). Approximate Σ with Σ_k, and thus get A^t A ≈ X_k^t X_k. We use X_k to approximate A: X_k = Σ_k R^t = P_k^t A. This means that to reduce a doc/query vector it is enough to multiply it by P_k^t (a k × m matrix). The cost of sim(q, d), for all d, is O(kn + km) instead of O(mn). (Recall that R and P are formed by orthonormal eigenvectors of D and T.)
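
A quick numerical check of the identity X_k = Σ_k R^t = P_k^t A and of the correlation-matrix approximation (random stand-in data):

```python
# Check the two equivalent forms of the reduced doc matrix and the quality
# of the approximation A^t A ~ X_k^t X_k.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((9, 5))
k = 2

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
X_k_a = np.diag(sigma[:k]) @ Rt[:k, :]   # Sigma_k R^t  (k x n)
X_k_b = P[:, :k].T @ A                   # P_k^t A      (k x n)

print(np.allclose(X_k_a, X_k_b))         # True: the two forms coincide

# The residual of the approximation is governed by the discarded singular values.
residual = np.linalg.norm(A.T @ A - X_k_a.T @ X_k_a)
print(np.isclose(residual, np.sqrt(np.sum(sigma[k:]**4))))   # True
```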

Which are the concepts? The c-th concept = the c-th row of P_k^t (which is k × m). Denote it by P_k^t[c]; note its size is m = #terms. P_k^t[c][i] = strength of association between the c-th concept and the i-th term. Projected document: d'_j = P_k^t d_j, and d'_j[c] = strength of concept c in d_j.
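
To see the concepts, one can rank terms by their weight in each row of P_k^t; a sketch reusing the hypothetical toy matrix from the earlier LSI example:

```python
# Inspect the "concepts": each row of P_k^t assigns a weight to every term;
# the largest-magnitude weights show which terms define that concept.
# (Toy term names and counts are hypothetical.)
import numpy as np

terms = ["car", "automobile", "tires", "flower"]
A = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

k = 2
P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
Pk_t = P[:, :k].T                       # k x m: one concept per row

for c in range(k):
    order = np.argsort(-np.abs(Pk_t[c]))
    print(f"concept {c}:", [(terms[i], round(float(Pk_t[c, i]), 2)) for i in order])
```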

Information Retrieval Random Projection

An interesting math result! (The Johnson–Lindenstrauss lemma.) Any set of n points in R^m can be mapped by a function f into k = O(ε^{-2} log n) dimensions so that, for every pair u, v, the squared distance ||f(u) − f(v)||^2 is within a factor (1 ± ε) of ||u − v||^2. Setting v = 0 we also get a bound on f(u)’s stretching!!!
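
A minimal empirical sketch of this distance preservation, assuming a Gaussian projection matrix with entries scaled by 1/√k (data and sizes are made up; k is picked by hand rather than from the O(ε^{-2} log n) bound):

```python
# Empirically check that a random Gaussian projection roughly preserves
# pairwise squared distances (made-up data).
import numpy as np

rng = np.random.default_rng(4)
m, k, n_points = 10_000, 400, 20
X = rng.standard_normal((n_points, m))          # points as rows

R = rng.standard_normal((m, k)) / np.sqrt(k)    # projection matrix
Y = X @ R                                       # projected points (n_points x k)

i, j = 0, 1
orig = np.sum((X[i] - X[j])**2)
proj = np.sum((Y[i] - Y[j])**2)
print(proj / orig)                              # close to 1
```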

What about the cosine distance? The stretching bounds on f(u), f(v) and f(u − v) also bound how much the dot product f(u)·f(v) can move, so cosine similarity is approximately preserved as well.
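
And a quick check that cosines survive the projection too, under the same Gaussian-matrix assumption:

```python
# Check that the cosine between two correlated vectors changes little
# after a random Gaussian projection (made-up data).
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(5)
m, k = 10_000, 400
u = rng.standard_normal(m)
v = 0.7 * u + 0.3 * rng.standard_normal(m)   # correlated with u

R = rng.standard_normal((m, k)) / np.sqrt(k)
print(cosine(u, v), cosine(u @ R, v @ R))    # the two values are close
```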

Defining the projection matrix R: its k columns are random vectors, one per target dimension.

Concentration bound!!! Is R a JL-embedding?

Gaussians are good!! NOTE: every column of R is a unit vector uniformly distributed over the unit sphere; moreover, the k columns of R are orthonormal on average (their pairwise dot products are 0 in expectation).
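
A small sketch illustrating the note, assuming R’s columns are Gaussian vectors normalized to unit length:

```python
# Columns of a Gaussian R, normalized to unit length, are uniform on the
# sphere; their pairwise dot products concentrate around 0 for large m.
import numpy as np

rng = np.random.default_rng(6)
m, k = 10_000, 50

R = rng.standard_normal((m, k))
R /= np.linalg.norm(R, axis=0)          # make every column a unit vector

G = R.T @ R                             # k x k Gram matrix
off_diag = G[~np.eye(k, dtype=bool)]
print(np.allclose(np.diag(G), 1.0))     # True: unit-norm columns
print(np.abs(off_diag).max())           # small: nearly orthogonal columns
```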

A practical-theoretical idea!!! Use simpler random entries r_{i,j} for R, chosen so that E[r_{i,j}] = 0 and Var[r_{i,j}] = 1.
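
One concrete instance of this idea, assumed here purely for illustration, is random ±1 entries (which have mean 0 and variance 1); a sketch:

```python
# Random projection with +/-1 entries: E[r_ij] = 0, Var[r_ij] = 1.
# Distances are still roughly preserved (made-up data).
import numpy as np

rng = np.random.default_rng(7)
m, k = 10_000, 400
u, v = rng.standard_normal(m), rng.standard_normal(m)

R = rng.choice([-1.0, 1.0], size=(m, k)) / np.sqrt(k)   # +/-1 entries, scaled
orig = np.sum((u - v)**2)
proj = np.sum((u @ R - v @ R)**2)
print(proj / orig)                                       # close to 1
```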

Question!! Various theoretical results are known. What about practical cases?