Packing to fewer dimensions

Presentation transcript:

Packing to fewer dimensions
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa

Speeding up cosine computation
- What if we could take our vectors and "pack" them into fewer dimensions (say 50,000 → 100) while preserving distances?
- Now: O(nm) to compute cos(d,q) for all n docs, each a vector over m terms
- Then: O(km + kn) where k << n, m: O(km) to project the query and O(kn) to compare it with the n packed docs
- Two methods:
  - "Latent semantic indexing"
  - Random projection
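To get a feel for the gap between the two costs, here is the arithmetic for the slide's m = 50,000 and k = 100. The collection size n = 10^6 is an assumption made only for this illustration (the slides give no value for n), and the vectors are treated as dense.

```latex
\begin{align*}
nm      &= 10^{6} \cdot 5 \times 10^{4} = 5 \times 10^{10}, \\
km + kn &= 10^{2} \cdot 5 \times 10^{4} + 10^{2} \cdot 10^{6}
         = 5 \times 10^{6} + 10^{8} = 1.05 \times 10^{8}.
\end{align*}
```

Under these assumptions the packed representation needs roughly 476 times fewer operations per query.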

Briefly
- LSI is data-dependent
  - Create a k-dim subspace by eliminating redundant axes
  - Pull together "related" axes, hopefully car and automobile
- Random projection is data-independent
  - Choose a k-dim subspace that, with high probability, guarantees good stretching properties (bounded distortion of distances) between any pair of points
- What about polysemy?

Latent Semantic Indexing (Sec. 18.4)
Courtesy of Susan Dumais

Notions from linear algebra
- Matrix $A$, vector $v$
- Matrix transpose ($A^T$)
- Matrix product
- Rank
- Eigenvalue $\lambda$ and eigenvector $v$: $Av = \lambda v$
- Example
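As a small illustration of these notions, the sketch below checks the eigenvalue equation $Av = \lambda v$ numerically with NumPy. The 2 x 2 matrix is an arbitrary choice made for this example; it does not come from the slides.

```python
import numpy as np

# Small symmetric matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is NumPy's solver for symmetric matrices; it returns the
# eigenvalues in ascending order and the eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Each eigenpair satisfies A v = lambda v.
    assert np.allclose(A @ v, lam * v)

rank = np.linalg.matrix_rank(A)
print("eigenvalues:", eigenvalues)
print("rank:", rank)
```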

Overview of LSI
- Pre-process docs using a technique from linear algebra called Singular Value Decomposition
- Create a new (smaller) vector space
- Queries are handled (faster) in this new space

Singular-Value Decomposition
- Recall the $m \times n$ matrix of terms $\times$ docs, $A$
  - $A$ has rank $r \le \min(m, n)$
- Define the term-term correlation matrix $T = AA^T$
  - $T$ is a square, symmetric $m \times m$ matrix
  - Let $U$ be the $m \times r$ matrix of the $r$ eigenvectors of $T$ that have nonzero eigenvalues
- Define the doc-doc correlation matrix $D = A^TA$
  - $D$ is a square, symmetric $n \times n$ matrix
  - Let $V$ be the $n \times r$ matrix of the $r$ eigenvectors of $D$ that have nonzero eigenvalues

A's decomposition
- Given $U$ (for $T$, $m \times r$) and $V$ (for $D$, $n \times r$), both formed by orthonormal columns (unit length, zero dot-product between distinct columns)
- It turns out that $A = U S V^T$
- Where $S$ is an $r \times r$ diagonal matrix holding the singular values of $A$, i.e. the square roots of the nonzero eigenvalues of $T = AA^T$, in decreasing order
- Dimensions: $A$ ($m \times n$) $=$ $U$ ($m \times r$) $\cdot$ $S$ ($r \times r$) $\cdot$ $V^T$ ($r \times n$)
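The relationship between the SVD and the two correlation matrices can be checked numerically. The sketch below uses NumPy's `np.linalg.svd` on a toy 4 x 3 matrix (hypothetical data, not from the slides) and confirms that the diagonal of $S$ holds the square roots of the nonzero eigenvalues of $T$ and $D$.

```python
import numpy as np

# Toy 4-term x 3-document count matrix (hypothetical data).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])

# Thin SVD: U is m x r, s holds the singular values in decreasing order,
# Vt is r x n (here r = 3 because this A has full column rank).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

T = A @ A.T   # term-term correlation, m x m
D = A.T @ A   # doc-doc correlation, n x n

# Eigenvalues of the symmetric matrices T and D, largest first.
eig_T = np.linalg.eigvalsh(T)[::-1]
eig_D = np.linalg.eigvalsh(D)[::-1]

print("singular values:         ", s)
print("sqrt of eigenvalues of D:", np.sqrt(eig_D))
print("A == U S V^T:", np.allclose(A, U @ np.diag(s) @ Vt))
print("T U == U S^2:", np.allclose(T @ U, U @ np.diag(s ** 2)))
```

The last check shows that the columns of $U$ are eigenvectors of $T$ with eigenvalues $s_i^2$.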

Dimensionality reduction
- Fix some $k \ll r$ and zero out all but the $k$ biggest singular values (the diagonal entries of $S$) [the choice of $k$ is crucial]
- Denote by $S_k$ this new version of $S$, having rank $k$
- Typically $k$ is about 100, while $r$ ($A$'s rank) is > 10,000
- $A_k = U S_k V^T$, with $U$ of size $m \times r$ and $V^T$ of size $r \times n$
- The last $r - k$ columns of $U$ and rows of $V^T$ are useless due to the zero columns/rows of $S_k$, so $A_k = U_k S_k V_k^T$ with $U_k$ of size $m \times k$ and $V_k^T$ of size $k \times n$
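A minimal sketch of the truncation step, assuming NumPy and a toy matrix (hypothetical data). The helper name `truncate_svd` is introduced here for illustration only. It shows that keeping the first $k$ columns of $U$ and rows of $V^T$ gives the same $A_k$ as zeroing the trailing entries of $S$.

```python
import numpy as np

def truncate_svd(A, k):
    """Return (U_k, s_k, Vt_k): the k leading singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Toy 4-term x 3-document matrix (hypothetical data).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
k = 2

# Compact form: m x k, k, k x n.
U_k, s_k, Vt_k = truncate_svd(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k

# Same matrix obtained by zeroing all but the k biggest values in S.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_zeroed = s.copy()
s_zeroed[k:] = 0.0
A_k_zeroed = U @ np.diag(s_zeroed) @ Vt

print("shapes:", U_k.shape, s_k.shape, Vt_k.shape)
print("rank of A_k:", np.linalg.matrix_rank(A_k))
```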

A running example

Guarantee
- $A_k$ is a pretty good approximation to $A$:
  - Relative distances are (approximately) preserved
  - Of all $m \times n$ matrices of rank $k$, $A_k$ is the best approximation to $A$ wrt the following measures, where $\sigma_i$ is the $i$-th largest diagonal entry of $S$:
    - $\min_{B:\ \mathrm{rank}(B)=k} \|A-B\|_2 = \|A-A_k\|_2 = \sigma_{k+1}$
    - $\min_{B:\ \mathrm{rank}(B)=k} \|A-B\|_F^2 = \|A-A_k\|_F^2 = \sigma_{k+1}^2 + \sigma_{k+2}^2 + \dots + \sigma_r^2$
  - Frobenius norm: $\|A\|_F^2 = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_r^2$
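The two error formulas can be verified numerically. The sketch below, assuming NumPy and a toy matrix (hypothetical data), compares the truncation errors with the discarded singular values and also checks that one randomly generated rank-$k$ matrix $B$ does no better. The random $B$ is only a spot check, not a proof of optimality.

```python
import numpy as np

# Toy 4-term x 3-document matrix (hypothetical data).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
k = 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Spectral-norm error should equal the (k+1)-th singular value, s[k].
spectral_err = np.linalg.norm(A - A_k, 2)
# Squared Frobenius error should equal the sum of the discarded s_i^2.
frobenius_err_sq = np.linalg.norm(A - A_k, "fro") ** 2

# Spot check against an arbitrary rank-k matrix B.
rng = np.random.default_rng(0)
B = rng.standard_normal((A.shape[0], k)) @ rng.standard_normal((k, A.shape[1]))
random_err_sq = np.linalg.norm(A - B, "fro") ** 2

print("spectral error vs s[k]:     ", spectral_err, s[k])
print("Frobenius^2 error vs tail:  ", frobenius_err_sq, np.sum(s[k:] ** 2))
print("random rank-k B is no better:", random_err_sq >= frobenius_err_sq)
```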

Reduction
- $U$, $V$ are formed by orthonormal eigenvectors of the matrices $T$, $D$ respectively
- Since we are interested in doc/doc correlation, we consider:
  $D = A^TA = (USV^T)^T(USV^T) = (SV^T)^T(SV^T)$, because $U^TU = I$
- Hence $X = SV^T$, an $r \times n$ matrix, may play the role of $A$
- To reduce its size we set $X_k = S_kV_k^T$, a $k \times n$ matrix (keeping only the $k \times k$ nonzero block of $S_k$ and the first $k$ rows of $V^T$), and thus get $A^TA \approx X_k^TX_k$ (both are $n \times n$ matrices)
- We use $X_k$ to define how to project $A$:
  - Since $X_k = S_kV_k^T$, we get $X_k = U_k^TA$ (use the definition of the SVD of $A$)
  - Since $X_k$ may play the role of $A$, its columns are the projected docs
  - Similarly, a query $Q$ can be interpreted as a new column of $A$, so it is enough to multiply $U_k^T$ times $Q$ to get the projected query, in $O(km)$ time
- Useful identities: $A = USV^T$, $A^T = VSU^T$, $A^TUS^{-1} = V$
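The identities used in this reduction can be checked on a small example. The sketch below assumes NumPy and a toy full-column-rank matrix (hypothetical data); variable names such as `X_k_via_U` are introduced here for illustration.

```python
import numpy as np

# Toy 4-term x 3-document matrix (hypothetical data, full column rank).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Two routes to the projected documents (columns of X_k, a k x n matrix).
X_k = S_k @ Vt_k        # X_k = S_k V_k^T
X_k_via_U = U_k.T @ A   # X_k = U_k^T A

# Doc-doc correlations computed from X_k match those of the rank-k matrix A_k.
A_k = U_k @ S_k @ Vt_k
D_k = X_k.T @ X_k

# Identity from the slide: A^T U S^{-1} = V (needs all singular values > 0).
V_rebuilt = A.T @ U @ np.diag(1.0 / s)

print("X_k shape:", X_k.shape)
print("both routes agree:", np.allclose(X_k, X_k_via_U))
```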

Which are the concepts?
- $c$-th concept = $c$-th column of $U_k$ (which is $m \times k$)
- $U_k[i][c]$ = strength of association between the $c$-th concept and the $i$-th term
- $V_k^T[c][j]$ = strength of association between the $c$-th concept and the $j$-th document
- Projected document: $d'_j = U_k^T d_j$
  - $d'_j[c]$ = strength of concept $c$ in $d_j$
- Projected query: $q' = U_k^T q$
  - $q'[c]$ = strength of concept $c$ in $q$
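To make the projection concrete, here is a sketch that ranks documents for the query "car" both in the original term space and in a 2-concept space. The vocabulary, the counts, and the helper `cosine` are all hypothetical and were chosen so that documents 0 to 2 are about vehicles and documents 3 to 4 about plants; real collections will not separate this cleanly.

```python
import numpy as np

terms = ["car", "automobile", "engine", "flower", "petal"]

# Hypothetical term-document counts (rows = terms, columns = docs).
A = np.array([
    [1, 0, 1, 0, 0],   # car
    [0, 1, 1, 0, 0],   # automobile
    [1, 1, 0, 0, 0],   # engine
    [0, 0, 0, 1, 1],   # flower
    [0, 0, 0, 1, 0],   # petal
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]                 # columns = the k concepts

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

q_proj = U_k.T @ q             # projected query, O(km)
docs_proj = U_k.T @ A          # projected docs, one per column

n_docs = A.shape[1]
raw_scores = [cosine(q, A[:, j]) for j in range(n_docs)]
lsi_scores = [cosine(q_proj, docs_proj[:, j]) for j in range(n_docs)]

for j in range(n_docs):
    print(f"doc {j}: raw={raw_scores[j]:.3f}  lsi={lsi_scores[j]:.3f}")
```

Document 1 contains "automobile" and "engine" but not "car", so its raw cosine with the query is zero, while in the concept space it is matched through the shared vehicle concept.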

Random Projections
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Slides only!

An interesting math result
- Lemma (Johnson-Lindenstrauss, '82): Let $P$ be a set of $n$ distinct points in $m$ dimensions. Given $\varepsilon > 0$, there exists a function $f : P \to \mathbb{R}^k$ such that for every pair of points $u, v$ in $P$ it holds:
  $(1-\varepsilon)\,\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\varepsilon)\,\|u-v\|^2$
  where $k = O(\varepsilon^{-2} \log n)$
- $f()$ is called a JL-embedding
- Setting $v = 0$ we also get a bound on $f(u)$'s stretching (assuming $f$ is linear and the origin is among the points)

What about the cosine-distance?
- Expand the dot product, then apply the bound on the stretching of $f(u)$ and $f(v)$ together with the lemma's lower bound on $\|f(u)-f(v)\|^2$:
  $2\,f(u)\cdot f(v) = \|f(u)\|^2 + \|f(v)\|^2 - \|f(u)-f(v)\|^2$
  $\le (1+\varepsilon)\,\|u\|^2 + (1+\varepsilon)\,\|v\|^2 - (1-\varepsilon)\,\|u-v\|^2$
- Substituting $\|u-v\|^2 = \|u\|^2 + \|v\|^2 - 2\,u\cdot v$ in the formula above:
  $2\,f(u)\cdot f(v) \le (1+\varepsilon)(\|u\|^2 + \|v\|^2) - (1-\varepsilon)(\|u\|^2 + \|v\|^2 - 2\,u\cdot v)$
  $= 2\varepsilon\,(\|u\|^2 + \|v\|^2) + (1-\varepsilon)(2\,u\cdot v)$
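The same two ingredients, used in the opposite direction, give a matching lower bound. This is a sketch under the slide's own assumptions (the lemma's inequalities and the norm bounds obtained by setting $v = 0$). The unit-norm specialization in the last line is added here for illustration.

```latex
\begin{align*}
2\, f(u)\cdot f(v)
  &= \|f(u)\|^2 + \|f(v)\|^2 - \|f(u)-f(v)\|^2 \\
  &\ge (1-\varepsilon)\,(\|u\|^2 + \|v\|^2)
     - (1+\varepsilon)\,(\|u\|^2 + \|v\|^2 - 2\,u\cdot v) \\
  &= -2\varepsilon\,(\|u\|^2 + \|v\|^2) + (1+\varepsilon)\,(2\,u\cdot v).
\end{align*}
% Combining with the upper bound, for unit vectors (\|u\| = \|v\| = 1):
\[
(1+\varepsilon)\,u\cdot v - 2\varepsilon
  \;\le\; f(u)\cdot f(v) \;\le\;
(1-\varepsilon)\,u\cdot v + 2\varepsilon .
\]
```

So for unit-length vectors the projected dot product stays within an additive term of order $\varepsilon$ of the original cosine.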

How to compute a JL-embedding?
- Set the projection matrix $P = (p_{i,j})$ as a random $m \times k$ matrix whose components are independent random variables drawn from one of two distributions (their definitions are not reproduced in this transcript), each with:
  - $E[p_{i,j}] = 0$
  - $\mathrm{Var}[p_{i,j}] = 1$
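A minimal sketch of such a projection, assuming NumPy. Because the slide's two distributions are not available here, the code uses independent +1/-1 entries with equal probability, which is one common choice with mean 0 and variance 1; it is not claimed to be the distribution on the slide. The function names, the 1/sqrt(k) scaling, and the sizes n, m, k are likewise assumptions made for this illustration.

```python
import numpy as np

def random_projection_matrix(m, k, rng):
    """m x k matrix of independent +1/-1 entries (mean 0, variance 1)."""
    return rng.choice([-1.0, 1.0], size=(m, k))

def jl_embed(points, k, rng):
    """Project the rows of `points` (n x m) down to k dimensions.

    The 1/sqrt(k) factor makes squared lengths correct in expectation.
    """
    m = points.shape[1]
    P = random_projection_matrix(m, k, rng)
    return points @ P / np.sqrt(k)

def squared_distances(X):
    """All pairwise squared Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
n, m, k = 20, 1000, 400
points = rng.standard_normal((n, m))   # synthetic data
embedded = jl_embed(points, k, rng)

# Ratio of projected to original squared distance for every distinct pair.
off_diagonal = ~np.eye(n, dtype=bool)
ratios = squared_distances(embedded)[off_diagonal] / squared_distances(points)[off_diagonal]

print("embedded shape:", embedded.shape)
print("distortion range:", ratios.min(), ratios.max())
```

The printed range shows how far the ratios stray from 1 for this particular seed and choice of k.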

Finally...
- Random projections
  - hide large constants: $k \approx (1/\varepsilon)^2 \cdot \log n$, so $k$ may be large…
  - are simple and fast to compute
- LSI
  - is intuitive and may scale to any $k$
  - is optimal under various metrics
  - but is costly to compute, although good libraries exist