Lecture 21 SVD and Latent Semantic Indexing and Dimensional Reduction

Shang-Hua Teng

Singular Value Decomposition Every m x n matrix A of rank r can be written as A = U S V^T = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... + σ_r u_r v_r^T, where u_1, ..., u_r are the r orthonormal vectors that form a basis of the column space C(A), v_1, ..., v_r are the r orthonormal vectors that form a basis of the row space C(A^T), and σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 are the singular values.

Low Rank Approximation and Reduction

The Singular Value Decomposition
Full SVD:      A (m x n) = U (m x m) · S (m x n) · V^T (n x n)
Reduced SVD:   A (m x n) = U (m x r) · S (r x r) · V^T (r x n)
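
A minimal sketch of the two shapes, assuming NumPy is available; np.linalg.svd returns the full factorization with full_matrices=True and the reduced (economy) one with full_matrices=False.

import numpy as np

m, n = 6, 4
A = np.random.rand(m, n)

# Full SVD: U is m x m, V^T is n x n; s holds the r = min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)        # (6, 6) (4,) (4, 4)

# Reduced (economy) SVD: U is m x r, V^T is r x n.
Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)
print(Ur.shape, sr.shape, Vtr.shape)     # (6, 4) (4,) (4, 4)

# The reduced factorization reconstructs A exactly (up to rounding).
assert np.allclose(Ur @ np.diag(sr) @ Vtr, A)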

The Singular Value Reduction
Reduced SVD:            A   (m x n) = U   (m x r) · S   (r x r) · V^T   (r x n)
Rank-k approximation:   A_k (m x n) = U_k (m x k) · S_k (k x k) · V_k^T (k x n), keeping only the k largest singular values.
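
A small helper along these lines, again assuming NumPy; the function name rank_k_approx is just illustrative.

import numpy as np

def rank_k_approx(A, k):
    """Keep only the k largest singular values/vectors of A, i.e. form A_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(8, 5)
A2 = rank_k_approx(A, 2)
print(np.linalg.matrix_rank(A2))   # 2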

How Much Information Lost?

Distance between Two Matrices Frobenius norm of a matrix A: ||A||_F = sqrt( Σ_{i,j} a_ij^2 ). Distance between two matrices A and B: d(A, B) = ||A - B||_F.

How Much Information Lost? For the rank-k approximation A_k, the error is ||A - A_k||_F = sqrt( σ_{k+1}^2 + ... + σ_r^2 ), the square root of the sum of the squared discarded singular values.

Approximation Theorem [Schmidt 1907; Eckart and Young 1936] Among all m x n matrices B of rank at most k, A_k is the one that minimizes ||A - B||_F.
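
A quick numerical check of this, assuming NumPy: the Frobenius error of the rank-k truncation is exactly the square root of the sum of the squared discarded singular values.

import numpy as np

A = np.random.rand(10, 7)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - Ak, "fro")    # ||A - A_k||_F
tail = np.sqrt(np.sum(s[k:] ** 2))     # sqrt(σ_{k+1}^2 + ... + σ_r^2)
assert np.isclose(err, tail)
print(err, tail)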

Application: Image Compression Uncompressed m by n pixel image: m x n numbers. Rank-k approximation of the image: k singular values, the first k columns of U (m-vectors), and the first k columns of V (n-vectors). Total: k x (m + n + 1) numbers.
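
A sketch of the bookkeeping, assuming NumPy; the image array below is a random stand-in, and the function name compress is illustrative.

import numpy as np

def compress(img, k):
    """Rank-k SVD approximation of a grayscale image plus its storage ratio."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    m, n = img.shape
    stored = k * (m + n + 1)      # k columns of U, k columns of V, k singular values
    return approx, stored / (m * n)

img = np.random.rand(256, 264)    # stand-in for a real 256 x 264 bitmap
approx, ratio = compress(img, 81)
print(ratio)                      # 81 * (256 + 264 + 1) / (256 * 264), about 0.62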

Example: Yogi (Uncompressed) Source: [Will]. Yogi: a rock photographed by the Sojourner Mars mission. 256 x 264 grayscale bitmap -> 256 x 264 matrix M with pixel values in [0, 1]: 67,584 numbers.

Example: Yogi (Compressed) M has 256 singular values. Rank-81 approximation of M: 81 x (256 + 264 + 1) = 42,201 numbers.

Example: Yogi (Both)

Eigenface Patented by MIT. Uses two-dimensional, global grayscale images. Each face is mapped to a small set of numbers (its coordinates in face space). Creates an image subspace ("face space") that best discriminates between faces. Works only on properly lit, frontal images.
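
A rough sketch of the eigenface idea via SVD/PCA, assuming NumPy; the faces array, image size, and function names are placeholders for illustration, not the patented MIT implementation.

import numpy as np

def eigenfaces(faces, k):
    """Top-k principal directions ("eigenfaces") of a stack of flattened face images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of Vt are orthonormal directions in pixel space, ordered by variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, Vt[:k]

def project(face, mean_face, components):
    """Map one face image to k numbers: its coordinates in face space."""
    return components @ (face - mean_face)

faces = np.random.rand(40, 112 * 92)               # 40 ORL-sized images, one per row
mean_face, comps = eigenfaces(faces, k=10)
print(project(faces[0], mean_face, comps).shape)   # (10,)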

The Face Database A set of normalized face images; the ORL Face DB was used.

Two-dimensional Embedding We also applied locality preserving projections to the face manifold, using face images of one person under different poses and expressions. The horizontal axis captures the expression variation; the vertical axis captures the pose variation.

EigenFaces (PCA) In this slide, the small images are the first 10 Eigenfaces, Fisherfaces, and Laplacianfaces computed from the face images in the YALE database. The Eigenfaces look somewhat face-like because they are derived to optimize reconstruction, while the Fisherfaces and Laplacianfaces are less face-like. We ran experiments comparing our algorithm with the traditional linear subspace methods, Eigenfaces and Fisherfaces, on three databases: YALE, PIE, and a database collected at Microsoft Research Asia. For the PIE database we used only the 5 poses nearest the frontal view. On the YALE database we ran two types of experiments: leave-one-out, and training on 6 images per person with the remaining 5 for testing. A nearest-neighbor classifier was used for classification. The Laplacianface method preserves local neighborhood structure. In all experiments Fisherfaces outperformed Eigenfaces, and Laplacianfaces performed better than Fisherfaces, especially on the MSRA database. I think this is because that database has sufficient samples per person, so the manifold structure is well represented by the samples.

Latent Semantic Analysis (LSA) Latent Semantic Indexing (LSI) Principal Component Analysis (PCA)

Term-Document Matrix Index each document (by a human or by computer); entry f_ij holds counts, frequencies, weights, etc. With m terms, each document can be regarded as a point in m dimensions.

Document-Term Matrix Index each document (by a human or by computer); entry f_ij holds counts, frequencies, weights, etc. With n terms, each document can be regarded as a point in n dimensions.
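
A small illustration of how such a matrix can be built, assuming NumPy and the standard library; the documents and vocabulary below are made up for the example.

import numpy as np
from collections import Counter

docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary trees",
]
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

A = np.zeros((len(vocab), len(docs)))    # term-document matrix of counts f_ij
for j, d in enumerate(docs):
    for w, c in Counter(d.split()).items():
        A[row[w], j] = c

print(A.shape)    # (number of terms, number of documents); A.T is the document-term matrix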

Term Occurrence Matrix

            c1  c2  c3  c4  c5  m1  m2  m3  m4
human        1   0   0   1   0   0   0   0   0
interface    1   0   1   0   0   0   0   0   0
computer     1   1   0   0   0   0   0   0   0
user         0   1   1   0   1   0   0   0   0
system       0   1   1   2   0   0   0   0   0
response     0   1   0   0   1   0   0   0   0
time         0   1   0   0   1   0   0   0   0
EPS          0   0   1   1   0   0   0   0   0
survey       0   1   0   0   0   0   0   0   1
trees        0   0   0   0   0   1   1   1   0
graph        0   0   0   0   0   0   1   1   1
minors       0   0   0   0   0   0   0   1   1

Another Example

Term Document Matrix

LSI using k = 2 [Figure: terms (T) and documents (D) plotted in the two-dimensional LSI space, with axes LSI Factor 1 and LSI Factor 2; one cluster of points is labeled "applications & algorithms" and another "differential equations".] Each term's coordinates are given by the first k values of its row; each document's coordinates are given by the first k values of its column.
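
A sketch of LSI with k = 2 along these lines, assuming NumPy; A would be a term-document count matrix such as the term occurrence matrix above, and the helper names are illustrative.

import numpy as np

def lsi(A, k=2):
    """Project a term-document matrix A into a k-dimensional latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    term_coords = U[:, :k] * s[:k]              # one row per term
    doc_coords = (Vt[:k, :] * s[:k, None]).T    # one row per document
    return term_coords, doc_coords, U[:, :k], s[:k]

def fold_in_query(q, Uk, sk):
    """Map a query term-count vector into the same k-dimensional space."""
    return (q @ Uk) / sk

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

Documents and a folded-in query can then be ranked by cosine similarity in the 2-D latent space rather than in the original term space.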

Positive Definite Matrices and Quadratic Shapes

Positive Definite Matrices and Quadratic Shapes For any m x n matrix A, all eigenvalues of A A^T and A^T A are non-negative. Symmetric matrices whose eigenvalues are all positive are called positive definite; symmetric matrices whose eigenvalues are all non-negative are called positive semi-definite.
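
A short justification of the non-negativity claim, written out as a one-line argument:

\[
A^{\top}A\,v = \lambda v,\; v \neq 0
\;\Longrightarrow\;
\lambda\,\lVert v\rVert^{2} = v^{\top}A^{\top}A\,v = \lVert Av\rVert^{2} \ge 0
\;\Longrightarrow\;
\lambda \ge 0,
\]

and the same argument applied to A A^T gives the other case.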