Slide 1
Information Retrieval in Text, Part III
Reference: Michael W. Berry and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, 1999.
Reading Assignment: Chapter 4.
Slide 2
Outline
- Matrix Decompositions
- QR Factorization
- Singular Value Decomposition
- Updating Techniques
Slide 3
Matrix Decomposition
To produce a reduced-rank approximation of the m × n term-by-document matrix A, one must first identify dependence between the columns or rows of A. For a rank-k matrix, k basis vectors of its column space can serve in place of its n column vectors to represent that space.
Slide 4
QR Factorization
The QR factorization of a matrix A is defined as A = QR, where Q is an m × m orthogonal matrix and R is an m × n upper triangular matrix.
A square matrix is orthogonal if its columns are orthonormal; i.e., if q_j denotes a column of the orthogonal matrix Q, then q_j has unit Euclidean norm (||q_j||_2 = 1 for j = 1, 2, …, m) and is orthogonal to all other columns of Q (q_j^T q_i = 0 for all i ≠ j).
The rows of Q are also orthonormal, i.e., Q^T Q = QQ^T = I.
Such a factorization exists for any matrix A, and there are many ways to compute it.
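As a quick numerical illustration, here is a minimal sketch using NumPy (an assumption, not part of the slides) with a random stand-in for the term-by-document matrix, verifying the defining properties of Q and R:

```python
import numpy as np

# Random 0/1 stand-in for a 9 x 7 term-by-document matrix (the book's
# example matrix is not reproduced here).
rng = np.random.default_rng(0)
A = (rng.random((9, 7)) < 0.3).astype(float)

# Full QR factorization: Q is 9 x 9 orthogonal, R is 9 x 7 upper triangular.
Q, R = np.linalg.qr(A, mode="complete")

assert np.allclose(Q.T @ Q, np.eye(9))   # columns of Q are orthonormal
assert np.allclose(np.triu(R), R)        # R is upper triangular
assert np.allclose(Q @ R, A)             # A = QR
```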
Slide 5
QR Factorization
Given A = QR, the columns of A are all linear combinations of the columns of Q. Thus, a subset of k of the columns of Q forms a basis for the column space of A, where k = rank(A).
Slide 6
QR Factorization: Example
Slide 8
The QR factorization of the previous example can be written in partitioned form as
A = [Q_1 Q_2] [R_1; 0] = Q_1 R_1
Note that the first 7 columns of Q, i.e. Q_1, are orthonormal and hence constitute a basis for the column space of A.
The bottom zero submatrix of R is not always guaranteed to be generated automatically by the QR factorization, so column pivoting may need to be applied in order to guarantee the zero submatrix.
Q_2 does not contribute to producing any nonzero value in A.
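Column pivoting is available in SciPy's qr; a sketch under the assumption that SciPy is acceptable here, with a hypothetical rank-deficient stand-in matrix:

```python
import numpy as np
from scipy.linalg import qr

# A 9 x 7 stand-in matrix of rank 5.
rng = np.random.default_rng(1)
A = rng.random((9, 5)) @ rng.random((5, 7))

# With pivoting, A[:, piv] = Q R and the large entries of R are pushed
# toward the upper-left corner.
Q, R, piv = qr(A, pivoting=True)

# The diagonal of R then reveals the numerical rank: entries beyond it
# are (numerically) zero.
print(np.sum(np.abs(np.diag(R)) > 1e-10))   # 5
print(np.allclose(A[:, piv], Q @ R))        # True
```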
Slide 9
QR Factorization
One motivation for using the QR factorization is that the basis vectors can be used to describe the semantic content of the corresponding text collection.
The cosines of the angles θ_j between a query vector q and the document vectors a_j are given by
cos θ_j = (a_j^T q) / (||a_j||_2 ||q||_2) = (r_j^T (Q_1^T q)) / (||r_j||_2 ||q||_2),
where r_j is the j-th column of R_1, since a_j = Q_1 r_j and ||a_j||_2 = ||r_j||_2.
Note that for the query "Child Proofing" this gives exactly the same cosines as using A directly. Why?
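A quick numerical check of this identity; a sketch in which the matrix and the two-term query are stand-ins, not the book's example:

```python
import numpy as np

def cosines(M, q):
    """cos(theta_j) between q and each column of M."""
    return (M.T @ q) / (np.linalg.norm(M, axis=0) * np.linalg.norm(q))

rng = np.random.default_rng(2)
A = (rng.random((9, 7)) < 0.3).astype(float) + np.eye(9, 7)  # no zero columns
q = np.zeros(9)
q[[1, 6]] = 1.0                     # stand-in for a two-term query

Q1, R1 = np.linalg.qr(A)            # thin QR: Q1 has orthonormal columns
direct = cosines(A, q)              # cosines against A itself
via_qr = (R1.T @ (Q1.T @ q)) / (np.linalg.norm(R1, axis=0) * np.linalg.norm(q))
print(np.allclose(direct, via_qr))  # True: Q1's orthonormal columns preserve norms
```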
Slide 10
Frobenius Matrix Norm
Definition: The Frobenius matrix norm ||·||_F of an m × n matrix B = [b_ij] is defined by
||B||_F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} b_ij² )^{1/2}
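As a quick sanity check of the definition (NumPy's built-in norm is assumed):

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro = np.sqrt(np.sum(B ** 2))            # the definition above
print(fro, np.linalg.norm(B, "fro"))     # 5.477..., identical values
```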
Slide 11
Low Rank Approximation for QR Factorization
Initially, the rank of A is not known. However, after performing the QR factorization, its rank is obviously the rank of _______
With column pivoting, we know that there exists a permutation matrix P such that AP = QR, where the larger entries of R are moved to the upper left corner. Such an arrangement, if possible, partitions R so that the smallest entries are isolated in the bottom submatrix.
Slide 12
Low Rank Approximation for QR Factorization
Slide 13
Computing ||E||_F
Redefining R_22 to be the 4 × 2 zero matrix, the modified upper triangular matrix R has rank 5 rather than 7. Hence, the matrix has rank ____
Show that ||E||_F = ||R_22||_F.
Show that ||E||_F / ||A||_F = ||R_22||_F / ||R||_F = 0.3237.
Therefore, the relative change of 32.37% in R yields the same relative change in A. With r = 4, the relative change is 76%.
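The two identities on this slide can be checked numerically; a sketch with a stand-in matrix (the 0.3237 figure belongs to the book's example, so the numbers printed below will differ):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
A = (rng.random((9, 7)) < 0.4).astype(float)    # stand-in matrix

Q, R, piv = qr(A, pivoting=True)                # A[:, piv] = Q R
k = 5
R_k = R.copy()
R_k[k:, :] = 0.0                                # zero the trailing block R_22

E = A[:, piv] - Q @ R_k                         # error introduced by truncation
# Q is orthogonal, so ||E||_F = ||R_22||_F and ||A||_F = ||R||_F.
print(np.isclose(np.linalg.norm(E, "fro"), np.linalg.norm(R[k:, :], "fro")))
print(np.linalg.norm(E, "fro") / np.linalg.norm(A, "fro"))   # relative change
```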
Slide 14
Low Rank Approximation for QR Factorization: Example
Slide 15
Comparing Cosine Similarities for the Query: "Child Proofing"

Doc    A        r=5      r=4
2      0.408
3               0.5      0.309
4      0        0        0.184
5      0.5
6
Slide 16
Comparing Cosine Similarities for the Query: "Child Home Safety"

Doc    A        r=5      r=4
2      0.667
3      1        0.816    0.756
4      0.258    0        0.1
5      0        0        0
6      0        0        0
7      0        0        0.356
Slide 17
Singular Value Decomposition
While the QR factorization provides a reduced-rank basis for the column space of A, it provides no information about the row space of A.
The SVD can provide a reduced-rank approximation for both spaces: a rank-k approximation to A of minimal change for any value of k.
Slide 18
Singular Value Decomposition
A = U Σ V^T, where
U: m × m orthogonal matrix whose columns define the left singular vectors of A
V: n × n orthogonal matrix whose columns define the right singular vectors of A
Σ: m × n diagonal matrix containing the singular values σ_1 ≥ σ_2 ≥ … ≥ σ_min{m,n} ≥ 0
Such a factorization exists for any matrix A.
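A minimal numerical illustration of the definition (stand-in matrix, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
A = (rng.random((9, 7)) < 0.4).astype(float)     # stand-in term-by-document matrix

U, s, Vt = np.linalg.svd(A, full_matrices=True)  # A = U Sigma V^T
Sigma = np.zeros((9, 7))
Sigma[:7, :7] = np.diag(s)                       # singular values on the diagonal

assert np.allclose(U.T @ U, np.eye(9))     # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(7))   # V orthogonal
assert np.allclose(U @ Sigma @ Vt, A)      # A = U Sigma V^T
assert np.all(np.diff(s) <= 1e-12)         # sigma_1 >= sigma_2 >= ...
```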
Slide 19
Component Matrices of the SVD
Slide 20
SVD vs. QR
What is the relationship between the rank of A and the ranks of the matrices in the two factorizations?
In QR, the first r_A columns of Q form a basis for the column space of A, where r_A = rank(A); so do the first r_A columns of U in the SVD. The first r_A rows of V^T form a basis for the row space of A.
The low rank-k approximation in the SVD is obtained by setting all but the k largest singular values in Σ to zero.
Slide 21
SVD
Theorem: the low rank-k approximation obtained from the SVD is the closest rank-k approximation to A (proven by Eckart and Young).
They showed that the error in approximating A by A_k is given by
||A − A_k||_F = (σ_{k+1}² + σ_{k+2}² + … + σ_{r_A}²)^{1/2}, where A_k = U_k Σ_k V_k^T.
Hence, the error in approximating the original matrix is determined by the discarded singular values σ_{k+1}, σ_{k+2}, …, σ_{r_A}.
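The theorem is easy to verify numerically; a sketch with a stand-in matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = (rng.random((9, 7)) < 0.4).astype(float)     # stand-in matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # keep the k largest sigmas

# Eckart-Young: ||A - A_k||_F = sqrt(sigma_{k+1}^2 + ... + sigma_{r_A}^2)
err = np.linalg.norm(A - A_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))
print(np.isclose(err, tail))                     # True
```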
Slide 22
SVD: Example
Slide 24
||A − A_6||_F = ……; hence, the relative change in the matrix A is ……
Therefore, a rank-5 approximation may be appropriate in our case.
Determining the best rank approximation for any database depends on empirical testing. For very large databases, the number could be between 100 and 300. Computational feasibility, rather than accuracy, determines the rank reduction.

Rank-k approximation    % Change
Rank-6                  7.4%
Rank-5                  22.67%
Rank-4                  32.49%
Rank-3                  56.45%
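The percentages in the table come straight from the singular values; a sketch of the computation (stand-in matrix, so the printed numbers will not match the table):

```python
import numpy as np

def relative_change(s, k):
    """||A - A_k||_F / ||A||_F, computed from the singular values alone."""
    return np.sqrt(np.sum(s[k:] ** 2)) / np.sqrt(np.sum(s ** 2))

rng = np.random.default_rng(6)
A = (rng.random((9, 7)) < 0.4).astype(float)
s = np.linalg.svd(A, compute_uv=False)
for k in (6, 5, 4, 3):
    print(f"Rank-{k}: {100 * relative_change(s, k):.2f}%")
```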
Slide 25
Low Rank Approximations
Visual comparison of rank-reduced approximations to A can be misleading.
Compare the rank-4 QR approximation with the more accurate rank-4 SVD approximation.
The rank-4 SVD approximation shows associations with terms that do not originally appear in the document title, e.g., Term 4 (Health) and Term 8 (Safety) in Document 1 (Infant & Toddler First Aid).
Slide 27
Query Matching
Given a query vector q, to be compared with the columns of the reduced-rank matrix A_k.
Let e_j denote the j-th canonical vector of dimension n (the j-th column of the identity matrix I_n). Then A_k e_j represents _______________
It is easy to show that
cos θ_j = ((A_k e_j)^T q) / (||A_k e_j||_2 ||q||_2) = (s_j^T (U_k^T q)) / (||s_j||_2 ||q||_2),
where s_j = Σ_k V_k^T e_j.
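A sketch of this reduced-rank query matching; the matrix and query below are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(7)
A = (rng.random((9, 7)) < 0.4).astype(float)   # stand-in matrix
q = np.zeros(9)
q[[1, 6]] = 1.0                                # stand-in query vector

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
Uk = U[:, :k]
S = np.diag(s[:k]) @ Vt[:k, :]                 # column j is s_j = Sigma_k V_k^T e_j

# Only the k-dimensional projection of the query is needed.
proj = Uk.T @ q
cos = (S.T @ proj) / (np.linalg.norm(S, axis=0) * np.linalg.norm(q))
print(np.round(cos, 3))                        # one cosine per document
```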
Slide 28
Query Matching
An alternate formula for the cosine computation is
cos θ'_j = (s_j^T (U_k^T q)) / (||s_j||_2 ||U_k^T q||_2).
Note that ||U_k^T q||_2 ≤ ||q||_2, so each cosine can only increase in magnitude, which means that the number of retrieved documents using this query matching technique is larger for any fixed cosine threshold.
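Continuing the sketch above, the alternate formula only changes the normalization; since the scaling factor ||q||_2 / ||U_k^T q||_2 is at least 1, every cosine grows in magnitude:

```python
import numpy as np

rng = np.random.default_rng(7)                 # same stand-ins as the sketch above
A = (rng.random((9, 7)) < 0.4).astype(float)
q = np.zeros(9)
q[[1, 6]] = 1.0

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
Uk = U[:, :k]
S = np.diag(s[:k]) @ Vt[:k, :]
proj = Uk.T @ q

cos1 = (S.T @ proj) / (np.linalg.norm(S, axis=0) * np.linalg.norm(q))
cos2 = (S.T @ proj) / (np.linalg.norm(S, axis=0) * np.linalg.norm(proj))
print(np.all(np.abs(cos2) >= np.abs(cos1) - 1e-12))   # True: more docs clear a threshold
```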