CS246 Topic-Based Models
Motivation
  Q: For the query “car”, will a document with the word “automobile” be returned as a result under the TF-IDF vector model?
  Q: Is that desirable?
  Q: What can we do?
Topic-Based Models
  Index documents by “topics”, not by individual terms.
  Return a document if it shares a topic with the query.
  We can then return a document containing “automobile” for the query “car”.
  There are far fewer “topics” than “terms”.
  A topic-based index can be more compact than a term-based index.
Example (1)
  Two topics: “Car”, “Movie”
  Four terms: car, automobile, movie, theater
  Topic-term matrix:
    Topic     car   automobile   movie   theater
    “Car”     1     0.9          0       0
    “Movie”   0     0            1       0.8
  Document-topic matrix:
            “Car”   “Movie”
    doc1    0       1
    doc2    1       0
    doc3    0.8     0.2
Example (2)
  But what we have is the document-term matrix!
  How are the three matrices related?
  Document-term matrix:
            car   automobile   movie   theater
    doc1    0     0            1       0.8
    doc2    1     0.9          0       0
    doc3    0.8   0.72         0.2     0.16
Linearity Assumption
  A document is generated as a topic-weighted linear combination of the topic-term vectors.
  This is a simplifying assumption about document generation.
    doc1 = 0 (1, 0.9, 0, 0) + 1 (0, 0, 1, 0.8) = (0, 0, 1, 0.8)
    doc3 = 0.8 (1, 0.9, 0, 0) + 0.2 (0, 0, 1, 0.8) = (0.8, 0.72, 0.2, 0.16)
  In matrix form: (document-term matrix) = (document-topic matrix) X (topic-term matrix)
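The linearity assumption can be checked numerically. The sketch below multiplies the document-topic matrix by the topic-term matrix from the example and prints the resulting document-term rows. The helper name `matmul` is introduced here for illustration only.

```python
# Topic-term matrix from the example: rows are topics, columns are
# (car, automobile, movie, theater).
topic_term = [
    [1.0, 0.9, 0.0, 0.0],   # "Car"
    [0.0, 0.0, 1.0, 0.8],   # "Movie"
]

# Document-topic matrix: rows are doc1..doc3, columns are ("Car", "Movie").
doc_topic = [
    [0.0, 1.0],
    [1.0, 0.0],
    [0.8, 0.2],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Each document row is a topic-weighted combination of the topic-term rows.
doc_term = matmul(doc_topic, topic_term)
for name, row in zip(["doc1", "doc2", "doc3"], doc_term):
    print(name, [round(v, 2) for v in row])
```

The doc3 row should match the hand computation 0.8 (1, 0.9, 0, 0) + 0.2 (0, 0, 1, 0.8).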
Topic-Based Index as Matrix Decomposition
  # topics << # terms, # topics << # docs
  Decompose the (doc-term) matrix into two matrices of rank K (K: # topics).
  Of course, the decomposition will only be approximate for real data.
  [Figure: (doc x term) = (doc x topic) X (topic x term)]
Topic-Based Index as Rank-K Approximation
  Q: How do we choose the two decomposed matrices? What is the “best” decomposition?
  Latent Semantic Index (LSI): find the decomposition that is “closest” to the original matrix.
  Singular-Value Decomposition (SVD): a decomposition method that leads to the best rank-K approximation.
  We will spend the next few hours learning about SVD and its meaning.
  A basic understanding of linear algebra will be very useful for both IR and data mining.
A Brief Review of Linear Algebra
  Vector and a list of numbers
    Addition
    Scalar multiplication
    Dot product
    Dot product as a projection
  Q: (1, 0) vs (0, 1). Are they the same vectors?
  A: The choice of basis determines the “meaning” of the numbers.
  Matrix
    Matrix multiplication
    Four ways to look at matrix multiplication
    Matrix as vector transformation
Change of Coordinates (1)
  Two coordinate systems
  Q: What are the coordinates of (2,0) under the second coordinate system?
  Q: What about (1,1)?
Change of Coordinates (2)
  In general, we get the new coordinates of a vector under the new (orthonormal) basis vectors by multiplying the original coordinates by the matrix whose rows are the new basis vectors.
  Verify with the previous example.
  Q: What does this matrix look like? How can we identify a coordinate-change matrix?
Matrix and Change of Coordinates
  The new basis vectors are orthonormal to each other.
  Orthonormal matrix: Q Q^T = I
  An orthonormal matrix can be interpreted as a change-of-coordinates transformation.
  The rows of the matrix Q are the new basis vectors.
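A minimal sketch of a change of coordinates follows. The second coordinate system is not recoverable from the slides, so the basis used here (the standard basis rotated by 45 degrees) is an assumption, and `change_coordinates` is a hypothetical helper name.

```python
import math

def change_coordinates(q, x):
    """Return Q x, where the rows of Q are the new orthonormal basis vectors."""
    return [sum(qi * xi for qi, xi in zip(row, x)) for row in q]

s = 1 / math.sqrt(2)
# Assumed new basis: the standard basis rotated by 45 degrees.
Q = [
    [s, s],    # first new basis vector
    [-s, s],   # second new basis vector
]

print(change_coordinates(Q, [2, 0]))   # approximately (sqrt(2), -sqrt(2))
print(change_coordinates(Q, [1, 1]))   # approximately (sqrt(2), 0)

# Q is orthonormal, so Q Q^T should be (approximately) the identity matrix.
QQt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in Q] for r1 in Q]
print(QQt)
```

Under this assumed basis, (1,1) lies entirely along the first new basis vector, which is why its second coordinate vanishes.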
Linear Transformation
  Linear transformation: T(a x + b y) = a T(x) + b T(y)
  Every linear transformation can be represented as a matrix, by selecting appropriate basis vectors.
  The matrix form of a linear transformation can be obtained simply by learning how the basis vectors transform.
  Verify with a 45-degree rotation.
  Q: What transformations are possible for a linear transformation?
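The 45-degree rotation check can be sketched as follows: the matrix is assembled column by column from the images of the standard basis vectors, then compared against applying the transformation directly. The names `rotate`, `matrix_of`, and `apply` are illustrative, not from the slides.

```python
import math

def rotate(v, degrees):
    """Rotate a 2D vector counterclockwise by the given angle."""
    t = math.radians(degrees)
    x, y = v
    return [x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t)]

def matrix_of(transform, dim):
    """Build the matrix whose j-th column is the image of the j-th standard basis vector."""
    images = [transform([1.0 if i == j else 0.0 for i in range(dim)]) for j in range(dim)]
    return [list(row) for row in zip(*images)]   # images are columns, so transpose

def apply(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

R = matrix_of(lambda v: rotate(v, 45), 2)
v = [2.0, 1.0]
print(R)
print(apply(R, v), rotate(v, 45))   # the two results should agree
```

Because the transformation is linear, knowing where the basis vectors go is enough to determine where every other vector goes.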
Linear Transformations that We Know
  Rotation
  Stretching
  Anything else?
  Claim: Any linear transformation is a stretching followed by a rotation.
    This is the “meaning” of singular value decomposition.
    An important result of linear algebra.
    Let us learn why this is the case.
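Rotation
  Q: What is the matrix form of rotation? What property will it have?
  Remember: the matrix form is obtained from how the basis vectors transform.
  A rotation matrix R is an orthonormal matrix: the rotated unit basis vectors are unit basis vectors as well.
  Orthonormal matrix <=> change of coordinates <=> rotation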
Stretching (1)
  Q: What is the matrix form of stretching by 3 along the x, y, and z axes in 3D?
  Q: What is the matrix form of stretching by 3 along the x axis and by 2 along the y axis in 3D?
  Q: Stretching matrix <=> diagonal matrix?
Stretching (2)
  Q: What is the matrix form of stretching by 3 along a direction v1 and by 2 along an orthogonal direction v2?
  Verify by transforming (1,1) and (-1,1).
  The decomposition T = Q T' Q^T shows the same transformation in a different coordinate system.
  In the standard matrix form, the simplicity of the stretching transformation may not be obvious.
  Q: What if we chose v1 and v2 as the basis?
Stretching (3)
  Under a good choice of basis vectors, an orthogonal-stretching transformation can always be represented as a diagonal matrix.
  Q: How can we tell whether a matrix corresponds to an orthogonal-stretching transformation?
Stretching – Orthogonal Stretching (1)
  Remember that T = Q T' Q^T, with T' diagonal, is orthogonal stretching along the axes given by Q.
  If a transformation is orthogonal stretching, we should always be able to represent it as Q D Q^T for some Q, where Q gives the stretching axes and D is a diagonal matrix of stretching factors.
  Q: What is the matrix form of the transformation that stretches by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5)?
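One way to work the last question is to build Q D Q^T directly. The sketch below assumes the convention that the columns of Q are the stretching axes. The function name `orthogonal_stretch` is hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def apply(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def orthogonal_stretch(axes, factors):
    """Return Q D Q^T, where the columns of Q are the orthonormal stretching axes."""
    q = transpose(axes)   # axes are given as rows, so transpose to get columns
    d = [[f if i == j else 0.0 for j in range(len(factors))] for i, f in enumerate(factors)]
    return matmul(matmul(q, d), transpose(q))

T = orthogonal_stretch([[4/5, 3/5], [-3/5, 4/5]], [5.0, 4.0])
print(T)                       # approximately [[4.64, 0.48], [0.48, 4.36]]
print(apply(T, [4/5, 3/5]))    # approximately 5 * (4/5, 3/5) = (4, 3)
print(apply(T, [-3/5, 4/5]))   # approximately 4 * (-3/5, 4/5) = (-2.4, 3.2)
```

The resulting matrix is symmetric but not diagonal, which illustrates why the stretching is hard to see in the standard basis.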
Stretching – Orthogonal Stretching (2)
  Q: Given a matrix, how do we know whether it is orthogonal stretching?
  A: When it can be decomposed into T = Q D Q^T.
  A: Spectral theorem: any symmetric matrix T can always be decomposed into T = Q D Q^T.
    Symmetric matrix <=> orthogonal stretching
  Q: How can we decompose T into Q D Q^T?
  A: If T stretches along X, then T X = λ X for some λ.
    X: eigenvector of T
    λ: eigenvalue of T
    Solve the equation for λ and X.
Eigenvalues, Eigenvectors and Orthogonal Stretching
  Eigenvector: stretching axis
  Eigenvalue: stretching factor
  All eigenvectors are orthogonal <=> orthogonal stretching <=> symmetric matrix (spectral theorem)
  Example
  Q: What transformation is this?
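As an illustration of reading stretching axes and factors off a symmetric matrix, the sketch below solves T X = λ X in closed form for the 2x2 case. The input matrix [[4.64, 0.48], [0.48, 4.36]] is Q D Q^T for a stretch by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5). The function name is hypothetical, and the closed form only covers symmetric 2x2 matrices.

```python
import math

def symmetric_eig_2x2(t):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = t
    if b == 0:
        # Already diagonal: the coordinate axes are the stretching axes.
        return sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], reverse=True)
    mean, half_diff = (a + d) / 2, (a - d) / 2
    root = math.sqrt(half_diff ** 2 + b ** 2)
    pairs = []
    for lam in (mean + root, mean - root):
        # (a - lam) x + b y = 0 is solved by (x, y) = (b, lam - a).
        x, y = b, lam - a
        norm = math.sqrt(x * x + y * y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs

T = [[4.64, 0.48], [0.48, 4.36]]
(lam1, x1), (lam2, x2) = symmetric_eig_2x2(T)
print(lam1, x1)   # approximately 5 and (0.8, 0.6)
print(lam2, x2)   # approximately 4 and (0.6, -0.8), the same axis as (-3/5, 4/5)
print(x1[0] * x2[0] + x1[1] * x2[1])   # approximately 0: the axes are orthogonal
```

An eigenvector is only determined up to sign, so (0.6, -0.8) and (-0.6, 0.8) describe the same stretching axis.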
Singular Value Decomposition (SVD)
  Any linear transformation T can be decomposed into T = R S (R: rotation, S: orthogonal stretching).
    One of the basic results of linear algebra.
  In matrix form, any matrix T can be decomposed into T = Q2 D Q1^T (Q1, Q2: orthonormal matrices, D: diagonal matrix).
    Diagonal entries of D: singular values
  Example
  Q: What transformation is this?
Singular Value Decomposition (2)
  Q: For an (n x m) matrix T, what will be the dimensions of the three matrices after SVD?
  Q: What is the meaning of a non-square diagonal matrix?
  The diagonal matrix is also responsible for projection (or dimension padding).
Singular Values vs Eigenvalues
  Q: What is the transformation T^T T?
  A: With T = Q2 D Q1^T, we get T^T T = Q1 (D^T D) Q1^T.
    Q1: eigenvectors of T^T T
    D: square roots of the eigenvalues of T^T T
  Similarly, T T^T = Q2 (D D^T) Q2^T.
    Q2: eigenvectors of T T^T
    D: square roots of the eigenvalues of T T^T
  SVD can be done by computing the eigenvalues and eigenvectors of T^T T and T T^T.
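The relation between singular values and eigenvalues can be checked on a small case. The sketch below uses an arbitrary 2x2 matrix (not from the slides) and a closed-form eigenvalue formula that only applies to symmetric 2x2 matrices. All helper names are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def symmetric_eigenvalues_2x2(s):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = s
    mean = (a + d) / 2
    root = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [mean + root, mean - root]

def singular_values_from(gram):
    """Square roots of the eigenvalues, clamping tiny negative rounding errors to 0."""
    return [math.sqrt(max(lam, 0.0)) for lam in symmetric_eigenvalues_2x2(gram)]

T = [[3.0, 0.0], [4.0, 5.0]]   # arbitrary example matrix, not from the slides

from_TtT = singular_values_from(matmul(transpose(T), T))   # T^T T = [[25, 20], [20, 25]]
from_TTt = singular_values_from(matmul(T, transpose(T)))   # T T^T = [[9, 12], [12, 41]]
print(from_TtT)   # approximately [6.708, 2.236], i.e. sqrt(45) and sqrt(5)
print(from_TTt)   # the same values
```

Both products give the same singular values. Only their eigenvectors differ, which is where Q1 and Q2 come from.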
SVD as Matrix Approximation
  Q: If we want to reduce the rank of T to 2, what will be a good choice?
  The best rank-k approximation of any matrix T is obtained by keeping the k largest singular values in its SVD and setting the rest to zero.
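A dependency-free sketch of rank-k approximation follows. In place of a library SVD routine it uses power iteration with deflation, which is a stand-in chosen for brevity and is not numerically robust for matrices with repeated or very close singular values. The test matrix is a small document-term matrix consistent with the earlier example, which has rank 2. All function names are hypothetical.

```python
import math

def matvec(a, x):
    return [sum(p * q for p, q in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def top_singular_triplet(a, iters=200):
    """Power iteration on A^T A: approximate (sigma, u, v) for the largest singular value."""
    at = transpose(a)
    v = [1.0 / math.sqrt(len(at))] * len(at)
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # A is zero: nothing left to extract
            return 0.0, [0.0] * len(a), v
        v = [x / norm for x in w]
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma > 0 else av
    return sigma, u, v

def rank_k_approximation(a, k):
    """Sum of the k leading terms sigma_i * u_i * v_i^T, found one at a time by deflation."""
    residual = [row[:] for row in a]
    approx = [[0.0] * len(a[0]) for _ in a]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for i, ui in enumerate(u):
            for j, vj in enumerate(v):
                approx[i][j] += sigma * ui * vj
                residual[i][j] -= sigma * ui * vj
    return approx

def frobenius_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

A = [[0.0, 0.0, 1.0, 0.8], [1.0, 0.9, 0.0, 0.0], [0.8, 0.72, 0.2, 0.16]]
for k in (1, 2):
    print(k, frobenius_error(A, rank_k_approximation(A, k)))
```

Since this matrix has rank 2, the rank-2 error should be close to zero, while the rank-1 error equals the second singular value.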
SVD Approximation Example: 1000 x 1000 matrix with entries in (0…255)
Image of original matrix 1000x1000
SVD. Rank 1 approximation
SVD. Rank 10 approximation
SVD. Rank 100 approximation
Original vs Rank 100 approximation Q: How many numbers do we keep for each?
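The count asked for above can be worked out with a few lines of arithmetic. The sketch assumes the rank-k form stores k columns of the first matrix, k singular values, and k rows of the third matrix. The function name is illustrative.

```python
def numbers_kept(n_rows, n_cols, rank=None):
    """Count stored numbers for a full matrix, or for its rank-k SVD factors."""
    if rank is None:
        return n_rows * n_cols
    # k columns of the first matrix, k singular values, k rows of the third matrix
    return rank * (n_rows + 1 + n_cols)

print(numbers_kept(1000, 1000))        # 1000000
print(numbers_kept(1000, 1000, 100))   # 200100
```

Under this accounting, the rank-100 approximation keeps roughly one fifth of the numbers in the original matrix.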
Back to LSI
  LSI: decompose the (doc-term) matrix into two matrices of rank K.
  Our goal is to find the “best” rank-K approximation.
  Apply SVD and keep the top-K singular values, meaning that we keep the first K columns of the first matrix and the first K rows of the third matrix after SVD.
  [Figure: (doc x term) = (doc x topic) X (topic x term)]
LSI and SVD
  [Figure: the LSI decomposition (doc x term) = (doc x topic) X (topic x term), shown alongside the SVD of the (doc x term) matrix]
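To connect the decomposition back to the motivating query, the sketch below compares a “car” query with an “automobile” document, first in term space and then in topic space. It reuses the hand-written topic-term matrix from the example rather than one learned by SVD. Projecting by dot product with each topic-term vector is an assumption made for illustration, not the lecture's stated procedure.

```python
import math

TERMS = ["car", "automobile", "movie", "theater"]
TOPIC_TERM = [
    [1.0, 0.9, 0.0, 0.0],   # "Car"
    [0.0, 0.0, 1.0, 0.8],   # "Movie"
]

def term_vector(words):
    """Raw term-count vector over TERMS."""
    return [float(words.count(t)) for t in TERMS]

def to_topic_space(vec):
    """Assumed projection: dot product of the term vector with each topic-term vector."""
    return [sum(a * b for a, b in zip(topic, vec)) for topic in TOPIC_TERM]

def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / (nx * ny) if nx and ny else 0.0

query = term_vector(["car"])
doc = term_vector(["automobile"])
print(cosine(query, doc))                                   # 0.0: no shared term
print(cosine(to_topic_space(query), to_topic_space(doc)))   # approximately 1.0: same topic
```

In term space the two vectors share no nonzero coordinate, whereas in topic space both load only on the “Car” topic.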
LSI and SVD
  LSI summary
    Formulate the topic-based indexing problem as a rank-K matrix approximation problem.
    Use SVD to find the best rank-K approximation.
  When applied to real data, a 10-20% improvement has been reported.
  Using LSI was the road to fame for Excite in its early days.
Limitations of LSI
  Q: Any problems with LSI?
  Problems with LSI
    Scalability
      SVD is known to be difficult to perform on large data.
    Interpretability
      The extracted document-topic matrix is impossible to interpret.
      It is difficult to understand why we get good or bad results from LSI for some queries.
  Q: Is there any way to develop more interpretable topic-based indexing?
    Topic for the next lecture.
Summary
  Topic-based indexing
    Synonymy and polysemy problem
    Index documents by topic, not by terms
  Latent Semantic Index (LSI)
    A document is a linear combination of the topic-term vectors, weighted by its topic vector
    Formulate the problem as a rank-K matrix approximation problem
    Use SVD to find the best approximation
  Basic linear algebra
    Linear transformation, matrix, stretching and rotation
    Orthogonal stretching, diagonal matrix, symmetric matrix, eigenvalues and eigenvectors
    Rotation, change of coordinates, and orthonormal matrix
    SVD and its implication as a linear transformation