Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.

Presentation transcript:

Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU

Presentation 18: Brief Introduction to Text Mining and the Singular Value Decomposition. Research Assistant: Chad Baldwin

Full SVD for a Term-Document Frequency Matrix

The full SVD factors the m × n term-document frequency matrix A as A = UΣVᵀ. The i-th column of A, say aᵢ, contains the counts of each of the m terms in the i-th document, i = 1, 2, …, n. The m left-singular vectors of the A matrix are the columns of the U matrix; equivalently, the left-singular vectors in U are the unit-length eigenvectors of the AAᵀ matrix. The right-singular vectors of the A matrix are the rows of the Vᵀ matrix; equivalently, these right-singular vectors are the unit-length eigenvectors of the AᵀA matrix. Finally, the singular values arrayed along the diagonal of the Σ matrix are the square roots of the eigenvalues of AAᵀ or AᵀA (the nonzero eigenvalues of the two matrices are the same), arranged in the Σ matrix from highest to lowest running down the diagonal.
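To make these eigen-relationships concrete, here is a minimal NumPy sketch (the course itself uses SAS Text Miner, so this Python code and the toy 4 × 3 count matrix are illustrative assumptions, not course code):

```python
import numpy as np

# Hypothetical m x n term-document frequency matrix:
# m = 4 terms (rows), n = 3 documents (columns).
A = np.array([[2, 0, 1],
              [1, 3, 0],
              [0, 1, 2],
              [1, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A)   # full SVD: A = U @ Sigma @ V^T

# Singular values are the square roots of the shared nonzero
# eigenvalues of A^T A (and A A^T), sorted largest to smallest.
eig = np.linalg.eigvalsh(A.T @ A)[::-1]       # descending order
print(np.allclose(s, np.sqrt(eig)))           # True

# The first column of U is a unit-length eigenvector of A A^T:
print(np.allclose((A @ A.T) @ U[:, 0], eig[0] * U[:, 0]))  # True
```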

What we want to do is approximate the A matrix using the first k (k < n) columns of U, resulting in an m × k matrix Uₖ; the first k singular values of Σ, resulting in a k × k matrix Σₖ; and the first k rows of Vᵀ, resulting in a k × n matrix Vₖᵀ. Therefore, the k-dimensional SVD approximation of A is Aₖ = UₖΣₖVₖᵀ.
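A sketch of the rank-k truncation, continuing with the same hypothetical matrix (k = 2 is an arbitrary choice for illustration):

```python
import numpy as np

A = np.array([[2, 0, 1],
              [1, 3, 0],
              [0, 1, 2],
              [1, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
U_k  = U[:, :k]            # m x k
S_k  = np.diag(s[:k])      # k x k
Vt_k = Vt[:k, :]           # k x n  (first k ROWS of V^T)

A_k = U_k @ S_k @ Vt_k     # rank-k approximation of A
print(np.round(A_k, 2))
```

Among all rank-k matrices, this truncation is the closest to A in the least-squares (Frobenius) sense, which is why it is the k largest singular values that are kept.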

Approximating the Term-Document Frequency Matrix

Let aᵢ denote the i-th column of the term-document frequency matrix. In our SVD analysis, we want to compress the m × 1 term-count vector for the i-th document, aᵢ, to a k × 1 vector, say âᵢ = Uₖᵀaᵢ, whose first element is σ₁v₁ᵢ, whose second element is σ₂v₂ᵢ, …, and whose k-th element is σₖvₖᵢ (where vⱼᵢ denotes the i-th element of the j-th right-singular vector), resulting in the k-dimensional representation of the i-th document as âᵢ = (σ₁v₁ᵢ, σ₂v₂ᵢ, …, σₖvₖᵢ)ᵀ.
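A sketch of this compression on the same toy matrix; the check confirms that Uₖᵀaᵢ reproduces the i-th column of ΣₖVₖᵀ up to floating-point error:

```python
import numpy as np

A = np.array([[2, 0, 1],
              [1, 3, 0],
              [0, 1, 2],
              [1, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k, i = 2, 0                 # compress document i = 0 to k = 2 scores

a_i = A[:, i]               # m x 1 term-count vector for document i
d_i = U[:, :k].T @ a_i      # k x 1 compressed representation

# Same scores via the singular values and right-singular vectors:
print(np.allclose(d_i, s[:k] * Vt[:k, i]))    # True
```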

By this SVD decomposition we have reduced the m × n term-document frequency matrix to the k × n score matrix SVDₖ = Uₖᵀ A = ΣₖVₖᵀ. The k elements of each column of SVDₖ can then be used as k explanatory variables in classification models (such as the logistic model) to help us discriminate between documents, whether the documents are classified on a binary, nominal, or ordinal scale, e.g., (0 = no-fraud document, 1 = fraud document), (1 = sports article, 0 = non-sports article), or (0 = document containing no praise, all criticism; 1 = document containing mixed praise and criticism; 2 = document containing all praise).
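As a sketch of that classification step, scikit-learn's LogisticRegression stands in for whichever classifier is used in practice; the 500 × 40 count matrix and the 0/1 fraud labels are simulated, so the fitted accuracy illustrates only the mechanics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(500, 40)).astype(float)  # m = 500 terms, n = 40 docs
y = rng.integers(0, 2, size=40)                     # e.g. 0 = no fraud, 1 = fraud

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
X = (U[:, :k].T @ A).T      # n x k: one row of k SVD scores per document

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))      # in-sample accuracy on the simulated labels
```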

Some Comments

The SVD scores are often supplemented with a few specific (“pet”) words and their document frequency counts. For example, the Wall Street Journal has a text-mining index made up of a frequency count of the word “recession” in a set of financial publications; if that count rises in these documents, there is possibly a greater chance of an actual recession developing in the future. Like any input variable, one can initially apply a variable-selection routine to determine which of the “pet” words to include along with the SVD scores.
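A sketch of appending a “pet”-word count to the SVD scores (the vocabulary and matrix are the hypothetical toy example again; a real application would run variable selection over the candidate pet words):

```python
import numpy as np

terms = ["growth", "recession", "market", "rates"]   # hypothetical vocabulary
A = np.array([[2, 0, 1],
              [1, 3, 0],
              [0, 1, 2],
              [1, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
X_svd = (U[:, :k].T @ A).T                  # n x k SVD scores

pet = A[terms.index("recession"), :]        # per-document "recession" count
X = np.column_stack([X_svd, pet])           # SVD scores plus pet-word count
print(X.shape)                              # (3, 3): n docs, k + 1 features
```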

An Example Application of the SVD: Scoring the Federalist Papers of Unknown Authorship (workflow: SVD transformation → logistic regression)

Primary References

Albright, Russ, “Taming Text with the SVD,” SAS Institute Inc., Cary, NC, January 2004.

“Mining Textual Data Using SAS Text Miner for SAS 9 Course Notes,” SAS Institute Inc., Cary, NC.