Text Selection
Bryan Kelly, Asaf Manela, Alan Moreira
Summary
Propose a new regression methodology for text-based regression:
- a model that performs well in high-dimensional data (large number of variables)
- a model that fits text in which phrase counts are sparse (most phrases have a count of zero)
Application of the method
Two attributes of text
Text data is high-dimensional, and phrase counts are mostly zero (sparse).
Both attributes require machine learning techniques.
Standard econometric approach for describing count data: multinomial logistic regression (the dependent variable is nominal, with more than two categories).
Problem: in text applications the number of categories (distinct phrases) is extremely large.
Taddy (2015) shows that one can overcome this dimensionality problem by approximating the multinomial with cleverly shifted independent Poisson regressions.
DMR model (distributed multinomial regression)
Assumes each count c_ij follows an independent Poisson distribution.
Great advantage: the Poisson regressions can be distributed across parallel computing units.
Problem: text data tend to have a much higher proportion of phrases with zero counts than the Poisson distribution allows.
HDMR model
Consists of two components tailored to the high dimensionality and sparsity of text data:
- a selection equation (the producer's choice of whether or not to use a particular phrase)
- a positive count model (describing the choice of how many times a phrase is used)
DMR model
Let c_i be a vector of counts in d categories, summing to m_i = Σ_j c_ij (counts of words or phrases), and let v_i be a p_v-vector of covariates on observation i = 1, ..., n (attributes: author, date, etc.).
An econometrician confronted with modeling count data may first consider using a multinomial logistic regression:
c_i ~ MN(q_i, m_i), where q_ij = exp(η_ij) / Σ_l exp(η_il) and η_ij = α_j + v_i'φ_j.
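A minimal numerical sketch of the multinomial probabilities above; the dimensions and random values are illustrative, not from the paper:

```python
import numpy as np

# Illustrative dimensions: a real text application has d (the vocabulary)
# in the hundreds of thousands, which is what makes the MLE intractable.
n_docs, n_covars, n_words = 100, 3, 1000
rng = np.random.default_rng(0)

V = rng.normal(size=(n_docs, n_covars))      # covariates v_i
alpha = rng.normal(size=n_words)             # intercepts alpha_j
Phi = rng.normal(size=(n_covars, n_words))   # loadings phi_j

eta = alpha + V @ Phi                        # eta_ij = alpha_j + v_i' phi_j
# Softmax across the d categories: q_ij = exp(eta_ij) / sum_l exp(eta_il).
# The denominator couples all d categories, creating the bottleneck.
q = np.exp(eta - eta.max(axis=1, keepdims=True))
q /= q.sum(axis=1, keepdims=True)
```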
The multinomial can be decomposed into independent Poissons, conditional on the total counts m_i.
Motivated by this decomposition, Taddy (2015) develops the distributed multinomial regression (DMR), a parallel (independent) Poisson plug-in approximation to the multinomial:
c_ij ~ Po(exp(μ_i + η_ij)), with the plug-in value μ_i = log m_i.
The parameters for each category j can then be estimated independently by minimizing the Poisson negative log likelihood.
The approximation removes the communication bottleneck of re-computing the summation term and allows for fast and scalable distributed estimation, as sketched below.
Taddy (2013, 2015) uses the DMR to estimate a low-dimensional sufficient reduction projection z_i = Φ'c_i / m_i.
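A minimal sketch of the DMR estimation under these definitions, with a simulated document-term count matrix C and covariates V; each word's Poisson regression with plug-in offset log m_i is an independent problem, distributed across workers via joblib:

```python
import numpy as np
import statsmodels.api as sm
from joblib import Parallel, delayed

def fit_word(j, C, V, log_m):
    """Poisson regression for word j with the plug-in offset mu_i = log m_i."""
    X = sm.add_constant(V)
    model = sm.GLM(C[:, j], X, family=sm.families.Poisson(), offset=log_m)
    return model.fit().params  # (alpha_j, phi_j)

# Simulated stand-ins for the real data (sizes are illustrative).
rng = np.random.default_rng(0)
n, d, p = 200, 50, 2
V = rng.normal(size=(n, p))
C = rng.poisson(lam=1.0, size=(n, d))
log_m = np.log(C.sum(axis=1) + 1e-12)   # offset from the document totals m_i

# Each word is an independent problem, so estimation trivially parallelizes.
params = Parallel(n_jobs=-1)(
    delayed(fit_word)(j, C, V, log_m) for j in range(d))
Phi = np.column_stack([np.asarray(pj)[1:] for pj in params])  # p x d loadings
```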
Example
Suppose v_iy is the first element of v_i, which is observed in a subsample but needs to be predicted for other subsamples.
First, run a multinomial inverse regression of word counts on the covariates v in the training sample to estimate Φ.
Second, estimate a forward regression (linear or higher order) of v_iy on its projection and the remaining covariates.
Finally, the forward regression can be used to predict v_iy from text and the remaining covariates v_i,−y (z_iy is the projection of the phrase counts in the direction of v_iy).
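Continuing the DMR sketch above (reusing C, V, and the estimated Phi), a sketch of the two-step prediction; the forward regression here is a plain linear model, one of the choices the slides mention:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sufficient reduction: project normalized counts onto the loadings.
m = C.sum(axis=1, keepdims=True)
Z = (C / m) @ Phi.T          # n x p projections; z_iy is the column for v_iy

v_y = V[:, 0]                # target covariate, observed in the training sample
V_rest = V[:, 1:]            # remaining covariates v_{i,-y}

# Forward regression of v_iy on its projection z_iy and v_{i,-y}.
X_fwd = np.column_stack([Z[:, 0], V_rest])
fwd = LinearRegression().fit(X_fwd, v_y)

# Out of sample: compute Z from new text, then predict v_iy the same way.
v_y_hat = fwd.predict(X_fwd)
```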
HDMR model
The Poisson is a poor description of word counts c_ij.
[Figure: left panel, a point mass at zero for excluded words; right panel, a truncated distribution to approximate the positive counts.]
Replace the independent Poissons with a two-part hurdle model for counts c_ij, which the authors label the hurdle distributed multinomial regression (HDMR).
Two choices: include or exclude word j; then, conditional on inclusion, how many times to use it.
We only observe counts for included words.
Let π_0 denote the discrete density for zeros; natural choices for π_0 are the probit and logit binary choice models.
Let P+ denote the model for positive counts, conditional on inclusion.
Joint density:
p(c_ij | w_i, v_i) = π_0,ij^(1−h_ij) · [(1 − π_0,ij) · P+(c_ij | α_j + v_i'φ_j)]^(h_ij),
where π_0,ij = π_0(κ_j + w_i'δ_j) and h_ij = 1{c_ij > 0} indicates inclusion.
Negative log likelihood:
−log L_j = −Σ_i [(1 − h_ij) log π_0,ij + h_ij log(1 − π_0,ij)] − Σ_{i: h_ij = 1} log P+(c_ij | α_j + v_i'φ_j)
The likelihood separates into a selection part and a positive-count part, which can be estimated independently.
Exclusion is the only source of zero counts.
The selection parameters κ_j and δ_j depend only on the word-j inclusion indicators h_j and on the covariates w.
The count parameters α_j and φ_j depend only on the positive word counts c_j > 0.
HDMR therefore allows one to estimate text selection in Big Data applications of previously impossible scale, by distributing computation across categories and across the two parts of the selection model, as the sketch below illustrates.
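A minimal sketch of the two-part HDMR estimation for a single word j, with simulated data; as a simplification, the positive-count part below fits an ordinary Poisson on the positive subsample rather than the paper's zero-truncated count model:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
W = rng.normal(size=(n, 2))   # selection covariates w_i
V = rng.normal(size=(n, 2))   # count covariates v_i
c_j = rng.poisson(1.0, n) * rng.binomial(1, 0.3, n)  # sparse counts for word j

h_j = (c_j > 0).astype(float)  # inclusion indicators

# Part 1: selection equation (logit), depends only on h_j and W.
sel = sm.Logit(h_j, sm.add_constant(W)).fit(disp=0)   # kappa_j, delta_j

# Part 2: positive-count model, depends only on c_j > 0 and V.
pos = c_j > 0
cnt = sm.GLM(c_j[pos], sm.add_constant(V[pos]),
             family=sm.families.Poisson()).fit()      # alpha_j, phi_j

# The two parts share no parameters, so they can be estimated
# independently and distributed across words and across parts.
```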
Regularization
n << p: the feature space (words) is much larger than the number of observations, so estimation is penalized (lasso).
For the positive-count part, only documents with c_ij > 0 contribute, so the penalty is normalized by the number of such documents, n+.
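Continuing the sketch above, a lasso-penalized version of both parts via statsmodels; the penalty level lam is illustrative, and dividing the count-part penalty by n_plus follows the slide's normalization:

```python
import numpy as np
import statsmodels.api as sm

# Reuses h_j, W, V, c_j, pos from the HDMR sketch above.
lam = 0.1                      # illustrative penalty level
n_plus = int(pos.sum())        # number of documents with c_j > 0

# L1-penalized selection equation.
sel_l1 = sm.Logit(h_j, sm.add_constant(W)).fit_regularized(
    method='l1', alpha=lam, disp=0)

# L1-penalized positive-count part, penalty normalized by n_plus.
cnt_l1 = sm.GLM(c_j[pos], sm.add_constant(V[pos]),
                family=sm.families.Poisson()).fit_regularized(
    method='elastic_net', alpha=lam / n_plus, L1_wt=1.0)
```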
Projection (recover low-dimensional text indices that track variables of interest)
Inverse regression for prediction: predict v_iy using text and the remaining covariates.
Application
He, Kelly, and Manela (2017) find that a simple two-factor model, which includes the excess stock market return and the aggregate capital ratio of major financial intermediaries, can explain cross-sectional variation in expected returns across a wide array of asset classes.
Their conclusions are limited: the calculations are precluded prior to 1970.
Conjecture: publications cater to investors, so text that appears on the front page of the Wall Street Journal should be informative about the aggregate state of the intermediary sector.
Data
Wall Street Journal, July 1926 to February 2016.
Covariates
Match this text with:
- the monthly intermediary capital ratio icr_t of He, Kelly, and Manela (the prediction target; the goal is to backfill the ICR as far back as possible)
- the log price over annualized dividend ratio pd_t
- log realized variance rv_t
- the monthly average number of pages in section A, biznewspres_t
Clean the data (omit words or phrases that rarely appear in the sample).
Compare MSE from a 10-fold cross-validation exercise.
A serial variant of cross-validation is also used, since folds drawn from different time periods lead to different predictions.
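A minimal sketch of serial (time-ordered) cross-validation using scikit-learn's TimeSeriesSplit; this is an assumption about the fold construction, which the paper may implement differently:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1076, 5))   # ~monthly observations, 1926-2016
y = rng.normal(size=1076)

mses = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=10).split(X):
    # Train only on data preceding the test block, respecting time order.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    mses.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(np.mean(mses))
```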
Loadings of the variables
icr: intermediary capital ratio
rv: realized variance
pd: price over annualized dividend ratio
biznewspres: monthly average number of pages in section A
Regress future stock market excess returns on the lagged sufficient reduction projections z0_{t−1} and z+_{t−1} and the covariates rv_{t−1}, pd_{t−1}, and biznewspres_{t−1}.
The selection projection z0_{t−1} is a strong predictor of future market returns.
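A minimal sketch of the predictive regression with hypothetical column names (ret, z0, zpos, rv, pd, biznewspres are stand-ins, and the data here are simulated); HAC (Newey-West) standard errors are a common choice in return-prediction regressions, not necessarily the paper's exact specification:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated monthly data with hypothetical columns (for illustration only):
# 'ret': market excess return; 'z0','zpos': the two sufficient reduction
# projections; 'rv','pd','biznewspres': covariates.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 6)),
                  columns=['ret', 'z0', 'zpos', 'rv', 'pd', 'biznewspres'])

# Future return on lagged regressors: shift all predictors by one month.
X = sm.add_constant(df[['z0', 'zpos', 'rv', 'pd', 'biznewspres']].shift(1)).dropna()
y = df['ret'].iloc[1:]

ols = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': 12})
print(ols.summary())
```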
Conclusion
The authors develop a high-dimensional model that fits sparse count data.
It is especially useful in cases where the cover/no-cover choice is separate from, or more interesting than, the coverage quantity choice.
Discussion
Is this a new method or an improvement? (Its advantage increases with the sparsity of the data.)
How does it compare with other methods?