Lecture 4: Econometric Foundations
Stephen P. Ryan, Olin Business School, Washington University in St. Louis

Big Question: How to Perform Inference?

Many problems can naturally be written as solving $M(\alpha, \eta) = 0$. In many econometrics + ML settings, we have injected a predictor with unknown statistical properties: we are typically predicting a nuisance parameter ($\eta$) using an ML method. Examples:
- Moment forests: using classification trees to predict the assignment of parameters to observations.
- IV-Lasso: using the Lasso to predict which instruments should be included in an IV regression.
- Natural language processing: using deep learning to parse text into data.

How do we perform inference on $\alpha$, the parameters of interest?

Chernozhukov, Hansen, and Spindler (2015), "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach," Annual Review of Economics

Basic ideas

- Low-dimensional parameter of interest.
- High-dimensional (possibly infinite-dimensional) nuisance parameter, estimated using selection or regularization methods.
- The paper provides a set of high-level conditions for regular inference.
- Key condition: immunized or orthogonal estimating equations.
- Intuition: set up the moments so that the estimating equations are locally insensitive to small mistakes in the nuisance parameters.
- Applications: affine-quadratic models, IV with many regressors and instruments.

Setting: High-dimensional models

High-dimensional models, where the number of parameters is large relative to the sample size, are increasingly common:
- Big data -> many covariates.
- Basis functions -> a dictionary of terms, even for low-dimensional X.
- Regularization -> dimension reduction is required to focus on the "key" components.

We need to account for this regularization when performing inference. The general approach is to account for the model search (we will talk about this more with moment forests later).

Orthogonality / immunization condition

The key theoretical contribution is to show that orthogonality / immunization allows for regular inference on $\alpha$ in $M(\alpha, \eta) = 0$. The key condition is

$$\partial_\eta M(\alpha_0, \eta_0) = 0,$$

i.e., the derivative of the system of equations with respect to the nuisance parameter is zero in a neighborhood of the truth. This condition can be established quite generally in many settings:
- Neyman's classic orthogonalized score in likelihood settings (Neyman 1959, 1979).
- We will see the extension to GMM settings below.
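To make the condition concrete, here is the standard partialling-out illustration from the partially linear model (a sketch in my own notation, not taken from the slides):

```latex
% Partially linear model with nuisance \eta = (\beta, \gamma):
%   y_i = d_i \alpha + x_i'\beta + \varepsilon_i, \qquad d_i = x_i'\gamma + v_i.
% The orthogonalized moment and its derivatives at the truth:
\begin{align*}
M(\alpha, \eta) &= \mathbb{E}\big[(y_i - d_i \alpha - x_i'\beta)(d_i - x_i'\gamma)\big], \\
\partial_\beta M(\alpha_0, \eta_0) &= -\mathbb{E}[x_i v_i] = 0, \qquad
\partial_\gamma M(\alpha_0, \eta_0) = -\mathbb{E}[x_i \varepsilon_i] = 0,
\end{align*}
% so small errors in estimating (\beta, \gamma) have only second-order
% effects on the estimate of \alpha.
```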

High-quality estimators of $\eta$

The approach requires a high-quality estimator of the nuisance parameters. Under approximate sparsity, $\eta$ can be approximated by a sparse vector, and estimators such as the Lasso apply. As a reminder, the Lasso solves

$$\hat\eta = \arg\min_\eta \; \ell(\text{data}, \eta) + \lambda \sum_{j=1}^{p} \psi_j |\eta_j|,$$

where $\ell$ is some loss function, $\lambda$ is a penalty parameter, and the $\psi_j$ are penalty loadings. The leading example is the linear model:

$$\hat\eta = \arg\min_\eta \; \sum_{i=1}^{n} (y_i - x_i'\eta)^2 + \lambda \sum_{j=1}^{p} |\eta_j|.$$

Note, however, that the conditions here do not require approximate sparsity per se; they require a rate of convergence for $\hat\eta$, typically $o(n^{-1/4})$.
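As a quick illustration, here is a minimal Python sketch of the Lasso as a first-stage nuisance estimator. It uses scikit-learn's cross-validated Lasso; the paper instead uses plug-in penalty loadings $\psi_j$, so treat this as an assumption-laden stand-in:

```python
# Minimal sketch: Lasso as a first-stage estimator of a sparse nuisance eta.
# LassoCV picks lambda by cross-validation (the paper uses plug-in loadings).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5                       # high-dimensional: p > n
X = rng.standard_normal((n, p))
eta0 = np.r_[np.ones(s), np.zeros(p - s)]   # approximately sparse truth
y = X @ eta0 + rng.standard_normal(n)

eta_hat = LassoCV(cv=5).fit(X, y).coef_
print("nonzero coefficients selected:", int(np.sum(eta_hat != 0)))
print("l2 estimation error:", float(np.linalg.norm(eta_hat - eta0)))
```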

Setup

We want to solve the system of equations

$$M(\alpha, \eta_0) = 0,$$

where $M = (M_l)_{l=1}^{k}$ is a measurable map from $\mathcal{A} \times \mathcal{H}$ to $\mathbb{R}^k$, and $\mathcal{A} \times \mathcal{H}$ are convex subsets of $\mathbb{R}^d \times \mathbb{R}^p$. Note that $d$ is assumed fixed, while $p$ may grow with $n$. Given an appropriate estimator $\hat\eta$, the estimator $\hat\alpha$ solves the empirical analog $\hat M(\alpha, \hat\eta) = 0$ (or minimizes its norm). Often $M$ is a moment, $M(\alpha, \eta) = \mathbb{E}[m(w_i, \alpha, \eta)]$, with $\hat M$ the corresponding sample average.

Adaptivity Condition

We would like to test hypotheses about the true parameter vector; inverting the test gives us a confidence region. Adaptivity is the key condition for the validity of this inversion:

$$\sqrt{n}\,\big(\hat M(\alpha_0, \hat\eta) - \hat M(\alpha_0, \eta_0)\big) \to_p 0,$$

i.e., plugging in $\hat\eta$ has no first-order effect on the estimating equations at the truth. The key requirement for this to be true is the orthogonality / immunization condition $\partial_\eta M(\alpha_0, \eta_0) = 0$.

Conditions for valid inference

Suppose a central limit theorem holds for the estimating equations at the truth,

$$\sqrt{n}\,\hat M(\alpha_0, \eta_0) \to_d N(0, \Omega),$$

for some positive-definite $\Omega$. Note that these conditions are imposed at the true values, and are basically telling us that the underlying DGP and estimating equations are well behaved. Suppose further that there exists a high-quality (consistent) estimator $\hat\Omega$ of the variance.

Then we get something cool

The following score statistic is asymptotically normal,

$$\sqrt{n}\,\hat\Omega^{-1/2}\,\hat M(\alpha_0, \hat\eta) \to_d N(0, I_k),$$

and the associated quadratic form is asymptotically chi-squared,

$$n\,\hat M(\alpha_0, \hat\eta)'\,\hat\Omega^{-1}\,\hat M(\alpha_0, \hat\eta) \to_d \chi^2_k.$$

The first equation is what we would like to use in practice!
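For concreteness, a small Python sketch of the quadratic-form version of the test (my own helper with hypothetical names; it assumes you already have the averaged moments and a variance estimate):

```python
# Sketch: chi-square score test from averaged moments m_bar (k-vector)
# evaluated at (alpha0, eta_hat) and a variance estimate Omega_hat (k x k).
import numpy as np
from scipy import stats

def score_test(m_bar, Omega_hat, n):
    k = m_bar.shape[0]
    stat = n * m_bar @ np.linalg.solve(Omega_hat, m_bar)  # n * m' Omega^{-1} m
    pvalue = stats.chi2.sf(stat, df=k)                    # upper-tail p-value
    return stat, pvalue
```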

Proposition 1

Valid Inference via Adaptive Estimation

Assumptions:
- Derivatives of the moment functions exist.
- Parameters are in the interior of the parameter space.
- The problem is locally identified.
- A central limit theorem holds.
- The variance is well behaved (not degenerate or exploding).
- Stochastic equicontinuity and continuity requirements (bounding variation in the underlying functions).
- Uniform convergence requirements on the estimators and underlying smoothness (usually a rate of convergence on $\hat\eta$).
- The orthogonality condition (the new requirement here).

Achieving Orthogonality

Idea: project the score that identifies the parameter of interest onto the orthocomplement of the tangent space for the nuisance parameter (obviously). In fact, this is a rather intuitive partialling out of the nuisance parameter. Suppose we have a log-likelihood $\ell(w; \alpha, \eta)$ with scores $S_\alpha = \partial_\alpha \ell$ and $S_\eta = \partial_\eta \ell$, and consider the moment function

$$\varphi(w; \alpha, \eta) = S_\alpha - \mu\, S_\eta,$$

with $\mu = J_{\alpha\eta} J_{\eta\eta}^{-1}$, where the $J$'s are the corresponding blocks of the information matrix.

Orthogonality with Maximum Likelihood

With this construction we have both

$$\mathbb{E}[\varphi(w; \alpha_0, \eta_0)] = 0 \qquad \text{and} \qquad \partial_\eta\, \mathbb{E}[\varphi(w; \alpha, \eta)]\,\big|_{(\alpha_0, \eta_0)} = 0,$$

so the moment condition both identifies $\alpha$ and is immune to small errors in $\eta$.
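A one-line check that this choice of $\mu$ delivers orthogonality, using the information-matrix equality (my derivation, consistent with the construction above):

```latex
\begin{align*}
\partial_\eta\, \mathbb{E}[\varphi]\big|_{(\alpha_0,\eta_0)}
  &= \mathbb{E}[\partial_\eta S_\alpha] - \mu\, \mathbb{E}[\partial_\eta S_\eta]
   = -J_{\alpha\eta} + \mu\, J_{\eta\eta} \\
  &= -J_{\alpha\eta} + J_{\alpha\eta} J_{\eta\eta}^{-1} J_{\eta\eta} = 0.
\end{align*}
```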

Under true specification

If we assume the model is correctly specified, we get a nice result: the information-matrix equality implies the variance of the orthogonalized score reduces to the partial information for $\alpha$,

$$\mathrm{Var}(\varphi) = J_{\alpha\alpha} - J_{\alpha\eta} J_{\eta\eta}^{-1} J_{\eta\alpha},$$

leading to an estimator of $\alpha$ whose asymptotic variance is the inverse of this partial information.

Lemma 1 [Neyman’s Orthogonalization]

Details

Orthogonal GMM Version: the same projection argument extends to GMM moment conditions, as sketched below.
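One standard way to orthogonalize GMM moments, sketched from the projection idea (my notation; the normalization may differ from the paper's). With $m(w; \alpha, \eta)$ the moment function, $G_\alpha$ and $G_\eta$ the Jacobians of $\mathbb{E}[m]$, and $\Omega = \mathrm{Var}(m)$:

```latex
\[
\varphi \;=\; \Big( G_\alpha' \Omega^{-1}
      \;-\; G_\alpha' \Omega^{-1} G_\eta
            \big( G_\eta' \Omega^{-1} G_\eta \big)^{-1} G_\eta' \Omega^{-1} \Big)\, m(w; \alpha, \eta),
\qquad
\partial_\eta\, \mathbb{E}[\varphi]\big|_{0} = 0 \text{ by construction.}
\]
```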

Estimator and Variance

IV with Many Controls and Instruments

Consider a typical IV model, but with many controls ($x_i$) and instruments ($z_i$):

$$y_i = d_i \alpha + x_i'\beta + \varepsilon_i, \qquad d_i = z_i'\delta + x_i'\gamma + u_i,$$

where $d_i$ is the endogenous variable, the instruments satisfy $\mathbb{E}[\varepsilon_i \mid x_i, z_i] = 0$, and the dimensions of $x_i$ and $z_i$ may be very large. Question: how do we estimate this with many Z and X, given that we care about $\alpha$?

Note that we have a bunch of nuisance parameters here. Letting X and Z be correlated, so that $z_i = \Pi x_i + \xi_i$, we can rewrite the system in reduced form, with every coefficient other than $\alpha$ (here $\beta$, $\gamma$, $\delta$, and $\Pi$) collected into the nuisance parameter $\eta_0$.

Sparsity

Since $\dim(\eta_0) > n$, we have to do something to reduce dimensionality. Assume approximate sparsity (i.e., a low-dimensional approximation is close enough): each component of $\eta_0$ decomposes into a sparse main part plus a small remainder.
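In symbols, the usual approximate-sparsity decomposition looks like this (a sketch in my notation, following the standard Lasso literature):

```latex
\[
\eta_0 \;=\; \eta_m + \eta_r,
\qquad \|\eta_m\|_0 \le s \ \text{(sparse main part)},
\qquad \|\eta_r\|_2 \lesssim \sqrt{s/n} \ \text{(small remainder)},
\]
```

with the sparsity index $s$ growing slowly relative to $n$.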

Condition A2 + Estimator

Assume that the non-sparse component grows at a sufficiently slow rate. (By the way, what does that mean in practice?) Then the orthogonalized equations use the residuals after partialling the controls out of $y_i$, $d_i$, and $z_i$, say $\rho_i^y$, $\rho_i^d$, and $\rho_i^z$:

$$\mathbb{E}\big[(\rho_i^y - \rho_i^d\, \alpha)\, \rho_i^z\big] = 0.$$

One can verify that the orthogonality condition holds; a code sketch follows below.
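Here is a minimal Python sketch of this partialling-out IV idea (my implementation, not the authors' code; shown for a single instrument for simplicity):

```python
# Sketch: Lasso out the controls X from y, d, and z, then run
# just-identified IV on the residuals.
import numpy as np
from sklearn.linear_model import LassoCV

def lasso_residual(target, X):
    """Residual of `target` after a cross-validated Lasso regression on X."""
    fit = LassoCV(cv=5).fit(X, target)
    return target - fit.predict(X)

def partialled_out_iv(y, d, z, X):
    """alpha_hat and a heteroskedasticity-robust SE from residualized IV."""
    ry, rd, rz = (lasso_residual(v, X) for v in (y, d, z))
    alpha = (rz @ ry) / (rz @ rd)           # IV on residuals
    eps = ry - alpha * rd                   # structural residual
    se = np.sqrt(np.sum((rz * eps) ** 2)) / abs(rz @ rd)
    return alpha, se
```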

Lasso estimator for $\eta_0$

Simulation Results

Berry Logit Example a la BLP

A simple aggregate logit model of demand for cars, via Berry's inversion:

$$\ln s_{jt} - \ln s_{0t} = x_{jt}'\beta - \alpha p_{jt} + \xi_{jt}.$$

Concern: price $p_{jt}$ is correlated with the unobserved quality $\xi_{jt}$. The BLP instruments are functions of the characteristics of other products, and BLP reduced the dimensionality to a small set of such functions (sums of the characteristics of other products).

This method applied here

In principle, every function of other products' characteristics is a valid instrument, so which ones should we use? CHS propose all first-order interactions of the baseline instruments, quadratics, cubics, and a time trend, and then apply the IV estimator described previously, letting the Lasso select among them (see the sketch below).
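A small Python sketch of building such an instrument dictionary (my construction from the description above; the function name is hypothetical):

```python
# Sketch: expand baseline BLP-style instruments into a large dictionary of
# first-order interactions, quadratics, cubics, and a time trend.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def instrument_dictionary(Z_base, t):
    """Z_base: (n, k) baseline instruments; t: (n,) time trend."""
    # degree=2 yields levels, squares, and all pairwise interactions
    quad = PolynomialFeatures(degree=2, include_bias=False).fit_transform(Z_base)
    cubics = Z_base ** 3
    return np.column_stack([quad, cubics, t])
```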

Conclusion

This paper provides general conditions under which one can perform regular inference after regularizing a high-dimensional set of nuisance parameters. It applies to a very wide set of models:
- An increasing number of X due to big data.
- An increasing number of X due to a dictionary of basis functions.

We saw two worked examples (likelihood and GMM) and an application to IV. This should be very useful in your own applied work; it is fairly easy to modify the standard approaches to use it.