1
Lecture 4: Econometric Foundations
Stephen P. Ryan, Olin Business School, Washington University in St. Louis
2
Big Question: How to Perform Inference?
Many problems can naturally be written as solving:
M(α, η) = 0
In many econometrics + ML settings, we have injected a predictor with unknown properties
We are typically predicting a nuisance parameter (η) using an ML method:
Moment forests: using classification trees to predict the assignment of parameters to observations
IV-Lasso: using LASSO to predict which instruments should be included in an IV regression
Natural language processing: using deep learning to parse text into data
How do we perform inference on α, the parameters of interest?
3
Chernozhukov, Hansen, Spindler (2015)
4
Basic ideas
Low-dimensional parameter of interest
High-dimensional (possibly infinite-dimensional) nuisance parameter, estimated using selection or regularization methods
Provide a set of high-level conditions for regular inference
Key condition: immunized or orthogonal estimating equations
Intuition: set up the moments so that the estimating equations are locally insensitive to small mistakes in the nuisance parameters
Applications: affine-quadratic models, IV with many regressors and instruments
5
Setting: High-dimensional models
High-dimensional models, where the number of parameters is large relative to the sample size, are increasingly common
Big data -> many covariates
Basis functions -> dictionary of terms, even for low-dimensional X
Regularization -> reduction in dimension to focus on "key" components is required
Need to account for this regularization when performing inference
General approach -> account for model search (we will talk about this more with moment forests later)
6
Orthogonality / immunization condition
The key theoretical contribution is to show that orthogonality / immunization allows for regular inference on α in:
M(α, η) = 0
The key condition is:
∂_η M(α, η) |_{α₀, η₀} = 0
Basically, the derivative of the system of equations with respect to the nuisance parameter is zero in a neighborhood of the truth
This condition can be established quite generally in many settings
Neyman's classic orthogonalized score in likelihood settings (Neyman 1959, 1979)
We will show the extension to GMM settings
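Not on the slide, but a concrete illustration may help: in the partially linear model (the canonical example in this literature), the naive moment fails the condition while the partialled-out moment satisfies it. A sketch, with g and m denoting the nuisance functions:

```latex
% Partially linear model: y = d*alpha_0 + g_0(x) + eps,  E[eps | d, x] = 0.
\[
\text{Naive: } M(\alpha, g) = E\big[(y - d\alpha - g(x))\,d\big], \qquad
\partial_g M(\alpha_0, g_0)[h] = -E\big[d\,h(x)\big] \neq 0 .
\]
\[
\text{Orthogonal: } M(\alpha, (g, m)) = E\big[(y - d\alpha - g(x))(d - m(x))\big],
\qquad m_0(x) = E[d \mid x],
\]
\[
\partial_g M(\alpha_0, (g_0, m_0))[h] = -E\big[(d - m_0(x))\,h(x)\big] = 0, \qquad
\partial_m M(\alpha_0, (g_0, m_0))[h] = -E\big[\varepsilon\,h(x)\big] = 0 .
\]
```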
7
High-quality estimators of η
The approach requires a high-quality estimator of the nuisance parameters
Approximate sparsity -> η can be approximated by a sparse vector, estimated by e.g. Lasso
Reminder, Lasso solves:
η̂ = argmin_η ℓ(data, η) + λ Σ_{j=1}^p ψ_j |η_j|
where ℓ is some loss function, λ is a penalty parameter, and the ψ_j are penalty loadings
Leading example is the linear model:
β̂ = argmin_β Σ_{i=1}^n (y_i − x_i′β)² + λ Σ_{j=1}^p |β_j|
However: the conditions here do not require approximate sparsity -> they require a rate of convergence, usually faster than n^{1/4} (i.e., ‖η̂ − η₀‖ = o_p(n^{−1/4}))
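A minimal sketch of this Lasso step in Python (scikit-learn's cross-validated λ stands in for the paper's plug-in penalty and loadings ψ_j, which are not implemented here):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5                      # n < p: high-dimensional
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s] = 1.0                            # approximately sparse truth
y = X @ beta0 + rng.standard_normal(n)

# Lasso: argmin_b sum_i (y_i - x_i'b)^2 + lambda * sum_j |b_j|
lasso = LassoCV(cv=5).fit(X, y)
support = np.flatnonzero(lasso.coef_)      # covariates the penalty kept
print(f"lambda = {lasso.alpha_:.4f}, selected {support.size} of {p} covariates")
```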
8
Setup
We want to solve the system of equations:
M(α, η₀) = 0
where M = (M_ℓ)_{ℓ=1}^k is a measurable map from 𝒜 × ℋ to ℝ^k, and 𝒜 × ℋ are convex subsets of ℝ^d × ℝ^p
Note that d is assumed fixed, while p may grow with n
Given an appropriate estimator η̂, the estimator α̂ approximately minimizes ‖M̂(α, η̂)‖ over α ∈ 𝒜 (up to o(n^{−1/2}))
Often M is a moment:
M(α, η) = E[ψ(W, α, η)], estimated by its sample analog M̂(α, η) = (1/n) Σ_{i=1}^n ψ(W_i, α, η)
9
Adaptivity Condition
We would like to test some hypothesis about the true parameter vector; inverting the test gives us a confidence region
Adaptivity is the key condition for the validity of this inversion:
√n (M̂(α₀, η̂) − M̂(α₀, η₀)) →_p 0
i.e., estimating η has no first-order effect on the estimating equations
The key requirement for this to be true is orthogonality / immunization:
∂_η M(α₀, η₀) = 0
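Not spelled out on the slide, but the link between orthogonality, the n^{1/4} rate, and adaptivity is a short Taylor expansion (heuristic, suppressing the stochastic-equicontinuity terms that the later assumptions control):

```latex
\[
\sqrt{n}\,\hat{M}(\alpha_0, \hat{\eta})
 = \sqrt{n}\,\hat{M}(\alpha_0, \eta_0)
 + \underbrace{\partial_\eta M(\alpha_0, \eta_0)}_{=\,0\ \text{(orthogonality)}}
   \big[\sqrt{n}\,(\hat{\eta}-\eta_0)\big]
 + O_p\!\big(\sqrt{n}\,\|\hat{\eta}-\eta_0\|^2\big) + o_p(1) .
\]
\[
\|\hat{\eta}-\eta_0\| = o_p(n^{-1/4})
\;\Longrightarrow\;
\sqrt{n}\,\big(\hat{M}(\alpha_0,\hat{\eta}) - \hat{M}(\alpha_0,\eta_0)\big) \to_p 0 .
\]
```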
10
Conditions for valid inference
Suppose we have:
√n M̂(α₀, η₀) ⇝ N(0, Ω)
for some positive-definite Ω
Note that these are evaluated at the true values, and are basically telling us the underlying DGP and estimating equations are well-behaved
Suppose there also exists a high-quality estimator of the variance:
Ω̂ →_p Ω
11
Then we get something cool
The following score statistic is asymptotically normal:
S(α₀) = √n Ω̂^{−1/2} M̂(α₀, η̂) ⇝ N(0, I_k)
and the quadratic form:
C(α₀) = n M̂(α₀, η̂)′ Ω̂^{−1} M̂(α₀, η̂) ⇝ χ²(k)
The first equation is what we would like to use in practice!
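A small numerical sketch of the quadratic-form version (the helper score_test and the simulated moments are hypothetical, not from the paper; it just evaluates n M̂′ Ω̂⁻¹ M̂ against a χ²(k) critical value):

```python
import numpy as np
from scipy import stats

def score_test(psi, alpha_level=0.05):
    """Quadratic-form score test from stacked moment evaluations.

    psi : (n, k) array with psi[i] = psi(W_i, alpha_0, eta_hat);
    under H0 the statistic is asymptotically chi2(k)."""
    n, k = psi.shape
    M_hat = psi.mean(axis=0)                          # \hat{M}(alpha_0, eta_hat)
    Omega_hat = np.cov(psi, rowvar=False)             # variance estimator
    C = n * M_hat @ np.linalg.solve(Omega_hat, M_hat) # n M' Omega^{-1} M
    crit = stats.chi2.ppf(1 - alpha_level, df=k)
    return C, crit, C > crit

# Example: k = 2 mean-zero moments, so H0 is true in this simulation
rng = np.random.default_rng(1)
psi = rng.standard_normal((500, 2))
C, crit, reject = score_test(psi)
print(f"C = {C:.2f}, chi2(2) 5% critical value = {crit:.2f}, reject = {reject}")
```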
12
Proposition 1
13
Valid Inference via Adaptive Estimation
Assumptions:
Derivatives of the moment functions exist
Parameters are in the interior of the parameter space
Problem is locally identified
Central limit theorem holds
Variance is not too nuts
Stochastic equicontinuity and continuity requirements (bounding variation in the underlying functions)
Uniform convergence requirements on the estimators, underlying smoothness (usually a rate of convergence on η̂)
Orthogonality condition (the new requirement here)
15
Achieving Orthogonality
Idea: project the score that identifies the parameter of interest onto the orthocomplement of the tangent space for the nuisance parameter (obviously)
Actually, it is a kind of intuitive partialling out of the nuisance parameter
Suppose we have a log-likelihood function ℓ(W, α, η)
Consider the following moment function:
φ(W, α, η) = ∂_α ℓ(W, α, η) − μ ∂_η ℓ(W, α, η)
with:
μ = J_{αη} J_{ηη}^{−1}
where J is the information matrix, partitioned into blocks for α and η
16
Orthogonality with Maximum Likelihood
We have both:
E[φ(W, α₀, η₀)] = 0
and
∂_η E[φ(W, α₀, η)] |_{η=η₀} = −J_{αη} + μ J_{ηη} = 0
so the moment is valid and orthogonal by construction
17
Under true specification
If we assume the model is correctly specified, we get a nice result (the information matrix equality):
Var(φ) = Ω = J_{αα} − J_{αη} J_{ηη}^{−1} J_{ηα}
Leading to:
√n (α̂ − α₀) ⇝ N(0, (J_{αα} − J_{αη} J_{ηη}^{−1} J_{ηα})^{−1})
18
Lemma 1 [Neyman's Orthogonalization]
19
Details
20
Orthogonal GMM Version:
21
Estimator and Variance
22
IV with Many Controls and Instruments
Consider a typical IV model, but with many controls (X) and instruments (Z):
y_i = α d_i + x_i′β + ε_i
d_i = x_i′γ + z_i′δ + u_i
where:
E[ε_i | x_i, z_i] = 0 and E[u_i | x_i, z_i] = 0
Question: how do we estimate this with many Z and X, given that we care about α?
24
Note that we have a bunch of nuisance parameters here
Letting X and Z be correlated (so that z_i = Π x_i + ζ_i), we can rewrite the equations in partialled-out form:
ρ_i^y = y_i − x_i′θ_y, ρ_i^d = d_i − x_i′θ_d, ρ_i^z = z_i − x_i′θ_z
with:
θ_y, θ_d, θ_z the coefficients of the (high-dimensional) linear projections of y, d, and z on x; these projection coefficients are the nuisance parameters η
25
Sparsity
Since dim(η₀) > n, we have to do something to reduce dimensionality
Assume approximate sparsity (i.e., a low-dimensional approximation is close enough)
Decomposing:
η₀ = η_m + η_r
where η_m is sparse (few nonzero entries) and the remainder η_r is small
26
Condition A2 + Estimator
Assume that the non-sparse component grows at a sufficiently slow rate (by the way, what does that mean in practice?)
Then, the orthogonalized equations are:
E[(ρ_i^y − α ρ_i^d) ρ_i^z] = 0
where ρ_i^y, ρ_i^d, ρ_i^z are the residuals after partialling the controls x out of y, d, and z
One can verify that the orthogonality condition holds: small errors in the partialling-out coefficients do not affect the moment to first order
27
Lasso estimator for η₀
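The slide's estimator details did not survive extraction; below is a minimal sketch of the partialling-out logic under the assumptions above, with plain scikit-learn Lassos standing in for the paper's feasible Lasso (data-driven penalty loadings omitted) and a single instrument for simplicity:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p_x = 500, 200
X = rng.standard_normal((n, p_x))
z = X[:, 0] + rng.standard_normal(n)          # instrument, correlated with X
u = rng.standard_normal(n)
d = 0.8 * z + X[:, 1] + u                     # endogenous regressor
eps = 0.5 * u + rng.standard_normal(n)        # endogeneity via shared u
alpha0 = 1.0
y = alpha0 * d + 2.0 * X[:, 2] + eps

def residualize(v, X):
    """Partial X out of v with Lasso; return v minus its Lasso fit."""
    fit = LassoCV(cv=5).fit(X, v)
    return v - fit.predict(X)

rho_y = residualize(y, X)
rho_d = residualize(d, X)
rho_z = residualize(z, X)

# Orthogonalized moment E[(rho_y - alpha * rho_d) rho_z] = 0  =>  IV estimate:
alpha_hat = (rho_z @ rho_y) / (rho_z @ rho_d)
print(f"alpha_hat = {alpha_hat:.3f} (truth {alpha0})")
```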
28
Simulation Results
31
Berry Logit Example à la BLP
Simple aggregate logit model of demand for cars:
ln(s_jt) − ln(s_0t) = δ_jt = x_jt′β − α p_jt + ξ_jt
Concern: price is correlated with unobserved quality ξ_jt
BLP instruments: functions of the characteristics of other products
BLP reduced the dimensionality to a small set per characteristic: the own value, the sum over the firm's other products, and the sum over rival firms' products
32
This method applied here
In principle, all functions of other products' characteristics are valid instruments
Which ones to use? CHS propose all first-order interactions of the baseline instruments, quadratics, cubics, and a time trend
Apply the IV estimator described previously
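A sketch of how such an instrument dictionary can be generated (the column names are hypothetical stand-ins for BLP-style baseline instruments; PolynomialFeatures produces the interactions, quadratics, and cubics):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
# Hypothetical baseline BLP-style instruments per product-market:
# sums of rival products' characteristics, plus a time trend.
df = pd.DataFrame({
    "rival_hp_sum":   rng.random(100),        # sum of rivals' horsepower
    "rival_mpg_sum":  rng.random(100),        # sum of rivals' fuel economy
    "rival_size_sum": rng.random(100),        # sum of rivals' size
    "trend":          np.tile(np.arange(10), 10),
})

# All interactions, quadratics, and cubics of the baseline terms
poly = PolynomialFeatures(degree=3, include_bias=False)
Z = poly.fit_transform(df)
print(Z.shape)  # (100, 34): far more instruments than one would hand-pick
```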
34
Conclusion
This paper provides general conditions under which one can perform regular inference after regularizing a high-dimensional set of nuisance parameters
This applies to a very wide set of models:
Increasing number of X due to big data
Increasing number of X due to dictionaries of basis functions
Two worked examples: likelihood and GMM
Application to IV
This should be very useful to you all in your own applied work; it is fairly easy to modify the standard approaches to use this machinery