1
Econometrics ITFD Week 8
2
Material So Far
1. Introduction
* Stock & Watson, Chapter 1
* Mostly Harmless, Chapters 1 and 2 (pages 1-16)
2. Ordinary Least Squares
* Stock & Watson, Chapters 4-9
* Mostly Harmless, Chapter 3
3. Discrete choice models
* Stock & Watson, Chapter 11
4. Experiments
* Stock & Watson, Chapter 13
* Mostly Harmless, Chapter 2
5. Instrumental Variables
* Stock & Watson, Chapter 12
* Mostly Harmless, Chapter 4
3
What we have left
6. Panel data and difference-in-differences
* Stock & Watson, Chapter 10
* Mostly Harmless, Chapter 5
7. Regression Discontinuity Designs
* Stock & Watson, Chapter 13.4
* Mostly Harmless, Chapter 6
4
Next: Difference-in-differences
Panel Data: Fixed Effects and DiD Stock & Watson, Chapter 10 (and Chapter 13.4, pages ). Mostly Harmless, Chapter 5.
5
Introduction We already have a few tools for addressing causal questions. Multiple regression: but it usually suffers from omitted variable bias when the data are observational. Randomized controlled trials: but they are expensive to run and sometimes infeasible. Instrumental variables: but good instruments are hard to find.
6
An additional technique
We will now use data with a time or cohort dimension. Where the same unit or individual is observed more than once. This allows us to control for unobserved omitted variables! As long as they are fixed over time for a given individual.
7
1. Panel data Data for n entities observed at T different time periods. (Xit, Yit), i=1,…,n and t=1,…,T
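The layout of such a data set can be sketched in a few lines of Python. This is only an illustration: the entity labels, the field names `i`, `t`, `x`, `y`, and all numbers are made up and are not the traffic data used later in the slides.

```python
# Long format: one record per (entity i, period t). All values are made up.
panel = [
    {"i": "A", "t": 1, "x": 0.5, "y": 2.1},
    {"i": "A", "t": 2, "x": 0.7, "y": 1.9},
    {"i": "A", "t": 3, "x": 0.9, "y": 1.8},
    {"i": "B", "t": 1, "x": 1.5, "y": 2.6},
    {"i": "B", "t": 2, "x": 1.5, "y": 2.7},
    {"i": "B", "t": 3, "x": 1.8, "y": 2.4},
]

entities = sorted({r["i"] for r in panel})
periods = sorted({r["t"] for r in panel})
n, T = len(entities), len(periods)

# A panel is balanced when every entity is observed exactly once in every period.
pairs = {(r["i"], r["t"]) for r in panel}
balanced = len(panel) == n * T and len(pairs) == n * T
print(n, T, balanced)
```

With n entities and T periods, a balanced panel has n*T records.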
8
“Before and After” Comparisons
By analyzing changes instead of levels, we can “get rid of” a certain type of omitted variables, even if we can’t observe them. Variables that are constant over time for a given individual. Show with an example: traffic deaths and alcohol taxes.
9
Example Can taxes reduce traffic fatalities?
Or, the price of alcohol and drunk driving. Data: traffic fatality rates and alcohol taxes by state in the US, Traffic fatality rate: Annual number of traffic deaths per 10,000 people. Alcohol tax: Tax on a case of beer, in 1988 $
10
A simple regression With 1988 data only:
11
Scatterplot
12
OLS without controls
13
OLS with controls
14
Interpretation Positive and significant coefficient!
But: many omitted variables! Can we observe them? Not all! Cultural acceptance of drinking and driving? An alternative: estimate the regression in differences. Why does this work?
15
2. “First differences” regression
Say we have data on T=2 time periods. We can use changes in Y as the dependent variable. Start from the equation in levels, which includes an i-specific, omitted control Zi, and write it for periods 1 and 2. The influence of Zi can be eliminated by analyzing the “first-differenced” equation, as long as Zi did not change between periods 1 and 2.
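The equations for this step did not survive in the transcript. A reconstruction under standard notation is sketched below; it assumes a single regressor X, writes the unit-specific omitted variable as Z_i (the symbol used in the fixed-effects section), and uses \beta for the coefficients.

```latex
\begin{align*}
Y_{i1} &= \beta_0 + \beta_1 X_{i1} + \beta_2 Z_i + u_{i1} \\
Y_{i2} &= \beta_0 + \beta_1 X_{i2} + \beta_2 Z_i + u_{i2} \\
Y_{i2} - Y_{i1} &= \beta_1 \,(X_{i2} - X_{i1}) + (u_{i2} - u_{i1})
\end{align*}
```

Subtracting the first line from the second removes both \beta_0 and \beta_2 Z_i, because neither changes between the two periods.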
16
First-differencing the data
17
Regression in first differences
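A minimal simulation can illustrate why differencing helps. It is a sketch on synthetic data, not the traffic data: the true effect `beta`, the coefficient 3.0 on the unobserved `z`, and the sample size are all assumptions chosen so that the levels regression is biased upward.

```python
import random

random.seed(0)
beta = -1.0   # assumed true effect of X on Y
n = 500       # number of units, each observed in T = 2 periods

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x1, x2, y1, y2 = [], [], [], []
for _ in range(n):
    z = random.gauss(0, 1)                   # unobserved, fixed over time
    xa = z + random.gauss(0, 1)              # X is correlated with Z
    xb = z + random.gauss(0, 1)
    x1.append(xa)
    x2.append(xb)
    y1.append(beta * xa + 3.0 * z + random.gauss(0, 1))
    y2.append(beta * xb + 3.0 * z + random.gauss(0, 1))

# Cross-section in levels (period 1 only): Z is omitted, so the slope is biased.
levels = ols_slope(x1, y1)

# First differences: Z drops out because it does not change between periods.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
fd = ols_slope(dx, dy)

print(round(levels, 2), round(fd, 2))
```

In this setup the levels slope comes out positive while the first-difference slope is close to the assumed negative effect, the same sign reversal the slides report for the beer tax.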
18
Interpretation The estimated effect is now negative and significant!
A $1 increase in beer taxes is associated with 1.04 fewer deaths per 10,000 people! Large! Should we believe it? There are probably relevant factors that do change over time. Add controls!
19
First differences with controls
20
Extensions of “first differences”
What if we have more than 2 periods? Shouldn’t we use all the variation that we have? Fixed-effects regression
21
3. Fixed Effects Regression
A method for controlling for omitted variables when they vary across individuals but not over time. It can be used when each unit is observed two or more times. The fixed effects regression model has one intercept for each individual or unit. These absorb all omitted variables that do not vary with t.
22
The model Equation: We want to estimate b1, but Zi is unobserved.
We can rewrite the equation as one with n intercepts: This is the fixed-effects model. The a’s are unknown intercepts to be estimated.
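The two equations on this slide were lost in extraction. Under the usual textbook notation they can be reconstructed as follows; here \beta_1 is the coefficient of interest (b1 in the slide text) and \alpha_i are the unit intercepts (the a's), with \alpha_i defined as shown.

```latex
\begin{align*}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it} \\
Y_{it} &= \beta_1 X_{it} + \alpha_i + u_{it},
\qquad \alpha_i = \beta_0 + \beta_2 Z_i, \quad i = 1,\dots,n
\end{align*}
```

Because Z_i does not vary with t, its whole contribution is absorbed into the unit-specific intercept \alpha_i.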
23
Estimation We can estimate those unit-specific intercepts via binary variables. The equation is now: These are equivalent ways of writing the fixed-effects model. The model can be estimated by OLS. But, if the number of units is large, it can be hard to implement. Statistical packages have special fixed-effects routines that simplify the estimation.
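One common way to avoid creating n binary variables is to subtract each unit's mean from X and Y and run OLS on the demeaned data. The slides do not say which algorithm the packages use, so treat this as an illustrative sketch on made-up data. The check at the end verifies that the demeaned estimate satisfies the normal equations of the binary-variable regression, so the two approaches give the same slope.

```python
import random
from collections import defaultdict

random.seed(1)
beta_true = -0.5                      # assumed true slope
rows = []                             # (unit, x, y), synthetic data
for i in range(6):
    alpha = random.gauss(0, 2)        # unit-specific intercept
    for t in range(4):
        x = alpha + random.gauss(0, 1)
        y = alpha + beta_true * x + random.gauss(0, 0.1)
        rows.append((i, x, y))

# Unit means of x and y.
sums = defaultdict(lambda: [0.0, 0.0, 0])
for i, x, y in rows:
    sums[i][0] += x
    sums[i][1] += y
    sums[i][2] += 1
xbar = {i: s[0] / s[2] for i, s in sums.items()}
ybar = {i: s[1] / s[2] for i, s in sums.items()}

# OLS on unit-demeaned data (the "within" estimator).
num = sum((x - xbar[i]) * (y - ybar[i]) for i, x, y in rows)
den = sum((x - xbar[i]) ** 2 for i, x, y in rows)
beta_fe = num / den
alpha_hat = {i: ybar[i] - beta_fe * xbar[i] for i in xbar}

# Normal equations of the dummy-variable regression: residuals must be
# orthogonal to x and must sum to zero within every unit.
resid = [(i, x, y - alpha_hat[i] - beta_fe * x) for i, x, y in rows]
score_x = sum(x * e for i, x, e in resid)
score_d = {i: sum(e for j, x, e in resid if j == i) for i in xbar}
print(round(beta_fe, 3))
```

The demeaning route needs only one pass over the data per unit, which is why it scales better than inverting a matrix with n extra columns.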
24
First differences versus binary variables
When T=2, the two specifications are exactly equivalent, provided the differences model is estimated without an intercept.
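The T=2 equivalence can be checked numerically. The sketch below uses made-up data and a single regressor, and it computes the fixed-effects slope by unit demeaning rather than with explicit binary variables (these give the same slope).

```python
import random

random.seed(3)
# Made-up two-period panel: (x1, x2, y1, y2) for each unit.
data = []
for _ in range(50):
    alpha = random.gauss(0, 1)
    x1, x2 = alpha + random.gauss(0, 1), alpha + random.gauss(0, 1)
    y1 = alpha - 0.7 * x1 + random.gauss(0, 0.3)
    y2 = alpha - 0.7 * x2 + random.gauss(0, 0.3)
    data.append((x1, x2, y1, y2))

# First differences, regression through the origin (no intercept).
fd_num = sum((x2 - x1) * (y2 - y1) for x1, x2, y1, y2 in data)
fd_den = sum((x2 - x1) ** 2 for x1, x2, y1, y2 in data)
beta_fd = fd_num / fd_den

# Fixed effects via unit demeaning over the two periods.
fe_num = fe_den = 0.0
for x1, x2, y1, y2 in data:
    xm, ym = (x1 + x2) / 2, (y1 + y2) / 2
    fe_num += (x1 - xm) * (y1 - ym) + (x2 - xm) * (y2 - ym)
    fe_den += (x1 - xm) ** 2 + (x2 - xm) ** 2
beta_fe = fe_num / fe_den

print(beta_fd, beta_fe)
```

With two periods each demeaned value is plus or minus half the first difference, so the numerator and denominator are both halved and the ratio is unchanged.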
25
Example Alcohol taxes and traffic deaths. We have data for 7 years.
Let’s estimate the fixed-effects regression using all years of data. N*T=336 observations (48 states * 7 years)
26
Regression in first differences
27
Fixed-effects regression
29
Results Negative and significant coefficient.
The opposite of what we found in the cross section. More years of data lead to more precise estimates, that is, lower standard errors. However, some relevant omitted variables still remain. During the 80’s, cars were getting safer and more people were using seatbelts. If taxes were rising at the same time, those effects could be confounded with the effect of the tax.
30
Time Fixed Effects We can still do better.
And “control for” variables that evolve over time, but at the same pace across all states (units). By including “time fixed effects”. Account for variables that are the same across units but evolve over time. Example: policies at the national level (safety improvements in new cars, etc).
31
The equation with time effects
We can just include year dummies. “Each time period has its own intercept”. We can include both state and year fixed effects at once.
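A small simulation can show what time effects buy us. It is a sketch on synthetic data with a balanced panel: the downward trend in the outcome and the upward trend in X are assumptions meant to mimic "cars getting safer while taxes rise". For a balanced panel, including unit and period dummies gives the same slope as subtracting unit means and period means and adding back the grand mean, which is what the helper does.

```python
import random

random.seed(4)
n, T, beta = 50, 6, -0.5                  # beta is the assumed true effect

x = [[0.0] * T for _ in range(n)]
y = [[0.0] * T for _ in range(n)]
for i in range(n):
    alpha = random.gauss(0, 1)            # unit effect
    for t in range(T):
        lam = -1.0 * t                    # common time effect (same for all units)
        x[i][t] = alpha + 0.5 * t + random.gauss(0, 1)
        y[i][t] = alpha + lam + beta * x[i][t] + random.gauss(0, 0.5)

def within(a, time_effects):
    """Subtract unit means; optionally also remove period means."""
    grand = sum(map(sum, a)) / (n * T)
    unit = [sum(row) / T for row in a]
    period = [sum(a[i][t] for i in range(n)) / n for t in range(T)]
    if time_effects:
        return [[a[i][t] - unit[i] - period[t] + grand for t in range(T)]
                for i in range(n)]
    return [[a[i][t] - unit[i] for t in range(T)] for i in range(n)]

def slope(xd, yd):
    """OLS slope through the origin on already-demeaned data."""
    num = sum(p * q for rx, ry in zip(xd, yd) for p, q in zip(rx, ry))
    den = sum(p * p for rx in xd for p in rx)
    return num / den

beta_unit_only = slope(within(x, False), within(y, False))
beta_two_way = slope(within(x, True), within(y, True))
print(round(beta_unit_only, 2), round(beta_two_way, 2))
```

With unit effects only, the common trend is attributed to X and the estimate is too negative; adding time effects brings it back near the assumed value.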
32
Unit and time fixed effects model
Write down the two versions:
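The two versions are not preserved in the transcript. In standard textbook notation they would read as below; the symbols \lambda_t for the time effects and D, B for the unit and period binary variables are assumptions, not taken from the slides.

```latex
\begin{align*}
Y_{it} &= \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it} \\
Y_{it} &= \beta_0 + \beta_1 X_{it}
          + \gamma_2 D2_i + \dots + \gamma_n Dn_i
          + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}
\end{align*}
```

In the second version one unit dummy and one period dummy are left out to avoid perfect multicollinearity with the common intercept \beta_0.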
34
Example Note we would still want to control for additional variables.
That vary across states AND over time. Main ones are probably changes in policies at the state level.
35
4. The fixed effects regression assumptions
1. The error term has conditional mean 0, given all T values of X for that unit. E(uit | Xi1, Xi2,…, XiT, ai)=0 Current uit is uncorrelated with past, present, or future values of X. A bit stronger than the regular OLS assumption! This implies that there is no omitted variable bias.
36
2. The variables for one unit i are distributed identically to, but independently of, the variables for all other units (Xi1,…XiT, ui1,…uiT), i=1,…,n are i.i.d. draws from their joint distribution The variables are i.i.d. across entities for i=1,2,…,n This holds if entities (i’s) are selected by simple random sampling from the population. Notice observations do NOT have to be independent for a given unit over time. Xit can be correlated over time within an entity
37
3. Large outliers are unlikely
4. There is no perfect multicollinearity (These two are analogous to assumptions 3 and 4 for cross-sectional data)
38
The f-e regression assumptions
If the 4 conditions hold, then: The fixed effects estimator is consistent, And normally distributed when n is large So we can do inference as usual
39
5. Autocorrelation in the error term
In panel data, the regression error can be correlated over time within an entity. A variable such as Xit (or the error uit) is said to be autocorrelated, or serially correlated, if it is correlated over time for a given i. This is very common in panel data! What happens in one year tends to be correlated with what happens in surrounding years. This correlation does not introduce bias, but it does affect the standard errors. We want to calculate standard errors that are robust to heteroscedasticity, but also to autocorrelation over time within an entity!
40
Example The alcohol tax is serially correlated.
State laws don’t change every year; they are persistent! The error term? Relevant omitted variables are probably autocorrelated, too: state economic conditions, or a major road improvement project. But not all of them are: a particularly bad winter, for instance. As long as some omitted variables are autocorrelated, uit will be autocorrelated.
41
What should we do? If the errors are autocorrelated, the usual robust standard errors are not valid. Because they are derived under the assumption of no autocorrelation. We can allow for autocorrelation by constructing clustered standard errors. That allow the regression errors to have an arbitrary correlation within a group or cluster. But assume that they are uncorrelated across clusters. Clustered std errors are valid whether there is heteroscedasticity, or autocorrelation, or both, or none.
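The difference between the two kinds of standard errors can be sketched by hand for a fixed-effects regression with one regressor. The data are synthetic, with X and the error both following an assumed AR(1) process with persistence 0.9, and the formulas omit the finite-sample degrees-of-freedom corrections that statistical packages typically apply.

```python
import math
import random

random.seed(2)
n, T, beta, rho = 500, 7, -0.5, 0.9      # all values are assumptions

def ar1_path(length):
    """One AR(1) path with persistence rho and standard normal shocks."""
    value, path = random.gauss(0, 1), []
    for _ in range(length):
        value = rho * value + random.gauss(0, 1)
        path.append(value)
    return path

def demean(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

units = []                                # (demeaned x, demeaned y) per unit
for _ in range(n):
    alpha = random.gauss(0, 1)            # unit fixed effect
    x, u = ar1_path(T), ar1_path(T)       # serially correlated X and error
    y = [alpha + beta * a + b for a, b in zip(x, u)]
    units.append((demean(x), demean(y)))

sxx = sum(a * a for xd, _ in units for a in xd)
sxy = sum(a * b for xd, yd in units for a, b in zip(xd, yd))
beta_fe = sxy / sxx

v_robust = v_cluster = 0.0
for xd, yd in units:
    scores = [a * (b - beta_fe * a) for a, b in zip(xd, yd)]  # x times residual
    v_robust += sum(s * s for s in scores)    # treats periods as uncorrelated
    v_cluster += sum(scores) ** 2             # allows any within-unit correlation

se_robust = math.sqrt(v_robust) / sxx
se_cluster = math.sqrt(v_cluster) / sxx
print(round(beta_fe, 3), round(se_robust, 4), round(se_cluster, 4))
```

The only change is where the squaring happens: the clustered version sums the scores within a unit first, so cross-period products inside a unit are kept, while products across units are still dropped.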
42
Inference If n is large, inference using clustered standard errors can proceed as usual. t and F statistics Even if T is small Clustered standard errors can be very different from std errors that do not allow for autocorrelation!
43
Example