Econometrics ITFD Week 8.

Material So Far
1. Introduction
* Stock & Watson, Chapter 1
* Mostly Harmless, Chapters 1 and 2 (pages 1-16)
2. Ordinary Least Squares
* Stock & Watson, Chapters 4-9
* Mostly Harmless, Chapter 3
3. Discrete choice models
* Stock & Watson, Chapter 11
4. Experiments
* Stock & Watson, Chapter 13
* Mostly Harmless, Chapter 2
5. Instrumental Variables
* Stock & Watson, Chapter 12
* Mostly Harmless, Chapter 4

What we have left
6. Panel data and difference-in-differences
* Stock & Watson, Chapter 10
* Mostly Harmless, Chapter 5
7. Regression Discontinuity Designs
* Stock & Watson, Chapter 13.4
* Mostly Harmless, Chapter 6

Next: Difference-in-differences Panel Data: Fixed Effects and DiD Stock & Watson, Chapter 10 (and Chapter 13.4, pages 490-493). Mostly Harmless, Chapter 5.

Introduction We already have a few tools for addressing causal questions:
* Multiple regression. But: usually omitted variable bias with observational data.
* Randomized controlled trials. But: expensive to run, sometimes infeasible.
* Instrumental variables. But: good instruments are hard to find.

An additional technique We will now use data with a time or cohort dimension. Where the same unit or individual is observed more than once. This allows us to control for unobserved omitted variables! As long as they are fixed over time for a given individual.

1. Panel data Data for n entities observed at T different time periods. (Xit, Yit), i=1,…,n and t=1,…,T

“Before and After” Comparisons By analyzing changes instead of levels, we can “get rid of” a certain type of omitted variable, even if we can’t observe it: variables that are constant over time for a given individual. We show this with an example: traffic deaths and alcohol taxes.

Example Can taxes reduce traffic fatalities? Or, the price of alcohol and drunk driving. Data: traffic fatality rates and alcohol taxes by state in the US, 1982-1988. Traffic fatality rate: Annual number of traffic deaths per 10,000 people. Alcohol tax: Tax on a case of beer, in 1988 dollars.

A simple regression With 1988 data only:
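Since the slide's regression output did not survive transcription, here is a minimal simulated sketch of the problem it illustrates (all numbers and variable names are hypothetical, not the real data): an unobserved, state-fixed "drinking culture" factor pushes up both beer taxes and fatalities, so cross-sectional OLS on a single year of data recovers a positive slope even though the true effect of the tax in the simulation is negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 48  # hypothetical: one observation per state, one year of data

# Unobserved state-level factor, fixed over time
culture = rng.normal(size=n)
# States with stronger drinking cultures also legislate higher beer taxes
beer_tax = 0.5 + 0.3 * culture + 0.1 * rng.normal(size=n)
# True causal effect of the tax is -0.5, but culture raises fatalities too
fatality_rate = 2.0 - 0.5 * beer_tax + culture + 0.1 * rng.normal(size=n)

X = np.column_stack([np.ones(n), beer_tax])
beta = np.linalg.lstsq(X, fatality_rate, rcond=None)[0]
# beta[1] comes out positive: classic omitted variable bias
```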

Scatterplot

OLS without controls

OLS with controls

Interpretation Positive and significant coefficient! But: many omitted variables! Can we observe them? Not all! Cultural acceptance of drinking and driving? An alternative: estimate the regression in differences. Why does this work?

2. “First differences” regression Say we have data on T=2 time periods. We can use the change in Y as the dependent variable. Equation in levels with an i-specific, omitted control Zi. In periods 1 and 2: The influence of Zi can be eliminated by analyzing the “first-differenced” equation! As long as Zi did not change between periods 1 and 2.
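The equations this slide refers to (their images were lost in transcription) can be written out as follows, using Zi for the time-invariant omitted variable, consistent with the fixed-effects notation later:

```latex
\begin{aligned}
Y_{i1} &= \beta_0 + \beta_1 X_{i1} + \beta_2 Z_i + u_{i1} \\
Y_{i2} &= \beta_0 + \beta_1 X_{i2} + \beta_2 Z_i + u_{i2} \\
\Rightarrow\quad Y_{i2} - Y_{i1} &= \beta_1 \left( X_{i2} - X_{i1} \right) + \left( u_{i2} - u_{i1} \right)
\end{aligned}
```

Subtracting the period-1 equation from the period-2 equation removes both the intercept and the unobserved term β2 Zi.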

First-differencing the data

Regression in first differences
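A self-contained simulated sketch of a first-differences regression with T=2 (hypothetical numbers, not the slide's actual output): the time-invariant confounder cancels when we difference, and the negative effect is recovered.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 48  # hypothetical two-period state panel

culture = rng.normal(size=n)  # unobserved, constant over time
# Beer tax in periods 1 and 2, correlated with the unobserved factor
tax = np.stack([0.5 + 0.3 * culture + 0.1 * rng.normal(size=n)
                for _ in range(2)])
# True effect of the tax is -0.5; culture enters the level equation
fatal = 2.0 - 0.5 * tax + culture + 0.05 * rng.normal(size=(2, n))

# First-differencing: the culture term drops out of the equation
d_tax = tax[1] - tax[0]
d_fatal = fatal[1] - fatal[0]
X = np.column_stack([np.ones(n), d_tax])
beta = np.linalg.lstsq(X, d_fatal, rcond=None)[0]
# beta[1] is now close to the true -0.5, with the right sign
```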

Interpretation The estimated effect is now negative and significant! A $1 increase in beer taxes is associated with 1.04 fewer deaths per 10,000 people! Large! Should we believe it? There are probably relevant factors that do change over time. Add controls!

First differences with controls

Extensions of “first differences” What if we have more than 2 periods? Shouldn’t we use all the variation that we have? Fixed-effects regression

3. Fixed Effects Regression A method for controlling for omitted variables when they vary across individuals but not over time. It can be used when each unit is observed two or more times. The fixed effects regression model has one intercept for each individual or unit. These absorb all omitted variables that do not vary with t.

The model Equation: We want to estimate β1, but Zi is unobserved. We can rewrite the equation as one with n intercepts: This is the fixed-effects model. The αi’s are unknown intercepts to be estimated.
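The two equations the slide refers to (their formula images were lost in transcription) are, in Stock & Watson's notation:

```latex
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it} \\
Y_{it} &= \beta_1 X_{it} + \alpha_i + u_{it},
\qquad \text{where } \alpha_i = \beta_0 + \beta_2 Z_i
\end{aligned}
```

Each unit gets its own intercept αi, which absorbs the unobserved, time-invariant Zi.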

Estimation We can estimate those unit-specific intercepts via binary variables. The equation is now: These are equivalent ways of writing the fixed-effects model. The model can be estimated by OLS. But, if the number of units is large, it can be hard to implement. Statistical packages have special fixed-effects routines that simplify the estimation.
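A sketch of the binary-variable (least-squares dummy variable) estimator on simulated data; the panel dimensions and coefficients are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 48, 7  # hypothetical state-year panel

alpha = rng.normal(size=n)  # unobserved state intercepts
tax = 0.5 + 0.3 * alpha[:, None] + 0.1 * rng.normal(size=(n, T))
fatal = -0.5 * tax + alpha[:, None] + 0.05 * rng.normal(size=(n, T))

# One indicator column per state (no separate common intercept),
# plus the regressor of interest
D = np.kron(np.eye(n), np.ones((T, 1)))       # (n*T, n) state dummies
X = np.column_stack([tax.ravel(), D])
beta = np.linalg.lstsq(X, fatal.ravel(), rcond=None)[0]
# beta[0] estimates the tax effect, controlling for all state intercepts
```

In practice, packages implement the equivalent "within" (demeaning) transformation, which avoids building n dummy columns when n is large.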

First differences versus binary variables When T=2, the two specifications are exactly equivalent! If the differences model is estimated without an intercept.
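The claimed equivalence can be checked numerically; a sketch with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10  # small two-period panel with arbitrary data
x = rng.normal(size=(n, 2))
y = rng.normal(size=(n, 2))

# Fixed effects: one dummy per unit, no common intercept
D = np.kron(np.eye(n), np.ones((2, 1)))
fe = np.linalg.lstsq(np.column_stack([x.ravel(), D]),
                     y.ravel(), rcond=None)[0][0]

# First differences, estimated without an intercept
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
fd = float(dx @ dy / (dx @ dx))

# The two slope estimates coincide (up to floating-point error)
```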

Example Alcohol taxes and traffic deaths. We have data for 7 years. Let’s estimate the fixed-effects regression using all years of data. N*T=336 observations (states*years)

Regression in first differences

Fixed-effects regression

Results Negative and significant coefficient. The opposite of what we found in the cross section. More years of data lead to more precise estimates. Lower standard errors. However, some relevant omitted variables still remain. During the 80’s, cars were getting safer and more people were using seatbelts. If taxes were rising at the same time, those effects could get confounded.

Time Fixed Effects We can still do better. And “control for” variables that evolve over time, but at the same pace across all states (units). By including “time fixed effects”. Account for variables that are the same across units but evolve over time. Example: policies at the national level (safety improvements in new cars, etc).

The equation with time effects We can just include year dummies. “Each time period has its own intercept”. We can include both state and year fixed effects at once.

Unit and time fixed effects model Write down the two versions:
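The two versions the slide asks for are (reconstructed, since the slide's formulas were lost in transcription; D's are state dummies and B's are year dummies, following Stock & Watson):

```latex
\begin{aligned}
Y_{it} &= \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it} \\
Y_{it} &= \beta_0 + \beta_1 X_{it}
        + \gamma_2 D2_i + \cdots + \gamma_n Dn_i
        + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}
\end{aligned}
```

One category of each set of dummies is omitted to avoid perfect multicollinearity with the intercept.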

Example Note we would still want to control for additional variables. That vary across states AND over time. Main ones are probably changes in policies at the state level.

4. The fixed effects regression assumptions 1. The error term has conditional mean 0, given all T values of X for that unit: E(uit | Xi1, Xi2,…, XiT, ai) = 0. Current uit is uncorrelated with past, present, or future values of X. A bit stronger than the regular OLS assumption! This implies that there is no omitted variable bias.

2. The variables for one unit i are distributed identically to, but independently of, the variables for all other units: (Xi1,…,XiT, ui1,…,uiT), i=1,…,n are i.i.d. draws from their joint distribution. This holds if entities (i’s) are selected by simple random sampling from the population. Notice observations do NOT have to be independent for a given unit over time: Xit can be correlated over time within an entity.

3. Large outliers are unlikely 4. There is no perfect multicollinearity (These two are analogous to assumptions 3 and 4 for cross-sectional data)

The fixed-effects regression assumptions If the 4 conditions hold, then: The fixed effects estimator is consistent, and normally distributed when n is large. So we can do inference as usual.

5. Autocorrelation in the error term In panel data, the regression error can be correlated over time within an entity. A variable Xit is said to be autocorrelated, or serially correlated, if it is correlated over time for a given i. This is very common in panel data! What happens in one year tends to be correlated with surrounding years. This correlation does not introduce bias, but it does affect the standard errors. We want standard errors that are robust to heteroscedasticity, but also to autocorrelation over time within an entity!

Example The alcohol tax is serially correlated. The error term? State laws don’t change every year, they are persistent! The error term? Relevant omitted variables probably autocorrelated, too. State economic conditions A major road improvement project But not all. A particularly bad winter. As long as some omitted variables are autocorrelated, uit will be autocorrelated.

What should we do? If the errors are autocorrelated, the usual robust standard errors are not valid. Because they are derived under the assumption of no autocorrelation. We can allow for autocorrelation by constructing clustered standard errors. That allow the regression errors to have an arbitrary correlation within a group or cluster. But assume that they are uncorrelated across clusters. Clustered std errors are valid whether there is heteroscedasticity, or autocorrelation, or both, or none.
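A minimal sketch of how clustered standard errors are built: the basic "sandwich" formula without small-sample corrections (real software applies finite-sample adjustments, so its numbers will differ slightly). The panel below is simulated and hypothetical; its errors share a persistent state-level component, which is exactly the within-cluster correlation the estimator allows for.

```python
import numpy as np

def clustered_se(X, resid, clusters):
    """Cluster-robust standard errors for OLS: arbitrary error
    correlation within each cluster, independence across clusters."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        Xg, ug = X[clusters == g], resid[clusters == g]
        score = Xg.T @ ug                  # summed score within cluster g
        meat += np.outer(score, score)
    return np.sqrt(np.diag(bread @ meat @ bread))

rng = np.random.default_rng(4)
n, T = 48, 7
clusters = np.repeat(np.arange(n), T)      # state id for each observation
x_state = rng.normal(size=n)               # persistent component of x
x = x_state[clusters] + rng.normal(size=n * T)
shock = rng.normal(size=n)                 # persistent state-level error
u = shock[clusters] + 0.5 * rng.normal(size=n * T)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n * T), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
se = clustered_se(X, y - X @ beta, clusters)
```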

Inference If n is large, inference using clustered standard errors can proceed as usual. t and F statistics Even if T is small Clustered standard errors can be very different from std errors that do not allow for autocorrelation!

Example