Analysis of Experimental Data III Christoph Engel.

independence problems
- I. repeated measurement
- II. cluster
- III. (time series)
- IV. panel
- V. nested data

I. repeated measurement
- simplest case
  - measured once more (one repetition)
  - each participant is observed
    - untreated
    - treated
- dgp
  - dv = 5 + 2*treat + erroruid + errorresid

safe solution
- dgp
  - dv = 5 + 2*treat + erroruid + errorresid
- invites an obvious solution for removing the individual-specific error
  - interest is in the treatment effect
  - i.e. in individual reactions to the manipulation
- generate
  - dv(post) - dv(pre)
- test whether it is significantly different from 0
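Here is a minimal sketch of the first-difference approach. It assumes standard-normal erroruid and errorresid and 30 participants; both are hypothetical choices. The individual-specific error cancels in dv(post) - dv(pre), and the paired t statistic is then the one-sample t statistic of these differences.

```python
import numpy as np

def simulate_repeated(n=30, seed=1):
    """One pre (treat = 0) and one post (treat = 1) observation per participant,
    from dv = 5 + 2*treat + erroruid + errorresid; standard-normal errors are assumed."""
    rng = np.random.default_rng(seed)
    error_uid = rng.normal(size=n)          # shared by both observations of a participant
    pre = 5 + 2 * 0 + error_uid + rng.normal(size=n)
    post = 5 + 2 * 1 + error_uid + rng.normal(size=n)
    return pre, post

def first_difference_t(pre, post):
    """Mean of dv(post) - dv(pre) and its one-sample t statistic against 0."""
    d = post - pre                          # erroruid cancels here
    se = d.std(ddof=1) / np.sqrt(len(d))
    return d.mean(), d.mean() / se

pre, post = simulate_repeated()
effect, t_stat = first_difference_t(pre, post)
print(f"mean difference = {effect:.2f}, t = {t_stat:.2f}, df = {len(pre) - 1}")
```

Adding the same constant to every participant's pre and post values, which is what erroruid does, leaves the result unchanged.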

first differences

non-parametric
- works with ranks (as Mann-Whitney does)
- but ranks the (first) differences (Wilcoxon signed-rank test)

parametric
- assumes normality (of the differences)
- mean ≈ effect size

regression
- of first differences
  - correct
  - but complicated
  - and not very informative
- Gauss-Markov assumptions
  - independence
  - exogeneity
    - E(error | iv) = 0
  - no multicollinearity
    - iv matrix has full rank
  - (no heteroskedasticity)
- note
  - normality is not assumed!

alternative
- more informative, but not more effective than the t-test
- t-value of treatment exactly the same
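The slide's regression output is not preserved. The sketch below assumes the alternative is a regression of dv on treat with participant fixed effects, as with xtreg, fe. It checks numerically that with two periods this regression's t value equals the paired t-test on first differences. The data are simulated with hypothetical standard-normal errors, and within_t is a hypothetical helper.

```python
import numpy as np

def within_t(dv, x, uid):
    """Fixed-effects (within) slope of dv on a single regressor x, with its t value.
    Both variables are demeaned by participant and regressed without a constant;
    residual df = n_obs - n_participants - 1 because the participant means are estimated."""
    dv_w = np.asarray(dv, dtype=float).copy()
    x_w = np.asarray(x, dtype=float).copy()
    units = np.unique(uid)
    for u in units:
        m = uid == u
        dv_w[m] -= dv_w[m].mean()
        x_w[m] -= x_w[m].mean()
    b = (x_w @ dv_w) / (x_w @ x_w)
    resid = dv_w - b * x_w
    df = len(dv_w) - len(units) - 1
    se = np.sqrt((resid @ resid) / df / (x_w @ x_w))
    return b, b / se

# two observations per participant, dgp dv = 5 + 2*treat + erroruid + errorresid
rng = np.random.default_rng(2)
n = 30
error_uid = rng.normal(size=n)
pre = 5 + error_uid + rng.normal(size=n)
post = 7 + error_uid + rng.normal(size=n)
uid = np.concatenate([np.arange(n), np.arange(n)])
treat = np.concatenate([np.zeros(n), np.ones(n)])
dv = np.concatenate([pre, post])

b_fe, t_fe = within_t(dv, treat, uid)
d = post - pre
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n))
print(t_fe, t_paired)   # equal up to floating-point error
```

With two periods, the within slope is the mean difference, and both procedures use n - 1 degrees of freedom. The regression therefore adds no efficiency, but it does report more (constant, further regressors).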

technically the same as
- too conservative if we can safely assume that erroruid is random

more efficient
- in the concrete case, only a small gain
- coefficients totally unaffected
- (the additional assumption should be tested)

II. cluster
- typical application
  - stranger design
  - interaction in matching groups might violate independence
- introductory example
  - one level of dependence only
  - dgp
    - dv = 5 + .5*level + erroruid + errorresid
  - intuitively
    - “experiment with x treatments”

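technically
(figure: error covariance matrices, robust vs. cluster)
- robust: a separate variance for each observation (σ1 ... σ6) on the diagonal
- cluster: blocks σ1, σ2, σ3 along the diagonal (3 clusters), 0 outside the blocks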

technically
(figure: error covariance matrices, random effects vs. fixed effects; diagonal entries σu² + σe²)

comparison of approaches
- cluster
  - most conservative
  - only assumes
    - covariance outside clusters = 0
- random / fixed effects
  - assume more structure
  - fixed effects
    - (implicitly) estimates an additional coefficient
  - random effects
    - estimates an additional error term
    - assumes off-diagonal cov = 0
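To make the comparison concrete, here is a sketch of the cluster-robust (sandwich) variance next to the conventional OLS variance. The design is hypothetical: 40 units, 6 observations each, level constant within a unit, standard-normal errors. The small-sample factor G/(G-1) * (N-1)/(N-K) is the one commonly applied (e.g. by Stata's regress); it is taken here as an assumption.

```python
import numpy as np

def ols_cluster(y, X, cluster):
    """OLS coefficients with conventional and cluster-robust standard errors.
    Cluster-robust: (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1,
    scaled by G/(G-1) * (N-1)/(N-K); only covariances across clusters are set to 0."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    u = y - X @ beta
    se_ols = np.sqrt(np.diag(xtx_inv) * (u @ u) / (n - k))
    groups = np.unique(cluster)
    meat = np.zeros((k, k))
    for g in groups:
        m = cluster == g
        s = X[m].T @ u[m]                 # score summed within the cluster
        meat += np.outer(s, s)
    G = len(groups)
    c = G / (G - 1) * (n - 1) / (n - k)
    se_cl = np.sqrt(np.diag(c * xtx_inv @ meat @ xtx_inv))
    return beta, se_ols, se_cl

# hypothetical design for dv = 5 + .5*level + erroruid + errorresid
rng = np.random.default_rng(3)
n_units, per_unit = 40, 6
uid = np.repeat(np.arange(n_units), per_unit)
level = np.repeat(np.arange(n_units) % 4, per_unit).astype(float)
error_uid = np.repeat(rng.normal(size=n_units), per_unit)
dv = 5 + 0.5 * level + error_uid + rng.normal(size=n_units * per_unit)
X = np.column_stack([np.ones_like(level), level])
beta, se_ols, se_cl = ols_cluster(dv, X, uid)
print(beta, se_ols, se_cl)
```

Here level does not vary within a unit, so ignoring erroruid makes the conventional standard error for level much too small. Clustering corrects this without modelling the within-unit structure.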

practically
- discuss assumptions
- random / fixed effects and cluster can be combined
  - if random effects are justified
    - coefficients should not be affected (consistent)
    - but standard errors should be larger with clustering

III. time series
- very unusual for experiments
  - rare application
    - evolution of average behaviour of participants over time
- dgp
  - dv = 4 + error if t < 11
  - replace dv = 4 + .2*L10.dv + error if t > 10

graphical representation

estimation
- simple
  - OLS
  - y = cons + Lx.y + eps
- if the exact duration of the lag is unknown / not predicted from theory
  - one may use significance for selection
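The sketch below simulates the slide's time-series dgp and estimates it by OLS. The series length (T = 1000) and the standard-normal errors are assumptions. lag_ols is a hypothetical helper: it regresses dv_t on a constant and any set of lags and returns the t values, so lags can be selected by significance.

```python
import numpy as np

def simulate_series(T=1000, seed=4):
    """dv = 4 + error for t < 11; dv = 4 + .2*L10.dv + error for t > 10 (t = 1..T)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=T)
    dv = np.empty(T)
    for i in range(T):                    # i = t - 1
        dv[i] = 4 + e[i] if i < 10 else 4 + 0.2 * dv[i - 10] + e[i]
    return dv

def lag_ols(y, lags):
    """OLS of y_t on a constant and y_{t-l} for each l in lags; returns coefficients and t values."""
    lags = list(lags)
    p = max(lags)
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] + [y[p - l:len(y) - l] for l in lags])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    u = Y - X @ beta
    s2 = u @ u / (len(Y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

dv = simulate_series()
beta_all, t_all = lag_ols(dv, range(1, 13))   # lag l sits at index l, the constant at 0
print(np.round(t_all[1:], 2))                 # lag 10 should stand out
```

Selection by significance uses the t values of the over-specified model; only the lag predicted by the dgp should be reliably significant.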

lag selection

more sophisticated
- partial autocorrelation
  - autocorrelation, conditional on all earlier lags
  - significantly different from 0?
  - pac dv, lags(number)
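The regression-based definition of the partial autocorrelation can be sketched directly. The partial autocorrelation at lag k is the coefficient on y_{t-k} when y_t is regressed on a constant and lags 1 to k. The AR(1) test series and pacf_ols are hypothetical. Stata's pac may use a slightly different small-sample variant, so identical numbers should not be expected.

```python
import numpy as np

def pacf_ols(y, max_lag):
    """Partial autocorrelation at lags 1..max_lag: coefficient on y_{t-k} in an OLS
    regression of y_t on a constant and y_{t-1}, ..., y_{t-k}."""
    out = []
    for k in range(1, max_lag + 1):
        Y = y[k:]
        X = np.column_stack([np.ones(len(Y))] + [y[k - l:len(y) - l] for l in range(1, k + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        out.append(beta[-1])              # coefficient on the longest lag
    return np.array(out)

# hypothetical AR(1) series with coefficient 0.6
rng = np.random.default_rng(5)
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = 0.6 * y[t - 1] + rng.normal()

p = pacf_ols(y, 5)
band = 2 / np.sqrt(len(y))                # rough 95% band under "no partial autocorrelation"
print(np.round(p, 3), round(band, 3))
```

For an AR(1) process only the first partial autocorrelation should lie outside the band.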

IV. panel
- very frequent
  - all participants are tested repeatedly
  - (for the moment: no strategic interaction)
- dgp
  - dv = 5 + 2*treat + .5*level + erroruid + error

estimation options
- pooled OLS
  - ignores dependence
- random effects
  - allows for within dependence
  - but assumes the individual-specific term is
    - random
    - independent from ivs
    - independent from residual error
- fixed effects
  - (implicitly) estimates a coefficient for each unit
- (cluster)

pooled OLS
- coefficients do not seem biased
- but standard errors are exaggerated (too large)

random effects

fixed effects

time-invariant regressors
- why do they drop out?
- the model uses differencing to remove erroruid
  - could be first differences
    - implies the loss of 1 observation per participant
  - alternative: demeaning
    - dv* = dv_t - mean(dv)
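Both transformations can be sketched in a few lines, assuming the data are sorted by participant and then by period; the helper and variable names are hypothetical. Demeaning turns any time-invariant regressor into a column of zeros, which is why it drops out. First differencing loses the first observation of each participant.

```python
import numpy as np

def demean_by(x, uid):
    """x*_it = x_it - mean over t of x_it (deviation from the participant's own mean)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for u in np.unique(uid):
        m = uid == u
        out[m] = x[m] - x[m].mean()
    return out

def first_diff_by(x, uid):
    """x_it - x_i,t-1 within participant; data must be sorted by uid, then period.
    The first period of each participant has no predecessor and is dropped."""
    x = np.asarray(x, dtype=float)
    same_uid = np.r_[False, uid[1:] == uid[:-1]]   # True if the previous row is the same participant
    return (x[1:] - x[:-1])[same_uid[1:]]

uid = np.repeat(np.arange(3), 4)          # 3 participants, 4 periods each
time_invariant = np.repeat([0, 1, 1], 4)  # hypothetical regressor, constant per participant
period = np.tile(np.arange(4), 3)

print(demean_by(time_invariant, uid))     # all zeros: the regressor vanishes
print(first_diff_by(period, uid))         # 9 = 12 - 3 observations remain
```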

why not random effects?
- advantages
  - more efficient
  - time-invariant regressors are estimated
- but
  - additional assumption: the individual-specific term is
    - random
    - uncorrelated with residual error and ivs
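Random effects can be sketched as GLS, assuming a balanced panel and known variance components. In practice the variance components are estimated first, as in feasible GLS. Each variable is quasi-demeaned with θ = 1 - sqrt(σe² / (σe² + T*σu²)): θ = 0 is pooled OLS, and θ = 1 is the within (fixed effects) transformation. The simulated design, with treat constant within a participant and level varying across periods, is hypothetical. It shows that the time-invariant treat is estimated, whereas full demeaning removes it.

```python
import numpy as np

def quasi_demean(a, uid, theta):
    """a_it - theta * mean_i(a); works column-wise for 2-D arrays."""
    a = np.asarray(a, dtype=float)
    out = a.copy()
    for u in np.unique(uid):
        m = uid == u
        out[m] = a[m] - theta * a[m].mean(axis=0)
    return out

def re_gls(y, X, uid, sigma_u, sigma_e):
    """Random-effects GLS for a balanced panel with KNOWN variance components.
    X must contain the constant column; it becomes (1 - theta) after the transformation."""
    T = int(np.sum(uid == uid[0]))        # periods per participant (balanced panel assumed)
    theta = 1 - np.sqrt(sigma_e**2 / (sigma_e**2 + T * sigma_u**2))
    beta, *_ = np.linalg.lstsq(quasi_demean(X, uid, theta),
                               quasi_demean(y, uid, theta), rcond=None)
    return beta, theta

# hypothetical panel for dv = 5 + 2*treat + .5*level + erroruid + error
rng = np.random.default_rng(6)
n, T = 200, 10
uid = np.repeat(np.arange(n), T)
treat = np.repeat(np.arange(n) % 2, T).astype(float)   # assumed time-invariant
level = rng.integers(0, 5, size=n * T).astype(float)   # assumed to vary over periods
dv = 5 + 2 * treat + 0.5 * level + np.repeat(rng.normal(size=n), T) + rng.normal(size=n * T)
X = np.column_stack([np.ones(n * T), treat, level])
beta_re, theta = re_gls(dv, X, uid, sigma_u=1.0, sigma_e=1.0)
print(np.round(beta_re, 3), round(theta, 3))
```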

test of this assumption
- straightforward
- if the assumption is valid
  - then the coefficients of time-variant regressors should not differ (between random and fixed effects)
- random
  - may differ per individual
  - but there may not be systematic differences
  - a shift in level is OK
    - the constant may differ

Hausman Test
- can be done by hand
  - store coef from one model
  - use a Wald test to see whether the coef from the alternative model is significantly different
  - but tedious with > 1 time-dependent variable
- use the Stata procedure

Hausman test
    xtreg dv treat level, fe
    est sto fe
    xtreg dv treat level
    est sto re
    hausman fe re
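By hand, the Hausman statistic compares the two coefficient vectors, weighted by the difference of their covariance matrices. With d = b_fe - b_re, H = d' (V_fe - V_re)^(-1) d, which is chi-squared with K degrees of freedom under the null. The sketch takes stored coefficients and variances as inputs. The numbers in the usage lines are invented for illustration. The pseudo-inverse is a choice of this sketch, made to cope with a singular difference matrix.

```python
import math
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """H = d' (V_fe - V_re)^+ d with d = b_fe - b_re, compared over the coefficients
    both models estimate (time-variant regressors). Returns H and K = number of coefficients."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    Vd = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    H = float(d @ np.linalg.pinv(Vd) @ d)   # pseudo-inverse: assumption of this sketch
    return H, len(d)

def chi2_sf_1df(H):
    """P(chi2(1) > H) = erfc(sqrt(H / 2)); for other df use a table or scipy.stats.chi2.sf."""
    return math.erfc(math.sqrt(H / 2))

# invented numbers: one time-variant regressor
H1, k1 = hausman(2.1, 2.0, 0.02, 0.01)
print(H1, k1, chi2_sf_1df(H1))

# invented numbers: two time-variant regressors
H2, k2 = hausman([2.1, 0.7], [2.0, 0.5], np.diag([0.02, 0.05]), np.diag([0.01, 0.01]))
print(H2, k2)
```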

what if the Hausman test is significant?
- relatively frequent in experimental datasets
- mainly due to the interactive component
  - certain participants react in a systematically different way to the actions of others

example dgp

fe estimates
Hausman p = .0051

Hausman Taylor

estimation
- single out ivs suspected to be endogenous
  - i.e. correlated with the random effect
  - (but uncorrelated with residuals)
- check with a second Hausman test
  - if insignificant, the endogeneity problem is solved

second Hausman test (Baltagi, Bresson & Pirotte, Economics Letters 2003, 361)

what does Hausman-Taylor do?
- removes endogeneity of time-dependent variables
  - by mean differencing
- creates consistent estimates of time-invariant regressors
- adjusts standard errors
  - technically most difficult
  - GLS
  - (check literature)

iv step
- alternative interpretation of the fixed effects estimator
  - all time-variant regressors are instrumented
  - instrument
    - deviation from the individual-specific mean
    - correlated with the time-variant regressor
    - uncorrelated with the individual-specific error
      - since it has been removed by demeaning
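Here is a numerical check of this interpretation on hypothetical simulated data, where the regressor x is correlated with erroruid. Instrumenting x (with a constant) by its deviation from the individual-specific mean reproduces the within (fixed effects) slope exactly. Pooled OLS, by contrast, is pulled away from the true 0.5.

```python
import numpy as np

def group_mean(a, uid):
    """Each observation's individual-specific mean (uid must be 0..n-1 integers)."""
    sums = np.bincount(uid, weights=a)
    counts = np.bincount(uid)
    return (sums / counts)[uid]

rng = np.random.default_rng(7)
n, T = 100, 5
uid = np.repeat(np.arange(n), T)
error_uid = np.repeat(rng.normal(size=n), T)
x = error_uid + rng.normal(size=n * T)              # hypothetical: x correlated with erroruid
y = 1 + 0.5 * x + error_uid + rng.normal(size=n * T)

z = x - group_mean(x, uid)                          # instrument: deviation from own mean
X = np.column_stack([np.ones_like(x), x])
Z = np.column_stack([np.ones_like(z), z])
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)            # just-identified IV estimate

y_w = y - group_mean(y, uid)
b_within = (z @ y_w) / (z @ z)                      # fixed effects (within) slope

b_pooled = np.linalg.lstsq(X, y, rcond=None)[0]     # ignores the correlation with erroruid
print(b_iv[1], b_within, b_pooled[1])
```

The equality is exact because z sums to zero within each participant. Hence z'x = z'z and z'y = z'(y - mean), which is precisely the within estimator.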

iv step
- fixed effects is safe, but radical
  - all time-variant regressors are instrumented
    - even if only some are endogenous
  - time-invariant regressors are removed
    - even if none of them is endogenous

iv step
- invites a solution
  - if only some time-variant regressors are endogenous
    - instrument only those
  - recover time-invariant regressors
    - if some time-invariant regressors are also endogenous
      - (use exogenous instruments)
      - use the mean deviation from the individual-specific mean of exogenous time-variant regressors as instrument

iv step
- use residuals from step 1
  - regression of the mean-differenced model
  - create the mean residual for each uid as dv
- explain dv
  - by time-invariant regressors
  - as instrumented by
    - exogenous time-invariant regressors
      - instrument themselves
    - within-subject means of exogenous time-variant regressors
      - >= one per endogenous time-invariant regressor

practical matter
- Stata wants at least one time-variant exogenous variable
  - although strictly speaking this is only necessary if at least one time-invariant regressor is endogenous
- usually
  - use a time trend

V. nested data
- very frequent
  - most economic experiments are interactive
- partner design
  - group
- stranger design
  - matching group
  - (if you have forgotten to define matching groups: entire sessions)

typical dgps
- 3 layers
  - choice
  - individual
  - group
- 4 layers
  - reaction to other group members’ choices
  - period
  - individual
  - group

cluster
(figure: block-diagonal error covariance matrices with blocks σ1, σ2, σ3; panels: individual cluster, group cluster)

cluster
- SEs are not too small
- but are likely to be too big
  - cluster ignores additional information about structure

mixed effects model
- $y_{git} = X_{git}\beta + u_g + u_{gi} + e_{git}$
- $u_g$ captures group idiosyncrasies
- $u_{gi}$ captures individual idiosyncrasies
  - conditional on group idiosyncrasies being controlled for
- $e_{git}$ is the residual error
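The covariance structure this model implies for the composite error u_g + u_gi + e_git can be written out for a tiny design. The sketch assumes independent random terms with variances s2_g, s2_i and s2_e (hypothetical parameter names) and orders observations by group, individual and period. The result is a nested block pattern: same individual, same group, different groups. Clustering, by contrast, only imposes zero covariance across clusters.

```python
import numpy as np

def nested_cov(n_groups, n_members, n_periods, s2_g, s2_i, s2_e):
    """Covariance of u_g + u_gi + e_git, observations ordered by group, individual, period.
    same individual: s2_g + s2_i (plus s2_e on the diagonal);
    same group, different individual: s2_g; different groups: 0."""
    g = np.repeat(np.arange(n_groups), n_members * n_periods)
    i = np.repeat(np.arange(n_groups * n_members), n_periods)   # individual id, unique across groups
    same_g = g[:, None] == g[None, :]
    same_i = i[:, None] == i[None, :]
    return s2_g * same_g + s2_i * same_i + s2_e * np.eye(len(g))

# 2 groups x 2 members x 2 periods, hypothetical variances
V = nested_cov(n_groups=2, n_members=2, n_periods=2, s2_g=1.0, s2_i=0.5, s2_e=2.0)
print(V)
```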

estimation
    xtmixed dv treat level || group:, || uid:,

data structure
- defined by
    xtmixed dv treat level || group:, || uid:,
- could also involve random slopes
    xtmixed dv treat level || group: level || uid:,
- covariance structure can be changed
  - default: “independent”
  - assumes covariances between the random effects (e.g. random intercept and random slope) to be zero

Hausman test
- same argument as before
  - and same test
- if both random effects are indeed random
  - coefficients on time-variant regressors should not differ significantly
  - compared with (one) fixed effect

Hausman test
- necessary to tell Stata what to compare