Presentation transcript:

1 Results from hsb_subset.do

2 Example of Kloek problem
Two-stage sample of high school sophomores
First a school is selected, then students are picked, both at random
This sample: 10 students each from 498 high schools
$Y_{is} = \beta_0 + X_{is}\beta_1 + Z_s\gamma + v_{is}$

3 Variables in data set
* outcome variable;
*soph_scr;
* variables that vary by school:
*west, south, midwest, cath_sch, urban, rural;
* school id variable;
*schoolid;
* variables that vary across students;
*age, female, siblings, black, hispanic, both_parents;
*parent_ed1-parent_ed4, family_inc1-family_inc6;

4 . xtreg soph_scr west south midwest urban rural cath_sch, i(schoolid) re;
Random-effects GLS regression
Group variable: schoolid
Number of obs = 4980
Number of groups = 498
Obs per group: min = 10, avg = 10.0, max = 10
R-sq: within = , between = , overall =
Random effects u_i ~ Gaussian
corr(u_i, X) = 0 (assumed)
Wald chi2(6) =
Prob > chi2 =
soph_scr | Coef. Std. Err. z P>|z| [95% Conf. Interval]
west |
south |
midwest |
urban |
rural |
cath_sch |
_cons |
sigma_u |
sigma_e |
rho | (fraction of variance due to u_i)

5 In the random effects model, ρ is the share of total variance that is between groups
$\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2) = 0.14$
OLS understates the sampling variance by the factor $1 + \rho(T-1)$
T = 10, so the factor is $1 + 0.14(9) = 2.26$
OLS standard errors are therefore too small by a factor of $\sqrt{2.26} = 1.50$
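The arithmetic on this slide can be checked in a few lines. This is a minimal sketch in Python, used only for illustration since the deck itself works in Stata. The function name `moulton_variance_factor` is hypothetical. The inputs ρ = 0.14 and T = 10 are the values quoted on the slide.

```python
import math


def moulton_variance_factor(rho: float, group_size: int) -> float:
    """Factor 1 + rho*(T-1) by which OLS understates the sampling variance
    when errors share a common group component (hypothetical helper)."""
    return 1.0 + rho * (group_size - 1)


rho = 0.14  # intraclass correlation quoted on the slide
T = 10      # students sampled per school

variance_factor = moulton_variance_factor(rho, T)
se_factor = math.sqrt(variance_factor)

print(f"variance factor   = {variance_factor:.2f}")  # 2.26
print(f"std. error factor = {se_factor:.2f}")        # 1.50
```

With ρ = 0 the factor is 1, so the correction disappears when there is no within-school correlation.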

6 Comparison of OLS and RE
Columns: X | OLS coef. | OLS std. error | RE std. error | Ratio RE/OLS std. error
Rows: west, south, midwest, urban, rural, cath_sch, _cons

7 Now add some covariates X: characteristics that vary across kids and schools
These will explain some of the persistent between-school difference in outcomes
Therefore $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$ should decline

8 * run ols model of test score on only school characteristics;
* this is a model similar to the one discussed in Kloek, Econometrica, 1981;
reg soph_scr west south midwest urban rural cath_sch;
* now run a random effects model to get the estimate of rho;
xtreg soph_scr west south midwest urban rural cath_sch, i(schoolid) re;
* run OLS, random effects, and OLS with clustered standard errors;
* in this case, add in the variables that vary by individual;
* ols;
reg soph_scr age female siblings both_parents parent_ed0-parent_ed3
    family_inc0-family_inc6 west south midwest urban rural cath_sch;
* random effects;
xtreg soph_scr age female siblings both_parents parent_ed0-parent_ed3
    family_inc0-family_inc6 west south midwest urban rural cath_sch, re i(schoolid);
* ols with standard errors clustered on the school;
reg soph_scr age female siblings both_parents parent_ed0-parent_ed3
    family_inc0-family_inc6 west south midwest urban rural cath_sch, cluster(schoolid);

9 . xtreg soph_scr age female siblings both_parents parent_ed0-parent_ed3
> family_inc0-family_inc6 west south midwest urban rural cath_sch, re i(schoolid);
Random-effects GLS regression
Group variable: schoolid
Number of obs = 4980
Number of groups = 498
Obs per group: min = 10, avg = 10.0, max = 10
R-sq: within = , between = , overall =
Random effects u_i ~ Gaussian
corr(u_i, X) = 0 (assumed)
Wald chi2(21) =
Prob > chi2 =
soph_scr | Coef. Std. Err. z P>|z| [95% Conf. Interval]
age |
female |
(a bunch of results deleted)
urban |
rural |
cath_sch |
_cons |
sigma_u |
sigma_e |
rho | (fraction of variance due to u_i)
* ols with standard errors clustered on the school;
. reg soph_scr age female siblings both_parents parent_ed0-parent_ed3
> family_inc0-family_inc6 west south midwest urban rural cath_sch, cluster(schoolid);

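10 $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$, taken from the rho row of the random-effects output on the previous slide
OLS understates the sampling variance by the factor $1 + \rho(T-1)$
T = 10, so the factor is $1 + \rho(9)$
OLS standard errors are therefore too small by a factor of $\sqrt{1 + 9\rho}$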

11 Comparison of OLS and RE, student-level variables
Columns: X | OLS coef. | OLS std. error | RE coef. | RE std. error | Ratio RE/OLS std. errors
Rows: age, female, siblings, both_parents, parent_ed (four rows), family_inc

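12 Comparison of OLS and RE, school-level variables
Columns: X | OLS coef. | OLS std. error | RE coef. | RE std. error | Ratio RE/OLS std. errors
Rows: west, south, midwest, urban, rural, cath_sch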

13 * ols;
reg soph_scr age female siblings both_parents parent_ed0-parent_ed3
    family_inc0-family_inc6 west south midwest urban rural cath_sch;
* random effects;
xtreg soph_scr age female siblings both_parents parent_ed0-parent_ed3
    family_inc0-family_inc6 west south midwest urban rural cath_sch, re i(schoolid);
* ols with standard errors clustered on the school;
reg soph_scr age female siblings both_parents parent_ed0-parent_ed3
    family_inc0-family_inc6 west south midwest urban rural cath_sch, cluster(schoolid);

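14 Comparison of OLS, RE and Huber (clustered) standard errors, school-level variables
Columns: X | OLS coef. | OLS std. error | RE std. error | Huber std. error | Ratio RE/OLS | Ratio Hu/OLS
Rows: west, south, midwest, urban, rural, cath_sch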

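15 Comparison of OLS, RE and Huber (clustered) standard errors, student-level variables
Columns: X | OLS coef. | OLS std. error | RE std. error | Huber std. error | Ratio RE/OLS | Ratio Hu/OLS
Rows: age, female, siblings, both_parents, parent_ed (two rows)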
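The comparisons above can be mimicked on simulated data. The sketch below is written in Python rather than the deck's Stata and is not the author's code. It draws 498 schools of 10 students with a single school-level regressor. The error variances are chosen so that ρ = 0.14; the true slope of 0.5 and the seed are arbitrary assumptions. It then compares the conventional OLS standard error with a Huber-type cluster-robust one, computed from the basic sandwich formula without a finite-sample correction.

```python
import math
import random

random.seed(12345)  # arbitrary seed, for reproducibility only

N_SCHOOLS, N_STUDENTS = 498, 10      # design described in the deck
RHO = 0.14                           # assumed intraclass correlation
SIGMA_U = math.sqrt(RHO)             # school-level error s.d.
SIGMA_E = math.sqrt(1.0 - RHO)       # student-level error s.d.

x, y, school = [], [], []
for s in range(N_SCHOOLS):
    z_s = random.gauss(0.0, 1.0)     # regressor that varies only by school
    u_s = random.gauss(0.0, SIGMA_U)
    for _ in range(N_STUDENTS):
        x.append(z_s)
        y.append(1.0 + 0.5 * z_s + u_s + random.gauss(0.0, SIGMA_E))
        school.append(s)

# OLS of y on a constant and x, written out by hand
n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n
xd = [xi - x_bar for xi in x]
sxx = sum(v * v for v in xd)
beta = sum(v * (yi - y_bar) for v, yi in zip(xd, y)) / sxx
resid = [yi - y_bar - beta * v for v, yi in zip(xd, y)]

# Conventional OLS standard error of the slope
se_ols = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

# Cluster-robust standard error: sum the score x*e within each school first
scores = [0.0] * N_SCHOOLS
for v, e, s in zip(xd, resid, school):
    scores[s] += v * e
se_cluster = math.sqrt(sum(sc * sc for sc in scores)) / sxx

ratio = se_cluster / se_ols
print(f"slope = {beta:.3f}")
print(f"OLS se = {se_ols:.4f}, clustered se = {se_cluster:.4f}, ratio = {ratio:.2f}")
```

Under these assumptions the ratio should land near $\sqrt{1 + 0.14(9)} \approx 1.5$, which is the inflation the Moulton formula predicts for a regressor that is constant within school.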

16 Bertrand et al.
Identify a high Type I error rate in diff-in-diff models through "placebo" regressions
CPS: monthly data on 160K people in 60K households
People are in the survey for the same 4 months in each year of a two-year period (e.g., April-July of 2001 and 2002)

17 ¼ of the households exit the survey either temporarily (month 4) or permanently (month 8)
This outgoing group answers detailed questions about their jobs
–Weekly/hourly earnings
–Usual hours of work
–Union status

18 Authors take 21 years' worth of data from the 4th month
Construct average weekly earnings of women aged … with positive earnings, by state and year
50 states x 21 years = 1050 cells
Regress cell average wages on state and year effects
Regress residuals on their first three lags
Autocorrelation coefficients are 0.51, 0.44, 0.22

19 Placebo laws
Draw a year at random from …
Select 25 states to receive treatment for all years after the year drawn in the previous step
$I_{st} = 1$ if state s received treatment in year t
$Y_{ist} = I_{st}\beta + u_s + v_t + \varepsilon_{ist}$
Run this experiment a couple hundred times
Calculate the % of runs that reject $H_0: \beta = 0$
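The placebo procedure can be sketched as a small Monte Carlo. The code below is Python, not the deck's Stata, and it uses simulated state/year cells rather than CPS data. The AR(1) parameter `phi = 0.8`, the window the law year is drawn from, and the function names are all assumptions made for illustration. State and year effects are removed by two-way demeaning, which is equivalent to including the dummies in a balanced panel, and the test uses the conventional OLS standard error.

```python
import math
import random


def demean_two_way(panel):
    """Subtract state and year means (and add back the grand mean)."""
    S, T = len(panel), len(panel[0])
    row_means = [sum(row) / T for row in panel]
    col_means = [sum(panel[s][t] for s in range(S)) / S for t in range(T)]
    grand = sum(row_means) / S
    return [[panel[s][t] - row_means[s] - col_means[t] + grand
             for t in range(T)] for s in range(S)]


def placebo_rejection_rate(n_states=50, n_years=21, phi=0.8,
                           n_reps=200, seed=1):
    """Share of placebo laws with |t| > 1.96 under conventional OLS s.e."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        # Outcome: AR(1) noise within each state; the true effect is zero.
        y = []
        for _s in range(n_states):
            e = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi ** 2)
            row = []
            for _t in range(n_years):
                row.append(e)
                e = phi * e + rng.gauss(0.0, 1.0)
            y.append(row)
        # Placebo law: a random year (assumed window) and 25 random states.
        law_year = rng.randint(n_years // 4, 3 * n_years // 4)
        treated = set(rng.sample(range(n_states), n_states // 2))
        d = [[1.0 if (s in treated and t > law_year) else 0.0
              for t in range(n_years)] for s in range(n_states)]
        yd, dd = demean_two_way(y), demean_two_way(d)
        sdd = sum(v * v for row in dd for v in row)
        beta = sum(a * b for ra, rb in zip(dd, yd)
                   for a, b in zip(ra, rb)) / sdd
        rss = sum((b - beta * a) ** 2 for ra, rb in zip(dd, yd)
                  for a, b in zip(ra, rb))
        dof = n_states * n_years - n_states - n_years
        se = math.sqrt(rss / dof / sdd)
        if abs(beta / se) > 1.96:
            rejections += 1
    return rejections / n_reps


rate = placebo_rejection_rate()
print(f"rejection rate at nominal 5% level: {rate:.3f}")
```

Setting `phi=0.0` removes the serial correlation, and the rejection rate should then fall back toward the nominal 5%.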

20 With micro data, the null hypothesis is rejected 67.5% of the time
With data aggregated to state/year cells, the rejection rate falls somewhat but is still high

21 High Type I error rate in the standard diff-in-diff model
Type I error rate falls almost to expected levels with a Huber-type correction
Type I error rate ↑ as # of groups ↓

22 bootstrap_example.do
*run simple regression
reg ln_weekly_earn age age2 years_educ nonwhite union
* now bootstrap the data. takes N obs with replacement
* save results in stata file bs-results.dta
bootstrap, saving(bs-results.dta, replace) rep(999) : regress ln_weekly_earn age age2 years_educ union

23 . *run simple regression
. reg ln_weekly_earn age age2 years_educ nonwhite union
Source | SS df MS
Model |
Residual |
Total |
Number of obs =
F( 5, 19900) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =
ln_weekly_~n | Coef. Std. Err. t P>|t| [95% Conf. Interval]
age |
age2 |
years_educ |
nonwhite |
union |
_cons |

24 . * now bootstrap the data. takes N obs with replacement
. * save results in stata file bs-results.dta
. bootstrap, saving(bs-results.dta, replace) rep(999) : regress ln_weekly_earn age age2 years_educ union
(running regress on estimation sample)
(note: file bs-results.dta not found)
Bootstrap replications (999)
(some results deleted)
Linear regression
Number of obs =
Replications = 999
Wald chi2(4) =
Prob > chi2 =
R-squared =
Adj R-squared =
Root MSE =
| Observed Bootstrap Normal-based
ln_weekly_~n | Coef. Std. Err. z P>|z| [95% Conf. Interval]
age |
age2 |
years_educ |
union |
_cons |

25 OLS
ln_weekly_~n | Coef. Std. Err. t P>|t| [95% Conf. Interval]
age |
age2 |
years_educ |
nonwhite |
union |
_cons |
BOOTSTRAP
| Observed Bootstrap Normal-based
ln_weekly_~n | Coef. Std. Err. z P>|z| [95% Conf. Interval]
age |
age2 |
years_educ |
union |
_cons |
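The resampling step behind the bootstrap comparison can be sketched as follows. This is Python rather than Stata, run on simulated data with a single regressor. The variable names `educ` and `ln_earn`, the coefficient 0.08, the sample size, and the seed are all hypothetical. Each replication draws N observations with replacement and re-estimates the slope, and the bootstrap standard error is the standard deviation of those estimates.

```python
import math
import random
import statistics


def ols_slope(xs, ys):
    """Slope from a regression of ys on a constant and xs."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xb) ** 2 for xi in xs)
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys)) / sxx


rng = random.Random(2024)  # arbitrary seed

# Simulated data (assumption): log earnings rise 0.08 per year of schooling
n_obs = 500
educ = [rng.randint(8, 18) for _ in range(n_obs)]
ln_earn = [5.0 + 0.08 * e + rng.gauss(0.0, 0.5) for e in educ]

slope_hat = ols_slope(educ, ln_earn)

# Conventional (analytic) OLS standard error of the slope
xb, yb = sum(educ) / n_obs, sum(ln_earn) / n_obs
sxx = sum((e - xb) ** 2 for e in educ)
rss = sum((y - yb - slope_hat * (e - xb)) ** 2 for e, y in zip(educ, ln_earn))
se_ols = math.sqrt(rss / (n_obs - 2) / sxx)

# Pairs bootstrap: resample whole observations with replacement
n_reps = 999
draws = []
for _ in range(n_reps):
    idx = [rng.randrange(n_obs) for _ in range(n_obs)]
    draws.append(ols_slope([educ[i] for i in idx], [ln_earn[i] for i in idx]))
se_boot = statistics.stdev(draws)

print(f"slope = {slope_hat:.4f}")
print(f"analytic se = {se_ols:.5f}, bootstrap se = {se_boot:.5f}")
```

With independent, homoskedastic errors like these, the two standard errors should be close. The bootstrap becomes more informative when the analytic formula's assumptions are in doubt.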

28 . * run ols without clustered std errors, just for comparison;
. reg carton_market_share _I* real_tax;
Source | SS df MS
Model |
Residual |
Total |
Number of obs =
F( 42, 1001) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =
carton_mar~e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
_Istate_2 |
_Istate_3 |
(some results deleted)
_Imonth_11 |
_Imonth_12 |
_Iyear_2005 |
_Iyear_2006 |
real_tax |
_cons |

29 . * now run ols and cluster at the state level;
. reg carton_market_share _I* real_tax, cluster(state);
Linear regression
Number of obs = 1044
F( 13, 28) = .
Prob > F = .
R-squared =
Root MSE =
(Std. Err. adjusted for 29 clusters in state)
| Robust
carton_mar~e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
_Istate_2 |
_Istate_3 |
(some results deleted)
_Imonth_11 |
_Imonth_12 |
_Iyear_2005 |
_Iyear_2006 |
real_tax |
_cons |

30 . di "Number BS reps = $bootreps";
Number BS reps = 999
. di "P-value from clustered standard errors = `p_value_main'";
P-value from clustered standard errors =
. di "P-value from wild bootstrap = `p_value_wild'";
P-value from wild bootstrap =