SAMPLE SELECTION Cheti Nicoletti ISER, University of Essex 2009.


1 SAMPLE SELECTION Cheti Nicoletti ISER, University of Essex 2009

2 Wage equation and labour participation for women. Gourieroux C. (2000), Econometrics of Qualitative Dependent Variables, Cambridge University Press, Cambridge. Let y* be the potential offered wage and let w be the reservation wage; the wage y is then observed (y = y*) only if y* > w, and is missing otherwise. Let us consider the following very simple earnings profile equation.
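The formulas on this slide were images and did not survive in the transcript. A reconstruction consistent with the notation of the later slides (x, β and ε as on slide 9) might read as follows; the exact form used on the original slide is an assumption.

```latex
% Assumed reconstruction: the wage is observed only for women who work
\[
y =
\begin{cases}
y^* & \text{if } y^* > w \\
\text{unobserved} & \text{otherwise}
\end{cases}
\]
% Assumed linear earnings profile equation with error term \varepsilon
\[
y^* = x\beta + \varepsilon
\]
```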

3 Women in the labour force are not a random sample
“Women’s labour force participation rates are highly dependent on age.” Gourieroux (2000)
Labour participation is in general lower for women aged:
– 16-20, because some women are still studying
– 25-44, because of work interruptions linked to children
– 55-60, because some women prefer to retire early
Presumably the earnings observed for women aged:
– 16-20 are lower than if all women worked
– 25-44 are higher, because women with higher earnings are less inclined to interrupt work
– 55-60 are higher, because women with higher earnings are less inclined to retire early


5 Sample selection model Labour participation equation Probit model for labour participation
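The equations of this slide are missing from the transcript. A sketch of the participation equation and the implied probit model, written with the symbols z, δ and u that appear on slides 9 to 11, could look like this. Reading d* as the gap between offered and reservation wage, and normalising the variance of u to one, are assumptions.

```latex
% Assumed latent participation equation: d^* can be read as y^* - w
\[
d^* = z\delta + u, \qquad d = 1\{d^* > 0\}
\]
% Probit model, assuming u ~ N(0,1)
\[
\Pr(d = 1 \mid z) = \Pr(u > -z\delta) = \Phi(z\delta),
\qquad
\Pr(d = 0 \mid z) = \Phi(-z\delta)
\]
```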

6 Joint model for the log-earnings and the labour participation equations: generalized Tobit model
Possible candidates for x: education dummies, age, work experience.
Possible candidates for z: age, education, number of children, dummies for the presence of children under 5, for cohabiting and for being widowed, regional unemployment rate.

7 Bivariate normal
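The content of this slide is not preserved. Given that slide 11 tests H0: σ_εu = 0, the distributional assumption was presumably of the following form; the unit variance of u is the usual probit normalisation and is an assumption here.

```latex
% Assumed joint distribution of the two error terms
\[
\begin{pmatrix} \varepsilon \\ u \end{pmatrix} \Bigm| x, z
\;\sim\;
N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
\sigma_\varepsilon^2 & \sigma_{\varepsilon u} \\
\sigma_{\varepsilon u} & 1
\end{pmatrix}
\right)
\]
% Implied linear conditional mean of \varepsilon given u
\[
E(\varepsilon \mid u) = \sigma_{\varepsilon u}\, u
\]
```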

8 Truncated Normal Suggestions for the proof
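The slide's own proof is not preserved; one standard argument for the mean of a truncated standard normal variable, which is the result the next slide relies on, is sketched below.

```latex
% Mean of a standard normal u truncated from below at c
\[
E(u \mid u > c)
= \frac{\int_c^{\infty} t\,\phi(t)\,dt}{1 - \Phi(c)}
= \frac{\phi(c)}{1 - \Phi(c)}
\]
% The integral follows from \phi'(t) = -t\,\phi(t), so
% \int_c^\infty t\,\phi(t)\,dt = [-\phi(t)]_c^\infty = \phi(c).
% Setting c = -z\delta and using the symmetry of the normal distribution:
\[
E(u \mid u > -z\delta) = \frac{\phi(z\delta)}{\Phi(z\delta)}
\]
```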

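9 Sample selection problem
$E(y^* \mid d=1,x,z) = x\beta + E(\varepsilon \mid d=1,x,z)$
$E(\varepsilon \mid d=1,x,z) = E(\varepsilon \mid u > -z\delta) = \sigma_{\varepsilon u}\,\phi(z\delta)/\Phi(z\delta)$
$E(y^* \mid d=1,x,z) = x\beta + \sigma_{\varepsilon u}\,\phi(z\delta)/\Phi(z\delta)$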

10 Two-step estimation
Step 1: estimation of a probit model for the probability of being in the labour market, with likelihood $\prod_i \Pr(d_i=1 \mid z_i)^{d_i}\,\Pr(d_i=0 \mid z_i)^{1-d_i} = \prod_i \Phi(z_i\delta)^{d_i}\,\Phi(-z_i\delta)^{1-d_i}$
Step 2: estimation of the regression model with an additional variable (the inverse Mills ratio), using the subsample of individuals with $d_i = 1$ (and using some IV restrictions, i.e. variables in z that are excluded from x)
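The two steps can be sketched by hand in Stata. The example assumes Stata's `womenwk` example dataset (loaded with `webuse`), in which `wage` is missing for women who do not work. The choice of regressors is illustrative and not taken from the slides.

```stata
* Hand-made two-step sketch, assuming the womenwk example data
webuse womenwk, clear

* Participation indicator: wage is observed only for working women
generate byte d = !missing(wage)

* Step 1: probit for labour participation
probit d married children education age

* Linear index z*delta and inverse Mills ratio phi(z*delta)/Phi(z*delta)
predict double zd, xb
generate double imr = normalden(zd) / normal(zd)

* Step 2: wage regression on the selected subsample with the extra regressor
* (married and children are excluded from the wage equation)
regress wage education age imr if d == 1
```

The standard errors reported by this second-step `regress` do not account for the estimated first step, so the sketch is mainly useful for seeing the mechanics.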

11 Testing selectivity
If the error terms ε and u are uncorrelated, then the selection problem is ignorable. $H_0: \sigma_{\varepsilon u} = 0$
Verifying $H_0$ is equivalent to verifying whether the coefficient of the additional variable (the inverse Mills ratio) in the equation is zero (using, for example, a Wald test).
Notice that the errors are heteroskedastic, so an appropriate estimator should be used for the standard errors.
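A possible way to carry out this test in Stata is sketched below, again assuming the `womenwk` example data and an illustrative choice of regressors. Using `vce(robust)` is one way to allow for the heteroskedasticity mentioned above; it is an assumption of this sketch rather than the slide's prescription.

```stata
* Sketch of a selectivity test, assuming the womenwk example data
webuse womenwk, clear
generate byte d = !missing(wage)

* First step: probit and inverse Mills ratio
probit d married children education age
predict double zd, xb
generate double imr = normalden(zd) / normal(zd)

* Second step with heteroskedasticity-robust standard errors
regress wage education age imr if d == 1, vce(robust)

* Wald test of H0: the coefficient on the inverse Mills ratio is zero
test imr
```

If H0 is rejected, standard errors for the wage equation should also account for the estimated first step, which the `heckman` command with the `twostep` option does.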

12 Generalized Tobit: Maximum Likelihood Estimation
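The likelihood shown on this slide is not preserved. Under the bivariate normal assumption with Var(u) = 1, the log-likelihood of the generalized Tobit model is usually written as below; the symbol ρ for the error correlation is introduced here for convenience and is not taken from the slides.

```latex
% Log-likelihood of the generalized (type 2) Tobit model
\[
\ln L
= \sum_{i:\,d_i=0} \ln \Phi(-z_i\delta)
+ \sum_{i:\,d_i=1}
\left[
\ln \phi\!\left(\frac{y_i - x_i\beta}{\sigma_\varepsilon}\right)
- \ln \sigma_\varepsilon
+ \ln \Phi\!\left(
\frac{z_i\delta + \rho\,(y_i - x_i\beta)/\sigma_\varepsilon}{\sqrt{1-\rho^2}}
\right)
\right],
\qquad
\rho = \frac{\sigma_{\varepsilon u}}{\sigma_\varepsilon}
\]
```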

13 heckman
The heckman command is used to estimate the generalized Tobit (Tobit of the 2nd type) model using ML estimation (default option) or two-step estimation (option twostep):
heckman y x1 x2 … xk, select(z1 z2 … zs)
heckman y x1 x2 … xk, select(d = z1 z2 … zs)
heckman y x1 x2 … xk, select(z1 z2 … zs) twostep
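As a concrete illustration of the three syntaxes, the following sketch uses Stata's `womenwk` example dataset; the dataset and the variable choices are assumptions for illustration, not part of the original slides.

```stata
* Illustrative heckman calls, assuming the womenwk example data
webuse womenwk, clear

* ML estimation (default); selection is inferred from missing wage
heckman wage education age, select(married children education age)

* Same model with an explicit participation dummy
generate byte d = !missing(wage)
heckman wage education age, select(d = married children education age)

* Two-step estimation
heckman wage education age, select(married children education age) twostep
```

In the first and third calls, observations with missing `wage` are treated as not selected; the second form is useful when the selection indicator is stored as a separate variable.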

14 Generalized Tobit: Maximum Likelihood Estimation

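15 Joint model and two-step estimation

LABOUR PARTICIPATION MODEL
| Variable | Joint model: Coeff | Joint model: p-value | Two-step: Coeff | Two-step: p-value |
|---|---|---|---|---|
| Constant | -0.57 | 0.06 | -0.99 | 0.04 |
| No. children <18 | -0.12 | 0.00 | -0.13 | 0.00 |
| No. children <4 | -0.09 | 0.00 | -0.07 | 0.00 |
| Log husband's wage | -0.10 | 0.04 | -0.08 | 0.06 |
| Years of education | 0.15 | 0.00 | 0.14 | 0.00 |
| Age | 0.81 | 0.01 | 0.91 | 0.02 |
| Age square | -0.12 | 0.03 | -0.14 | 0.01 |
| Correlation between errors | 0.35 | 0.00 | | |
| Inverse Mills ratio | | | 0.29 | 0.00 |

WAGE MODEL
| Variable | Joint model: Coeff | Joint model: p-value | Two-step: Coeff | Two-step: p-value |
|---|---|---|---|---|
| Constant | 4.50 | 0.02 | 4.70 | 0.01 |
| Years of education | 0.11 | 0.01 | 0.10 | 0.00 |
| Work experience | 0.13 | 0.01 | 0.08 | 0.01 |
| Work experience square | 0.00 | 0.02 | 0.01 | 0.00 |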

16 Joint model for log-income and response probability
Possible candidates for x: education dummies, age, work experience.
d* is the propensity to respond to the earnings question.
Possible candidates for z: mode of interview, education, gender, age, etc.

17 Item nonresponse for an income equation or poverty model in cross-section sample surveys
Potential explanatory variables:
– Socio-demographic variables: age, gender, level of education, number of adults, number of children.
– Situational economic circumstances: labour status.
– Data collection characteristics: mode of the interview, number of visits, duration of the interview. (These are plausible IVs.)

18 Maximum Likelihood estimation of the joint model

RESPONSE MODEL
| Variable | Coeff | p-value |
|---|---|---|
| Constant | 2.13 | 0.00 |
| Duration of the interview | -0.34 | 0.12 |
| No. of interview attempts | -0.02 | 0.01 |
| Mode (type) of interview (reference category: Post interview) | | |
| Face to face interview | 0.15 | 0.00 |
| Telephone interview | 0.05 | 0.00 |
| Age | -0.02 | 0.01 |
| Age square | 0.00 | 0.45 |
| Female | 0.31 | 0.01 |
| Years of education | 0.02 | 0.05 |
| Correlation between errors | -0.23 | 0.00 |

INCOME MODEL
| Variable | Coeff | p-value |
|---|---|---|
| Constant | 2.10 | 0.00 |
| Years of education | 0.02 | 0.00 |
| Labour status (reference category: employed) | | |
| Inactive | -0.13 | 0.01 |
| Self-employed | -0.21 | 0.02 |
| Unemployed | -0.56 | 0.00 |
| Age | 0.02 | 0.00 |
| Age square | -0.00 | 0.00 |

19 Attrition in panel surveys has two possible causes: failed contact and refusal
The potential variables explaining attrition (contact and cooperation) are lagged variables observed in the last wave. The equation of interest has to use lagged variables (otherwise we have missing explanatory variables too).
– Socio-demographic variables: age, gender, level of education, number of adults, number of children.
– Social integration: talking often to neighbours, cohabitation, house ownership.
– Situational economic circumstances: labour status, household equivalised income.
– Data collection characteristics: mode of the interview, number of visits, duration of the interview, same interviewer across waves, duration of the panel, length of the fieldwork. (These are plausible IVs.)

20 Attrition due to lack of cooperation (BHPS 1994-96)

| Variable | Coefficient | Test | p-value |
|---|---|---|---|
| Wave 1996 | 0.17108 | 2.21 | 0.027 |
| Workload | -0.01619 | -22.04 | 0.000 |
| Item nonresponse by interviewer | -3.08725 | -3.74 | 0.000 |
| Co-operation rate by interviewer | 1.62772 | 4.85 | 0.000 |
| Age 35 or less | -0.05109 | -0.58 | 0.560 |
| Age 60 or more | -0.01904 | -0.15 | 0.882 |
| Female | 0.20994 | 2.77 | 0.006 |
| Living without a spouse | -0.15878 | -1.90 | 0.057 |
| No. of children | -0.03666 | -0.96 | 0.337 |
| No. of adults | -0.06812 | -1.68 | 0.092 |
| Unemployed | -0.38718 | -3.00 | 0.003 |
| Inactive | 0.16281 | 1.64 | 0.100 |
| No. of visits | -0.02887 | -2.33 | 0.020 |
| Same interviewer | 0.61158 | 7.78 | 0.000 |
| Item nonresponse | 0.04194 | 0.20 | 0.843 |
| Constant | 1.54751 | 7.30 | 0.000 |

Wald joint significance test: 2068.9. No. obs.: 14265.

21 Weighted estimation
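The formulas of this slide are not preserved. Judging from the moment condition derived on slide 23, the weighted estimator was presumably an inverse probability weighted least squares estimator of the following kind; the notation π_i for the selection probability and the index i are assumptions.

```latex
% Assumed selection (response) probability
\[
\pi_i = \Pr(d_i = 1 \mid x_i, z_i)
\]
% Inverse probability weighted least squares on the selected sample
\[
\hat\beta_w = \arg\min_{\beta} \sum_{i} \frac{d_i}{\pi_i}\,(y_i - x_i\beta)^2
\]
% First-order condition: sample analogue of E[x'(y^* - x\beta)\, d\, \pi^{-1}] = 0
\[
\sum_{i} \frac{d_i}{\pi_i}\, x_i'\,(y_i - x_i\hat\beta_w) = 0
\]
```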


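23 Conditioning and integrating out (marginalizing) with respect to z
$E_Z\{E[x'(y^*-x\beta)\,d\,\pi^{-1} \mid x,z]\}$
$= E_Z\{E[x'(y^*-x\beta) \mid x,z,d=1]\,\Pr(d=1 \mid x,z)\,\pi^{-1}\}$
$= E_Z\{E[x'(y^*-x\beta) \mid x,z]\} = E[x'(y^*-x\beta) \mid x] = 0$
The second-to-last step uses $\pi = \Pr(d=1 \mid x,z)$ and requires that, conditional on x and z, selection is unrelated to y*, so that conditioning on d = 1 can be dropped.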

24 How to use weights in Stata
Most Stata commands can deal with weighted data. Stata allows four kinds of weights:
1. fweights, or frequency weights, are weights that indicate the number of duplicated observations.
2. pweights, or sampling weights, are weights that denote the inverse of the probability that the observation is included due to the sampling design, nonresponse or sample selection.
3. aweights, or analytic weights, are weights that are inversely proportional to the variance of an observation; i.e., the variance of the j-th observation is assumed to be sigma^2/w_j, where w_j are the weights.
4. iweights, or importance weights, are weights that indicate the "importance" of the observation in some vague sense.

25 Option pweights
Usually sample surveys provide weights to take account of the sampling design and nonresponse. Let p be the individual weight. Then we can run a regression with weighted observations:
regress y x1 x2 … xk [pweight=p]
Suppose we have a random sample affected by nonresponse, but weights that take account of unit nonresponse are not available. A possible way to estimate your own weights is the following:
probit d z1 z2 … zs
predict prop
gen invprop=1/prop
reg y x1 x2 … xk [pweight=invprop]
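The recipe above can be tried on real data. The sketch below assumes Stata's `womenwk` example dataset and treats a missing `wage` as nonresponse purely for illustration; the variable choices are not from the slides.

```stata
* Estimating your own inverse-probability weights, assuming the womenwk data
webuse womenwk, clear

* Response indicator: 1 if the outcome is observed
generate byte d = !missing(wage)

* Model the response probability and predict it for every observation
probit d married children education age
predict double prop, pr

* Inverse of the estimated response propensity
generate double invprop = 1/prop

* Weighted regression; observations with missing wage drop out automatically
regress wage education age [pweight = invprop]
```

This weighting approach relies on response depending only on the observed variables in the probit, which is a different assumption from the one behind the `heckman` model.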

26 For complex survey designs it is better to use
svyset [pweight=p]
svy: regress y x1 x2 … xk
svyset has options for cluster sampling and other complex designs. To declare a survey design with strata:
svyset [pweight=p], strata(stratid)
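A runnable illustration, assuming Stata's `nhanes2` example dataset with its design variables `psu`, `strata` and `finalwgt`; the regression itself is arbitrary and only meant to show the syntax.

```stata
* Survey-design sketch, assuming the nhanes2 example data
webuse nhanes2, clear

* Declare the primary sampling unit (cluster), sampling weight and strata
svyset psu [pweight = finalwgt], strata(strata)

* Design-based regression: standard errors reflect clustering and stratification
svy: regress weight height age
```

Compared with passing `[pweight=...]` directly to `regress`, the `svy:` prefix also uses the declared clusters and strata when computing the variance.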

27 Stata propensity score methods for evaluation of treatment
Abadie A., Drukker D., Herr J.L., Imbens G.W. (2001), Implementing Matching Estimators for Average Treatment Effects in Stata, The Stata Journal, 1, 1-18. http://ksghome.harvard.edu/~.aabadie.academic.ksg/software.html
Becker S.O., Ichino A. (2002), Estimation of average treatment effects based on propensity scores, The Stata Journal, 2, 358-377. http://www.lrz-muenchen.de/~sobecker/pscore.html
Sianesi B. (2001), Implementing Propensity Score Matching Estimators with STATA, UK Stata Users Group, VII Meeting, London. http://ideas.repec.org/c/boc/bocode/s432001.html

28 Some references for regressions with sample selection
Buchinsky, M. (2001), Quantile regression with sample selection: Estimating women's return to education in the U.S., Empirical Economics, 26, 86-113.
Ibrahim, J.G., Chen, M.-H., Lipsitz, S.R., Herring, A.H. (2005), Missing-data methods for generalized linear models: A comparative review, Journal of the American Statistical Association, 100, 469, 332-346.
Lipsitz, S.R., Fitzmaurice, G.M., Molenberghs, G., Zhao, L.P. (1997), Quantile regression methods for longitudinal data with drop-outs, Applied Statistics, 46, 463-476.
Robins, J.M., Rotnitzky, A. (1995), Semiparametric Efficiency in Multivariate Regression Models With Missing Data, Journal of the American Statistical Association, 90, 122-129.
Vella, F. (1998), Estimating models with sample selection bias: a survey, The Journal of Human Resources, 33, 127-169.
Wooldridge, J.M. (2007), Inverse probability weighted M-estimation for general missing data problems, Journal of Econometrics, 141, 2, 1281-1301.

