SAMPLE SELECTION in Earnings Equation Cheti Nicoletti ISER, University of Essex.



Wage equation and labour participation for women
Reference: Gourieroux C. (2000), Econometrics of Qualitative Dependent Variables, Cambridge University Press, Cambridge.
Let $y^*$ be the potential offered wage and let $w$ be the reservation wage. The wage $y$ is observed only when the offer is at least the reservation wage: $y = y^*$ if $y^* \ge w$, and $y$ is not observed otherwise.
Let us consider a very simple earnings profile equation (earnings as a function of age).

Women in the labour force are not a random sample
“Women’s labour force participation rates are highly dependent on age.” Gourieroux (2000)
Labour participation is in general lower for women aged:
– 16-20, because some women are still studying
– 25-44, because of work interruptions linked to children
– 55-60, because some women prefer to retire early
Presumably the earnings observed for women aged:
– 16-20 are lower than if all women worked
– 25-44 are higher, because women with higher earnings are less inclined to interrupt work
– 55-60 are higher, because women with higher earnings are less inclined to retire early
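The selection mechanism can be illustrated with a small simulation. The sketch below is not taken from the slides: the normal distributions, their means and standard deviations, and the function name `simulate_selection` are all illustrative assumptions. It draws a potential (log) offered wage and an independent (log) reservation wage, keeps the wage only when the offer is at least the reservation wage, and compares the mean over everybody with the mean over participants.

```python
import random
from statistics import mean


def simulate_selection(n=50_000, seed=0):
    """Return (mean offered wage, mean observed wage, participation rate)."""
    rng = random.Random(seed)
    offered, observed = [], []
    for _ in range(n):
        y_star = rng.gauss(2.0, 0.5)  # potential log offered wage (assumed values)
        w = rng.gauss(1.9, 0.5)       # log reservation wage (assumed values)
        offered.append(y_star)
        if y_star >= w:               # the wage is observed only for participants
            observed.append(y_star)
    return mean(offered), mean(observed), len(observed) / n


pop_mean, obs_mean, rate = simulate_selection()
print(f"mean offered wage, all women:       {pop_mean:.3f}")
print(f"mean observed wage, participants:   {obs_mean:.3f}")
print(f"participation rate:                 {rate:.3f}")
```

Under these assumptions the participants' mean wage lies above the mean offered wage, because women with high offers are more likely to accept them; the sample of workers is therefore not representative of all women.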

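Sample selection model
Labour participation equation: $d^* = z\delta + \nu$, with $d = 1$ (the woman participates and her wage is observed) if $d^* > 0$ and $d = 0$ otherwise.
Probit model for labour participation: $\Pr(d = 1 \mid z) = \Phi(z\delta)$, where $\nu \sim N(0,1)$ and $\Phi$ is the standard normal distribution function.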

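Joint model for the log-earnings and the labour participation equations
Generalized TOBIT MODEL: $y^* = x\beta + \varepsilon$ (log-earnings) and $d^* = z\delta + \nu$ (participation), where $y = y^*$ is observed only if $d = 1$, i.e. if $d^* > 0$.
Possible candidates for x: education dummies, age, work experience.
Possible candidates for z: age, education, number of children, dummies for the presence of children under 5, for cohabiting and for being a widow, and the regional unemployment rate.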

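Sample selection problem
$E(y^* \mid d=1, x, z) = x\beta + E(\varepsilon \mid d=1, x, z)$
$E(\varepsilon \mid d=1, x, z) = E(\varepsilon \mid \nu > -z\delta) = \sigma_{\varepsilon\nu}\,\phi(z\delta)/\Phi(z\delta)$, assuming $(\varepsilon, \nu)$ jointly normal with $\mathrm{Var}(\nu) = 1$
$E(y^* \mid d=1, x, z) = x\beta + \sigma_{\varepsilon\nu}\,\phi(z\delta)/\Phi(z\delta)$
The ratio $\phi(z\delta)/\Phi(z\delta)$ is the inverse Mills ratio, so a regression of $y$ on $x$ alone in the selected sample omits this term whenever $\sigma_{\varepsilon\nu} \neq 0$.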
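A minimal Python sketch, using only the standard library, that checks numerically the standard bivariate-normal result $E(\varepsilon \mid \nu > -z\delta) = \sigma_{\varepsilon\nu}\,\phi(z\delta)/\Phi(z\delta)$ when $\mathrm{Var}(\nu) = 1$. The function names, the value 0.3 for the index $z\delta$ and the covariance 0.6 are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def inverse_mills(c):
    """phi(c) / Phi(c): the mean of a standard normal nu given nu > -c."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def simulated_selected_mean(c, cov_eps_nu, n=200_000, seed=1):
    """Monte Carlo estimate of E(eps | nu > -c) with Cov(eps, nu) = cov_eps_nu."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        nu = rng.gauss(0.0, 1.0)
        # eps is built as cov_eps_nu * nu plus independent noise,
        # so that Cov(eps, nu) = cov_eps_nu and Var(nu) = 1.
        eps = cov_eps_nu * nu + rng.gauss(0.0, 1.0)
        if nu > -c:  # selection rule d = 1
            kept.append(eps)
    return mean(kept)


index, cov = 0.3, 0.6  # assumed values of z*delta and sigma_{eps,nu}
analytic = cov * inverse_mills(index)
simulated = simulated_selected_mean(index, cov)
print(f"analytic  E(eps | d=1): {analytic:.4f}")
print(f"simulated E(eps | d=1): {simulated:.4f}")
```

The two numbers should agree up to simulation noise, and both are different from zero, which is why the selected-sample regression of $y$ on $x$ alone is biased when the two errors are correlated.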

Two-step estimation
Step 1: estimation of a probit model for the probability of being in the labour market, with likelihood $\prod_i \Pr(d_i=1 \mid z_i)^{d_i}\,\Pr(d_i=0 \mid z_i)^{1-d_i} = \prod_i \Phi(z_i\delta)^{d_i}\,\Phi(-z_i\delta)^{1-d_i}$
Step 2: estimation of the regression model with an additional variable (the inverse Mills ratio $\phi(z_i\hat\delta)/\Phi(z_i\hat\delta)$) using the subsample of individuals with $d_i = 1$ (and using some IV restrictions, i.e. variables in $z$ that are excluded from $x$)
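The two steps can be sketched in plain Python on simulated data. This is an illustration under stated assumptions, not the implementation used by any statistical package: the data-generating values, the excluded variable `q`, and the helper names `solve`, `dot`, `ols`, `probit` and `mills` are all hypothetical, and the probit is fitted by a simple Fisher scoring loop.

```python
import random
from statistics import NormalDist

N01 = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, iterations=12):
    """Probit maximum likelihood by Fisher scoring, started from zero."""
    k = len(rows[0])
    delta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            idx = dot(r, delta)
            p = min(max(N01.cdf(idx), 1e-10), 1.0 - 1e-10)
            f = N01.pdf(idx)
            score = f * (di - p) / (p * (1.0 - p))
            weight = f * f / (p * (1.0 - p))
            for i in range(k):
                grad[i] += score * r[i]
                for j in range(k):
                    info[i][j] += weight * r[i] * r[j]
        delta = [a + s for a, s in zip(delta, solve(info, grad))]
    return delta


# Simulated data; every number below is an illustrative assumption.
rng = random.Random(42)
beta_true = [1.0, 0.5]        # outcome equation: y* = 1.0 + 0.5 x + eps
delta_true = [0.2, 0.8, 1.0]  # participation index: 0.2 + 0.8 x + 1.0 q + nu
cov_eps_nu = 0.7              # sigma_{eps,nu}, with Var(nu) = 1

z_all, d, z_sel, y = [], [], [], []
for _ in range(10_000):
    x, q, nu = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    eps = cov_eps_nu * nu + 0.5 * rng.gauss(0, 1)
    z = [1.0, x, q]           # q is excluded from the outcome equation
    works = dot(z, delta_true) + nu > 0
    z_all.append(z)
    d.append(1 if works else 0)
    if works:                 # the wage is observed only for participants
        z_sel.append(z)
        y.append(beta_true[0] + beta_true[1] * x + eps)

# Step 1: probit for participation on the full sample.
delta_hat = probit(z_all, d)


def mills(z):
    """Estimated inverse Mills ratio phi(z delta_hat) / Phi(z delta_hat)."""
    idx = dot(z, delta_hat)
    return N01.pdf(idx) / N01.cdf(idx)


# Step 2: OLS on participants, without and with the inverse Mills ratio.
naive = ols([[1.0, z[1]] for z in z_sel], y)
two_step = ols([[1.0, z[1], mills(z)] for z in z_sel], y)
print("naive OLS (const, x):          ", [round(b, 3) for b in naive])
print("two-step  (const, x, mills):   ", [round(b, 3) for b in two_step])
```

In this design the naive slope is pulled away from 0.5, while the two-step slope should be close to it and the coefficient on the inverse Mills ratio should be close to the assumed covariance 0.7. The sketch reports point estimates only; the plain OLS standard errors of the second step would not be valid.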

Testing selectivity
If the error terms $\varepsilon$ and $\nu$ are uncorrelated, then the selection problem is ignorable.
$H_0: \sigma_{\varepsilon\nu} = 0$
Testing $H_0$ is equivalent to testing whether the coefficient of the additional variable (the inverse Mills ratio) in the earnings equation is zero, for example with a Wald test.
Notice that the errors of the second-step regression are heteroskedastic, so the standard errors must be computed with an estimator that accounts for this.
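The source of the heteroskedasticity can be written out explicitly. The expression below is a sketch under the usual assumption that the earnings error $\varepsilon$ and the participation error $\nu$ are bivariate normal with $\mathrm{Var}(\nu) = 1$; the symbols $\lambda_i$ (inverse Mills ratio) and $\rho$ (correlation between the two errors) are notation introduced here.

```latex
\lambda_i = \frac{\phi(z_i\delta)}{\Phi(z_i\delta)}, \qquad
\rho = \frac{\sigma_{\varepsilon\nu}}{\sigma_\varepsilon}, \qquad
\mathrm{Var}(\varepsilon_i \mid d_i = 1, z_i)
  = \sigma_\varepsilon^2 \left[ 1 - \rho^2 \, \lambda_i \left( \lambda_i + z_i\delta \right) \right]
```

The conditional variance depends on $z_i$ through $\lambda_i$ unless $\rho = 0$, so the second-step errors are homoskedastic only under the null hypothesis of no selection.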

Generalized Tobit: Maximum Likelihood Estimation
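For reference, the log-likelihood of the Generalized Tobit model in its standard textbook form is sketched below. It assumes an earnings equation $y_i = x_i\beta + \varepsilon_i$ observed when $d_i = 1$, a participation index $z_i\delta + \nu_i$, and bivariate normal errors with $\mathrm{Var}(\nu_i) = 1$, $\mathrm{Var}(\varepsilon_i) = \sigma_\varepsilon^2$ and correlation $\rho$; this notation is an assumption, since the formula shown on the slide is not available in the transcript.

```latex
\ln L(\beta, \delta, \sigma_\varepsilon, \rho)
  = \sum_{i:\, d_i = 0} \ln \Phi(-z_i\delta)
  + \sum_{i:\, d_i = 1} \left[
      -\ln \sigma_\varepsilon
      + \ln \phi\!\left( \frac{y_i - x_i\beta}{\sigma_\varepsilon} \right)
      + \ln \Phi\!\left(
          \frac{ z_i\delta + \rho \, (y_i - x_i\beta)/\sigma_\varepsilon }
               { \sqrt{1 - \rho^2} }
        \right)
    \right]
```

The first sum is the contribution of non-participants, for whom only $d_i = 0$ is known. The second sum combines the density of the observed earnings with the probability of participating given those earnings.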

heckman
The heckman command estimates the Generalized Tobit model (Tobit of the 2nd type) by ML (the default) or by the two-step method (option twostep):
heckman y x1 x2 … xk, select(z1 z2 … zs)
heckman y x1 x2 … xk, select(d = z1 z2 … zs)
heckman y x1 x2 … xk, select(z1 z2 … zs) twostep
When no selection indicator d is given in select(), observations with missing y are treated as not selected.


Joint model for log-earnings and response probability
Possible candidates for x: education dummies, age, work experience.
d* is the propensity to respond to the earnings question.
Possible candidates for z: mode of interview, education, gender, age, etc.

Item nonresponse for income equation or poverty model in cross-section sample surveys
Potential explanatory variables:
– Socio-demographic variables: age, gender, level of education, number of adults, number of children.
– Situational economic circumstances: labour activity status.
– Data collection characteristics: mode of the interview, number of visits, duration of the interview. (These are plausible IVs.)

Attrition in panel surveys has two possible causes: failed contact and refusal
The potential variables explaining attrition (contact and cooperation) are lagged variables observed in the last wave. The equation of interest also has to use lagged variables (otherwise we have missing explanatory variables too).
– Socio-demographic variables: age, gender, level of education, number of adults, number of children.
– Social integration: talking often to neighbours, cohabitation, house ownership.
– Situational economic circumstances: labour activity status, household equivalised income.
– Data collection characteristics: mode of the interview, number of visits, duration of the interview, same interviewer across waves, duration of the panel, length of the fieldwork. (These are plausible IVs.)

How to use weights in Stata
Most Stata commands can deal with weighted data. Stata allows four kinds of weights:
1. fweights, or frequency weights, are weights that indicate the number of duplicated observations.
2. pweights, or sampling weights, are weights that denote the inverse of the probability that the observation is included due to the sampling design, nonresponse or sample selection.
3. aweights, or analytic weights, are weights that are inversely proportional to the variance of an observation; i.e., the variance of the j-th observation is assumed to be sigma^2/w_j, where w_j are the weights.
4. iweights, or importance weights, are weights that indicate the "importance" of the observation in some vague sense.

Option pweights
Sample surveys usually provide weights that take account of the sampling design and of nonresponse. Let p be the individual weight. Then we can run a regression with weighted observations:
regress y x1 x2 … xk [pweight=p]
Now assume we have a random sample affected by nonresponse, but weights that take account of unit nonresponse are not available. A possible way to estimate your own weights is the following, where d is the response indicator:
probit d z1 z2 … zs
predict prop
gen invprop=1/prop
reg y x1 x2 … xk [pweight=invprop]
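The logic of inverse-propensity weights can be sketched in Python with the standard library. The example simplifies the procedure: instead of a probit on several variables it uses a single binary characteristic `g` and estimates the response propensity by the response rate within each group, which is the fitted probability a probit on one dummy would also give. The data, the group effect on y and the response probabilities are illustrative assumptions.

```python
import random

rng = random.Random(7)

# Simulated sample: g is a binary characteristic, y the variable of interest,
# and responds indicates unit response. All numeric values are assumptions.
sample = []
for _ in range(20_000):
    g = 1 if rng.random() < 0.5 else 0
    y = rng.gauss(10.0 + 4.0 * g, 1.0)                 # group 1 has higher y
    responds = rng.random() < (0.45 if g else 0.90)    # group 1 responds less
    sample.append((g, y, responds))

# Estimated response propensity: response rate within each group.
prop = {}
for group in (0, 1):
    cell = [r for g, _, r in sample if g == group]
    prop[group] = sum(cell) / len(cell)

respondents = [(g, y) for g, y, r in sample if r]
weights = [1.0 / prop[g] for g, _ in respondents]      # inverse propensity

full_mean = sum(y for _, y, _ in sample) / len(sample)
unweighted = sum(y for _, y in respondents) / len(respondents)
weighted = sum(w * y for w, (_, y) in zip(weights, respondents)) / sum(weights)

print(f"mean of y, full sample (not observable in practice): {full_mean:.3f}")
print(f"mean of y, respondents, unweighted:                  {unweighted:.3f}")
print(f"mean of y, respondents, inverse-propensity weighted: {weighted:.3f}")
```

Because the low-response group has higher y in this design, the unweighted respondent mean is too low, and weighting each respondent by the inverse of the estimated propensity moves it back towards the full-sample mean. The correction works only to the extent that response depends on observed characteristics.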

For complex survey designs it is better to use
svyset [pweight=p]
svy: regress y x1 x2 … xk
svyset has options for cluster sampling and other complex designs. To declare a survey design with strata:
svyset [pweight=p], strata(stratid)

Stata propensity score methods for evaluation of treatment
Abadie A., Drukker D., Herr J.L., Imbens G.W. (2001), Implementing Matching Estimators for Average Treatment Effects in Stata, The Stata Journal, 1,
Becker S.O., Ichino A. (2002), Estimation of average treatment effects based on propensity scores, The Stata Journal, 2,
Sianesi B. (2001), Implementing Propensity Score Matching Estimators with STATA, UK Stata Users Group, VII Meeting London,