1 Rong Zhang, Brett Inder, Xibin Zhang
Bayesian Analysis of A Model with Binary Selectivity and Ordered Outcomes Rong Zhang, Brett Inder, Xibin Zhang Good morning, everyone. Today my topic is ... This is joint work with Brett Inder and Xibin Zhang. Department of Econometrics and Business Statistics, Monash University

2 1. Introduction & The Model
Outline 1. Introduction & The Model 2. Bayesian Estimation 3. Numerical Study 4. Application My presentation comprises four parts. First, I would like to introduce our motivation, the existing sample selection models and estimation methods, as well as our model. Second, the Bayesian estimation will be presented, including how we reparameterize our model and design certain priors to obtain a Gibbs sampler. Third, a numerical study compares the results of Bayesian estimation and FIML estimation, since MLE is a benchmark among estimation methods. In the end, the Bayesian estimation is applied to some empirical work about mental illness and the labor market.

3 Introduction & The Model

4 Introduction: Motivation
This model is mainly motivated by empirical work in labor economics where selectivity bias often exists. Relationship between employment status and occupational categories. Our model is mainly motivated by empirical work in labor economics where selectivity bias often exists. For example, we are interested in the relationship between employment status and occupational categories. Here employment status is indicated by a 0/1 binary choice, while occupational categories are discrete and ordinal. When a person is employed, we can observe his or her occupational category. When the person is not employed, we cannot observe the occupational level. That is how selectivity bias occurs. Meanwhile, endogeneity may arise when unobserved factors affect both employment and occupations.

5 Introduction: Existing Models
Sample Selection Models Heckman (1979) Boyes et al. (1989) Other researchers have already done some research in similar circumstances. For instance, the standard sample selection model was proposed by Heckman in 1979. Heckman's model arose from interest in female wages, taking account of selection via a labor supply equation. Namely, Heckman's selection equation is a 0/1 binary choice equation, but the main equation is a continuous equation for female income. In addition, Boyes et al. analyzed the banking scoring problem and extended the sample selection model into a bivariate probit model censored in one direction. The selection is decided by whether an individual is granted a loan, while the censoring equation is determined by whether the individual defaults on the loan. Our model also has a 0/1 binary selection equation, but the occupational categories are the dependent variable in the main equation, so we use an ordered probit in the main equation instead of the continuous equation in Heckman's model.

6 Introduction: Our Model
Our model can be expressed as a system with latent variables. In equation 1, zi1 and zi2 are latent variables. The error terms in the two equations are assumed to be correlated with each other, and both error variances are set to one for identification. In equation 3, when the latent variable zi1 > 0, yi1 takes the value 1 and the person is employed. We then observe the occupational category j when the latent variable zi2 lies between the threshold parameters gamma_{j-1} and gamma_j. Otherwise, when zi1 < 0, the person is not employed and yi1 takes the value 0; no matter what the value of zi2 is, yi2 is also set to 0. In other words, we observe employment status 1 and occupational category j when a person is employed, and we give zero values to both employment and occupation when a person is unemployed.
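The slide equations did not survive extraction, so here is a sketch of the latent-variable system as described in the transcript (the specific regressor and coefficient notation is assumed):

```latex
\begin{aligned}
z_{i1} &= x_{i1}'\beta_1 + \varepsilon_{i1}, \qquad
z_{i2} = x_{i2}'\beta_2 + \varepsilon_{i2}, \qquad
\begin{pmatrix}\varepsilon_{i1}\\ \varepsilon_{i2}\end{pmatrix}
\sim N\!\left(\mathbf{0},\;\begin{pmatrix}1 & \rho \\ \rho & 1\end{pmatrix}\right),\\[4pt]
y_{i1} &= \mathbf{1}(z_{i1} > 0), \qquad
y_{i2} =
\begin{cases}
j, & \text{if } y_{i1}=1 \text{ and } \gamma_{j-1} < z_{i2} \le \gamma_j,\\
0, & \text{if } y_{i1}=0,
\end{cases}
\qquad j = 1,\dots,J.
\end{aligned}
```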

7 Introduction: Estimation Methods
Heckman’s (1979) Two-step Estimation Full Information Maximum Likelihood (FIML) Estimation by Greene (2006) Semi-parametric Methods (Vella, 1998) Our Contribution Bayesian Estimation To estimate sample selection models, we can use Heckman's two-step estimation. The two-step method is quite straightforward and easy to conduct, but it sometimes lacks efficiency. Moreover, the two-step method is usually used when the main equation is continuous, and it cannot easily be extended to general nonlinear models. Alternatively, we can try full information maximum likelihood estimation. William Greene in 2006 summarized the main idea of FIML estimation for four types of sample selection models. As we all know, MLE occasionally runs into convergence problems. Besides that, researchers such as Vella have also applied some semi-parametric methods. In our paper, we use a Bayesian method to estimate the parameters of our model. Since Bayesian methods and MLE sit in different frameworks, the Bayesian method can be treated as an alternative to FIML estimation. Moreover, Bayesian estimation can provide much richer information, including the distributions of the parameters or of any probabilities of interest.

8 Bayesian Estimation

9 Reminder: Our Model Just to recall our model. The latent variables zi1 and zi2 follow a bivariate normal distribution given the x and beta values. Given the latent variables, the likelihood of Y is an indicator function, equal to 0 or 1. 9

10 Bayesian Estimation: Joint Posterior of Parameters and Latent Variables
The starting point of Bayesian methods is that the posterior is proportional to the priors times the likelihood function. Instead of the joint posterior of the parameters alone, we consider the joint posterior of the parameters and the latent variables. In equation 4, step 1, we apply Bayes' theorem to the priors of the parameters and latent variables. Then in step 2, Bayes' theorem is applied again to the conditional likelihood function. Since individuals are assumed to be independent of each other, we obtain the third step in equation 4. After plugging all the probability functions into equation 4, we get the full expression of the joint posterior in equation 5. Based on this joint posterior, we can work out the conditional posterior distribution for each parameter and latent variable. However, if we sample rho directly from this posterior, we cannot get a standard form for it. And the common ways to sample the threshold parameters can lead to very slow mixing of the whole algorithm. That is why we apply some techniques to get a more efficient Gibbs sampler. 10
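One plausible reconstruction of equation 4, whose displayed math did not survive extraction (here θ collects β1, β2, γ and ρ, and the exact conditioning notation is assumed):

```latex
p(\theta, z \mid y)
\;\propto\; p(\theta, z)\, p(y \mid z, \theta)
\;=\; p(\theta)\, p(z \mid \theta)\, p(y \mid z, \theta)
\;=\; p(\theta) \prod_{i=1}^{n} p(z_{i1}, z_{i2} \mid \theta)\,
      p(y_{i1}, y_{i2} \mid z_{i1}, z_{i2}, \theta).
```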

11 Bayesian Estimation: Reparameterization
We first reparameterize our model. This new model is exactly the same system as the original one, but the second equation is standardized by dividing by the largest threshold parameter gamma_{J-1}. So all the new threshold parameters lie between 0 and 1. Meanwhile, the variance in the second equation is no longer one. In this way, we eliminate the two parameters gamma_{J-1} and rho by introducing two new parameters lambda and phi, as shown in the covariance matrix. 11
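One plausible form of this reparameterization, since the slide's exact definitions of lambda and phi did not survive extraction (the mappings below are an assumption consistent with the description):

```latex
z_{i2}^{*} = \frac{z_{i2}}{\gamma_{J-1}}, \qquad
\gamma_j^{*} = \frac{\gamma_j}{\gamma_{J-1}} \in (0,1), \qquad
\operatorname{Cov}\begin{pmatrix}\varepsilon_{i1}\\ \varepsilon_{i2}^{*}\end{pmatrix}
= \begin{pmatrix} 1 & \lambda \\ \lambda & \varphi \end{pmatrix},
\quad \text{with } \lambda = \frac{\rho}{\gamma_{J-1}}, \;\;
\varphi = \frac{1}{\gamma_{J-1}^{2}}.
```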

12 Joint Posterior of New Parameters and Latent Variables
Bayesian Estimation: Joint Posterior of New Parameters and Latent Variables The new joint posterior equals the Jacobian times the original joint posterior. We can obtain the original joint posterior from equation 5. As I just mentioned, the conditional posterior of rho has no standard form. With specially designed priors, we can get standard forms for the new parameters lambda and phi. We plug all the proper priors into function 5 and multiply by the Jacobian to get equation 6. From equation 6, we can work out the priors for the new parameters. Once again, I want to emphasize that the priors are specially designed here so that conjugate conditional posteriors can be obtained from equation 6, as follows. 12

13 Bayesian Estimation: Conditional Posteriors
Here TN denotes a univariate truncated normal distribution. Equation 7 shows that the latent variable zi1 follows a normal distribution truncated to a certain region. The new latent variable zi2* also has a truncated normal distribution when the observation is uncensored, while the censored one has a normal distribution. Then beta has a Gaussian density, lambda follows a normal distribution, and phi follows an inverse gamma distribution. The conditional posterior of the threshold parameters gamma* still has a complicated form, so a Metropolis-Hastings step is used with a Dirichlet proposal density. After we obtain all the conditional posteriors, we sample the latent variables and parameters one by one and repeat the process many times until the Markov chains converge to the stationary distribution, which is the joint posterior. 13
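To make the sampling scheme concrete, here is a minimal sketch of the two key Gibbs steps for a simplified univariate probit version of the selection equation (Albert-Chib style data augmentation); the full bivariate sampler with the lambda, phi and gamma* steps is more involved, and the prior precision below is a placeholder:

```python
import numpy as np
from scipy.stats import truncnorm

def draw_latent(y, mean):
    """Draw z_i ~ N(mean_i, 1) truncated to (0, inf) if y_i = 1, else (-inf, 0]."""
    z = np.empty_like(mean, dtype=float)
    for i, (yi, m) in enumerate(zip(y, mean)):
        if yi == 1:
            a, b = -m, np.inf   # standardized bounds: support (0, inf)
        else:
            a, b = -np.inf, -m  # standardized bounds: support (-inf, 0)
        z[i] = truncnorm.rvs(a, b, loc=m, scale=1.0)
    return z

def draw_beta(X, z, prior_prec=1e-2):
    """Conjugate Gaussian draw for beta given latent z (unit error variance,
    zero-mean normal prior with placeholder precision)."""
    prec = X.T @ X + prior_prec * np.eye(X.shape[1])
    cov = np.linalg.inv(prec)
    mean = cov @ (X.T @ z)
    return np.random.multivariate_normal(mean, cov)
```

Alternating these two draws many times yields a Markov chain whose stationary distribution is the joint posterior of beta and the latent variables.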

14 I will soon illustrate the proposed Gibbs sampler using a simulation study.
Numerical Study

15 Reminder: Our Model The true values of the parameters are set up in advance. Beta1 and beta2 are 2x1 vectors. I set up only one threshold parameter gamma2, and rho = 0.5. Following this original model, we generate one sample with a sample size of 1,000. 15
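A sketch of how such a sample could be generated; the beta values and the threshold below are hypothetical placeholders, since the slide's true values did not survive extraction (only rho = 0.5, 2x1 coefficient vectors, and one free threshold gamma2 are stated in the transcript):

```python
import numpy as np

rng = np.random.default_rng(42)

n = 1000
beta1 = np.array([0.5, 1.0])   # hypothetical true values (not from the slide)
beta2 = np.array([1.0, -0.5])  # hypothetical true values (not from the slide)
gamma = np.array([0.0, 1.5])   # gamma_1 fixed at 0, gamma_2 = 1.5 (placeholder)
rho = 0.5                      # true correlation, as stated in the transcript

x1 = rng.standard_normal((n, 2))
x2 = rng.standard_normal((n, 2))

# correlated standard-normal errors with corr(e1, e2) = rho
cov = np.array([[1.0, rho], [rho, 1.0]])
e = rng.multivariate_normal(np.zeros(2), cov, size=n)

z1 = x1 @ beta1 + e[:, 0]
z2 = x2 @ beta2 + e[:, 1]

y1 = (z1 > 0).astype(int)
# ordered outcome observed only when selected; categories 1..3 here
y2 = np.where(y1 == 1, np.searchsorted(gamma, z2) + 1, 0)
```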

16 Numerical Study Then we apply the Gibbs sampler to estimate the parameters. 3,000 initial draws are discarded as the burn-in period, and the next 5,000 iterations are recorded. Based on these 5,000 iterations, we calculate statistics such as the mean, standard deviation, credible interval and SIF. From the table, we can see that the mean of each parameter is close to its true value. The standard deviations are relatively small, except that of rho considering its magnitude. This is consistent with the SIF values, because rho has the largest SIF value. SIF stands for simulation inefficiency factor, which is used to assess the convergence rate: the smaller the SIF value, the better the convergence. This is not surprising, as rho is the most difficult parameter to estimate. 16
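A minimal way to compute the SIF from a chain of draws (this plain truncated-sum version is one common variant; papers often taper or window the autocorrelation sum):

```python
import numpy as np

def sif(draws, max_lag=100):
    """Simulation inefficiency factor: 1 + 2 * sum of autocorrelations.
    A value near 1 means the draws behave like an i.i.d. sample;
    larger values indicate slower mixing."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = x @ x / n
    acf = np.array([(x[:n - k] @ x[k:]) / (n * var)
                    for k in range(1, max_lag + 1)])
    return 1.0 + 2.0 * acf.sum()
```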

17 Numerical Study If we check the sample paths and ACFs, the results are still consistent with the SIF values. The sample paths are stationary, although that of rho fluctuates the most. 17

18 Numerical Study From the ACFs, the autocorrelations of all parameters decay fast, although those of rho decay a bit more slowly than the others. Overall, the convergence is quite good. 18

19 Numerical Study: Monte Carlo Simulation
To examine the performance of the proposed Gibbs sampler, a Monte Carlo simulation is conducted. The sample size is still 1,000 and we generate 100 samples. For each sample we apply both the MCMC and FIML estimators to obtain two sets of estimates, and then report statistics such as the mean, max/min values and other error measures. From Table 2, the results of MCMC are very close to the results of FIML. The estimated means are all close to the true values. No extreme values exist for either method. The error measures are all relatively small. This supports the Bayesian method as an alternative to FIML. 19
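The replication statistics described above can be computed with a small helper like this (the exact set of statistics reported in Table 2 is assumed; bias and RMSE are typical error measures):

```python
import numpy as np

def mc_summary(estimates, truth):
    """Summarize Monte Carlo estimates of one parameter across replications."""
    est = np.asarray(estimates, dtype=float)
    return {
        "mean": est.mean(),
        "min": est.min(),
        "max": est.max(),
        "bias": est.mean() - truth,                     # mean error
        "rmse": np.sqrt(np.mean((est - truth) ** 2)),   # root mean squared error
    }
```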

20 An Application about Mental Illness and Labor Market
We then apply our Bayesian method to some empirical work about mental illness and the labour market. An Application about Mental Illness and Labor Market 20

21 From the 1997 National Survey of Mental Health and Wellbeing of Adults
An Application: From the 1997 National Survey of Mental Health and Wellbeing of Adults The data are from the 1997 National Survey of Mental Health and Wellbeing of Adults. The explanatory variables include gender, age, education levels, geographic factors and health conditions such as mental and physical disorders. yi1 is a binary choice about employment status, and yi2 is discrete from 1 to 5. When an individual is employed, we observe one of five skill levels: elementary, intermediate, advanced, associate professional and professional. 21

22 An Application: Summary Statistics
Table 3 shows that a small proportion of respondents (1.78%) have short-term mental health disorders, and most of them have only one disorder. From Table 4, however, it is obvious that 17.93% of respondents suffer long-term mental health problems, while 4.84% suffer more than two mental health disorders. 22

23 An Application: Parameter Estimation
After the Bayesian estimation, we obtain the parameter statistics. The 95% credible interval, corresponding to a 5% significance level, is used to assess significance. Regarding the mental health variables, it seems that short-term mental disorders have no effect on either employment or occupations, because zero is contained in the 95% credible intervals. The effects of this variable may simply be too difficult to detect, as only 1.78% of respondents have short-term mental illness. Meanwhile, long-term mental disorders affect employment but not occupational levels. 23
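The significance check described here, zero falling outside the 95% equal-tailed credible interval, can be sketched as:

```python
import numpy as np

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, 1 - (1 - level) / 2])
    return lo, hi

def excludes_zero(draws, level=0.95):
    """'Significant' in the sense used on the slide: zero lies outside the interval."""
    lo, hi = credible_interval(draws, level)
    return lo > 0 or hi < 0
```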

24 An Application: Distributions of NUMMHDLT in Employment Equation
This figure shows the distributions of NUMMHDLT in the employment equation, including one estimated coefficient density and five marginal-effect densities. The coefficient density looks quite normally distributed with a negative mean. The negative sign is as expected, because more serious mental problems result in a lower probability of employment. On the other hand, all the marginal-effect distributions show slight skewness. From those distributions, we can see that once a person is employed, mental problems increase the probability of a low-level occupation and decrease the probability of a high-level occupation. 24

25 Conclusion Bayesian Approach for A Specific Model
Bayesian Approach is as Reliable as FIML Better Overview of the Distributions of Parameters and Probabilities In conclusion, our paper has provided a Bayesian approach for a specific model with binary selection and ordered outcome observations. The numerical study shows that the Bayesian approach is as reliable as FIML estimation. In particular, the Bayesian method provides a better overview of the distributions of the parameters and marginal effects. 25

