Implementation of a double-hurdle model
Bruno Garcia
The Stata Journal (2013), 13, Number 4, pp. 776-794
Presented by Gulzat
The paper is about: the double-hurdle model (DHM) (Cragg, 1971, Econometrica 39: 829-844). What is new: the Stata command dblhurdle (and predict after dblhurdle).
Censored dependent variable models. E.g., whether an individual is a consumer or not; if a consumer, the value of the expenditure is known. Tobit: assumes that the factors explaining whether someone becomes a consumer and how much they spend have the same effect on these two decisions. DHM: allows these effects to differ.
Tobit Model: $Y_i = Y_i^*$ if $Y_i^* > 0$ and $Y_i = 0$ if $Y_i^* \le 0$, where $Y_i^* = X_i\beta + \varepsilon_i$ and $\varepsilon_i \sim N(0, \sigma^2)$. Two outcomes (whether $Y_i$ is positive, and its value when it is) are explained by one model with a single index $X_i\beta$.
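To make the single-index restriction concrete, here is a minimal Python sketch of the Tobit log-likelihood (not Stata, and not code from the paper). The function names `norm_cdf`, `norm_logpdf`, and `tobit_loglik` are illustrative; the inputs are assumed to be precomputed indices $X_i\beta$ and a given $\sigma$.

```python
import math

def norm_cdf(t: float) -> float:
    """Standard normal CDF, Phi(t), written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def norm_logpdf(t: float) -> float:
    """Log of the standard normal density, log phi(t)."""
    return -0.5 * t * t - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(y, xb, sigma):
    """Tobit log-likelihood with a corner at zero.

    y     : observed outcomes (0 for the censored observations)
    xb    : linear index X_i * beta for each observation
    sigma : standard deviation of epsilon
    """
    ll = 0.0
    for yi, xbi in zip(y, xb):
        if yi <= 0:
            # P(Y* <= 0) = P(eps <= -xb) = Phi(-xb / sigma)
            ll += math.log(norm_cdf(-xbi / sigma))
        else:
            # density of Y* at yi: phi((yi - xb) / sigma) / sigma
            ll += norm_logpdf((yi - xbi) / sigma) - math.log(sigma)
    return ll
```

The same index $X_i\beta$ drives both the probability of a zero and the density of the positive values, which is exactly the restriction the DHM relaxes.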
Double Hurdle Model
1. Potential consumer or not ($D_i$ is not observed): $D_i = 1$ if $Z_i\delta + u_i > 0$, and $D_i = 0$ if $Z_i\delta + u_i \le 0$.
2. $Y_i^* = X_i\beta + \varepsilon_i$; $Y_i = Y_i^*$ if $D_i = 1$ and $Y_i^* > 0$, and $Y_i = 0$ otherwise (i.e., if $D_i = 0$, or if $D_i = 1$ and $Y_i^* \le 0$).
$u_i \sim N(0, 1)$, $\varepsilon_i \sim N(0, \sigma^2)$, $\operatorname{corr}(u_i, \varepsilon_i) = \rho$: unobserved factors affecting whether someone is a consumer may also affect the amount of expenditure. Individuals make decisions in two steps.
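The two-step rule above can be simulated to see that zeros arise from two different sources. The sketch below is illustrative Python (not part of dblhurdle). It assumes the indices $Z_i\delta$ and $X_i\beta$ are given, and builds correlated errors from two independent standard normals. `simulate_dhm` is a hypothetical helper name.

```python
import math
import random

def simulate_dhm(z_delta, x_beta, sigma, rho, seed=0):
    """Draw (D_i, Y_i*, Y_i) for each observation under the double-hurdle model.

    z_delta, x_beta : lists with the indices Z_i*delta and X_i*beta
    sigma           : standard deviation of epsilon
    rho             : corr(u, epsilon), with |rho| < 1
    """
    rng = random.Random(seed)
    rows = []
    for zd, xb in zip(z_delta, x_beta):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        u = e1  # u ~ N(0, 1)
        # eps ~ N(0, sigma^2) and corr(u, eps) = rho by construction
        eps = sigma * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
        d = 1 if zd + u > 0 else 0  # first hurdle: participation (unobserved)
        y_star = xb + eps  # latent quantity
        y = y_star if (d == 1 and y_star > 0) else 0.0  # observed outcome
        rows.append((d, y_star, y))
    return rows

rows = simulate_dhm([0.0] * 1000, [0.5] * 1000, sigma=1.0, rho=0.3)
zeros_nonparticipants = sum(1 for d, _, y in rows if d == 0)
zeros_corner = sum(1 for d, ys, _ in rows if d == 1 and ys <= 0)
```

The two counts at the end separate zeros from non-participants ($D_i = 0$) from zeros among participants whose latent quantity is non-positive. The observed data cannot tell these apart, which is why the likelihood has to integrate over both.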
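Double Hurdle Model (following the paper)
Decision 1: participation. Decision 2: quantity (possibly zero).
$y_i$ = the observed consumption of an individual. The dependent variable is continuous over positive values, but $P(y = 0) > 0$ and $P(y < 0) = 0$.
$$y_i = \begin{cases} x_i\beta + \epsilon_i & \text{if } \min(x_i\beta + \epsilon_i,\; z_i\gamma + u_i) > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$\begin{pmatrix} u_i \\ \epsilon_i \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & \sigma_{12} \\ \sigma_{12} & \sigma^2 \end{pmatrix}$$
$\Psi(x, y, \rho)$ = CDF of a bivariate normal with correlation $\rho$.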
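$\Psi$ has no closed form, so any implementation has to evaluate it numerically. In Stata the built-in `binormal()` function serves this purpose. The sketch below is a pure-Python stand-in chosen for illustration, not what dblhurdle does internally. It uses the identity $\Psi(a, b, \rho) = \int_{-\infty}^{a} \phi(t)\,\Phi\!\left(\frac{b - \rho t}{\sqrt{1-\rho^2}}\right) dt$ and Simpson's rule. The lower cutoff of $-10$ and the grid size are assumptions.

```python
import math

def norm_cdf(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def norm_pdf(t):
    """Standard normal density, phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def binorm_cdf(a, b, rho, n=2000):
    """Psi(a, b, rho) = P(U <= a, V <= b) for standard normals U, V with corr rho.

    Integrates phi(t) * Phi((b - rho*t) / sqrt(1 - rho^2)) over t in [-10, a]
    with composite Simpson's rule; requires |rho| < 1.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("need |rho| < 1")
    lower = -10.0  # phi(t) is negligible below this point
    if a <= lower:
        return 0.0
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    s = math.sqrt(1.0 - rho * rho)

    def integrand(t):
        return norm_pdf(t) * norm_cdf((b - rho * t) / s)

    h = (a - lower) / n
    total = integrand(lower) + integrand(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0
```

A quick sanity check: with $\rho = 0$ the result factors as $\Phi(a)\Phi(b)$, and $\Psi(0, 0, \rho) = \tfrac14 + \tfrac{\arcsin\rho}{2\pi}$.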
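Double Hurdle Model
The log-likelihood function for the DHM ($\Phi$: standard normal CDF, $\phi$: standard normal density, $\Psi$: bivariate normal CDF):
$$\log L = \sum_{y_i = 0} \log\left[1 - \Psi\left(z_i\gamma, \frac{x_i\beta}{\sigma}, \rho\right)\right] + \sum_{y_i > 0} \left\{ \log \Phi\left(\frac{z_i\gamma + \frac{\rho}{\sigma}(y_i - x_i\beta)}{\sqrt{1 - \rho^2}}\right) - \log\sigma + \log\phi\left(\frac{y_i - x_i\beta}{\sigma}\right) \right\}$$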
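The log-likelihood above can be written term by term as a short Python function. This is a sketch for illustration only: it evaluates $\log L$ at given parameter values and does not maximize it, and it is not the dblhurdle code. $\Psi$ is computed by the same Simpson's-rule integration as a stand-in for a proper bivariate-normal routine, and all names are hypothetical.

```python
import math

def norm_cdf(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def norm_logpdf(t):
    """Log of the standard normal density, log phi(t)."""
    return -0.5 * t * t - 0.5 * math.log(2.0 * math.pi)

def binorm_cdf(a, b, rho, n=2000):
    """Psi(a, b, rho) via Simpson's rule on phi(t) * Phi((b - rho t)/sqrt(1 - rho^2))."""
    lower = -10.0
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def integrand(t):
        return math.exp(norm_logpdf(t)) * norm_cdf((b - rho * t) / s)

    h = (a - lower) / n  # n is even
    total = integrand(lower) + integrand(a)
    total += sum((4 if k % 2 else 2) * integrand(lower + k * h) for k in range(1, n))
    return total * h / 3.0

def dhm_loglik(y, xb, zg, sigma, rho):
    """Double-hurdle log-likelihood.

    y : observed outcomes; xb : x_i*beta; zg : z_i*gamma;
    sigma : sd of epsilon; rho : corr(u, epsilon), |rho| < 1.
    """
    ll = 0.0
    for yi, xbi, zgi in zip(y, xb, zg):
        if yi == 0:
            # corner: 1 - P(both hurdles crossed)
            ll += math.log(1.0 - binorm_cdf(zgi, xbi / sigma, rho))
        else:
            r = yi - xbi  # implied epsilon_i
            ll += (math.log(norm_cdf((zgi + rho / sigma * r) / math.sqrt(1.0 - rho * rho)))
                   - math.log(sigma) + norm_logpdf(r / sigma))
    return ll
```

With $\rho = 0$ the zero term reduces to $\log[1 - \Phi(z_i\gamma)\Phi(x_i\beta/\sigma)]$, which makes a convenient check.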
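Double Hurdle Model
$x_i\beta + \epsilon_i$ models the quantity equation; $z_i\gamma + u_i$ models the participation equation. The command estimates $\beta$, $\gamma$, $\rho$, and $\sigma$, where $\sigma^2 = \operatorname{Var}(\epsilon)$. Restriction: $\operatorname{Var}(u) = 1$ is required for the model to be identified.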
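The estimated $\rho$ connects to $\Sigma$ as shown below, and the same connection explains both terms of the log-likelihood. The following worked derivation is a sketch from the definitions on these slides. It is not reproduced from the paper.

```latex
% correlation implied by Sigma, using Var(u) = 1 and Var(epsilon) = sigma^2
\rho = \frac{\sigma_{12}}{\sqrt{1 \cdot \sigma^2}} = \frac{\sigma_{12}}{\sigma}
\quad\Longrightarrow\quad \sigma_{12} = \rho\,\sigma

% y_i > 0 term: conditional distribution of u given epsilon
u_i \mid \epsilon_i \sim N\!\left(\frac{\sigma_{12}}{\sigma^2}\,\epsilon_i,\; 1 - \frac{\sigma_{12}^2}{\sigma^2}\right)
  = N\!\left(\frac{\rho}{\sigma}\,\epsilon_i,\; 1 - \rho^2\right)

P\left(z_i\gamma + u_i > 0 \mid \epsilon_i = y_i - x_i\beta\right)
  = \Phi\!\left(\frac{z_i\gamma + \frac{\rho}{\sigma}(y_i - x_i\beta)}{\sqrt{1-\rho^2}}\right)

% y_i = 0 term: (-u_i, -epsilon_i / sigma) is standard bivariate normal with correlation rho
P(y_i > 0) = P\left(-u_i < z_i\gamma,\; -\tfrac{\epsilon_i}{\sigma} < \tfrac{x_i\beta}{\sigma}\right)
  = \Psi\!\left(z_i\gamma, \tfrac{x_i\beta}{\sigma}, \rho\right)
```

Multiplying the conditional probability by the density of $\epsilon_i$ at $y_i - x_i\beta$, namely $\frac{1}{\sigma}\phi\left(\frac{y_i - x_i\beta}{\sigma}\right)$, and taking logs gives the $y_i > 0$ term. Taking $\log[1 - P(y_i > 0)]$ gives the $y_i = 0$ term.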
Double Hurdle Model: Stata
Double Hurdle Model
Example: The use of the dblhurdle command using smoke.dta from Wooldridge (2010).
Marginal effects of the number of years of schooling (educ) on: 1. the probability of smoking; 2. the expected number of cigarettes smoked, given that the individual smokes; 3. the expected number of cigarettes smoked (not conditioning on smoking).
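Prediction
ppar: the probability of being away from the corner, conditional on the covariates, $P(y_i > 0 \mid x_i, z_i)$
ycond: the expectation of $y$ conditional on being away from the corner, $E(y_i \mid y_i > 0, x_i, z_i)$
yexpected: the expected value of $y$ conditional on $x$ and $z$, $E(y_i \mid x_i, z_i)$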
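The formulas for these predictions are not reproduced here for the general case. With correlated errors they involve the bivariate normal. In the special case $\rho = 0$ they have simple closed forms, and the Python sketch below computes them for one observation. The $\rho = 0$ restriction is an assumption made for illustration. The function name is hypothetical, and this is not the output of `predict` after dblhurdle.

```python
import math

def norm_cdf(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def norm_pdf(t):
    """Standard normal density, phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def dhm_predictions_rho0(xb, zg, sigma):
    """Return (ppar, ycond, yexpected) for one observation, ASSUMING rho = 0.

    ppar      = P(y > 0 | x, z)    = Phi(z*gamma) * Phi(c),  c = x*beta / sigma
    ycond     = E(y | y > 0, x, z) = x*beta + sigma * phi(c) / Phi(c)
    yexpected = E(y | x, z)        = ppar * ycond
    """
    c = xb / sigma
    ppar = norm_cdf(zg) * norm_cdf(c)
    # with rho = 0, participation carries no information about epsilon,
    # so ycond is the mean of a normal truncated at zero
    ycond = xb + sigma * norm_pdf(c) / norm_cdf(c)
    return ppar, ycond, ppar * ycond
```

Because $E(y \mid x, z) = P(y > 0 \mid x, z)\,E(y \mid y > 0, x, z)$, yexpected always factors into the other two predictions. This holds for any $\rho$, but only the $\rho = 0$ pieces are computed here.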
Marginal effects
Marginal effects
Marginal effects
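The three marginal effects of educ can be obtained by differentiating the three predictions, taking into account that educ may enter both equations. The Python sketch below does this with central finite differences. It assumes $\rho = 0$ so that the closed-form predictions apply. All inputs (`xb0`, `zg0`, `b_educ`, `g_educ`) are hypothetical placeholders, not estimates from smoke.dta.

```python
import math

def norm_cdf(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def norm_pdf(t):
    """Standard normal density, phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def predictions(xb, zg, sigma):
    """(P(y>0), E[y|y>0], E[y]) for the double-hurdle model, ASSUMING rho = 0."""
    c = xb / sigma
    ppar = norm_cdf(zg) * norm_cdf(c)
    ycond = xb + sigma * norm_pdf(c) / norm_cdf(c)
    return ppar, ycond, ppar * ycond

def educ_marginal_effects(xb0, zg0, b_educ, g_educ, educ, sigma, h=1e-5):
    """Central-difference derivatives of the three predictions w.r.t. educ.

    xb0, zg0       : parts of x*beta and z*gamma that do not involve educ
    b_educ, g_educ : coefficients on educ in the quantity and participation equations
    """
    def at(e):
        return predictions(xb0 + b_educ * e, zg0 + g_educ * e, sigma)

    up, down = at(educ + h), at(educ - h)
    return tuple((u - d) / (2.0 * h) for u, d in zip(up, down))

me_ppar, me_ycond, me_yexp = educ_marginal_effects(0.5, -0.2, 0.3, 0.1, 12.0, 2.0)
```

Finite differences avoid writing out the chain rule for each prediction. Analytically, for instance, $\partial\,\text{ppar}/\partial\,\text{educ} = g\,\phi(z\gamma)\Phi(c) + \frac{b}{\sigma}\Phi(z\gamma)\phi(c)$, where $b$ and $g$ are the coefficients on educ in the quantity and participation equations, which the numerical value should match.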
Monte Carlo simulation: finite-sample properties of the estimator
Three measures of performance:
1. The mean of the estimated parameters should be close to their true values.
2. The mean standard error of the estimated parameters over the repetitions should be close to the standard deviation of the point estimates.
3. The rejection rate of hypothesis tests should be close to the nominal size of the test.
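Once the replications are stored, the three measures reduce to simple summaries of the saved estimates and standard errors. The Python sketch below computes them for one parameter. The test is assumed to be a two-sided z test of $H_0\colon \theta = \theta_{\text{true}}$ with critical value 1.96 (nominal size about 5%). That choice and the function name are illustrative assumptions.

```python
import math
import statistics

def mc_performance(estimates, std_errors, true_value, crit=1.96):
    """Summarize Monte Carlo replications for one parameter.

    1. mean estimate vs. true value (reported as bias)
    2. mean standard error vs. standard deviation of the point estimates
    3. rejection rate of H0: theta = true_value, using |t| > crit
    """
    mean_est = statistics.fmean(estimates)
    rejections = sum(
        abs((b - true_value) / se) > crit for b, se in zip(estimates, std_errors)
    )
    return {
        "mean_estimate": mean_est,
        "bias": mean_est - true_value,
        "mean_se": statistics.fmean(std_errors),
        "sd_estimates": statistics.stdev(estimates),
        "rejection_rate": rejections / len(estimates),
    }

summary = mc_performance([0.9, 1.1, 1.0, 1.3], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
```

A well-behaved estimator shows bias near zero, `mean_se` close to `sd_estimates`, and `rejection_rate` close to the nominal size.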
Monte Carlo simulation The data-generating process can be summarized as follows:
Monte Carlo simulation
A dataset of 2,000 observations was created. The x’s were drawn from a standard normal distribution, and the d’s were drawn from a Bernoulli distribution with p = 1/2. Refer to this dataset as “base”.
Each iteration of the simulation:
1. Use “base”.
2. For each observation, draw (gen) $\epsilon$ from a standard normal.
3. For each observation, draw (gen) $u$ from a standard normal.
4. For each observation, compute $y$ according to the data-generating process presented above.
5. Fit the model, and save the values of interest with post.
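The steps above can be sketched as a Python loop. This mirrors the structure of the simulation only. The slide's data-generating process is not reproduced here, so the coefficient values, and the choice of d entering only the participation equation, are hypothetical placeholders. The estimator is passed in as a caller-supplied `fit` function, standing in for fitting dblhurdle and saving results with post.

```python
import random

def make_base(n=2000, seed=12345):
    """Fixed regressors reused in every replication: x ~ N(0, 1), d ~ Bernoulli(1/2)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), 1 if rng.random() < 0.5 else 0) for _ in range(n)]

def draw_y(base, rng, beta=(0.5, 1.0), gamma=(0.0, 1.0, 0.5)):
    """Steps 2-4 for one replication; beta and gamma are HYPOTHETICAL values.

    Quantity index:      beta[0] + beta[1] * x + eps
    Participation index: gamma[0] + gamma[1] * x + gamma[2] * d + u
    """
    y = []
    for x, d in base:
        eps = rng.gauss(0.0, 1.0)  # step 2: epsilon ~ N(0, 1)
        u = rng.gauss(0.0, 1.0)  # step 3: u ~ N(0, 1), independent of epsilon here
        q = beta[0] + beta[1] * x + eps
        p = gamma[0] + gamma[1] * x + gamma[2] * d + u
        y.append(q if min(q, p) > 0 else 0.0)  # step 4: double-hurdle observation rule
    return y

def run_simulation(reps, fit, seed=2013):
    """Repeat steps 1-5 `reps` times and collect what `fit(base, y)` returns.

    `fit` is a caller-supplied estimator (e.g. a wrapper around a double-hurdle
    ML routine); collecting its return values plays the role of Stata's post.
    """
    base = make_base()  # step 1: the same "base" every time
    rng = random.Random(seed)  # one stream for all error draws
    return [fit(base, draw_y(base, rng)) for _ in range(reps)]
```

Keeping "base" fixed and redrawing only the errors means the replications vary only through $\epsilon$ and $u$, so the spread of the estimates reflects sampling noise conditional on the regressors.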
Monte Carlo simulation
Monte Carlo simulation
A less intuitive issue: when the set of regressors in the participation equation is the same as the set of regressors in the quantity equation, the model is weakly identified. The data-generating process:
Monte Carlo simulation
Conclusion
Researchers may consider dblhurdle when they would otherwise use a tobit model. Its flexibility allows the researcher to break down the modeled quantity along two useful dimensions: the “quantity” dimension and the “participation” dimension. The command presented in this article allows for only a single corner in the data; one desirable feature to add is the capability to handle dependent variables with two corners.