Implementation of a double-hurdle model. Bruno Garcia. The Stata Journal (2013), 13, Number 4, pp. 776-794. Presented by Gulzat.



The paper is about: a double-hurdle model (DHM) (Cragg, 1971, Econometrica 39: 829-844). What is new: the Stata command dblhurdle (and predict after dblhurdle).

Censored dependent variable models: e.g., consumer or not; if a consumer, the value of the expenditure is known. Tobit: assumes that the factors explaining whether someone becomes a consumer and how much they spend have the same effect on both decisions. DHM: allows these effects to differ.

Tobit Model:
$$Y_i^* = X_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
$$Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* > 0 \\ 0 & \text{if } Y_i^* \le 0 \end{cases}$$
Two variables, and one model to explain both.

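Double Hurdle Model: individuals make decisions in two steps.
1. Potential consumer or not ($D_i$ is not observed):
$$D_i = \begin{cases} 1 & \text{if } Z_i\delta + u_i > 0 \\ 0 & \text{if } Z_i\delta + u_i \le 0 \end{cases}$$
2. Amount of expenditure:
$$Y_i^* = X_i\beta + \varepsilon_i, \qquad Y_i = \begin{cases} Y_i^* & \text{if } D_i = 1 \text{ and } Y_i^* > 0 \\ 0 & \text{otherwise, that is, } D_i = 0 \text{ or } (Y_i^* \le 0 \text{ and } D_i = 1) \end{cases}$$
$$u_i \sim N(0, 1), \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad \operatorname{corr}(u_i, \varepsilon_i) = \rho$$
The correlation $\rho$ means that unobserved elements affecting whether someone is a consumer may also affect the amount of expenditure.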

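Double Hurdle Model (following the paper): Decision 1: participation. Decision 2: quantity (may be zero). $y_i$ = the observed consumption of an individual, the dependent variable; it is continuous over positive values, but $P(y = 0) > 0$ and $P(y < 0) = 0$.
$$y_i = \begin{cases} x_i\beta + \epsilon_i & \text{if } \min(x_i\beta + \epsilon_i,\; z_i\gamma + u_i) > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$\begin{pmatrix} \epsilon_i \\ u_i \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2 & \sigma_{12} \\ \sigma_{12} & 1 \end{pmatrix}$$
$\Psi(x, y, \rho)$ = CDF of a bivariate normal with correlation $\rho$, where $\rho = \sigma_{12}/\sigma$.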
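As a minimal sketch of the rule on this slide (not code from the paper), the following Python draws correlated errors and applies the two hurdles. The function name `draw_double_hurdle` and every numeric value are hypothetical, chosen only for illustration; `sigma` is treated as the standard deviation of the quantity error and Var(u) is fixed at 1.

```python
import math
import random

def draw_double_hurdle(xb, zg, sigma, rho, rng):
    """Draw one y from the double-hurdle rule.

    xb, zg : linear indexes x_i*beta and z_i*gamma (illustrative inputs)
    sigma  : standard deviation of epsilon; Var(u) is fixed at 1
    rho    : corr(epsilon, u)
    """
    u = rng.gauss(0.0, 1.0)
    # Build epsilon so that corr(epsilon, u) = rho and sd(epsilon) = sigma.
    eps = sigma * (rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    quantity = xb + eps
    participation = zg + u
    # y is positive only when both hurdles are cleared.
    return quantity if min(quantity, participation) > 0 else 0.0

# Hypothetical values, for illustration only.
rng = random.Random(12345)
ys = [draw_double_hurdle(1.0, 0.5, 2.0, 0.3, rng) for _ in range(10000)]
share_zero = sum(1 for y in ys if y == 0.0) / len(ys)
```

The draws are never negative and a sizeable share sits exactly at zero, which is the corner the model is designed for.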

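Double Hurdle Model: the log likelihood function for the DHM ($\Phi$ is the standard normal CDF, $\phi$ the standard normal density, and $\Psi$ the bivariate normal CDF):
$$\log L = \sum_{y_i = 0} \log\left\{ 1 - \Psi\left( z_i\gamma,\; \frac{x_i\beta}{\sigma},\; \rho \right) \right\} + \sum_{y_i > 0} \left[ \log \Phi\left\{ \frac{z_i\gamma + \frac{\rho}{\sigma}(y_i - x_i\beta)}{\sqrt{1 - \rho^2}} \right\} - \log \sigma + \log \phi\left( \frac{y_i - x_i\beta}{\sigma} \right) \right]$$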
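To make the two sums concrete, here is a hedged Python sketch that evaluates this log likelihood for given parameter values. It is not how dblhurdle computes it: the bivariate normal CDF is approximated with Simpson's rule (a technique swapped in here to stay within the standard library), and the names `bvn_cdf` and `dhm_loglik` are hypothetical.

```python
import math

def norm_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=2000, lower=-8.0):
    """Approximate P(U <= a, V <= b) for a standard bivariate normal with
    correlation rho, integrating phi(t) * Phi((b - rho*t) / sqrt(1 - rho^2))
    over t in [lower, a] with Simpson's rule (n must be even)."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for k in range(n + 1):
        t = lower + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * norm_pdf(t) * norm_cdf((b - rho * t) / s)
    return total * h / 3.0

def dhm_loglik(y, xb, zg, sigma, rho):
    """Double-hurdle log likelihood; y, xb, zg are equal-length lists."""
    ll = 0.0
    for yi, xbi, zgi in zip(y, xb, zg):
        if yi > 0:
            # Contribution of an observation away from the corner.
            arg = (zgi + rho / sigma * (yi - xbi)) / math.sqrt(1.0 - rho ** 2)
            ll += (math.log(norm_cdf(arg)) - math.log(sigma)
                   + math.log(norm_pdf((yi - xbi) / sigma)))
        else:
            # Contribution of an observation at the corner (y = 0).
            ll += math.log(1.0 - bvn_cdf(zgi, xbi / sigma, rho))
    return ll
```

The sketch is not numerically hardened: extreme index values can push a probability to zero and make the logarithm fail, which a production estimator would need to guard against.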

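Double Hurdle Model: $x_i\beta + \epsilon_i$ models the quantity equation; $z_i\gamma + u_i$ models the participation equation. The command estimates $\beta$, $\gamma$, $\rho$, and $\sigma$, where $\sigma^2 = \operatorname{Var}(\epsilon)$. Restriction: $\operatorname{Var}(u) = 1$, which is needed for the model to be identified.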

Double Hurdle Model: Stata

Double Hurdle Model

Example: The use of the dblhurdle command using smoke.dta from Wooldridge (2010).

Marginal effects: the effect of the number of years of schooling (educ) on: 1. the probability of smoking; 2. the expected number of cigarettes smoked, given that you smoke; 3. the expected number of cigarettes smoked.

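Prediction: ppar: the probability of being away from the corner, conditional on the covariates, $P(y > 0 \mid x, z)$. ycond: the expectation of y conditional on being away from the corner, $E(y \mid x, z, y > 0)$. yexpected: the expected value of y conditional on x and z, $E(y \mid x, z)$.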
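The three predictions are linked by the law of total expectation, because y equals zero at the corner. This is a worked identity, not a formula quoted from the slides, and it takes ycond to be the expectation of y given y > 0:

$$E(y \mid x, z) = P(y > 0 \mid x, z)\, E(y \mid x, z, y > 0) + P(y = 0 \mid x, z) \cdot 0$$

so that yexpected = ppar × ycond.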

Marginal effects

Marginal effects

Marginal effects

Monte Carlo simulation: finite-sample properties of the estimator. Three measures of performance: 1. The mean of the estimated parameters should be close to their true values. 2. The mean standard error of the estimated parameters over the repetitions should be close to the standard deviation of the point estimates. 3. The rejection rate of hypothesis tests should be close to the nominal size of the test.
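A short Python sketch of how these three measures could be computed from saved simulation output for one parameter. The function name `mc_summary`, the 1.96 critical value (a two-sided 5% test under normality), and the toy numbers are all assumptions for illustration; they are not results from the paper.

```python
import statistics

def mc_summary(estimates, std_errors, true_value, z_crit=1.96):
    """Summarize Monte Carlo output for one parameter.

    estimates, std_errors : one entry per repetition
    true_value            : the value used in the data-generating process
    z_crit                : critical value of the test (1.96 ~ 5% two-sided)
    """
    mean_est = statistics.fmean(estimates)
    rejections = sum(
        1 for b, se in zip(estimates, std_errors)
        if abs(b - true_value) / se > z_crit
    )
    return {
        "bias": mean_est - true_value,                    # measure 1
        "sd_of_estimates": statistics.stdev(estimates),   # measure 2, target
        "mean_se": statistics.fmean(std_errors),          # measure 2, compare
        "rejection_rate": rejections / len(estimates),    # measure 3
    }

# Toy inputs, not taken from the paper.
toy = mc_summary(
    estimates=[0.9, 1.1, 1.0, 1.3, 0.7],
    std_errors=[0.2, 0.2, 0.2, 0.1, 0.2],
    true_value=1.0,
)
```

With real output, a rejection rate near 0.05 and a mean standard error near the standard deviation of the estimates would indicate the estimator behaves as intended.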

Monte Carlo simulation The data-generating process can be summarized as follows:

Monte Carlo simulation A dataset of 2,000 observations was created. The x’s were drawn from a standard normal distribution, and the d’s were drawn from a Bernoulli with p = 1/2. Refer to this dataset as “base”. Iteration of the simulation: 1. Use “base”. 2. For each observation, draw (gen) 𝜖 from a standard normal. 3. For each observation, draw (gen) u from a standard normal. 4. For each observation, compute y according to the data-generating process presented above. 5. Fit the model, and save the values of interest with post.
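Steps 1 to 4 of an iteration can be sketched in Python as follows. The coefficients and the assignment of x to the quantity equation and d to the participation equation are hypothetical, since the data-generating process from the slides is not reproduced in this transcript; step 5 (fitting the model and saving results) is left out.

```python
import random

N = 2000  # observations in the "base" dataset

def make_base(rng):
    """Fixed regressors: x ~ N(0, 1) and d ~ Bernoulli(1/2)."""
    return [(rng.gauss(0.0, 1.0), 1 if rng.random() < 0.5 else 0)
            for _ in range(N)]

# Hypothetical coefficients, for illustration only (not from the paper).
BETA0, BETA_X = 1.0, 0.5      # quantity equation (assumed)
GAMMA0, GAMMA_D = 0.2, 0.8    # participation equation (assumed)

def one_iteration(base, rng):
    """Steps 1-4: reuse base, draw fresh errors, compute y."""
    ys = []
    for x, d in base:
        eps = rng.gauss(0.0, 1.0)   # step 2: quantity error
        u = rng.gauss(0.0, 1.0)     # step 3: participation error
        quantity = BETA0 + BETA_X * x + eps
        participation = GAMMA0 + GAMMA_D * d + u
        # Step 4: y is positive only when both hurdles are cleared.
        ys.append(quantity if min(quantity, participation) > 0 else 0.0)
    return ys

base = make_base(random.Random(1))   # created once, reused every iteration
sim_rng = random.Random(2)
y_rep1 = one_iteration(base, sim_rng)
y_rep2 = one_iteration(base, sim_rng)
```

Keeping the regressors fixed across repetitions means that variation in the estimates comes only from the error draws.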

Monte Carlo simulation

Monte Carlo simulation: a less intuitive issue. When the set of regressors in the participation equation equals the set of regressors in the quantity equation, the model is weakly identified. The data-generating process:

Monte Carlo simulation

Conclusion: Researchers may consider dblhurdle when using the tobit model. Its flexibility allows the researcher to break down the modeled quantity along two useful dimensions, the "quantity" dimension and the "participation" dimension. The command presented in this article only allows for a single corner in the data. One desirable feature to add is the capability to handle dependent variables with two corners.