Beyond the Two-Part Model: Methods for Handling Truncated and Skewed Dependent Variables
Eric P. Slade, PhD (1,2)
1. VISN5 Capitol Network Mental Illness Research and Education Clinical Center (VISN5 MIRECC), Baltimore, MD
2. University of Maryland, School of Medicine, Baltimore, MD

Background
Dependent variables with truncated and/or skewed distributions are common in health economics:
- Health care expenditures or costs
- Scores on symptom or physical function scales
- Number of service visits or days of care

Background
Distributions often have mass points at zero and long right tails.
[Figure: skewed distribution with a mass point at zero, compared against a normal distribution]

Background
What's the objective? To estimate a marginal effect (M.E.) or an elasticity, often evaluated at the means of all covariates.
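As a toy illustration of "marginal effect and elasticity at the means" (the coefficient values and the exponential conditional-mean form below are assumptions made up for this sketch, not from the slides):

```python
import numpy as np

# Suppose (for illustration only) the fitted conditional mean has the
# log-link form E(Y | x) = exp(b0 + b1 * x).
b0, b1 = 1.0, 0.5
x_bar = 2.0  # sample mean of the covariate

# Marginal effect at the mean: dE(Y|x)/dx evaluated at x = x_bar
me_at_mean = b1 * np.exp(b0 + b1 * x_bar)

# Elasticity at the mean: (dE/dx) * x / E(Y|x), which is b1 * x_bar for
# this functional form
elasticity = b1 * x_bar

print(me_at_mean, elasticity)
```

For this functional form the elasticity is linear in x, which is one reason evaluation points other than the mean (next slide) can matter.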

Background
Other evaluation points (besides the mean) might also be of interest. For example, we might want to know the effect on the expenditures of low-income consumers if their co-payment changed from $0 to $40.

Background
What functional form should I use to estimate these effects?
- Specification for Y: mass points create non-linearities, and skewness of Y may cause violation of the homoskedasticity assumption.
- Specification for the x's.
Over the years, several estimators have been proposed.

Background
Tobit:
- Assumes Y* is normally distributed but partially observed; observed Y is truncated/censored at Y* < c.
- Coefficients may be biased if the normality or homoskedasticity assumption fails.
- The same linear index (X'b) characterizes all Y values.

Background
2-Part Model:
- Addresses the mass point and non-linearity at Y = 0.
- Part 1: Use a probit to model whether (or not) Y > 0.
- Part 2: Use ordinary least squares (OLS) to model Y conditional on Y > 0, with either untransformed Y or log-transformed ln(Y).
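A minimal numerical sketch of a two-part model on simulated data. Everything here is an illustrative assumption: the data-generating coefficients, the `fit_logit` helper, and the substitution of a logit for the probit in part 1 (to keep the sketch numpy-only). Duan's smearing factor handles the retransformation from the log scale back to the level of Y:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): many zeros, skewed positive values
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
any_use = rng.random(n) < 1 / (1 + np.exp(-(0.2 + 0.8 * x)))   # P(Y > 0)
y = np.where(any_use, np.exp(1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)), 0.0)

# Part 1: model Pr(Y > 0 | x). A logit fit by Newton-Raphson stands in for
# the probit the slides mention, to avoid non-numpy dependencies.
def fit_logit(X, d, iters=25):
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (d - p))
    return b

b1hat = fit_logit(X, any_use.astype(float))

# Part 2: OLS on ln(Y) among users, with Duan's (1983) smearing factor to
# retransform predictions back to the untransformed Y scale
pos = y > 0
b2hat, *_ = np.linalg.lstsq(X[pos], np.log(y[pos]), rcond=None)
resid = np.log(y[pos]) - X[pos] @ b2hat
smear = np.mean(np.exp(resid))

# Predicted unconditional mean: Pr(Y>0 | x) * E(Y | Y>0, x)
p_any = 1 / (1 + np.exp(-X @ b1hat))
y_hat = p_any * np.exp(X @ b2hat) * smear
print(y_hat.mean(), y.mean())
```

The smearing step is exactly the retransformation burden the next slide warns about: get it wrong (e.g., under heteroskedasticity) and predictions on the dollar scale are biased.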

Background
2-Part model (cont.):
- Heteroskedasticity can result in bias.
- Even log-transformed estimates of E(Y) can be imprecise.
- Also, estimated M.E.'s apply to log-scaled Y, requiring retransformation of the variance to the untransformed Y scale.
See Mullahy, JHE 17(3), 1998; Manning, JHE 17(3), 1998.

Background
Generalized Linear Model (GLM) with a "log link" (Manning and Mullahy, JHE 20(4), 2001):
- 2-part model with a probit for part 1, or a 1-part model using only observations with Y > 0.
- ln[E(Y | X)] = Xβ, Y > 0.
- No "re-transformation" of the variance required.
- More precise than OLS when the data are generated by the assumed distribution.
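The mechanics of a log-link GLM can be sketched with a hand-rolled iteratively reweighted least squares loop (quasi-Poisson scoring). The simulated data and the `fit_loglink` helper are illustrative assumptions; in practice one would use a packaged GLM routine (e.g., statsmodels) rather than this loop:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: positive, skewed Y with ln E(Y|x) = 0.5 + 0.7*x
n = 3000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(0.5 + 0.7 * x)
# multiplicative lognormal noise, rescaled so E(Y|x) = mu_true exactly
y = mu_true * rng.lognormal(sigma=0.5, size=n) / np.exp(0.5 ** 2 / 2)

# Fit ln E(Y|X) = Xb directly by quasi-Poisson scoring. Because the mean is
# modeled on the log scale but predictions come out as exp(Xb), no
# retransformation of the variance is needed.
def fit_loglink(X, y, iters=50):
    b = np.zeros(X.shape[1])
    b[0] = np.log(y.mean())          # sensible starting value
    for _ in range(iters):
        mu = np.exp(X @ b)
        b += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    return b

bhat = fit_loglink(X, y)
print(bhat)  # approximately [0.5, 0.7]
```

Note the estimating equation only requires the conditional mean to be correctly specified, which is why misspecifying the full distribution costs precision (large standard errors) rather than consistency.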

Background
One drawback to GLM is that E(Y | X) is non-linear in the parameters: how should we specify the X's? Imprecision (large standard errors) can also be a problem if the distribution generating the data is misspecified.

CDE Intro
Conditional Density Estimation (CDE): Gilleskie, D.B. and Mroz, T.A., 2004. A flexible approach for estimating the effects of covariates on health expenditures. J. Health Econ., 23(2).
Let the data tell you the model! CDE allows us to construct a non-linear empirical approximation to E(Y | x).

CDE Intro
The CDE approximates E(Y | x) with:

E(Y | x) ≈ Σ_{k=1}^{K} h*(k) · p[y_{k-1} ≤ Y < y_k | x]

- h*(k): the approximate value of y drawn from the k-th interval of Y. Does not depend on x.
- p[y_{k-1} ≤ Y < y_k | x]: the approximate probability that y is from the k-th interval of Y. Depends on x.
- K: the total number of intervals supporting the distribution of Y.

CDE Method
Conceptually, 4 steps are required to construct the approximation.
Step 1: Discretize the support of Y into K intervals.
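Step 1 might be implemented as below. The equal-count quantile boundaries and the separate treatment of the mass point at zero follow the G&M scheme summarized later in the slides; the simulated outcome is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative skewed outcome with a mass point at zero
y = np.where(rng.random(5000) < 0.3, 0.0,
             rng.lognormal(mean=1.0, sigma=1.0, size=5000))

# Step 1: treat the mass point at 0 as its own category, then cut the
# positive values into K intervals that each contain (roughly) the same
# number of observations, via quantile boundaries.
K = 10
y_pos = y[y > 0]
edges = np.quantile(y_pos, np.linspace(0, 1, K + 1))
edges[-1] = np.inf                     # open-ended top interval

# Interval index for each observation: 0 marks the mass point at zero,
# 1..K index the positive-value intervals (y falls in (edges[k-1], edges[k]])
k_idx = np.where(y > 0, np.searchsorted(edges, y, side='right'), 0)
print(np.bincount(k_idx, minlength=K + 1))
```

With continuous positive values, quantile boundaries make the K interval counts equal up to integer rounding, which is the restriction G&M impose.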

CDE Method
p[y_{k-1} ≤ Y < y_k | x]

CDE Method
K trades off smoothness against fit:
- As K ↑, the estimated function better predicts observed values of y (i.e., better fit).
- However, higher K means fewer data points available for estimation in each interval k: potential over-fitting, and some intervals may have relatively large standard errors.

CDE Method
4 steps (cont.)
Step 2: Estimate the conditional density: Pr(y_{k-1} ≤ Y < y_k | x), for all k = 1 to K.

CDE Method
Step 2: Estimate the conditional density (cont.) — the "Hazard Rate Decomposition."
By Bayes' Rule:

p[y_{k-1} ≤ Y < y_k | x] = p[y_{k-1} ≤ Y < y_k | x, Y ≥ y_{k-1}] × p[Y ≥ y_{k-1} | x]

where p[Y ≥ y_{k-1} | x] = 1 − p[Y < y_{k-1} | x].

CDE Method
p[Y ≥ y_{k-1} | x] is the probability represented by the shaded area to the right of AB, or 1 minus the probability represented by the area to the left of AB.
[Figure: density of Y with a vertical line AB at y_{k-1}; the area to its right is shaded]

CDE Method
HAZARD RATE DECOMPOSITION

λ(k, x) = p[y_{k-1} ≤ Y < y_k | x, Y ≥ y_{k-1}]

λ(k, x) is the "discrete-time hazard" function: the probability that an observation on Y is drawn from the k-th interval, given that it is not drawn from the first k−1 intervals.

CDE Method
p[y_{k-1} ≤ Y < y_k | x, Y ≥ y_{k-1}] is the probability represented by the area of ABCD as a fraction of the entire shaded area to the right of AB.
[Figure: density of Y with region ABCD between y_{k-1} and y_k, inside the shaded area to the right of AB]

CDE Method
HAZARD RATE DECOMPOSITION (cont.)
So,

(4)  p[y_{k-1} ≤ Y < y_k | x] = λ(k, x) × Π_{j=1}^{k−1} [1 − λ(j, x)]

where Π_{j=1}^{k−1} [1 − λ(j, x)] is the probability that an observation on Y is not drawn from the first k−1 intervals.
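The hazard-rate decomposition is easy to check numerically: hazards for K intervals convert to unconditional interval probabilities that sum to one. The hazard values below are made up for illustration:

```python
import numpy as np

# Illustrative discrete-time hazards lambda(k, x) for K = 5 intervals.
# The last hazard is 1 by construction: an observation not drawn from
# intervals 1..K-1 must be drawn from interval K.
lam = np.array([0.30, 0.25, 0.40, 0.50, 1.00])

# Survivor term: prod_{j<k} (1 - lambda(j, x)), with the empty product = 1
surv = np.concatenate([[1.0], np.cumprod(1 - lam[:-1])])

# p_k = lambda(k, x) * prod_{j<k} (1 - lambda(j, x))
p = lam * surv
print(p, p.sum())
```

Because the final hazard is 1, the probabilities exhaust the distribution regardless of the other hazard values, which is what makes the decomposition a valid way to build a conditional density from K − 1 estimated hazards.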

CDE Method
4 steps (cont.)
Step 3: Approximate y in interval k, k = 1 to K. Specify h*(k) as fixed for all values of x within the k-th interval, i.e., the estimate of y within an interval is a constant.

CDE Method
(6) Therefore the CDE approximation of E(Y|x) is:

E(Y | x) ≈ Σ_{k=1}^{K} h*(k) × λ(k, x) × Π_{j=1}^{k−1} [1 − λ(j, x)]

The approximation works like a random histogram, conditional on K and x.
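The "random histogram" reading can be verified directly: drawing an interval with probability p_k and reporting the constant h*(k) reproduces the weighted-sum mean. The h*(k) and p_k values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative values for K = 4 intervals
h = np.array([5.0, 20.0, 60.0, 150.0])   # h*(k): mean of observed y in interval k
p = np.array([0.4, 0.3, 0.2, 0.1])       # p[y_{k-1} <= Y < y_k | x]

# CDE approximation of the conditional mean: sum_k h*(k) * p_k
e_y = np.sum(h * p)

# "Random histogram" check: sample an interval with probability p_k and
# report h*(k); the sample mean matches the weighted sum
draws = h[rng.choice(4, size=200_000, p=p)]
print(e_y, draws.mean())
```

This is also why the approximation is coarse in y (only K distinct values) but flexible in x: all the covariate dependence flows through the interval probabilities.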

CDE Estimation
SUMMARY (5 steps)
(1) Choose K, the number of intervals, and (2) the interval boundaries.
- G&M treat the mass point at 0 separately.
- G&M impose the restriction that all K intervals must have an equal number of observations with positive values of Y.
- G&M propose a likelihood-based criterion for selecting K.
(3) Choose constants for h*(k), k = 1 to K.
- G&M use the mean of observed y in interval k.

CDE Estimation
(4) Decide how to approximate the conditional densities.
- G&M use a single flexibly specified logit.
- The R.H.S. includes a transformation of the interval number k: α_k = −ln(K − k) for k < K.
- The R.H.S. also includes polynomials in the X's and interactions between the X's and the α_k's.
- Standard significance testing (Wald) is used to determine the order of the polynomials.
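Fitting K − 1 hazards with a single logit rests on a data expansion: each observation contributes one row per interval it "survives" into, and the α_k transformation enters as a per-row regressor. The sketch below is a stripped-down illustration of that expansion only (no polynomials or interactions, one covariate, and the `expand_for_hazard_logit` helper is a hypothetical name, not G&M's code):

```python
import numpy as np

# Each observation i with interval index k_i contributes rows (i, k) for
# k = 1..k_i: outcome 1 if y_i falls in interval k, else 0, together with
# alpha_k = -ln(K - k) for k < K. Interval K needs no row, since its
# hazard is 1 by construction.
def expand_for_hazard_logit(k_obs, x, K):
    rows = []
    for ki, xi in zip(k_obs, x):
        for k in range(1, ki + 1):
            if k == K:                  # absorbing final interval: no row
                break
            rows.append((-np.log(K - k), xi, 1.0 if k == ki else 0.0))
    return np.array(rows)               # columns: alpha_k, x, outcome

# Example: 3 observations landing in intervals 2, 1, and 4, with K = 4
expanded = expand_for_hazard_logit([2, 1, 4], x=[0.5, -1.0, 2.0], K=4)
print(expanded)
```

A standard logit fit on the stacked rows then estimates all K − 1 hazards jointly, with α_k letting the baseline hazard vary across intervals.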

CDE Estimation
(5) Decide how to approximate partial derivatives (marginal effects).
- G&M calculate "arc" derivatives: evaluate E(Y|x) at various values of the explanatory variables.
- The population average derivative is the average of the arc derivatives.
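An arc derivative is just the slope of a chord of the fitted mean function, averaged over the sample. In the sketch below, `E_y_hat` is a hypothetical stand-in for a fitted CDE prediction function, and the step size is an arbitrary illustrative choice:

```python
import numpy as np

# Hypothetical fitted conditional-mean function (stand-in for a CDE fit)
def E_y_hat(x):
    return np.exp(0.5 + 0.4 * x)

# Arc derivative: finite-difference slope between two evaluation points,
# rather than an analytic partial derivative
def arc_derivative(f, x0, x1):
    return (f(x1) - f(x0)) / (x1 - x0)

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
delta = 0.5

# Population average derivative: average the arc derivatives over the sample
avg_arc = np.mean(arc_derivative(E_y_hat, x, x + delta))
print(avg_arc)
```

Because the chord endpoints can be placed anywhere, the same machinery evaluates effects at policy-relevant points (e.g., a co-payment moving from $0 to $40) rather than only at the covariate means.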

CDE Performance
Monte Carlo simulations suggest that, for most data generating processes (DGPs), CDE outperforms both GLM and a flexibly specified OLS one-part model. CDE's advantage over GLM and OLS is greater the more heteroskedastic the DGP.

CDE Performance
The GLM model performs poorly when the specification of the explanatory variables is not sufficiently flexible; the Akaike Information Criterion produces too conservative a specification. OLS tends to outperform GLM, and performs almost as well as CDE.

Summary
- The extra effort required by CDE could be "worth it" if obtaining robust coefficient estimates is paramount, e.g., if accurate prediction really matters.
- CDE offers the advantage of being able to simulate variation in M.E.'s at different points in the covariate distribution.
- The CDE learning curve could be steep. However, in the VA, there is a recurring need for robust cost estimation.

Next Steps
- Plan to use CDE in an upcoming MHICM project.
- Modify CDE for 2-stage estimation.
- CDE could be sensitive to outliers (the problem of over-fitting).
- Newer methods test for mixtures, which could identify outliers.