
Beyond the Two-Part Model: Methods for Handling Truncated and Skewed Dependent Variables
Eric P. Slade, PhD (1, 2)
1. VISN5 Capitol Network Mental Illness Research and Education Clinical Center (VISN5 MIRECC), Baltimore, MD
2. University of Maryland, School of Medicine, Baltimore, MD

Background
- Dependent variables with truncated and/or skewed distributions are common in health economics:
  - Health care expenditures or costs
  - Scores on symptom or physical function scales
  - Number of service visits or days of care

Background
- Distributions often have mass points at zero and long right tails.
[Figure: Normal Distribution]

Background
- What's the objective?
  - To estimate a marginal effect (M.E.) or an elasticity, often evaluated at the means of all covariates.
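
The two quantities named on this slide can be written out as follows. This is a sketch in notation assumed here rather than taken from the slides: x_j is one covariate and \bar{x} is the vector of covariate means.

```latex
\mathrm{M.E.}_j
  = \left.\frac{\partial E(Y \mid x)}{\partial x_j}\right|_{x=\bar{x}},
\qquad
\varepsilon_j
  = \left.\frac{\partial E(Y \mid x)}{\partial x_j}
    \cdot \frac{x_j}{E(Y \mid x)}\right|_{x=\bar{x}}
```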

Background
- Other evaluation points (besides the mean) might also be of interest.
  - For example, we might want to know the effect on the expenditure of low-income consumers if their co-payment changed from $0 to $40.

Background
- What functional form should I use to estimate these effects?
  - Specification for Y
    - Mass points create non-linearities.
    - Skewness of Y may cause violation of the homoskedasticity assumption.
  - Specification for the x's
- Over the years, several estimators have been proposed.

Background
- Tobit
  - Assumes Y* is normally distributed but partially observed.
  - Observed Y is truncated/censored where Y* < c.
  - Coefficients may be biased if the normality or homoskedasticity assumption fails.
  - The same linear index (X'b) characterizes all Y values.
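
To make the Tobit assumptions concrete, here is a minimal sketch of the expected value and marginal effect they imply. It assumes the censoring point is c = 0 and that Y* is normal with mean X'b and standard deviation sigma. The function names and any inputs are illustrative and do not come from the presentation.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_expected_y(xb, sigma):
    """E(Y | x) when Y = max(0, Y*) and Y* ~ N(xb, sigma^2) (assumes c = 0)."""
    z = xb / sigma
    return normal_cdf(z) * xb + sigma * normal_pdf(z)


def tobit_marginal_effect(b_j, xb, sigma):
    """dE(Y | x)/dx_j = b_j * Phi(xb / sigma) under the same assumptions."""
    return b_j * normal_cdf(xb / sigma)
```

Because the single index X'b drives both the probability of a positive value and its expected size, the marginal effect is always the coefficient scaled by a probability between 0 and 1.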

Background
- 2-Part Model
  - Addresses the mass point and non-linearity at Y = 0.
  - Part 1: Use probit to model whether (or not) Y > 0.
  - Part 2: Use ordinary least squares (OLS) to model Y conditional on Y > 0, with either:
    - Untransformed Y
    - Log-transformed ln(Y)
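
The two parts combine as E(Y | x) = Pr(Y > 0 | x) × E(Y | Y > 0, x). The sketch below shows that combination for the untransformed-Y variant of part 2. The coefficient vectors `gamma` (probit) and `beta` (OLS) are hypothetical placeholders, not estimates from the presentation.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_part_expected_y(x, gamma, beta):
    """Combine a probit part 1 with a linear part 2 on untransformed Y.

    x     : covariate values, including a leading 1.0 for the intercept
    gamma : hypothetical probit coefficients for Pr(Y > 0 | x)
    beta  : hypothetical OLS coefficients for E(Y | Y > 0, x)
    """
    prob_positive = normal_cdf(sum(g * v for g, v in zip(gamma, x)))
    mean_if_positive = sum(b * v for b, v in zip(beta, x))
    return prob_positive * mean_if_positive


# Hypothetical inputs: a probit index of 0 gives Pr(Y > 0 | x) = 0.5.
example = two_part_expected_y([1.0, 0.5], [0.0, 0.0], [100.0, 40.0])
```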

Background
- 2-Part Model (cont.)
  - Heteroskedasticity can result in bias.
  - Even log-transformed estimates of E(Y) can be imprecise.
  - Also, estimated M.E.'s apply to log-scaled Y, requiring retransformation of variance to the untransformed Y scale.
  - See Mullahy, JHE 17(3), 1998; Manning, JHE 17(3), 1998.
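
A few numbers show why results on the log scale cannot simply be exponentiated. The sketch below uses hypothetical expenditure values (not data from the presentation). Exponentiating the mean of ln(y) gives the geometric mean, which understates the arithmetic mean whenever y varies. It illustrates the problem only and is not a retransformation method.

```python
import math


def arithmetic_mean(values):
    """Sample mean on the original scale."""
    return sum(values) / len(values)


def naive_retransformed_mean(values):
    """exp(mean of ln y): the geometric mean, not an estimate of E(Y)."""
    return math.exp(arithmetic_mean([math.log(v) for v in values]))


# Hypothetical right-skewed positive expenditures.
costs = [100.0, 200.0, 400.0, 800.0, 10000.0]

mean_cost = arithmetic_mean(costs)            # 2300.0
naive_cost = naive_retransformed_mean(costs)  # well below the arithmetic mean
```

The gap between the two grows with the spread of ln(y), which is one reason heteroskedasticity matters for the log-transformed model.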

Background
- Generalized Linear Model (GLM) with a "log link" (Manning and Mullahy, JHE 20(4), 2001)
  - 2-part model with probit for part 1, or
  - 1-part model using only obs. with y > 0.
  - ln[E(Y | X)] = Xβ, Y > 0.
  - No "re-transformation" of the variance required.
  - More precise than OLS when the data are generated by the assumed dist'n.
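
With a log link, ln[E(Y | X)] = Xβ implies E(Y | X) = exp(Xβ), so the marginal effect of a covariate that enters linearly is β_j × exp(Xβ). The sketch below assumes the coefficients are already estimated; the values passed in are hypothetical.

```python
import math


def glm_log_link_mean(x, beta):
    """E(Y | x) = exp(x'beta) under a log link."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def glm_log_link_marginal_effect(j, x, beta):
    """dE(Y | x)/dx_j = beta_j * exp(x'beta), for x_j entering linearly."""
    return beta[j] * glm_log_link_mean(x, beta)


# Hypothetical coefficients and evaluation point (intercept first).
beta_hat = [0.1, 0.3]
x_point = [1.0, 2.0]
effect = glm_log_link_marginal_effect(1, x_point, beta_hat)
```

The marginal effect is obtained directly on the scale of Y, but it varies with the evaluation point x because the mean is non-linear in the index.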

Background
- One drawback to GLM is that E(Y | X) is non-linear in parameters.
  - How should we specify the X's?
- Imprecision (large standard errors) can also be a problem if the distribution generating the data is misspecified.

CDE Intro
- Conditional Density Estimation (CDE)
  - Gilleskie, D.B. and Mroz, T.A. 2004. A flexible approach for estimating the effects of covariates on health expenditures. J. Health Econ., 23(2), pp. 391-418.
- Let the data tell you the model!
- CDE allows us to construct a non-linear empirical approximation to the conditional mean, E(Y | x).

CDE Intro
- The CDE approximates E(Y | x) with:
    E(Y | x) ≈ Σ_{k=1}^{K} h*(k) × p[y_{k-1} ≤ Y < y_k | x]
  - h*(k): the approximate value of y drawn from the k-th interval of Y. Does not depend on x.
  - p[y_{k-1} ≤ Y < y_k | x]: the approximate prob. that y is from the k-th interval of Y. Depends on x.
  - K: the total number of intervals supporting the distribution of Y.

CDE Method
- Conceptually, 4 steps are required to construct the approximation.
  - Step 1: Discretize the support of Y into K intervals.

CDE Method
[Figure: p[y_{k-1} ≤ Y < y_k | x]]

CDE Method
- K trades off smoothness against fit.
  - As K ↑, the estimated function better predicts observed values y (i.e., better fit).
  - However, higher K means fewer data points available for estimation in each interval k.
    - Potential over-fitting
    - Some intervals may have relatively large standard errors.

CDE Method
- 4 steps (cont.)
  - Step 2: Estimate the conditional density, p[y_{k-1} ≤ Y < y_k | x], for all k = 1 to K.

CDE Method
- Step 2: Estimate the conditional density (cont.)
  - The "Hazard Rate Decomposition"
  - By Bayes' Rule:
      p[y_{k-1} ≤ Y < y_k | x] = p[y_{k-1} ≤ Y < y_k | x, Y ≥ y_{k-1}] × p[Y ≥ y_{k-1} | x]
    where p[Y ≥ y_{k-1} | x] = 1 - p[Y < y_{k-1} | x].

CDE Method
- p[Y ≥ y_{k-1} | x] is the prob. represented by the shaded area to the right of AB, or 1 minus the prob. represented by the area to the left of AB.

CDE Method
- Hazard Rate Decomposition
  - p[y_{k-1} ≤ Y < y_k | x, Y ≥ y_{k-1}] = λ(k, x) is the "Discrete Time Hazard" function: the probability that an observation on Y is drawn from the k-th interval, given that it is not drawn from the first k-1 intervals.

CDE Method
- p[y_{k-1} ≤ Y < y_k | x, Y ≥ y_{k-1}] is the prob. represented by the area of ABCD as a fraction of the entire shaded area to the right of AB.

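CDE Method
- Hazard Rate Decomposition
  - So, writing λ(k, x) for the discrete time hazard:
      p[y_{k-1} ≤ Y < y_k | x] = λ(k, x) × Π_{j=1}^{k-1} [1 - λ(j, x)]    (4)
  - The product term is the probability that an observation on Y is not drawn from the first k-1 intervals.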
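
The decomposition can be sketched in a few lines: the probability of landing in interval k is the hazard for interval k times the probability of not having landed in any earlier interval. The hazard values below are hypothetical, and the last interval is given a hazard of 1 so that the probabilities sum to 1.

```python
def interval_probabilities(hazards):
    """Turn discrete hazards lambda(1..K) into interval probabilities.

    p_k = lambda_k * prod_{j<k} (1 - lambda_j)
    """
    probs = []
    survival = 1.0  # probability of not falling in any earlier interval
    for lam in hazards:
        probs.append(lam * survival)
        survival *= 1.0 - lam
    return probs


# Hypothetical hazards for K = 3 intervals at some fixed x.
probs = interval_probabilities([0.5, 0.5, 1.0])
```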

CDE Method
- 4 steps (cont.)
  - Step 3: Approximate y in interval k, k = 1 to K.
  - Specify h*(k), where h* is fixed for all values of x within the k-th interval, i.e., the estimate of y is a constant.

CDE Method
- Therefore the CDE approximation of E(Y | x) is:
    E(Y | x) ≈ Σ_{k=1}^{K} h*(k) × λ(k, x) × Π_{j=1}^{k-1} [1 - λ(j, x)]    (6)
- The approximation works like a random histogram, conditional on K and x.
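
Putting the pieces together, the approximation is a probability-weighted sum of the interval constants, with each probability built from the discrete hazards. The sketch below takes the hazards at a given x as already estimated. All numbers are hypothetical, and `h_star` stands for the constants h*(k).

```python
def cde_expected_y(h_star, hazards):
    """Approximate E(Y | x) as sum_k h*(k) * lambda_k * prod_{j<k} (1 - lambda_j).

    h_star  : constant approximating y within each of the K intervals
    hazards : discrete hazards lambda(k, x) at one value of x (last one is 1)
    """
    if len(h_star) != len(hazards):
        raise ValueError("h_star and hazards must have the same length K")
    total = 0.0
    survival = 1.0  # probability of not falling in any earlier interval
    for h_k, lam_k in zip(h_star, hazards):
        total += h_k * lam_k * survival
        survival *= 1.0 - lam_k
    return total


# Hypothetical K = 3 example: interval probabilities are 0.5, 0.25, 0.25.
approx_mean = cde_expected_y([10.0, 100.0, 1000.0], [0.5, 0.5, 1.0])
```

Changing x changes only the hazards, so the heights of the "histogram" bars move while the bar locations h*(k) stay fixed.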

CDE Estimation
Summary (5 steps)
- (1) Choose K, the number of intervals, and (2) the interval boundaries.
  - Gilleskie and Mroz (G&M) treat the mass point at 0 separately.
  - G&M impose the restriction that all K intervals must have an equal number of observations with positive values for Y.
  - G&M propose a likelihood-based criterion for selecting K.
- (3) Choose constants for h*(k), k = 1 to K.
  - G&M use the mean of observed y in interval k.
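
Steps (2) and (3) can be sketched as follows: sort the positive values, cut them into K cells of (nearly) equal size, and take each cell's mean as h*(k). This is an illustration under simple assumptions, not G&M's code. Zeros are just set aside here, ties are not handled specially, and K is taken as given rather than chosen by their likelihood-based criterion.

```python
def equal_count_intervals(y_values, K):
    """Split the positive values of y into K cells with (nearly) equal counts."""
    positives = sorted(v for v in y_values if v > 0)
    n = len(positives)
    if K < 1 or K > n:
        raise ValueError("K must be between 1 and the number of positive values")
    # Integer arithmetic spreads any remainder across cells.
    return [positives[k * n // K:(k + 1) * n // K] for k in range(K)]


def interval_means(cells):
    """h*(k): the mean of observed y within each cell."""
    return [sum(cell) / len(cell) for cell in cells]


# Hypothetical data with a mass point at zero.
y = [0.0, 0.0, 5.0, 1.0, 3.0, 2.0, 4.0, 6.0]
cells = equal_count_intervals(y, 3)
h_star = interval_means(cells)
```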

CDE Estimation
- (4) Decide how to approximate conditional densities.
  - G&M use a single flexibly specified logit.
  - The R.H.S. includes a transformation of the interval number k: α_k = -ln(K - k) for k < K.
  - The R.H.S. also includes polynomials in the X's and interactions between the X's and the α_k's.
  - Standard significance testing (Wald) is used to determine the order of the polynomials.
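
One way to see what the transformation α_k does is to ask what a logit hazard looks like if its index were α_k alone, with coefficient 1 and no covariates. That simplification is an assumption made here for illustration, not G&M's full specification. In that case λ(k) = 1 / (K - k + 1), and every interval receives probability 1/K, which matches intervals built to hold equal numbers of observations.

```python
import math


def alpha(k, K):
    """Transformation of the interval number: alpha_k = -ln(K - k), for k < K."""
    if not 1 <= k < K:
        raise ValueError("alpha_k is defined only for 1 <= k < K")
    return -math.log(K - k)


def baseline_hazard(k, K):
    """Logit hazard with index alpha_k only (illustrative assumption).

    The last interval has hazard 1 because no intervals remain after it.
    """
    if k == K:
        return 1.0
    return 1.0 / (1.0 + math.exp(-alpha(k, K)))


def interval_probabilities(hazards):
    """p_k = lambda_k * prod_{j<k} (1 - lambda_j)."""
    probs, survival = [], 1.0
    for lam in hazards:
        probs.append(lam * survival)
        survival *= 1.0 - lam
    return probs


K = 4
baseline_probs = interval_probabilities([baseline_hazard(k, K) for k in range(1, K + 1)])
```

Covariate polynomials and their interactions with α_k then shift these baseline hazards up or down as x changes.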

CDE Estimation
- (5) Decide how to approximate partial derivatives (marginal effects).
  - G&M calculate "arc" derivatives.
  - Evaluate E(Y | x) at various values of the explanatory variables.
  - The population average derivative is the average of the arc derivatives.
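
An arc derivative is a finite difference: evaluate E(Y | x) at two values of one covariate and divide the change by the step. The sketch below assumes a forward difference with a fixed step `delta`; the step and differencing scheme G&M use are not stated in the slides. The `expected_y` argument is any function returning E(Y | x), for example a fitted CDE approximation.

```python
def arc_derivative(expected_y, x, j, delta):
    """Change in E(Y | x) per unit change in x[j] over a step of size delta."""
    x_shifted = list(x)
    x_shifted[j] += delta
    return (expected_y(x_shifted) - expected_y(x)) / delta


def average_arc_derivative(expected_y, sample, j, delta):
    """Average the arc derivatives over the covariate vectors in the sample."""
    arcs = [arc_derivative(expected_y, x, j, delta) for x in sample]
    return sum(arcs) / len(arcs)


# Hypothetical conditional mean function and sample of covariate vectors.
def toy_mean(x):
    return 2.0 * x[0] + x[1] ** 2


avg_effect = average_arc_derivative(toy_mean, [[0.0, 1.0], [0.0, 3.0]], 1, 1.0)
```

The same function covers discrete policy changes such as the co-payment example from the Background: set `delta` to 40 and evaluate at observations whose co-payment is 0.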

CDE Performance
- Monte Carlo simulations suggest that, for most data generating processes (DGPs), CDE outperforms both GLM and a flexibly specified OLS one-part model.
- CDE's advantage over GLM and OLS is greater the more heteroskedastic the DGP is.

CDE Performance
- The GLM model performs poorly when the model specification for explanatory variables is not sufficiently flexible.
  - The Akaike Information Criterion produces too conservative a specification.
- OLS tends to outperform GLM, and performs almost as well as CDE.

Summary
- The extra effort required by CDE could be "worth it" if obtaining robust coefficient estimates is paramount, e.g., if accurate prediction really matters.
- CDE offers the advantage of being able to simulate variation in M.E.'s at different points in the covariate distribution.
- The CDE learning curve could be steep.
- However, in the VA, there is a recurring need for robust cost estimation.

Next Steps
- Plan to use CDE in upcoming MHICM project.
- Modify CDE for 2-stage estimation.
- CDE could be sensitive to outliers.
- Problem of over-fitting.
- Newer methods test for mixtures and could identify outliers.
