Econometrics in Health Economics Discrete Choice Modeling and Frontier Modeling and Efficiency Estimation Professor William Greene Stern School of Business.

Econometrics in Health Economics Discrete Choice Modeling and Frontier Modeling and Efficiency Estimation Professor William Greene Stern School of Business New York University September 2-4, 2007

Frontier and Efficiency Estimation
Session 5: Efficiency Analysis. Stochastic Frontier Model; Efficiency Estimation
Session 6: Panel Data Models and Heterogeneity. Fixed and Random Effects; Bayesian and Classical Estimation
Session 7: Efficiency Models. Stochastic Frontier and Data Envelopment Analysis; Student Presentation: Silvio Daidone and Francesco D’Amico
Session 8: Computer Exercises and Applications

The Production Function “A single output technology is commonly described by means of a production function f(z) that gives the maximum amount q of output that can be produced using input amounts $(z_1,\ldots,z_{L-1}) > 0$.” Mas-Colell, Whinston, Green, “Microeconomic Theory,” Oxford, 1995, p. 129. See also Samuelson (1938) and Shephard (1953).

Thoughts on Inefficiency Failure to achieve the theoretical maximum  Hicks (ca. 1935) on the benefits of monopoly  Leibenstein (ca. 1966): X inefficiency  Debreu, Farrell (1950s) on management inefficiency All related to firm behavior in the absence of market restraint – the exercise of market power.

A History of Empirical Investigation  Cobb-Douglas (1927)  Arrow, Chenery, Minhas, Solow (1963)  Joel Dean (1940s, 1950s)  Johnston (1950s)  Nerlove (1960)  Christensen et al. (1972)

Inefficiency in the “Real” World Measurement of inefficiency in “markets” – heterogeneous production outcomes:  Aigner and Chu (1968)  Timmer (1971)  Aigner, Lovell, Schmidt (1977)  Meeusen, van den Broeck (1977)

Production Functions Production is a process of transformation of a set of inputs, denoted x  into a set of outputs, y  Transformation of inputs to outputs is via the transformation function: T(y,x) = 0.

Defining the Production Set Level set: The Production function is defined by the isoquant The efficient subset is defined in terms of the level sets:

Isoquants and Level Sets

The Distance Function

Inefficiency

Production Function Model with Inefficiency

Cost Inefficiency $y^* = f(x) \Rightarrow C^* = g(y^*, w)$ (Samuelson-Shephard duality results). Cost inefficiency: if $y < f(x)$, then $C$ must be greater than $g(y, w)$. This implies the idea of a cost frontier: $\ln C = \ln g(y, w) + u$, $u > 0$.

Specification

Corrected Ordinary Least Squares

Modified OLS An alternative approach that requires a parametric model of the distribution of $u_i$ is modified OLS (MOLS). The OLS residuals, save for the constant displacement, are pointwise consistent estimates of their population counterparts, $-u_i$. Suppose that $u_i$ has an exponential distribution with mean $\lambda$. Then the variance of $u_i$ is $\lambda^2$, so the standard deviation of the OLS residuals is a consistent estimator of $E[u_i] = \lambda$. Since this is a one parameter distribution, the entire model for $u_i$ can be characterized by this parameter and functions of it. The estimated frontier function can now be displaced upward by this estimate of $E[u_i]$.
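The displacement idea can be illustrated with a small simulation. The following is a minimal sketch, not the course's own code: it assumes a deterministic frontier $y_i = \alpha + \beta x_i - u_i$ with a single regressor and exponential $u_i$, and the function names (`ols_simple`, `cols_mols`) and the simulated parameter values are hypothetical. COLS shifts the OLS constant by the largest residual; MOLS shifts it by the residual standard deviation, as described above.

```python
import math
import random


def ols_simple(x, y):
    """OLS for y = a + b*x with one regressor; returns (a, b, residuals)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, resid


def cols_mols(x, y):
    """Corrected and modified OLS constants for a deterministic frontier.

    Assumption (illustrative): u_i is exponential with mean lambda, so the
    residual standard deviation estimates E[u_i] = lambda.
    """
    a, b, e = ols_simple(x, y)
    n = len(e)
    a_cols = a + max(e)                      # COLS: shift so no residual is positive
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))
    a_mols = a + s                           # MOLS: shift by the estimate of E[u_i]
    return {"slope": b, "ols": a, "cols": a_cols, "mols": a_mols, "lambda_hat": s}


# Simulated data with hypothetical values alpha = 1.0, beta = 0.5, lambda = 0.3.
rng = random.Random(12345)
alpha, beta, lam = 1.0, 0.5, 0.3
x = [rng.uniform(0.0, 4.0) for _ in range(5000)]
u = [rng.expovariate(1.0 / lam) for _ in x]  # expovariate takes the rate 1/mean
y = [alpha + beta * xi - ui for xi, ui in zip(x, u)]

fit = cols_mols(x, y)
print(fit)
```

In this setup the OLS constant settles near $\alpha - \lambda$, while both displaced constants should land close to $\alpha$. The slope is the same in all three fits, since only the constant term is moved.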

COLS and MOLS

Deterministic Frontier: Programming Estimators

Estimating Inefficiency

Statistical Problems with Programming Estimators They do correspond to MLEs. The likelihood functions are “irregular.” There are no known statistical properties: no estimable covariance matrix for the estimates. They might be “robust,” like LAD. No one knows for sure; this has never been demonstrated.

A Model with a Statistical Basis

Extensions Cost frontiers, based on duality results: $\ln y = f(x) - u \Rightarrow \ln C = g(y, w) + u'$, with $u > 0$ and $u' > 0$. Economies of scale and allocative inefficiency blur the relationship. Corrected and modified least squares estimators based on the deterministic frontiers are easily constructed.

Data Envelopment Analysis

Methodological Problems  Measurement error  Outliers  Specification errors  The overall problem with the deterministic frontier approach

Stochastic Frontier Models Motivation: factors not under control of the firm; measurement error; differential rates of adoption of technology. The frontier is randomly placed by the whole collection of stochastic elements which might enter the model outside the control of the firm. Aigner, Lovell, Schmidt (1977); Meeusen, van den Broeck (1977).

Stochastic Frontier Model $u_i > 0$, but $v_i$ may take any value. A symmetric distribution, such as the normal distribution, is usually assumed for $v_i$. Thus, the stochastic frontier is $\alpha + \beta' x_i + v_i$ and, as before, $u_i$ represents the inefficiency.

Least Squares Estimation Average inefficiency is embodied in the third moment of the disturbance $\varepsilon_i = v_i - u_i$. So long as $E[v_i - u_i]$ is constant, the OLS estimates of the slope parameters of the frontier function are unbiased and consistent. (The constant term estimates $\alpha - E[u_i]$.) The average inefficiency present in the distribution is reflected in the asymmetry of the distribution, which can be estimated using the third central moment of the OLS residuals $e_i$: $m_3 = \frac{1}{N}\sum_i (e_i - \bar{e})^3$.

Application to Spanish Dairy Farms
Input | Units | Mean | Std. Dev. | Minimum | Maximum
Milk | Milk production (liters) | 131,108 | 92,539 | 14,110 | 727,281
Cows | # of milking cows | 2.12 | 11.27 | 4.5 | 82.3
Labor | # man-equivalent units | 1.67 | 0.55 | 1.0 | 4.0
Land | Hectares of land devoted to pasture and crops | 12.99 | 6.17 | 2.0 | 45.1
Feed | Total amount of feedstuffs fed to dairy cows (tons) | 57,941 | 47,981 | 3,924.1 | 376,732
N = 247 farms, T = 6 years (1993-1998)

Example: Dairy Farms

The Normal-Half Normal Model

Normal-Half Normal Variable

Decomposition

Standard Form

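Estimation: Least Squares/MoM The OLS estimator of $\beta$ is consistent. $E[u_i] = (2/\pi)^{1/2}\sigma_u$, so the OLS constant estimates $\alpha - (2/\pi)^{1/2}\sigma_u$. The second and third moments of the OLS residuals estimate $m_2 = \sigma_v^2 + (1 - 2/\pi)\sigma_u^2$ and $m_3 = (2/\pi)^{1/2}(1 - 4/\pi)\sigma_u^3 \approx -0.21801\,\sigma_u^3$.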

A Problem with Method of Moments The estimator of $\sigma_u$ is $[m_3/(-0.21801)]^{1/3}$. The theoretical $m_3$ is $< 0$. The sample $m_3$ may be $> 0$. If so, there is no solution for $\sigma_u$ (a negative number raised to the 1/3 power).
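The moment calculation, including the wrong-skew failure, can be sketched in a few lines. This is an illustrative sketch under the normal-half normal assumptions for a production frontier ($\varepsilon_i = v_i - u_i$); the function name `mom_half_normal`, the constant-only simulated model, and the parameter values are hypothetical, not taken from the slides.

```python
import math
import random


def mom_half_normal(resid, ols_const):
    """Method of moments for the normal-half normal production frontier.

    resid     : OLS residuals
    ols_const : OLS constant term, which estimates alpha - E[u_i]
    Returns None when the residuals have the wrong (nonnegative) skew.
    """
    n = len(resid)
    ebar = sum(resid) / n
    m2 = sum((e - ebar) ** 2 for e in resid) / n
    m3 = sum((e - ebar) ** 3 for e in resid) / n
    if m3 >= 0.0:
        return None                                  # no admissible sigma_u
    k = math.sqrt(2.0 / math.pi) * (1.0 - 4.0 / math.pi)  # about -0.21801
    sigma_u = (m3 / k) ** (1.0 / 3.0)
    sv2 = m2 - (1.0 - 2.0 / math.pi) * sigma_u ** 2
    sigma_v = math.sqrt(sv2) if sv2 > 0.0 else 0.0
    alpha = ols_const + math.sqrt(2.0 / math.pi) * sigma_u  # undo the downward shift
    return {"sigma_u": sigma_u, "sigma_v": sigma_v, "alpha": alpha, "m3": m3}


# Constant-only simulated frontier with hypothetical values.
rng = random.Random(2007)
alpha0, su0, sv0 = 5.0, 1.0, 0.5
y = [alpha0 + rng.gauss(0.0, sv0) - abs(rng.gauss(0.0, su0)) for _ in range(20000)]
const = sum(y) / len(y)                  # OLS with only a constant is the mean
resid = [yi - const for yi in y]
est = mom_half_normal(resid, const)
print(est)

# A positively skewed residual vector has no method-of-moments solution.
wrong_skew = mom_half_normal([-1.0, -1.0, 2.0], 0.0)
print(wrong_skew)
```

Returning `None` for a nonnegative third moment is one way to flag the problem; it is a design choice made for this sketch rather than an established convention.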

Likelihood Function Waldman (1982) result on skewness of OLS residuals: if the OLS residuals are positively skewed, rather than negatively skewed, then OLS maximizes the log likelihood, and there is no evidence of inefficiency in the data.
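The likelihood formula itself did not survive in this transcript. As a hedged sketch, the function below evaluates the log likelihood under the usual Aigner-Lovell-Schmidt parameterization, which is assumed here rather than quoted from the slide: density $f(\varepsilon) = (2/\sigma)\,\phi(\varepsilon/\sigma)\,\Phi(-\varepsilon\lambda/\sigma)$ with $\sigma^2 = \sigma_u^2 + \sigma_v^2$ and $\lambda = \sigma_u/\sigma_v$. The function name is hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for phi and Phi


def loglik_normal_half_normal(eps, sigma, lam):
    """Log likelihood for residuals eps = v - u (production frontier).

    Assumed parameterization: sigma^2 = sigma_u^2 + sigma_v^2 and
    lam = sigma_u / sigma_v.
    """
    total = 0.0
    for e in eps:
        total += (math.log(2.0) - math.log(sigma)
                  + math.log(_STD.pdf(e / sigma))
                  + math.log(_STD.cdf(-e * lam / sigma)))
    return total


# With lam = 0 the last term is log(1/2), which cancels log(2), so the
# function collapses to the ordinary normal log likelihood.
ll_point = loglik_normal_half_normal([0.0], 1.0, 1.0)
ll_ols = loglik_normal_half_normal([0.5, -1.0], 2.0, 0.0)
print(ll_point, ll_ols)
```

The $\lambda = 0$ case is the one relevant to the Waldman result: it corresponds to $\sigma_u = 0$, where the model reduces to the classical regression fit by OLS.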

Alternative Model: Exponential

Normal-Exponential Likelihood

Truncated Normal Model

Normal-Truncated Normal

Other Models  Other Parametric Models (we will examine gamma later in the course)  Semiparametric and nonparametric – the recent outer reaches of the theoretical literature  Other variations including heterogeneity in the frontier function and in the distribution of inefficiency

Estimating $u_i$ There is no direct estimate of $u_i$. The data permit estimation of $y_i - \beta' x_i$. Can this be used? $\varepsilon_i = y_i - \beta' x_i = v_i - u_i$. Indirect estimate of $u_i$: use $E[u_i \mid v_i - u_i]$, where $v_i - u_i$ is estimable with $e_i = y_i - b' x_i$.

Fundamental Tool - JLMS (Jondrow, Lovell, Materov, Schmidt, 1982) We can insert our maximum likelihood estimates of all parameters. Note: This estimates $E[u_i \mid v_i - u_i]$, not $u_i$.
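The conditional mean formula is missing from this transcript. A minimal sketch for the normal-half normal case follows, using the standard result that $u_i \mid \varepsilon_i$ is $N[\mu_*, \sigma_*^2]$ truncated at zero, with $\mu_* = -\varepsilon_i\sigma_u^2/\sigma^2$ and $\sigma_* = \sigma_u\sigma_v/\sigma$. The function name is hypothetical, and the parameter values passed in stand in for estimates.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for phi and Phi


def jlms_half_normal(eps, sigma_u, sigma_v):
    """E[u | eps] for the normal-half normal model, with eps = v - u.

    In practice eps is replaced by the residual e_i and the sigmas by
    their estimates, so this estimates the conditional mean, not u_i.
    """
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -eps * sigma_u ** 2 / sigma2
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma2)
    z = mu_star / sigma_star
    # Mean of a normal truncated from below at zero.
    return mu_star + sigma_star * _STD.pdf(z) / _STD.cdf(z)


for e in (-1.0, 0.0, 1.0):
    print(e, jlms_half_normal(e, 1.0, 1.0))
```

A more negative residual places the observation further below the frontier, so the estimate rises as `eps` falls.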

Other Distributions

Efficiency

Application: Electricity Generation

Estimated Translog Production Frontiers

Inefficiency Estimates

Estimated Inefficiency Distribution

Confidence Region

Application (Based on Costs)

Multiple Output Frontier  The formal theory of production departs from the transformation function that links the vector of outputs, y to the vector of inputs, x; T(y,x) = 0.  As it stands, some further assumptions are obviously needed to produce the framework for an empirical model. By assuming homothetic separability, the function may be written in the form A(y) = f(x).

Multiple Output Production Function Inefficiency in this setting reflects the failure of the firm to achieve the maximum aggregate output attainable. Note that the model does not address the economic question of whether the chosen output mix is optimal with respect to the output prices and input costs. That would require a profit function approach. Berger (1993) and Adams et al. (1999) apply the method to a panel of U.S. banks – 798 banks, ten years.

Duality Between Production and Cost

Implied Cost Frontier Function

Stochastic Cost Frontier

Cobb-Douglas Cost Frontier

Translog Cost Frontier

Restricted Translog Cost Function

Cost Application to C&G Data

Estimates of Economic Efficiency

Duality – Production vs. Cost

Multiple Output Cost Frontier

Allocative Inefficiency and Economic Inefficiency Technical inefficiency: Off the isoquant. Allocative inefficiency: Wrong input mix.

Cost Structure – Demand System

Cost Frontier Model

The Greene Problem Factor shares are derived from the cost function by differentiation. Where does $e_k$ come from? Any nonzero value of $e_k$, which can be positive or negative, must translate into higher costs. Thus, $u$ must be a function of $e_1,\ldots,e_K$ such that $\partial u/\partial e_k > 0$. No one had derived a complete, internally consistent equation system: this is the Greene problem. Solution: Kumbhakar, in several recent papers. Very complicated, near to impractical. Apparently not of interest to practitioners.

Observable Heterogeneity As opposed to unobservable heterogeneity. Observe: Y or C (outcome) and X or w (inputs or input prices). Firm characteristics z: these are not production or cost variables, but they characterize the production process. Do they enter the production or cost function? Do they enter the inefficiency distribution? How?

Shifting the Outcome Function Firm specific heterogeneity can also be incorporated into the inefficiency model by modifying the mean of the truncated normal distribution: $y_i = \beta' x_i + v_i - u_i$, $v_i \sim N[0, \sigma_v^2]$, $u_i = |U_i|$ where $U_i \sim N[\mu_i, \sigma_u^2]$, $\mu_i = \mu_0 + \mu_1' z_i$.

Heterogeneous Mean

Estimated Efficiency

One Step or Two Step Two step: fit the half normal or truncated normal model, compute the JLMS estimate of $u_i$, then regress it on $z_i$. Airline example: fit the model without POINTS, LOADFACTOR, STAGE. One step: include $z_i$ in the model and compute $u_i$ including $z_i$. Airline example: include the 3 variables. Methodological issue: left out variables in the two step approach.

WHO Health Care Study

Application: WHO Data

One vs. Two Step

Unobservable Heterogeneity  Parameters vary across firms Random variation (heterogeneity, not Bayesian) Variation partially explained by observable indicators  Continuous variation – random parameter models: Considered with panel data models  Latent class – discrete parameter variation

A Latent Class Model

Latent Class Application Banking Costs

Heteroscedasticity in v and/or u $\mathrm{Var}[v_i \mid h_i] = \sigma_v^2\, g_v(h_i, \delta) = \sigma_{vi}^2$, with $g_v(h_i, 0) = 1$ and $g_v(h_i, \delta) = [\exp(\delta^T h_i)]^2$. $\mathrm{Var}[U_i \mid h_i] = \sigma_u^2\, g_u(h_i, \tau) = \sigma_{ui}^2$, with $g_u(h_i, 0) = 1$ and $g_u(h_i, \tau) = [\exp(\tau^T h_i)]^2$.

Application: WHO Data

A “Scaling” Model

Model Extensions  Simulation Based Estimators Normal-Gamma Frontier Model Bayesian Estimation of Stochastic Frontiers  Similar Model Structures  Similar Estimation Methodologies  Similar Results

Normal-Gamma Very flexible model. VERY difficult log likelihood function. Bayesians love it. Conjugate functional forms for other model parts

Normal-Gamma Model $z \sim N[-\varepsilon_i - \sigma_v^2/\sigma_u,\ \sigma_v^2]$. $q(r, \varepsilon_i)$ is extremely difficult to compute.

Normal-Gamma

Simulating the Likelihood $\varepsilon_i = y_i - \beta^T x_i$, $\mu_i = -\varepsilon_i - \sigma_v^2/\sigma_u$, $\sigma = \sigma_v$, $P_L = \Phi(-\mu_i/\sigma)$, and $F_q$ is a draw from the continuous uniform(0,1) distribution.
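The quantities above are the ingredients of an inverse-CDF draw from a normal distribution truncated at zero. The sketch below assumes the draw takes the form $z_{iq} = \mu_i + \sigma\,\Phi^{-1}(P_L + F_q(1 - P_L))$ and that $q(r, \varepsilon_i) = E[z^r \mid z > 0, \varepsilon_i]$ is approximated by averaging $z_{iq}^r$ over the draws. That form is an assumption, since the slide's own formula is not preserved here, and the function names and clamping tolerance are hypothetical.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal CDF and inverse CDF


def draw_truncated_normal(mu, sigma, f):
    """Map a uniform(0,1) draw f into N[mu, sigma^2] truncated below at zero."""
    p_left = _STD.cdf(-mu / sigma)                 # mass below the truncation point
    p = p_left + f * (1.0 - p_left)
    p = min(max(p, 1e-15), 1.0 - 1e-15)            # keep inv_cdf inside (0, 1)
    z = mu + sigma * _STD.inv_cdf(p)
    return max(z, 0.0)                             # guard against rounding below zero


def simulate_q(r, eps, sigma_v, sigma_u, n_draws=5000, seed=0):
    """Simulated E[z^r | z > 0] for the conditional normal described above."""
    rng = random.Random(seed)
    mu = -eps - sigma_v ** 2 / sigma_u
    total = 0.0
    for _ in range(n_draws):
        total += draw_truncated_normal(mu, sigma_v, rng.random()) ** r
    return total / n_draws


# With eps = -1 and sigma_v = sigma_u = 1, mu is zero, so z is half normal
# and E[z] should be near sqrt(2/pi), about 0.798.
q1 = simulate_q(1, -1.0, 1.0, 1.0)
print(q1)
```

Because every draw is a smooth function of the parameters for a fixed set of uniform draws, the simulated likelihood can be maximized with the same $F_q$ values held fixed across iterations.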

Application to C&G Data This is the standard data set for developing and testing Exponential, Gamma, and Bayesian estimators.

Application to C&G Data

Bayesian Estimation  Short history – first developed post 1995  Range of applications Largely replicated existing classical methods Recent applications have extended received approaches  Common features of the application

Bayesian Formulation of SF Model Normal-Exponential Model: $v_i - u_i = y_i - \alpha - \beta^T x_i$. Estimation proceeds (in principle) by specifying priors over $\Theta = (\alpha, \beta, \sigma_v, \sigma_u)$, then deriving inferences from the joint posterior $p(\Theta \mid \text{data})$. In general, the joint posterior for this model cannot be derived in closed form, so direct analysis is not feasible. Using Gibbs sampling and known conditional posteriors, it is possible to use Markov Chain Monte Carlo (MCMC) methods to sample from the marginal posteriors and use that device to learn about the parameters and inefficiencies. In particular, for the model parameters, we are interested in estimating $E[\Theta \mid \text{data}]$, $\mathrm{Var}[\Theta \mid \text{data}]$ and, perhaps, even more fully characterizing the density $f(\Theta \mid \text{data})$.

Estimating Inefficiency One might, ex post, estimate $E[u_i \mid \text{data}]$; however, it is more natural in this setting to include $(u_1,\ldots,u_N)$ with the parameter vector $\Theta$ and estimate the conditional means with those of the other parameters. The method is known as data augmentation.

Priors Over Parameters

Priors for Inefficiencies

Posterior

Gibbs Sampling: Conditional Posteriors

Bayesian Normal-Gamma Model Tsionas (2002): Erlang form (integer P); “random parameters”; applied to C&G. River Huang (2004): fully general; applied (as usual) to C&G.

Bayesian and Classical Results

Methodological Comparison  Bayesian vs. Classical Interpretation Practical results: Bernstein – von Mises Theorem in the presence of diffuse priors  Kim and Schmidt comparison (JPA, 2000)  Important difference – tight priors over u i in this context.  Conclusions?
