Parametric modelling of cost data: some simulation evidence
Andrew Briggs, University of Oxford
Richard Nixon, MRC Biostatistics Unit, Cambridge
Simon Dixon, University of Sheffield
Simon Thompson, MRC Biostatistics Unit, Cambridge
2003 CHEBS Seminar, Friday 7th November
Parametric modelling of cost data: Background
Cost data are typically non-normally distributed, with high skew and kurtosis
Arithmetic mean cost is of interest to policy makers
Central Limit Theorem ensures sample mean is consistent estimator
Commentators have proposed parametric modelling of cost data to improve efficiency
In particular, Lognormal distribution commonly advocated
Alternatively, Gamma distribution is an increasingly popular choice
Parametric modelling of cost data: Choice of estimator
If data are Lognormal, an efficient estimator of mean cost is exp(lm + lv/2), where lm and lv are the sample mean and variance of log cost
If data are Gamma distributed, the maximum likelihood estimate of the population mean is the sample mean
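The two estimators named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes lm and lv denote the sample mean and variance of log cost, uses the unbiased (n - 1) variance, and draws purely illustrative Lognormal data with mean 1000 and CoV 1.0. The function names are hypothetical.

```python
import numpy as np


def lognormal_mean_estimate(costs):
    """Lognormal-based estimate of mean cost: exp(lm + lv/2)."""
    logs = np.log(np.asarray(costs, dtype=float))
    lm = logs.mean()          # mean of log cost
    lv = logs.var(ddof=1)     # variance of log cost (assumption: n - 1 divisor)
    return float(np.exp(lm + lv / 2.0))


def sample_mean_estimate(costs):
    """Sample mean, which is also the Gamma maximum likelihood estimate of the mean."""
    return float(np.mean(costs))


# Illustrative data only: Lognormal costs with mean 1000 and CoV 1.0.
rng = np.random.default_rng(0)
cov = 1.0
sigma2 = np.log(1.0 + cov**2)            # variance on the log scale
mu = np.log(1000.0) - sigma2 / 2.0       # mean on the log scale
costs = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=50)

print("lognormal estimator:", lognormal_mean_estimate(costs))
print("sample mean:        ", sample_mean_estimate(costs))
```

Both functions take any array of strictly positive costs; the Lognormal estimator is undefined if any cost is zero, because the log cannot be taken.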
Parametric distributions: Simulation experiment
Lognormal / Gamma distributions
Population mean was set to be 1000
Five choices of coefficient of variation (CoV = 0.25, 0.5, 1.0, 1.5, 2.0) to define distribution parameters
Samples of five different sizes (n = 20, 50, 200, 500, 2000) drawn from each distribution for each CoV
2 x 5 x 5 = 50 experiments
Bias, coverage probability and RMSE all recorded
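The experimental design above can be sketched as follows. This is a hedged illustration under stated assumptions, not the original simulation code: the number of replications (200), the random seed, the normal-theory 95% interval used for the coverage of the sample mean, and all function and variable names are choices made here for illustration. The distribution parameters follow from the population mean and CoV: for the Lognormal, the log-scale variance is ln(1 + CoV^2); for the Gamma, the shape is 1/CoV^2 and the scale is mean x CoV^2.

```python
import numpy as np

POP_MEAN = 1000.0
COVS = [0.25, 0.5, 1.0, 1.5, 2.0]
SIZES = [20, 50, 200, 500, 2000]


def draw(dist, cov, n, reps, rng):
    """Return a (reps, n) array of costs with mean POP_MEAN and the given CoV."""
    if dist == "lognormal":
        sigma2 = np.log(1.0 + cov**2)
        mu = np.log(POP_MEAN) - sigma2 / 2.0
        return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=(reps, n))
    if dist == "gamma":
        shape = 1.0 / cov**2
        scale = POP_MEAN * cov**2
        return rng.gamma(shape, scale, size=(reps, n))
    raise ValueError(f"unknown distribution: {dist}")


def run_experiment(dist, cov, n, reps, rng):
    """Bias and RMSE of both estimators, plus coverage for the sample mean."""
    costs = draw(dist, cov, n, reps, rng)

    # Sample mean with a normal-theory 95% interval (assumption of this sketch).
    sample_mean = costs.mean(axis=1)
    se = costs.std(axis=1, ddof=1) / np.sqrt(n)
    covered = np.abs(sample_mean - POP_MEAN) <= 1.96 * se

    # Lognormal-based estimator exp(lm + lv/2), applied whatever the true distribution.
    logs = np.log(costs)
    ln_est = np.exp(logs.mean(axis=1) + logs.var(axis=1, ddof=1) / 2.0)

    def rmse(est):
        return float(np.sqrt(np.mean((est - POP_MEAN) ** 2)))

    return {
        "bias_mean": float(sample_mean.mean() - POP_MEAN),
        "rmse_mean": rmse(sample_mean),
        "coverage_mean": float(covered.mean()),
        "bias_lognormal": float(ln_est.mean() - POP_MEAN),
        "rmse_lognormal": rmse(ln_est),
    }


# 2 distributions x 5 CoVs x 5 sample sizes = 50 experiments.
rng = np.random.default_rng(2003)
results = []
for dist in ("lognormal", "gamma"):
    for cov in COVS:
        for n in SIZES:
            row = run_experiment(dist, cov, n, reps=200, rng=rng)
            results.append((dist, cov, n, row))
            print(dist, cov, n, {k: round(v, 3) for k, v in row.items()})
```

A larger `reps` value would give more stable estimates of bias, RMSE and coverage; 200 is used here only to keep the sketch quick to run.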
Parametric distributions: Distribution sets
Parametric distributions: Estimated RMSE from simulations
Parametric distributions: Estimated coverage probabilities
Empirical cost distributions: Summary statistics for 3 data sets (raw cost; log transformed cost)
Empirical cost distributions: Data set 1: CPOU (raw cost; log transformed cost)
Empirical cost distributions: Data set 2: IV Fluids (raw cost; log transformed cost)
Empirical cost distributions: Data set 3: Paramedics (raw cost; log transformed cost)
Empirical cost data sets: Simulation results
Parametric cost modelling: Comments & conclusions
“All models are wrong” (Box 1976)
“No data are normally distributed” (Nester 1996)
Costs are estimated from resource use times unit cost
Any parametric assumption relating to costs is at best an approximation
Simulations confirm that there are efficiency gains if appropriate distribution is chosen
But incorrect assumptions can lead to very misleading conclusions
Sample mean performs well and is unlikely to lead to inappropriate inference
Only when there are sufficient data to permit detailed modelling is the choice of an alternative estimator warranted