Modelling Cardinal Utilities from Ordinal Utility data: An exploratory analysis
Peter Gilks, Chris McCabe, John Brazier, Aki Tsuchiya, Josh Solomon
Background
Limitations of conventional methods of utility elicitation
Early work suggesting ordinal data can predict cardinal preferences
SF-6D and HUI2 surveys used ranking exercises as a warm-up prior to SG valuation tasks
Opportunity to test and develop methods proposed by Solomon
SF-6D valuation data sets
Ranked seven SF-6D states (including pits and full health) and death
SG valuations of five states against full health and pits, then chained using the valuation of pits against full health and death (respondents were asked to confirm their ranking of pits against death)
611 respondents sampled from the general population
249 mean SG health state values, ranging from .21 to .99; an average of 14 valuations per state
HUI2 valuation data set
Ranked 9 HUI2 states (including pits and full health) and death
SG valuations of 8 states against full health and death (respondents were asked to confirm their ranking of the state against death)
198 respondents sampled from the general population
249 mean SG health state values ranging from to .77; an average of 24 valuations per state
Methods
Aim: to model the predicted health state valuations using the ordinal preference data
1) Statistical model: conditional logistic regression (McFadden choice model) based on random utility theory (previous attempts used Thurstone's Comparative Judgement Model)
2) Value function: relating the health state descriptive system to the utility value
The Statistical Model
Respondent i has a latent utility value for state j, $U_{ij}$.
The respondent will choose state j as best from a group of states k = 1, …, n if $U_{ij} > U_{ik}$ for all $k \neq j$.
Utility function: $U_{ij} = \mu_j + \varepsilon_{ij}$, where $\mu_j$ represents the underlying tastes of the population and $\varepsilon_{ij}$ represents the peculiar choice of the individual.
Odds of choosing state j over state k are $\exp\{\mu_j - \mu_k\}$.
So we want to model the dependent variable $\mu$ against the dimensions of the descriptive systems: SF-6D and HUI2.
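The odds statement on this slide can be checked numerically. The sketch below is illustrative only: it assumes the error terms follow the independent extreme-value distribution that is standard for the McFadden choice model (the slide does not state the distribution), and the utilities in `mu` are hypothetical values, not estimates from the study.

```python
import math

def choice_probabilities(mu):
    """Probability that each state is picked as best from the offered set."""
    top = max(mu)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in mu]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical mean utilities for three states (illustrative values only).
mu = [0.0, -0.5, -2.0]
probs = choice_probabilities(mu)

# Odds of state 0 over state 1 computed two ways.
odds_from_probs = probs[0] / probs[1]
odds_from_formula = math.exp(mu[0] - mu[1])

# Dropping the third state leaves the odds between the first two unchanged.
pair_probs = choice_probabilities(mu[:2])
odds_in_pair = pair_probs[0] / pair_probs[1]

print(odds_from_probs, odds_from_formula, odds_in_pair)
```

Under these assumptions the three printed odds agree up to floating-point error, which is the sense in which the odds depend only on the difference between the two mean utilities.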
Assumption: independence of irrelevant alternatives
The model is based on the assumption that the ranking exercise is equivalent to the respondent making a series of individual choices from smaller and smaller groups of states.
For example, to rank 10 health states, the respondent:
Selects the first preference from all 10 (rank 1)
Selects the best from the remaining 9 (rank 2)
Selects the best from the remaining 8 (rank 3), and so on
NB: this assumes that the choice over a given pair does not depend on the other alternatives available.
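The sequential-choice reading of a ranking described on this slide can be sketched as a likelihood calculation. This is a minimal illustration, assuming each step is a logit choice among the states not yet ranked; the function name and the utilities in `mu` are hypothetical and are not taken from the study.

```python
import itertools
import math

def ranking_log_likelihood(mu_best_first):
    """Log-likelihood of one complete ranking, utilities listed best first.

    Each step treats the chosen state as the pick from the states still
    unranked. The final step is skipped because a single remaining state
    is chosen with probability 1.
    """
    log_lik = 0.0
    for position in range(len(mu_best_first) - 1):
        remaining = mu_best_first[position:]
        top = max(remaining)
        log_denominator = top + math.log(sum(math.exp(v - top) for v in remaining))
        log_lik += mu_best_first[position] - log_denominator
    return log_lik

# Hypothetical mean utilities for three states (illustrative values only).
mu = {"A": 0.0, "B": -0.7, "C": -1.5}

# The probabilities of all possible rankings should sum to one.
total_probability = sum(
    math.exp(ranking_log_likelihood([mu[state] for state in order]))
    for order in itertools.permutations(mu)
)
print(total_probability)
```

Summing such log-likelihoods over respondents would give the objective that a rank-ordered (conditional) logit routine maximises; the fitting step itself is left out of this sketch.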
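Value function
The expected value of each unobserved utility was assumed to be a linear function of the categorical ratings on the domains of each dataset respectively. The specifications are:
For HUI2:
$$
\begin{aligned}
\mu ={}& \beta_1 \mathrm{S2} + \beta_2 \mathrm{S3} + \beta_3 \mathrm{S4} + \beta_4 \mathrm{M2} + \beta_5 \mathrm{M3} + \beta_6 \mathrm{M4} + \beta_7 \mathrm{M5} \\
&+ \beta_8 \mathrm{E2} + \beta_9 \mathrm{E3} + \beta_{10} \mathrm{E4} + \beta_{11} \mathrm{E5} + \beta_{12} \mathrm{C2} + \beta_{13} \mathrm{C3} + \beta_{14} \mathrm{C4} \\
&+ \beta_{15} \mathrm{SC2} + \beta_{16} \mathrm{SC3} + \beta_{17} \mathrm{SC4} + \beta_{18} \mathrm{P2} + \beta_{19} \mathrm{P3} + \beta_{20} \mathrm{P4} + \beta_{21} \mathrm{P5} + \beta_d \mathrm{Death}
\end{aligned}
$$
For SF-6D:
$$
\begin{aligned}
\mu ={}& \beta_1 \mathrm{PF2} + \beta_2 \mathrm{PF3} + \beta_3 \mathrm{PF4} + \beta_4 \mathrm{PF5} + \beta_5 \mathrm{PF6} + \beta_6 \mathrm{RL2} + \beta_7 \mathrm{RL3} + \beta_8 \mathrm{RL4} \\
&+ \beta_9 \mathrm{SF2} + \beta_{10} \mathrm{SF3} + \beta_{11} \mathrm{SF4} + \beta_{12} \mathrm{SF5} + \beta_{13} \mathrm{P2} + \beta_{14} \mathrm{P3} + \beta_{15} \mathrm{P4} + \beta_{16} \mathrm{P5} + \beta_{17} \mathrm{P6} \\
&+ \beta_{18} \mathrm{MH2} + \beta_{19} \mathrm{MH3} + \beta_{20} \mathrm{MH4} + \beta_{21} \mathrm{MH5} + \beta_{22} \mathrm{V2} + \beta_{23} \mathrm{V3} + \beta_{24} \mathrm{V4} + \beta_{25} \mathrm{V5} + \beta_d \mathrm{Death}
\end{aligned}
$$
Note: there is no constant term, and there is a coefficient for death. This facilitates rescaling the results onto the full health/death (1, 0) scale.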
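The HUI2 specification on this slide amounts to one dummy variable per dimension level above level 1, plus a death dummy. The sketch below shows one way to build that coding. The level counts are read off the dummy names on the slide; the helper names are hypothetical, and coding death as a row with only the death dummy switched on is an assumption rather than something the slide states.

```python
# Number of levels per HUI2 dimension, inferred from the dummy names on the slide.
HUI2_LEVELS = {"S": 4, "M": 5, "E": 5, "C": 4, "SC": 4, "P": 5}

def dummy_names(levels):
    """Dummy names in slide order: levels 2 and above per dimension, then Death."""
    names = [f"{dim}{lvl}" for dim, top in levels.items() for lvl in range(2, top + 1)]
    return names + ["Death"]

def encode_state(state, levels):
    """Return the 0/1 row for a state; pass None to encode death (assumed coding)."""
    row = dict.fromkeys(dummy_names(levels), 0)
    if state is None:
        row["Death"] = 1
    else:
        for dim, lvl in state.items():
            if lvl > 1:  # level 1 is the reference category
                row[f"{dim}{lvl}"] = 1
    return list(row.values())

def latent_utility(state, beta, levels):
    """Linear value function with no constant: sum of beta times dummy."""
    return sum(b * x for b, x in zip(beta, encode_state(state, levels)))

names = dummy_names(HUI2_LEVELS)
full_health = {dim: 1 for dim in HUI2_LEVELS}
full_health_row = encode_state(full_health, HUI2_LEVELS)
death_row = encode_state(None, HUI2_LEVELS)
print(len(names), sum(full_health_row), sum(death_row))
```

Because there is no constant, full health (all dimensions at level 1) has a latent utility of zero for any coefficient vector, which is what makes the later rescaling against death straightforward.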
Rescaling
The scale of the latent variable $\mu$ is arbitrarily defined by the identifying assumptions in the model.
1) Normalise to the observed SG scale (originally proposed by Josh Solomon). Multiply the coefficients by a ratio: $\beta_{ri} = \beta_i \times (\text{min. observed SG} / \text{predicted PITS value})$
2) Normalise to death: $\beta_{ri} = \beta_i / |\beta_d|$. This anchors death at zero and perfect health at 1.
NB: states can still be valued as worse than death.
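The second normalisation can be worked through with a small numerical sketch. The coefficients below are hypothetical, not estimates from either data set. The sketch also assumes that a predicted value is read as 1 plus the sum of the rescaled coefficients for the state, an interpretation that follows from the absence of a constant term but is not spelled out on the slide.

```python
def rescale_to_death(beta, beta_death):
    """Divide every coefficient by the absolute value of the death coefficient."""
    scale = abs(beta_death)
    return {name: value / scale for name, value in beta.items()}

def predicted_value(active_dummies, rescaled_beta):
    """Assumed reading: full health is 1, and each active dummy subtracts from it."""
    return 1.0 + sum(rescaled_beta[name] for name in active_dummies)

# Hypothetical latent-scale coefficients (illustrative values only).
beta = {"M2": -0.4, "P3": -0.9, "P5": -1.8, "Death": -2.0}
rescaled = rescale_to_death(beta, beta["Death"])

value_full_health = predicted_value([], rescaled)
value_death = predicted_value(["Death"], rescaled)
value_mild_state = predicted_value(["M2", "P3"], rescaled)
value_severe_state = predicted_value(["M2", "P5"], rescaled)
print(value_full_health, value_death, value_mild_state, value_severe_state)
```

With these made-up numbers the severe state has a latent utility below that of death, so its rescaled value is negative, which illustrates the NB on the slide.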
Model Assessment Methods
The main aim is to compare the predictive performance of the rank model and the original standard gamble model:
Check coefficients for sign and consistency.
Plot predictions against observed values for the rank model and SG model for both datasets.
Statistical tests of predictive performance.
Look for systematic patterns in the errors.
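Two simple summaries of predictive performance, sketched under the assumption that error is defined as predicted minus observed mean value, as in the figure captions. The slide does not list which statistics were used beyond these headings, so the function names and the four observed and predicted values here are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average size of the prediction error, ignoring its direction."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)

def mean_error(observed, predicted):
    """Average signed error (predicted minus observed); non-zero values suggest bias."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical mean SG values and model predictions for four states.
observed = [0.9, 0.7, 0.5, 0.3]
predicted = [0.85, 0.7, 0.55, 0.4]

mae = mean_absolute_error(observed, predicted)
bias = mean_error(observed, predicted)

# Signed errors listed from the mildest to the most severe state, to look for patterns.
errors_by_severity = [p - o for o, p in zip(observed, predicted)]
print(mae, bias, errors_by_severity)
```

In this made-up example the errors move from negative for the mildest state to positive for the most severe one, the kind of systematic pattern the last check on the slide is meant to catch.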
HUI2
[Figure: mean values, predicted values and error (predicted − mean) for the Rank model including death and the SG Model (OLS), HUI2. Panels: Rank Model; SG Model. Smooth line = mean health state values ranked by severity; top line = predictions; bottom line = error.]
SF6D
[Figure: mean values, predicted values and error (predicted − mean) for the Rank model including death and the SG Model (6), SF-6D. Panels: Rank Model; SG Model. Smooth line = means; top jagged line = predictions; bottom jagged line = error.]
Both models: under-predict large means; over-predict low means.
Summary of Findings
Rank models are able to predict actual mean SG health state values nearly as well as the SG models, with a modest increase in MAE
Evidence that the rank model produced less systematic error in the SF-6D data set and improved consistency
Issues – taking results at face value
Is the ranked model good enough? Could we start using it?
Given that ranking is a warm-up, results could be better if more care were taken over this part of the exercise
Ranked methods are probably cheaper
What evidence is there that ranking exercises impose a lower cognitive burden? There seem to be higher levels of completion.
Issues – harder questions
Is the selection process of the ranking task assumed by the model correct?
Why should the relationship between the latent utility value and the cardinal values (SG in this case) be linear?
– What other functional forms might theory suggest?
– Is the latent utility value similar to Dyer and Sarin's 'value function' or something else?
Does rank data elicit preferences or simply how good or bad a health state is, and does it matter?
Issues – the death question
Not a major problem here because all mean health state values are above zero
The MVH EQ-5D data has been analysed in a similar way by Josh Solomon, but the ranking of death was very different to the implied ranking from the TTO: only state is ranked worse than death compared to 16/43 states by TTO!
The ranked model normalised to death and full health does not predict TTO values worse than death very well
Further work – more suggestions welcome
See how well SG data predicts ranking at the individual level
Consider interactions
Model different functional relationships between the latent variable and SG
Examine completion rates and the extent to which ranking will extend the vote to more vulnerable populations