The Art of Model Building and Statistical Tests
2 Outline: The art of model building; Using software output (the t-statistic; the likelihood ratio test; the use of goodness-of-fit measures; other tests); Tests of model structure (test of the IIA assumption; test of taste variations; test of unequal variances, i.e. heteroscedasticity); Prediction tests (outlier analysis; market segment prediction tests; policy forecasting tests)
Estimation Results for Trinomial Mode Choice Model, Base Specification (Ben-Akiva, Table 7.1)
4 The Use of Goodness-of-Fit Measures: value of the likelihood function; the likelihood ratio index (rho-squared); the adjusted likelihood ratio index (rho-squared bar)
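The two indices named on this slide can be computed directly from the log-likelihood values that estimation software reports. The sketch below assumes the usual definitions, rho-squared = 1 - L(beta)/L(0) and rho-squared bar = 1 - (L(beta) - K)/L(0), where L(0) is the log-likelihood with all coefficients set to zero and K is the number of estimated parameters. The function names and the numbers are hypothetical.

```python
def rho_squared(ll_model: float, ll_zero: float) -> float:
    """Likelihood ratio index: 1 - L(beta) / L(0), both log-likelihoods."""
    return 1.0 - ll_model / ll_zero


def adjusted_rho_squared(ll_model: float, ll_zero: float, n_params: int) -> float:
    """Adjusted index: penalises the fit by the number of parameters K."""
    return 1.0 - (ll_model - n_params) / ll_zero


# Hypothetical values, not taken from any table in the slides.
ll_zero = -1000.0   # log-likelihood with all coefficients equal to zero
ll_model = -700.0   # log-likelihood at convergence
n_params = 5        # number of estimated coefficients

print(rho_squared(ll_model, ll_zero))
print(adjusted_rho_squared(ll_model, ll_zero, n_params))
```

Because the adjusted index subtracts K from the model log-likelihood, it can fall when a variable is added that barely improves the fit, which is why it is the more useful of the two for comparing specifications with different numbers of variables.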
5 The Art of Model Building: informal tests of the coefficient estimates (signs and relative values; marginal rate of substitution; value of time)
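As an illustration of the value-of-time check, the implied value of time in a linear-in-parameters utility is the ratio of the travel time coefficient to the travel cost coefficient. The sketch below uses hypothetical coefficients and assumes time is measured in minutes and cost in cents; none of these numbers come from the slides.

```python
def value_of_time(beta_time: float, beta_cost: float) -> float:
    """Marginal rate of substitution between time and cost.

    Units are (cost units) per (time unit), e.g. cents per minute.
    """
    if beta_cost == 0:
        raise ValueError("cost coefficient must be non-zero")
    return beta_time / beta_cost


# Hypothetical estimates: both coefficients should be negative.
beta_time = -0.03    # utility per minute of travel time (assumed)
beta_cost = -0.002   # utility per cent of travel cost (assumed)

vot_cents_per_min = value_of_time(beta_time, beta_cost)
vot_dollars_per_hour = vot_cents_per_min * 60 / 100

print(round(vot_cents_per_min, 2), "cents per minute")
print(round(vot_dollars_per_hour, 2), "dollars per hour")
```

A negative ratio, or a value far outside the plausible wage range, is an informal signal that the specification needs another look.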
6 The t-Statistic; The Likelihood Ratio Test: $-2[L(0) - L(\hat{\beta})]$ is $\chi^2$ distributed with $k$ degrees of freedom, where $k$ = number of parameters in the model
T-statistic. Discrete choice models use the t-ratio (the estimated coefficient divided by its standard error) to determine whether an estimated coefficient is statistically different from zero. The null hypothesis is that the coefficient equals zero, and the t-test gives the significance level at which that hypothesis can be rejected. The t-values are placed in brackets next to each estimated coefficient in this thesis. An absolute t-value greater than 2.58 rejects the null hypothesis at the 99% confidence level, and an absolute value between 1.96 and 2.58 is significant at the 95% confidence level. Absolute t-values between 1.50 and 1.96 are significant at roughly the 85% confidence level; such variables are generally left in the model, but care should be taken when interpreting the results.
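The thresholds above can be applied mechanically. The following is a minimal sketch that assumes a large sample, so that the t-ratio is treated as approximately standard normal; the function names and the example coefficient are hypothetical.

```python
import math


def t_ratio(coef: float, std_err: float) -> float:
    """Estimated coefficient divided by its standard error."""
    return coef / std_err


def two_sided_p_value(t: float) -> float:
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def confidence_band(t: float) -> str:
    """Classify a t-ratio using the thresholds quoted in the text."""
    a = abs(t)
    if a > 2.58:
        return "99%"
    if a > 1.96:
        return "95%"
    if a > 1.50:
        return "85%"
    return "not significant"


# Hypothetical coefficient and standard error.
t = t_ratio(-0.021, 0.010)
print(round(t, 2), confidence_band(t), round(two_sided_p_value(t), 3))
```

Only the absolute value of the t-ratio matters for significance; the sign is checked separately against what theory expects.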
Likelihood ratio test. The likelihood ratio test measures the performance of one model relative to another. In MNL modelling it is typically used to compare two models, one that includes additional variables and one that does not. The statistic is computed from the final log-likelihoods of the two models: $L^* = -2[L(0) - L(\beta)]$. Here $L^*$ is the likelihood ratio statistic, $L(0)$ is the final log-likelihood of the base model, and $L(\beta)$ is the final log-likelihood of the model with the different number of variables. A significance level is chosen (for example 0.10 or 0.05), and the critical value is read from the chi-squared tables with degrees of freedom equal to the difference in the number of estimated parameters between the two models. If the computed statistic exceeds the critical value, the null hypothesis that the additional variables have no effect is rejected; that is, the model behind $L(\beta)$ fits significantly better than the base model behind $L(0)$.
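9 The Likelihood Ratio Test (continued): $-2[L(\hat{\beta}_R) - L(\hat{\beta}_U)]$ is $\chi^2$ distributed with $k_U - k_R$ degrees of freedom, where $k_U$ and $k_R$ are the numbers of parameters in the unrestricted and restricted models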
10 Testing New Variables (Ben-Akiva, Table 7.3)
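11 The Use of Goodness-of-Fit Measures ($\rho^2$ and $\bar{\rho}^2$) and the Likelihood Ratio Test. $H_0$: $\beta_{12} = \beta_{14} = \beta_{15} = 0$. $-2[L(\hat{\beta}_R) - L(\hat{\beta}_U)] = 4.2 < 6.25$ = critical value of $\chi^2$ at the 90% level with 3 degrees of freedom, so the null hypothesis cannot be rejected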
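The comparison on this slide (a statistic of 4.2 against a critical value of 6.25 with 3 degrees of freedom) can be reproduced with a short script. This is a sketch only: the two log-likelihood values are hypothetical and chosen so that the statistic comes out at 4.2, and `chi2_sf` is a small helper written here from the closed-form chi-squared tail for integer degrees of freedom, not a library function.

```python
import math


def lr_statistic(ll_restricted: float, ll_unrestricted: float) -> float:
    """Likelihood ratio statistic -2 [L(beta_R) - L(beta_U)]."""
    return -2.0 * (ll_restricted - ll_unrestricted)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for chi-squared, integer df >= 1."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series.
    term = math.sqrt(2.0 * x / math.pi)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Hypothetical log-likelihoods that give the statistic quoted on the slide.
stat = lr_statistic(ll_restricted=-702.1, ll_unrestricted=-700.0)
df = 3                      # three coefficients restricted to zero
p_value = chi2_sf(stat, df)
reject_at_90 = p_value < 0.10

print(round(stat, 2), round(p_value, 3), reject_at_90)
```

Working with the p-value instead of a table lookup gives the same decision: the restriction is rejected at the 90% level only if the p-value falls below 0.10.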
12 Test of Nonlinear Specifications (Ben-Akiva, Table 7.6)
13 Test of Nonlinear Specifications (continued) [Figure: disutility of travel time plotted against travel time, comparing a piecewise linear approximation with a nonlinear specification]
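One common way to set up the piecewise linear approximation is to split travel time into one variable per interval, so that each interval gets its own coefficient. The sketch below illustrates that construction; the breakpoints (20 and 40 minutes) and the coefficients are hypothetical and are not the ones used in the referenced table.

```python
def piecewise_linear_terms(travel_time: float, breakpoints: list) -> list:
    """Split travel_time into segment variables, one per interval.

    With breakpoints [b1, b2], the result is the part of travel_time
    falling in [0, b1], in (b1, b2], and above b2.
    """
    terms = []
    lower = 0.0
    for upper in breakpoints:
        terms.append(min(max(travel_time - lower, 0.0), upper - lower))
        lower = upper
    terms.append(max(travel_time - lower, 0.0))
    return terms


def disutility(travel_time: float, breakpoints: list, coefs: list) -> float:
    """Piecewise linear disutility: sum of coefficient times segment time."""
    terms = piecewise_linear_terms(travel_time, breakpoints)
    return sum(b * t for b, t in zip(coefs, terms))


breakpoints = [20.0, 40.0]          # hypothetical, in minutes
coefs = [-0.01, -0.03, -0.05]       # hypothetical slope for each segment

print(piecewise_linear_terms(50.0, breakpoints))
print(disutility(50.0, breakpoints, coefs))
```

The linear specification is the restricted case in which all segment coefficients are equal, so the two can be compared with a likelihood ratio test.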
14 Test of Nonlinear Specifications (continued)
15 Estimation Problems: use of too many alternative-specific constants; incorrect specification of socioeconomic variables; perfect collinearity of variables; models with one or more unbounded coefficients
16 Constrained Estimation: inequality constraints; fixed value constraints; linear constraints; assumed value of time
17 Test of Model Structure. The key question: is a multinomial logit model appropriate for the current data set? Or: are the basic assumptions required for the MNL structure true for the data? Basic MNL assumptions: IIA, independence from irrelevant alternatives (test nested structures); no random taste variations (all significant differences in tastes are captured by socioeconomic variables); means and variances are constant
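The IIA property can be seen numerically: under the MNL formula the ratio of two choice probabilities depends only on the utilities of those two alternatives, so it does not change when a third alternative is removed. The sketch below uses hypothetical systematic utilities for three modes.

```python
import math


def mnl_probabilities(utilities: dict) -> dict:
    """Multinomial logit choice probabilities from systematic utilities."""
    v_max = max(utilities.values())          # subtract max for stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    denom = sum(exp_v.values())
    return {alt: e / denom for alt, e in exp_v.items()}


# Hypothetical utilities for a trinomial mode choice.
full = mnl_probabilities({"auto": -0.5, "bus": -1.2, "rail": -1.0})
reduced = mnl_probabilities({"auto": -0.5, "bus": -1.2})

ratio_full = full["auto"] / full["bus"]
ratio_reduced = reduced["auto"] / reduced["bus"]

print(round(ratio_full, 4), round(ratio_reduced, 4))
```

Both ratios equal exp(V_auto - V_bus). A test of IIA asks whether the data are consistent with this invariance, for example by comparing the MNL model with a nested structure.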
18 Taste Variation Test. General strategy: estimate models for market segments and compare with the full data set model. Market segments: groups based on socioeconomic variables (household size, income ranges, etc.). Null hypothesis: the vector of estimated coefficients for each subset $i$ equals $\beta_F$, the coefficient vector of the full model. Likelihood ratio test statistic: $-2[L_F(\hat{\beta}_F) - \sum_i L_i(\hat{\beta}_i)]$, distributed as $\chi^2$ with $\sum_i k_i - k_F$ degrees of freedom, where $k_i$ is the number of coefficients in the model for subset $i$ and $k_F$ is the number for the full model
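The statistic and its degrees of freedom follow directly from the log-likelihoods and coefficient counts of the full and segment models. A minimal sketch with hypothetical values for two income segments (the numbers are not taken from the estimation tables):

```python
def taste_variation_test(ll_full: float, k_full: int, segments: list):
    """Return (statistic, degrees of freedom) for the segmentation test.

    segments is a list of (log_likelihood, n_coefficients) pairs,
    one pair per market segment model.
    """
    ll_segments = sum(ll for ll, _ in segments)
    k_segments = sum(k for _, k in segments)
    statistic = -2.0 * (ll_full - ll_segments)
    return statistic, k_segments - k_full


# Hypothetical results: one pooled model and two segment models.
stat, df = taste_variation_test(
    ll_full=-1000.0,
    k_full=5,
    segments=[(-480.0, 5), (-505.0, 5)],
)

critical_95_df5 = 11.07     # chi-squared critical value, 5 df, 95% level
print(stat, df, stat > critical_95_df5)
```

If the statistic exceeds the critical value, the hypothesis of equal coefficients across segments is rejected, which points to taste variation that the pooled specification does not capture.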
Estimation Results for Trinomial Mode Choice Model, Market Segmentation by Income
Taste Variation Test
21 Other Applications of Market Segmentation Tests: life-cycle/life-style groups; tests of transferability (models based on data from different years or urban areas, to determine stability of policy-relevant coefficients). Unrestricted model: collection of models estimated for separate data sets. Restricted model: combined data sets, with critical coefficients constrained to single values