Bootstrapping judgemental adjustments to improve forecasting accuracy
– judgemental bootstraps vs error bootstraps
Robert Fildes, Centre for Forecasting, Lancaster University, UK
Paul Goodwin, Bath University, UK
EPSRC Research Grants GR/60181/01 and GR/60198/01
What is a bootstrap?
[Diagram: cues X_1, X_2, X_3, …, X_p; actual observations Y; expert's prediction F; error = Y − F]
Based on the cues {X} we derive predictions of Y (a minimal sketch follows)
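A minimal sketch of a judgemental bootstrap, assuming a linear model of the expert estimated by ordinary least squares (the slide does not specify a functional form); all data are simulated for illustration:

```python
# Judgemental bootstrap: regress the expert's forecasts F on the cues X,
# then use the fitted model (rather than the expert) to predict Y.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))                                          # cues X_1..X_p
Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=n)   # actuals
F = X @ np.array([1.2, 0.2, -0.6]) + rng.normal(scale=0.8, size=n)   # noisy, mis-weighted expert

Xc = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xc, F, rcond=None)   # linear model of the expert
bootstrap_forecast = Xc @ beta

print("expert MSE:   ", float(np.mean((Y - F) ** 2)))
print("bootstrap MSE:", float(np.mean((Y - bootstrap_forecast) ** 2)))
```

The bootstrap typically beats the expert here because it strips out the noise in the expert's judgements while retaining (even mis-weighted) cue information: the classic cross-sectional result discussed on the next slide.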
The Bootstrapping Literature
Armstrong (2001): the bootstrap "provides more accurate forecasts than expert judgement"
But the evidence is primarily cross-sectional
Time-series evidence is mixed:
–Armstrong summarizes the early studies
–Fildes & Fitzgerald, Economica, 1983 (Balance of Payments forecasts)
–Fildes, JoF, 1991 (construction industry forecasts)
–Lawrence & O'Connor, Omega, 1996 (experimental evidence)
Why the discrepancies?
Time Series Bootstrapping
In cross-sectional studies
–Cues are constrained, i.e. experts have cue information + priors
–Priors may well contain no information (e.g. the 'Linda' problem of Tversky and Kahneman)
–Typical settings: examination records, credit score cards
In time-series studies
–Models are constrained to include only data-based cues
–Other cues are available from the environment: 'news', external info, internal organisational info
–Knowledge of 'unique' future events
–Cues used in the studies: Fildes & Fitzgerald – data history; Fildes – GDP and new orders
Bias and Inefficiencies
Bias
–If the expert forecast is biased, i.e. the mean error is non-zero, the bootstrap cannot be optimal (though it can still be better than an alternative)
–Evidence suggests time-series expert forecasts are often biased: optimism bias of analysts; bias in sales forecasts (Mathews & Diamantopoulos; Lawrence et al.)
Inefficiencies
–Where a cue variable (or a missing variable) is mis-weighted in the judgement, a bootstrap model can never be optimal ex post
Conclusion: a time-series bootstrap is unlikely to be optimal – there is potential to improve on it (see the bias-test sketch below)
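As a concrete illustration of the bias check, a minimal sketch assuming a one-sample t-test on the forecast errors (the slides do not name the test); the data are simulated:

```python
# Bias check: with error defined earlier as Y - F, an unbiased forecast
# has mean error zero, so a one-sample t-test is a natural first
# diagnostic. Optimism bias (over-forecasting) shows up as a
# *negative* mean of Y - F.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
errors = rng.normal(loc=-0.3, scale=1.0, size=500)  # simulated optimism bias

t_stat, p_value = stats.ttest_1samp(errors, popmean=0.0)
print(f"mean error = {errors.mean():.3f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```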
Company Evidence
Data (4 UK-based companies)
753 SKUs, monthly:
–Company A: major UK manufacturer of laundry, household-cleaning and personal-care products; SKUs x 22 months -> 3012 triplets
–Company B: major international pharmaceutical manufacturer; SKUs x 36 months -> 5428 triplets
–Company C: major international canned-food manufacturer; 296 SKUs x 20 months -> 2856 triplets
783 SKUs, weekly:
–Company D: major UK retailer (over … SKUs); … weeks -> … triplets
The EPSRC Research Project: data collected on Actuals, the Statistical System Forecast, and the Final (judgementally adjusted) Forecast
Checking for bias in the forecasts
Statistical issues
–For unbiasedness: the errors are heteroscedastic, with outliers
–Can firms be pooled?
Solutions
–Errors normalised by the standard deviation of the actuals and analysed by size of adjustment (a sketch of this normalisation follows)
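A minimal sketch of that normalisation, assuming per-SKU scaling and illustrative adjustment-size bands (the column names, band cut-points and data are all assumptions, not from the study):

```python
# Normalise each SKU's errors by the standard deviation of that SKU's
# actuals (to tame heteroscedasticity and allow pooling across firms),
# then summarise by the size and sign of the judgemental adjustment.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "sku": np.repeat([f"sku{i}" for i in range(5)], 24),
    "actual": rng.gamma(5.0, 20.0, size=120),
    "adjust": rng.normal(0.0, 15.0, size=120),
})
df["final"] = df["actual"] + rng.normal(0.0, 10.0, size=120)  # simulated final forecast
df["error"] = df["actual"] - df["final"]

# per-SKU normalisation by the std dev of the actuals
df["norm_error"] = df["error"] / df.groupby("sku")["actual"].transform("std")

# analyse by size (and sign) of adjustment; the bands are illustrative
bands = pd.cut(df["adjust"], bins=[-np.inf, -10, 0, 10, np.inf],
               labels=["large -", "small -", "small +", "large +"])
print(df.groupby(bands, observed=False)["norm_error"].agg(["mean", "std", "count"]))
```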
But are the forecasts inefficient?
The cues: past actuals, past errors and the adjustments
Final forecasts are biased
The error models (to overcome bias and inefficiencies):
–Efficiency = all available information is being used effectively, i.e. the error models have no explanatory power for the jth SKU in the ith company
–To estimate: normalise, pool across SKUs, remove outliers, test for seasonality
The result? The forecasts are inefficient, and different companies embody different inefficiencies (the R² are low)
–Positively adjusted forecasts are more inefficient: persistent optimism bias
(An illustrative efficiency regression is sketched below.)
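A minimal sketch of such an error model, assuming a linear regression of the error on the slide's cues (the study's exact specification, pooling and outlier rules are not given here); the data are simulated:

```python
# Efficiency test: regress the (normalised) forecast error on the cues
# named on the slide - the adjustment, a past actual and a past error.
# Under efficiency the regression has no explanatory power
# (coefficients ~ 0, R^2 ~ 0).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
adjust = rng.normal(0.0, 1.0, size=n)
lag_actual = rng.normal(10.0, 2.0, size=n)
lag_error = rng.normal(0.0, 1.0, size=n)
# simulate inefficiency: positive adjustments inflate the error
error = 0.4 * adjust + rng.normal(0.0, 1.0, size=n)

X = sm.add_constant(np.column_stack([adjust, lag_actual, lag_error]))
fit = sm.OLS(error, X).fit()
print(fit.summary(xname=["const", "adjust", "lag_actual", "lag_error"]))
# A significant coefficient on 'adjust' (as simulated here) indicates
# the adjustments are mis-weighted, i.e. the final forecasts are inefficient.
```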
Can we model the error to ensure an efficient (improved) forecast?
The models, expressed as weights on the two information sources:
–Estimated model: Actual = b1·SysFor + b2·Adjust
–Final forecast: Actual = 1·(SysFor) + 1·Adjust
–The 50/50 model (Blattberg & Hoch): Actual = 1·SysFor + 0.5·Adjust, i.e. an equal-weight average of the system forecast and the manager's final forecast
We can then use these models to predict the actual and compare with the final forecast
NB: a standard bootstrap that does not incorporate the information in 'Adjust' cannot perform well, given the efficiency evidence.
Weighting the Information Sources
–Note how close the estimated weights are to (1, 1), the final forecast
–Note how close the estimated weights are to (1, 0.5), the 50/50 model + manager combination
–But there are major mis-weightings (a sketch of the weight estimation follows)
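A minimal sketch of estimating those weights, assuming least squares on simulated data (not the companies' data):

```python
# Regress the actual on the system forecast and the adjustment, then
# compare the fitted weights (b1, b2) with the final forecast's
# implicit weights (1, 1) and the Blattberg-Hoch 50/50 weights (1, 0.5).
import numpy as np

rng = np.random.default_rng(4)
n = 500
sysfor = rng.gamma(5.0, 20.0, size=n)
adjust = rng.normal(0.0, 15.0, size=n)
# simulate an over-optimistic adjustment: only half of it is signal
actual = sysfor + 0.5 * adjust + rng.normal(0.0, 10.0, size=n)

X = np.column_stack([sysfor, adjust])
(b1, b2), *_ = np.linalg.lstsq(X, actual, rcond=None)
print(f"estimated weights: ({b1:.2f}, {b2:.2f})")
print("final forecast   : (1.00, 1.00)")
print("50/50 model      : (1.00, 0.50)")
```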
Comparative Results
Overall gains; major gains with some companies (particularly the retailer)
To test: split the sample – results are reported on the test sample
Accuracy measures: trimmed MAPE & MdAPE, plus a ranking of these measures for each company (a sketch of the measures follows)
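A minimal sketch of the two accuracy measures named above; the trimming fraction (5% in each tail) and the simulated data are assumptions:

```python
# Trimmed MAPE (mean absolute percentage error after discarding the
# extreme tails) and MdAPE (median absolute percentage error).
import numpy as np

def ape(actual, forecast):
    """Absolute percentage errors, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.abs(actual - forecast) / np.abs(actual)

def trimmed_mape(actual, forecast, trim=0.05):
    """MAPE after trimming the `trim` fraction of APEs from each tail."""
    a = np.sort(ape(actual, forecast))
    k = int(len(a) * trim)
    return a[k:len(a) - k].mean() if k else a.mean()

def mdape(actual, forecast):
    """Median absolute percentage error."""
    return np.median(ape(actual, forecast))

rng = np.random.default_rng(5)
actual = rng.gamma(5.0, 20.0, size=1000)
final = actual * rng.lognormal(0.05, 0.2, size=1000)   # over-forecasts on average
model = actual * rng.lognormal(0.0, 0.2, size=1000)    # unbiased model forecasts
for name, f in [("final forecast", final), ("adjustment model", model)]:
    print(f"{name}: trimmed MAPE {trimmed_mape(actual, f):.1f}%, "
          f"MdAPE {mdape(actual, f):.1f}%")
```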
The Results
–Consistency over the estimation and validation samples
–Optimism bias in the 'final forecast' ensures the standard bootstrap is inadequate for positive adjustments
–Effective use of negative information implies the Blattberg-Hoch model fails
–The optimal bootstrap is consistently effective
–The final forecast is 'good' for manufacturers and for negative information
–Different companies have different propensities for gain
Overall, the adjustment models perform well: substantial improvements are possible, with accuracy gains much larger (as high as 20%) than those shown in statistical selection comparisons (M3)
Does multicollinearity affect the interpretation of the weights? (A sketch of a simple check follows.)
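A minimal sketch of that multicollinearity check, using a variance inflation factor on simulated data; the thresholds quoted are the usual rules of thumb, not from the slides:

```python
# If SysFor and Adjust are highly correlated, the estimated weights
# (b1, b2) become hard to interpret separately. With two regressors,
# VIF = 1 / (1 - r^2) where r is their correlation.
import numpy as np

rng = np.random.default_rng(6)
n = 500
sysfor = rng.gamma(5.0, 20.0, size=n)
adjust = -0.2 * sysfor + rng.normal(0.0, 10.0, size=n)  # adjustment correlated with SysFor

r = np.corrcoef(sysfor, adjust)[0, 1]
vif = 1.0 / (1.0 - r ** 2)
print(f"corr(SysFor, Adjust) = {r:.2f}, VIF = {vif:.2f}")
# VIF near 1 suggests the weights are separately identified;
# VIF above ~5-10 would make the (b1, b2) interpretation fragile.
```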
Conclusions
Standard bootstrap models are not a panacea
–Need to eliminate likely biases and inconsistencies
–Cue information is not readily available (or is even non-existent) to model
Mis-weighting of information is common
–Different companies and different processes lead to different mis-weightings
–For the retailer, the mis-weighting is so extreme as to raise questions about motivation
–Asymmetric loss: a confusion between the forecast and the inventory decision
Major accuracy improvements are possible
–But the implementation issues are complex: how do you change the forecasting process to improve the cue weights?
See the Feature talk tomorrow!