Verification for forecasts of rare events. Chris Ferro, Mathematics Research Institute, University of Exeter, UK. 15 mins + 5 mins questions. 11th International Meeting on Statistical Climatology, Edinburgh, 14 July 2010.
Forecast verification Describe and understand forecast performance. We consider challenges raised by deterministic forecasts for the occurrence of large, rare values. Read more next year in the expanded second edition of ‘the book’!
Daily rainfall in mid-Wales
Observations: radar measurements.
Forecasts: direct output from the old 12 km mesoscale version of the Met Office Unified Model.
Period: 1 Jan 05 – 11 Nov 06.
Data courtesy of Marion Mittermaier.
Threshold exceedances

                 Observed yes   Observed no
Forecast yes          13              9
Forecast no           27            600

Hit rate = 13 / (13 + 27) = 0.325

Threshold exceedances

                 Observed yes   Observed no
Forecast yes           7              3
Forecast no           12            627

Hit rate = 7 / (7 + 12) = 0.368
Threshold exceedances

                 Observed yes   Observed no
Forecast yes           3              3
Forecast no            5            638

Hit rate = 3 / (3 + 5) = 0.375
Threshold exceedances

                 Observed yes   Observed no
Forecast yes           1              2
Forecast no            4            642

Hit rate = 1 / (1 + 4) = 0.2

Threshold exceedances

                 Observed yes   Observed no
Forecast yes           0              1
Forecast no            3            645

Hit rate = 0 / (0 + 3) = 0

Threshold exceedances

                 Observed yes   Observed no
Forecast yes           0              1
Forecast no            0            648

Hit rate = 0 / (0 + 0) = NaN
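The hit rates on the six slides above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `hit_rate` is hypothetical, and the pairs of hits and misses are read directly from the hit-rate formulas printed on the slides.

```python
import math

def hit_rate(a, c):
    """Hit rate a / (a + c); returns NaN when no events were observed."""
    observed = a + c
    return a / observed if observed > 0 else math.nan

# (hits a, misses c), taken from the hit-rate formulas on the six slides
tables = [(13, 27), (7, 12), (3, 5), (1, 4), (0, 3), (0, 0)]
rates = [hit_rate(a, c) for a, c in tables]

for (a, c), h in zip(tables, rates):
    print(f"a = {a:2d}, c = {c:2d}, hit rate = {h:.3f}")
```

Returning NaN for the last table mirrors the slide: with no observed events the hit rate is undefined rather than zero.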
Hit rate decreases to 0; sampling variation increases. [Figure: hit rate with 95% confidence intervals.]
Proportion correct tends to 1; sampling variation decreases. [Figure: proportion correct with 95% confidence intervals.] Since (b + c) / n ≤ 2p, we have (a + d) / n ≥ 1 – 2p, so the constraint reduces absolute variation. Relative variation increases.
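As a quick numerical check of the bound, the sketch below computes the proportion correct and the lower bound 1 – 2p for the three threshold-exceedance tables whose four counts are all printed on the earlier slides. The function names are hypothetical. The bound relies on (b + c) / n ≤ 2p, which holds, for example, whenever events are forecast no more often than they are observed.

```python
def proportion_correct(a, b, c, d):
    """Fraction of cases where forecast and observation agree: (a + d) / n."""
    return (a + d) / (a + b + c + d)

def pc_lower_bound(a, b, c, d):
    """Lower bound 1 - 2p, where p = (a + c) / n is the base rate."""
    n = a + b + c + d
    return 1 - 2 * (a + c) / n

# (a, b, c, d) = (hits, false alarms, misses, correct rejections)
tables = [(13, 9, 27, 600), (7, 3, 12, 627), (1, 2, 4, 642)]
pcs = [proportion_correct(*t) for t in tables]
bounds = [pc_lower_bound(*t) for t in tables]

for t, pc, lb in zip(tables, pcs, bounds):
    print(f"{t}: proportion correct = {pc:.4f}, bound = {lb:.4f}")
```

Both columns move towards 1 as the event becomes rarer, which is why a high proportion correct says little about skill for rare events.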
Lessons
As we move to rarer events:
- forecast performance degenerates,
- sampling variation (uncertainty) can increase,
- some aspects of performance are hard to maintain, e.g. hit rate,
- other aspects of performance are easy to maintain, e.g. proportion correct.
Solutions
- Verification measures that do not degenerate, e.g. the Extreme Dependency Score, which measures the decay rate.
- Reduce sampling variation by:
  - imposing a parametric form on how the entries in the contingency table change with the threshold,
  - estimating the parameters with moderate events,
  - using the fitted model to extrapolate to rare events.
Ferro (2007), Stephenson et al. (2008), Ferro and Stephenson (2010)
Table of frequencies

                 Observed yes   Observed no   Total
Forecast yes          a              b        a + b
Forecast no           c              d        c + d
Total               a + c          b + d        n

Define the base rate to be p = (a + c) / n.
Table of relative frequencies

                 Observed yes   Observed no   Total
Forecast yes        a / n          b / n        q
Forecast no         c / n          d / n      1 – q
Total                 p            1 – p        1

There is no theory for how q changes as p → 0, so recalibrate the forecasts to force q = p and model the bias, q / p, separately.
Recalibration
Use unequal thresholds to equalise the numbers of observed and forecasted events, removing bias.

                 Observed yes   Observed no
Forecast yes           9             10
Forecast no           10            620
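One simple way to implement the recalibration is to count the observed events above the chosen observation threshold and then flag the same number of largest forecasts as forecast events. The sketch below does this on synthetic data; the function name, the gamma-distributed data and the 0.97 quantile threshold are all assumptions made for illustration and are not the mid-Wales data.

```python
import numpy as np

def recalibrated_table(obs, fcst, obs_threshold):
    """Contingency table (a, b, c, d) with as many forecast events as observed events."""
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    obs_event = obs > obs_threshold
    k = int(obs_event.sum())  # number of observed events

    # Flag the k largest forecasts as events, which is equivalent to
    # thresholding the forecasts at their own k-th largest value.
    fcst_event = np.zeros(obs.size, dtype=bool)
    if k > 0:
        fcst_event[np.argsort(fcst, kind="stable")[-k:]] = True

    a = int(np.sum(fcst_event & obs_event))    # hits
    b = int(np.sum(fcst_event & ~obs_event))   # false alarms
    c = int(np.sum(~fcst_event & obs_event))   # misses
    d = int(np.sum(~fcst_event & ~obs_event))  # correct rejections
    return a, b, c, d

# Synthetic, deliberately biased forecasts (illustrative only)
rng = np.random.default_rng(0)
obs = rng.gamma(shape=0.5, scale=4.0, size=649)
fcst = 1.5 * obs + rng.gamma(shape=0.5, scale=2.0, size=649)

a, b, c, d = recalibrated_table(obs, fcst, obs_threshold=np.quantile(obs, 0.97))
print(a, b, c, d)
```

After recalibration the numbers of false alarms and misses are equal (b = c), as in the table on the slide.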
Parametric model for the table

                 Observed yes   Observed no   Total
Forecast yes        a / n            •          p
Forecast no           •              •        1 – p
Total                 p            1 – p        1

For a wide class of (regularly varying) probability distributions, a / n ~ α p^β as p → 0, where α > 0 and β ≥ 1. For random forecasts, α = 1 and β = 2. Ledford and Tawn (1996), Heffernan (2000)
Parametric model for the table

                 Observed yes   Observed no   Total
Forecast yes        α p^β            •          p
Forecast no           •              •        1 – p
Total                 p            1 – p        1

Given estimates of α and β, we can derive the behaviour of verification measures for small p, e.g. Hit rate ~ α p^β / p = α p^(β – 1).
Check model adequacy
If the model fits, then log(a / n) ~ log α + β log p for small p, and the graph of log(a / n) against log p will approximate a straight line for small enough p.
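The straight-line check can be sketched as follows: for each base rate p = k / n, recalibrate by taking the k largest observations and the k largest forecasts as events, count the joint exceedances a, and look at log(a / n) against log p. The function name and the synthetic Gaussian data are assumptions made for illustration, and the least-squares line is only a rough visual aid, not the maximum-likelihood fit described on the next slide.

```python
import numpy as np

def joint_exceedance_curve(obs, fcst, ks):
    """Return base rates p = k/n and joint exceedance frequencies a/n for each k in ks."""
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    n = obs.size
    # Rank 0 is the largest value in each series.
    obs_rank = np.argsort(np.argsort(-obs))
    fcst_rank = np.argsort(np.argsort(-fcst))
    # A case is a hit at level k when both series are among their k largest values.
    worst_rank = np.maximum(obs_rank, fcst_rank)
    p = np.array([k / n for k in ks])
    a_over_n = np.array([np.sum(worst_rank < k) / n for k in ks])
    return p, a_over_n

# Synthetic correlated forecasts and observations (illustrative only)
rng = np.random.default_rng(1)
obs = rng.normal(size=2000)
fcst = 0.8 * obs + 0.6 * rng.normal(size=2000)

ks = np.arange(5, 271, 5)  # base rates up to about 0.135
p, a_over_n = joint_exceedance_curve(obs, fcst, ks)

keep = a_over_n > 0  # log is undefined where there are no hits
slope, intercept = np.polyfit(np.log(p[keep]), np.log(a_over_n[keep]), 1)
beta_hat, alpha_hat = slope, np.exp(intercept)
print(f"rough slope (beta) = {beta_hat:.2f}, rough alpha = {alpha_hat:.2f}")
```

Plotting `np.log(a_over_n)` against `np.log(p)` and checking for approximate linearity at the left-hand end is the diagnostic the slide describes.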
Parameter estimation
If the model holds for p < p_0, then the observations and forecasts can be transformed into a sample from a distribution for which Pr(Z > z) ≈ α exp(–βz) for z > –log p_0. Choose p_0 and then estimate α and β by maximum likelihood using those transformed data that exceed –log p_0.
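A minimal sketch of this estimation step is given below. It assumes one particular transformation that is not spelled out on the slide: each series is converted to empirical exceedance probabilities and Z is minus the log of the larger of the two, so that Z > z exactly when both series exceed their upper e^(–z) quantile. Under Pr(Z > z) = α exp(–βz) for z > u = –log p_0, the maximum-likelihood estimates have closed forms: β is the number of exceedances divided by the sum of the excesses over u, and α is the exceedance fraction multiplied by exp(βu). The function name is hypothetical and the constraint β ≥ 1 is not enforced.

```python
import numpy as np

def fit_tail_model(obs, fcst, p0):
    """Estimate (alpha, beta) in Pr(Z > z) = alpha * exp(-beta * z) for z > -log(p0)."""
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    n = obs.size

    # Empirical exceedance probabilities in (0, 1): the largest value gets 1 / (n + 1).
    obs_exc = (n - np.argsort(np.argsort(obs))) / (n + 1)
    fcst_exc = (n - np.argsort(np.argsort(fcst))) / (n + 1)

    # Assumed transformation: Z is large only when both series are extreme.
    z = -np.log(np.maximum(obs_exc, fcst_exc))

    u = -np.log(p0)
    excess = z[z > u] - u
    k = excess.size
    if k == 0:
        raise ValueError("no transformed data exceed -log(p0); choose a larger p0")

    beta = k / excess.sum()             # exponential-rate MLE from the excesses
    alpha = (k / n) * np.exp(beta * u)  # matches the exceedance fraction at u
    return alpha, beta

# Synthetic correlated data (illustrative only)
rng = np.random.default_rng(3)
obs = rng.normal(size=2000)
fcst = 0.8 * obs + 0.6 * rng.normal(size=2000)
alpha_hat, beta_hat = fit_tail_model(obs, fcst, p0=0.135)
print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}")
```

Confidence intervals such as those quoted for the mid-Wales data could be obtained by repeating the fit on bootstrap resamples of the forecast-observation pairs.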
Daily rainfall in mid-Wales

                 Observed yes   Observed no   Total
Forecast yes     0.97 p^1.25         •          p
Forecast no           •              •        1 – p
Total                 p            1 – p        1

The model is a good fit for base rates p < 0.135 (not shown), with estimates α = 0.97 (0.71, 1.60) and β = 1.25 (1.11, 1.47), where the values in parentheses are 90% confidence intervals.
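With the fitted values α = 0.97 and β = 1.25, the whole recalibrated table follows from a / n = α p^β: the off-diagonal cells are each p – α p^β and the remaining cell is 1 – 2p + α p^β. The sketch below evaluates the resulting hit rate and proportion correct at a few base rates. The function names and the chosen base rates are hypothetical, and values of p far below those seen in the data are extrapolations from the model.

```python
def model_table(p, alpha=0.97, beta=1.25):
    """Relative frequencies (a/n, b/n, c/n, d/n) for recalibrated forecasts."""
    a = alpha * p ** beta
    b = c = p - a          # false alarms and misses are equal when q = p
    d = 1 - 2 * p + a
    return a, b, c, d

def model_hit_rate(p, alpha=0.97, beta=1.25):
    """Hit rate (a/n) / p = alpha * p**(beta - 1)."""
    return alpha * p ** (beta - 1)

def model_proportion_correct(p, alpha=0.97, beta=1.25):
    """Proportion correct a/n + d/n = 1 - 2p + 2 * alpha * p**beta."""
    a, _, _, d = model_table(p, alpha, beta)
    return a + d

for p in (0.1, 0.05, 0.01, 0.001):
    print(f"p = {p}: hit rate = {model_hit_rate(p):.3f}, "
          f"proportion correct = {model_proportion_correct(p):.4f}")
```

Because β > 1, the model hit rate decays to 0 as p → 0, while the proportion correct tends to 1, matching the behaviour seen in the raw tables.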
Standard (black) and model-based (red) estimates of the hit rate with bootstrap 90% confidence intervals
Standard (black) and model-based (red) estimates of proportion correct with bootstrap 90% confidence intervals
Summary Forecast performance degenerates for rare events (some aspects are hard to maintain, others are easy to maintain) and uncertainty can increase. Probability models based on extreme-value theory can help to estimate performance more precisely and to extrapolate performance to rarer events.
Discussion A model for how the bias changes with base rate could be included to estimate the performance of uncalibrated forecasts. Similar ideas can be used for other types of forecasts and observations, e.g. probabilistic forecasts, although different models are required. Other related issues include observation errors, location/timing errors, sensitivity to outliers, etc.
References
c.a.t.ferro@ex.ac.uk
Ferro CAT (2007) A probability model for verifying deterministic forecasts of extreme events. Weather and Forecasting, 22, 1089-1100.
Ferro CAT, Stephenson DB (2010) Improved verification measures for deterministic forecasts of rare, binary events. In preparation.
Heffernan JE (2000) A directory of coefficients of tail dependence. Extremes, 3, 279-290.
Ledford AW, Tawn JA (1996) Statistics for near independence in multivariate extreme values. Biometrika, 83, 169-187.
Stephenson DB, Casati B, Ferro CAT, Wilson CA (2008) The extreme dependency score: a non-vanishing verification score for deterministic forecasts of rare events. Meteorological Applications, 15, 41-50.