1
Ensemble size and the performance of weather forecasts
Chris Ferro
Mathematics Research Institute, University of Exeter, UK
73rd Annual Meeting of the Institute of Mathematical Statistics, Gothenburg, 12 August 2010
2
Outline
Forecast verification
Evaluating probabilistic forecasts
The effects of changing ensemble size
3
Forecast verification
Describe and understand forecast performance. Monitor performance, improve predictions, build confidence, inform use, reward success... Different types of prediction may require different methods, e.g. point/spatial, forecast/warning, deterministic/probabilistic.
4
Probabilistic forecasts
Deterministic numerical weather prediction models project an initial state forward in time according to the laws of hydro- and thermo-dynamics. A set (or ensemble) of projections, representing future model states that fit with observations, is obtained by perturbing the initial state. Such ensemble members are often used to form a probabilistic prediction. A simple example would be the proportion of members that predict an event.
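A minimal sketch of that last example: turning an ensemble into an event probability by counting members. The rainfall values, the 50 mm threshold and the function name `event_probability` are illustrative assumptions, not the Jakarta data.

```python
import numpy as np

def event_probability(members, threshold):
    """Proportion of ensemble members exceeding `threshold`, per forecast time."""
    members = np.asarray(members, dtype=float)
    # Each row is one forecast time, each column one ensemble member.
    return (members > threshold).mean(axis=-1)

# Hypothetical 9-member rainfall ensembles (mm) for two forecast times.
ensemble = np.array([
    [12, 48, 55, 61, 30, 52, 70, 44, 58],
    [5, 10, 22, 51, 8, 15, 30, 12, 9],
])
probs = event_probability(ensemble, threshold=50.0)
print(probs)  # one probability per forecast time
```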
5
October rainfall in Jakarta
9-member ensemble (+) and observations (|)
6
Proper scoring rules
A scoring rule is a function, $s(f,v)$, of the predictive density or mass function, $f$, and the verification, $v$.
A scoring rule is proper if the forecaster optimises their expected score by forecasting their belief, $g$; i.e. $E[s(g,V)] \le E[s(f,V)]$ for all $f$, where $V \sim g$.
Logarithmic score: $s(f,v) = -\log f(v)$
Brier score: $s(f,v) = [f(1) - v]^2$ when $v = 0$ or $1$
Brier (1950), Good (1952), Gneiting & Raftery (2007)
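Propriety can be checked numerically for a binary event. The sketch below assumes a hypothetical belief g(1) = 0.3 and searches a grid of candidate forecasts for the one with the smallest expected score; the function names are illustrative, and both scores are treated as penalties to be minimised, as the inequality above implies.

```python
import numpy as np

def brier_score(f1, v):
    """Brier score for a binary event; f1 is the forecast probability of v = 1."""
    return (f1 - v) ** 2

def log_score(f1, v):
    """Logarithmic score -log f(v) for a binary event."""
    return -np.log(f1 if v == 1 else 1.0 - f1)

def expected_score(score, f1, g1):
    """Expected score of forecast f1 when V ~ Bernoulli(g1)."""
    return g1 * score(f1, 1) + (1.0 - g1) * score(f1, 0)

g1 = 0.3  # hypothetical forecaster belief that the event occurs
candidates = np.linspace(0.01, 0.99, 99)  # grid of possible forecasts

# For a proper score, the minimiser should sit at the belief itself.
best_brier = candidates[np.argmin([expected_score(brier_score, f, g1) for f in candidates])]
best_log = candidates[np.argmin([expected_score(log_score, f, g1) for f in candidates])]
print(best_brier, best_log)
```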
7
Jakarta rainfall
Consider forecasting the event ‘rainfall > 50 mm’. Let $v_t = 1$ if the event is observed at time $t$, $v_t = 0$ otherwise, and let $f_t$ be the proportion of the $m = 9$ ensemble members that predict the event at time $t$.
Brier score: $B_m = \frac{1}{n}\sum_{t=1}^{n} (f_t - v_t)^2$, where $n$ is the number of forecast times.
For our Jakarta rainfall forecasts, $B_m = 0.22 \pm 0.05$.
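A sketch of the calculation, assuming the Brier score is the mean squared difference between forecast probabilities and binary outcomes. The member counts and outcomes are made up for illustration. The standard error shown treats forecast times as independent, which is an assumption; the slide does not say how its ± 0.05 was obtained.

```python
import numpy as np

def brier_score(f, v):
    """Mean squared difference between forecast probabilities f and outcomes v."""
    f = np.asarray(f, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.mean((f - v) ** 2)

def brier_standard_error(f, v):
    """Naive standard error of the Brier score, assuming independent forecast times."""
    sq = (np.asarray(f, dtype=float) - np.asarray(v, dtype=float)) ** 2
    return sq.std(ddof=1) / np.sqrt(sq.size)

# Hypothetical data: number of the 9 members predicting the event, and outcomes.
counts = np.array([5, 1, 9, 0, 3])
f = counts / 9
v = np.array([1, 0, 1, 0, 1])

b = brier_score(f, v)
se = brier_standard_error(f, v)
print(b, se)
```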
8
Reliability and resolution
Proper scores can be decomposed into measures of two key attributes of forecast performance. A forecast system is reliable if, for all forecasts f, the long-run frequency distribution of verifications when the forecast was f is also f, i.e. π(v | f) = f(v). A forecast system has resolution if the long-run conditional frequency distribution of verifications varies with the forecast, i.e. π(v | f) varies with f.
9
Decomposing proper scores
Proper scores decompose into three terms, e.g. for the Brier score,
$B_m = \mathrm{REL} - \mathrm{RES} + \mathrm{UNC}$ (reliability, resolution, uncertainty).
For Jakarta, REL = 0.06, RES = 0.05, UNC = 0.21.
Brier skill score: BSS = (RES – REL) / UNC. Since RES < REL, these forecasts have less ‘skill’ than the climatological forecast.
Logarithmic score = the Kullback-Leibler divergence between the forecast distributions and the conditional climatological distributions – the Jensen-Shannon distance between the conditional climatological distributions + the entropy of the climatological distribution.
Murphy (1973), Bröcker (2009)
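A sketch of the Brier score decomposition, assuming the standard partition in which forecasts are grouped by their distinct issued values (Murphy 1973). The function name and sample data are illustrative. The last lines redo the Jakarta arithmetic from the quoted components.

```python
import numpy as np

def brier_decomposition(f, v):
    """Return (REL, RES, UNC), grouping forecasts by their distinct values."""
    f = np.asarray(f, dtype=float)
    v = np.asarray(v, dtype=float)
    n = f.size
    vbar = v.mean()  # climatological event frequency
    rel = res = 0.0
    for fk in np.unique(f):
        mask = f == fk
        nk = mask.sum()
        vk = v[mask].mean()  # observed frequency when the forecast was fk
        rel += nk * (fk - vk) ** 2
        res += nk * (vk - vbar) ** 2
    return rel / n, res / n, vbar * (1.0 - vbar)

# Hypothetical forecasts (member counts out of 9) and outcomes.
f = np.array([0, 0, 3, 3, 9, 9, 3, 0]) / 9
v = np.array([0, 0, 1, 0, 1, 1, 0, 1])
rel, res, unc = brier_decomposition(f, v)
brier = np.mean((f - v) ** 2)  # equals rel - res + unc

# Jakarta values quoted on the slide.
jak_rel, jak_res, jak_unc = 0.06, 0.05, 0.21
jak_brier = jak_rel - jak_res + jak_unc   # about 0.22
jak_bss = (jak_res - jak_rel) / jak_unc   # about -0.048, i.e. below climatology
```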
10
Ensemble size effects
How should we balance ensemble size and model complexity given limited computing resource?
How should we account for differing ensemble sizes when we compare forecasting systems?
We can obtain expressions for the effect of ensemble size on some proper scoring rules. Subsampling techniques can also be used.
11
Ensemble size: Brier score
Assuming stationarity, the expected Brier score is
$E(B_m) = E\left[\left(\frac{N}{m} - V\right)^2\right]$,
where $V = 1$ (or $V = 0$) if the event occurs (or not), $N = F_1 + \dots + F_m$, and $F_i = 1$ (or $F_i = 0$) if the event is forecasted (or not) by the $i$-th ensemble member.
Assume $(F_1, \dots, F_m)$ are 2nd-order exchangeable. (Without some assumption we cannot predict how increasing ensemble size will affect performance.)
12
Ensemble size: Brier score
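It is straightforward to show that
$E(N^2) = m\,E(F_1) + m(m-1)\,E(F_1 F_2)$
and therefore, for any ensemble size $M$,
$E(B_M) = E(B_m) + \frac{m - M}{M m}\,E[F_1(1 - F_2)]$.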
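The algebra behind this step can be sketched as follows. It assumes N = F_1 + ... + F_m with binary F_i and V, second-order exchangeability of the members, and that E(F_i V) is the same for every member. The symbol B_∞ for the infinite-ensemble limit is introduced here for illustration. This is a reconstruction from the stated assumptions, not a copy of the slide's equations.

```latex
\begin{align*}
E(B_m) &= E\!\left[\left(\frac{N}{m} - V\right)^{2}\right]
        = \frac{E(N^2)}{m^2} - \frac{2\,E(NV)}{m} + E(V^2), \\
E(N^2) &= \sum_{i} E(F_i^2) + \sum_{i \ne j} E(F_i F_j)
        = m\,E(F_1) + m(m-1)\,E(F_1 F_2)
        \quad (F_i^2 = F_i), \\
E(NV)  &= m\,E(F_1 V), \qquad E(V^2) = E(V), \\
E(B_m) &= \underbrace{E(F_1 F_2) - 2\,E(F_1 V) + E(V)}_{E(B_\infty)}
          + \frac{1}{m}\,E[F_1(1 - F_2)], \\
E(B_M) - E(B_m) &= \left(\frac{1}{M} - \frac{1}{m}\right) E[F_1(1 - F_2)]
        = \frac{m - M}{M m}\,E[F_1(1 - F_2)].
\end{align*}
```

Since E[F_1(1 - F_2)] is non-negative, the expected score cannot increase as M grows, and the gain from each extra member shrinks like 1/M.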
13
Ensemble size: Brier score
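Given $m$ members, an unbiased estimator for $E(B_M)$ is
$\hat{B}_M = B_m + \frac{m - M}{M(m - 1)} \cdot \frac{1}{n}\sum_{t=1}^{n} f_t(1 - f_t)$,
where $f_t$ is the proportion of the $m$ members that forecast the event at time $t$ and $n$ is the number of forecast times.
When $M \le m$ this is the average over subsamples of size $M$. A closed-form expression is available for its asymptotic distribution.
Increasing ensemble size improves the expected Brier score, through both reliability and resolution.
The effect is smaller for sharper forecasts or larger $m$, and depends on the forecasts only: increasing $m$ just improves the precision of the model probability estimates.
Ferro (2007), Ferro et al. (2008)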
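The subsampling claim can be checked by brute force. The sketch below assumes the estimator takes the form B_m + (m - M) / (M (m - 1)) times the mean of f_t (1 - f_t), which follows from the exchangeability assumption and should be checked against Ferro (2007). The random 0/1 member forecasts and all function names are illustrative.

```python
from itertools import combinations

import numpy as np

def brier_from_members(members, v):
    """Brier score when each forecast is the proportion of members predicting the event."""
    f = members.mean(axis=1)
    return np.mean((f - v) ** 2)

def estimated_brier(members, v, M):
    """Estimate the expected Brier score for ensemble size M from an m-member ensemble."""
    n, m = members.shape
    f = members.mean(axis=1)
    adjustment = (m - M) / (M * (m - 1)) * np.mean(f * (1 - f))
    return np.mean((f - v) ** 2) + adjustment

def subsampled_brier(members, v, M):
    """Average Brier score over every size-M subset of the m members."""
    m = members.shape[1]
    scores = [brier_from_members(members[:, list(idx)], v)
              for idx in combinations(range(m), M)]
    return np.mean(scores)

rng = np.random.default_rng(0)
members = rng.integers(0, 2, size=(20, 5))  # hypothetical 0/1 forecasts, n = 20, m = 5
v = rng.integers(0, 2, size=20)             # hypothetical binary outcomes

est = estimated_brier(members, v, M=3)
sub = subsampled_brier(members, v, M=3)
print(est, sub)  # the two agree up to rounding error
```

The same function extrapolates to M > m, where no subsampling average exists; that is where the exchangeability assumption does the work.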
14
Jakarta rainfall
Subsampled (o) and estimated (-) Brier scores
15
Summary
Increasing ensemble size can improve both the reliability and resolution of forecasts. The effect is smaller for larger or sharper ensembles.
The effect on some verification measures of changing the ensemble size can be estimated without bias if members are exchangeable.
The results presented here extend to the discrete and continuous ranked probability scores.
16
Directions
Currently developing expressions for the joint effects of ensemble size and model grid size. Increasing ensemble size improves precision of the estimate of the model distribution; changing model complexity changes the model distribution.
Other verification areas of strong current interest: rare-event forecasts, climate predictions, spatial and object-oriented verification, observation errors.
17
References c.a.t.ferro@ex.ac.uk
Brier GW (1950) Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1–3.
Bröcker J (2009) Reliability, sufficiency, and the decomposition of proper scores. Quarterly J. Royal Meteorological Society, 135, 1512–1519.
Ferro CAT (2007) Comparing probabilistic forecasting systems with the Brier score. Weather and Forecasting, 22, 1076–1088.
Ferro CAT, Richardson DS, Weigel AP (2008) On the effect of ensemble size on the discrete and continuous ranked probability scores. Meteorological Applications, 15, 19–24.
Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J. American Statistical Association, 102, 359–378.
Good IJ (1952) Rational decisions. J. Royal Statistical Society, 14, 107–114.
Jolliffe IT, Stephenson DB (2003) Forecast Verification: A Practitioner’s Guide in the Atmospheric Sciences. Wiley.
Murphy AH (1973) A new vector partition of the probability score. J. Applied Meteorology, 12, 595–600.
18
Jakarta rainfall
Averages of subsampled Brier score components