1
How good (or bad) are seasonal forecasts?
Simon J. Mason International Research Institute for Climate and Society The Earth Institute of Columbia University Masters in Climate and Society New York, NY, U.S.A., 28 March 2012
2
Forecast verification
If the forecasts are correct 60% of the time, is that good? What makes a forecast “good” (or even “correct”)? How can we measure goodness? Are seasonal forecasts “good”?
3
Proportion Correct How many times was the forecast correct?
4
Finley’s Tornado Forecasts
A set of tornado forecasts for the U.S. Midwest published in 1884.
5
No Tornado Forecasts A better score can be achieved by issuing no forecasts of tornadoes!
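Proportion correct, and why "no tornado" beats Finley, can be sketched in Python using the contingency-table counts commonly quoted for Finley's 1884 experiment (28 hits, 72 false alarms, 23 misses, 2680 correct rejections):

```python
# Contingency-table counts commonly quoted for Finley's 1884 tornado forecasts
hits, false_alarms, misses, correct_rejections = 28, 72, 23, 2680
n = hits + false_alarms + misses + correct_rejections  # 2803 forecasts

# Proportion correct for Finley's forecasts
pc_finley = (hits + correct_rejections) / n            # about 0.97

# Proportion correct for the "never forecast a tornado" strategy:
# every non-tornado case (false alarms + correct rejections) is now correct
pc_never = (false_alarms + correct_rejections) / n     # about 0.98

print(round(pc_finley, 3), round(pc_never, 3))
```

Because tornadoes are so rare, always forecasting "no tornado" scores higher than Finley did.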
6
What makes a “good” forecast?
Forecast: Laos will not top the 2012 Olympic medal table. Verification: Who needs to ask? Forecasts are most useful when there is some inherent uncertainty.
7
Uncertainty “In Hertford, Hereford, and Hampshire hurricanes hardly ever happen.” My Fair Lady Measuring uncertainty: c(1 – c), where c is the climatological probability of the event
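The measure c(1 – c) is largest when the event is as likely as not, and vanishes for near-certain (or near-impossible) events; a quick sketch:

```python
def uncertainty(c):
    """Uncertainty term c(1 - c) for climatological probability c."""
    return c * (1 - c)

# Hurricanes in Hertford "hardly ever happen": little inherent uncertainty
print(uncertainty(0.01))
# An event as likely as not: uncertainty is at its maximum
print(uncertainty(0.5))
```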
8
Seasonal temperature forecasts
The convention is to issue temperature forecasts using categories defined with a 1971–2000 climatology. [Maps of seasonal temperature forecasts for 2002–2009.]
9
What makes a “good” forecast?
Forecast: Rafael Nadal will do well in the 2012 Australian Open. Verification: Depends on what “well” means. (He lost in five sets in the final.) Forecasts cannot be verified if there is ambiguity.
10
Ambiguity “The British, he thought, must be gluttons for satire: even the weather forecast seemed to be some kind of spoof, predicting every possible combination of weather for the next twenty-four hours without actually committing itself to anything specific.” David John Lodge, Changing Places If there is ambiguity forecasts cannot be verified unequivocally, and so there can be dispute over how good the forecasts are. Forecasts must be specific.
11
Ambiguity What is the seasonal forecast for Rio de Janeiro? Does the forecast apply to: Individual stations? Regions, and if so what size? If rainfall is above-normal in Rio, and below-normal in the west of region IV so that the area-average is normal, is the forecast “good” (in the sense that the category with the highest probability verified)?
12
What makes a “good” forecast?
Forecast: The Netherlands will successfully defend the PDC World Cup of Darts in 2012. Verification: Who cares? (England won.) Forecasts are most useful when they are salient.
13
Salience Recent drought in East Africa
Did the MAM 2011 forecast provide a warning of drought? Does the forecast even say anything at all about drought?
14
What makes a “good” forecast?
Forecast: Mitt Romney will win the Republican presidential nomination and the 2012 election. Forecast: I don’t think so: I lied to influence your voting. Forecasts are consistent when they match what the forecaster thinks.
15
Consistency GHACOF MAM forecasts [graph comparing forecasts with observations]
The failure to forecast the shift towards dry conditions may be partly because of hedging.
16
What makes a “good” forecast?
Forecast: The Giants will beat the Patriots in Super Bowl XLVI, 27 points to 23. Verification: The Giants won 21 to 17. Correctness is only one aspect of forecast quality.
17
Correctness, Precision, Accuracy
“By Jove, she’s got it! I think she’s got it.” My Fair Lady Correctness – did what the forecast said would happen actually happen? Accuracy – is the forecast close to the observation? Precision – what level of accuracy is being implied?
18
Correctness, Precision, Accuracy
Most seasonal forecasts avoid being overly precise (3, or maybe 5, categories). Sharpness measures whether the forecasts vary much from climatology. Accuracy and correctness are irrelevant concepts when forecasts are probabilistic.
19
Skill Is one set of forecasts better than another? A skill score is used to compare the quality of one forecast strategy with that of another set (the reference set). The skill score defines the percentage improvement over the reference forecast. Skill scores are relative measures of forecast quality. But better in what respect? We still need to define “good” …
20
Heidke Skill Score Against perpetual forecasts of no tornadoes, Finley’s forecasts are pretty awful (but note that -100% is not the minimum):
21
Heidke Skill Score … but against random guessing, Finley does pretty well:
22
What makes a “good” forecast?
Quality forecasts should correspond with what actually happens (includes correctness, accuracy, and some other attributes …) Value forecasts should be potentially useful (includes salience, timeliness, specificity) Consistency forecasts should indicate what the experts really think Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293.
23
Forecast verification
If the forecasts are correct 60% of the time, is that good? What makes a forecast “good”? How can we measure goodness? Are seasonal forecasts “good”?
24
Forecast verification
Deterministic and discrete: “It will rain tomorrow.” Deterministic and continuous: “There will be 10 mm of rain tomorrow.” Probabilistic and discrete: “There is a 50% chance of rain tomorrow.” Probabilistic and continuous: “There is a p% chance of more than k mm of rain tomorrow.”
25
Seasonal Forecast Formats, I
26
Forecast verification
Deterministic and discrete: “It will rain tomorrow.” Deterministic and continuous: “There will be 10 mm of rain tomorrow.” Probabilistic and discrete: “There is a 50% chance of rain tomorrow.” Probabilistic and continuous: “There is a p% chance of more than k mm of rain tomorrow.”
27
Error measures quantify the accuracy of a forecast.
28
Biases
Variance or amplitude bias: typically very small in statistically corrected forecasts if skill is low, because the forecasts stay close to the mean.
Mean or unconditional bias: always close to zero for cross-validated forecasts; indicates the ability to forecast shifts in climate for retroactive forecasts; slightly negative if the predictand data are positively skewed.
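As a sketch with purely illustrative anomaly values, the two biases can be computed as:

```python
from statistics import mean, pstdev

forecasts    = [0.2, 0.1, -0.1, 0.0, 0.1]   # hypothetical forecast anomalies
observations = [0.5, -0.4, -0.8, 0.3, 0.6]  # hypothetical observed anomalies

# Mean (unconditional) bias: average forecast minus average observation
mean_bias = mean(forecasts) - mean(observations)

# Variance (amplitude) bias: ratio of forecast to observed standard deviation;
# values well below 1 indicate forecasts hugging the climatological mean
amplitude_bias = pstdev(forecasts) / pstdev(observations)

print(round(mean_bias, 2), round(amplitude_bias, 2))
```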
29
Pearson’s correlation
Pearson’s correlation measures association (are increases and decreases in the forecasts associated with increases and decreases in the observations?). It does not measure accuracy. When squared, it tells us how much of the variance of the observations is correctly forecast.
30
Spearman’s correlation
Spearman’s correlation is Pearson’s correlation calculated on the ranks of the data. When squared, it tells us how much of the variance of the ranks of the observations we can correctly forecast; that interpretation is less obvious than Pearson’s, but Spearman’s correlation is much less sensitive to extremes.
31
Kendall’s tau Denominator: the total number of pairs. Numerator: the difference between the numbers of concordant and discordant pairs. Kendall’s correlation measures discrimination (do the forecasts increase and decrease as the observations increase and decrease?). It can be transformed to the probability that the forecasts successfully distinguish the wetter (or hotter) of two observations.
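The three measures of association can be sketched in plain Python (hypothetical data; ties are assumed absent for the rank-based measures):

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Linear association between forecasts and observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

def ranks(x):
    """Rank of each value, 1 = smallest (assumes no ties)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Pearson's correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """(concordant - discordant) pairs over total pairs (assumes no ties)."""
    pairs = list(combinations(range(len(x)), 2))
    score = sum(1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
                for i, j in pairs)
    return score / len(pairs)

forecasts    = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical seasonal totals
observations = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson(forecasts, observations),
      spearman(forecasts, observations),
      kendall(forecasts, observations))
```

Note how the outlying forecast of 10.0 affects Pearson's correlation but not the two rank-based measures.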
32
Seasonal Forecast Formats, II
33
Forecast verification
Deterministic and discrete: “It will rain tomorrow.” Deterministic and continuous: “There will be 10 mm of rain tomorrow.” Probabilistic and discrete: “There is a 50% chance of rain tomorrow.” Probabilistic and continuous: “There is a p% chance of more than k mm of rain tomorrow.”
34
Probability “errors” The Brier score is closely related to the mean squared error: i.e., the two equations are identical, except for the units.
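A minimal sketch of the Brier score as a mean squared error, with the observations coded as binary (1 = event occurred, 0 = it did not) and hypothetical probabilities:

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and
    binary outcomes (1 = event occurred, 0 = it did not)."""
    return sum((p - o) ** 2
               for p, o in zip(probabilities, outcomes)) / len(outcomes)

# Hypothetical probabilities of above-normal rainfall, and what happened
probs    = [0.4, 0.7, 0.2, 0.9, 0.5]
occurred = [0,   1,   0,   1,   1]
print(brier_score(probs, occurred))  # 0 is perfect, 1 is worst
```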
35
Reliability A consistency between the a priori stated probabilities of an event and the a posteriori observed relative frequencies of this event. Murphy, A. H., 1973: A new vector partition of the probability score. Journal of Applied Meteorology, 12, 595–600. Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293. If the proportion of times that the event occurs is the same as the prior stated probability for all values of the prior probability, the system is reliable (or well calibrated).
36
Measuring Reliability
Reliability Diagrams Calculate separately the relative frequencies of an event occurring for each forecast probability. Good reliability is indicated by a 45° diagonal.
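The calculation behind a reliability diagram can be sketched as follows: group the forecasts by their stated probability, then compare each group's observed relative frequency with that probability (illustrative data):

```python
from collections import defaultdict

# Hypothetical (probability, outcome) pairs; outcome 1 = event occurred
forecasts = [(0.2, 0), (0.2, 0), (0.2, 1), (0.5, 1), (0.5, 0),
             (0.8, 1), (0.8, 1), (0.8, 1), (0.8, 0)]

groups = defaultdict(list)
for prob, outcome in forecasts:
    groups[prob].append(outcome)

# For a reliable system each observed frequency matches the stated probability
for prob in sorted(groups):
    outcomes = groups[prob]
    observed = sum(outcomes) / len(outcomes)
    print(f"forecast {prob:.1f} -> observed {observed:.2f} ({len(outcomes)} cases)")
```

In practice the probabilities are binned and far larger samples are needed; with only a handful of cases per bin the observed frequencies are too noisy to judge reliability.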
37
Resolution Does the outcome increase and decrease in frequency/intensity if the forecast increases and decreases? If the median outcome is the same regardless of the forecasts then the forecasts contain no useful information – there is no resolution.
38
Attributes Diagrams The histograms show the sharpness.
The vertical line shows the observed climatology and indicates the forecast bias. The horizontal line shows no resolution. The diagonal lines show reliability (solid) and “skill” (dashed). The coloured line shows the reliability of the forecasts. The dashed line shows a smoothed fit.
39
Discrimination Does the forecast increase and decrease as the outcome increases and decreases? If the median forecast is the same regardless of the outcome then the forecasts contain no useful information – there is no discrimination.
40
Measuring discrimination
Discrimination is measured using a “two-alternative forced choice (2AFC) test”. A 2AFC test is like a “Who Wants to be a Millionaire” question after opting for a 50:50.
41
Two-Alternative Forced Choice Test
In which of these two Januaries did El Niño occur (Niño3.4 index >27°C)? What is the probability of getting the answer correct? That depends on whether we can believe the forecasts. Repeat for all possible pairs of forecasts. This test can be generalized to three categories (as used in seasonal forecasts) and to infinitely many categories (equivalent to Kendall’s tau).
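For continuous forecasts, the 2AFC probability can be estimated by counting, over all pairs of cases, how often the case with the larger forecast also had the larger observation; with no ties this equals (tau + 1) / 2, where tau is Kendall's correlation. A sketch with hypothetical data:

```python
from itertools import combinations

def two_afc(forecasts, observations):
    """Probability that the case with the larger forecast also has the
    larger observation, over all pairs of cases (assumes no ties)."""
    pairs = list(combinations(range(len(forecasts)), 2))
    correct = sum(
        (forecasts[i] - forecasts[j]) * (observations[i] - observations[j]) > 0
        for i, j in pairs)
    return correct / len(pairs)

f = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical forecasts
o = [2.0, 1.0, 4.0, 3.0, 5.0]  # hypothetical observations
print(two_afc(f, o))  # 0.5 would be no better than guessing
```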
42
Forecast Attributes Reliability did we correctly indicate the uncertainty in the forecast? But reliable forecasts may contain no useful information Resolution did we get more rainfall when we forecast more rainfall? But sample size is a problem Discrimination did we forecast more rainfall when we got more rainfall?
43
Forecast verification
If the forecasts are correct 60% of the time, is that good? What makes a forecast “good”? How can we measure goodness? Are seasonal forecasts “good”?
44
Skill of PRESAO forecasts
Have the PRESAO JAS (July–September) forecasts had good discrimination?
45
PRESAO JAS forecasts [graph comparing forecasts with observations]
The average forecasts fail to capture the shift towards dry conditions.
46
PRESAO JAS
47
Are seasonal forecasts skillful?
“Skill” is a vaguely defined forecast attribute. RCOF forecasts have some skill (resolution / discrimination), but reliability is poor. The IRI forecasts have been reliable for precipitation.
48
Additional reading
Barnston, A. G., S. Li, S. J. Mason, D. G. DeWitt, L. Goddard, and X. Gong, 2010: Verification of the first 11 years of IRI’s seasonal climate forecasts. Journal of Applied Meteorology and Climatology, 49, 493–520.
Mason, S. J., 2012: Seasonal and longer-range forecasts. In Jolliffe, I. T., and D. B. Stephenson (Eds), Forecast Verification: A Practitioner’s Guide in Atmospheric Science, Wiley, Chichester, 203–220.
Mason, S. J., 2008: Understanding forecast verification statistics. Meteorological Applications, 15, 31–40.
Mason, S. J., and A. P. Weigel, 2009: A generic forecast verification framework for administrative purposes. Monthly Weather Review, 137, 331–349.