How good (or bad) are seasonal forecasts?

How good (or bad) are seasonal forecasts? Simon J. Mason simon@iri.columbia.edu International Research Institute for Climate and Society The Earth Institute of Columbia University Masters in Climate and Society New York, NY, U.S.A., 28 March 2012

Forecast verification If the forecasts are correct 60% of the time, is that good? What makes a forecast “good” (or even “correct”)? How can we measure goodness? Are seasonal forecasts “good”?

Proportion Correct How many times was the forecast correct?

Finley’s Tornado Forecasts A set of tornado forecasts for the U.S. Midwest published in 1884.

No Tornado Forecasts A better score can be achieved by issuing no forecasts of tornadoes!
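To make the arithmetic concrete, here is a minimal sketch in Python. The contingency-table counts (28 hits, 72 false alarms, 23 misses, 2680 correct rejections) are the ones usually quoted in the verification literature for Finley's forecasts, not numbers taken from these slides.

```python
# Proportion correct (PC) for Finley's 1884 tornado forecasts,
# using the contingency-table counts usually quoted in the literature.
hits, false_alarms, misses, correct_rejections = 28, 72, 23, 2680
n = hits + false_alarms + misses + correct_rejections  # 2803 forecasts

pc_finley = (hits + correct_rejections) / n
print(f"Finley's proportion correct: {pc_finley:.1%}")  # ~96.6%

# The "never forecast a tornado" strategy: every tornado becomes a miss,
# and every non-tornado becomes a correct rejection.
pc_never = (false_alarms + correct_rejections) / n
print(f"'No tornado' proportion correct: {pc_never:.1%}")  # ~98.2%
```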

What makes a “good” forecast? Forecast: Laos will not top the 2012 Olympic medal table. Verification: Who needs to ask? Forecasts are most useful when there is some inherent uncertainty.

Uncertainty “In Hertford, Hereford, and Hampshire hurricanes hardly ever happen.” My Fair Lady Measuring uncertainty: c × (1 – c), where c is the climatological probability of the event. This uncertainty is largest when c = 0.5 and shrinks to zero as c approaches 0 or 1, so forecasts are potentially most informative for events that occur about half the time.

Seasonal temperature forecasts The convention is to issue temperature forecasts using categories defined with a 1971 – 2000 climatology. (Chart: forecasts for the years 2002–2009.)

What makes a “good” forecast? Forecast: Rafael Nadal will do well in the 2012 Australian Open. Verification: Depends on what “well” means. (He lost the final in five sets.) Forecasts cannot be verified if there is ambiguity.

Ambiguity “The British, he thought, must be gluttons for satire: even the weather forecast seemed to be some kind of spoof, predicting every possible combination of weather for the next twenty-four hours without actually committing itself to anything specific.” David John Lodge, Changing Places If there is ambiguity forecasts cannot be verified unequivocally, and so there can be dispute over how good the forecasts are. Forecasts must be specific.

Ambiguity What is the seasonal forecast for Rio de Janeiro? Does the forecast apply to: Individual stations? Regions, and if so what size? If rainfall is above-normal in Rio, and below-normal in the west of region IV so that the area-average is normal, is the forecast “good” (in the sense that the category with the highest probability verified)?

What makes a “good” forecast? Forecast: The Netherlands will successfully defend the PDC World Cup of Darts in 2012. Verification: Who cares? (England won.) Forecasts are most useful when they are salient.

Salience Recent drought in East Africa: did the MAM 2011 forecast provide a warning of drought? Does the forecast even say anything at all about drought?

What makes a “good” forecast? Forecast: Mitt Romney will win the Republican presidential nomination and the 2012 election. Forecaster’s admission: I don’t think so; I lied to influence your voting. Forecasts are consistent when they match what the forecaster thinks.

Consistency GHACOF MAM forecasts (chart: average forecasts vs. observations). The failure to forecast the shift towards dry conditions may be partly because of hedging.

What makes a “good” forecast? Forecast: The Giants will beat the Patriots in Super Bowl XLVI, 27 points to 23. Verification: The Giants won 21 to 17. Correctness is only one aspect of forecast quality.

Correctness, Precision, Accuracy “By Jove, she’s got it! I think she’s got it.” My Fair Lady Correctness – did what the forecast say would happen actually happen? Accuracy – is the forecast close to the observation? Precision – what level of accuracy is being implied?

Correctness, Precision, Accuracy Most seasonal forecasts avoid being overly precise (3, or maybe 5, categories). Sharpness measures whether the forecasts vary much from climatology. Accuracy and correctness are irrelevant concepts when forecasts are probabilistic.

Skill Is one set of forecasts better than another? Skill scores are relative measures of forecast quality: a skill score compares the quality of one forecast strategy with that of another set (the reference set), expressed as the percentage improvement over the reference, skill score = 100% × (score – score_reference) / (score_perfect – score_reference). But better in what respect? We still need to define “good” …

Heidke Skill Score Against perpetual forecasts of no tornadoes, Finley’s forecasts are pretty awful (but note that -100% is not the minimum).

Heidke Skill Score … but against random guessing, Finley does pretty well; a worked computation for both reference strategies is sketched below.
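A sketch of both comparisons, again assuming the commonly quoted counts for Finley's forecasts. The Heidke skill score is the improvement in proportion correct over a reference strategy, HSS = (PC – PC_ref) / (1 – PC_ref); the reference is either the "no tornado" strategy or the proportion expected correct by chance given the table margins.

```python
# Heidke skill score (HSS) = (PC - PC_ref) / (1 - PC_ref):
# the improvement in proportion correct over a reference strategy,
# scaled so that a perfect forecast scores 1 (100%).
hits, false_alarms, misses, correct_rejections = 28, 72, 23, 2680
n = hits + false_alarms + misses + correct_rejections

pc = (hits + correct_rejections) / n

# Reference 1: perpetual forecasts of "no tornado".
pc_never = (false_alarms + correct_rejections) / n
hss_vs_never = (pc - pc_never) / (1 - pc_never)
print(f"HSS vs 'no tornado': {hss_vs_never:+.0%}")      # about -86%

# Reference 2: random guessing that preserves the table margins
# (the standard Heidke reference).
forecast_yes, observed_yes = hits + false_alarms, hits + misses
forecast_no = misses + correct_rejections
observed_no = false_alarms + correct_rejections
pc_chance = (forecast_yes * observed_yes + forecast_no * observed_no) / n**2
hss_vs_chance = (pc - pc_chance) / (1 - pc_chance)
print(f"HSS vs random guessing: {hss_vs_chance:+.0%}")  # about +36%
```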

What makes a “good” forecast? Quality: forecasts should correspond with what actually happens (includes correctness, accuracy, and some other attributes …). Value: forecasts should be potentially useful (includes salience, timeliness, specificity). Consistency: forecasts should indicate what the experts really think. Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293.

Forecast verification If the forecasts are correct 60% of the time, is that good? What makes a forecast “good”? How can we measure goodness? Are seasonal forecasts “good”?

Forecast verification Forecasts may be deterministic or probabilistic, and discrete or continuous: deterministic and discrete (“It will rain tomorrow”); deterministic and continuous (“There will be 10 mm of rain tomorrow”); probabilistic and discrete (“There is a 50% chance of rain tomorrow”); probabilistic and continuous (“There is a p% chance of more than k mm of rain tomorrow”).

Seasonal Forecast Formats, I

Forecast verification Forecasts may be deterministic or probabilistic, and discrete or continuous: deterministic and discrete (“It will rain tomorrow”); deterministic and continuous (“There will be 10 mm of rain tomorrow”); probabilistic and discrete (“There is a 50% chance of rain tomorrow”); probabilistic and continuous (“There is a p% chance of more than k mm of rain tomorrow”).

Error measures, such as the mean absolute error (MAE) and the mean squared error (MSE), calculate the accuracy of a forecast.
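As an illustration, a minimal sketch of the two most common error measures; the forecast and observation values below are invented for the example.

```python
import numpy as np

# Illustrative deterministic seasonal rainfall forecasts and observations (mm).
forecasts = np.array([310.0, 250.0, 480.0, 150.0, 395.0])
observations = np.array([290.0, 305.0, 440.0, 170.0, 360.0])

errors = forecasts - observations
mae = np.mean(np.abs(errors))  # mean absolute error
mse = np.mean(errors ** 2)     # mean squared error
rmse = np.sqrt(mse)            # root mean squared error, same units as the data

print(f"MAE: {mae:.1f} mm, RMSE: {rmse:.1f} mm")
```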

Biases Variance (or amplitude) bias: typically very small in statistically corrected forecasts if skill is low, because the forecasts are always close to the mean. Mean (or unconditional) bias: always close to zero for cross-validated forecasts; indicates the ability to forecast shifts in climate for retroactive forecasts; slightly negative if the predictand data are positively skewed.

Pearson’s correlation Pearson’s correlation measures association (are increases and decreases in the forecasts associated with increases and decreases in the observations?). It does not measure accuracy. When squared, it tells us how much of the variance of the observations is correctly forecast.

Spearman’s correlation Spearman’s correlation is Pearson’s correlation calculated on the ranks of the forecasts and observations. When squared, it tells us how much of the variance of the ranks of the observations is correctly forecast, which is not as obvious an interpretation as Pearson’s, but it is much less sensitive to extremes.

Kendall’s tau Kendall’s tau is the difference between the numbers of concordant and discordant pairs, divided by the total number of pairs. Kendall’s correlation measures discrimination (do the forecasts increase and decrease as the observations increase and decrease?). It can be transformed to the probability that the forecasts successfully distinguish the wetter (or hotter) of two observations.
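The three measures of association can be computed side by side with scipy; a sketch using the same kind of invented data as above. The final line uses the relationship between Kendall's tau and the 2AFC probability, which holds for continuous data without ties.

```python
import numpy as np
from scipy import stats

forecasts = np.array([310.0, 250.0, 480.0, 150.0, 395.0])
observations = np.array([290.0, 305.0, 440.0, 170.0, 360.0])

pearson_r, _ = stats.pearsonr(forecasts, observations)
spearman_r, _ = stats.spearmanr(forecasts, observations)
kendall_tau, _ = stats.kendalltau(forecasts, observations)

print(f"Pearson r:    {pearson_r:.2f} (r^2 = {pearson_r**2:.2f})")
print(f"Spearman rho: {spearman_r:.2f}")
print(f"Kendall tau:  {kendall_tau:.2f}")

# For continuous data without ties, Kendall's tau maps to the probability of
# correctly picking the wetter of two observations: p(2AFC) = (tau + 1) / 2.
print(f"2AFC probability: {(kendall_tau + 1) / 2:.2f}")
```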

Seasonal Forecast Formats, II

Forecast verification Forecasts may be deterministic or probabilistic, and discrete or continuous: deterministic and discrete (“It will rain tomorrow”); deterministic and continuous (“There will be 10 mm of rain tomorrow”); probabilistic and discrete (“There is a 50% chance of rain tomorrow”); probabilistic and continuous (“There is a p% chance of more than k mm of rain tomorrow”).

Probability “errors” The Brier score is the mean squared error of probability forecasts for a binary event: BS = (1/n) Σ (pᵢ – oᵢ)², where pᵢ is the forecast probability and oᵢ is 1 if the event occurred and 0 otherwise. It has the same form as the mean squared error of deterministic forecasts, MSE = (1/n) Σ (fᵢ – xᵢ)²; i.e., the two equations are identical, except for the units.
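A minimal sketch of the Brier score, assuming probability forecasts of a binary event (e.g., above-normal rainfall) and 0/1 outcomes; the numbers are illustrative. The Brier skill score against climatology is included to show the link to the earlier discussion of skill.

```python
import numpy as np

# Forecast probabilities of above-normal rainfall, and whether it occurred.
probs = np.array([0.40, 0.33, 0.50, 0.25, 0.40, 0.33])
occurred = np.array([1, 0, 1, 0, 0, 1])  # 1 = event observed, 0 = not

brier = np.mean((probs - occurred) ** 2)

# Reference: always forecasting the climatological probability.
climatology = occurred.mean()
brier_clim = np.mean((climatology - occurred) ** 2)

# Brier skill score: fractional improvement over the climatology reference.
bss = 1 - brier / brier_clim
print(f"BS: {brier:.3f}, BS(clim): {brier_clim:.3f}, BSS: {bss:+.1%}")
```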

Reliability A consistency between the a priori stated probabilities of an event and the a posteriori observed relative frequencies of this event. Murphy, A. H., 1973: A new vector partition of the probability score. Journal of Applied Meteorology, 12, 595–600. Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293. If the proportion of times that the event occurs is the same as the prior stated probability for all values of the prior probability, the system is reliable (or well calibrated).

Measuring Reliability Reliability Diagrams Calculate separately the relative frequency with which the event occurs for each forecast probability. Good reliability is indicated by points lying along the 45° diagonal.
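A sketch of the binning behind a reliability diagram: group the forecasts by issued probability and compare each group's probability with the observed relative frequency. The synthetic data here are reliable by construction, so the printed frequencies should sit close to the diagonal.

```python
import numpy as np

# Synthetic probability forecasts and binary outcomes, reliable by construction.
rng = np.random.default_rng(0)
probs = rng.choice([0.15, 0.25, 0.33, 0.40, 0.50], size=200)
occurred = rng.random(200) < probs

# One bin per distinct issued probability (seasonal forecasts use few values).
for p in np.unique(probs):
    in_bin = probs == p
    obs_freq = occurred[in_bin].mean()
    print(f"forecast {p:.2f}: observed frequency {obs_freq:.2f} (n = {in_bin.sum()})")
# Reliable forecasts plot on the 45-degree diagonal: obs_freq close to p in each bin.
```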

Resolution Does the outcome increase and decrease in frequency/intensity if the forecast increases and decreases? If the median outcome is the same regardless of the forecasts then the forecasts contain no useful information – there is no resolution.

Attributes Diagrams The histograms show the sharpness. The vertical line shows the observed climatology and indicates the forecast bias. The horizontal line shows no resolution. The diagonal lines show reliability (solid) and “skill” (dashed). The coloured line shows the reliability of the forecasts. The dashed line shows a smoothed fit.

Discrimination Does the forecast increase and decrease as the outcome increases and decreases? If the median forecast is the same regardless of the outcome then the forecasts contain no useful information – there is no discrimination.

Measuring discrimination Discrimination is measured using a “two-alternative forced choice” (2AFC) test. A 2AFC test is like a “Who Wants to Be a Millionaire?” question after opting for the 50:50 lifeline.

Two-Alternative Forced Choice Test In which of these two Januaries did El Niño occur (Niño3.4 index > 27°C)? What is the probability of getting the answer correct? That depends on whether we can believe the forecasts. Repeat for all possible pairs of forecasts. This test can be generalized to three categories (as used in seasonal forecasts) and to continuous outcomes (where it is equivalent to Kendall’s tau).
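A brute-force sketch of the 2AFC score for continuous forecasts: over all pairs of cases, count how often the forecast correctly identifies which observation was larger, scoring tied forecasts as half. A score of 0.5 indicates no discrimination; the data are the same invented values as above.

```python
from itertools import combinations

import numpy as np

forecasts = np.array([310.0, 250.0, 480.0, 150.0, 395.0])
observations = np.array([290.0, 305.0, 440.0, 170.0, 360.0])

score, pairs = 0.0, 0
for i, j in combinations(range(len(observations)), 2):
    if observations[i] == observations[j]:
        continue  # the question is undefined if the observations tie
    pairs += 1
    f_diff = forecasts[i] - forecasts[j]
    o_diff = observations[i] - observations[j]
    if f_diff * o_diff > 0:
        score += 1.0  # forecast order matches observed order: correct choice
    elif f_diff == 0:
        score += 0.5  # tied forecasts: no better than guessing

print(f"2AFC score: {score / pairs:.2f}  (0.5 = no discrimination)")
```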

Forecast Attributes Reliability: did we correctly indicate the uncertainty in the forecast? (But reliable forecasts may contain no useful information.) Resolution: did we get more rainfall when we forecast more rainfall? (But sample size is a problem.) Discrimination: did we forecast more rainfall when we got more rainfall?

Forecast verification If the forecasts are correct 60% of the time, is that good? What makes a forecast “good”? How can we measure goodness? Are seasonal forecasts “good”?

Skill of PRESAO forecasts Have the PRESAO forecasts had good discrimination? (Maps: JAS forecasts.)

PRESAO JAS forecasts (chart: average forecasts vs. observations). The average forecasts fail to forecast the shift towards dry conditions.

PRESAO JAS

Are seasonal forecasts skillful? “Skill” is a vaguely defined forecast attribute. RCOF forecasts have some skill (resolution / discrimination), but reliability is poor. The IRI forecasts have been reliable for precipitation.

Additional reading Barnston, A. G., S. Li, S. J. Mason, D. G. DeWitt, L. Goddard, and X. Gong, 2010: Verification of the first 11 years of IRI’s seasonal climate forecasts. Journal of Applied Meteorology and Climatology, 49, 493–520. Mason, S. J., 2012: Seasonal and longer-range forecasts. In Jolliffe, I. T., and D. B. Stephenson (Eds), Forecast Verification: A Practitioner’s Guide in Atmospheric Science, Wiley, Chichester, 203–220. Mason, S. J., 2008: Understanding forecast verification statistics. Meteorological Applications, 15, 31–40. Mason, S. J., and A. P. Weigel, 2009: A generic forecast verification framework for administrative purposes. Monthly Weather Review, 137, 331–349.