Predicting the performance of climate predictions Chris Ferro (University of Exeter) Tom Fricker, Fredi Otto, Emma Suckling 13th EMS Annual Meeting and 11th ECAM (10 September 2013, Reading, UK)

Performance-based arguments Extrapolate past performance on the basis of knowledge of the climate model and the real climate (Parker 2010). Define a reference class of predictions (including the prediction in question) whose performances you cannot reasonably order in advance, measure the performance of some members of the class, and infer the performance of the prediction in question. This approach is popular for weather forecasts (many similar forecasts) but of less use for climate predictions (Frame et al. 2007).

Bounding arguments
1. Form a reference class of predictions that does not contain the prediction in question.
2. Judge if the prediction in question is a harder or easier problem than those in the reference class.
3. Measure the performance of some members of the reference class.
This bounds your expectations about the performance of the prediction in question (Otto et al. 2013).

Hindcast example
Global-mean, annual-mean surface air temperature anomalies relative to the mean over the previous 20 years.
Initial-condition ensembles of HadCM3 launched every year from 1960 to ...
Measure performance by absolute errors and consider a lead time of 9 years.
1. Perfect model: predict another HadCM3 member
2. Imperfect model: predict a MIROC5 member
3. Reality: predict HadCRUT4 observations
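As an illustration only (not the authors' code), the error calculation described above might be sketched as follows; the array names and synthetic data are hypothetical, and the previous 20-year mean of the verification series is taken as the anomaly baseline (one plausible reading of the slide).

```python
# A sketch, not the authors' code: absolute error of an anomaly hindcast at a
# given lead time. Array names and the synthetic data are hypothetical.
import numpy as np

def absolute_error_at_lead(hindcast, reference, history, lead=9):
    """Absolute error at lead time `lead` (years).

    Anomalies are taken relative to the mean of `history`, read here as the
    20 years of the verification series preceding the hindcast start.
    """
    baseline = np.mean(history[-20:])               # previous 20-year mean
    predicted_anomaly = hindcast[lead - 1] - baseline
    observed_anomaly = reference[lead - 1] - baseline
    return abs(predicted_anomaly - observed_anomaly)

# Hypothetical usage with synthetic annual-mean temperatures:
rng = np.random.default_rng(0)
history = 0.01 * np.arange(20) + rng.normal(0.0, 0.1, 20)
hindcast = 0.2 + 0.02 * np.arange(10) + rng.normal(0.0, 0.1, 10)   # one ensemble member
reference = 0.2 + 0.02 * np.arange(10) + rng.normal(0.0, 0.1, 10)  # verification series
print(absolute_error_at_lead(hindcast, reference, history, lead=9))
```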

Hindcast example

1. Errors when predicting HadCM3

2. Errors when predicting MIROC5

3. Errors when predicting reality

Recommendations Use existing data explicitly to justify quantitative predictions of the performance of climate predictions. Collect data on more predictions, covering a range of physical processes and conditions, to tighten bounds. Design hindcasts and imperfect model experiments to be as similar as possible to future prediction problems. Train ourselves to be better judges of relative performance, especially to avoid over-confidence.

References
Ferro CAT (2013) Fair scores for ensemble forecasts. Submitted.
Frame DJ, Faull NE, Joshi MM, Allen MR (2007) Probabilistic climate forecasts and inductive problems. Philos. Trans. R. Soc. A 365.
Fricker TE, Ferro CAT, Stephenson DB (2013) Three recommendations for evaluating climate predictions. Meteorol. Appl. 20.
Goddard L and co-authors (2013) A verification framework for interannual-to-decadal predictions experiments. Clim. Dyn. 40.
Otto FEL, Ferro CAT, Fricker TE, Suckling EB (2013) On judging the credibility of climate predictions. Clim. Change, in press.
Parker WS (2010) Predicting weather and climate: uncertainty, ensembles and probability. Stud. Hist. Philos. Mod. Phys. 41.

Bounding arguments
S = performance of a prediction from reference class C.
S′ = performance of the prediction in question, from class C′.
Let performance be positive, with smaller values better.
Infer probabilities Pr(S > s) from a sample from class C.
If C′ is harder than C then Pr(S′ > s) > Pr(S > s) for all s.
If C′ is easier than C then Pr(S′ > s) < Pr(S > s) for all s.
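A minimal sketch of how the exceedance probabilities Pr(S > s) might be estimated from a reference sample and turned into a bound; the function names and the reference scores below are hypothetical, invented for illustration.

```python
# Sketch of the bounding argument with an empirical reference sample.
# `reference_scores` are measured performances (e.g. absolute errors) from the
# reference class C; smaller values are better, as on the slide.
import numpy as np

def exceedance_prob(reference_scores, s):
    """Estimate Pr(S > s) from a sample of reference-class performances."""
    return np.mean(np.asarray(reference_scores) > s)

def bound_for_prediction(reference_scores, s, harder=True):
    """Bound Pr(S' > s) for the prediction in question.

    If its problem is judged harder than the reference class, the estimated
    Pr(S > s) is a lower bound on Pr(S' > s); if easier, an upper bound.
    """
    p = exceedance_prob(reference_scores, s)
    return p, ("lower" if harder else "upper")

# Hypothetical usage: hindcast errors standing in for the reference class
errors = [0.05, 0.12, 0.08, 0.20, 0.15, 0.09]
p, kind = bound_for_prediction(errors, s=0.1, harder=True)
print(f"Pr(S' > 0.1) has {kind} bound {p:.2f}")
```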

Future developments Bounding arguments may help us to form fully probabilistic judgments about performance. Let s = (s_1, ..., s_n) be a sample from S ~ F(·|p). Let S′ ~ F(·|cp) with priors p ~ g(·) and c ~ h(·). Then Pr(S′ ≤ s | s) = ∫∫ F(s|cp) h(c) g(p|s) dc dp. Bounding arguments refer to prior beliefs about S′ directly rather than indirectly through beliefs about c.
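The double integral above could be approximated by Monte Carlo once concrete distributional forms are assumed. The sketch below is purely illustrative and none of its choices come from the slides: it takes S to be exponential with mean p, uses a conjugate Gamma prior and posterior for the rate 1/p, and a lognormal prior for the difficulty factor c.

```python
# Monte Carlo sketch of Pr(S' <= s | s) under assumed (illustrative)
# distributions: S ~ Exponential(mean p), Gamma prior on the rate 1/p,
# lognormal prior on c. These choices are assumptions, not the authors'.
import numpy as np

rng = np.random.default_rng(1)

sample = np.array([0.05, 0.12, 0.08, 0.20, 0.15, 0.09])  # observed scores s
s_query = 0.10                                            # threshold s

# Posterior for the exponential rate 1/p given the sample (Gamma(a0, b0) prior)
a0, b0 = 1.0, 0.1
rate_draws = rng.gamma(a0 + sample.size, 1.0 / (b0 + sample.sum()), 100_000)

# Prior for c: the problem is believed harder, so c is centred above 1
c_draws = rng.lognormal(mean=0.3, sigma=0.3, size=rate_draws.size)

# S' ~ Exponential(mean c*p), i.e. rate (1/p)/c, so F(s|cp) = 1 - exp(-s * rate / c)
prob = np.mean(1.0 - np.exp(-s_query * rate_draws / c_draws))
print(f"Pr(S' <= {s_query}) ≈ {prob:.3f}")
```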

Evaluating climate predictions 1. Large trends over the verification period can spuriously inflate the value of some verification measures, e.g. correlation. Scores, which measure the performance of each forecast separately before averaging, are immune to this spurious skill. (Figure: example forecast and observation series with correlations 0.06 and 0.84.)
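A small synthetic demonstration of this point (the data are invented for illustration): two independent noise series acquire a large correlation once a common trend is added to both, while the mean absolute error, computed forecast by forecast before averaging, is unchanged.

```python
# Illustration only: a shared trend inflates correlation but leaves a
# per-forecast score (mean absolute error) untouched.
import numpy as np

rng = np.random.default_rng(2)
n = 50
obs_noise = rng.normal(0, 1, n)
fcst_noise = rng.normal(0, 1, n)      # independent of the observations: no real skill
trend = np.linspace(0, 5, n)          # common trend added to both series

for label, obs, fcst in [("without trend", obs_noise, fcst_noise),
                         ("with trend   ", obs_noise + trend, fcst_noise + trend)]:
    correlation = np.corrcoef(obs, fcst)[0, 1]
    mae = np.mean(np.abs(fcst - obs))  # score: evaluated forecast by forecast
    print(f"{label}: correlation = {correlation:5.2f}, MAE = {mae:.2f}")
```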

Evaluating climate predictions 2. Long-range predictions of short-lived quantities (e.g. daily temperatures) can be well calibrated, and may exhibit resolution. Evaluate predictions for relevant quantities, not only multi-year means.

Evaluating climate predictions 3. Scores should favour ensembles whose members behave as if they and the observation are sampled from the same distribution. ‘Fair’ scores do this; traditional scores do not. n = 2 n = 4 n = 8 unfair score fair score Figure: The unfair continuous ranked probability score is optimized by under-dispersed ensembles of size n.

Summary Use existing data explicitly to justify quantitative predictions of the performance of climate predictions. Be aware that some measures of performance may be inflated spuriously by climate trends. Consider climate predictions of more decision-relevant quantities, not only multi-year means. Use fair scores to evaluate ensemble forecasts.

Fair scores for ensemble forecasts Let s(p, y) be a scoring rule for a probability forecast, p, and observation, y. The rule is proper if its expectation, E_y[s(p, y)], is optimized when y ~ p. No forecasts score better, on average, than the observation’s distribution. Let s(x, y) be a scoring rule for an ensemble forecast, x, sampled randomly from p. The rule is fair if E_{x,y}[s(x, y)] is optimized when y ~ p. No ensembles score better, on average, than those from the observation’s distribution. Fricker et al. (2013), Ferro (2013)

Fair scores: binary characterization Let y = 1 if an event occurs, and let y = 0 otherwise. Let s_{i,y} be the (finite) score when i of n ensemble members forecast the event and the observation is y. The (negatively oriented) score is fair if (n − i)(s_{i+1,0} − s_{i,0}) = i(s_{i−1,1} − s_{i,1}) for i = 0, 1, ..., n and s_{i+1,0} ≥ s_{i,0} for i = 0, 1, ..., n − 1. Ferro (2013)
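This characterization can be checked numerically. The sketch below applies it to a fair version of the ensemble Brier score, taken here to be s_{i,y} = (i/n − y)² − i(n − i)/(n²(n − 1)); that particular form is quoted as an assumption, while the checking routine follows the condition on the slide directly.

```python
# Numerical check of the fairness characterization above, applied to an
# assumed fair form of the ensemble Brier score.
def fair_brier(i, n, y):
    """Assumed fair ensemble Brier score: (i/n - y)^2 - i(n-i)/(n^2(n-1))."""
    return (i / n - y) ** 2 - i * (n - i) / (n ** 2 * (n - 1))

def is_fair(score, n, tol=1e-12):
    """Check (n-i)(s_{i+1,0} - s_{i,0}) = i(s_{i-1,1} - s_{i,1}) and monotonicity."""
    for i in range(n + 1):
        # Out-of-range terms are multiplied by zero, so skip evaluating them.
        lhs = (n - i) * (score(i + 1, n, 0) - score(i, n, 0)) if i < n else 0.0
        rhs = i * (score(i - 1, n, 1) - score(i, n, 1)) if i > 0 else 0.0
        if abs(lhs - rhs) > tol:
            return False
    return all(score(i + 1, n, 0) >= score(i, n, 0) for i in range(n))

print(all(is_fair(fair_brier, n) for n in range(2, 11)))  # True
```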

Fair scores: example The (unfair) ensemble version of the continuous ranked probability score is
s(x, y) = ∫ {p_n(t) − I(y ≤ t)}² dt,
where p_n(t) is the proportion of the n ensemble members (x_1, ..., x_n) no larger than t, and where I(A) = 1 if A is true and I(A) = 0 otherwise. A fair version is
s(x, y) = ∫ [ {p_n(t) − I(y ≤ t)}² − p_n(t){1 − p_n(t)}/(n − 1) ] dt.
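The integral forms above have equivalent "kernel" expressions in terms of pairwise absolute differences (a standard identity for the CRPS, assumed here). A sketch implementation follows, with a small Monte Carlo usage example suggesting that only the unfair version rewards under-dispersed ensembles.

```python
# Kernel-form sketches of the two ensemble CRPS versions above. The equivalence
# of the integral and kernel forms is assumed; smaller scores are better.
import numpy as np

def crps_unfair(ensemble, obs):
    x = np.asarray(ensemble, dtype=float)
    n = x.size
    term1 = np.mean(np.abs(x - obs))
    term2 = np.sum(np.abs(x[:, None] - x[None, :])) / (2 * n ** 2)
    return term1 - term2

def crps_fair(ensemble, obs):
    x = np.asarray(ensemble, dtype=float)
    n = x.size
    term1 = np.mean(np.abs(x - obs))
    term2 = np.sum(np.abs(x[:, None] - x[None, :])) / (2 * n * (n - 1))
    return term1 - term2

# Usage example: y ~ N(0,1), ensemble members ~ N(0, sigma^2), n = 4.
rng = np.random.default_rng(3)
obs = rng.normal(0, 1, 10_000)
for sigma in (0.5, 1.0):
    ens = rng.normal(0, sigma, size=(obs.size, 4))
    unfair = np.mean([crps_unfair(e, o) for e, o in zip(ens, obs)])
    fair = np.mean([crps_fair(e, o) for e, o in zip(ens, obs)])
    print(f"sigma={sigma}: unfair={unfair:.3f}, fair={fair:.3f}")
```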

Fair scores: example Figure: Unfair (dashed) and fair (solid) expected scores against σ when y ~ N(0,1) and x_i ~ N(0,σ²) for i = 1, ..., n, shown for n = 2, 4, 8.

Predicting performance We might try to predict performance by forming our own prediction of the predictand. If we incorporate information about the prediction in question then we must already have judged its credibility; if not then we ignore relevant information. Consider predicting a coin toss. Our own prediction is Pr(head) = 0.5. Then our prediction of the performance of another prediction is bound to be Pr(correct) = 0.5 regardless of other information about that prediction.