Measuring the performance of climate predictions

Presentation transcript:

Measuring the performance of climate predictions
Chris Ferro, Tom Fricker, David Stephenson
Mathematics Research Institute, University of Exeter, UK
IMA Conference on the Mathematics of the Climate System, Reading, 14 September 2011

How good are climate predictions?
Predictions are useless without some information about their quality. Focus on the information contained in hindcasts, i.e. retrospective forecasts of past events.
1. How should we measure the performance of climate predictions?
2. What does past performance tell us about future performance?

Hindcasts
[Figure: schematic of hindcast experiments, illustrating initialization, start times, lead times and ensembles. Thanks: Doug Smith (Met Office Hadley Centre).]

Challenges
- Sample sizes are small, e.g. the CMIP5 core hindcast experiments give 10 predictions for each lead time.
- Some external forcings (e.g. greenhouse gases and volcanoes) are prescribed, not predicted, so performance is conditional on the quality of the prescribed forcing.
- The quality of measurements of predictands varies over time and space.
- Observations from the hindcast period are used (to some extent) to construct the prediction system.

Common practice
Choice of predictand:
- Evaluate predictions only after removing biases
- Evaluate predictions of only long-term averages
Choice of performance measure:
- Evaluate only the ensemble mean predictions
- Evaluate using correlation or mean square error
- Resample to estimate the sampling uncertainty

Conventional reasoning
- We can't predict weather at long lead times.
- So, don't compare predicted and observed weather.
- Instead, compare predicted and observed climate, e.g. multi-year averages.
- This reduces noise and increases evaluation precision.

Evaluate weather, not climate!
The foregoing argument is wrong for two reasons:
- We should evaluate predictands that are relevant to users.
- Evaluating climate averages reduces signal-to-noise ratios and so decreases evaluation precision.
It is better to evaluate predictions as weather forecasts and then average the evaluations over time to improve precision.
(Notes: the interpretation of averaging after evaluating might be compromised if errors are nonstationary. Does the signal-to-noise result still hold for nonstationary errors?)

Evaluate weather, not climate!
Let D_i denote the prediction error for lead time i = 1, ..., n, and let D̄ = (1/n) Σ_i D_i denote the error after averaging over the n lead times. Define
  S_1 = (1/n) Σ_i D_i²   (the mean of the squared errors D_1, ..., D_n),
  S_n = D̄²               (the square of the mean error D̄).
Under moderate conditions, for example if the D_i are independent and identically distributed with first and third moments equal to zero, the signal-to-noise ratio E(S_n)² / var(S_n) of S_n becomes increasingly small relative to the signal-to-noise ratio of S_1 as the averaging length, n, increases.
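
This claim is easy to check numerically. The sketch below is my own illustration (not part of the slides) and assumes the simplest case mentioned above: errors D_i that are independent standard normal. The estimated signal-to-noise ratio of S_1 grows with n while that of S_n does not, so their ratio shrinks roughly like 1/n in this setting.

```python
# Monte Carlo check of the signal-to-noise claim.
# Assumption (for illustration only): D_i are i.i.d. standard normal errors.
# S1 = mean of the squared errors, Sn = square of the mean error.
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000  # number of simulated hindcast sets per averaging length

def snr(samples):
    """Signal-to-noise ratio E(S)^2 / var(S), estimated from Monte Carlo samples of S."""
    return samples.mean() ** 2 / samples.var()

for n in (1, 2, 5, 10, 20):
    D = rng.standard_normal((n_sims, n))   # errors D_1, ..., D_n for each simulation
    S1 = (D ** 2).mean(axis=1)             # mean of the squared errors
    Sn = D.mean(axis=1) ** 2               # square of the mean error
    print(f"n = {n:2d}: SNR(S1) ~ {snr(S1):5.2f}, SNR(Sn) ~ {snr(Sn):5.2f}")

# In this i.i.d. normal case SNR(S1) grows roughly like n/2 while SNR(Sn) stays near 1/2,
# so evaluating the time-averaged (climate) error gives less evaluation precision.
```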

Common practice
Choice of predictand:
- Evaluate predictions only after removing biases
- Evaluate predictions of only long-term averages
Choice of performance measure:
- Evaluate only the ensemble mean predictions
- Evaluate using correlation or mean square error
- Resample to estimate the sampling uncertainty

Skill inflation
Trend: variations in the observations over the evaluation period are large relative to the variations over the lead time.
[Figure: predictions initialized along trending observations.]

Skill inflation
A trend can produce a strong association between predictions and observations even if the predictions fail to follow the observations over the lead time. Performance measures can therefore mislead and mask differences between prediction systems.
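
A toy simulation (my own sketch, not from the talk) shows how this happens: observations and predictions share a common trend, but the predictions carry no information about the detrended, lead-time variations. The correlation computed over the full period is nevertheless substantial.

```python
# Toy illustration of skill inflation by a trend (hypothetical setup, not from the slides).
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(50)                       # e.g. 50 hindcast start dates
trend = 0.1 * t                         # common trend followed by both series

obs  = trend + rng.standard_normal(50)  # observations: trend plus interannual variability
pred = trend + rng.standard_normal(50)  # predictions: same trend, noise independent of obs

r_full      = np.corrcoef(obs, pred)[0, 1]                # computed over the whole period
r_detrended = np.corrcoef(obs - trend, pred - trend)[0, 1]

print(f"correlation over the full period: {r_full:.2f}")       # inflated by the trend
print(f"correlation after removing trend: {r_detrended:.2f}")  # small: no lead-time skill
```

In this setup the mean square error, by contrast, is unaffected by the shared trend (the trend cancels in obs - pred), which anticipates the property formalized on the next slides.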

Avoiding skill inflation
Let observations X_t and predictions P_t be sampled over time t from a joint distribution function F, and let s(F) be a real-valued performance measure. Suppose that the joint distribution, F_t, of (X_t, P_t) changes with t, so that F is a mixture distribution. There is no skill inflation if s satisfies the following property:
  s(F_t) = s_0 for all t implies s(F) = s_0 for all mixtures F,
i.e. the level sets of s are closed under convex combination.

Avoiding skill inflation
All convex properties of real-valued scoring rules, σ(X, P), are immune to skill inflation. These include:
- s(F) = the expected value of σ(X, P), e.g. the mean square error;
- s(F) = any quantile of σ(X, P), e.g. the median absolute deviation;
- monotonic functions of these, e.g. the root mean square error (RMSE).
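
For the expected value of a scoring rule, the immunity property is a one-line calculation; the sketch below uses my own notation for the mixture weights (w_t), which are not introduced on the slides.

```latex
% Expected scores are immune to skill inflation:
% write the pooled distribution as a mixture F = \sum_t w_t F_t, with w_t \ge 0 and \sum_t w_t = 1.
\[
  s(F) = \mathrm{E}_{F}\bigl[\sigma(X,P)\bigr]
       = \sum_{t} w_t \, \mathrm{E}_{F_t}\bigl[\sigma(X,P)\bigr]
       = \sum_{t} w_t \, s(F_t)
       = \sum_{t} w_t \, s_0
       = s_0 .
\]
% Quantiles behave similarly: if \Pr_{F_t}(\sigma \le s_0) \ge \tau and \Pr_{F_t}(\sigma \ge s_0) \ge 1 - \tau
% for every t, the same inequalities hold under the mixture F, so s_0 remains a \tau-quantile of \sigma under F.
```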

Summary
- Measuring performance can help to improve predictions and to guide responses to predictions.
- Evaluating climate predictions is hard because of small sample sizes, unpredicted forcings, etc.
- Evaluate as weather forecasts, then average!
- Use performance measures, such as scoring rules, that are immune to skill inflation from trends!

Related questions
- How does performance vary with the timescale of the predictand and of variations in the predictand?
- What can we learn by evaluating across a range of lead times and evaluation periods?
- What does past performance tell us about future performance?
- How should hindcast experiments be designed to yield as much information as possible?

References
- Ferro CAT, Fricker TE (2011) An unbiased decomposition of the Brier score. Submitted.
- Fricker TE, Ferro CAT (2011) A framework for evaluating climate predictions. In preparation.
- Goddard L and co-authors (2011) A verification framework for interannual-to-decadal prediction experiments. In preparation.
- Jolliffe IT, Stephenson DB (2011) Forecast Verification: A Practitioner's Guide in Atmospheric Science. 2nd edition. Wiley. In press.
- Smith DM and co-authors (2007) Improved surface temperature prediction for the coming decade from a global climate model. Science, 317, 796–799.

The EQUIP project: www.equip.leeds.ac.uk
Contact: c.a.t.ferro@ex.ac.uk