Verification of ensembles
Courtesy of Barbara Brown
Acknowledgments: Tom Hamill, Laurence Wilson, Tressa Fowler
Copyright UCAR 2012, all rights reserved.

How good is this ensemble forecast?

Questions to ask before beginning
How were the ensembles constructed?
– Poor man's ensemble (distinct members)
– Multi-physics (distinct members)
– Random perturbation of initial conditions (anonymous members)
How are your forecasts used?
– Improved point forecast (ensemble mean)
– Probability of an event
– Full distribution

Approaches to evaluating ensemble forecasts
As individual members
– Use methods for continuous or categorical forecasts
As probability forecasts
– Create probabilities by applying thresholds or statistical post-processing
As a full distribution
– Use the individual members, or fit a distribution through post-processing

Evaluate each member as a separate, deterministic forecast
Why? Because it is easy and important.
If members are unique, it might provide useful diagnostics.
If members are biased, verification statistics might be skewed.
If members have different levels of bias, should you calibrate?
– Do these results conform to your understanding of how the ensemble members were created?

Verifying a probabilistic forecast
You cannot verify a probabilistic forecast with a single observation.
As with other statistics, the more data you have for verification, the more certain you are.
Rare (low-probability) events require more data to verify.
These comments apply equally to probabilistic forecasts developed by methods other than ensembles.

Properties of a perfect probabilistic forecast of a binary event
[Figure: conditional distributions of forecast frequency for observed events and observed non-events, annotated with the three key attributes: sharpness, resolution, and reliability]

Our friend, the scatterplot

Introducing the attribute diagram! (a close relative of the reliability diagram)
Analogous to the scatterplot; the same intuition holds.
Data must be binned!
Hides how much data each point represents.
Expresses conditional probabilities.
Confidence intervals can illustrate the problems with small sample sizes (see the sketch below).
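
A minimal sketch of the binning behind a reliability or attribute diagram, with a 95% normal-approximation confidence interval per bin. The arrays `probs` (issued probabilities) and `obs` (0/1 outcomes) are hypothetical NumPy inputs, not part of the original slides.

```python
import numpy as np

def reliability_points(probs, obs, n_bins=10):
    """Per probability bin: mean forecast, observed frequency, count,
    and the CI half-width on the observed frequency."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each forecast to a bin using the interior edges;
    # the last bin includes probability 1.0.
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for k in range(n_bins):
        in_bin = idx == k
        n_k = in_bin.sum()
        if n_k == 0:
            continue  # empty bins are simply not plotted
        o_bar = obs[in_bin].mean()
        # Sparse bins produce wide intervals: the small-sample problem.
        half = 1.96 * np.sqrt(o_bar * (1.0 - o_bar) / n_k)
        rows.append((probs[in_bin].mean(), o_bar, n_k, half))
    return rows  # plot mean forecast vs observed frequency, with error bars
```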

Attribute Diagram

Reliability Diagram Exercise

Discrimination
Discrimination: the ability of the forecast system to clearly distinguish situations leading to the occurrence of an event of interest from those leading to the non-occurrence of the event.
Depends on:
– Separation of the means of the conditional distributions
– Variance within the conditional distributions
[Figure: three pairs of conditional distributions of forecast frequency for observed events and non-events: (a) good discrimination, (b) poor discrimination, (c) good discrimination]
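
One way to quantify discrimination, as a sketch using the same hypothetical `probs`/`obs` arrays as above: compare the forecast distributions conditioned on the outcome via the separation of their means, plus a rank-based ROC area (the probability that an event case received a higher forecast probability than a non-event case).

```python
import numpy as np
from scipy.stats import rankdata

def discrimination_summary(probs, obs):
    f1 = probs[obs == 1]                  # forecasts for observed events
    f0 = probs[obs == 0]                  # forecasts for non-events
    # Separation of the conditional means, scaled by the pooled spread.
    separation = (f1.mean() - f0.mean()) / probs.std()
    # ROC area via midranks (Mann-Whitney U): 0.5 = no discrimination.
    ranks = rankdata(probs)               # 1-based ranks, ties averaged
    auc = (ranks[obs == 1].mean() - (len(f1) + 1) / 2.0) / len(f0)
    return separation, auc
```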

Brier score
Appropriate for probabilities of a binary event occurring.
Analogous to mean squared error.
Can be decomposed into reliability, resolution, and uncertainty components.
Relates geometrically to the attribute diagram.
Fair comparisons require a common sample climatology.

The Brier Score
The mean square error of a probability forecast:
BS = (1/N) Σ_i (p_i − o_i)²
where p_i is the forecast probability and o_i = 1 if the event occurred, 0 otherwise.
Squaring weights larger errors more than smaller ones.
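
A minimal sketch of that formula in code, again assuming hypothetical `probs` and `obs` arrays:

```python
import numpy as np

def brier_score(probs, obs):
    """BS = mean((p_i - o_i)^2): 0 is perfect, 1 is worst possible."""
    return np.mean((np.asarray(probs) - np.asarray(obs)) ** 2)
```

For example, brier_score([0.9, 0.1, 0.7], [1, 0, 0]) gives (0.01 + 0.01 + 0.49) / 3 = 0.17.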

Components of probability error
The Brier score can be decomposed into three terms (for K probability classes and a sample of size N):

BS = (1/N) Σ_k n_k (p_k − ō_k)² − (1/N) Σ_k n_k (ō_k − ō)² + ō (1 − ō)
     [reliability]                [resolution]              [uncertainty]

Reliability: if, on all occasions when forecast probability p_k is predicted, the observed frequency of the event ō_k equals p_k, the forecast is said to be reliable. Similar to bias for a continuous variable.
Resolution: the ability of the forecast to distinguish situations with distinctly different frequencies of occurrence.
Uncertainty: the variability of the observations. Maximized when the climatological frequency (base rate) ō = 0.5. Has nothing to do with forecast quality!
The presence of the uncertainty term means that Brier scores should not be compared on different samples. Use the Brier skill score to overcome this problem.
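
A sketch of the decomposition for binned forecasts (hypothetical `probs`/`obs` arrays as before). Note that with binning, rather than a fixed set of K discrete forecast probabilities, the identity BS = reliability − resolution + uncertainty holds only approximately.

```python
import numpy as np

def brier_decomposition(probs, obs, n_bins=10):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    o_bar = obs.mean()                      # climatological base rate
    rel = res = 0.0
    for k in range(n_bins):
        in_bin = idx == k
        n_k = in_bin.sum()
        if n_k == 0:
            continue
        p_k = probs[in_bin].mean()          # mean forecast in class k
        o_k = obs[in_bin].mean()            # observed frequency in class k
        rel += n_k * (p_k - o_k) ** 2       # reliability (want small)
        res += n_k * (o_k - o_bar) ** 2     # resolution (want large)
    n = len(obs)
    unc = o_bar * (1.0 - o_bar)             # uncertainty (forecast-independent)
    return rel / n, res / n, unc
```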

Sharpness also important
"Sharpness" measures the specificity of the probabilistic forecast.
Given two reliable forecast systems, the one producing the sharper forecasts is preferable.
But sharp forecasts that are not reliable imply unrealistic confidence.

Sharpness ≠ resolution
Sharpness is a property of the forecasts alone; a measure of sharpness in the Brier score decomposition would be how heavily populated the extreme probability classes (n_k for p_k near 0 or 1) are. One simple summary is sketched below.
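
A sketch of one such summary: the fraction of forecasts that commit to a near-categorical probability. The 0.1/0.9 thresholds are arbitrary choices, and `probs` is the hypothetical array of issued probabilities used throughout.

```python
import numpy as np

def sharpness_fraction(probs, lo=0.1, hi=0.9):
    """Fraction of forecasts near 0 or 1; depends on the forecasts alone."""
    probs = np.asarray(probs)
    return np.mean((probs <= lo) | (probs >= hi))
```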

Forecasts of a full distribution
How is it expressed?
– Discretely, by providing the forecasts from all ensemble members
– As a parametric distribution, e.g. a normal defined by the ensemble mean and spread
– As a smoothed function, e.g. a kernel smoother
All three are sketched below.
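
A sketch of all three expressions for a single hypothetical eight-member ensemble (the member values are made up), using SciPy for the parametric and kernel-smoothed forms:

```python
import numpy as np
from scipy import stats

members = np.array([2.1, 2.4, 1.8, 2.9, 2.2, 2.6, 2.0, 2.5])  # made-up values

# 1) Discrete: the members themselves define an empirical distribution.
empirical_cdf = lambda x: np.mean(members <= x)

# 2) Parametric: a normal from the ensemble mean and spread.
normal_fit = stats.norm(loc=members.mean(), scale=members.std(ddof=1))

# 3) Smoothed: a Gaussian kernel density estimate over the members.
kde = stats.gaussian_kde(members)

# Probability of not exceeding 2.5 under each expression:
print(empirical_cdf(2.5), normal_fit.cdf(2.5), kde.integrate_box_1d(-np.inf, 2.5))
```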

Assuming the forecast is reliable (calibrated)
By default, we assume all ensemble forecasts have the same number of members. Comparing forecasts with different numbers of members is an advanced topic.
For a perfect ensemble, the observation comes from the same distribution as the ensemble.
Huh? In other words, the observation should look like just one more member: statistically indistinguishable from the ensemble.

[Figure: four example 20-member ensemble forecasts, with the observation falling at rank 1 of 21, rank 14 of 21, rank 5 of 21, and rank 3 of 21 among the sorted members]

Rank Histograms
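
A minimal sketch of the counts behind a rank histogram, assuming a hypothetical array `ens` of shape (n_cases, n_members) and observations `obs` of shape (n_cases,):

```python
import numpy as np

def rank_histogram(ens, obs):
    """Counts of where the observation ranks among the sorted members."""
    n_cases, n_members = ens.shape
    # Rank 0 = observation below all members; rank n_members = above all.
    ranks = np.sum(ens < obs[:, None], axis=1)
    # Note: ties (e.g. zero precipitation) need random tie-breaking,
    # omitted here for brevity.
    return np.bincount(ranks, minlength=n_members + 1)
```

A flat histogram over the n_members + 1 bins is consistent with the observation behaving like one more draw from the ensemble's distribution.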

Verifying a continuous expression of a distribution (e.g. normal, Poisson, beta)
The Probability Integral Transform (PIT) is the forecast cdf evaluated at the observation; it always lies on the [0, 1] interval.
Still create a rank histogram, now using bins of PIT values: if the forecast distributions are calibrated, the PIT values are uniform and the histogram is flat.
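
A sketch for a normal predictive distribution, with hypothetical per-case arrays `mu`, `sigma`, and `obs`; the PIT histogram plays the role of the rank histogram:

```python
import numpy as np
from scipy import stats

def pit_histogram(mu, sigma, obs, n_bins=10):
    """PIT = forecast CDF evaluated at the observation; uniform if calibrated."""
    pit = stats.norm(loc=mu, scale=sigma).cdf(obs)
    return np.histogram(pit, bins=n_bins, range=(0.0, 1.0))[0]
```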

Verifying a distribution forecast

Warnings about rank histograms
Rank histograms assume all samples come from the same climatology!
A flat rank histogram can be obtained by combining forecasts with offsetting biases. See Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: is it real skill or is it the varying climatology? Quart. J. Royal Meteor. Soc., Jan 2007 issue.
Techniques exist for evaluating "flatness", but they mostly require much data; one is sketched below.
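
A sketch of one such technique: a chi-square goodness-of-fit test of the rank-histogram counts against uniformity. It is only indicative, since it ignores the shape of the departure and assumes independent cases.

```python
import numpy as np
from scipy import stats

def flatness_test(counts):
    """Chi-square test of rank-histogram counts against a flat histogram."""
    counts = np.asarray(counts, dtype=float)
    expected = np.full_like(counts, counts.sum() / len(counts))
    return stats.chisquare(counts, expected)  # (statistic, p-value)
```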

Continuous and discrete rank probability scores
Built on the cdf rather than the pdf of the forecast.
Measure the squared area between the forecast cdf and the observation's step-function cdf, then aggregate over all cases: narrow, well-placed distributions score better than wide or biased ones, and a sharp forecast with a bias can score worse than a wider, centered one.
Relates to the Brier score: for a forecast of a binary event, the RPS is equivalent to the Brier score.
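
A sketch of the continuous version (CRPS) for an ensemble forecast, using the standard estimator CRPS = E|X − y| − ½·E|X − X′| applied to the empirical distribution of the members; `ens` and `obs` are the same hypothetical arrays as in the rank-histogram sketch.

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Mean CRPS over all cases (smaller is better)."""
    # E|X - y|: mean absolute difference between members and the observation.
    term1 = np.mean(np.abs(ens - obs[:, None]), axis=1)
    # 0.5 * E|X - X'|: mean absolute difference between pairs of members.
    term2 = 0.5 * np.mean(np.abs(ens[:, :, None] - ens[:, None, :]), axis=(1, 2))
    return np.mean(term1 - term2)
```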

Rank Probability Scores

A good RPS minimizes the area between the forecast cdf and the observed (step-function) cdf.

Final comments
Know how and why your ensemble is being created.
Use a combination of graphics and scores.
Areas of further research:
– Verification of spatial forecasts
– Additional intuitive measures of performance for probability and ensemble forecasts

Useful references
Good overall references for forecast verification:
– (1) Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences (2nd Ed.). Academic Press, 627 pp.
– (2) WMO verification working group forecast verification web page.
– (3) Jolliffe, I. T., and D. B. Stephenson, 2012: Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd Edition. Wiley and Sons Ltd.
Verification tutorial: Eumetcal.
Rank histograms: Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129.
Spread-skill relationships: Whitaker, J. S., and A. F. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. Mon. Wea. Rev., 126.
Brier score, continuous ranked probability score, reliability diagrams: Wilks text again.
Relative operating characteristic: Harvey, L. O., Jr., and others, 1992: The application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120.
Economic value diagrams:
– (1) Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Royal Meteor. Soc., 126.
– (2) Zhu, Y., and others, 2002: The economic value of ensemble-based weather forecasts. Bull. Amer. Meteor. Soc., 83.
Overestimating skill: Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: is it real skill or is it the varying climatology? Quart. J. Royal Meteor. Soc., Jan 2007 issue.

Reliability Diagram Exercise
[Figure: example reliability diagrams labeled: probabilities underforecast; essentially no skill; perfect forecast; tends toward the mean but has skill; small samples in some bins; reliable forecast of a rare event; no resolution; over-resolved forecast; typical categorical forecast]