1 Verification of nowcasts and very short range forecasts Beth Ebert BMRC, Australia WWRP Int'l Symposium on Nowcasting and Very Short Range Forecasting, Toulouse, 5-9 Sept 2005

2 Why verify forecasts?
To monitor performance over time → summary scores
To evaluate and compare forecast systems → continuous and categorical scores
To show the impact of forecasts → skill & value scores
To understand errors in order to improve the forecast system → diagnostic methods
The verification approach taken depends on the purpose of the verification.

3 Verifying nowcasts and very short range forecasts
Nowcast characteristic → impact on verification:
Concerned mainly with high impact weather → rare events are difficult to verify in a systematic manner
May detect severe weather elements → storm spotter observations & damage surveys are required
Observations-based → the same observations are often used to verify the nowcasts
High temporal frequency → many nowcasts to verify
High spatial resolution → observation network usually not dense enough (except radar)
Small spatial domain → relatively small number of standard observations

4 Observations – issues for nowcasts
Thunderstorms and severe weather (mesocyclones, hail, lightning, damaging winds):
Spotter observations may contain error
Biased observations: more observations during daytime and in populated areas; more storm reports when warnings were in effect
Cell mis-association by cell tracking algorithms
Precipitation:
Radar rain rates contain error
Scale mismatch between gauge observations and radar pixels
Observation error can be large but is usually neglected → more research is required on handling observation error

5 Matching forecasts and observations
The matching approach depends on:
Nature of the forecasts and observations (scale, consistency, sparseness)
Other matching criteria: verification goals, use of the forecasts
The matching approach can impact the verification results.
Grid-to-grid approach: overlay the forecast and observed grids and match each forecast and observation. Other options are point-to-grid and grid-to-point matching.
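Purely as an illustration (not from the talk), a minimal Python sketch of grid-to-point matching by nearest neighbour; the function name and arguments are hypothetical, and interpolation would be an alternative to nearest neighbour.

```python
import numpy as np

def grid_to_point(grid, grid_lats, grid_lons, stn_lat, stn_lon):
    """Grid-to-point matching: take the forecast value at the grid box
    nearest to a station location (nearest-neighbour matching)."""
    i = np.argmin(np.abs(grid_lats - stn_lat))   # nearest latitude row
    j = np.argmin(np.abs(grid_lons - stn_lon))   # nearest longitude column
    return grid[i, j]
```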

6 Forecast Quality Definitions (Wilson subjective categories)
1 – forecast and observed overlap almost perfectly.
2 – the majority of observed and forecast echoes overlap, or offsets are < 50 km.
3 – forecast and observed look similar, but there are a number of echo offsets and several areas may be missing or extra.
4 – forecast and observed are significantly different with very little overlap, but some features are suggestive of what actually occurred.
5 – there is no resemblance between forecast and observed.
First rule of forecast verification – look at the results!

7 Systematic verification over many cases – aggregation and stratification
Aggregation: more samples → more robust statistics
Across time – results for each point in space
Across space – results for each time
Across space and time – results summarized over a spatial region and over time
Stratification: homogeneous subsamples → better understanding of how errors depend on regime
By location or region
By time period (diurnal or seasonal variation)
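As an illustration of aggregation and stratification (not part of the original slide), a small Python/pandas sketch with invented per-case scores; the column names and numbers are hypothetical.

```python
import pandas as pd

# One row per verified forecast case: its region, valid month and a score such as CSI.
scores = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "month":  [1, 7, 1, 7],
    "CSI":    [0.42, 0.31, 0.55, 0.38],
})

print(scores["CSI"].mean())                    # aggregate over space and time
print(scores.groupby("region")["CSI"].mean())  # stratify by region
print(scores.groupby("month")["CSI"].mean())   # stratify by time of year
```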

8 Real-time nowcast verification
Rapid feedback from the latest radar scan
Evaluate the latest objective guidance while it is still "fresh"
Better understand strengths and weaknesses of the nowcast system
Tends to be subjective in nature
Not commonly performed!
A real-time forecast verification system (RTFV) is under development at BMRC.

9 Post-event verification
More observations may be available → verification results are more robust
No single measure is adequate! Several metrics are needed.
Distributions-oriented verification: scatter plots, (multi-category) contingency tables, box-whisker plots
Confidence intervals are recommended, especially when comparing one set of results with another; the bootstrap (resampling) method is simple to apply.
[Figure: time series of frequency bias, POD, FAR and CSI]
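A minimal Python sketch (not from the talk) of a percentile bootstrap confidence interval for a categorical score such as CSI; the function names, number of resamples and synthetic data are illustrative assumptions.

```python
import numpy as np

def csi(fcst_yes, obs_yes):
    """Critical success index from paired yes/no forecasts and observations."""
    hits = np.sum(fcst_yes & obs_yes)
    misses = np.sum(~fcst_yes & obs_yes)
    false_alarms = np.sum(fcst_yes & ~obs_yes)
    denom = hits + misses + false_alarms
    return hits / denom if denom > 0 else np.nan

def bootstrap_ci(fcst_yes, obs_yes, score=csi, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a categorical score,
    obtained by resampling forecast/observation pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = len(obs_yes)
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample cases with replacement
        samples.append(score(fcst_yes[idx], obs_yes[idx]))
    lo, hi = np.nanpercentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Illustrative use with synthetic yes/no data
fcst = np.random.default_rng(1).random(500) > 0.7
obs = np.random.default_rng(2).random(500) > 0.7
print(csi(fcst, obs), bootstrap_ci(fcst, obs))
```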

10 Accuracy – categorical verification
Contingency table of forecast vs observed events:
              Observed yes        Observed no
Forecast yes  H = hits            F = false alarms
Forecast no   M = misses          CR = correct rejections
N = H + M + F + CR

Standard categorical verification scores:
PC = (H + CR) / N                                 proportion correct (accuracy)
Bias = (F + H) / (M + H)                          frequency bias
POD = H / (H + M)                                 probability of detection
POFD = F / (CR + F)                               probability of false detection
FAR = F / (H + F)                                 false alarm ratio
CSI = H / (H + M + F)                             critical success index (threat score)
ETS = (H - H_random) / (H + M + F - H_random)     equitable threat score
HSS = (H + CR - PC_random) / (N - PC_random)      Heidke skill score
HK = POD - POFD                                   Hanssen and Kuipers discriminant
OR = (H * CR) / (F * M)                           odds ratio
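The scores above follow directly from the contingency table counts. A minimal Python sketch, assuming the cell counts H, M, F and CR are already available; the function name and example numbers are illustrative.

```python
def categorical_scores(H, M, F, CR):
    """Standard categorical scores from a 2x2 contingency table.
    H = hits, M = misses, F = false alarms, CR = correct rejections."""
    N = H + M + F + CR
    H_random = (H + M) * (H + F) / N                              # hits expected by chance
    PC_random = ((H + M) * (H + F) + (CR + M) * (CR + F)) / N     # correct expected by chance
    return {
        "PC":   (H + CR) / N,                 # proportion correct
        "Bias": (H + F) / (H + M),            # frequency bias
        "POD":  H / (H + M),                  # probability of detection
        "POFD": F / (F + CR),                 # probability of false detection
        "FAR":  F / (H + F),                  # false alarm ratio
        "CSI":  H / (H + M + F),              # critical success index
        "ETS":  (H - H_random) / (H + M + F - H_random),
        "HSS":  (H + CR - PC_random) / (N - PC_random),
        "HK":   H / (H + M) - F / (F + CR),   # Hanssen and Kuipers discriminant
        "OR":   (H * CR) / (F * M),           # odds ratio
    }

# Illustrative counts only
print(categorical_scores(H=55, M=25, F=30, CR=390))
```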

11 Accuracy – continuous verification
Standard continuous verification scores, computed over the entire domain from the forecast field F and the observed field O:
bias = mean error
MAE = mean absolute error
RMSE = root mean square error
r = correlation coefficient
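A minimal Python sketch (not from the talk) of these continuous scores applied to matched forecast and observation arrays; the function name is illustrative.

```python
import numpy as np

def continuous_scores(fcst, obs):
    """Bias (mean error), MAE, RMSE and correlation over the whole domain."""
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    err = fcst - obs
    return {
        "bias": err.mean(),                                     # mean error
        "MAE":  np.abs(err).mean(),                             # mean absolute error
        "RMSE": np.sqrt((err ** 2).mean()),                     # root mean square error
        "r":    np.corrcoef(fcst.ravel(), obs.ravel())[0, 1],   # correlation coefficient
    }
```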

12 Accuracy – probabilistic verification
Standard probabilistic verification scores / methods:
Reliability diagram
Brier score
Brier skill score
Ranked probability score
Relative operating characteristic (ROC)
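A minimal Python sketch (not from the talk) of the Brier score, the Brier skill score against sample climatology, and the binned quantities behind a reliability diagram; the function names and the 10-bin choice are assumptions.

```python
import numpy as np

def brier_score(prob, obs):
    """Brier score for probability forecasts of a binary event (0 = perfect)."""
    prob, obs = np.asarray(prob, float), np.asarray(obs, float)
    return np.mean((prob - obs) ** 2)

def brier_skill_score(prob, obs):
    """Skill relative to the sample climatology used as the reference forecast."""
    obs = np.asarray(obs, float)
    bs = brier_score(prob, obs)
    bs_ref = brier_score(np.full_like(obs, obs.mean()), obs)
    return 1.0 - bs / bs_ref

def reliability_table(prob, obs, n_bins=10):
    """Mean forecast probability, observed frequency and count in each
    probability bin - the ingredients of a reliability diagram."""
    prob, obs = np.asarray(prob, float), np.asarray(obs, float)
    bins = np.linspace(0, 1, n_bins + 1)
    which = np.clip(np.digitize(prob, bins) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        sel = which == b
        if sel.any():
            rows.append((prob[sel].mean(), obs[sel].mean(), int(sel.sum())))
    return rows
```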

13 Skill
A forecast has skill if it is more accurate than a reference forecast (usually persistence, cell extrapolation, or random chance). Skill scores measure the relative improvement of the forecast over the reference forecast:
SS = (score_forecast - score_reference) / (score_perfect - score_reference)
Strategy 1: Plot the performance of the forecast system and the unskilled reference on the same diagram.
[Figure: Hanssen & Kuipers score vs forecast time (min) for the nowcast (solid), extrapolation (dashed) and gauge persistence, for thresholds > 0 mm, > 1 mm and > 5 mm]
Strategy 2: Plot the value of the skill score itself.
[Figure: skill with respect to gauge persistence vs forecast time (min) for the nowcast and extrapolation]
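A minimal sketch of the generic skill score relation given above; the example numbers are invented for illustration.

```python
def skill_score(score_fcst, score_ref, score_perfect):
    """Generic skill score: fractional improvement over an unskilled reference
    (e.g. persistence or extrapolation). 1 = perfect, 0 = no better than reference."""
    return (score_fcst - score_ref) / (score_perfect - score_ref)

# Example: nowcast RMSE of 2.1 mm vs persistence RMSE of 3.0 mm (perfect RMSE = 0)
print(skill_score(2.1, 3.0, 0.0))   # 0.30, i.e. a 30% improvement over persistence
```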

14 Practically perfect hindcast – an upper bound on accuracy
Approach: if the forecaster had had all of the observations in advance, what would the "practically perfect" forecast look like? Apply a smoothing function to the observations to get probability contours, then choose an appropriate yes/no threshold.
Did the actual forecast look like the practically perfect forecast? How did the performance of the actual forecast compare to that of the practically perfect forecast?
Example (Kay and Brooks, 2000): SPC convective outlook CSI = 0.34; practically perfect hindcast CSI = 0.48. The convective outlook was 75% of the way to being "practically perfect".
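A possible sketch of the practically perfect idea, assuming a 2D 0/1 grid of observed events and scipy's Gaussian smoother; the smoothing width and probability threshold are illustrative choices, not values from Kay and Brooks (2000).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def practically_perfect(obs_events, sigma=2.0, threshold=0.25):
    """Hindcast built from the observations themselves: smooth the observed
    0/1 event grid into a probability-like field, then apply a yes/no
    threshold. The result is an upper bound to compare the real forecast against."""
    prob = gaussian_filter(np.asarray(obs_events, float), sigma=sigma)
    return prob >= threshold
```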

15 "Double penalty" Event predicted where it did not occur, no event predicted where it did occur Big problem for nowcasts and other high resolution forecasts Ex: Two rain forecasts giving the same volume High resolution forecast RMS ~ 4.7 POD=0, FAR=1, CSI=0 Low resolution forecast RMS ~ 2.7 POD~1, FAR~0.7, CSI~ fcst obs fcst obs

16 Value
A forecast has value if it helps a user make a better decision. Value scores measure the relative economic value of the forecast over some reference forecast. The most accurate forecast is not always the most valuable! (Baldwin and Kain, 2004)
The expense depends on the cost of taking preventative action and the loss incurred for a missed event.
For small or rare events with high losses, value is maximized by over-prediction.
For events with high costs where displacement errors are likely, value is maximized by under-prediction.
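For concreteness, a sketch of one common cost-loss formulation of relative economic value; the talk does not give a specific formula, so this particular expression is an assumption rather than the method used on the slide.

```python
def relative_value(H, M, F, CR, cost_loss_ratio):
    """Relative economic value of yes/no forecasts for a user with a given
    cost/loss ratio: 1 = value of a perfect forecast, 0 = no better than
    acting on climatology alone, negative = worse than climatology."""
    N = H + M + F + CR
    h, m, f = H / N, M / N, F / N
    s = h + m                               # climatological event frequency
    a = cost_loss_ratio                     # cost of protecting / loss if unprotected
    expense_fcst = (h + f) * a + m          # protect on "yes" forecasts, lose on misses
    expense_climate = min(a, s)             # always or never protect, whichever is cheaper
    expense_perfect = s * a                 # protect only when the event occurs
    return (expense_climate - expense_fcst) / (expense_climate - expense_perfect)
```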

17 Exact match vs. "close enough"
Do we need to get a high resolution forecast exactly right? Often "close" is still useful to a forecaster.
YES, an exact match matters for:
High stakes situations (e.g. space shuttle launch, hurricane landfall)
Hydrological applications (e.g. flash floods)
Topographically influenced weather (valley winds, orographic rain, etc.)
→ standard verification methods are appropriate (POD, FAR, CSI, bias, RMSE, correlation, etc.)
NO, "close enough" is acceptable for:
Guidance for forecasters
Model validation (does it predict what we expect it to predict?)
Cases where the observations do not allow standard verification of high resolution forecasts
→ "fuzzy" verification methods and diagnostic methods that verify attributes of the forecast

18 "Fuzzy" verification methods Large forecast and observed variability at high resolution Fuzzy verification methods don't require an exact match between forecasts and observations to get a good score Vary the size of the space / time neighborhood around a point Damrath, 2004 Rezacova and Sokol, 2004 * Theis et al., 2005 Roberts, 2004 * Germann and Zawadski, 2004 Also vary magnitude, other elements Atger, 2001 Evaluate using categorical, continuous, probabilistic scores / methods * Giving a talk in this Symposium t t + 1 t - 1 Forecast value Frequency Sydney Forecasters don't (shouldn't!) take a high resolution forecast at face value – instead they interpret it in a probabilistic way.

19 Spatial multi-event contingency table
Verify using the Relative Operating Characteristic (ROC), which measures how well the forecast can separate events from non-events based on some decision threshold.
Decision thresholds to vary:
magnitude (e.g. 1 mm h-1 to 20 mm h-1)
distance from the point of interest (e.g. within 10 km, ..., within 100 km)
timing (e.g. within 1 h, ..., within 12 h)
anything else that may be important in interpreting the forecast
Can be applied to ensembles, and used to compare deterministic forecasts with ensemble forecasts. (Atger, 2001)
[Figure: ROC curves for a single threshold, for varying rain thresholds, and for an ensemble forecast with varying rain thresholds]
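A possible sketch of the spatial multi-event idea: count a forecast "yes" at a point if the forecast exceeds the magnitude threshold anywhere within a given radius, and relax the radius to trace out hit rate / false alarm rate pairs for a ROC curve. The radii and the use of a maximum filter are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def spatial_roc_points(fcst, obs, magnitude=1.0, radii=(1, 3, 5, 7)):
    """For each search radius (in grid points), treat the forecast as "yes" at a
    point if it exceeds the magnitude threshold anywhere within that radius,
    and return (radius, hit rate, false alarm rate) tuples for a ROC plot."""
    o = obs >= magnitude
    points = []
    for r in radii:
        f = maximum_filter(fcst, size=2 * r + 1) >= magnitude
        hit_rate = np.sum(f & o) / np.sum(o)
        false_alarm_rate = np.sum(f & ~o) / np.sum(~o)
        points.append((r, hit_rate, false_alarm_rate))
    return points
```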

20 Object- and entity-based verification
Consistent with human interpretation. Provides diagnostic information on whole-system properties: location, amplitude, size, shape.
Techniques:
Contiguous Rain Area (CRA) verification (Ebert and McBride, 2000)
NCAR object-oriented approach* (Brown et al., 2004)
Cluster analysis (Marzban and Sandgathe, 2005)
Composite method (Nachamkin, 2004)
[Figure: matched forecast and observed entities; NCAR MM5 example with 8 clusters identified in x-y-p space]

21 Contiguous Rain Area (CRA) verification (Ebert and McBride, 2000)
Define entities using a threshold (Contiguous Rain Areas).
Horizontally translate the forecast until a pattern matching criterion is met:
minimum total squared error
maximum correlation
maximum overlap
The displacement is the vector difference between the original and final locations of the forecast.
Compare the properties of the matched entities: area, mean intensity, maximum intensity, shape, etc.
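A minimal Python sketch of the CRA matching step using the minimum total squared error criterion; the search range is illustrative, and np.roll's wrap-around at the domain edge is a simplification that a real implementation would avoid by padding or masking.

```python
import numpy as np

def cra_displacement(fcst, obs, max_shift=10):
    """Translate the forecast horizontally and keep the shift that minimises
    the total squared error against the observations (one of the CRA
    pattern matching criteria). Returns the displacement vector and its error."""
    best = (0, 0)
    best_err = np.sum((fcst - obs) ** 2)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(fcst, dy, axis=0), dx, axis=1)
            err = np.sum((shifted - obs) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err   # (rows, cols) displacement and its total squared error
```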

22 Error decomposition methods
Attempt to quantify the causes of the errors. Some approaches:
CRA verification (Ebert and McBride, 2000): MSE_total = MSE_displacement + MSE_volume + MSE_pattern
Feature calibration and alignment (Nehrkorn et al., 2003): E(x,y) = E_phase(x,y) + E_local_bias(x,y) + E_residual(x,y)
Acuity-fidelity approach (Marshall et al., 2004): minimize the cost function J = J_distance + J_timing + J_intensity + J_misses, from the perspectives of both the forecast (fidelity) and the observations (acuity)
Error separation (Ciach and Krajewski, 1999): MSE_forecast = MSE_true + MSE_reference
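A sketch of the CRA-style decomposition listed above, assuming the best-match shifted forecast from the previous step is available; treat it as an illustration of the idea rather than the exact operational implementation.

```python
import numpy as np

def cra_error_decomposition(fcst, obs, shifted_fcst):
    """CRA-style decomposition: MSE_total = MSE_displacement + MSE_volume + MSE_pattern,
    where shifted_fcst is the best-match translation of the forecast."""
    mse_total = np.mean((fcst - obs) ** 2)
    mse_shifted = np.mean((shifted_fcst - obs) ** 2)
    mse_displacement = mse_total - mse_shifted             # error removed by the shift
    mse_volume = (shifted_fcst.mean() - obs.mean()) ** 2   # bias in mean rain amount
    mse_pattern = mse_shifted - mse_volume                 # residual fine-scale error
    return mse_displacement, mse_volume, mse_pattern
```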

23 Scale separation methods
Measure the correspondence between forecast and observations at a variety of spatial scales. Some approaches:
Multiscale statistical properties (Zepeda-Arce et al., 2000; Harris et al., 2001)
Scale recursive estimation (Tustison et al., 2003)
Intensity-scale approach* (Casati et al., 2004)
[Figure: data sources at different spatial scales – satellite, model, radar, rain gauges]
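A minimal Python sketch (not from the talk) of a simple scale-separation comparison: block-average both fields to successively coarser grids and recompute the correlation at each scale. The averaging factors are illustrative, and wavelet-based approaches such as the intensity-scale method work differently in detail.

```python
import numpy as np

def multiscale_correlation(fcst, obs, factors=(1, 2, 4, 8)):
    """Correlation between forecast and observations after block-averaging
    both fields by each factor, showing how agreement changes with scale."""
    out = []
    for k in factors:
        ny, nx = (fcst.shape[0] // k) * k, (fcst.shape[1] // k) * k
        f = fcst[:ny, :nx].reshape(ny // k, k, nx // k, k).mean(axis=(1, 3))
        o = obs[:ny, :nx].reshape(ny // k, k, nx // k, k).mean(axis=(1, 3))
        out.append((k, np.corrcoef(f.ravel(), o.ravel())[0, 1]))
    return out
```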

24 Summary
Nowcasts and very short range forecasts present some unique challenges for verification: high impact weather, high resolution forecasts, and imperfect observations.
There is still a place for standard scores: for historical reasons, when highly accurate forecasts are required, and for monitoring improvement. Several metrics must be used.
Please quantify uncertainty, especially when intercomparing forecast schemes.
Compare with an unskilled forecast such as persistence.

25 Summary (cont'd)
Evolving concept of what makes a "good" forecast: recognizing the value of "close enough", and taking a probabilistic view of deterministic forecasts.
Exciting new developments in diagnostic methods to better understand the nature and causes of forecast errors: object- and entity-based verification, error decomposition, scale separation.