
Standard Verification Strategies Proposal from the NWS Verification Team. Draft, 03/23/2009. These slides include notes, which can be expanded with your feedback.

2 Overview Verification goals Key metrics and products for goal #1: How good are the forecasts? Key metrics and products for goal #2: What are the strengths and weaknesses in the forecasts? Key verification analysis for goal #3: What are the sources of error/uncertainty in the forecasts? Recommendations for goals #4 and 5: How is new science improving the forecasts? What should be done to improve the forecasts? Examples of verification products (add examples if you want) Please send your feedback to

3 Verification Goals Verification helps us answer questions such as 1) How good are the forecasts?  Several aspects in quality => several metrics  Multiple users => several levels of sophistication 2) What are the strengths and weaknesses in the forecasts?  Several conditions to verify subsets of forecasts  Several baseline forecasts for comparison 3) What are the sources of error/uncertainty in the forecasts?  Several scenarios to separate different sources of uncertainty 4) How is new science improving the forecasts?  Comparison of verification results from current process vs. new process 5) What should be done to improve the forecasts?

4 Verification Goals Verification activity has value only if the information generated leads to a decision about the forecast/system being verified –User of the information must be identified –Purpose of the verification must be known in advance No single verification measure provides complete information about the quality of a forecast product –Different levels of sophistication with several verification metrics and products To inter-compare metrics across basins and RFCs, they should be normalized

5 Goal #1: How good are the forecasts? Different aspects in forecast quality  Accuracy (agreement w/ observations)  Bias in the mean (forecast mean agrees with observed mean)  Correlation (linear relationship between forecasts and obs.)  Skill (more accurate than reference forecast)  Reliability (agreement between forecast probabilities/categories and observed frequencies; conditioned on forecasts)  Resolution (discriminate between events and non-events; conditioned on observations)  Sharpness for probabilistic forecasts (forecasts issue strong, decisive probabilities) Need to compare deterministic and probabilistic forecasts for forcing inputs and hydrologic outputs

6 Goal #1: Key verification metrics Proposed key verification metrics to analyze different aspects in forecast quality: for deterministic forecasts  Accuracy: Mean Absolute Error (MAE) (the deterministic counterpart of the CRPS)  Bias: Mean Error (ME) (or relative measures)  Correlation: Correlation Coefficient (CC)  Skill: MAE Skill Score w/ reference (MAE-SS_ref)  Reliability: for 1 given event, False Alarm Ratio FAR = FP/(TP+FP)  Resolution: for 1 given event, Probability of Detection POD = TP/(TP+FN), Probability of False Detection POFD = FP/(FP+TN), ROC plot (POD vs. POFD) and ROC Area
Contingency table for 1 event (F = forecast yes, O = observed yes):
         O     !O
  F      TP    FP
  !F     FN    TN
(See the computation sketch below.)
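Below is a minimal sketch (Python, for illustration only; it is not the IVP/EVS implementation) of how these deterministic metrics and the contingency-table scores could be computed from paired forecast/observation arrays. The function name and the event-threshold convention are assumptions.

```python
# Minimal sketch: deterministic metrics for paired forecast/observation arrays,
# plus a 2x2 contingency table for one event threshold. Assumes each denominator
# is nonzero (sufficient events and non-events in the sample).
import numpy as np

def deterministic_metrics(fcst, obs, event_threshold):
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    mae = np.mean(np.abs(fcst - obs))                  # accuracy
    me = np.mean(fcst - obs)                           # bias in the mean
    cc = np.corrcoef(fcst, obs)[0, 1]                  # linear correlation

    # Contingency table for the event "value >= event_threshold"
    f_yes, o_yes = fcst >= event_threshold, obs >= event_threshold
    tp = np.sum(f_yes & o_yes)     # forecast yes, observed yes
    fp = np.sum(f_yes & ~o_yes)    # forecast yes, observed no
    fn = np.sum(~f_yes & o_yes)    # forecast no, observed yes
    tn = np.sum(~f_yes & ~o_yes)   # forecast no, observed no

    far = fp / (tp + fp)           # False Alarm Ratio
    pod = tp / (tp + fn)           # Probability of Detection
    pofd = fp / (fp + tn)          # Probability of False Detection
    return {"MAE": mae, "ME": me, "CC": cc, "FAR": far, "POD": pod, "POFD": pofd}
```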

7 Goal #1: Key verification metrics Proposed key verification metrics to analyze different aspects in forecast quality: for probabilistic forecasts  Accuracy: Mean Continuous Ranked Probability Score (CRPS), to be decomposed as CRPS = Reliability - Resolution + Uncertainty; Brier Score (BS) for 1 given threshold (e.g., PoP)  Reliability: Reliability CRPS, Cumulative Talagrand diagram, Reliability diagram for 1 given event  Resolution: Resolution CRPS, ROC plot (POD vs. POFD) and ROC Area for 1 given event  Skill: CRPS Skill Score w/ climatology (CRPSS_clim), Brier Skill Score (BSS_clim) for 1 given threshold  For forecast mean: bias ME_mean and correlation CC_mean
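A minimal sketch of the two accuracy scores named above, assuming ensemble forecasts are available as arrays of member values and event probabilities are already in [0, 1]; the kernel form of the CRPS for an empirical (ensemble) distribution is used here. Names are illustrative, not part of any operational package.

```python
# Minimal sketch: CRPS for one ensemble forecast via the kernel form
# CRPS = E|X - y| - 0.5 * E|X - X'|, plus the Brier Score for one event threshold.
import numpy as np

def crps_ensemble(members, obs):
    m = np.asarray(members, float)
    term1 = np.mean(np.abs(m - obs))                         # E|X - y|
    term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))   # 0.5 * E|X - X'|
    return term1 - term2

def brier_score(prob_fcsts, event_obs):
    p = np.asarray(prob_fcsts, float)      # forecast probabilities in [0, 1]
    o = np.asarray(event_obs, float)       # observed event occurrence (0 or 1)
    return np.mean((p - o) ** 2)

# Mean CRPS over many forecast/observation pairs (illustrative usage):
# mean_crps = np.mean([crps_ensemble(ens_t, obs_t) for ens_t, obs_t in pairs])
```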

8 Goal #1: Key verification metrics Other key verification metrics: timing error, peak error, shape error (under development). Process to pair forecasts and observations based on the event (no longer on forecast valid time) – technique from spatial verification or curve registration to be adapted. [Figure: observed vs. forecast hydrographs illustrating peak timing, peak value, and shape errors]
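Since the event-based pairing technique is still under development, the following is only a naive illustration of peak timing and peak value errors for one event window, assuming the forecast and observed hydrographs have already been paired.

```python
# Illustrative sketch only: peak timing / peak value error for one flood event window.
import numpy as np

def peak_errors(times, fcst, obs):
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    i_f, i_o = int(np.argmax(fcst)), int(np.argmax(obs))
    timing_error = times[i_f] - times[i_o]       # > 0: forecast peak is late
    peak_value_error = fcst[i_f] - obs[i_o]      # > 0: forecast peak is too high
    return timing_error, peak_value_error
```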

9 Goal #1: Key verification metrics Other verification metric to describe forecast value: Relative Value (or Economic Value) is a skill score of expected expense using the Cost/Loss ratio, w/ climatology as a reference (see website). Expense matrix for 1 event, with cost C for taking action and loss L if the event occurs without action:
              Event   No event
  Action      C       C
  No action   L       0
Each user would pick their own Cost/Loss ratio, which is combined with the contingency table to get the expected expense for this event. The envelope gives the potential value over all event thresholds. (A minimal computation sketch follows.)
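A minimal computation sketch of the Relative Value under this cost/loss framework, written in terms of POD (hit rate), POFD (false alarm rate), the observed base rate, and the user's Cost/Loss ratio; this is one common formulation, given here only for illustration.

```python
# Minimal sketch of Relative (Economic) Value for one event and one Cost/Loss ratio.
# pod = hit rate, pofd = false alarm rate, base_rate = observed event frequency.
def relative_value(pod, pofd, base_rate, cost_loss_ratio):
    a, s = cost_loss_ratio, base_rate
    e_clim = min(a, s)                                          # expense of the climatology user
    e_perfect = s * a                                           # expense with perfect forecasts
    e_fcst = a * (pod * s + pofd * (1 - s)) + s * (1 - pod)     # expense when acting on forecasts
    return (e_clim - e_fcst) / (e_clim - e_perfect)             # 1 = perfect, <= 0 = no value
```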

10 Goal #1: Key verification metrics Normalized verification metrics to inter-compare results at different forecast points (and across RFCs)  Thresholds:  specific percentiles in observed distribution (e.g., 10th percentile, 25th percentile, 90th percentile)  specific impact thresholds (e.g., Action Stage, Flood Stage)  Skill scores:  Choice of metric: MAE / CRPS  Choice of reference: persistence, climatology (to be defined)  Relative measures (needs more work):  Relative bias = ME / (obs. mean)  Percent bias = 100 × [Σ(F_i – O_i) / Σ(O_i)]
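A small sketch of the normalized measures listed above (MAE skill score against a chosen reference forecast, and percent bias); function names are illustrative.

```python
# Minimal sketch of the normalized measures above.
import numpy as np

def mae_skill_score(fcst, obs, ref_fcst):
    mae = np.mean(np.abs(np.asarray(fcst, float) - np.asarray(obs, float)))
    mae_ref = np.mean(np.abs(np.asarray(ref_fcst, float) - np.asarray(obs, float)))
    return 1.0 - mae / mae_ref      # MAE-SS_ref: 1 = perfect, 0 = no better than reference

def percent_bias(fcst, obs):
    f, o = np.asarray(fcst, float), np.asarray(obs, float)
    return 100.0 * np.sum(f - o) / np.sum(o)
```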

11 Goal #1: Key verification products Multiple users of forecasts and verification information  different levels of sophistication for verification metrics/plots  different levels of spatial aggregation to provide verification info for individual forecast points and for groups of forecast points (up to RFC areas)  different levels of time aggregation to provide verification info from the last “days” and from the last “years”

12 Goal #1: Key verification products Proposed verification metrics/plots for 3 levels of sophistication  Summary info  Accuracy: MAE and CRPS  Bias: ME  Skill: MAE-SS_ref and CRPSS_clim  Detailed info:  Scatter plots (deterministic forecasts), box-whisker plots (probabilistic forecasts)  Time series plots for deterministic and probabilistic forecasts  Sophisticated info:  Reliability: FAR (for 1 given event), Reliability CRPS, Cumulative Talagrand Diagram, Reliability Diagram (for 1 given event)  Resolution: ROC and ROC Area (for 1 given event), Resolution CRPS  Forecast value: Relative Value

13 Goal #1: Key verification products Proposed verification metrics/plots  Summary info could be based on  a few key metrics/scores (more than 1 metric)  a combination of several metrics into one plot  a combination of several metrics into one score (TBD) [Examples: 2-D plot of different metrics (ME, MAE) vs. lead times or time periods; bubble plot from OHRFC combining MAE, ME, Skill, and ROC Area]

14 Goal #1: Key verification products Proposed verification metrics/plots  Detailed info for current forecast: time series plot displayed w/ forecasts from the last 5-10 days provides basic information on  forecast agreement with observation in recent past  forecast uncertainty given the previous forecasts  Detailed info for ‘historical’ event: time series plots for specific events selected from the past Time series plots for probabilistic forecasts (with all lead times): no example yet

15 Goal #2: What are the strengths and weaknesses? Forecast quality varies in time based on different conditions => different subsets of forecasts need to be verified to see how quality varies  Time of the year: by season, by month  Atmospheric/hydrologic conditions: by categories defined from precipitation/temperature/flow/stage observed or forecast values (e.g., high flow category if flow >= X) Forecast quality needs to be compared to baseline => different baseline forecasts to compare with  Persistence, climatology

16 Goal #2: Key verification products Verification statistics/products produced for different conditions  for each season and for each month  for different atmospheric and/or hydrologic conditions; categories to be defined from precipitation, temperature, flow, stage observed or forecast values  Categories defined from specific percentiles (e.g., 25th percentile, 75th percentile)  Categories defined from specific impact thresholds (e.g., Action Stage, Flood Stage) Impact of sample size on results: plot sample sizes; plot verification stats with confidence intervals (under development)
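Since confidence intervals are noted as under development, here is one simple, hedged possibility: a bootstrap confidence interval for any verification score computed on a conditional subset of forecast/observation pairs. The resampling scheme and names are assumptions, not the planned implementation.

```python
# Illustrative sketch: bootstrap confidence interval for a verification score.
import numpy as np

def bootstrap_ci(fcst, obs, score_fn, n_boot=1000, alpha=0.05, seed=None):
    rng = np.random.default_rng(seed)
    fcst, obs = np.asarray(fcst), np.asarray(obs)
    n = len(obs)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)             # resample forecast/obs pairs with replacement
        scores.append(score_fn(fcst[idx], obs[idx]))
    return np.quantile(scores, [alpha / 2, 1 - alpha / 2])

# Example usage: CI on MAE for a "high flow" subset (obs >= some flood threshold)
# mask = obs >= flood_stage
# ci_low, ci_high = bootstrap_ci(fcst[mask], obs[mask], lambda f, o: np.mean(np.abs(f - o)))
```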

17 Goal #3: What are the sources of uncertainty? Forecast error/uncertainty comes from different sources:  forcing inputs  initial conditions  model parameters and structure These uncertainty sources interact with each other Different forecast scenarios are needed to analyze how these different sources of uncertainty impact the quality of hydrologic forecasts  Hindcasting capability to retroactively generate forecasts from a fixed forecast scenario with large sample size  Work under way to better diagnose sources of uncertainty

18 Goal #3: Key verification analysis Analyze impact of forcing input forecasts from 2 different sources or durations on flow (or stage) forecast quality  Generate ‘forecast’ flow w/ input forecast sets 1 and 2 using same initial conditions (wo/ on-the-fly mods) and same model  Generate ‘simulated’ flow from observed inputs using same initial conditions and same model  Compare flow forecasts w/ observed flow (impact of hydro. + meteo. uncertainties) and w/ simulated flow (impact of hydro. uncertainty mostly): the differences in results are mostly due to meteorological uncertainty, provided the interaction between meteorological and hydrologic uncertainties is not significant
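A minimal sketch of the comparison described above: score the flow forecasts against both the observed flow and the simulated flow (same model and initial conditions, observed inputs), and score the simulation against the observations, so that differences between the scores can be attributed (mostly) to meteorological vs. hydrologic uncertainty when their interaction is small. MAE is used here only as an example metric; names are illustrative.

```python
# Minimal sketch: score forecast flow against observed and simulated flow to help
# separate meteorological (input) and hydrologic (model/state) uncertainty.
import numpy as np

def mae(a, b):
    return np.mean(np.abs(np.asarray(a, float) - np.asarray(b, float)))

def uncertainty_partition(fcst_flow, sim_flow, obs_flow):
    return {
        "fcst_vs_obs": mae(fcst_flow, obs_flow),   # hydrologic + meteorological uncertainty
        "fcst_vs_sim": mae(fcst_flow, sim_flow),   # mostly meteorological (input) uncertainty
        "sim_vs_obs": mae(sim_flow, obs_flow),     # mostly hydrologic uncertainty
    }
```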

19 Goal #3: Key verification analysis Analyze impact of run-time mods on flow (or stage) forecast quality  Define reference model states (to be discussed; could use model states valid at T – 5 days, even if it includes past mods)  Define basic mods to include (e.g. for regulated points)  Generate flow forecasts from reference model states w/ best available observed inputs (but no forecast input) wo/ on-the-fly mods and w/ on-the-fly mods  Generate flow forecasts from reference model states w/ best available observed and forecast inputs wo/ on-the-fly mods and w/ on-the-fly mods

20 Goals #4 and 5: How is new science improving the forecasts? What should be done to improve the forecasts? The impact of any (newly developed) forecasting process on forecast quality needs to be demonstrated via rigorous and objective verification studies. Verification results form the basis for accepting (or rejecting) proposed improvements to the forecasting system and for prioritizing system development and enhancements. This will be easier in the future –when verification standards are agreed upon –when a unified verification system for both deterministic and probabilistic forecasts is available in CHPS –when scientists and forecasters are trained on verification

21 Verification product examples User analysis is needed to identify standard verification products for the different user groups –This analysis will be done with NWS Verification Team, CAT and CAT2 Teams, Interactive/Ensemble Product Generator (EPG) Requirements Team, and the SCHs –New RFC verification case studies to work with proposed standards Here are verification product examples to be discussed by the NWS Verification Team and to be presented in the final team report –Please send feedback to

22 Verification product examples Summary verification maps: 1 metric for given lead time, given time period and for given forecast point(s) Potential metrics:  deterministic forecasts: MAE, MAE-SS_ref, Bias  probabilistic forecasts: CRPS, CRPSS_ref, Reliability CRPS [Example of MAE map: points flagged as MAE <= threshold X or MAE > threshold X; clicking on the map shows details to be defined by the user, e.g., MAE, MAE-SS_clim, MAE-SS_pers., Bias, Correlation]

23 Verification product examples Summary verification maps w/ animation: 1 metric for several lead times, time periods (e.g. months, seasons), thresholds Potential metrics:  deterministic forecasts: MAE, MAE-SS_ref, Bias  probabilistic forecasts: CRPS, CRPSS_ref, Reliability CRPS [Example of MAE maps for 4 seasons (DJF, MAM, JJA, SON), points flagged as MAE <= threshold X or MAE > threshold X]

24 Verification product examples Summary verification maps w/ spatial aggregation: 1 metric for several levels of aggregation (segments, forecast groups, carryover groups, RFCs) [Example of MAE maps for 2 aggregation levels (segments and forecast groups), points flagged as MAE <= threshold X or MAE > threshold X]

25 Verification product examples Examples of verification maps from NDFD verification website –User selects Variable, Metric, Forecast, Forecast period (month), Lead time, Animation option [Examples: map of Brier Score for POP; map of Bias for Minimum Temperature; overall statistic plus statistics on 4 regions]

26 Verification product examples Detailed graphics for deterministic forecasts: scatter plots for given lead time [IVP scatter plots for given lead time, comparing 2 sets of forecasts in blue and red; axes are forecast value vs. observed value; user-defined thresholds split the plot into the TP/FP/FN/TN quadrants, i.e., the data ‘behind’ the contingency table]

27 Verification product examples Detailed graphics for water supply forecasts: scatter plots for given issuance dates WR water supply website scatter plots (years identified) for different issuance dates

28 Verification product examples Detailed graphics for probabilistic forecasts: box-whisker plots for given lead time (example from Allen Bradley, with 2 user-defined thresholds) [Each box-whisker summarizes the forecast distribution: lowest, 20th percentile, 50th percentile, 80th percentile, highest]

29 Verification product examples Detailed graphics for probabilistic forecasts: box-whisker plots vs. time for given lead time (example from Allen Bradley) [Each box-whisker summarizes the forecast distribution: lowest, 20th percentile, 50th percentile, 80th percentile, highest; an x marks the observation]

30 Verification product examples Detailed graphics for probabilistic forecasts: box-whisker plots of forecast error vs. observations for given lead time [EVS box-whisker plot for given lead time: forecast errors (F-O) plotted against the observation; each box-whisker shows the ‘errors’ for 1 forecast as lowest, 10th, 20th, 50th (median), 80th, 90th percentiles and highest, with a zero-error line]

31 Verification product examples Detailed graphics for probabilistic forecasts: box-whisker plots with identified dates for events of interest [EVS box-whisker plot of forecast errors (F-O) vs. observed value, with a zero-error line and events of interest identified, e.g., Frances (09/09/04) and Ivan (09/18/04)]

32 Verification product examples Detailed graphics for deterministic forecasts: time series plots for given time period [IVP time series plot of forecast and observed values vs. time for given sets of forecasts, over a user-selected time window]

33 Verification product examples Detailed graphics for water supply forecasts: historical plots for a given range of years WR water supply website historical plot

34 Verification product examples Detailed graphics for probabilistic forecasts: time series plots w/ box-whiskers for given time period [Box-whisker plot of forecast/observed values vs. time for a given ensemble time series (from ER RFCs); other ensemble time series could be overlaid in different colors]

35 Verification product examples Detailed graphics for probabilistic forecasts: transform probabilistic forecasts into event forecasts and plot, for 1 given event, the probability forecast and the observations [Examples from Allen Bradley; event: flow volume < 6000 cfs-days, i.e., f = P{Y < 6000}; vertical lines show when the event occurred]
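A one-line sketch of the transformation shown here: the ensemble is reduced to a probability forecast for one event by counting the fraction of members meeting the event definition (illustrative only; names are assumptions).

```python
# Minimal sketch: ensemble forecast -> probability forecast for one event,
# e.g., event "flow volume < threshold", so f = P{Y < threshold}.
import numpy as np

def event_probability(members, threshold):
    m = np.asarray(members, float)
    return np.mean(m < threshold)   # fraction of ensemble members satisfying the event
```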

36 Verification product examples Verification statistical graphics: metric values vs. lead times for different sets of forecasts [IVP plot of RMSE and sample size vs. forecast lead time for 4 sets of forecasts]

37 Verification product examples Verification statistical graphics: metric values vs. forecast issuance dates for different sets of forecasts [WR water supply website plot of RMSE Skill Score vs. issuance month for 5 sets of forecasts]

38 Verification product examples Verification statistical graphics: metric values vs. lead time for given time period and given forecast point(s) [EVS example with Mean CRPS vs. forecast lead time; metric computed with different conditioning: all data, subset for which obs >= x, subset for which obs >= xx, subset for which obs >= xxx]

39 Verification product examples Verification statistical graphics: metric values vs. probability threshold for given time period and given forecast point(s) [Examples from Allen Bradley: skill scores plotted against probability threshold]

40 Verification product examples Verification statistical graphics: metric values vs. lead times w/ information on Confidence Intervals (under development) [Example: Mean CRPS plot with Confidence Intervals (CI)]

41 Verification product examples Sophisticated verification graphics: Reliability diagram for given lead time and given sets of events [EVS Reliability diagram for given lead time: observed frequency vs. forecast probability, with sample counts; metric computed for different events: event >= x, event >= xx, event >= xxx]
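A minimal sketch (not the EVS implementation) of the quantities behind a reliability diagram: observed relative frequency and sample size within bins of forecast probability. Function and variable names are illustrative.

```python
# Minimal sketch: points for a reliability diagram (observed frequency vs. mean
# forecast probability per probability bin), plus the per-bin sample sizes.
import numpy as np

def reliability_points(prob_fcsts, event_obs, n_bins=10):
    p = np.asarray(prob_fcsts, float)        # forecast probabilities for one event
    o = np.asarray(event_obs, float)         # 1 if the event occurred, else 0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    mean_prob, obs_freq, counts = [], [], []
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            mean_prob.append(p[mask].mean())   # mean forecast probability in the bin
            obs_freq.append(o[mask].mean())    # observed relative frequency in the bin
            counts.append(int(mask.sum()))     # sample size per bin
    return mean_prob, obs_freq, counts
```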

42 Verification product examples Sophisticated verification graphics: Reliability diagram for 1 given event and for given lead times [Reliability diagram for several lead days for 1 specific event; event: > 85th percentile of the observed distribution]

43 Verification product examples Sophisticated verification graphics: ROC diagram for 1 specific event [ROC plot of Probability of Detection (POD) vs. Probability of False Detection (POFD) for 1 specific event and several lead days, comparing deterministic and ensemble forecasts; event: > 85th percentile of the observed distribution]
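A minimal sketch of ROC points and ROC area for one event, obtained by sweeping probability thresholds over ensemble-derived event probabilities; the trapezoidal area estimate and the function name are illustrative assumptions.

```python
# Minimal sketch: ROC points (POFD, POD) and a trapezoidal ROC area estimate for
# one event, by sweeping decision thresholds over forecast probabilities. Assumes
# the sample contains both events and non-events.
import numpy as np

def roc_curve(prob_fcsts, event_obs, thresholds=np.linspace(0.0, 1.0, 11)):
    p, o = np.asarray(prob_fcsts, float), np.asarray(event_obs, bool)
    pod, pofd = [], []
    for t in thresholds:
        warn = p >= t                                   # issue a "yes" when prob >= threshold
        tp, fn = np.sum(warn & o), np.sum(~warn & o)
        fp, tn = np.sum(warn & ~o), np.sum(~warn & ~o)
        pod.append(tp / (tp + fn))
        pofd.append(fp / (fp + tn))
    area = np.trapz(pod[::-1], pofd[::-1])              # ROC area (trapezoidal estimate)
    return pofd, pod, area
```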

44 Verification product examples Sophisticated verification graphics: 2-D plot to show metric values relative to different lead times and time periods [Examples: CRPS for different lead times and different months; Bias for different lead days relative to time in the year]

45 Verification product examples Sophisticated verification graphics: 2-D plot to show metric values relative to different lead times and different thresholds [Example: Brier Score for different lead times and different thresholds (events); event: > Xth percentile]

46 Verification product examples Sophisticated verification graphics: 2-D plot to compare Relative Value (or Economic Value) for different thresholds and different sets of forecasts [Example from Environment Canada: comparison of Relative Value V(F1) – V(F2), plotted against Cost/Loss ratio and threshold, shown where V(F1) or V(F2) > V(Clim.); regions indicate V(F1)>V(F2) and V(F1)>V(Clim), V(F2)>V(F1) and V(F2)>V(Clim), or V(Clim)>V(F1) and V(Clim)>V(F2); event: > Xth percentile]

47 Verification product examples Other verification graphics?