Presentation transcript:

1 Verification Introduction Holly C. Hartmann Department of Hydrology and Water Resources University of Arizona RFC Verification Workshop, 08/14/2007

2 Goals
- Understand general concepts of verification
- Think about how to apply them to your operations
- Be able to respond to and influence the NWS verification program
- Be prepared as new tools become available
- Be able to do some of your own verification
- Be able to work with researchers on verification projects
- Contribute to development of verification tools (e.g., look at various options)
- Avoid some typical mistakes

3 Agenda
1. Introduction to Verification
   - Applications, Rationale, Basic Concepts
   - Data Visualization and Exploration
   - Deterministic Scalar Measures
2. Categorical Measures (KEVIN WERNER)
   - Deterministic Forecasts
   - Ensemble Forecasts
3. Diagnostic Verification
   - Reliability
   - Discrimination
   - Conditioning/Structuring Analyses
4. Lab Session/Group Exercise
   - Developing Verification Strategies
   - Connecting to Forecast Operations and Users

4 Why Do Verification? It depends…
- Administrative: logistics, selected quantitative criteria
- Operations: inputs, model states, outputs, quick!
- Research: sources of error, targeting research
- Users: making decisions, exploiting skill, avoiding mistakes
Concerns about verification?

5 Need for Verification Measures
Verification statistics identify:
- accuracy of forecasts
- sources of skill in forecasts
- sources of uncertainty in forecasts
- conditions where and when forecasts are skillful or not skillful, and why
Verification statistics can then inform improvements in forecast skill and in decision making with alternate forecast sources (e.g., climatology, persistence, new forecast systems).
Adapted from: Regonda, Demargne, and Seo, 2006

6 Skill versus Value
Assessing the quality of a forecast system means determining both the skill and the value of its forecasts.
- A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.
- A forecast has value if it helps the user make better decisions than they would without knowledge of the forecast.
- Forecasts with poor skill can still be valuable (e.g., an extreme event forecast in the wrong place).
- Forecasts with high skill can be of little value (e.g., a blue-sky forecast over the desert).
Credit: Hagedorn (2006) and Julie Demargne

7 Stakeholder Use of HydroClimate Info & Forecasts
Common across all groups:
- Uninformed or mistaken about forecast interpretation
- Use of forecasts limited by lack of demonstrated forecast skill
- Have difficulty specifying required accuracy
Unique among stakeholders:
- Relevant forecast variables, regions (location & scale), seasons, lead times, performance characteristics
- Technical sophistication: base probabilities, distributions, math
- Role of forecasts in decision making
Common across many, but not all, stakeholders:
- Have difficulty distinguishing between "good" & "bad" products
- Have difficulty placing forecasts in historical context

8 Forecast evaluation concepts: What is a Perfect Forecast?
"All happy families are alike; each unhappy family is unhappy in its own way." -- Leo Tolstoy (1876)
"All perfect forecasts are alike; each imperfect forecast is imperfect in its own way." -- Holly Hartmann (2002)

9 Different Forecasts, Information, Evaluation
"Today's high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain."
Deterministic, categorical, and probabilistic information in a single forecast.

10 Different Forecasts, Information, Evaluation
"Today's high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain."
Deterministic: 76°. Categorical: partly cloudy; rain vs. no rain. Probabilistic: 30%.
How would you evaluate each of these?

11 Different Forecasts, Information, Evaluation
"Today's high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain."
[Figure: standard hydrograph, a deterministic forecast]

12-15 ESP Forecasts: User preferences influence verification
[Four slides of ESP forecast graphics]
From: California-Nevada River Forecast Center

16 So Many Evaluation Criteria!
Deterministic: Bias, Correlation, RMSE, Standardized RMSE, Nash-Sutcliffe, Linear Error in Probability Space
Categorical: Hit Rate, Surprise Rate, Threat Score, Gerrity Score, Success Ratio, Post-agreement, Percent Correct, Pierce Skill Score, Gilbert Skill Score, Heidke Skill Score, Critical Success Index, Percent N-class Errors, Modified Heidke Skill Score, Hannsen and Kuipers Score, Gandin and Murphy Skill Scores…
Probabilistic: Brier Score, Ranked Probability Score
Distributions-oriented Measures: Reliability, Discrimination, Sharpness

17 RFC Verification System: Metrics

Category | Deterministic forecast verification metrics | Probabilistic forecast verification metrics
1. Categorical (predefined threshold, range of values) | Probability of Detection (POD), False Alarm Ratio (FAR), Probability of False Detection (POFD), Lead Time of Detection (LTD), Critical Success Index (CSI), Pierce Skill Score (PSS), Gerrity Score (GS) | Brier Score (BS), Ranked Probability Score (RPS)
2. Error (accuracy) | Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Error (ME), Bias (%), Linear Error in Probability Space (LEPS) | Continuous RPS
3. Correlation | Pearson correlation coefficient, ranked correlation coefficient, scatter plots |
4. Distribution Properties | Mean, variance, higher moments for observations and forecasts | Wilcoxon rank sum test, variance of forecasts, variance of observations, ensemble spread, Talagrand Diagram (or Rank Histogram)

Source: Verification Group, courtesy J. Demargne
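
For two of the probabilistic metrics in this table, here is a minimal numpy sketch; it is illustrative, not the RFC system's implementation, and the unnormalized form of RPS is an assumption:

```python
import numpy as np

def brier_score(p_fcst, obs):
    """Brier Score for binary events: mean squared error of the
    forecast probability against the 0/1 observation (0 = perfect)."""
    p_fcst, obs = np.asarray(p_fcst, float), np.asarray(obs, float)
    return np.mean((p_fcst - obs) ** 2)

def ranked_probability_score(p_cat, obs_cat):
    """RPS for multi-category forecasts: mean squared difference between
    cumulative forecast and cumulative observed probabilities."""
    p_cat = np.asarray(p_cat, float)              # (n_forecasts, n_categories)
    obs = np.zeros_like(p_cat)
    obs[np.arange(len(obs_cat)), obs_cat] = 1.0   # one-hot observed category
    cum_f, cum_o = np.cumsum(p_cat, axis=1), np.cumsum(obs, axis=1)
    return np.mean(np.sum((cum_f - cum_o) ** 2, axis=1))

# Example: three forecasts of P(precipitation) and the observed outcomes
print(brier_score([0.3, 0.7, 0.1], [0, 1, 0]))    # 0.0633...
```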

18 RFC Verification System: Metrics (continued)

Category | Deterministic forecast verification metrics | Probabilistic forecast verification metrics
5. Skill Scores (relative accuracy over reference forecast) | Root Mean Squared Error Skill Score (SS-RMSE) (with reference to persistence, climatology, lagged persistence), Wilson Score (WS), Linear Error in Probability Space Skill Score (SS-LEPS) | Ranked Probability Skill Score, Brier Skill Score (with reference to persistence, climatology, lagged persistence)
6. Conditional Statistics (based on occurrence of specific events) | Relative Operating Characteristic (ROC), reliability measures, discrimination diagram, other discrimination measures | ROC and ROC Area, other resolution measures, reliability diagram, discrimination diagram, other discrimination measures
7. Confidence (metric uncertainty) | Sample size, Confidence Interval (CI) | Ensemble size, sample size, Confidence Interval (CI)

Source: Verification Group, courtesy J. Demargne

19 Possible Performance Criteria
- Accuracy: overall correspondence between forecasts and observations
- Bias: difference between average forecast and average observation
- Consistency: forecasts don't waffle around
[Figure: example of good consistency]

20 Possible Performance Criteria (continued)
- Accuracy: overall correspondence between forecasts and observations
- Bias: difference between average forecast and average observation
- Consistency: forecasts don't waffle around
- Sharpness/Refinement: ability to make bullish forecast statements
[Figure: sharp vs. not-sharp forecast distributions]

21 What makes a forecast "good"?
- Accuracy: forecasts should agree with observations, with few large errors
- Bias: forecast mean should agree with observed mean
- Association: linear relationship between forecasts and observations
- Skill: forecast should be more accurate than low-skilled reference forecasts (e.g., random chance, persistence, or climatology)
Adapted from: Ebert (2003)

22 What makes a forecast "good"? (continued)
- Reliability: binned forecast values should agree with binned observations (agreement between categories)
- Resolution: forecast can discriminate between events & non-events
- Sharpness: forecast can predict with strong probabilities (i.e., 100% for event, 0% for non-event)
- Spread (Variability): forecast represents the associated uncertainty
Adapted from: Ebert (2003)
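
To make reliability and sharpness concrete, here is a minimal sketch (illustrative, not the IVP implementation): bin the forecast probabilities, then compare each bin's mean forecast probability with the observed relative frequency; the bin counts indicate sharpness.

```python
import numpy as np

def reliability_table(p_fcst, obs, n_bins=10):
    """Bin forecast probabilities; for each occupied bin, return the mean
    forecast probability, the observed relative frequency, and the count
    (the counts across bins show sharpness)."""
    p_fcst, obs = np.asarray(p_fcst, float), np.asarray(obs, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_fcst, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p_fcst[mask].mean(), obs[mask].mean(), mask.sum()))
    return rows  # reliable forecasts: mean forecast ~ observed frequency
```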

23 Forecasting Tradeoffs
- False Alarms: warning without an event ("False Alarm Ratio")
- Surprises: event without a warning ("Probability of Detection")
A forecaster's fundamental challenge is balancing these two. Which is more important? It depends on the specific decision context… Forecast performance is multi-faceted.
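
To make the two rates concrete, a minimal sketch (illustrative names, not NWS code) computing them from counts of hits, misses, and false alarms:

```python
def contingency_rates(hits, misses, false_alarms):
    """POD = fraction of observed events that were warned;
    FAR = fraction of warnings issued for events that never occurred."""
    pod = hits / (hits + misses)               # "surprises" pull POD down
    far = false_alarms / (hits + false_alarms) # false alarms push FAR up
    return pod, far

# Example: 40 hits, 10 missed events, 20 false alarms
pod, far = contingency_rates(40, 10, 20)
print(f"POD = {pod:.2f}, FAR = {far:.2f}")     # POD = 0.80, FAR = 0.33
```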

24 How Good? Compared to What?
Skill Score = (S_Forecast - S_Baseline) / (S_Perfect - S_Baseline)
For scores where S_Perfect = 0 (e.g., error measures), this is equivalent to Skill Score = 1 - S_Forecast / S_Baseline.
Example: Skill Score = (0.50 - 0.54) / (S_Perfect - 0.54) = -8.6% ~worse than guessing~
What is the appropriate baseline?
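
The slide's formula as a one-line sketch; treating the example's perfect score as 1.0 is an assumption (it gives about -8.7%, consistent with the slide's -8.6% from unrounded scores):

```python
def skill_score(s_forecast, s_baseline, s_perfect):
    """Generic skill score: improvement over the baseline, scaled by the
    maximum possible improvement. Negative means worse than the baseline."""
    return (s_forecast - s_baseline) / (s_perfect - s_baseline)

# Slide example, assuming a positively oriented score with perfect value 1.0
print(skill_score(0.50, 0.54, 1.0))  # about -0.087, worse than guessing
```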

25 Graphical Forecast Evaluation

26 Historical seasonal water supply outlooks, Colorado River Basin: Basic Data Display (Morrill, Hartmann, and Bales, 2007)

27 Historical seasonal water supply outlooks, Colorado River Basin: Scatter Plots (Morrill, Hartmann, and Bales, 2007)

28 Historical seasonal water supply outlooks, Colorado River Basin: Histograms (Morrill, Hartmann, and Bales, 2007)

29 IVP Scatterplot Example Source: H. Herr

30 Cumulative Distribution Function (CDF): IVP
Empirical distribution of forecast probabilities for different observation categories.
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001")
Goal: widely separated CDFs
Source: H. Herr, IVP Charting Examples, 2007
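
A minimal sketch (illustrative, not the IVP code) of constructing these conditional CDFs: split the forecast probabilities by the observed category and form the empirical CDF of each subset.

```python
import numpy as np

def conditional_cdfs(p_fcst, obs_cat):
    """Empirical CDFs of forecast probability, conditioned on the observed
    category (e.g., 0 = no precip observed, 1 = precip observed).
    Widely separated CDFs indicate good discrimination."""
    p_fcst, obs_cat = np.asarray(p_fcst, float), np.asarray(obs_cat)
    cdfs = {}
    for cat in np.unique(obs_cat):
        vals = np.sort(p_fcst[obs_cat == cat])
        probs = np.arange(1, len(vals) + 1) / len(vals)
        cdfs[cat] = (vals, probs)   # plot vals (x) against probs (y)
    return cdfs
```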

31 Probability Density Function (PDF): IVP
Empirical distribution using 10 bins in the IVP GUI.
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001")
Goal: widely separated PDFs
Source: H. Herr, IVP Charting Examples, 2007

32 "Box-plots": Quantiles and Extremes
Based on summarizing the CDF computation and plot.
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001")
Goal: widely separated box-plots
Source: H. Herr, IVP Charting Examples, 2007

33 Scalar Forecast Evaluation

34 Standard Scalar Measures
- Bias: mean forecast = mean observed
- Correlation Coefficient: variance shared between forecast and observed (r^2); says nothing about bias or whether forecast variance = observed variance. The Pearson correlation coefficient assumes a normal distribution and can be + or - (Rank r: only +, non-normal ok).
- Root Mean Squared Error: distance between forecast and observation values; better than correlation, but poor when error is heteroscedastic, and emphasizes performance for high flows. Alternative: Mean Absolute Error (MAE).
[Figure: forecast vs. observed scatterplot]
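
The measures above in a minimal numpy sketch (illustrative; inputs are paired forecast and observation series):

```python
import numpy as np

def scalar_measures(fcst, obs):
    """Bias, Pearson correlation, RMSE, and MAE for paired
    forecast/observation series."""
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    bias = fcst.mean() - obs.mean()             # mean forecast vs mean observed
    corr = np.corrcoef(fcst, obs)[0, 1]         # shared variance is corr**2
    rmse = np.sqrt(np.mean((fcst - obs) ** 2))  # penalizes large errors
    mae = np.mean(np.abs(fcst - obs))           # robust alternative to RMSE
    return bias, corr, rmse, mae
```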

35 Standard Scalar Measures (with Scatterplot)
April 1 forecasts for Apr-Sept streamflow, Stehekin R at Stehekin, WA: Bias = 22, Corr = 0.92, RMSE = 74.4
January 1 forecasts for Jan-May streamflow, Verde R blw Tangle Crk, AZ: Bias = , Corr = 0.58, RMSE =
[Scatterplot axes: Forecast vs. Observed, 1000's ac-ft]

36 IVP: Deterministic Scalar Measures
- ME: smallest; + and - errors cancel
- MAE vs. RMSE: RMSE is influenced by large errors on large events
- MAXERR: largest
- Sample size: small samples have large uncertainty
Source: H. Herr, IVP Charting Examples, 2007

37 IVP: RMSE Skill Scores
Skill compared to a persistence forecast.
Source: H. Herr, IVP Charting Examples, 2007