1 Verification Continued… Holly C. Hartmann Department of Hydrology and Water Resources University of Arizona RFC Verification Workshop, 08/14/2007
2 Agenda
1. Introduction to Verification
   - Applications, Rationale, Basic Concepts
   - Data Visualization and Exploration
   - Deterministic Scalar Measures
2. Categorical Measures – KEVIN WERNER
   - Deterministic Forecasts
   - Ensemble Forecasts
3. Diagnostic Verification
   - Reliability
   - Discrimination
   - Conditioning/Structuring Analyses
4. Lab Session/Group Exercise
   - Developing Verification Strategies
   - Connecting to Forecast Operations and Users
3 Probabilistic Ensemble Forecasts From: California-Nevada River Forecast Center
5 Probabilistic Ensemble Forecasts From: A. Hamlet, University of Washington
8 Talagrand Diagram – Also Called Ranked Histogram
- Identifies systematic flaws of an ensemble prediction system.
- Shows how effectively the ensemble distribution samples the observations.
- Does not indicate that the ensemble will be of practical use.
9 Principle Behind the Talagrand Diagram (Ranked Histogram)
- With only one ensemble member ( | ), all (100%) of the observations should fall "outside" it.
- With two ensemble members, two out of three observations (2/3 = 67%) should fall outside.
- With three ensemble members, two out of four observations (2/4 = 50%) should fall outside.
- In general, 2/(number of members + 1) of the observations should fall outside the ensemble.
Adapted from A. Persson, 2006
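A quick check of the principle above (notation mine, not from the slide): with N ranked members there are N + 1 bins, and if the observation is statistically indistinguishable from the members it is equally likely to land in any bin, so the chance of landing in one of the two outermost bins is

```latex
P(\text{obs outside ensemble}) = \frac{2}{N+1},
\qquad
N=1:\ \tfrac{2}{2}=100\%,\quad
N=2:\ \tfrac{2}{3}\approx 67\%,\quad
N=3:\ \tfrac{2}{4}=50\%
```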
10 Talagrand Diagram Computation Example
Four sample ensemble members (E1-E4) for daily flow forecasts (produced from reforecasts using carryover each year); table columns: YEAR, E1, E2, E3, E4, OBS.
Step 1: Rank the members lowest to highest for each year; four members give 5 bins.
Step 2: Determine which bin the corresponding observation falls into.
Step 3: Tally how many observations fall in each bin.
Step 4: Plot the frequency of observations for each ranked bin (Bin 1 through Bin 5).
12 Talagrand Diagram Computation Example (continued)
Four sample ensemble members (E1-E4) ranked lowest to highest for daily flow (produced from reforecasts using carryover each year); the tally for each bin (Bin 1 through Bin 5) is plotted as a frequency histogram.
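A minimal Python sketch of Steps 1-4, assuming the per-year ensembles and observations are available as arrays; the numbers below are illustrative placeholders, not the slide's table.

```python
import numpy as np

# Illustrative data only: 4 ensemble members (E1-E4) and the observation
# for each reforecast year (the slide's actual table is not reproduced here).
ensembles = np.array([
    [120, 180, 220, 260],
    [ 90, 150, 200, 240],
    [200, 230, 280, 320],
])
obs = np.array([250, 95, 260])

n_members = ensembles.shape[1]
bin_counts = np.zeros(n_members + 1, dtype=int)    # 4 members -> 5 bins

for members, o in zip(ensembles, obs):
    ranked = np.sort(members)                      # Step 1: rank low to high
    bin_idx = np.searchsorted(ranked, o)           # Step 2: bin of the observation
    bin_counts[bin_idx] += 1                       # Step 3: tally

frequency = bin_counts / bin_counts.sum()          # Step 4: frequency per bin
print(frequency)
```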
13 Talagrand Diagram: 25 traces/ensemble, 375 observations
- "U-shaped": observations too often fall outside the ensemble; indicates the ensemble spread is too small.
- "L-shaped": observations too often larger (smaller) than the ensemble; indicates an under- (over-) forecasting bias.
- "N-shaped" (dome-shaped): observations too rarely fall outside the ensemble; indicates the ensemble spread is too big.
- "Flat": observations fall uniformly across the ensemble; indicates an appropriately sized ensemble distribution.
14 Talagrand Diagram Example: Interpretation?
Four sample ensemble members (E1-E4) ranked lowest to highest for daily flow (produced from reforecasts using carryover each year); table columns: YEAR, E1, E2, E3, E4, OBS; tally and frequency by bin (Bin 1 through Bin 5). How would you interpret this histogram?
15 Distributions-Oriented Forecast Evaluation Leads to Diagnostic Verification
It's all about conditional and marginal distributions: P(O|F), P(F|O), P(F), P(O)
Reliability, Discrimination, Sharpness, Uncertainty
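One way to see how these four pieces fit together (the standard factorizations of the joint forecast-observation distribution; this notation is mine and is not spelled out on the slide):

```latex
p(f,o) = p(o \mid f)\,p(f)   % calibration-refinement: reliability and sharpness
p(f,o) = p(f \mid o)\,p(o)   % likelihood-base rate: discrimination and uncertainty
```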
16 Forecast Reliability -- P(O|F)
For a specified forecast condition, what does the distribution of observations look like?
(Plots: forecasted probability vs. relative frequency of observed.)
User perspective: "When you say 20% chance of flood flows, how often do flood flows actually happen?"
User perspective: "When you say 80% chance of flood flows, how often do flood flows actually happen?"
17 Reliability (Attributes) Diagram – Reliability, Sharpness
- Good reliability: points close to the diagonal.
- Sharpness diagram (p(f)): histogram of forecasts in each probability bin; shows the marginal distribution of forecasts.
The reliability diagram is conditioned on the forecasts. That is, given that X was predicted, what was the outcome?
18 Reliability Diagram Example Computation
Table columns: YEAR, E1, E2, E3, E4, OBS.
Step 1: Choose a threshold value to base the probability forecasts on. For simplicity we'll choose the mean forecast over all years and all ensemble members (= 208).
19 Reliability Diagram Example Computation
Step 2: Choose how many forecast probability categories to use (5 here: 0, 0.25, 0.5, 0.75, 1).
Step 3: For each forecast, calculate the forecast probability of being below the threshold value: P(forecast peak < 208).
21 Reliability Diagram Example Computation
Step 4: Group the observations into groups of equal forecast probability (or, more generally, into forecast probability categories).
P(forecast peak < 208) = 0.0: (one observation)
P(forecast peak < 208) = 0.25: ..., 98, 233
P(forecast peak < 208) = 0.5: ..., 301, 245, 248, 227
P(forecast peak < 208) = 0.75: N/A
P(forecast peak < 208) = 1.0: ..., 156, 167
23 Reliability Diagram Example Computation
Step 5: For each group, calculate the relative frequency of observations below the threshold value, 208 cfs.
P(obs peak < 208 given [P(forecast peak < 208) = 0.0]) = 0/1 = 0.0
P(obs peak < 208 given [P(forecast peak < 208) = 0.25]) = 1/3 = 0.33
P(obs peak < 208 given [P(forecast peak < 208) = 0.5]) = 1/5 = 0.2
P(obs peak < 208 given [P(forecast peak < 208) = 0.75]) = 0/0 = N/A
P(obs peak < 208 given [P(forecast peak < 208) = 1.0]) = 3/3 = 1.0
25 Reliability Diagram Example Computation
Step 6: Plot the centroid of each forecast category (just the category points in our case) on the x-axis against the observed frequency within that category on the y-axis. Include the 45-degree diagonal for reference.
26 Reliability Diagram Example Computation
Step 7: Include a sharpness plot showing the number of observation/forecast pairs in each category.
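A minimal Python sketch of Steps 1-7, assuming the same kind of per-year ensemble/observation table; the data are illustrative placeholders, not the slide's values, and the plotting of Steps 6-7 is only described in the comments.

```python
import numpy as np

# Illustrative data only (the slide's table is not reproduced here).
ensembles = np.array([
    [120, 180, 220, 260],
    [ 90, 150, 200, 240],
    [200, 230, 280, 320],
    [100, 130, 160, 190],
])
obs = np.array([250, 95, 260, 150])

threshold = ensembles.mean()                                # Step 1: mean over all years/members
prob_categories = np.array([0.0, 0.25, 0.5, 0.75, 1.0])     # Step 2

# Step 3: forecast probability of being below the threshold for each year
fcst_prob = (ensembles < threshold).mean(axis=1)

# Steps 4-5: group observations by forecast probability category and compute
# the observed relative frequency of the event (obs below threshold) per group
obs_freq, n_pairs = [], []
for p in prob_categories:
    in_cat = np.isclose(fcst_prob, p)
    n_pairs.append(int(in_cat.sum()))
    obs_freq.append((obs[in_cat] < threshold).mean() if in_cat.any() else np.nan)

# Steps 6-7: plot forecast probability vs. observed frequency with the 1:1
# diagonal, plus a sharpness histogram of n_pairs per category.
for p, f, n in zip(prob_categories, obs_freq, n_pairs):
    print(f"P(fcst)={p:.2f}  obs freq={f}  n={n}")
```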
27 Reliability Diagram – Reliability, Sharpness – P(O|F)
- Good reliability: points close to the diagonal.
- Sharpness diagram (p(f)): histogram of forecasts in each probability bin; shows the marginal distribution of forecasts.
- Good resolution: wide range of observed relative frequencies corresponding to the forecast probabilities.
- Skill: related to the Brier Skill Score, in reference to sample climatology (not historical climatology).
The reliability diagram is conditioned on the forecasts. That is, given that X was predicted, what was the outcome?
28 Attributes Diagram – Reliability, Resolution, Skill/No-Skill
- Sample climatology: the overall relative frequency of observations.
- Points closer to the perfect-reliability line than to the no-resolution line: those subsamples of the probabilistic forecast contribute positively to overall skill (as defined by the BSS) in reference to sample climatology.
- No-skill line: halfway between the perfect-reliability line and the no-resolution line, with sample climatology as the reference.
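Stated as formulas (my notation, not spelled out on the slide): with x the forecast probability and ō the sample climatology,

```latex
\text{perfect reliability: } y = x, \qquad
\text{no resolution: } y = \bar{o}, \qquad
\text{no skill: } y = \tfrac{1}{2}\,(x + \bar{o})
```

Points falling between the no-skill line and the perfect-reliability diagonal contribute positively to the Brier skill score computed against sample climatology.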
29 Interpretation of Reliability Diagrams
(Example diagrams, from Wilks, 1995: climatology; minimal resolution; underforecasting; good resolution at the expense of reliability; reliable forecasts of a rare event; small sample size.)
30 Interpretation of Reliability Diagrams
Reliability, P(O|F): Does the frequency of occurrence match your probability statement? Identifies conditional bias.
(Axes: forecasted probability vs. relative frequency of observations; the no-resolution line is shown for reference.)
31 EVS Reliability Diagram Examples (Arkansas-Red Basin, 24-hr flows, lead times 1-14 days)
- 25th percentile observed flows (low flows): sharp forecasts, but low resolution.
- 85th percentile observed flows (high flows): good reliability at shorter lead times; long leads miss high events.
From: J. Brown, EVS Manual
32 Historical seasonal water supply outlooks Colorado River Basin Morrill, Hartmann, and Bales, 2007
33 Reliability: Colorado Basin ESP Seasonal Supply Outlooks
(Panels: Lower Colorado Jan-May (5 mo. lead), Mar-May (3 mo. lead), Apr-May (2 mo. lead); Upper Colorado Jan-July (7 mo. lead), Apr-July (4 mo. lead), June-July (2 mo. lead); forecasts issued Jan 1, Mar 1, Apr 1, Jun 1; axes: forecast probability vs. relative frequency of observations; categories: high 30%, mid 40%, low 30%.)
1) Few high-probability forecasts; good reliability between 10-70% probability; reliability improves.
2) These months show the best reliability; low resolution limits reliability.
3) Reliability decreases for later forecasts as resolution increases; the Upper Colorado is good at the extremes.
Franz, Hartmann, and Sorooshian, 2003
34 Discrimination – P(F|O)
For a specified observation category, what do the forecast distributions look like?
"When dry conditions happen, what do the forecasts usually look like?"
You sure hope that forecasts look different when there's a drought, compared to when there's a flood!
35 Discrimination – P(F|O)
You sure hope that forecasts look different when there's a drought, compared to when there's a flood!
Example: NWS CPC seasonal climate outlooks, sorted into DRY cases (lowest tercile), all forecasts, all lead times.
(Two panels, one showing good discrimination and one showing not much discrimination; axes: forecasted probability vs. relative frequency of the indicated forecast; curves: probability of dry, probability of wet, climatology.)
36 Discrimination: Lower Colorado ESP Supply Outlooks
When unusually low flows happened... P(F | low flows), low < 30th percentile.
(Jan 1 forecast for Jan-May; axes: forecast probability vs. relative frequency of forecasts; curves: high, mid, low.)
There is some discrimination: the early forecasts warned "High flows less likely."
Franz, Hartmann, and Sorooshian (2003)
37 Discrimination: Lower Colorado ESP Supply Outlooks
When unusually low flows happened... P(F | low flows), low < 30th percentile.
(Jan 1 forecast for Jan-May and Apr 1 forecast for Apr-May; axes: forecast probability vs. relative frequency of forecasts; curves: high, mid, low.)
Jan 1: there is some discrimination; the early forecasts warned "High flows less likely."
Apr 1: good discrimination; the forecasts were saying (1) high and mid flows less likely, and (2) low flows more likely.
Franz, Hartmann, and Sorooshian (2003)
38 Discrimination: Colorado Basin ESP Supply Outlooks
For observed flows in the lowest 30% of the historic distribution.
(Panels: Lower Colorado Basin, Jan 1 forecast for Jan-May (5 mo. lead) and Apr 1 forecast for April-May (2 mo. lead); Upper Colorado Basin, Jan 1 forecast for Jan-July (7 mo. lead) and Jun 1 forecast for June-July (2 mo. lead); axes: forecast probability vs. relative frequency of forecasts; categories: high 30%, mid 40%, low 30%.)
1) High flows less likely.
2) No discrimination between mid and low flows.
3) Both the Upper and Lower Colorado show good discrimination for low flows at the 2-month lead time.
Franz, Hartmann, and Sorooshian (2003)
39 Historical seasonal water supply outlooks Colorado River Basin
40 Discrimination: CDF Perspective
The all-observation CDF is plotted and color-coded by tercile. Forecast ensemble members are sorted into three groups according to which tercile their associated observation falls into, and the CDF for each group is plotted in the corresponding color (e.g., high is blue).
Credit: K. Werner
41 Discrimination
In this case, there is relatively good discrimination, since the three conditional forecast CDFs separate from each other.
Credit: K. Werner
42 Discrimination Example Computation
Table columns: YEAR, E1, E2, E3, E4, OBS.
Step 1: Order the observations and divide the ordered list into categories. Here we will use terciles (low ≤ 167, middle 206-245, high ≥ 248).
OBS tercile (by year): Low, Middle, High, Low, Middle, High, Middle, Low
Credit: K. Werner
43 Discrimination Example Computation
Step 2: Group the forecast ensemble members according to the OBS tercile.
Low-OBS forecasts: 42, 74, 82, 90, 114, 277, 351, 356, 98, 170, 204, 205, 94, 135, 156, 158
Credit: K. Werner
44 Discrimination Example Computation
Step 2 (continued): Mid-OBS forecasts: 65, 143, 223, 227, 69, 169, 229, 236, 94, 219, 267, 270, 108, 189, 227, 228
Credit: K. Werner
46 Discrimination Example Computation
Step 2 (continued): Hi-OBS forecasts: 82, 192, 295, 300, 211, 397, 514, 544, 142, 291, 349, 356, 59, 175, 244, 250
Credit: K. Werner
48 Discrimination Example Computation
Step 3: Plot the all-observation CDF, color-coded by tercile (low ≤ 167, middle 206-245, high ≥ 248).
Credit: K. Werner
49 Discrimination Example Computation
Step 4: Add the CDFs of the forecasts conditioned on the observed terciles to the plot.
Low-OBS forecasts: 42, 74, 82, 90, 114, 277, 351, 356, 98, 170, 204, 205, 94, 135, 156, 158
Mid-OBS forecasts: 65, 143, 223, 227, 69, 169, 229, 236, 94, 219, 267, 270, 108, 189, 227, 228
Hi-OBS forecasts: 82, 192, 295, 300, 211, 397, 514, 544, 142, 291, 349, 356, 59, 175, 244, 250
Credit: K. Werner
50 Discrimination Example Computation
Step 5: Discrimination is shown by the degree to which the conditional forecast CDFs are separated from each other. In this case, high forecasts discriminate better than mid and low forecasts.
Credit: K. Werner
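A minimal Python sketch of Steps 1-5, assuming the per-year ensembles and observations are available as arrays; the data below are placeholders, and the tercile cut points are computed from the sample rather than taken from the slide.

```python
import numpy as np

# Illustrative data only (not the slide's table): rows are years,
# columns are the four ensemble members E1-E4.
ensembles = np.array([
    [ 50, 120, 180, 210],
    [ 80, 140, 200, 260],
    [130, 190, 250, 310],
    [ 60, 110, 170, 230],
    [150, 220, 300, 380],
    [ 90, 160, 240, 290],
])
obs = np.array([100, 150, 260, 130, 330, 210])

# Step 1: split the ordered observations into terciles.
lo_cut, hi_cut = np.percentile(obs, [100 / 3, 200 / 3])
tercile = np.where(obs <= lo_cut, "low",
           np.where(obs <= hi_cut, "middle", "high"))

# Step 2: pool ensemble members according to the tercile of their observation.
groups = {t: ensembles[tercile == t].ravel() for t in ("low", "middle", "high")}

# Steps 3-5: empirical CDF of each pooled group; the more the three
# conditional CDFs separate, the better the discrimination.
def ecdf(values):
    x = np.sort(values)
    return x, np.arange(1, len(x) + 1) / len(x)

for name, members in groups.items():
    x, p = ecdf(members)
    print(name, x.tolist(), np.round(p, 2).tolist())
```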
51 Discrimination
How well do April-July volume forecasts discriminate when they are made in Jan, Mar, and May? Poor discrimination in Jan between forecasting high and medium flows; best discrimination in May.
Credit: K. Werner
52 Discrimination
Another way to look at discrimination is to use PDFs instead of CDFs: the more separation between the PDFs, the better the discrimination.
Credit: K. Werner
53 Comparing Deterministic & Probabilistic Forecasts
Deterministic forecasts are traditional in hydrology but sub-optimal for decision making.
A common perspective: "Deterministic model simulations and probabilistic forecasts … are two entirely different types of products. Direct comparison of probabilistic forecasts with deterministic single-valued forecasts is extremely difficult." - Anonymous
54 How can we compare deterministic and probabilistic forecasts?
(Figure contrasts a deterministic forecast with a probabilistic ensemble. Source: XEFS Design Team, 2007.)
Option: Use the ensemble median with standard metrics? No!
55 "Pretend Determinism"
The ensemble mean minimizes error, but doesn't represent the overall behavior.
From: A. Hamlet, University of Washington
56 What's wrong with using 'deterministic' metrics?
(Figure: four forecast PDFs plotted against the observed value.)
Metrics that use only the central tendency of each forecast PDF fail to distinguish between forecasts 1-3, but will identify 4 as inferior. Metrics that reward accuracy but punish spread will rank the forecast skill from 1 to 4.
From: A. Hamlet, University of Washington
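A tiny numeric illustration of the point above (made-up numbers): two ensembles with the same mean but very different spread receive identical scores when only the ensemble mean is verified, so a central-tendency metric cannot tell them apart.

```python
import numpy as np

obs = 100.0
narrow = np.array([98., 99., 100., 101., 102.])   # tight ensemble
wide   = np.array([40., 70., 100., 130., 160.])   # same mean, huge spread

for name, ens in [("narrow", narrow), ("wide", wide)]:
    mean_err = abs(ens.mean() - obs)   # error of the ensemble mean
    spread = ens.std()                 # ignored by mean-only metrics
    print(f"{name}: |mean - obs| = {mean_err:.1f}, spread = {spread:.1f}")
# Both ensembles print a mean error of 0.0, even though they make very
# different statements about uncertainty.
```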
58 Deterministic vs. Probabilistic Forecasts
Jack-knife calibration error gives a PDF of the error distribution, from which any quantiles can be determined for the deterministic forecast.
(Figure labels: PDF, climatology distribution, forecast distribution, tercile boundaries (equal probability), deterministic forecast, observation, flow Q.)
Approach used by Morrill, Hartmann, and Bales (2007).
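A hedged sketch of the kind of conversion the slide describes (attaching an error distribution to a single-valued forecast to derive quantiles); the data are placeholders, and a plain pooled error distribution stands in for the jack-knife calibration, so this is not the exact procedure of Morrill, Hartmann, and Bales (2007).

```python
import numpy as np

# Illustrative hindcast record (placeholder values): past deterministic
# forecasts and the flows that were actually observed.
past_fcst = np.array([210., 180., 250., 300., 150., 220., 270., 190.])
past_obs  = np.array([230., 170., 240., 330., 160., 200., 290., 185.])

# Empirical error distribution from the hindcasts (obs minus forecast);
# the slide's approach derives this by jack-knife calibration.
errors = past_obs - past_fcst

def forecast_quantiles(new_fcst, quantiles=(0.1, 0.33, 0.5, 0.67, 0.9)):
    """Turn a single-valued forecast into flow quantiles by adding the
    empirical error distribution to it."""
    return {q: float(new_fcst + np.quantile(errors, q)) for q in quantiles}

print(forecast_quantiles(240.0))
```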
59 Lab Session -- Group Exercise
Choose a set of forecasts. Develop strategies for verifying these forecasts from two perspectives:
- Users
- Forecasters during operations
Report back to the group. Repeat for a second set of forecasts, if time permits.