Verifying and interpreting ensemble products


1 Verifying and interpreting ensemble products

2 What do we mean by “calibration” or “post-processing”?
[Figure: forecast PDF vs. observation before and after calibration; axes: Temperature [K] vs. Probability, with the “bias” and the “spread” (dispersion) of the raw forecast PDF marked]
Post-processing has corrected the “on average” bias as well as the under-representation of the 2nd moment of the empirical forecast PDF (i.e., it has corrected its “dispersion” or “spread”).
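To make the two corrections concrete, here is a minimal sketch assuming a simple shift-and-stretch scheme; this is an illustration, not the calibration actually used in these slides, and all function and variable names are invented for the example.

```python
import numpy as np

def shift_and_stretch(raw_ens, train_ens, train_obs):
    """Debias and inflate an ensemble (illustrative sketch only).

    raw_ens   : (n_members,) ensemble to calibrate
    train_ens : (n_cases, n_members) past ensembles
    train_obs : (n_cases,) matching observations
    """
    ens_mean = train_ens.mean(axis=1)

    # 1) "on average" bias of the ensemble mean over the training period
    bias = np.mean(ens_mean - train_obs)

    # 2) dispersion: inflate so the ensemble spread matches the spread
    #    of the (debiased) ensemble-mean error over the training period
    error_std = np.std(train_obs - (ens_mean - bias))
    spread_std = train_ens.std(axis=1).mean()
    inflation = error_std / spread_std

    center = raw_ens.mean()
    return center - bias + inflation * (raw_ens - center)
```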

3 Specific Benefits of Post-Processing
Improvements in:
- statistical accuracy, bias, and reliability: correcting basic forecast statistics (increasing user “trust”)
- discrimination and sharpness: increasing “information content”; in many cases, gains equivalent to years of NWP model development!
Relatively inexpensive!
Statistical post-processing can improve not only the statistical (unconditional) accuracy of forecasts (as measured by reliability diagrams, rank histograms, etc.), but also aspects of “conditional” forecast behavior (as measured by, say, RPS or skill-spread relations). Last point: inexpensive statistically derived skill improvements can equate to significant (and expensive) NWP developments.

4 (cont) Benefits of Post-Processing
Essential for tailoring to local applications: NWP provides spatially and temporally averaged gridded forecast output, so applying gridded forecasts to point locations requires location-specific calibration to account for spatial and temporal variability (=> increasing ensemble dispersion).

5 Raw versus Calibrated PDFs
Blue is the “raw” ensemble, black is the calibrated ensemble, and red is the observed value. Notice the significant change in both the “bias” and the dispersion of the final PDF (also notice the PDF asymmetries).

6 Verifying ensemble (probabilistic) forecasts
Overview:
- Rank histogram
- Mean square error (MSE)
- Brier score
- Ranked Probability Score (RPS)
- Reliability diagram
- Relative Operating Characteristic (ROC) curve
- Skill scores

7 Example: January T (left: Before Calibration, right: After Calibration)
Black curve shows the observations; colors are the ensemble members (quartiles?). Note the under-bias of the forecasts in the plot on the left. In the plot on the right, the observations are now generally embedded within the ensemble bundle. Comparing the left/right ovals, notice the significant variation in the PDFs, both in their non-Gaussian structure and in the change in ensemble spread.

8 Rank Histograms – measuring the reliability of an ensemble forecast
You cannot verify an ensemble forecast with a single observation. The more data you have for verification, the more certain you are (as is true in general for other statistical measures). Rare (low-probability) events require more data to verify, as do systems with many ensemble members. From Barb Brown
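As a concrete illustration, here is a minimal sketch of how a rank histogram can be tallied; variable names are illustrative and ties between members and the observation are ignored.

```python
import numpy as np

def rank_histogram(ens, obs):
    """Rank histogram counts.

    ens : (n_cases, n_members) ensemble forecasts
    obs : (n_cases,) verifying observations
    Returns counts for the n_members + 1 possible ranks.
    """
    n_cases, n_members = ens.shape
    # Rank of each observation within its sorted ensemble (0 .. n_members)
    ranks = np.sum(ens < obs[:, None], axis=1)
    counts = np.bincount(ranks, minlength=n_members + 1)
    return counts  # flat ("uniform") counts indicate a reliable ensemble
```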

9 [Figure slide. From Tom Hamill]

10 Troubled Rank Histograms
[Figures: rank histograms (Counts vs. Ensemble #) illustrating problematic shapes] Slide from Matt Pocernic

11 After quantile regression, rank histograms uniform
Example: July T, Before Calibration vs. After Calibration. Before calibration, the rank histogram is U-shaped (underdispersive), with a larger under-bias than over-bias. After calibration, the observation falls within the ensemble bundle (the rank histogram is still not perfectly “uniform” due to the small sample size). NOTE: for our current application, improving the information content of the ensemble spread (i.e., as a representation of potential forecast error) is the most significant gain from our calibration. A sketch of quantile-regression calibration follows below.
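Below is a minimal sketch of quantile-regression calibration using statsmodels' QuantReg. The choice of predictors (ensemble mean and spread) and of quantile levels is an assumption made for illustration, not the configuration actually used by the authors.

```python
import numpy as np
import statsmodels.api as sm

def quantile_calibrate(train_ens, train_obs, new_ens,
                       quantiles=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Fit one quantile regression per quantile level (illustrative only).

    train_ens : (n_cases, n_members) past ensembles
    train_obs : (n_cases,) matching observations
    new_ens   : (n_new, n_members) ensembles to calibrate
    """
    def design(ens):
        # Predictors: ensemble mean and spread (an illustrative choice)
        X = np.column_stack([ens.mean(axis=1), ens.std(axis=1)])
        return sm.add_constant(X)

    X_train, X_new = design(train_ens), design(new_ens)
    calibrated = []
    for q in quantiles:
        fit = sm.QuantReg(train_obs, X_train).fit(q=q)
        calibrated.append(fit.predict(X_new))
    # rows: forecast cases, columns: calibrated quantiles
    return np.column_stack(calibrated)
```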

12 Continuous scores: MSE
Attribute: measures accuracy. The MSE is the average of the squared errors, MSE = (1/n) Σ (f_i − o_i)²:
- it measures the magnitude of the error, weighted by the squares of the errors
- it does not indicate the direction of the error
- quadratic rule, therefore large weight on large errors: good if you wish to penalize large errors, but sensitive to large values (e.g. precipitation) and outliers, sensitive to large variance (high-resolution models), and it encourages conservative forecasts (e.g. climatology)
=> For an ensemble forecast, use the ensemble mean (a minimal sketch follows below). Slide from Barbara Casati
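A minimal sketch of the MSE of the ensemble-mean forecast; names are illustrative.

```python
import numpy as np

def ensemble_mean_mse(ens, obs):
    """MSE of the ensemble-mean forecast.

    ens : (n_cases, n_members), obs : (n_cases,)
    """
    f = ens.mean(axis=1)            # ensemble mean as the deterministic forecast
    return np.mean((f - obs) ** 2)  # quadratic rule: large errors dominate
```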

13 Brier Score
Does the forecast correctly detect temperatures above 18 degrees? For such a binary event, the Brier score is BS = (1/n) Σ_{i=1..n} (y_i − o_i)², where y_i is the forecast probability of the event, o_i is the observed occurrence (0 or 1), and i indexes the n samples. => Note the similarity to the MSE. Slide from Barbara Casati
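A minimal sketch of the Brier score for the slide's threshold event, with the forecast probability taken as the fraction of ensemble members exceeding the threshold (an assumption for illustration).

```python
import numpy as np

def brier_score(ens, obs, threshold=18.0):
    """Brier score for the event "temperature above `threshold`".

    ens : (n_cases, n_members), obs : (n_cases,)
    """
    y = np.mean(ens > threshold, axis=1)  # forecast probability from member fractions
    o = (obs > threshold).astype(float)   # observed occurrence (0 or 1)
    return np.mean((y - o) ** 2)          # same quadratic form as the MSE
```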

14 Ranked Probability Score
For multi-category or continuous variables: the RPS extends the Brier score by accumulating squared differences between the cumulative forecast and cumulative observed probability distributions over the categories (a minimal sketch follows below).
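A minimal sketch of the RPS for a single multi-category forecast, built from cumulative probabilities; names are illustrative, and some definitions also divide by the number of categories minus one.

```python
import numpy as np

def rps(prob_forecast, obs_category):
    """Ranked Probability Score for one multi-category forecast.

    prob_forecast : (n_categories,) forecast probabilities (summing to 1)
    obs_category  : int, index of the observed category
    """
    n_cat = prob_forecast.shape[0]
    cdf_fcst = np.cumsum(prob_forecast)   # cumulative forecast probabilities
    cdf_obs = np.zeros(n_cat)
    cdf_obs[obs_category:] = 1.0          # cumulative "observed" step distribution
    # (some definitions divide the sum by n_cat - 1)
    return np.sum((cdf_fcst - cdf_obs) ** 2)
```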

15 Conditional Distributions
Conditional histogram and conditional box-plot Slide from Barbara Casati

16 Reliability (or Attribute) Diagram
Slide from Matt Pocernic
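The slide itself is a figure; as a reminder of what a reliability diagram plots, here is a minimal sketch that bins forecast probabilities and computes the observed relative frequency in each bin (names are illustrative).

```python
import numpy as np

def reliability_points(y_prob, o, n_bins=10):
    """Points for a reliability diagram.

    y_prob : (n_cases,) forecast probabilities
    o      : (n_cases,) observed occurrences (0 or 1)
    Returns (mean forecast probability, observed frequency, count) per bin.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((y_prob[mask].mean(), o[mask].mean(), mask.sum()))
    return rows  # perfect reliability lies on the diagonal y = x
```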

17 Scatter-plot and Contingency Table
Does the forecast correctly detect temperatures above 18 degrees? Does the forecast correctly detect temperatures below 10 degrees? (A minimal contingency-table sketch follows below.) Slide from Barbara Casati
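A minimal sketch of the 2x2 contingency table for a threshold event, including the hit rate and false alarm rate that feed the ROC curve on slide 19; names and the 18-degree threshold follow the slide's example.

```python
import numpy as np

def contingency_table(fcst, obs, threshold=18.0):
    """2x2 contingency table for the event "value above `threshold`"."""
    f = fcst > threshold
    o = obs > threshold
    hits          = np.sum(f & o)
    false_alarms  = np.sum(f & ~o)
    misses        = np.sum(~f & o)
    correct_negs  = np.sum(~f & ~o)
    hit_rate         = hits / (hits + misses)                        # POD
    false_alarm_rate = false_alarms / (false_alarms + correct_negs)  # POFD
    return hits, false_alarms, misses, correct_negs, hit_rate, false_alarm_rate
```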

18 Discrimination Plot
[Figure: forecast distributions conditioned on Outcome = Yes and Outcome = No, with a decision threshold separating hits from false alarms] Slide from Matt Pocernic

19 Receiver Operating Characteristic (ROC) Curve
Slide from Matt Pocernic
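A minimal sketch of how ROC points can be computed by sweeping a probability threshold over ensemble-derived event probabilities; names are illustrative.

```python
import numpy as np

def roc_points(y_prob, o, thresholds=np.linspace(0.0, 1.0, 11)):
    """(false alarm rate, hit rate) pairs as the probability threshold varies.

    y_prob : (n_cases,) forecast probabilities of the event
    o      : (n_cases,) observed occurrences (0 or 1)
    """
    o = o.astype(bool)
    points = []
    for t in thresholds:
        warn = y_prob >= t                               # issue a "yes" forecast at this threshold
        hit_rate = np.sum(warn & o) / max(np.sum(o), 1)
        far = np.sum(warn & ~o) / max(np.sum(~o), 1)
        points.append((far, hit_rate))
    return points  # area under the curve above 0.5 indicates skill over chance
```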

20 Skill Scores
A single value to summarize performance.
- Reference forecast: the best naive guess, e.g. persistence or climatology
- A perfect forecast implies that the object can be perfectly observed
- Positively oriented: positive is good
Generic form: SS = (score_forecast − score_reference) / (score_perfect − score_reference), so SS = 1 for a perfect forecast and SS = 0 for the reference (a minimal sketch follows below).
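A minimal sketch of the generic skill score form, here using MSE against a climatological reference; the choice of MSE and of climatology as the reference is illustrative.

```python
import numpy as np

def skill_score(fcst, obs, ref=None):
    """Generic skill score: 1 = perfect, 0 = no better than the reference, < 0 = worse.

    fcst, obs : (n_cases,) forecasts and observations
    ref       : (n_cases,) reference forecast; defaults to climatology (mean of obs)
    """
    if ref is None:
        ref = np.full(obs.shape, obs.mean())  # climatological reference
    mse_fcst = np.mean((fcst - obs) ** 2)
    mse_ref = np.mean((ref - obs) ** 2)
    # (score - reference) / (perfect - reference), with perfect MSE = 0
    return 1.0 - mse_fcst / mse_ref
```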

21 References:
- Jolliffe and Stephenson (2003): Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley & Sons, 240 pp.
- Wilks (2005): Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.
- Stanski, Burrows, and Wilson (1989): Survey of Common Verification Methods in Meteorology. WMO World Weather Watch Technical Report.

