1 Verification of probabilistic forecasts: comparing proper scoring rules
Thordis L. Thorarinsdottir and Nina Schuhen, 11.04.2018

2 Introduction
Proper scoring rules:
measure the accuracy of a forecast
assign a numerical penalty
Often used to rank different models or forecasters
For both deterministic and probabilistic verification
Propriety: the expected score is optimized for the true distribution (a small numerical check follows below)
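
A minimal numerical illustration of propriety (not from the slides): for a binary event with true probability p, the expected ignorance score is minimized when the issued probability q equals p, so hedging away from the truth cannot improve the expected score. The value p = 0.3 below is an arbitrary choice for illustration.

```python
import numpy as np

# For a binary event with true probability p, the expected ignorance score
#   E[IGN(q)] = -(p*log(q) + (1-p)*log(1-q))
# is minimized at q = p, i.e. the score rewards honest forecasts.
p = 0.3                                  # true event probability (hypothetical)
q = np.linspace(0.01, 0.99, 99)          # candidate forecast probabilities
expected_ign = -(p * np.log(q) + (1 - p) * np.log(1 - q))
print("optimal q:", q[np.argmin(expected_ign)])   # ~0.30, i.e. the true p
```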

3 Real-life forecast scenario
Which proper scoring rule should I use?
What if they give conflicting results?
How should I report results?
Is my data set sufficient?
Which score should I use for model parameter optimization?
In short: how to use proper scores in practice!

4 Proper scoring rules
Squared error: $\mathrm{SE}(F, y) = (y - \mu_F)^2$, where $\mu_F$ is the mean of the forecast distribution $F$
Absolute error: $\mathrm{AE}(F, y) = |y - \mathrm{med}_F|$, where $\mathrm{med}_F$ is the median of $F$
Ignorance score: $\mathrm{IGN}(F, y) = -\log f(y)$, where $f$ is the predictive density
Continuous ranked probability score: $\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big( F(x) - \mathbf{1}\{y \le x\} \big)^2 \, dx$
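
As a computational sketch of these definitions (not part of the presentation), all four scores can be evaluated in closed form for a Gaussian predictive distribution; the function name scores_normal and the example numbers are mine, and the CRPS expression is the standard closed form for a normal distribution.

```python
import numpy as np
from scipy.stats import norm

def scores_normal(mu, sigma, y):
    """Squared error, absolute error, ignorance score and CRPS for a
    Gaussian predictive distribution N(mu, sigma^2) and observation y."""
    z = (y - mu) / sigma
    se = (y - mu) ** 2                           # squared error about the predictive mean
    ae = np.abs(y - mu)                          # for a Gaussian, the median equals the mean
    ign = -norm.logpdf(y, loc=mu, scale=sigma)   # ignorance (logarithmic) score
    # closed-form CRPS for a Gaussian predictive distribution
    crps = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
    return se, ae, ign, crps

print(scores_normal(mu=0.0, sigma=1.0, y=0.5))
```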

5 Scores behave differently…

6 Simulation study: concept
Draw random data from a «true» distribution
Verifying observations: 1000 data points
Training data: 300 data points for each observation
Estimate forecast distributions from the training data (method of moments)
Make forecasts from the estimated distributions (50 members)
Evaluate against the observations (see the sketch below)
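
A rough sketch of this simulation design, under assumptions the transcript does not state: the «true» distribution is taken as a standard normal and only a Gaussian forecaster is shown, whereas the study compares several forecasting distributions (next slide). The CRPS is computed from the 50-member ensemble in its energy form.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_train, n_members = 1000, 300, 50

# "True" distribution: standard normal here purely for illustration;
# the slides do not state the parameters used in the study.
obs = rng.normal(0.0, 1.0, size=n_obs)               # verifying observations
crps_vals = np.empty(n_obs)

for i in range(n_obs):
    train = rng.normal(0.0, 1.0, size=n_train)        # training data for this case
    mu_hat, sigma_hat = train.mean(), train.std(ddof=1)   # method of moments
    ens = rng.normal(mu_hat, sigma_hat, size=n_members)   # 50-member forecast

    # Ensemble CRPS (energy form): E|X - y| - 0.5 * E|X - X'|
    crps_vals[i] = (np.abs(ens - obs[i]).mean()
                    - 0.5 * np.abs(ens[:, None] - ens[None, :]).mean())

print("mean CRPS:", crps_vals.mean())
```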

7 Forecasting distributions
[Table: expected value and variance of each forecasting distribution (Normal, non-central t, log-normal, Gumbel) and of the true distribution; the formulas did not survive the transcript. γ = Euler-Mascheroni constant.]
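
As an example of the moment matching the table summarizes (and the reason γ appears): a Gumbel distribution with location loc and scale has mean loc + γ·scale and variance π²·scale²/6, so matching a target mean and variance fixes both parameters. The sketch below is illustrative, with hypothetical target moments.

```python
import numpy as np

def gumbel_from_moments(m, v):
    """Gumbel(location, scale) matching mean m and variance v:
    mean = loc + gamma*scale, variance = pi^2 * scale^2 / 6,
    with gamma the Euler-Mascheroni constant."""
    gamma = np.euler_gamma
    scale = np.sqrt(6.0 * v) / np.pi
    loc = m - gamma * scale
    return loc, scale

loc, scale = gumbel_from_moments(m=0.0, v=1.0)
# sanity check against a large sample
sample = np.random.default_rng(0).gumbel(loc, scale, size=10**6)
print(sample.mean(), sample.var())   # approximately 0.0 and 1.0
```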

8 Example forecast: scores vs. obs
IGN has a different minimum due to the skewness of the Gumbel distribution
All scores are minimized at the same value => proper

9 Mean scores and bootstrap intervals
Only IGN shows a large difference between the Gumbel and the other forecasters
1000 forecasts: the log-normal forecaster is best if the truth is unknown
10^6 forecasts: the true distribution always has the lowest score, with the same ranking for all scores
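
A hedged sketch of one way such bootstrap intervals can be obtained (the slide does not give the exact procedure): resample the case-wise scores with replacement and take percentile bounds on the resampled means. The function name and settings below are hypothetical.

```python
import numpy as np

def bootstrap_mean_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a vector of case-wise scores."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    means = np.array([scores[rng.integers(0, n, size=n)].mean()
                      for _ in range(n_boot)])
    return scores.mean(), np.quantile(means, [alpha / 2, 1 - alpha / 2])

# hypothetical usage with the CRPS values from the simulation sketch above:
# mean, (lo, hi) = bootstrap_mean_ci(crps_vals)
```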

10 PIT histograms (normal sample size)

11 PIT histograms (huge sample size)
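
For context, a PIT histogram plots F(y), the predictive CDF evaluated at each observation; a calibrated forecaster gives approximately uniform PIT values. A minimal sketch, with hypothetical observations and a hypothetical fitted Gaussian forecast:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(2)
obs = rng.normal(size=1000)                  # hypothetical observations
mu_hat, sigma_hat = 0.0, 1.0                 # hypothetical fitted forecast parameters
pit = norm.cdf(obs, loc=mu_hat, scale=sigma_hat)   # PIT value per case

plt.hist(pit, bins=20, density=True)         # flat histogram <=> calibrated forecasts
plt.xlabel("PIT value")
plt.ylabel("density")
plt.show()
```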

12 Variation: Gumbel true distribution
1000 forecasts: the estimated Gumbel has a lower mean score than the true distribution
10^6 forecasts: the true distribution always has the lowest score, with the same ranking for all scores

13 Summary
For a huge sample size, all proper scores give the same result
For more realistic sample sizes, they differ widely
The best model doesn’t always get the best score
AE and CRPS have trouble identifying appropriate distributions
The ignorance score is sensitive to the shape of the distribution
=> There is no «best» scoring rule

14 Summary
For robust results:
Use error bars!
Use a combination of scores
CRPS is very useful if the distribution is unknown or cannot be easily specified
Minimum score estimation: CRPS or maximum likelihood? There is no clear answer; it depends on the forecast situation and the model choice (a sketch of minimum-CRPS estimation follows below)
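
As a hedged illustration of minimum score estimation (not the presenters' code): fit Gaussian parameters by minimizing the mean CRPS over a training sample, and compare with the maximum likelihood fit, which for a Gaussian reduces to the sample mean and standard deviation. The training data and starting values below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def mean_crps_normal(params, y):
    """Mean CRPS of a Gaussian forecast N(mu, sigma^2) over a sample y."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                # keep the scale positive
    z = (y - mu) / sigma
    crps = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
    return crps.mean()

y_train = np.random.default_rng(3).normal(1.0, 2.0, size=300)   # hypothetical training data
res = minimize(mean_crps_normal, x0=[0.0, 0.0], args=(y_train,))
print("min-CRPS fit:", res.x[0], np.exp(res.x[1]))
print("ML fit:      ", y_train.mean(), y_train.std(ddof=0))
```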

15 Read more in…
Statistical Postprocessing of Ensemble Forecasts
Editors: Stéphane Vannitsem, Daniel S. Wilks, Jakob W. Messner
Elsevier, planned publication: September 2018

