Verification of multi-model ensemble forecasts using the TIGGE dataset. Laurence J. Wilson, Environment Canada (Lawrence.wilson@ec.gc.ca, www.ec.gc.ca); Anna Ghelli, ECMWF. With thanks to Marcel Vallée.
Outline: Introduction – TIGGE goals and verification; Status of verification of TIGGE ensembles (standard methods, spatial methods); Precipitation verification project: plan and early results; Summary. See also the extended abstract. November 20, 2018
Verification and the goals of TIGGE. TIGGE goals: enhance collaborative research; enable evolution towards a GIFS; develop ensemble combination methods, including bias removal. Essential question: if we are going to move towards a GIFS, we must demonstrate that the benefits of combined ensembles, relative to single-center ensembles, are worth the effort. Put another way: do we get a “better” pdf by merging ensembles? Answering this requires verification that is relevant and user-oriented.
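One way to make the “better pdf” question concrete is to compare single-center event probabilities with merged ones. The sketch below shows two simple merging rules, neither of which is stated in the source: per-center probabilities averaged with equal weights (`combined_prob`), and all members pooled into one ensemble (`pooled_prob`). The function names and member values are hypothetical, and no bias removal or calibration is applied.

```python
from typing import Dict, List, Optional


def prob_exceed(members: List[float], threshold: float) -> float:
    """Fraction of ensemble members at or above the threshold."""
    return sum(1 for m in members if m >= threshold) / len(members)


def combined_prob(ensembles: Dict[str, List[float]], threshold: float,
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of the per-center probabilities (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 for name in ensembles}
    total = sum(weights[name] for name in ensembles)
    return sum(weights[name] * prob_exceed(members, threshold)
               for name, members in ensembles.items()) / total


def pooled_prob(ensembles: Dict[str, List[float]], threshold: float) -> float:
    """Probability from pooling all members into one ensemble (equal weight per member)."""
    pooled = [m for members in ensembles.values() for m in members]
    return prob_exceed(pooled, threshold)


# Made-up 24 h precipitation members (mm) from two hypothetical centers
ens = {"A": [0.0, 2.0, 5.0, 8.0], "B": [1.0, 1.5]}
print(combined_prob(ens, 2.0))  # 0.375: center A gives 0.75, center B gives 0.0
print(pooled_prob(ens, 2.0))    # 0.5: 3 of the 6 pooled members reach 2 mm
```

Pooling gives each member equal weight, so a center with a larger ensemble carries more weight; averaging per-center probabilities gives each center an equal say. Which merged pdf is better is what the verification has to establish.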
Status of verification of TIGGE ensembles. So far, verification has been mostly model-oriented: upper-air data, verified against analyses, using standard scores and case studies. Studies on the TIGGE website: Park et al. (2008), the first study involving several months of data, found a modest improvement with combined ensembles, with the greatest benefits in the tropics and the lower atmosphere, and noted the “advantage” of using one’s own analysis as truth. Pappenberger et al. presented a user-oriented case study of a flooding event in Romania, with Q-Q plots, RPS and RMSE as the main scores; the multimodel ensemble had the best average properties, with ECMWF next.
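For reference, the ranked probability score (RPS) compares the cumulative forecast distribution over ordered categories with the cumulative observation. The minimal sketch below uses the unnormalized form (some authors divide by the number of categories minus one); the category edges, helper names and member values are illustrative assumptions, not the settings used by Pappenberger et al.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def category_probs(members: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of members in each ordered category defined by ascending edges.

    With edges [e1, e2] the categories are x < e1, e1 <= x < e2 and x >= e2.
    """
    counts = [0] * (len(edges) + 1)
    for m in members:
        counts[bisect_right(edges, m)] += 1
    return [c / len(members) for c in counts]


def observed_category(value: float, edges: Sequence[float]) -> int:
    """Index of the category into which the observed value falls."""
    return bisect_right(edges, value)


def rps(probs: Sequence[float], obs_category: int) -> float:
    """Ranked probability score of one forecast (0 is perfect; not normalized).

    Sums the squared differences between the cumulative forecast probabilities
    and the cumulative observation (a step from 0 to 1 at the observed category).
    """
    cum_forecast = accumulate(probs)
    cum_obs = (1.0 if k >= obs_category else 0.0 for k in range(len(probs)))
    return sum((f - o) ** 2 for f, o in zip(cum_forecast, cum_obs))


# Hypothetical example: 4 members, categories split at 1 mm and 10 mm
edges = [1.0, 10.0]
probs = category_probs([0.0, 1.0, 5.0, 12.0], edges)  # [0.25, 0.5, 0.25]
print(rps(probs, observed_category(5.0, edges)))      # 0.125
```

Averaging `rps` over all cases gives the mean RPS used to compare ensembles.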
Studies using TIGGE data (cont’d). Johnson and Swinbank (2008) studied calibration and combination methods, using only 3 ensembles and MSLP and 2 m temperature, but verified against analyses: the multimodel ensemble improves on the individual ensembles, though in general not by much, and more at 2 m than at 500 hPa. Matsueda (2008) compared a multimodel ensemble combining 5 ensembles with ECMWF alone, using RMSE skill and RPSS with ECMWF as the reference forecast: the multimodel EPS outperforms ECMWF at medium and longer ranges.
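Skill scores such as the RPSS measure improvement over a reference forecast; with ECMWF as the reference, a positive value means the multimodel ensemble did better. This is a generic sketch for scores where lower is better. Treating “RMSE skill” as 1 - RMSE/RMSE_ref is an assumption (some studies use MSE instead), and the numbers are made up.

```python
import math
from typing import Sequence


def rmse(forecasts: Sequence[float], observations: Sequence[float]) -> float:
    """Root-mean-square error over paired forecasts and observations."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)


def skill_score(score: float, reference_score: float) -> float:
    """Skill relative to a reference forecast for a score where lower is better.

    1 means perfect, 0 means no better than the reference, negative means worse.
    """
    return 1.0 - score / reference_score


# Hypothetical mean RPS values: multimodel ensemble vs ECMWF as reference
print(skill_score(0.375, 0.5))  # 0.25
```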
Status of TIGGE-related verification. Current efforts: verification of surface variables (at this conference?). Studies using spatial methods: Ebert, application of the CRA (contiguous rain area) technique to ensemble forecasts, so far only to ECMWF; application of Wilks’ minimum spanning tree histogram or T. Gneiting’s multi-dimensional rank histogram to TC centers (idea stage). Precipitation verification project.
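The minimum spanning tree (MST) histogram extends the rank histogram to several dimensions, such as the position of a TC center. In Wilks’ construction, as we understand it, the MST length of the ensemble alone is ranked among the lengths obtained by replacing each member in turn by the observation. The sketch below is an illustration under stated assumptions: positions are treated as planar (x, y) coordinates rather than using great-circle distances, ties are not randomized, and the point sets are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mst_length(points: Sequence[Point]) -> float:
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge linking each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # add the closest point that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def mst_rank(members: Sequence[Point], obs: Point) -> int:
    """Rank (1..m+1) of the ensemble-only MST length among itself and the m
    lengths obtained by replacing each member in turn by the observation.

    Low ranks mean the observation tends to lie outside the ensemble cloud.
    Ties are not randomized in this sketch.
    """
    members = list(members)
    base = mst_length(members)
    replaced = [mst_length(members[:i] + [obs] + members[i + 1:])
                for i in range(len(members))]
    return 1 + sum(1 for length in replaced if length < base)


# Hypothetical TC center positions, treated as planar coordinates
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mst_rank(square, (1.0, 1.0)))    # 5: observation in the middle of the cloud
print(mst_rank(square, (10.0, 10.0)))  # 1: observation far outside the cloud
```

Tallying `mst_rank` over many cases gives the histogram; a flat histogram is expected from a well-dispersed ensemble.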
Precipitation verification project. Goal: to verify global 24 h precipitation forecasts from all the ensembles in the TIGGE archive, and from their combinations, one region at a time, using the highest-density observations available (so far Canada and Europe). Methodology: upscaling following Cherubini et al. (2002), verifying only where data are available; single-station, nearest-gridpoint verification where data are sparser; kernel density fitting following Peel and Wilson to examine the extremes of the distributions.
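Kernel density fitting smooths a finite ensemble into a continuous pdf, so that probabilities can be estimated beyond the range of the members. The sketch below uses Gaussian kernels with Silverman’s rule-of-thumb bandwidth. This is a generic choice, not necessarily the kernel or bandwidth of Peel and Wilson; Gaussian kernels also put some probability below 0 mm, which a precipitation application would need to handle. The names and values are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence


def silverman_bandwidth(members: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth for a Gaussian kernel: 1.06 * sd * n**(-1/5)."""
    return 1.06 * statistics.stdev(members) * len(members) ** (-1 / 5)


def kde_exceedance(members: Sequence[float], threshold: float,
                   h: Optional[float] = None) -> float:
    """P(X > threshold) from a Gaussian kernel placed on each ensemble member."""
    if h is None:
        h = silverman_bandwidth(members)

    def upper_tail(z: float) -> float:
        """1 - standard normal CDF, via the complementary error function."""
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    # each kernel contributes its probability mass above the threshold
    return sum(upper_tail((threshold - x) / h) for x in members) / len(members)


members = [0.0, 1.0, 2.0, 3.0, 4.0]       # made-up 24 h totals (mm)
print(kde_exceedance(members, 2.0))       # about 0.5, by symmetry
print(kde_exceedance(members, 10.0) > 0)  # True, although no member reaches 10 mm
```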
Precipitation verification project: methodology (Europe). Upscaling to 1 × 1 grid boxes, the limit of model resolution: observations are averaged over each grid box, with at least 9 stations required per box (European data), and verification is done only where there are enough data. This matches the observation and model resolution locally, and answers questions about the quality of the forecasts within the capabilities of the model. The most likely users are modelers.
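A minimal sketch of the box-averaging step, assuming boxes aligned with whole-degree lines and a plain arithmetic mean of the gauges in each box (the averaging in Cherubini et al. (2002) may differ). The function `upscale` and the gauge values are hypothetical; only the 9-station minimum comes from the slide.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def upscale(stations: Iterable[Tuple[float, float, float]], box_size: float = 1.0,
            min_stations: int = 9) -> Dict[Tuple[int, int], float]:
    """Average station precipitation over lat/lon boxes with enough reports.

    stations: (lat, lon, precip_mm) tuples. Box (i, j) covers latitudes
    [i*box_size, (i+1)*box_size) and likewise for longitude. Boxes with fewer
    than min_stations reports are dropped, so they are simply not verified.
    """
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for lat, lon, precip in stations:
        key = (math.floor(lat / box_size), math.floor(lon / box_size))
        sums[key] += precip
        counts[key] += 1
    return {key: sums[key] / counts[key]
            for key in sums if counts[key] >= min_stations}


# 9 made-up gauges in the box at 45N 5E, only 3 in the box at 46N 5E
obs = [(45.1 + 0.05 * k, 5.5, float(k)) for k in range(9)] + [(46.5, 5.5, 10.0)] * 3
print(upscale(obs))  # {(45, 5): 4.0}; the sparse box is not verified
```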
European verification
- Observations upscaled according to Cherubini et al. (2002)
- Observations from gauges in Spain, Portugal, France, Italy, Switzerland, the Netherlands, Romania, the Czech Republic, Croatia, Austria, Denmark, the UK, Ireland, Finland and Slovenia
- At least 9 stations needed per grid box to estimate the average
- 24 h precipitation totals; thresholds 1, 3, 5, 10, 15, 20, 25 and 30 mm
- One year (October 2007 to October 2008)
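Each threshold turns the 24 h totals into a yes/no event, with the forecast probability taken as the fraction of members reaching it. In this sketch, counting an event as “at or above the threshold”, the helper names and the member values are assumptions; only the thresholds come from the list above.

```python
from typing import Dict, Sequence

THRESHOLDS_MM = (1, 3, 5, 10, 15, 20, 25, 30)


def exceedance_probs(members: Sequence[float],
                     thresholds: Sequence[float] = THRESHOLDS_MM) -> Dict[float, float]:
    """Forecast probability for each threshold: fraction of members >= threshold."""
    return {t: sum(1 for m in members if m >= t) / len(members) for t in thresholds}


def exceedance_obs(obs: float,
                   thresholds: Sequence[float] = THRESHOLDS_MM) -> Dict[float, int]:
    """Binary outcome (1 = event occurred) for each threshold."""
    return {t: int(obs >= t) for t in thresholds}


probs = exceedance_probs([0.0, 2.0, 4.0, 12.0, 40.0])
events = exceedance_obs(12.0)
print(probs[1], probs[10], probs[30])  # 0.8 0.4 0.2
print(events[10], events[15])          # 1 0
```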
Reliability – Summer 08 – Europe – 42h
Reliability – Summer 08 – Europe – 114 h
Reliability – Winter 07-08 – Europe – 114h
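A reliability diagram plots the observed frequency of the event against the forecast probability, bin by bin. This sketch computes the points for such a diagram from paired probabilities and 0/1 outcomes; the ten equal-width bins and the names are assumptions, and the values are made up.

```python
from typing import List, Sequence, Tuple


def reliability_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """(mean forecast probability, observed frequency, count) per non-empty bin.

    For n_bins = 10 the bins are [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0].
    """
    sums_p = [0.0] * n_bins
    sums_o = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums_p[k] += p
        sums_o[k] += o
        counts[k] += 1
    return [(sums_p[k] / counts[k], sums_o[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k] > 0]


probs = [0.0, 0.0, 0.55, 0.55, 1.0, 1.0]
outcomes = [0, 0, 1, 0, 1, 1]
for mean_p, freq, count in reliability_table(probs, outcomes):
    print(mean_p, freq, count)
```

Points on the diagonal indicate reliable forecasts; the counts show how sharp the forecasts are.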
ROC – Summer 08 – Europe – 42h
ROC – Summer 08 – Europe – 114 h
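The ROC curve plots hit rate against false alarm rate as the probability threshold for issuing a warning is varied; its area summarizes discrimination (0.5 means none, 1 is perfect). This is a generic sketch with hypothetical names, made-up data and a trapezoidal area estimate; the probability thresholds used for the figures are not given in the source.

```python
from typing import List, Sequence, Tuple


def roc_points(probs: Sequence[float], outcomes: Sequence[int],
               prob_thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(false alarm rate, hit rate) pairs, one per probability threshold.

    An event is forecast when the probability is >= the threshold. The corner
    points (0, 0) and (1, 1) are added so the curve can be integrated.
    """
    n_events = sum(outcomes)
    n_non_events = len(outcomes) - n_events
    points = {(0.0, 0.0), (1.0, 1.0)}
    for pt in prob_thresholds:
        hits = sum(1 for p, o in zip(probs, outcomes) if p >= pt and o == 1)
        false_alarms = sum(1 for p, o in zip(probs, outcomes) if p >= pt and o == 0)
        points.add((false_alarms / n_non_events, hits / n_events))
    return sorted(points)


def roc_area(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
pts = roc_points(probs, outcomes, [0.3, 0.5, 0.8])
print(pts, roc_area(pts))
```

For these made-up data the points are (0, 0), (0, 1/3), (1/3, 2/3), (1/3, 1) and (1, 1), and the area is 5/6.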
Precipitation verification project: methodology (Canada). Single-station verification over 20 widely spaced stations, with only one station per grid box; the nearest-gridpoint forecast is compared with the observation. This is pointwise verification: we cannot upscale properly because we do not have the necessary data density. It is nevertheless valid as an absolute verification of the model predictions.
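Nearest-gridpoint matching only needs the grid layout. This sketch assumes a regular global latitude-longitude grid and picks the nearest latitude and longitude indices separately, which is adequate for closely spaced grids but is not a full great-circle search. The function name, grid parameters and station position are illustrative.

```python
from typing import Tuple


def nearest_gridpoint(lat: float, lon: float, lat0: float, lon0: float,
                      dlat: float, dlon: float, nlat: int, nlon: int) -> Tuple[int, int]:
    """Indices (i, j) of the nearest point on a regular global lat/lon grid.

    Grid latitudes are lat0 + i*dlat (i = 0..nlat-1; dlat may be negative for
    north-to-south grids) and longitudes lon0 + j*dlon (j = 0..nlon-1), with
    the grid wrapping around in longitude.
    """
    i = round((lat - lat0) / dlat)
    i = max(0, min(nlat - 1, i))  # clamp at the poles
    j = round(((lon - lon0) % 360.0) / dlon) % nlon
    return i, j


# 1-degree grid from 90N to 90S and from 0E eastwards (an assumed layout)
i, j = nearest_gridpoint(45.3, -73.6, 90.0, 0.0, -1.0, 1.0, 181, 360)
print(i, j, 90.0 - i, j * 1.0)  # 45 286 45.0 286.0, i.e. 45N, 74W
```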
Results – Canada – ROC curves – 24h
Results – Canada – ROC Curves – 144h
RMSE of precipitation probability forecasts – Canada – October 2007 to October 2008 – 20 stations. Panels: 2.0 mm and 10 mm thresholds. BOM in blue (darker blue); ECMWF in red; UKMET in green; CMC in gray; NCEP in cyan (lighter blue).
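For binary events, the RMSE of probability forecasts against 0/1 outcomes is the square root of the Brier score. This sketch assumes that is the quantity plotted; the function names and values are hypothetical.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def rmse_probability(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """RMSE of the probability forecasts, i.e. the square root of the Brier score."""
    return math.sqrt(brier_score(probs, outcomes))


print(rmse_probability([1.0, 0.0, 0.5, 0.5], [1, 0, 1, 0]))  # sqrt(0.125), about 0.354
```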
Verification of TIGGE forecasts with respect to surface observations – next steps. Combined ensemble verification. Other regions: Southern Africa should be next (non-GTS data are available). Evaluation of extreme events by kernel density fitting to the ensembles. Other high-density observation datasets, such as SHEF in the US. Other variables: TC tracks and related surface weather. Use of spatial verification methods. Then maybe we will know the answer to the TIGGE question.
Issues for TIGGE verification. Using analyses as truth gives an advantage to each center’s own model. Alternatives: verify each model against its own analysis; treat the analyses as an ensemble (weighted or not); select randomly from all analyses; use the “best” analysis and eliminate the related model from the comparison; use an average analysis (which may have different statistical characteristics); use a model-independent analysis (restricted to data-rich areas, but that is where verification might be most important for most users). The problem goes away for verification against observations, as long as they are not quality-controlled with respect to any model.
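The choice of truth can be made explicit in the verification code. This sketch, with hypothetical names and made-up values, covers three of the options on this slide: each model’s own analysis, the average analysis, and a random selection from all analyses. Averaging analyses smooths the field, which is one likely reason an average analysis may have different statistical characteristics.

```python
import random
from typing import Dict, List, Optional


def truth_for(model: str, analyses: Dict[str, List[float]], option: str,
              rng: Optional[random.Random] = None) -> List[float]:
    """Verifying 'truth' for one model under three possible choices of analysis.

    analyses maps a center name to its analysed values at the verification
    points. 'own': the model's own analysis; 'average': the pointwise mean of
    all analyses; 'random': one center's analysis chosen at random.
    """
    if option == "own":
        return analyses[model]
    if option == "average":
        fields = list(analyses.values())
        return [sum(values) / len(fields) for values in zip(*fields)]
    if option == "random":
        rng = rng or random.Random()
        return analyses[rng.choice(sorted(analyses))]
    raise ValueError(f"unknown option: {option}")


analyses = {"A": [1.0, 2.0], "B": [3.0, 4.0]}  # made-up analyses from two centers
print(truth_for("A", analyses, "own"))      # [1.0, 2.0]
print(truth_for("A", analyses, "average"))  # [2.0, 3.0]
```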
Park et al. study – impact of the analysis used as truth in verification
Issues for TIGGE verification (cont’d). Bias adjustment/calibration. Reason: to eliminate “artificial” spread in the combined ensemble arising from systematic differences among the component models, in both the first (mean) and second (spread) moments. Several studies have been or are being undertaken, but results on the benefits are not conclusive so far, perhaps because the samples are too small for bias estimation. Alternative: rather than correcting the bias, eliminate the inter-ensemble component of the bias and spread variation.
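One possible reading of the alternative above: keep the bias and spread common to all centers, but remove each center’s departure from them before combining. The sketch below is a hypothetical illustration of that idea, not a method from the studies mentioned; the function name, the training statistics (mean error and mean spread per center) and the member values are made up.

```python
import statistics
from typing import Dict, List


def align_ensembles(ensembles: Dict[str, List[float]], bias: Dict[str, float],
                    spread: Dict[str, float]) -> Dict[str, List[float]]:
    """Remove only the between-center part of the mean bias and spread.

    bias[c]: center c's mean error (ensemble mean minus observation) over a
    training sample; spread[c]: its mean ensemble standard deviation there.
    Each ensemble is shifted so that its bias equals the average bias of all
    centers, and its deviations about the mean are rescaled to the average
    spread; the bias and spread common to all centers are left in place.
    """
    common_bias = statistics.mean(list(bias.values()))
    common_spread = statistics.mean(list(spread.values()))
    aligned = {}
    for center, members in ensembles.items():
        mean = statistics.mean(members)
        shift = bias[center] - common_bias      # this center's excess bias
        scale = common_spread / spread[center]  # this center's spread ratio
        aligned[center] = [mean - shift + scale * (m - mean) for m in members]
    return aligned


# Made-up training statistics and forecasts for two hypothetical centers
ens = {"A": [1.0, 3.0], "B": [10.0, 14.0]}
out = align_ensembles(ens, bias={"A": 1.0, "B": 3.0}, spread={"A": 1.0, "B": 3.0})
print(out["A"])  # [1.0, 5.0]
```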
ROC – Winter 07-08 – Europe – 42h
ROC – Winter 07-08 – Europe – 114h
Reliability – Winter 07-08 – Europe – 42h
Results – Canada – Brier Skill, Resolution and Reliability
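The Brier skill, resolution and reliability in this figure are related through Murphy’s decomposition of the Brier score, BS = REL - RES + UNC. This sketch groups forecasts by their distinct probability values, which makes the decomposition exact for ensemble probabilities, and uses the sample climatology as the reference for the skill score. The names and data are illustrative, and the figure may have been computed with different binning or a different reference.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def brier_decomposition(probs: Sequence[float],
                        outcomes: Sequence[int]) -> Tuple[float, float, float, float, float]:
    """Brier score and its Murphy decomposition BS = REL - RES + UNC.

    Forecasts are grouped by their distinct probability values (an ensemble of
    m members can issue only m + 1 values), which makes the decomposition exact.
    Returns (bs, rel, res, unc, bss), where the Brier skill score uses sample
    climatology as reference: bss = 1 - bs / unc (undefined if unc == 0).
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups: Dict[float, List[int]] = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    rel = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
              for p, obs in groups.items()) / n
    res = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
              for obs in groups.values()) / n
    unc = base_rate * (1.0 - base_rate)
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    return bs, rel, res, unc, 1.0 - bs / unc


bs, rel, res, unc, bss = brier_decomposition([0.0, 0.0, 0.5, 0.5, 1.0, 1.0],
                                             [0, 1, 1, 0, 1, 1])
print(round(bs, 4), round(rel - res + unc, 4), round(bss, 4))  # 0.25 0.25 -0.125
```

In this made-up example the forecasts score worse than climatology (negative skill), even though their resolution is positive, because the reliability term is larger.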