Fuzzy verification using the Fractions Skill Score


Marion Mittermaier and Nigel Roberts
Spatial verification methods intercomparison meeting, Boulder, 20 February 2007
© Crown copyright

Verification approach

We want to know:
- How forecast skill varies with neighbourhood size.
- The smallest neighbourhood size that can be used to give sufficiently accurate forecasts.
- Whether higher resolution provides more accurate forecasts on the scales of interest (e.g. river catchments).

The approach: compare forecast fractions with fractions from radar over different-sized neighbourhoods (squares, for convenience) using gridded data, and use rainfall accumulations to apply temporal smoothing.

Reference: Roberts and Lean, "Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events" (accepted in MWR, February 2007).

Schematic comparison of fractions

[Schematic: observed and forecast grids side by side; the threshold is exceeded where squares are blue.]

This would be considered a perfect forecast on the scale of 5x5 grid squares, since the fraction of grid boxes exceeding the threshold is the same in the forecast and the observation.

A score for comparing fractions with fractions

The Fractions Skill Score (FSS) is a skill score for fractions/probabilities built on a Brier score that compares forecast fractions with observed fractions. The denominator of the FSS is the Brier score for the worst possible match-up between forecast and observed grid-box values.
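Written out (as in the Roberts and Lean paper cited above), with $P_{f,i}$ and $P_{o,i}$ the forecast and observed fractions in neighbourhood $i$ of $N$:

$$\mathrm{FBS} = \frac{1}{N}\sum_{i=1}^{N}\left(P_{f,i}-P_{o,i}\right)^{2}$$

$$\mathrm{FSS} = 1-\frac{\mathrm{FBS}}{\frac{1}{N}\left(\sum_{i=1}^{N}P_{f,i}^{2}+\sum_{i=1}^{N}P_{o,i}^{2}\right)}$$

The FSS ranges from 0 (complete mismatch) to 1 (perfect match).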

Example graph of FSS against neighbourhood size

Emphasizes which scales have useful skill. At the grid scale, the FSS of a random forecast with the same base rate as the observations, f0 (the observed fractional coverage of the domain), is equal to f0. The target skill is 0.5 + f0/2, which is where the skill is closer to perfect than to random.
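In symbols (restating the two reference lines on the graph):

$$\mathrm{FSS}_{\mathrm{random}} = f_{0}, \qquad \mathrm{FSS}_{\mathrm{target}} = \frac{1}{2}+\frac{f_{0}}{2}$$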

Strengths and weaknesses

Strengths:
- Measures skill on fair terms from the model perspective; it gets around the double-penalty problem by sampling around precipitation areas.
- It can be used to determine the scale over which a forecast system has sufficient skill.
- The method is intuitive and can be related directly to the way forecasts are presented, i.e. generating spatial probability forecasts.
- It is particularly useful for high-resolution precipitation forecasts, in which we expect the fine detail to be unpredictable.
- It can be used for single or composite events.

Weaknesses:
1. The spatial skill signal may be swamped by the bias.
2. Sensitivity to small base rates at higher thresholds, i.e. it is threshold dependent (as is any method using thresholds!).
3. Like any score, it doesn't tell the whole story on its own.
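To make the procedure concrete, here is a minimal sketch of the FSS computation described above (not the code used for the results that follow; it assumes forecast and radar accumulations are already on a common grid and uses SciPy's uniform_filter for the neighbourhood fractions):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def fss(forecast, observed, threshold, n):
        """Fractions Skill Score for one threshold and an n x n neighbourhood.

        forecast, observed -- 2-D accumulation fields on the same grid
        threshold          -- physical (mm) or frequency-derived threshold
        n                  -- neighbourhood length in grid boxes (odd)
        """
        # Binary exceedance fields
        fb = (forecast >= threshold).astype(float)
        ob = (observed >= threshold).astype(float)
        # Fraction of boxes exceeding the threshold in each neighbourhood
        # (zero padding outside the domain)
        pf = uniform_filter(fb, size=n, mode="constant", cval=0.0)
        po = uniform_filter(ob, size=n, mode="constant", cval=0.0)
        fbs = np.mean((pf - po) ** 2)                # fractions Brier score
        worst = np.mean(pf ** 2) + np.mean(po ** 2)  # worst possible FBS
        return 1.0 - fbs / worst if worst > 0 else np.nan

    # Skill as a function of neighbourhood size, as in the graphs that follow:
    # scores = {n: fss(fcst, radar, 1.0, n) for n in (1, 5, 9, 17, 33, 65)}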

13 May 2005

Hourly accumulations

[Maps of hourly accumulations; maxima of 94 mm (3.7 in) and 78 mm (3.1 in).]

Physical thresholds

Increasing bias for higher thresholds.

[FSS against neighbourhood length for thresholds of 0.04, 0.08, 0.16, 0.32, 0.64 and 1.28 in (roughly 1-32 mm); a ~60 mi (~100 km) scale is marked.]

I've added an arbitrary 0.55 line as a reference for "where forecasts become skilful". This is approximate and meant as a guide; the 0.5 line is also shown. I've highlighted the 100 km range as a possible limit on the maximum averaging length, again only as an illustration. Someone with local knowledge would need to decide what an acceptable level of skill and maximum averaging length would be.

The NCEP WRF appears to verify worst of all. It also gives a classic example of when smoothing becomes detrimental to skill: there is an absolute maximum averaging length at which skill can be maximised (though would a forecast averaged to this length still be useful?). Beyond this, the skill (not to mention the usefulness) decreases. The NCAR WRF appears to be the most skilful at fairly small scales and for most thresholds; the CAPS WRF comes a close second.

Frequency thresholds

Top 25, 5 and 1% of the distribution (including zeros). The top 25% is representative of the rain/no-rain boundary, 0.5-1 mm (0.02-0.04 in); the top 5% corresponds to roughly 4-6 mm (0.16-0.24 in).

[FSS against neighbourhood length for the frequency thresholds; a ~60 mi (~100 km) scale is marked.]

On analysing the results, I've concluded that, given the sparseness of data in the domain, I need to re-evaluate the frequency thresholds in the code (they seemed reasonable for the UK!). As the code takes some time to run and I didn't have time to tweak too much, these are the only results I have to show. Clearly a threshold of 0.001 (a tenth of a percent) could still yield sensible results. Notice also that for the NCEP WRF the rain/no-rain threshold has the best skill, whereas for the other two it appears to be the worst.

More than 75% of the domain is zeros; the top 1% are values of roughly 5 mm (0.2 in) or more (a large range); and 1% of pixels is still about 3000 pixels.
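For illustration, frequency thresholds of this kind can be derived as percentiles of the full field, zeros included (a hedged sketch; the function name is mine):

    import numpy as np

    def frequency_thresholds(field, top_fractions=(0.25, 0.05, 0.01)):
        """Physical values marking the top X% of the distribution,
        computed over all pixels, zeros included."""
        return {f: np.percentile(field, 100.0 * (1.0 - f))
                for f in top_fractions}

With more than 75% of the domain at zero, the top-25% threshold naturally falls at the rain/no-rain boundary.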

1 June 2005

Hourly accumulations

[Maps of hourly accumulations; maxima of 120 mm (4.7 in) and 84 mm (3.3 in).]

Physical thresholds

[FSS against neighbourhood length for thresholds of 0.04-1.28 in; a ~60 mi (~100 km) scale is marked.]

The NCEP WRF is still less skilful than the other two. Both the NCAR and CAPS WRF struggle to reach acceptable skill, even with considerable smoothing.

Frequency thresholds

[FSS against neighbourhood length for the frequency thresholds: the rain/no-rain boundary, 0.5-1 mm (0.02-0.04 in), and 4-6 mm (0.16-0.24 in); a ~60 mi (~100 km) scale is marked.]

Issues

The following points have cropped up and are listed here either as general issues or as issues specific to the FSS. At the very least they require a bit more thought and possibly some extra tests.

- The FSS is currently computationally expensive (run time depends on domain size); see the sketch after this list for one possible mitigation.
- Results may be domain-size dependent (a larger domain gives scope for larger spatial errors). Other spatial methods may suffer in the same way. (Do we know enough about this?)
- Independence issues (regarding adjacent pixels). This affects all spatially based methods. (Should we be worried?)
- Impact of data sparseness(?) and domain edge effects. A bit of a grey area, but again this may apply more widely.
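On the cost point: a standard trick (not from the original talk) is to compute the neighbourhood fractions with a summed-area table, so that each neighbourhood size costs a fixed number of array operations regardless of n. A minimal sketch, assuming the same zero padding outside the domain as before:

    import numpy as np

    def neighbourhood_fractions(binary, n):
        """Mean of a binary field over n x n neighbourhoods via a
        summed-area table: O(grid size) per neighbourhood size, rather
        than O(n^2) work per pixel. n must be odd."""
        ny, nx = binary.shape
        h = n // 2
        # Cumulative sums with a leading row/column of zeros
        s = np.zeros((ny + 1, nx + 1))
        s[1:, 1:] = np.cumsum(np.cumsum(binary, axis=0), axis=1)
        # Clipped window corners for every pixel
        y = np.arange(ny)[:, None]
        x = np.arange(nx)[None, :]
        y0 = np.clip(y - h, 0, ny)
        y1 = np.clip(y + h + 1, 0, ny)
        x0 = np.clip(x - h, 0, nx)
        x1 = np.clip(x + h + 1, 0, nx)
        win = s[y1, x1] - s[y0, x1] - s[y1, x0] + s[y0, x0]
        return win / float(n * n)

This reproduces the zero-padded uniform_filter used in the earlier sketch, so it can be dropped into that fss function unchanged.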