Quantile regression as a means of calibrating and verifying a mesoscale NWP ensemble
Tom Hopson¹, Josh Hacker¹, Yubao Liu¹, Gregory Roux¹, Wanli Wu¹, Jason Knievel¹, Tom Warner¹, Scott Swerdlin¹, John Pace², Scott Halvorson²
¹ NCAR, ² U.S. Army Test and Evaluation Command

Outline
I. Motivation: ensemble forecasting and post-processing
II. E-RTFDDA for Dugway Proving Grounds
III. Introduce Quantile Regression (QR; Koenker and Bassett, 1978)
IV. Post-processing procedure
V. Verification results
VI. Warning: dynamically tuning ensemble dispersion can put the utility of the ensemble mean at risk
VII. Conclusions

Goals of an EPS
- Predict the observed distribution of events and atmospheric states
- Predict uncertainty in the day’s prediction
- Predict the extreme events that are possible on a particular day
- Provide a range of possible scenarios for a particular forecast

More technically …
1. Greater accuracy of the ensemble-mean forecast (half the error variance of a single forecast)
2. Likelihood of extremes
3. Non-Gaussian forecast PDFs
4. Ensemble spread as a representation of forecast uncertainty
=> All rely on forecasts being calibrated
Further …
- Calibration is essential for tailoring forecasts to a local application: NWP provides spatially- and temporally-averaged gridded forecast output
- Applying gridded forecasts to point locations requires location-specific calibration to account for the local spatial and temporal scales of variability (=> increasing ensemble dispersion)

Dugway Proving Grounds, Utah
[Figure: probability of exceeding temperature (T) thresholds across the domain]
Includes random and systematic differences between members. Not an actual chance of exceedance unless calibrated.
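A minimal sketch of how such an uncalibrated exceedance probability is read off the raw ensemble (function name and numbers are hypothetical):

```python
import numpy as np

def raw_exceedance_prob(members, threshold):
    """Fraction of ensemble members exceeding a threshold. Only a
    true chance of exceedance if the ensemble is calibrated."""
    return float(np.mean(np.asarray(members, dtype=float) > threshold))

# e.g. a 30-member temperature ensemble [K] against a 300 K threshold
members = np.random.normal(299.0, 2.0, 30)
print(raw_exceedance_prob(members, 300.0))
```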

Challenges in probabilistic mesoscale prediction
- Model formulation
  - Bias (marginal and conditional)
  - Lack of variability caused by truncation and approximation
  - Non-universality of closure and forcing
- Initial conditions
  - Small scales are damped in analysis systems, and the model must develop them
  - Perturbation methods designed for medium-range systems may not be appropriate
- Lateral boundary conditions
  - After short time periods the lateral boundary conditions can dominate
  - Representing uncertainty in lateral boundary conditions is critical
- Lower boundary conditions
  - Dominate the boundary-layer response
  - Difficult to estimate uncertainty in lower boundary conditions

RTFDDA and Ensemble-RTFDDA
Liu et al., AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA, January 18–23, 2010

The Ensemble Execution Module
[Diagram: N RTFDDA members (Member 1 … Member N), each driven by perturbations and observations, produce 36–48 h forecasts that feed post-processing, archiving and verification, and input to decision support tools]
Liu et al., AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA, January 18–23, 2010

Real-time Operational Products for DPG
Operated at US Army DPG since Sep
[Figures: nested domains D1, D2, D3; surface and cross-section products – mean, spread, exceedance probability, spaghetti, …; likelihood of SPD > 10 m/s; mean T & wind; T mean and SD; wind speed; T-2m wind rose; pin-point surface and profile products – mean, spread, exceedance probability, spaghetti, wind roses, histograms …]

Forecast “calibration” or “post-processing”
[Figure: forecast PDF of flow rate [m³/s] vs. probability, before and after calibration, with the observation marked; the corrections shift the “bias” and adjust the “spread” or “dispersion”]
Post-processing has corrected:
- the “on average” bias
- as well as the under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its “dispersion” or “spread”)
Our approach: the under-utilized “quantile regression” approach
- the probability distribution function “means what it says”
- daily variations in the ensemble dispersion relate directly to changes in forecast skill => an informative ensemble skill-spread relationship

Example of Quantile Regression (QR)
[Figure: fitted quantile curves through a forecast-observation scatter]
Our application: fitting T quantiles using QR conditioned on:
1) ranked forecast ensemble
2) ensemble mean
3) ensemble median
4) ensemble stdev
5) persistence
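As a concrete sketch of QR itself (not the authors’ operational code; statsmodels’ QuantReg and the synthetic data are assumptions), fitting several temperature quantiles on the ensemble mean and spread:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)

# Synthetic training data: ensemble mean/stdev as regressors, obs as target
n = 500
ens_mean = rng.normal(285.0, 5.0, n)                  # ensemble-mean T [K]
ens_std = rng.gamma(2.0, 0.8, n)                      # ensemble spread [K]
obs = ens_mean + rng.normal(0.5, 1.0, n) * ens_std    # observed T [K]

X = sm.add_constant(np.column_stack([ens_mean, ens_std]))

# One QR fit per climatological quantile level
for tau in (0.1, 0.25, 0.5, 0.75, 0.9):
    fit = QuantReg(obs, X).fit(q=tau)
    print(tau, fit.params.round(3))  # intercept, mean and spread coefficients
```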

Step 1: Determine climatological quantiles
[Figure: climatological PDF, probability/°K vs. temperature [K]]
Step 2: For each quantile, use “forward step-wise cross-validation” to iteratively select the best regressor subset
Regressor set: 1. reforecast ensemble, 2. ensemble mean, 3. ensemble stdev, 4. persistence, 5. logistic regression (LR) quantile (not shown)
Selection requirements: a) QR cost function minimum, b) satisfy the binomial distribution at 95% confidence
If the requirements are not met, retain the climatological “prior”
Step 3: Segregate the forecasts into differing ranges of ensemble dispersion (I, II, III) and refit the models (Step 2) uniquely for each range
[Figure: forecast and observed T [K] time series with dispersion ranges I–III marked]
Final result: a “sharper” posterior PDF represented by the interpolated quantiles
[Figure: prior vs. posterior forecast PDF, probability/°K vs. temperature [K]]
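A minimal sketch of Step 2’s forward step-wise, cross-validated selection under the QR cost (pinball) function; the binomial reliability check and Step 3’s dispersion binning are omitted, and all helper names are hypothetical:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def pinball_loss(y, yhat, tau):
    """QR cost ("check") function for quantile level tau."""
    u = y - yhat
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))

def forward_select(y, candidates, tau, n_folds=5):
    """Greedily add whichever regressor most reduces the cross-validated
    pinball loss; stop when none improves it.
    `candidates` maps regressor names to 1-D arrays."""
    chosen, best = [], np.inf
    idx = np.arange(len(y))
    while True:
        scores = {}
        for name in candidates:
            if name in chosen:
                continue
            cols = [candidates[c] for c in chosen + [name]]
            X = sm.add_constant(np.column_stack(cols))
            loss = 0.0
            for fold in np.array_split(idx, n_folds):
                train = np.setdiff1d(idx, fold)
                fit = QuantReg(y[train], X[train]).fit(q=tau)
                loss += pinball_loss(y[fold], X[fold] @ fit.params, tau)
            scores[name] = loss / n_folds
        if not scores or min(scores.values()) >= best:
            # an empty `chosen` would mean: keep the climatological prior
            return chosen
        best = min(scores.values())
        chosen.append(min(scores, key=scores.get))
```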

Utilizing verification measures in near-real-time …
Measures used:
1) Rank histogram (converted to a scalar measure)
2) Root mean square error (RMSE)
3) Brier score
4) Rank Probability Score (RPS)
5) Relative Operating Characteristic (ROC) curve
6) New measure of ensemble skill-spread utility
=> These are used for automated calibration model selection via a weighted sum of the skill scores of each
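For example, one plausible way to collapse the rank histogram to a scalar (an assumption for illustration, not necessarily the measure used here) is the total absolute departure of its bin frequencies from uniformity:

```python
import numpy as np

def rank_histogram(ens, obs):
    """Rank of each observation within its N-member ensemble (0..N),
    accumulated into N+1 bin frequencies."""
    ens, obs = np.asarray(ens), np.asarray(obs)
    ranks = np.sum(ens < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1) / len(obs)

def flatness_score(freq):
    """Scalar departure from a flat rank histogram:
    0 = perfectly uniform; larger = less reliable."""
    return float(np.sum(np.abs(freq - 1.0 / len(freq))))

# Under-dispersive synthetic case: obs variability exceeds ensemble spread
ens = np.random.normal(0.0, 1.0, (1000, 30))
obs = np.random.normal(0.0, 1.5, 1000)
print(flatness_score(rank_histogram(ens, obs)))  # clearly > 0 (U-shaped)
```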

Problems with Spread-Skill Correlation …
- ECMWF spread-skill correlation (black) << 1
- Even “perfect model” (blue) correlation << 1, and it varies with forecast lead-time
[Figure: spread-skill scatter at 1-, 4-, 7-, and 10-day lead times; ECMWF r ranges roughly 0.33–0.39, “perfect model” r roughly 0.53–0.68]
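The “perfect model” point is easy to reproduce synthetically (a minimal sketch under the assumption that spread varies day to day and the observation is just one more draw from each day’s forecast distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_mem = 5000, 30

# Perfect-model setup: obs is one extra draw from the same
# distribution the ensemble members are drawn from.
sigma = rng.gamma(2.0, 1.0, n_days)                   # daily "true" spread
ens = rng.normal(0.0, sigma[:, None], (n_days, n_mem))
obs = rng.normal(0.0, sigma)

spread = ens.std(axis=1, ddof=1)
abs_err = np.abs(ens.mean(axis=1) - obs)

# Correlation is well below 1 even though the model is perfect
print(np.corrcoef(spread, abs_err)[0, 1])
```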

3-hr dewpoint time series
[Figure: before calibration vs. after calibration, Station DPG S01]

42-hr dewpoint time series
[Figure: before calibration vs. after calibration, Station DPG S01]

PDFs: raw vs. calibrated
[Figure: blue is the “raw” ensemble PDF, black is the calibrated ensemble PDF, red is the observed value]
Notice: significant change in both “bias” and dispersion of the final PDF (also notice the PDF asymmetries)

3-hr dewpoint rank histograms
[Figure: rank histograms, Station DPG S01]

42-hr dewpoint rank histograms
[Figure: rank histograms, Station DPG S01]

Skill Scores
- A single value to summarize performance
- Reference forecast: best naive guess; persistence, climatology
- A perfect forecast implies that the object can be perfectly observed
- Positively oriented – positive is good
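In the usual convention (a standard formula consistent with the “positively oriented” remark; not spelled out on the slide), a skill score measures fractional improvement over the reference forecast:

```latex
\mathrm{SS} = \frac{S_{\text{fcst}} - S_{\text{ref}}}{S_{\text{perf}} - S_{\text{ref}}}
\qquad\Longrightarrow\qquad
\mathrm{SS} = 1 - \frac{S_{\text{fcst}}}{S_{\text{ref}}}
\quad \text{for negatively oriented } S \text{ with } S_{\text{perf}} = 0
```

so SS = 1 is a perfect forecast, SS = 0 matches the reference (persistence or climatology), and SS < 0 is worse than the reference.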

Skill Score Verification
[Figure: RMSE skill score and CRPS skill score as a function of lead time]
Reference forecasts: black -- raw ensemble; blue -- persistence

Computational Resource Questions:
How best to utilize multi-model simulations (forecasts), especially if under-dispersive?
a) Should more dynamical variability be searched for? Or
b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?

3-hr dewpoint rank histograms
[Figure: rank histograms, Station DPG S01]

RMSE of ensemble members
[Figure: member RMSE at 3-hr and 42-hr lead times, Station DPG S01]

Significant calibration regressors
[Figure: regressors selected at 3-hr and 42-hr lead times, Station DPG S01]

Questions revisited:
How best to utilize multi-model simulations (forecasts), especially if under-dispersive?
a) Should more dynamical variability be searched for? Or
b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?
Warning: adding more models can lead to decreasing utility of the ensemble mean (even if the ensemble is under-dispersive)

Summary
- Quantile regression provides a powerful framework for improving the whole (potentially non-Gaussian) PDF of an ensemble forecast – different regressors for different quantiles and lead-times
- This framework provides an umbrella to blend together multiple statistical correction approaches (logistic regression, etc., not shown) as well as multiple regressors
- As well, “step-wise cross-validation”-based calibration provides a method to ensure forecast skill no worse than climatology and persistence for a variety of cost functions
- As shown here, significant improvements were made to the forecast’s ability to represent its own potential forecast error (while improving sharpness):
  – uniform rank histogram
  – significant spread-skill relationship (new skill-spread measure)
- Care should be used before “throwing more models” at an “under-dispersive” forecast problem
Further questions:

Dugway Proving Ground

Other options …
Assign dispersion bins, then:
2) Average the error values in each bin, then correlate the bin averages
3) Calculate individual rank histograms for each bin and convert each to a scalar measure
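A minimal sketch of option 2 (the function name and the choice of equally-populated bins are assumptions):

```python
import numpy as np

def binned_spread_error_corr(spread, abs_err, n_bins=10):
    """Correlate bin-mean spread with bin-mean error after grouping
    forecasts into equally-populated dispersion bins."""
    spread, abs_err = np.asarray(spread), np.asarray(abs_err)
    order = np.argsort(spread)
    bins = np.array_split(order, n_bins)
    mean_spread = [spread[b].mean() for b in bins]
    mean_err = [abs_err[b].mean() for b in bins]
    return float(np.corrcoef(mean_spread, mean_err)[0, 1])
```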

Example: French Broad River
Before calibration => underdispersive
[Figure: black curve shows observations; colors are ensemble members]

Rank Histogram Comparisons
After quantile regression, the rank histogram is more uniform (although now slightly over-dispersive)
[Figures: raw full ensemble vs. after calibration]

What Nash-Sutcliffe (RMSE) implies about Utility
Frequency used for quantile fitting of Method I:
- Best Model = 76%
- Ensemble StDev = 13%
- Ensemble Mean = 0%
- Ranked Ensemble = 6%

Take-home message: for a “calibrated ensemble”, the error variance of the ensemble mean is 1/2 the error variance of any ensemble member (on average), independent of the distribution being sampled
[Figure: forecast PDF of discharge vs. probability, with the observation marked]
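A sketch of why, assuming calibration makes the observation y statistically exchangeable with the N members x_i, all independent draws with variance sigma² about the forecast mean:

```latex
\operatorname{Var}(x_i - y) = \operatorname{Var}(x_i) + \operatorname{Var}(y) = 2\sigma^2,
\qquad
\operatorname{Var}(\bar{x} - y) = \frac{\sigma^2}{N} + \sigma^2
\;\xrightarrow{\,N \to \infty\,}\; \sigma^2
```

so for large N the ensemble-mean error variance approaches half that of any single member, whatever the shape of the sampled distribution.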

What Nash-Sutcliffe (RMSE) implies about Utility (cont) -- degradation with increased ensemble size
Sequentially-averaged models (ranked by NS score) and their resultant NS score:
=> Notice the degradation of NS with increasing number of models (with a peak at 2 models)
=> For an equitable multi-model, NS should rise monotonically
=> Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)

What Nash-Sutcliffe implies about Utility (cont)
… earlier results … Initial frequency used for quantile fitting:
- Best Model = 76%
- Ensemble StDev = 13%
- Ensemble Mean = 0%
- Ranked Ensemble = 6%
… using only the top 1/3 of models to rank and form the ensemble mean … Reduced-set frequency used for quantile fitting:
- Best Model = 73%
- Ensemble StDev = 3%
- Ensemble Mean = 32%
- Ranked Ensemble = 29%
=> There appear to be significant gains in the utility of the ensemble after “filtering” (except for the drop in StDev) … however, “the proof is in the pudding” …
=> Examine verification skill measures …

Skill Score Comparisons between the full and “filtered” ensemble sets
GREEN -- full calibrated multi-model; BLUE -- “filtered” calibrated multi-model; reference -- uncalibrated set
Points:
-- quite similar results for a variety of skill scores
-- both approaches give appreciable benefit over the original raw multi-model output
-- however, only in the CRPSS is there improvement of the “filtered” ensemble set over the full set
=> post-processing method fairly robust
=> More work (more filtering?)!