DET: Testing and Evaluation Plan
Wally Clark
Barbara Brown [1], Ed Tollerud [2], and Tara Jensen [1]
[1] NCAR/RAL, Boulder, CO and DTC
[2] NOAA/GSD, Boulder, CO and DTC
DTC and DET Testing and Evaluation
- T&E is one of the most important activities undertaken by the DTC
- DTC testing has involved WRF core comparisons, boundary layer schemes, and other aspects of NWP
- DTC has created "Reference Configurations" (RCs) that are to be re-tested in conjunction with model changes
- DET infrastructure is being developed to allow testing, evaluation, and intercomparison of ensemble systems and system components
Major categories of testing
- Forecasting system comparisons
  - Compare forecasts based on one model configuration with forecasts based on a different model configuration
  - Examples: two types of model initialization; two or more methods of statistical post-processing
- Individual reference configuration
  - Model "setup" is evaluated; the setup is re-evaluated when model changes are implemented
  - Reference configurations may be defined by operational centers or users; RCs may also be community-contributed
- Forecasts contributed by a modeling group
  - Example: forecasts evaluated in HWT and HMT projects
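A forecasting-system comparison of the kind described above reduces, at its simplest, to computing the same headline score for matched forecasts from each configuration. The sketch below uses synthetic surface-temperature data and RMSE; the data, the "configuration A/B" labels, and the bias value are purely illustrative, not results from any DTC test.

```python
import numpy as np

def rmse(forecasts, observations):
    """Root-mean-square error of forecasts against matched observations."""
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    return float(np.sqrt(np.mean((f - o) ** 2)))

# Hypothetical matched samples: surface temperature (K) from two model
# configurations (e.g., two initialization methods) valid at the same
# times/locations, plus the verifying observations.  Synthetic data only.
rng = np.random.default_rng(0)
obs = 280.0 + 5.0 * rng.standard_normal(500)
config_a = obs + 1.5 * rng.standard_normal(500)         # random error only
config_b = obs + 0.5 + 1.5 * rng.standard_normal(500)   # same error + warm bias

print(f"RMSE, configuration A: {rmse(config_a, obs):.2f} K")
print(f"RMSE, configuration B: {rmse(config_b, obs):.2f} K")
```

The key requirement from the slide is the matching: both configurations must be scored on the same cases so that the difference reflects the configurations, not the samples.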
DTC Testing and Evaluation Principles
- A formal test plan is developed, defining all of the important aspects of the testing and evaluation
  - The developer may have a role in helping to create the test plan
  - Execution of the test is independent of the developer
- The focus of the test depends on the questions of interest
  - Module being used
  - Variables of interest
- Many cases are evaluated, so that statistical significance can be assessed
  - Not just a few case studies
  - Multiple seasons, times of day, etc.
- Meaningful stratifications
  - Location/region
  - Season
  - Other user-based criteria
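The "many cases" and "meaningful stratifications" principles can be sketched together: score differences are assessed per stratum, with a resampling test for significance. The paired percentile bootstrap below is one common choice, not the specific method mandated by any DTC plan, and the region names and error samples are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_diff_ci(err_a, err_b, n_boot=2000, alpha=0.05):
    """Paired percentile-bootstrap confidence interval for the mean
    absolute-error difference (A minus B) over matched cases.  If the
    interval excludes zero, the difference is significant at ~(1-alpha)."""
    diffs = np.abs(np.asarray(err_a)) - np.abs(np.asarray(err_b))
    n = diffs.size
    boot = np.array([diffs[rng.integers(0, n, size=n)].mean()
                     for _ in range(n_boot)])
    return float(np.quantile(boot, alpha / 2)), float(np.quantile(boot, 1 - alpha / 2))

# Illustrative forecast errors stratified by region (synthetic data).
strata = {
    "West": (2.0 * rng.standard_normal(300), 1.5 * rng.standard_normal(300)),
    "East": (1.5 * rng.standard_normal(300), 1.5 * rng.standard_normal(300)),
}
for region, (err_a, err_b) in strata.items():
    lo, hi = bootstrap_diff_ci(err_a, err_b)
    verdict = "significant" if lo > 0 or hi < 0 else "not significant"
    print(f"{region}: 95% CI for |err_A|-|err_B| in [{lo:.2f}, {hi:.2f}] ({verdict})")
```

Stratifying first, as here, keeps an advantage confined to one region or season from being hidden in (or falsely attributed to) the aggregate.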
Components of a test plan (example)
- Goals
- Experiment design
- Codes: specification of the codes that will be run as part of the test
- Model output: what kinds of output will be produced?
- Forecast periods
- Post-processing
- Verification: statistical methods and measures
- Graphics generation and display
- Data archival and dissemination of results
- Computer resources
- Deliverables

Example from QNSE evaluation (surface temperature and wind)
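Because a test plan has a fixed set of required sections, its completeness can be checked mechanically. The sketch below encodes the section list from this slide as data and flags gaps; the field names and the sample entries (paraphrasing the QNSE example) are illustrative, not a DTC standard format.

```python
# Required sections of a test plan, taken from the list above
# (machine-readable names are illustrative, not a DTC standard).
REQUIRED_SECTIONS = [
    "goals", "experiment_design", "codes", "model_output",
    "forecast_periods", "post_processing", "verification",
    "graphics", "data_archival", "computer_resources", "deliverables",
]

def missing_sections(plan):
    """Return the required test-plan sections that are absent or empty."""
    return [s for s in REQUIRED_SECTIONS if not plan.get(s)]

# A partially drafted plan, loosely modeled on the QNSE surface T/wind
# example; all entries are placeholder text.
plan = {
    "goals": "Evaluate QNSE scheme impact on surface T and wind",
    "experiment_design": "Paired runs: control vs. QNSE, identical cases",
    "codes": "Model version and QNSE module pinned for the test",
    "verification": "Bias and RMSE with confidence intervals",
}
print("Sections still to be written:", missing_sections(plan))
```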
Questions to address when developing a test plan
- Which aspect(s) (or modules) of the ensemble system will be evaluated?
- What performance aspects are we trying to compare or evaluate?
- Who are the "users"?
- What are the variables of interest?

Answers to these questions will determine the other aspects of the plan.
Considerations for ensemble T&E
- The number of cases will likely need to be increased (relative to non-ensemble evaluations)
  - Many probabilistic and ensemble verification scores (e.g., reliability) require relatively large subsamples
  - Subsamples must be large enough to assess statistical significance
  - But sampling must remain focused enough for representativeness
- Verification approaches and metrics are somewhat unique to ensembles
- Computer resources may be a limitation
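The sample-size point is easy to see with reliability: each probability bin needs enough forecasts for its observed frequency to be trustworthy. The sketch below builds the per-bin table behind a reliability diagram from synthetic probabilistic forecasts; it is a minimal illustration, not the DET verification implementation.

```python
import numpy as np

def reliability_table(prob_fcst, outcomes, n_bins=10):
    """Per probability bin: forecast count, mean forecast probability, and
    observed relative frequency -- the ingredients of a reliability diagram."""
    p = np.asarray(prob_fcst, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        sel = idx == b
        n = int(sel.sum())
        rows.append((n, float(p[sel].mean()), float(y[sel].mean()))
                    if n else (0, float("nan"), float("nan")))
    return rows

# Synthetic, roughly reliable forecasts: the event occurs with the
# forecast probability.  With few cases per bin, the observed
# frequencies would scatter widely around the diagonal.
rng = np.random.default_rng(2)
p = rng.uniform(0.0, 1.0, 5000)
y = (rng.uniform(0.0, 1.0, 5000) < p).astype(int)
for b, (count, fbar, obar) in enumerate(reliability_table(p, y)):
    print(f"bin {b}: count={count:4d}  mean p={fbar:.2f}  obs freq={obar:.2f}")
```

With 10 bins, 5000 cases leave roughly 500 per bin; a "few case studies" sample would leave bins nearly empty, which is why ensemble T&E needs larger samples than deterministic evaluations.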
Other considerations
- Real-time vs. post-analysis
  - DTC intensive tests are generally done in post-analysis
  - Real-time demonstrations also have many benefits (e.g., HMT, HWT)
- Subjective evaluations: should these be considered for DET T&E?
- How much rigorous end-to-end testing is required, vs. evaluation of individual components?

Example from HMT evaluation, winter 2010