Daria Kluver Independent Study From Statistical Methods in the Atmospheric Sciences By Daniel Wilks

Let’s review a few forecast verification concepts that were introduced last time.

Purposes of Forecast Verification. Forecast verification is the process of assessing the quality of forecasts. Any given verification data set consists of a collection of forecast/observation pairs whose joint behavior can be characterized in terms of the relative frequencies of the possible combinations of forecast/observation outcomes; this is an empirical joint distribution.

The Joint Distribution of Forecasts and Observations. Let the forecasts be y_i, i = 1, …, I, and the observations be o_j, j = 1, …, J. The joint distribution of the forecasts and observations is denoted p(y_i, o_j). This is a discrete bivariate probability distribution function associating a probability with each of the I x J possible combinations of forecast and observation.

The joint distribution can be factored in two ways; the one used in a forecasting setting is the calibration-refinement factorization: p(y_i, o_j) = p(o_j | y_i) p(y_i). The conditional distribution p(o_j | y_i) gives the probability of the event o_j given that the forecast y_i was issued; it specifies how often each possible weather event occurred on those occasions when the single forecast y_i was issued, i.e., how well each forecast is calibrated. The unconditional distribution p(y_i) specifies the relative frequencies of use of each of the forecast values y_i and is sometimes called the refinement of the forecasts; the refinement of a set of forecasts refers to the dispersion of the distribution p(y_i).

Scalar Attributes of Forecast Performance

Forecast Skill. Forecast skill is the relative accuracy of a set of forecasts with respect to some set of standard control, or reference, forecasts (such as climatological averages, persistence forecasts, or random forecasts based on climatological relative frequencies). A skill score expresses the percentage improvement over the reference forecasts: SS = (A - A_ref) / (A_perf - A_ref) × 100%, where A is the accuracy of the forecasts, A_ref is the accuracy of the reference forecasts, and A_perf is the accuracy that would be achieved by perfect forecasts.

On to new material…
- 2x2 contingency tables
- Scalar attributes of contingency tables
- Tornado example; NWS vs weather.com vs climatology
- Skill scores
- Probabilistic forecasts
- Multicategory discrete predictands
- Continuous predictands: plots and scores
- Probability forecasts for multicategory events
- Nonprobabilistic field forecasts

Nonprobabilistic Forecasts of Discrete Predictands. A nonprobabilistic forecast contains an unqualified statement that a single outcome will occur; it contains no expression of uncertainty.

The 2x2 Contingency Table. The simplest joint distribution arises for I = J = 2 (nonprobabilistic yes/no forecasts): I = 2 possible forecasts and J = 2 outcomes, with i = 1 (y_1) the forecast that the event will occur, i = 2 (y_2) the forecast that the event will not occur, j = 1 (o_1) the event subsequently occurring, and j = 2 (o_2) the event not subsequently occurring.

a forecast–observation pairs are called "hits"; their relative frequency a/n is the sample estimate of the corresponding joint probability p(y_1, o_1). b occasions are called "false alarms"; the relative frequency b/n estimates the joint probability p(y_1, o_2). c occasions are called "misses"; the relative frequency c/n estimates the joint probability p(y_2, o_1). d occasions are called "correct rejections" or "correct negatives"; the relative frequency d/n estimates the joint probability p(y_2, o_2).

Scalar Attributes Characterizing 2x2 Contingency Tables
- Accuracy: proportion correct, PC = (a + d)/n
- Threat Score: TS = a/(a + b + c)
- Odds ratio: θ = ad/(bc)
- Bias (comparison of the average forecast with the average observation): B = (a + b)/(a + c)
- Reliability and resolution: False Alarm Ratio, FAR = b/(a + b)
- Discrimination: Hit Rate, H = a/(a + c), and False Alarm Rate, F = b/(b + d)

NWS, weather.com, climatology example. 12 random nights from Nov 6 to Dec 1: will overnight lows be colder than or equal to freezing? [2x2 yes/no contingency tables for wx.com, the NWS, and climatology, with the counts a, b, c, d and the derived PC, TS, odds ratio, bias, FAR, and H tabulated for each forecaster; several entries appear as the spreadsheet division-by-zero error #DIV/0!.]

Skill Scores for 2x2 Contingency Tables
- Heidke Skill Score: based on the proportion correct, referenced to the proportion correct that would be achieved by random forecasts that are statistically independent of the observations.
- Peirce Skill Score: similar to the Heidke Skill Score, except that the reference hit rate in the denominator is that of random, unbiased forecasts.
- Clayton Skill Score
- Gilbert Skill Score, or Equitable Threat Score
- The odds ratio (θ) can also be used as the basis of a skill score.

Finley Tornado Forecasts example

The 2803 forecast–observation pairs give a = 28 hits, b = 72 false alarms, c = 23 misses, and d = 2680 correct rejections. Finley chose to evaluate his forecasts using the proportion correct, PC = (28 + 2680)/2803 = 0.966, which is dominated by the correct "no" forecasts. Gilbert pointed out that never forecasting a tornado produces an even higher proportion correct: PC = (0 + 2752)/2803 = 0.982. The threat score gives a better comparison, because the large number of correct "no" forecasts is ignored: TS = 28/(28 + 72 + 23) = 0.228. The odds ratio is 45.3 > 1, suggesting better than random performance. The bias ratio is B = 1.96, indicating that approximately twice as many tornados were forecast as actually occurred. FAR = 0.720, which expresses the fact that a fairly large fraction of the forecast tornados did not eventually occur. H = 0.549 and F = 0.0262, indicating that more than half of the actual tornados were forecast to occur, whereas only a very small fraction of the non-tornado cases were falsely warned of a tornado. Skill scores: HSS = 0.355, PSS = 0.523, CSS = 0.271, GSS = 0.216, Q = 0.957.
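To make the formulas concrete, here is a minimal Python sketch (not from the original slides; it assumes the standard formulas named above) that computes the attributes and skill scores of a 2x2 contingency table. Running it on the Finley counts reproduces the values just quoted.

```python
# Scores for a 2x2 contingency table (a = hits, b = false alarms,
# c = misses, d = correct rejections), standard 2x2 verification formulas.

def contingency_scores(a, b, c, d):
    n = a + b + c + d
    pc = (a + d) / n                          # proportion correct
    ts = a / (a + b + c)                      # threat score (CSI)
    theta = (a * d) / (b * c)                 # odds ratio
    bias = (a + b) / (a + c)                  # bias ratio
    far = b / (a + b)                         # false alarm ratio
    hit = a / (a + c)                         # hit rate H
    f = b / (b + d)                           # false alarm rate F
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    pss = (a * d - b * c) / ((a + c) * (b + d))   # Peirce skill score
    css = (a * d - b * c) / ((a + b) * (c + d))   # Clayton skill score
    a_ref = (a + b) * (a + c) / n             # hits expected by chance
    gss = (a - a_ref) / (a + b + c - a_ref)   # Gilbert skill score (ETS)
    q = (a * d - b * c) / (a * d + b * c)     # odds-ratio-based skill (Q)
    return dict(PC=pc, TS=ts, odds_ratio=theta, B=bias, FAR=far,
                H=hit, F=f, HSS=hss, PSS=pss, CSS=css, GSS=gss, Q=q)

# Finley tornado forecasts: a=28, b=72, c=23, d=2680 (n=2803)
for name, value in contingency_scores(28, 72, 23, 2680).items():
    print(f"{name}: {value:.3f}")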

What if your data are probabilistic? For a dichotomous predictand, converting from a probabilistic to a nonprobabilistic format requires selecting a threshold probability, above which the forecast will be "yes". The choice of threshold ends up somewhat arbitrary.

Candidate threshold probabilities include (see the sketch below):
- the climatological probability of precipitation
- the threshold that would maximize the threat score
- the threshold that produces unbiased forecasts (B = 1)
- 0.5, giving nonprobabilistic forecasts of the more likely of the two events
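A one-line sketch of the conversion itself (the function name and default threshold are illustrative):

```python
import numpy as np

def to_yes_no(probs, threshold=0.5):
    """Convert probability forecasts to nonprobabilistic yes/no
    forecasts: "yes" wherever the probability exceeds the threshold."""
    return np.asarray(probs, float) > threshold

print(to_yes_no([0.1, 0.6, 0.35], threshold=0.35))   # [False  True False]
```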

Multicategory Discrete Predictands. A multicategory contingency table (e.g., rain/mix/snow) can be collapsed into a set of 2x2 tables, one for each category against everything else (e.g., rain vs. non-rain), as in the sketch below.
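A sketch of the collapse for an I x I table (the rain/mix/snow counts below are hypothetical, chosen only to illustrate the bookkeeping):

```python
import numpy as np

def collapse_to_2x2(table, k):
    """Collapse an IxI contingency table (rows = forecast category,
    columns = observed category) into a 2x2 table for category k
    vs. everything else. Returns (a, b, c, d)."""
    t = np.asarray(table)
    a = t[k, k]                    # category k both forecast and observed
    b = t[k, :].sum() - a          # forecast k, observed something else
    c = t[:, k].sum() - a          # forecast something else, observed k
    d = t.sum() - a - b - c        # neither forecast nor observed k
    return a, b, c, d

# Hypothetical rain/mix/snow counts; row = forecast, column = observed.
table = [[50, 10, 5],
         [8, 20, 7],
         [3, 9, 40]]
print(collapse_to_2x2(table, 0))   # rain vs. non-rain -> (50, 15, 11, 76)
```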

Nonprobabilistic Forecasts of Continuous Predictands. It is informative to graphically represent aspects of the joint distribution of nonprobabilistic forecasts for continuous variables.

Conditional Quantile Plots. These plots are an example of a diagnostic verification technique, allowing diagnosis of particular strengths and weaknesses of a set of forecasts through exposition of the full joint distribution. Conditional distributions of the observations given the forecasts are represented in terms of selected quantiles, plotted with respect to the perfect 1:1 line. The plots contain two parts, representing the two factors in the calibration-refinement factorization of the joint distribution of forecasts and observations. In the example panels, (a) shows the performance of MOS forecasts and (b) the performance of subjective forecasts: the observed temperatures are consistently colder than the MOS forecasts, while the subjective forecasts are essentially unbiased; the subjective forecasts are also somewhat sharper, or more refined, with more extreme temperatures being forecast more frequently.

Scalar Accuracy Measures. Only two scalar measures of forecast accuracy for continuous predictands are in common use: the Mean Absolute Error and the Mean Squared Error.

Mean Absolute Error. The arithmetic average of the absolute values of the differences between the members of each forecast/observation pair: MAE = (1/n) Σ_k |y_k - o_k|. MAE = 0 if the forecasts are perfect. Often used to verify temperature forecasts.

Mean Squared Error. The average squared difference between the forecast and observation pairs: MSE = (1/n) Σ_k (y_k - o_k)². It is more sensitive to larger errors than the MAE, and therefore more sensitive to outliers. MSE = 0 for perfect forecasts. RMSE = √MSE, which has the same physical dimensions as the forecasts and observations. To calculate the bias of the forecasts, compute the Mean Error: ME = (1/n) Σ_k (y_k - o_k).
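A minimal sketch of these accuracy measures for paired forecasts and observations (the example values are illustrative):

```python
import numpy as np

def accuracy_measures(y, o):
    """MAE, MSE, RMSE, and mean error (bias) for forecast/observation pairs."""
    y, o = np.asarray(y, float), np.asarray(o, float)
    err = y - o
    mae = np.abs(err).mean()        # mean absolute error
    mse = (err ** 2).mean()         # mean squared error
    rmse = np.sqrt(mse)             # same physical units as the forecasts
    me = err.mean()                 # mean error: the bias of the forecasts
    return mae, mse, rmse, me

# Example: five temperature forecasts vs. observations (deg C)
print(accuracy_measures([2, 0, -1, 3, 5], [1, -2, -1, 4, 7]))
```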

Skill Scores. These can be computed with MAE, MSE, or RMSE as the underlying accuracy statistic. A common choice of reference is climatology, e.g. SS_clim = 1 - MSE/MSE_clim, where MSE_clim is computed using the climatological value for day k as the forecast.
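For example, a sketch of a skill score using MSE as the accuracy statistic and climatology as the reference (the climatological values for each day are assumed to be given):

```python
import numpy as np

def mse_skill_score(y, o, clim):
    """SS = 1 - MSE / MSE_clim, where MSE_clim uses the climatological
    value for day k as the reference forecast. Perfect forecasts give
    SS = 1; forecasts no better than climatology give SS <= 0."""
    y, o, clim = (np.asarray(v, float) for v in (y, o, clim))
    mse = ((y - o) ** 2).mean()
    mse_clim = ((clim - o) ** 2).mean()
    return 1.0 - mse / mse_clim
```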

Probability Forecasts of Discrete Predictands: The Joint Distribution for Dichotomous Events. Forecasts are no longer restricted to probabilities of 0 and 1. For each possible forecast probability, we see the relative frequency p(y_i) with which that forecast value was used, and the probability p(o_1 | y_i) that the event occurred given the forecast y_i.

The Brier Score. A scalar accuracy measure for verification of probabilistic forecasts of dichotomous events: BS = (1/n) Σ_k (y_k - o_k)². This is the mean squared error of the probability forecasts, where o_k = 1 if the event occurs and o_k = 0 if it does not. Perfect forecasts have BS = 0; less accurate forecasts receive higher BS. Brier Skill Score: BSS = 1 - BS/BS_ref.
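A sketch of both scores, using the climatological relative frequency of the event as the reference forecast (a common, but not the only, choice of reference):

```python
import numpy as np

def brier_score(p, o):
    """BS = mean squared error of probability forecasts p against
    binary outcomes o (1 = event occurred, 0 = it did not)."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return ((p - o) ** 2).mean()

def brier_skill_score(p, o):
    """BSS = 1 - BS / BS_ref, with the climatological relative
    frequency of the event as the constant reference forecast."""
    o = np.asarray(o, float)
    bs_ref = brier_score(np.full_like(o, o.mean()), o)
    return 1.0 - brier_score(p, o) / bs_ref

print(brier_skill_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0]))   # 0.85
```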

The Reliability Diagram. A graphical device that shows the full joint distribution of forecasts and observations for probability forecasts of a binary predictand, in terms of its calibration-refinement factorization. It allows diagnosis of particular strengths and weaknesses in a verification set.

Interpreting the reliability diagram:
- Well calibrated: the conditional event relative frequency is essentially equal to the forecast probability.
- Underforecasting: forecasts are consistently too small relative to the conditional event relative frequencies; the average forecast is smaller than the average observation.
- Overforecasting: forecasts are consistently too large relative to the conditional event relative frequencies; the average forecast is larger than the average observation.
- Overconfident: extreme probabilities are forecast too often.
- Underconfident: extreme probabilities are forecast too infrequently.

Well-calibrated probability forecasts mean what they say, in the sense that subsequent event relative frequencies are equal to the forecast probabilities.
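A sketch of the two ingredients of the reliability diagram under the calibration-refinement factorization: the relative frequency of use of each forecast value (refinement) and the conditional event relative frequency (calibration). The binning scheme below is an illustrative choice:

```python
import numpy as np

def reliability_table(p, o, bins=np.linspace(0.0, 1.0, 11)):
    """For probability forecasts p and binary outcomes o, return, for
    each forecast bin: the bin center, the relative frequency of use
    p(y_i) (refinement), and the conditional event relative frequency
    p(o_1 | y_i) (calibration). Plotting the last column against the
    bin centers gives the reliability diagram."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    idx = np.digitize(p, bins[1:-1])          # assign forecasts to bins
    rows = []
    for i in range(len(bins) - 1):
        use = idx == i
        if use.any():
            center = 0.5 * (bins[i] + bins[i + 1])
            rows.append((center, use.mean(), o[use].mean()))
    return rows
```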

Hedging and Strictly Proper Scoring Rules. If a forecaster is just trying to get the best score, they may improve their scores by hedging, or gaming: forecasting something other than their true belief in order to achieve a better score. A strictly proper scoring rule is a forecast evaluation procedure that awards a forecaster's best expected score only when his or her true beliefs are forecast; it cannot be hedged. The Brier score is strictly proper (this can be derived, but the derivation is not shown here).

Probability Forecasts for Multiple-category Events. For multiple-category ordinal probability forecasts, verification should penalize forecasts increasingly as more probability is assigned to event categories further removed from the actual outcome, and the score should be strictly proper. Commonly used: the Ranked Probability Score (RPS), the sum over categories of the squared differences between the cumulative forecast and cumulative observation distributions, RPS = Σ_m (Y_m - O_m)².
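A sketch of the RPS for a single forecast, written as the sum of squared differences between the cumulative forecast and observation distributions:

```python
import numpy as np

def rps(probs, obs_category):
    """Ranked probability score for one forecast: probs is the forecast
    probability for each ordered category; obs_category is the index of
    the category that occurred. Smaller is better; RPS = 0 is perfect."""
    probs = np.asarray(probs, float)
    obs = np.zeros_like(probs)
    obs[obs_category] = 1.0
    # squared differences of the cumulative distributions
    return ((np.cumsum(probs) - np.cumsum(obs)) ** 2).sum()

print(rps([0.2, 0.5, 0.3], 1))   # most probability on the observed category
```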

Probability Forecasts for Continuous Predictands. For an infinite number of predictand classes, the ranked probability score can be extended to the continuous case: the Continuous Ranked Probability Score, CRPS = ∫ [F(y) - F_o(y)]² dy, where F is the forecast CDF and F_o is the step function located at the observed value. The CRPS is strictly proper, smaller values are better, and it rewards concentration of probability around the step function located at the observed value.
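A numerical sketch of the CRPS for an ensemble forecast, approximating the integral of the squared difference between the ensemble's empirical CDF and the step function at the observation (the grid resolution and padding are arbitrary choices):

```python
import numpy as np

def crps(ensemble, obs, ngrid=2001):
    """Continuous ranked probability score, numerically integrating the
    squared difference between the empirical forecast CDF and the step
    function located at the observed value."""
    ens = np.sort(np.asarray(ensemble, float))
    lo = min(ens[0], obs) - 1.0
    hi = max(ens[-1], obs) + 1.0
    x = np.linspace(lo, hi, ngrid)
    f_cdf = np.searchsorted(ens, x, side="right") / ens.size
    o_cdf = (x >= obs).astype(float)      # step function at the observation
    dx = x[1] - x[0]
    return float(((f_cdf - o_cdf) ** 2).sum() * dx)

print(crps([1.2, 0.8, 1.5, 1.0], obs=1.1))
```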

Nonprobabilistic Forecasts of Fields. General considerations for field forecasts: they are usually nonprobabilistic, and verification is done on a grid.

Scalar accuracy measures for these fields include the S1 score, the Mean Squared Error, and the anomaly correlation.
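As one example, a sketch of the anomaly correlation between forecast and observed fields, assuming the centered form in which anomalies are departures from a given climatology field (array names are illustrative):

```python
import numpy as np

def anomaly_correlation(forecast, observed, climatology):
    """Centered anomaly correlation between a forecast field and an
    observed field; anomalies are departures from climatology at each
    grid point."""
    f = np.asarray(forecast, float) - np.asarray(climatology, float)
    o = np.asarray(observed, float) - np.asarray(climatology, float)
    f = f - f.mean()
    o = o - o.mean()
    return (f * o).sum() / np.sqrt((f ** 2).sum() * (o ** 2).sum())

clim = np.zeros((2, 3))
fcst = [[1.0, 2.0, 0.5], [0.0, -1.0, 1.5]]
obsv = [[0.8, 1.5, 0.2], [0.1, -0.5, 1.0]]
print(anomaly_correlation(fcst, obsv, clim))
```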

Thank you for your participation throughout the semester. All presentations will be posted on my UD website. Additional information can be found in Statistical Methods in the Atmospheric Sciences (second edition) by Daniel Wilks.