10IMSC, August 2007. Tests for Evaluating Rank Histograms from Ensemble Forecasts. Ian Jolliffe (University of Exeter), Cristina Primo.

Presentation transcript:

Tests for Evaluating Rank Histograms from Ensemble Forecasts
Ian Jolliffe (University of Exeter), Cristina Primo (ECMWF)

Outline
- Rank histograms
  - Definition
  - Testing for uniformity
- Elmore's artificial example
- Chi-squared goodness-of-fit test
- Alternatives to the chi-squared test
  - Cramér-von Mises tests
  - Decomposition of the chi-squared statistic
- Examples
  - Elmore again
  - 500hPa heights
- Final remarks

Rank Histograms (Talagrand diagrams)
Consider an ensemble of (k-1) members forecasting a variable X. The member values divide the overall range of X into k bins, and a verifying observation falls into exactly one of these bins. Suppose now we have n such ensemble forecasts and corresponding observations. The n bin positions are used to construct a histogram with k bins.
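
The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `rank_histogram` is hypothetical, and ties between an observation and a member are assumed to go to the lower bin (in practice ties are often broken at random).

```python
from bisect import bisect_left

def rank_histogram(ensembles, observations):
    """Count how often the observation falls in each of the k bins
    defined by the (k-1) sorted ensemble members.

    ensembles: list of n forecasts, each a list of (k-1) member values.
    observations: list of n verifying observations.
    Assumption: an observation equal to a member goes to the lower bin.
    """
    k = len(ensembles[0]) + 1
    counts = [0] * k
    for members, obs in zip(ensembles, observations):
        # Number of members strictly below the observation = bin index 0..k-1.
        counts[bisect_left(sorted(members), obs)] += 1
    return counts

# Three forecasts from a 3-member ensemble give a 4-bin histogram.
example = rank_histogram([[1.0, 2.0, 3.0]] * 3, [0.5, 2.5, 3.5])
```

Here the three observations fall below all members, between the second and third member, and above all members respectively.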

Rank Histograms: Testing for Uniformity
If the ensemble members and the verifying observation all come from the same probability distribution (desirable), then the probability of the verifying observation falling into a particular bin is the same for all bins. Thus the rank histogram should be roughly 'flat' or uniform. Because of sampling variation it won't be exactly flat. We wish to test whether any deviations from 'flatness' are large enough that they are unlikely to have arisen by chance.

Elmore's artificial example
16 bins, 60 observations. Four panels:
- Top left: roughly flat??
- Top right: under-dispersion
- Bottom left: over-dispersion
- Bottom right: bias/trend
Elmore K L (2005) Weather & Forecasting, 20,

Elmore's artificial example II
In fact the data in the top-left panel were generated randomly so that the ensemble members and verifying observations are from the same distribution. Hence deviations from flatness are due to chance. The other three panels have the same bin frequencies, but rearranged so that the deviations from flatness appear unlikely to have arisen by chance.

The χ² goodness-of-fit test
The best-known general test that data come from a particular distribution has test statistic
$T = \sum_{i=1}^{k} (n_i - e_i)^2 / e_i$.
Here $n_i$, $e_i$ are the observed and expected (given the hypothesised distribution) numbers of observations in the i-th bin. For the uniform distribution, $e_i = n/k$. Under the null hypothesis that the hypothesised distribution is correct, T has (approximately) a χ² distribution with (k-1) degrees of freedom.
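
As a sketch of how this test can be computed for a rank histogram, the block below assumes the usual Pearson form of the statistic, T = Σ (n_i - e_i)² / e_i with e_i = n/k. The helper names `chi2_statistic` and `chi2_sf` are hypothetical; the tail probability uses the standard closed-form finite sums for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_statistic(counts):
    """Pearson statistic T for testing uniformity of the bin counts."""
    n, k = sum(counts), len(counts)
    e = n / k  # expected count per bin under uniformity
    return sum((c - e) ** 2 / e for c in counts)

def chi2_sf(x, df):
    """P(chi-squared with df degrees of freedom > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail term plus a finite sum of half-integer powers of x.
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def uniformity_p_value(counts):
    """p-value of T against chi-squared with (k-1) degrees of freedom."""
    return chi2_sf(chi2_statistic(counts), len(counts) - 1)
```

A small p-value from `uniformity_p_value` would indicate deviations from flatness that are unlikely to be due to sampling variation alone, subject to the usual caveat that the χ² approximation is poor when expected counts are small.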

The χ² goodness-of-fit test II
The χ² test is a good general test: it has some power to detect all types of deviation from the null hypothesis (NH). However, because it spreads its power widely (and thus thinly), it is not very good at detecting specific alternatives to the NH.

Elmore's artificial example III
For the top-left panel, comparing T to a χ² distribution with 15 degrees of freedom gives a p-value of 0.193: the deviations from 'flatness' could easily have arisen by chance (as indeed they did). This seems plausible, but T has the same value for the other three panels, leading to the same conclusion, which seems a lot less plausible.
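
The reason T cannot distinguish the four panels is that it depends only on the set of bin counts, not on their order. The sketch below illustrates this with made-up counts (16 bins, 60 observations, but not Elmore's actual data) and a hypothetical helper `chi2_statistic` that assumes the Pearson form of T.

```python
def chi2_statistic(counts):
    """Pearson statistic T for uniformity: sum of (n_i - e)^2 / e with e = n/k."""
    n, k = sum(counts), len(counts)
    e = n / k
    return sum((c - e) ** 2 / e for c in counts)

# Hypothetical bin counts: 16 bins, 60 observations, no obvious pattern.
scattered = [2, 5, 3, 4, 6, 1, 4, 3, 5, 2, 4, 6, 3, 4, 5, 3]

# The same counts rearranged into a steady increase (a bias/trend pattern).
trend = sorted(scattered)

t_scattered = chi2_statistic(scattered)
t_trend = chi2_statistic(trend)
# Both values agree up to floating-point rounding, although the second
# histogram would look far from flat to the eye.
```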

Alternatives to χ² (Cramér-von Mises)
Elmore suggests using members of a family of Cramér-von Mises tests. These are based on comparing the cumulative distribution from the histogram with that of the hypothesised distribution.
- Advantage: they take the order of the bins into account, and are more powerful than χ² at detecting alternatives such as those in Elmore's example (numbers later).
- Disadvantage: they need special tables to assess their 'significance'.
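
To show why a cumulative comparison is sensitive to bin order, here is a minimal sketch of a Cramér-von Mises-type statistic for grouped data. It assumes the form W² = (1/n) Σ_j Z_j² p_j, where Z_j is the cumulative sum of (n_i - e_i) up to bin j and p_j = 1/k under uniformity. This is an illustration of the idea rather than necessarily the exact statistic Elmore used, and the function name is hypothetical; no p-value is computed because that requires the special tables mentioned above.

```python
def cvm_statistic(counts):
    """Cramér-von Mises-type statistic for uniform bins (assumed form).

    Z_j = sum over i <= j of (n_i - e), and W2 = sum_j Z_j^2 / (n * k).
    """
    n, k = sum(counts), len(counts)
    e = n / k
    running, total = 0.0, 0.0
    for c in counts:
        running += c - e        # cumulative deviation Z_j
        total += running ** 2
    return total / (n * k)

# Same counts, different order: a trend accumulates deviations of one sign,
# so it produces a larger statistic than a scattered arrangement.
w2_trend = cvm_statistic([1, 2, 3, 4])
w2_scattered = cvm_statistic([4, 1, 3, 2])
```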

Alternatives (decomposing χ²)
The overall χ² statistic T can be decomposed into the sum of (k-1) terms, each term having (approximately) an independent χ² distribution with 1 degree of freedom under the null hypothesis of uniformity. There are restrictions on the way the decomposition is done, but by suitable choices we can isolate terms corresponding to bias/trend, over/under-dispersion etc.

Alternatives (decomposing χ² II)
We need not find (k-1) individual terms. For example, if trend/bias and over/under-dispersion are the only deviations from uniformity of interest, we can decompose T into 3 components: one for each of the two deviations of interest, each with 1 degree of freedom, and a third component representing all other deviations, with (k-3) degrees of freedom.

Alternatives (decomposing χ² III)
In the results that follow, we label various 1 degree of freedom components as:
- Linear (= bias/trend)
- Ends (contrasts end categories with all others)
- V-shape (represents a different sort of over/under-dispersion to Ends)
Resid_1 and Resid_2 represent the (k-3) degree of freedom terms after removing Linear + Ends and Linear + V-shape respectively. 'Cramér-von Mises' gives the smallest p-value found by Elmore.
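
The components named above can be sketched as follows. The exact contrast vectors are assumptions made for illustration: Linear is taken proportional to the centred bin index, Ends to an indicator of the two end bins, and V-shape to the absolute distance from the middle bin, each centred to sum to zero and scaled to unit length. All function names are hypothetical.

```python
import math

def normalise(v):
    """Centre a vector to sum to zero and scale it to unit length."""
    m = sum(v) / len(v)
    c = [a - m for a in v]
    s = math.sqrt(sum(a * a for a in c))
    return [a / s for a in c]

def contrasts(k):
    """Assumed contrast vectors for k bins (illustrative choices)."""
    mid = (k + 1) / 2
    idx = range(1, k + 1)
    return {
        "Linear": normalise([i - mid for i in idx]),
        "Ends": normalise([1.0 if i in (1, k) else 0.0 for i in idx]),
        "V-shape": normalise([abs(i - mid) for i in idx]),
    }

def components(counts):
    """T, its 1-d.f. components, and the (k-3)-d.f. residual terms."""
    n, k = sum(counts), len(counts)
    e = n / k
    x = [(c - e) / math.sqrt(e) for c in counts]   # standardised deviations
    out = {"T": sum(a * a for a in x)}
    for name, row in contrasts(k).items():
        u = sum(l * a for l, a in zip(row, x))
        out[name] = u * u
    # Linear is orthogonal to both Ends and V-shape, so each pair can be removed from T.
    out["Resid_1"] = out["T"] - out["Linear"] - out["Ends"]
    out["Resid_2"] = out["T"] - out["Linear"] - out["V-shape"]
    return out

def p_value_1df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))
```

Ends and V-shape are both symmetric about the middle bin and are not orthogonal to each other, which is why they appear in two separate decompositions (Resid_1 and Resid_2) rather than in one.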

Elmore example again

Elmore example: p-values
Columns: Roughly flat, Under-dispersion, Over-dispersion, Bias/trend.
Rows: T, C-von M, Linear, Ends, V-shape, Resid_1, Resid_2.
T: 0.193 (the other entries are not preserved in this transcript).

Elmore example: comments
- Decomposition is at least as powerful as Cramér-von Mises.
- V-shape does better than Ends, but note the lower Ends p-value for under-dispersion than for over-dispersion.
- Very large p-values for the residuals reflect the artificial nature of the example, where deviations from uniformity are designed to be of a specific form.

Northern hemisphere 500hPa geopotential height
Although not strictly correct, we treat these data as consisting of 420 independent observations of 14-member ensemble forecasts.

500hPa heights: p-values
Columns: T, Linear, Ends, V-shape, Resid_1, Resid_2 (values not preserved in this transcript).
- Highly significant value for T.
- Clear under-dispersion, and Ends has a much smaller p-value than V-shape.
- Linear also indicates evidence of bias.
- Resid_2 clearly shows that deviation other than Linear and V-shape is present.

Caveats
- P-values are approximate.
- Some restrictions (orthogonality) on the decomposition.
- Assumes independence of forecasts.

Virtues
- More powerful than T: may identify deviations from uniformity when T does not.
- If T does identify deviations, the decomposition can tell you the nature of these deviations.
- Easier to use and more flexible than Cramér-von Mises tests, and at least as powerful in the examples examined.

Questions?

500hPa geopotential height data: more details
- Daily 24-hr forecasts of NH 500hPa heights for winter season, created from the NCEP GEFS system.
- 14 ensemble members, hence 15 bins.
- Total number of forecasts = 290,304 = 84 days x 3456 grid points.
- Strong temporal and spatial dependence: take 25 spatial degrees of freedom and divide the number of time points by 5 (Toth), and for illustration treat the data as 420 independent observations.
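
The arithmetic behind the figure of 420 can be checked directly. The variable names below are hypothetical; the numbers are those quoted on the slide.

```python
days = 84            # time points (daily forecasts)
grid_points = 3456   # spatial points per forecast
members = 14         # ensemble members

total_forecasts = days * grid_points     # raw number of forecast/observation pairs

spatial_dof = 25     # assumed spatial degrees of freedom
time_thinning = 5    # divide the number of time points by 5

# Effective number of independent observations used for illustration.
effective_n = spatial_dof * days / time_thinning

bins = members + 1
expected_per_bin = effective_n / bins    # e_i under uniformity
```

With 420 effective observations spread over 15 bins, each bin has an expected count of 28 under uniformity.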

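Some background mathematics
Let L be a (k x k) matrix whose rows are orthonormal, with elements $l_{ri}$, and whose last row's elements are all $1/\sqrt{k}$. Let $x_i = (n_i - e_i)/\sqrt{e_i}$, $i = 1, \dots, k$. Finally let $u_r = \sum_{i=1}^{k} l_{ri} x_i$, $r = 1, \dots, k$.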

Background mathematics II
Then $T = \sum_{r=1}^{k-1} u_r^2$, and under the null hypothesis of uniformity the terms $u_r^2$, $r = 1, \dots, k-1$, are asymptotically independent χ² random variables, each with one degree of freedom. By choosing the first few rows of L appropriately, it is possible to isolate parts of T which are sensitive to particular types of deviation from uniformity, such as bias/trend and over/under-dispersion.
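
The identity behind the decomposition can be checked numerically. The sketch below assumes x_i = (n_i - e_i)/√e_i and u_r = Σ_i l_ri x_i, and uses a Helmert matrix as one convenient orthonormal L whose last row is constant. The Helmert choice is purely illustrative (it is not the matrix used for Linear, Ends or V-shape), and the function names are hypothetical.

```python
import math

def helmert(k):
    """Orthonormal (k x k) matrix whose last row has all elements 1/sqrt(k)."""
    rows = []
    for r in range(1, k):
        scale = math.sqrt(r * (r + 1))
        rows.append([1 / scale] * r + [-r / scale] + [0.0] * (k - r - 1))
    rows.append([1 / math.sqrt(k)] * k)
    return rows

def decompose(counts, L):
    """Return standardised deviations x and their projections u onto the rows of L."""
    n, k = sum(counts), len(counts)
    e = n / k
    x = [(c - e) / math.sqrt(e) for c in counts]
    u = [sum(l * a for l, a in zip(row, x)) for row in L]
    return x, u

counts = [3, 7, 4, 6, 5]           # made-up bin counts
x, u = decompose(counts, helmert(len(counts)))
T = sum(a * a for a in x)          # Pearson statistic
T_from_parts = sum(v * v for v in u[:-1])
```

Because the deviations n_i - e_i sum to zero, the projection onto the constant last row vanishes, so the first (k-1) squared projections add up to T for any orthonormal L of this form.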