Relative Operating Characteristics

Presentation transcript:

Relative Operating Characteristics Simon J. Mason simon@iri.columbia.edu International Research Institute for Climate and Society The Earth Institute of Columbia University WMO CLIPS Focal Point Training Workshop for Regional Association II (Eastern Part) Bangkok, Thailand, 17 – 27 January, 2007

Two-Alternative Forced Choice Test If we give deterministic forecasts for a multi-category outcome, accuracy scores can be misleading. For example, if we forecast for 3 equiprobable categories, the probability of a correct forecast by chance is 33%, so forecasts that are correct 45% of the time are skilful. But to many users anything less than 50% sounds hopeless!

Two-Alternative Forced Choice Test It would be useful to have a score for which 50% represents no skill regardless of the number of categories. An option is a two-alternative forced choice (2AFC) test. A 2AFC test is a test to correctly identify which of two options has a characteristic of interest. (Normally, one, and only one, of these choices would have the characteristic.)

Two-Alternative Forced Choice Test Who has the higher income: Andriy Shevchenko or David Beckham?

Two-Alternative Forced Choice Test The commonest 2AFC tests are comparisons: does A score higher (or lower) on some characteristic than B? But 2AFC tests can also involve identifying the presence of a characteristic that is present in only one of the two options …

Two-Alternative Forced Choice Test Which of these two years was a “wet” year in Lusaka, Zambia (DJF rainfall was more than the climatological upper quartile)? What is the probability of getting the answer correct? 50% (assuming that you do not have inside information about Lusaka’s rainfall history or knowledge of ENSO teleconnections).

Two-Alternative Forced Choice Test Which of these two years was a “wet” year in Lusaka, Zambia (DJF rainfall was more than the climatological upper quartile)? What is the probability of getting the answer correct now? That depends on whether we can believe the forecasts.

Two-Alternative Forced Choice Test A simple score can be defined to indicate how good we are at 2AFC tests: calculate the proportion of tests for which we gave the correct answer. One advantage of this test is that the interpretation of the test’s score is intuitive: A score of 100% indicates a perfect set of answers. A score of 0% indicates a perfectly bad set of answers. A score of 50% would be expected by guessing the answer every time. A score of more than 50% suggests we have some skill in answering the questions.

Two-Alternative Forced Choice Test Now consider the case with multiple forecasts … Retroactive forecasts of DJF rainfall for Lusaka. In which years would we expect “droughts” (driest 25%) to occur?

Two-Alternative Forced Choice Test The most sensible strategy would be to list the years in order of increasing forecast rainfall. If the forecasts are good, the “drought” years should be at the top of the list.

Two-Alternative Forced Choice Test Now perform all possible 2AFC tests for each of the dry years, pairing each dry year with each of the other years. For 1994/95 the 2AFC tests are correct for all but 1997/98. Overall, 46 out of 75 tests are correct, a score of 61%.
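The pairwise procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `two_afc_score`, the treatment of ties as half correct, and the eight years of forecasts and outcomes are all assumptions made for the example, not the Lusaka data shown on the slides.

```python
def two_afc_score(forecast_rain, is_drought):
    """Proportion of (drought, non-drought) year pairs in which the drought
    year was given the lower forecast rainfall. Ties count as half correct
    (an assumption; the slides do not say how ties are handled)."""
    dry = [f for f, d in zip(forecast_rain, is_drought) if d]
    other = [f for f, d in zip(forecast_rain, is_drought) if not d]
    correct = 0.0
    for f_dry in dry:
        for f_other in other:
            if f_dry < f_other:
                correct += 1.0
            elif f_dry == f_other:
                correct += 0.5
    return correct / (len(dry) * len(other))


# Hypothetical forecast rainfall (mm) and outcomes for eight years.
forecast_rain = [310, 450, 380, 520, 290, 610, 400, 470]
is_drought = [True, False, True, False, False, False, False, False]

score = two_afc_score(forecast_rain, is_drought)
print(f"2AFC score: {score:.2f}")
```

With these made-up numbers there are 2 drought years and 6 other years, so 12 tests in total; 10 of them are answered correctly because one non-drought year was given the lowest forecast of all.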

Relative Operating Characteristics Alternatively, calculate cumulative hit and false-alarm rates. For the first guess: Repeat for all forecasts.
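The formula for the first guess did not survive in this transcript, so the sketch below relies on the usual definitions: the hit rate is the fraction of drought years flagged so far, and the false-alarm rate is the fraction of non-drought years flagged so far. Years are flagged one at a time in order of increasing forecast rainfall. The function name `cumulative_rates` and the data are hypothetical, and forecasts are assumed to contain no ties.

```python
def cumulative_rates(forecast_rain, is_drought):
    """Return ROC points as (false_alarm_rate, hit_rate) pairs, obtained by
    warning of drought for one more year at a time, driest forecast first."""
    n_events = sum(is_drought)
    n_non_events = len(is_drought) - n_events
    # Indices of the years, ordered from lowest to highest forecast rainfall.
    order = sorted(range(len(forecast_rain)), key=lambda i: forecast_rain[i])
    points = [(0.0, 0.0)]  # no warnings issued yet
    hits = 0
    false_alarms = 0
    for i in order:
        if is_drought[i]:
            hits += 1
        else:
            false_alarms += 1
        points.append((false_alarms / n_non_events, hits / n_events))
    return points


# Hypothetical forecast rainfall (mm) and outcomes for eight years.
forecast_rain = [310, 450, 380, 520, 290, 610, 400, 470]
is_drought = [True, False, True, False, False, False, False, False]

points = cumulative_rates(forecast_rain, is_drought)
for far, hr in points:
    print(f"false-alarm rate = {far:.2f}, hit rate = {hr:.2f}")
```

Plotting the hit rate against the false-alarm rate for these points traces out the ROC curve, which always starts at (0, 0) and ends at (1, 1).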

Relative Operating Characteristics We could do the same thing for the wet years …

Relative Operating Characteristics The area beneath the curve, 0.61 for the dry years (0.77 for the wet years), is exactly the same as the 2AFC score. It gives us the probability that we will successfully discriminate a “drought” year from a non-drought year.
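The equivalence between the ROC area and the 2AFC score can be checked numerically. The sketch below computes the area by the trapezoidal rule and compares it with a direct count over all (drought, non-drought) pairs. The function names and the data are hypothetical, and the forecasts are assumed to contain no ties.

```python
def roc_points(forecast_rain, is_drought):
    """(false_alarm_rate, hit_rate) pairs, flagging the driest forecasts first."""
    n_events = sum(is_drought)
    n_non_events = len(is_drought) - n_events
    order = sorted(range(len(forecast_rain)), key=lambda i: forecast_rain[i])
    points = [(0.0, 0.0)]
    hits = 0
    false_alarms = 0
    for i in order:
        if is_drought[i]:
            hits += 1
        else:
            false_alarms += 1
        points.append((false_alarms / n_non_events, hits / n_events))
    return points


def roc_area(points):
    """Area beneath the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def two_afc_score(forecast_rain, is_drought):
    """Proportion of (drought, non-drought) pairs ranked correctly (no ties)."""
    dry = [f for f, d in zip(forecast_rain, is_drought) if d]
    other = [f for f, d in zip(forecast_rain, is_drought) if not d]
    correct = sum(1 for a in dry for b in other if a < b)
    return correct / (len(dry) * len(other))


# Hypothetical forecast rainfall (mm) and outcomes for eight years.
forecast_rain = [310, 450, 380, 520, 290, 610, 400, 470]
is_drought = [True, False, True, False, False, False, False, False]

area = roc_area(roc_points(forecast_rain, is_drought))
score = two_afc_score(forecast_rain, is_drought)
print(f"ROC area = {area:.4f}, 2AFC score = {score:.4f}")
```

Both numbers agree because each horizontal step of the curve adds one strip of area for every drought year already flagged, which is the same as counting the correctly ordered pairs.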