Judging the credibility of climate projections

Presentation transcript:

Judging the credibility of climate projections. Chris Ferro (University of Exeter), Tom Fricker, Fredi Otto, Emma Suckling.
This session is about evaluating the skill and relevance of climate projections. Anyone who has to decide how to use climate projections needs to make such an evaluation: how good do I expect the projections to be? One might rely on the judgment of experts, but those judgments can be insufficiently informative or contradictory. For example, the majority of climate scientists have 'considerable confidence that climate models provide credible quantitative estimates of future climate change', appealing to evidence that projections are constrained by physical principles (e.g. physical realism and agreement between climate models). Not all climate projections are issued with a measure of expected error, however, so how should they be incorporated into decisions? Worse, some authors argue that there is little justification for the credibility of projections in most cases. In this talk I encourage a more explicit use of empirical evidence to provide justified, quantitative assessments of how good we expect climate projections to be. I start by explaining what I mean by judging credibility, then mention one approach to justifying such judgments that does not work for climate prediction, and then describe and illustrate a new approach that I hope to see adopted more widely. Such an approach should provide more useful information for those deciding how to use projections. Uncertainty in Weather, Climate and Impacts (13 March 2013, Royal Society, London).

Credibility and performance. Many factors (e.g. the reputation of the predictor) may influence credibility judgments, but they should do so if and only if they affect our expectations about the performance of the predictions. We therefore identify credibility with predicted performance. We must be able to justify and (roughly) quantify our predictions of performance if they are to be useful: qualitative statements such as 'fit for purpose' are insufficient unless they can be backed up with quantitative evidence.

Performance-based arguments. Extrapolate past performance on the basis of knowledge of the climate model and the real climate (Parker 2010). Define a reference class of predictions (including the prediction in question) whose performances you cannot reasonably order in advance, measure the performance of some members of the class, and infer the performance of the prediction in question. Reference classes are defined on the basis of characteristics of the prediction problems; for example, the performance of a class of past weather forecasts is taken as representative of future performance. This is popular for weather forecasts, where there are many similar forecasts, but of less use for climate predictions (Frame et al. 2007): for past climate predictions the boundary conditions differ from those specified for the future, there are few out-of-sample cases, and hindcasts are potentially tuned.

Climate predictions. Few past predictions are similar to future predictions, so performance-based arguments are weak for climate. Other data may still be useful: short-range predictions, in-sample hindcasts, imperfect model experiments, etc. These data are used by climate scientists, but typically to make qualitative judgments about performance. We propose to use these data explicitly to make quantitative judgments about future performance. Some projections do provide a quantitative assessment of uncertainty (e.g. some AR4 projections and UKCP09), but even in those cases expert judgment needs to be based on evidence, and our approach can inform those judgments.

Bounding arguments. 1. Form a reference class of predictions that does not contain the prediction in question. 2. Judge whether the prediction problem in question is harder or easier than those in the reference class. 3. Measure the performance of some members of the reference class. This provides a bound for your expectations about the performance of the prediction in question (e.g. a bounding reference class of hindcasts). Since the bounding class need not contain the prediction in question, there is more scope for forming such classes. Judge 'harder' or 'easier' on the basis of characteristics of the predictions, resolving conflicting evidence; for climate the future prediction will usually be harder.

Bounding arguments. Let S be the performance of a prediction from the reference class C, and S′ the performance of the prediction in question, from class C′. Let performance be positive, with smaller values better. Infer the probabilities Pr(S > s) from a sample from class C. If C′ is harder than C then Pr(S′ > s) > Pr(S > s) for all s; if C′ is easier than C then Pr(S′ > s) < Pr(S > s) for all s. We may not judge such a simple ordering (e.g. it may change with s), but the ordering need not be perfect to be a useful approximation.
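A minimal sketch in Python (with hypothetical error values, not the data from the talk) of how an empirical exceedance curve estimated from the reference class C bounds expectations about the prediction in question:

    import numpy as np

    def exceedance(errors, thresholds):
        """Estimate Pr(S > s): the proportion of errors exceeding each threshold s."""
        errors = np.asarray(errors)
        return np.array([(errors > s).mean() for s in thresholds])

    # Hypothetical absolute errors (degrees C) from a reference class of hindcasts.
    reference_errors = np.array([0.05, 0.08, 0.12, 0.15, 0.22, 0.30, 0.41])
    thresholds = np.linspace(0.0, 0.5, 11)
    p_ref = exceedance(reference_errors, thresholds)

    # If the prediction in question is judged harder than the reference class,
    # the reference curve is a lower bound: Pr(S' > s) >= Pr(S > s) for all s.
    for s, p in zip(thresholds, p_ref):
        print(f"Pr(error > {s:.2f}) >= {p:.2f}")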

Hindcast example. Global mean, annual mean surface air temperatures. Initial-condition ensembles of HadCM3 launched every year from 1960 to 2000. Measure performance by the absolute errors and consider a lead time of 10 years. 1. Perfect model: try to predict ensemble member 1. 2. Imperfect model: try to predict the CNRM-CM5 model. 3. Reality: try to predict the HadCRUT3 observations. This is not a definitive analysis of performance, just an illustration: no bias correction is applied (similar results are obtained with anomalies), and there are many other ways to measure performance (see other work).
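A sketch of the error calculation, assuming hypothetical arrays hindcasts[start, member, lead] and targets[start, lead] of global, annual mean temperatures; random numbers stand in here for the real HadCM3 output and verifying series:

    import numpy as np

    rng = np.random.default_rng(0)
    n_starts, n_members, n_leads = 41, 4, 10    # launches every year, 1960 to 2000
    hindcasts = rng.normal(14.0, 0.3, (n_starts, n_members, n_leads))
    targets = rng.normal(14.0, 0.3, (n_starts, n_leads))  # member 1, CNRM-CM5 or HadCRUT3

    lead = 9                                    # index of the 10-year lead time
    ensemble_mean = hindcasts[:, :, lead].mean(axis=1)
    abs_errors = np.abs(ensemble_mean - targets[:, lead])

    # Proportion of errors above a threshold, as read off the exceedance curves.
    print((abs_errors > 0.2).mean())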

Hindcast example [figure].

1. Errors when predicting HadCM3. The figure is like a probability-of-exceedance curve: it shows the proportion of prediction errors above different thresholds, e.g. 20% above 0.2 degrees. Do you expect the errors to tend to be larger, similar or smaller when we try to predict the Meteo-France (CNRM-CM5) model?

2. Errors when predicting CNRM-CM5. They are all larger. Do you expect the errors to tend to be larger still, similar or smaller when we try to predict reality?

3. Errors when predicting reality. They are smaller than the imperfect-model errors, which is surprising and shows the difficulty of making these judgments. If we expected larger errors then at least we will not be unpleasantly surprised; we may have employed a precautionary approach. Testing ourselves in this way may help us to become better judges of future performance.

Recommendations. Use existing data explicitly to justify quantitative predictions of the performance of climate projections. Collect data on more predictions, covering a range of physical processes and conditions, to tighten the bounds and widen the circumstances in which we know something about performance. Design hindcasts and imperfect-model experiments to be as similar as possible to future prediction problems, by cross-validating and avoiding over-tuning. Train ourselves to be better judges of relative performance, especially to avoid over-confidence; coming across surprisingly good or bad performance can be a useful eye-opener.

References
Otto FEL, Ferro CAT, Fricker TE, Suckling EB (2012) On judging the credibility of climate projections. Climatic Change, submitted.
Allen M, Frame D, Kettleborough J, Stainforth D (2006) Model error in weather and climate forecasting. In: Predictability of Weather and Climate (eds T Palmer, R Hagedorn), Cambridge University Press, 391-427.
Frame DJ, Faull NE, Joshi MM, Allen MR (2007) Probabilistic climate forecasts and inductive problems. Phil. Trans. Roy. Soc. A 365, 1971-1992.
Knutti R (2008) Should we believe model predictions of future climate change? Phil. Trans. Roy. Soc. A 366, 4647-4664.
Parker WS (2010) Predicting weather and climate: uncertainty, ensembles and probability. Stud. Hist. Philos. Mod. Phys. 41, 263-272.
Parker WS (2011) When climate models agree: the significance of robust model predictions. Philos. Sci. 78, 579-600.
Smith LA (2002) What might we learn from climate forecasts? Proc. Natl. Acad. Sci. 99, 2487-2492.
Stainforth DA, Allen MR, Tredger ER, Smith LA (2007) Confidence, uncertainty and decision-support relevance in climate predictions. Phil. Trans. Roy. Soc. A 365, 2145-2161.


Future developments. Bounding arguments may help us to form fully probabilistic judgments about performance. Let s = (s1, ..., sn) be a sample from S ~ F(∙|p), and let S′ ~ F(∙|cp) with priors p ~ g(∙) and c ~ h(∙). Then Pr(S′ ≤ s | s) = ∫∫ F(s|cp) h(c) g(p|s) dc dp. Bounding arguments refer to prior beliefs about S′ directly rather than indirectly through beliefs about c. Cf. discrepancy modelling.
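A rough Monte Carlo sketch of that integral in Python, under assumed exponential performance distributions and log-normal priors (none of which come from the talk): sample (p, c) from the priors, weight each p by the likelihood of the observed sample s, and average F(s | cp):

    import numpy as np

    rng = np.random.default_rng(1)
    s_obs = np.array([0.05, 0.08, 0.12, 0.15, 0.22, 0.30, 0.41])  # sample from class C

    # Assumed model: S | p ~ Exponential(mean p), S' | c, p ~ Exponential(mean c*p),
    # with priors p ~ log-normal and c ~ log-normal (c > 1 encodes 'harder than C').
    n = 100_000
    p = rng.lognormal(mean=np.log(0.2), sigma=0.5, size=n)   # draws from g(p)
    c = rng.lognormal(mean=np.log(1.5), sigma=0.3, size=n)   # draws from h(c)

    # Importance weights proportional to the likelihood of s under p, so that
    # weighted averages over (p, c) target g(p | s) h(c).
    loglik = -len(s_obs) * np.log(p) - s_obs.sum() / p
    w = np.exp(loglik - loglik.max())
    w /= w.sum()

    def prob_le(s):
        """Estimate Pr(S' <= s | s) = integral of F(s|cp) h(c) g(p|s) dc dp."""
        return float(np.sum(w * (1.0 - np.exp(-s / (c * p)))))

    for s in (0.1, 0.2, 0.4):
        print(f"Pr(S' <= {s}) ~ {prob_le(s):.2f}")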

Predicting performance. We might try to predict performance by forming our own prediction of the predictand. But if we incorporate information about the prediction in question then we must already have judged its credibility, and if we do not then we ignore relevant information. Consider predicting a coin toss. Our own prediction is Pr(head) = 0.5, so our prediction of the performance of another prediction is bound to be Pr(correct) = 0.5, regardless of any other information about that prediction. Sometimes forming our own prediction is equivalent to predicting performance (e.g. a probabilistic prediction for the error of a deterministic forecast), but more generally we must predict performance directly.
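Spelled out for the coin toss: if the other prediction says 'head' with some probability q, then under our own Pr(head) = 0.5 its chance of being correct is Pr(correct) = q × 0.5 + (1 − q) × 0.5 = 0.5, whatever q is, so our own prediction tells us nothing about the other prediction's performance.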

1. Perfect model errors. Use just members 2 and 3 at the 10-year lead time and note the similarities (performance-based argument). 1. Performance-based argument: expect member 3 to perform similarly to member 2 at each lead time, and it does. 2. Bounding argument: might expect 10-year hindcasts to perform worse than 2-year hindcasts, but they are similar. 3. Bounding argument: might expect imperfect-model hindcasts to perform worse than perfect-model hindcasts: see next slide. Ask the audience!
