1
Judging the credibility of climate projections
Chris Ferro (University of Exeter), Tom Fricker, Fredi Otto, Emma Suckling. This session is about evaluating the skill and relevance of climate projections. Anyone who has to decide how to use climate projections needs to make such an evaluation: how good do I expect the projections to be? One might rely on the judgments of experts, but these can be frustratingly uninformative or contradictory. For example, the majority of climate scientists have ‘considerable confidence that climate models provide credible quantitative estimates of future climate change’, appealing to evidence that projections are constrained by physical principles (e.g. physical realism and agreement between climate models). Not all climate projections are issued with a measure of expected error, however, so how should they be incorporated into decisions? Worse, some authors argue that there is little justification for the credibility of projections in most cases. In this talk I encourage a more explicit use of empirical evidence to provide justified, quantitative assessments of how good we expect climate projections to be. I start by explaining what I mean by judging credibility, then mention one approach to justifying such judgments that does not work for climate prediction, and then describe and illustrate a new approach that I hope will be adopted more widely. I hope this approach will provide more useful information for those deciding how to use projections. Uncertainty in Weather, Climate and Impacts (13 March 2013, Royal Society, London)
2
Credibility and performance
Many factors (e.g. the reputation of the predictor) may influence credibility judgments, but they should do so if and only if they affect our expectations about the performance of the predictions. We therefore identify credibility with predicted performance. We must be able to justify and (at least roughly) quantify our predictions of performance if they are to be useful: qualitative statements such as ‘fit for purpose’ are insufficient unless they can be backed up with quantitative evidence.
3
Performance-based arguments
Extrapolate past performance on the basis of knowledge of the climate model and the real climate (Parker 2010). Define a reference class of predictions (including the prediction in question) whose performances you cannot reasonably order in advance, measure the performance of some members of the class, and infer the performance of the prediction in question. Reference classes are defined on the basis of characteristics of the prediction problems. This approach is popular for weather forecasts, where many similar forecasts exist and the performance of a class of past forecasts can be taken as representative of future performance, but it is of less use for climate predictions (Frame et al. 2007): past climate predictions have boundary conditions different to those specified for the future, there are few out-of-sample cases, and hindcasts are potentially tuned.
4
Climate predictions
Few past predictions are similar to future predictions, so performance-based arguments are weak for climate. Other data may still be useful: short-range predictions, in-sample hindcasts, imperfect model experiments, etc. These data are already used by climate scientists, but typically to make qualitative judgments about performance. We propose to use these data explicitly to make quantitative judgments about future performance. Some projections do provide a quantitative assessment of uncertainty (e.g. some AR4 projections and UKCP09), but even in those cases the expert judgment needs to be based on evidence, and our approach can inform those judgments.
5
Bounding arguments
1. Form a reference class of predictions that does not contain the prediction in question.
2. Judge whether the prediction problem in question is harder or easier than those in the reference class.
3. Measure the performance of some members of the reference class.
This provides a bound for your expectations about the performance of the prediction in question (e.g. a bounding reference class of hindcasts). Since the bounding class need not contain the prediction in question, there is more scope for forming such classes. We judge ‘harder’ or ‘easier’ on the basis of the characteristics of the predictions, resolving conflicting evidence where necessary; the problem in question will usually be harder for climate.
6
Bounding arguments
Let S be the performance of a prediction from reference class C, and let S′ be the performance of the prediction in question, from class C′. Let performance be positive, with smaller values better. Infer the probabilities Pr(S > s) from a sample from class C.
If C′ is harder than C, then Pr(S′ > s) > Pr(S > s) for all s.
If C′ is easier than C, then Pr(S′ > s) < Pr(S > s) for all s.
We may not judge such a simple ordering (e.g. it may change with s), but no judgment can be perfect and this is at least a useful approximation.
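A minimal sketch (in Python, with made-up error values) of how the bounding argument might be applied in practice: estimate Pr(S > s) empirically from a sample of reference-class scores, then read the curve as a lower bound on Pr(S′ > s) when the problem in question is judged harder. The data and function name are illustrative, not from the talk.

```python
import numpy as np

def exceedance_curve(scores, thresholds):
    """Empirical estimate of Pr(S > s) for each threshold s."""
    scores = np.asarray(scores, dtype=float)
    return np.array([(scores > s).mean() for s in thresholds])

# Hypothetical absolute errors (degrees C) from a reference class C,
# e.g. a set of hindcasts whose performance we can measure.
reference_errors = np.array([0.05, 0.08, 0.11, 0.14, 0.20, 0.27, 0.35])

thresholds = np.linspace(0.0, 0.5, 11)
p_reference = exceedance_curve(reference_errors, thresholds)

# Bounding argument: if the prediction problem in question (class C')
# is judged HARDER than C, then Pr(S' > s) >= Pr(S > s) for all s,
# so the reference curve is a lower bound on the exceedance
# probabilities of the prediction in question.
for s, p in zip(thresholds, p_reference):
    print(f"Pr(error > {s:.2f}) >= {p:.2f}  (lower bound if C' is harder than C)")
```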
7
Hindcast example
Global mean, annual mean surface air temperatures. Initial-condition ensembles of HadCM3 were launched every year from 1960. Performance is measured by absolute errors at a lead time of 10 years, for three target cases (a sketch of the calculation follows below):
1. Perfect model: try to predict ensemble member 1.
2. Imperfect model: try to predict the CNRM-CM5 model.
3. Reality: try to predict the HadCRUT3 observations.
This is not a definitive analysis of performance, just an illustration. No bias correction is applied; results are similar if anomalies are used. There are many other ways to measure performance; see other work.
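A sketch of how absolute errors at a 10-year lead time could be collected for the three target cases. The arrays here are synthetic stand-ins for the HadCM3 ensemble, the CNRM-CM5 series and the HadCRUT3 observations; the array layout and sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_starts, n_members, max_lead = 40, 4, 10   # hypothetical hindcast layout

# Hypothetical data: hindcasts[start, member, lead] = predicted global mean
# temperature; targets[case][start, lead] = the series we try to predict.
hindcasts = rng.normal(0.0, 0.1, size=(n_starts, n_members, max_lead))
targets = {
    "perfect model (member 1)": hindcasts[:, 0, :],            # predict member 1
    "imperfect model (CNRM-CM5)": rng.normal(0.0, 0.15, (n_starts, max_lead)),
    "reality (HadCRUT3)": rng.normal(0.0, 0.12, (n_starts, max_lead)),
}

lead = 10  # years
# Ensemble mean of the remaining members at the chosen lead time.
prediction = hindcasts[:, 1:, lead - 1].mean(axis=1)

for case, target in targets.items():
    abs_errors = np.abs(prediction - target[:, lead - 1])
    print(f"{case}: mean absolute error = {abs_errors.mean():.3f}")
```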
8
Hindcast example
9
1. Errors when predicting HadCM3
The figure is like a probability-of-exceedance curve: it shows the proportion of prediction errors above different thresholds, e.g. 20% of errors are above 0.2 degrees. Do you expect the errors to be larger, similar or smaller when we try to predict the Meteo-France model (CNRM-CM5)?
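A sketch of how such exceedance curves might be plotted and compared for the three target cases. The error samples here are synthetic placeholders (drawn from arbitrary gamma distributions), not the hindcast results shown on the slides.

```python
import numpy as np
import matplotlib.pyplot as plt

def exceedance(errors, thresholds):
    """Proportion of prediction errors exceeding each threshold."""
    errors = np.asarray(errors, dtype=float)
    return np.array([(errors > t).mean() for t in thresholds])

# Hypothetical absolute errors (degrees C) for the three target cases.
error_sets = {
    "predict HadCM3 (perfect model)": np.random.default_rng(1).gamma(2.0, 0.05, 40),
    "predict CNRM-CM5 (imperfect model)": np.random.default_rng(2).gamma(2.0, 0.09, 40),
    "predict HadCRUT3 (reality)": np.random.default_rng(3).gamma(2.0, 0.07, 40),
}

thresholds = np.linspace(0.0, 0.6, 61)
for label, errs in error_sets.items():
    plt.plot(thresholds, exceedance(errs, thresholds), label=label)

plt.xlabel("absolute error threshold (degrees C)")
plt.ylabel("proportion of errors above threshold")
plt.legend()
plt.show()
```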
10
2. Errors when predicting CNRM-CM5
They are all larger. Do we expect the errors to be larger still when we predict reality, or similar, or smaller?
11
3. Errors when predicting reality
They are smaller than the imperfect-model errors, which is surprising and shows the difficulty of making these judgments. If we had expected larger errors then at least we would not be unpleasantly surprised: we may have employed a precautionary approach. Testing ourselves in this way may help us to become better judges of future performance.
12
Recommendations
Use existing data explicitly to justify quantitative predictions of the performance of climate projections.
Collect data on more predictions, covering a range of physical processes and conditions, to tighten the bounds and to widen the circumstances in which we know something about performance.
Design hindcasts and imperfect model experiments to be as similar as possible to future prediction problems, by cross-validating and avoiding over-tuning (see the sketch after this list).
Train ourselves to be better judges of relative performance, especially to avoid over-confidence; coming across surprisingly good or bad performance can be a useful eye-opener.
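A minimal sketch of the cross-validation idea: any adjustment (here a simple mean-bias correction, chosen purely for illustration) is estimated from the other start dates, so the case being verified stays out of sample and the performance estimate is not flattered by over-tuning. The function name and data are hypothetical.

```python
import numpy as np

def loo_errors(predictions, observations):
    """Leave-one-out absolute errors: the bias correction applied to each
    start date is estimated from all OTHER start dates, keeping the
    verification case out of sample."""
    predictions = np.asarray(predictions, dtype=float)
    observations = np.asarray(observations, dtype=float)
    n = len(predictions)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        bias = (predictions[keep] - observations[keep]).mean()
        errors[i] = abs((predictions[i] - bias) - observations[i])
    return errors

# Hypothetical hindcast and observation series (one value per start date).
rng = np.random.default_rng(0)
obs = rng.normal(0.0, 0.1, 30)
hind = obs + 0.05 + rng.normal(0.0, 0.08, 30)   # biased, noisy hindcasts

print("cross-validated mean absolute error:", loo_errors(hind, obs).mean())
```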
13
References
Otto FEL, Ferro CAT, Fricker TE, Suckling EB (2012) On judging the credibility of climate projections. Climatic Change, submitted.
Allen M, Frame D, Kettleborough J, Stainforth D (2006) Model error in weather and climate forecasting. In Predictability of Weather and Climate (eds T Palmer, R Hagedorn), Cambridge University Press.
Frame DJ, Faull NE, Joshi MM, Allen MR (2007) Probabilistic climate forecasts and inductive problems. Phil. Trans. Roy. Soc. A 365.
Knutti R (2008) Should we believe model predictions of future climate change? Phil. Trans. Roy. Soc. A 366.
Parker WS (2010) Predicting weather and climate: uncertainty, ensembles and probability. Stud. Hist. Philos. Mod. Phys. 41.
Parker WS (2011) When climate models agree: the significance of robust model predictions. Philos. Sci. 78.
Smith LA (2002) What might we learn from climate forecasts? Proc. Natl. Acad. Sci. 99.
Stainforth DA, Allen MR, Tredger ER, Smith LA (2007) Confidence, uncertainty and decision-support relevance in climate predictions. Phil. Trans. Roy. Soc. A 365.
15
Future developments
Bounding arguments may help us to form fully probabilistic judgments about performance. Let s = (s_1, ..., s_n) be a sample from S ~ F(·|p), and let S′ ~ F(·|cp) with priors p ~ g(·) and c ~ h(·). Then
Pr(S′ ≤ s | s) = ∫∫ F(s|cp) h(c) g(p|s) dc dp.
Bounding arguments refer to prior beliefs about S′ directly, rather than indirectly through beliefs about c. Cf. discrepancy modelling.
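A Monte Carlo sketch of the integral above under illustrative assumptions that are not from the talk: scores are taken to be exponential with mean p, so F(s|p) = 1 − exp(−s/p); a conjugate Gamma prior on the rate 1/p gives an easy posterior g(p|s); and h(·) makes the problem in question one to two times harder, so S′ has mean cp. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample of reference-class scores s = (s_1, ..., s_n).
s_sample = np.array([0.05, 0.08, 0.11, 0.14, 0.20, 0.27, 0.35])
n, total = len(s_sample), s_sample.sum()

# Assumed model: S | p ~ Exponential(mean p), Gamma(a0, rate b0) prior
# on the rate 1/p, so the posterior on the rate is Gamma(a0 + n, b0 + sum s).
a0, b0 = 2.0, 0.2
rate = rng.gamma(a0 + n, 1.0 / (b0 + total), size=100_000)   # posterior draws of 1/p
p = 1.0 / rate                                               # posterior draws of p
c = rng.uniform(1.0, 2.0, size=p.size)                       # prior h(.): 1-2x harder

# Monte Carlo estimate of Pr(S' <= s* | s) = integral of F(s*|cp) h(c) g(p|s) dc dp,
# using F(s*|cp) = 1 - exp(-s*/(c*p)) for the assumed exponential scores.
s_star = 0.3
prob = np.mean(1.0 - np.exp(-s_star / (c * p)))
print(f"Pr(S' <= {s_star}) ~= {prob:.2f}")
```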
16
Predicting performance
We might try to predict performance by forming our own prediction of the predictand. But if we incorporate information about the prediction in question then we must already have judged its credibility, and if we do not then we ignore relevant information. Consider predicting a coin toss: if our own prediction is Pr(head) = 0.5, then our prediction of the performance of another prediction is bound to be Pr(correct) = 0.5, regardless of any other information about that prediction. Sometimes forming our own prediction is equivalent to predicting performance (e.g. a probabilistic prediction for the error of a deterministic forecast), but more generally we must predict performance directly.
17
1. Perfect model errors
Use just members 2 and 3 at the 10-year lead time and note the similarities (a performance-based argument).
1. Performance-based argument: we expect member 3 to perform similarly to member 2 at each lead time, and it does!
2. Bounding argument: we might expect 10-year hindcasts to perform worse than 2-year hindcasts, but they are similar.
3. Bounding argument: we might expect imperfect-model hindcasts to perform worse than perfect-model hindcasts: see the next slide. Ask the audience!