1
Evaluating RCM Experiments
RCM Workshop Meteo Rwanda, Kigali 17th – 20th July 2017
2
Objectives of the session
Understand the context for RCM evaluation. Identify the main components of model evaluation. Discuss different evaluation techniques and aspects to consider. Practice RCM evaluation. Go through the objectives. Also, say upfront that the words validation and evaluation may be used interchangeably in this talk – in practice there is little difference, but in theory a validation process, which explores whether a model is empirically adequate (i.e. does it reproduce the observed behaviour it was designed to reproduce), is part of a broader evaluation of the model's reliability and usefulness in practical application.
3
What should we expect from a model?
“All models are wrong, but some are useful” – a famous quote from the statistician George Box that is worth keeping in mind throughout the process of evaluating and using model output. What we want to know is: “does our model provide a reasonable representation of the behaviour of interest for the system in question?” “All models are wrong, but some are useful” George Box, 1987 © Crown copyright Met Office
4
What should we expect from a model?
“For numerical weather prediction, skill is relatively well defined because forecasts can be verified on a daily basis. For a climate model, it is more difficult to define a unique overall figure of merit, metric or skill score for long-term projections. Each model tends to simulate some aspects of the climate system well and others not so well, and each model has its own set of strengths and weaknesses. We do not need a perfect model, just one that serves the purpose.” R. Knutti (2008) Should we believe model predictions of future climate change? Reto Knutti has done a lot of work in regional climate modelling and extends the point that climate model forecasts cannot be verified. Predictive “skill” is therefore not a good measure for assessing a climate model that produces multi-decadal projections – we do not expect a future climate projection to have “deterministic” skill in the way a weather forecast does. Also, and perhaps more importantly, all models have strengths and weaknesses, so a “best” model will usually be elusive. The model does not have to be perfect, just empirically adequate – we might not need to account for everything as long as all first-order issues are sufficiently well represented in the model. © Crown copyright Met Office
5
What is a model evaluation?
An assessment of how well the model is able to simulate the “present day” climate. A model evaluation is not a model verification. Why is it important? It enables you to gain familiarity with the model characteristics. It indicates which aspects of the model simulation are most credible… and therefore indicates how to make the best, most credible, use of the data to answer relevant questions. Some more concrete statements about what a model evaluation is and why it is important. Say that simulating the “present day” climate is often the target for the evaluation, but it can be challenging since climate change is a continuous process and the climate in some locations is definitely non-stationary, making it hard to validate climatological statistics. Emphasise the final point, reiterated at the end of the session: only once a model evaluation has been conducted can we know whether the model output has any value for answering scientific or societally relevant questions. © Crown copyright Met Office
6
How to undertake a model evaluation
4 stages, to be explained and worked through... © Crown copyright Met Office
7
The model evaluation process:
1) Identify the target and purpose of the evaluation 2) Obtain multiple sources of observed data to evaluate model performance 3) Assess the errors and biases in the GCMs that provide the LBCs for the RCM 4) Evaluate the RCM, acknowledging the multiple sources of uncertainty (this splits into four types of model evaluation). Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. The next part of the session describes each of these stages in more detail. Say that this is not meant to be prescriptive, but rather that it is important to have a clear idea of why you are evaluating the model. © Crown copyright Met Office
8
Stages of model evaluation:
Identify the target and purpose of the evaluation Stages of model evaluation: Identify target and purpose Obtain multiple sources of observed data Assess errors and biases in the GCM Evaluate RCM Imagine you are interested in the impacts of flooding in the future in Mumbai. One of the major contributors to heavy rainfall is the variability of the intensity of the Indian Monsoon. To understand how the monsoon rains may change in the future, we would want a model that accurately represents the monsoon atmospheric circulation. This would require us to analyse the wind flow during the monsoon season, comparing the model data to available observations. So the questions you need to ask yourself are: What climate processes are key to understanding climate variability/change in the focus region? What variables (e.g. temperature, precipitation, humidity) need to be analysed to understand the ability of a model to represent those processes? What aspects of the climate system are of most interest? © Crown copyright Met Office
9
Stages of model evaluation:
Identify the target and purpose of the evaluation Stages of model evaluation: Identify target and purpose Obtain multiple sources of observed data Assess errors and biases in the GCM Evaluate RCM Now imagine you are advising the health minister of Russia and are interested in... Using this case study (simulating heatwave trends in Russia), ask: Are you interested in extreme or rare events, or multi-year averages? Does the model need to provide accurate data at a specific spatial scale? What time and space scales are of interest? © Crown copyright Met Office
10
Stages of model evaluation:
Identify the target and purpose of the evaluation Stages of model evaluation: Identify target and purpose Obtain multiple sources of observed data Assess errors and biases in the GCM Evaluate RCM What aspects of the climate system are of most interest? What climate processes are key to understanding climate variability/change in the focus region? What variables (e.g. temperature, precipitation, humidity) are of most interest? What time and space scales are of interest? Are you interested in extreme or rare events, or multi-year averages? Does the model need to provide accurate data at a specific spatial scale? The main point here is to know what it is that you want to be able to say at the end of the evaluation. What is the “target” – e.g. representation of monsoon rainfall in India; ability to simulate heatwave trends in Russia; ability to predict changes in maximum summer temperatures in North Africa? This will affect the process and focus of the evaluation. The slide poses some questions that are necessary to ask when establishing the focus of the evaluation. It is impossible to evaluate everything about the model and its output, so make sure you focus on the most important issues. © Crown copyright Met Office
11
Station data vs Gridded data
David and Victoria Beckham have just purchased a £27 million country house in England. David wants to know if he needs to adapt his house for climate change, so he runs PRECIS over the UK. When the model run finishes he extracts the grid box his house is in and starts performing analysis on the data for that grid box. What is wrong with his method? Very important message, but introduce it carefully (it could be confusing) – compare oranges with oranges (so to speak). This is especially important when comparing indices (like dry spells, wet days in a year, etc.) as they are often calculated differently. Also, the point about interpolating/aggregating to a common coarsest grid is important. The figure on the right (acknowledging it is too small to see the numbers) shows the 30-year rainfall return period over the US calculated in different ways – see Chen (2008) for precise details; yellow/red indicates higher rainfall and blue lower rainfall for the 30-year return period. On the left, the actual precipitation data are aggregated at different spatial scales and the index is calculated from the aggregated data. On the right, the index is calculated from the data first and then aggregated at different spatial scales – hence Average(Index) ≠ Index(Average). See the sketch below for a worked illustration.
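A minimal Python sketch (illustrative random numbers only, not real data) of why Average(Index) ≠ Index(Average): counting wet days at individual stations and then averaging the counts gives a different answer from averaging the rainfall to a grid-box value first and then counting wet days.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical daily rainfall (mm) at 4 stations inside one grid box, 90 days
rain = rng.gamma(shape=0.5, scale=4.0, size=(90, 4))

wet_threshold = 1.0  # mm/day

# Index first, then average: wet-day count at each station, then averaged
wet_days_per_station = (rain >= wet_threshold).sum(axis=0)
average_of_index = wet_days_per_station.mean()

# Average first, then index: grid-box mean rainfall, then count wet days
gridbox_rain = rain.mean(axis=1)
index_of_average = (gridbox_rain >= wet_threshold).sum()

print(f"Average of station wet-day counts: {average_of_index:.1f}")
print(f"Wet-day count of grid-box average: {index_of_average}")
# The two numbers generally differ: Average(Index) != Index(Average)
```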
12
Station data vs. Gridded data
Compare like with like. Data only have skill at spatial scales resolved by their grids. Grid-box data values are an area average over the full grid-box area. They are not point data and are therefore not directly comparable with a single-point time series. © Crown copyright Met Office
13
Stages of model evaluation:
Individual station vs. area averages. Area averages. Station data. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. Exeter. Message: this demonstrates an important point – within a single grid box the average value can be very different from the individual stations in the box, with larger differences for precipitation (above). Station temperature is often consistently higher or lower than the area average, whilst precipitation may differ day to day. The area average is vastly different from the station data. The take-home message: don't just look at one station and compare it to an area average, as the properties of individual stations differ from those of an area average (see the sketch below). DO NOT COMPARE AREA-AVERAGED RESULTS (e.g. grid-box output) TO SINGLE-STATION DATA – THEY WON'T MATCH. © Crown copyright Met Office
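A tiny illustrative sketch of the point above (the gauge values are made up): a model grid-box value is an area average, so if it is compared with anything it should be the mean of all gauges in the box, not a single gauge.

```python
import numpy as np

# Hypothetical daily precipitation (mm) from 6 rain gauges inside one
# ~50 km model grid box, plus the model's grid-box value for the same day.
station_precip = np.array([0.0, 0.2, 3.1, 12.4, 0.0, 1.8])
model_gridbox_precip = 2.5  # area-average value simulated by the RCM

# Compare the grid-box value with the *mean* of the gauges, not with any
# single gauge (which may be much wetter or drier than the box average).
station_area_average = station_precip.mean()

print(f"Individual gauges:    {station_precip}")
print(f"Gauge area average:   {station_area_average:.2f} mm")
print(f"Model grid-box value: {model_gridbox_precip:.2f} mm")
```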
14
Stages of model evaluation:
Choice of observed data – use as many relevant observed datasets as possible, and ensure you are comparing ‘like for like’ data. Gridded datasets: observed datasets – e.g. CRU (land surface), TRMM (satellite rainfall), GPCP (merged rain gauge and satellite rainfall); reanalysis data – e.g. ERA-Interim (atmosphere). Station data: use with caution! It can be useful to compare directly to model output, but be aware of differences in spatial scale; ultimately one would not expect the data to match. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. This is an often overlooked issue. There is no “correct” observed dataset to use – all have different strengths, weaknesses and assumptions. This slide distinguishes between gridded and station data – the main difference is that gridded data aggregate information to the scale of the grid. Issues associated with this point will be illustrated with examples later (see the regridding sketch below). Another “like for like” point is that comparisons must be made using the same time periods – e.g. comparing a 1971 to 2000 model mean with a 1971 to 2000 observed mean. A reanalysis dataset is produced from a climate model simulation that incorporates observed data from different sources. Reanalysis data are useful because gridded data from observations are likely to be unreliable in places such as central Africa where observations are sparse. © Crown copyright Met Office
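A minimal sketch of the “like for like” idea using xarray. The file and variable names are assumptions for illustration; the point is that the finer-resolution field is brought onto the coarser grid and both are restricted to the same period before any differences are computed.

```python
import xarray as xr

# Hypothetical file and variable names; any gridded NetCDF data would do.
model = xr.open_dataset("rcm_precip_monthly.nc")["pr"]     # ~25 km grid
obs = xr.open_dataset("gpcp_precip_monthly.nc")["precip"]  # ~2.5 deg grid

# Compare like with like: put the fine-resolution model data onto the coarser
# observation grid (bilinear here; conservative regridding, e.g. with xesmf,
# is preferable for precipitation). Assumes both use the same lat/lon names.
model_on_obs_grid = model.interp_like(obs, method="linear")

# Restrict both to a common period before averaging
model_clim = model_on_obs_grid.sel(time=slice("1981", "2010")).mean("time")
obs_clim = obs.sel(time=slice("1981", "2010")).mean("time")

bias = model_clim - obs_clim
print(bias)
```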
15
Observed precipitation over the Alps
Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. CRU 3.0 dataset, resolution 0.5˚ x 0.5˚. Frei and Schaer Alpine analysis, resolution 0.3˚ x 0.22˚. An example of two observational gridded datasets of precipitation over the Alpine area. The higher resolution of the dataset in the bottom plot provides more detail over the Alps than the dataset shown in the top plot. Additionally, the methodologies used to produce each gridded dataset from the source data (the actual meteorological observation stations) are different. The methodologies may be optimised according to the properties of the area of interest. For example, the methodology for the Alpine dataset needs to cater for high mountains and may not be best suited to a lowland area. I.e. select an observational dataset that is optimised for your area of interest – e.g. does it take topography into consideration? Average observed rainfall. Period:
16
Stages of model evaluation:
Limits of evaluating models against observations. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. What are the limitations of evaluating climate models against past climate observations? We can only evaluate those variables and phenomena for which observations exist. In some places there is a lack of, or insufficient quality of, long-term observations. There is also the presence of long-term climate variability. How can we act to reduce the impact of these limitations? Use multiple independent observations of the same variable (see the sketch below). Use model ensembles. General comments from the IPCC on the limits of evaluating against observations. © Crown copyright Met Office
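A short sketch of how using multiple independent observational datasets can bound observational uncertainty. The file and variable names are assumptions; the datasets are assumed to have already been regridded to a common grid and averaged over the same season.

```python
import xarray as xr

# Hypothetical pre-processed files: three observed rainfall climatologies
# already regridded to a common grid and averaged over the same season.
obs_files = ["cru_djf_clim.nc", "gpcc_djf_clim.nc", "udel_djf_clim.nc"]
obs = xr.concat([xr.open_dataset(f)["pr"] for f in obs_files], dim="dataset")

obs_mean = obs.mean("dataset")                         # best estimate across datasets
obs_spread = obs.max("dataset") - obs.min("dataset")   # observational uncertainty

# Where the spread is a large fraction of the mean, differences between a
# model and any single observed dataset should be interpreted with caution.
relative_spread = obs_spread / obs_mean
print(relative_spread)
```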
17
Exercise: Choice of observed data
10 MINUTE EXERCISE (HANDOUTS) This is an example from analysis at the University of Cape Town, whose interest was climate change in semi-arid regions of southern Africa (hence the blue lines), and shows differences between three observed datasets – Global Precipitation Climatology Centre (GPCC), Climatic Research Unit (CRU) and University of Delaware (UDEL). With the person next to you, answer the following questions (5 mins): 1) Which row (top or bottom) shows greater differences between observational datasets? Why? 2) In which locations are there particularly notable differences between datasets? What might cause these differences? Discuss briefly as a whole group. Answer: different observed datasets may look very similar when averaged over a long period (e.g. the whole season over 47 years), i.e. the top row, but there may be substantial discrepancies for more specific statistics, e.g. the wettest season (bottom row). Notice the specific differences in northeast South Africa – a very wet summer in UDEL – and the drier summers in western Namibia in CRU. These could indicate differences in the observations used in each dataset – e.g. data from high-altitude or coastal stations, etc. Summer (December to February) average and maximum (i.e. wettest) rainfall accumulation at each grid cell over the period 1963 to 2010; data taken from the CRU TS3.1, GPCC and UDEL datasets. (Blue lines show semi-arid regions.)
18
Stages of model evaluation:
Assess GCM data providing LBCs. What two improvements can we see between CMIP3 and CMIP5 GCMs? Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. GCMs are getting better – discuss the general improvement in GCMs since CMIP3 (though this varies by model). Two improvements: a converging range of agreement with observations, and an increase in the number of models. The distance metric is relative to observations (NCEP, ERA-40 and MERRA for temperature; GPCP and CMAP for precipitation, see MK11). Red solid and dashed lines represent the means and medians respectively for the different ensembles. Knutti et al. 2013, GRL. © Crown copyright Met Office
19
Stages of model evaluation:
Assess GCM data. The driving GCM provides the large-scale circulation through LBCs to an RCM. The GCM must be realistic in order for the RCM to effectively simulate the regional climate. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. When evaluating RCMs, it is first good to know about the quality of the data being supplied at the boundaries by the GCM. This figure is taken from the latest AR5 report and shows (a) multi-model mean rainfall from the CMIP5 AOGCMs and (b) the bias of this field relative to the GPCP climatology (which combines gauge observations with satellite data). The bottom panels show (c) the magnitude of the absolute error (greater in the tropics, where rainfall amounts are generally higher) and (d) the percentage error, which looks a little misleading as the colours are unintuitive. The main point is that GCM output will contain some biases. How well the underlying GCM fields represent observed system behaviour depends on where you are and which variables you are looking at. The plots will need thorough unpacking and explanation. Don't assume the audience knows “bias” or “absolute/relative error” – explain this (see the sketch below). E.g. downscaling over the Philippines – GCMs typically have a large precipitation bias in this region, which could be carried forward into the RCM. Other regions (e.g. Europe) have less bias in the GCMs. © Crown copyright Met Office
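The terms used on this slide can be made concrete with a few lines of Python (the climatology files are assumptions for illustration): bias is the signed difference, absolute error its magnitude, and relative (percentage) error the difference expressed against the observed value.

```python
import xarray as xr

# Hypothetical multi-annual mean precipitation: GCM and GPCP on a common grid.
gcm = xr.open_dataset("gcm_pr_clim.nc")["pr"]
gpcp = xr.open_dataset("gpcp_pr_clim.nc")["pr"]

bias = gcm - gpcp          # signed difference (mm/day)
abs_error = abs(bias)      # magnitude of the difference (mm/day)
# Percentage of the observed value, masking near-zero observed rainfall to
# avoid dividing by very small numbers (the masked cells become NaN).
rel_error = 100.0 * bias / gpcp.where(gpcp > 0.1)

print(float(bias.mean()), float(abs_error.mean()), float(rel_error.mean()))
```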
20
Evaluating RCM Output Final stage of model evaluation.
© Crown copyright Met Office
21
Stages of model evaluation:
Evaluating how well the RCM represents the current climate. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. Why might discrepancies exist between the RCM output and the driving GCM/observations? Systematic model bias (error in the model's physical formulation); errors in the GCM affecting the LBCs; physical errors in the RCM; spatial sampling issues (differences in resolution between model and observations); observational error (gridding issues, instrument-dependent errors). Explain that every RCM is part of a combined GCM–RCM system and hence this can be a source of errors. The first point relates to the detection of errors or biases (noting that there is a difference) between GCM and downscaled output, as well as between model output and observed data. Detecting errors entails a specific type of analysis, typically using statistical tests and visually comparing output. The later points relate to the explanation of errors or biases – go into detail about these. © Crown copyright Met Office
22
Stages of model evaluation:
Aspects to consider in evaluation. Assess as many meteorological variables as possible – at least surface air temperature, precipitation and upper-air winds. Examine the physical realism exhibited within the model – e.g. in cool and wet conditions we may expect high soil moisture: is this so? Use both spatial and temporal information. Spatial: full fields; smaller areas; vertical profiles; area averages. Temporal: time series; seasonal, annual and decadal means; higher-order statistics (variability, extremes); different seasons, different regimes. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. Determining how well the model represents the present-day climate is an important precursor to assessing its ability to represent climate change. It is also possible because we have observed datasets, but there are many aspects to consider and choices to make. Climate models are often able to simulate variables which vary less spatially in nature (like temperature, winds and pressure) more accurately than localised, sub-grid-scale variables (like precipitation and cloud cover). As such it is recommended that validation be carried out for as many variables as possible. It is important not to focus only on the variable of interest but also on other variables, and on attributes within variables – you can get the right answer for the wrong reasons (e.g. rain at the wrong time of day even if the average is fine). It is very important to examine physical realism, particularly if the model is then used to extrapolate beyond the observed record. One PRECIS user related a story of initialising a global model run using an ancillary (i.e. surface boundary condition) file for soil temperature which was in degrees Celsius. The model (which uses SI units) interpreted the degrees Celsius values as kelvin, meaning the model treated the soil as frozen (and near absolute zero!). This propagated through the components of the model and led to the model failing. The user was able to identify the problem by tracing back through the variables affected by the frozen soil. Spatial and temporal analyses are both important, and there are different tests and types of analysis which can be informative (see the sketch below). © Crown copyright Met Office
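A short sketch (file and variable names are assumptions) of going beyond the seasonal mean: for a daily temperature field it computes the seasonal mean, the interannual variability and a hot extreme, for both model and observations, so variability and extremes are evaluated alongside the average.

```python
import xarray as xr

def seasonal_statistics(daily, season="JJA"):
    """Seasonal mean, interannual variability and hot extreme of a daily field."""
    sel = daily.sel(time=daily.time.dt.season == season)
    seasonal_means = sel.groupby("time.year").mean("time")
    return {
        "mean": seasonal_means.mean("year"),
        "interannual_std": seasonal_means.std("year"),
        "p95_daily": sel.quantile(0.95, dim="time"),
    }

# Hypothetical inputs: daily near-surface temperature from the RCM and from a
# gridded observational dataset, already on the same grid and period.
model_stats = seasonal_statistics(xr.open_dataset("rcm_tas_daily.nc")["tas"])
obs_stats = seasonal_statistics(xr.open_dataset("obs_tas_daily.nc")["tas"])

for name in model_stats:
    print(name, float((model_stats[name] - obs_stats[name]).mean()))
```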
23
Stages of model evaluation:
Aspects to consider in evaluation. One cannot, in general, compare individual model years with their corresponding observed years! Rather, we are looking for agreement in the aggregated distribution of weather states (i.e. climate) over time. Model forecasts (or hindcasts) are not constrained by the observations (i.e. the weather) that actually happened. They are, however, constrained by forcings – the data that are input at every time step (i.e. CO2, lateral boundary data, surface boundary data). However, when models are run using observed boundary data from reanalyses, model-year to actual-year comparisons can be worthwhile – reanalysis data are “quasi-observed” data. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. Highlight that comparisons of distributions aggregated over time are usually more appropriate and informative for assessing models – we do not expect them to simulate the past exactly, especially if they are not initialised and/or forced with observed forcing conditions. Models are not forecasts (or hindcasts): the weather in past (historical) climate model runs is not constrained by the real weather that happened – its constraint is the forcings (CO2, lateral boundary data, surface boundary data). Comparison against observations is an apples-and-oranges exercise – *except* when reanalysis boundary data are used, because then the model is driven with “quasi-observed” boundary data (see the sketch below). © Crown copyright Met Office
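One way to make the “compare distributions, not individual years” point concrete (file and variable names are assumptions): pool all daily values over the evaluation period and compare quantiles of the model and observed distributions, rather than correlating model years with observed years.

```python
import numpy as np
import xarray as xr

# Hypothetical daily precipitation over a region, same period, same grid.
model_pr = xr.open_dataset("rcm_pr_daily.nc")["pr"].values.ravel()
obs_pr = xr.open_dataset("obs_pr_daily.nc")["pr"].values.ravel()

# Compare the aggregated distributions (climate), not day-by-day pairs (weather).
for q in (0.50, 0.90, 0.99):
    m = np.nanquantile(model_pr, q)
    o = np.nanquantile(obs_pr, q)
    print(f"q{int(q * 100):02d}: model {m:6.1f} mm/day  obs {o:6.1f} mm/day")
```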
24
Stages of model evaluation:
Evaluating how well the RCM represents the current climate. GCM, Obs, RCM. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. There is potential for four separate validations: GCM vs observations; RCM (driven by GCM) vs GCM; RCM (driven by GCM) vs observations; RCM (driven by observations) vs observations (see the sketch below). In order to validate the GCM–RCM system, all four comparisons can be made. Explain that they are all necessary to fully understand where the errors and biases are coming from. Biases (in particular) come from comparisons with observations, whereas “errors” can be explored by model-to-model comparisons. Bias strictly means a systematic difference, whilst an error implies something is wrong. © Crown copyright Met Office
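A minimal sketch of the four comparisons listed above, each reduced to a difference field (the pre-regridded climatology files are assumptions); in practice each difference would be plotted and inspected map by map.

```python
import xarray as xr

# Hypothetical seasonal-mean climatologies, all regridded to a common grid.
obs = xr.open_dataset("obs_clim.nc")["pr"]
gcm = xr.open_dataset("gcm_clim.nc")["pr"]
rcm_gcm = xr.open_dataset("rcm_gcm_driven_clim.nc")["pr"]  # RCM driven by GCM
rcm_era = xr.open_dataset("rcm_era_driven_clim.nc")["pr"]  # RCM driven by reanalysis

comparisons = {
    "GCM - obs": gcm - obs,
    "RCM(GCM) - GCM": rcm_gcm - gcm,
    "RCM(GCM) - obs": rcm_gcm - obs,
    "RCM(reanalysis) - obs": rcm_era - obs,
}
for name, diff in comparisons.items():
    print(f"{name:22s} area-mean difference: {float(diff.mean()):+.2f}")
```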
25
Exercise: practice different types of RCM evaluation
1) Driving GCM vs obs 2) RCM vs obs 3) RCM vs GCM 4) Reanalysis-driven RCM vs obs. EXERCISE: 40 mins total. Split into four groups. Assign one evaluation exercise to each group. 20 minutes to answer questions – should be able to do 2 of the 4 exercises in this time. 20 minutes to go through the answers as a whole class, inviting solutions from each of the 4 groups. ANSWER slides are next.
26
GCM vs obs Left: seasonal observations (CRU)
Centre: seasonal GCM output Right: seasonal difference plots Answers to questions on associated exercise sheet
27
GCM vs obs Left: seasonal observations (CRU)
Centre: seasonal GCM output Right: seasonal difference plots Q1: When is the main monsoon season for India? JJAS. Q2: Does the GCM exhibit a bias in temperature or precipitation (i.e. is the model warmer/wetter than observations)? In which season(s) is it most pronounced? Which parts of the region are most affected? Temperature: a cold bias over the Himalayas, most pronounced over the entire region in DJF. Precipitation: a strong dry bias over all of India in JJAS (the monsoon season), especially on the west coast. There is also a dry bias on the southeast coast during the secondary monsoon season (ON).
28
RCM vs Observations. Much too cold over the CVA region during the winter months, yet recovering to match observations during the monsoon season. The timing of the rainfall peak is correct, just not strong enough. Even allowing for the uncertainty across observational datasets, the RCM is far too dry during the monsoon season! APHRODITE is a higher-resolution observational dataset that is optimised for mountainous regions, and is therefore likely to be closer to the RCM than other, coarser observational data. Figure: annual cycle of temperature (left) and precipitation (right) from the RCM (PRECIS 2.1) and observations, over India (see the annual-cycle sketch below).
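A short sketch of how an annual-cycle comparison like this can be built (file names, variable names and the lat/lon box are assumptions): area-average daily precipitation over the region, then form the mean annual cycle by calendar month for model and observations.

```python
import xarray as xr

# Hypothetical daily precipitation files on a common grid over India.
model = xr.open_dataset("rcm_pr_daily.nc")["pr"]
obs = xr.open_dataset("obs_pr_daily.nc")["pr"]

def annual_cycle(da, lat_slice=slice(8, 32), lon_slice=slice(68, 90)):
    """Area-average over the region, then form the mean annual cycle by month."""
    region = da.sel(lat=lat_slice, lon=lon_slice).mean(["lat", "lon"])
    return region.groupby("time.month").mean("time")

model_cycle = annual_cycle(model)
obs_cycle = annual_cycle(obs)
print((model_cycle - obs_cycle).values)  # month-by-month bias of the annual cycle
```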
29
RCM vs obs Left: seasonal observations (CRU)
Centre: seasonal RCM output Right: seasonal difference plots Cold bias in all seasons, restricted to the NW Himalayas and strongest during DJF. During monsoon season over mainland India/Bangladesh, temperature biases are more reasonable.
30
RCM vs obs Left: seasonal observations (CRU)
Centre: seasonal RCM output Right: seasonal difference plots Extremely dry during JJAS across much of India and Bangladesh. Drier than obs during ON on the southern tip of India. Could the cold bias in DJF and pre-monsoon season (MAM) be inhibiting the development of the land/sea temperature contrast to kick off the monsoon?
31
Which model contains the error? What is it?
RCM vs driving GCM. Model 1 contains the error: intense winds in the top-left corner of the RCM domain. This is because of mountains at the boundary. The problem was resolved and model 2 was produced.
32
Comparison plots for RCM (right) vs GCM (left)
RCM vs driving GCM. Wind at 850 hPa (m s-1). Winds: divergence around the mountains in the northern Philippines (which are better represented in the RCM). Precipitation and temperature behave as you would expect – the RCM is adding value. Surface temperature (K). Precipitation (mm/day).
33
Reanalysis driven RCM vs obs Seasonal mean precipitation, from PRECIS
There is a reasonable match between the average summer rainfall in the observations and the model. Where does it not match? The mountains. This is a common issue with RCMs – mountains are tough to get right. It is also true that observations themselves are tough to record correctly in mountains. At first glance the plots are very similar. Further scrutiny shows that the model is over-producing precipitation in high-altitude areas, e.g. in Norway. This is a common bias in regional models. Figure: multi-annual ( ) summer (JJA) mean precipitation (mm/day) over western Europe. Left: simulated precipitation from PRECIS (RCM) as driven by the ERA-40 reanalysis. Right: gridded observations from the EU “ENSEMBLES” project. © Crown copyright Met Office
34
Reanalysis driven RCM vs obs Frequency of wet days, from PRECIS
However, other more specific statistics (such as wet-day frequency) might not give the same impression – i.e. averages smooth things out. See Spain, for example: the model produces far too many wet days. Figure 2b covers the same time period as Figure 2a, but instead of total precipitation shows the wet-day frequency – in other words, the percentage of days on which the model produces rainfall in a given area (left) versus how often the observations show rainfall (right). These plots show that the model produced rainfall too often in comparison to the observations in high-altitude areas, e.g. the Alps, where the model erroneously produces rainfall over 90% of the time, and for northern and eastern Europe in general. Make the point here that looking only at the results from the last slide will hide the fact that the model is producing rainfall too often (or not often enough) – see the sketch below. Figure: multi-annual ( ) summer (JJA) wet-day frequency (0–1) over western Europe. Left: simulated precipitation from PRECIS (RCM) as driven by the ERA-40 reanalysis. Right: gridded observations from the EU “ENSEMBLES” project. © Crown copyright Met Office
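A minimal sketch of a wet-day frequency comparison (file names, variable names and the 1 mm/day threshold are assumptions): the fraction of days exceeding a wet-day threshold is a statistic that a seasonal-mean map hides.

```python
import xarray as xr

# Hypothetical daily precipitation on a common grid and period (mm/day).
model = xr.open_dataset("rcm_pr_daily.nc")["pr"]
obs = xr.open_dataset("obs_pr_daily.nc")["pr"]

wet_threshold = 1.0  # mm/day; the choice of threshold affects the result

# Fraction of days that are "wet" at each grid point.
model_wet_freq = (model > wet_threshold).mean("time")
obs_wet_freq = (obs > wet_threshold).mean("time")

freq_bias = model_wet_freq - obs_wet_freq
print(freq_bias)  # positive values: model rains too often, even if totals match
```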
35
What now? Use of the RCM output beyond the evaluation
Having evaluated the RCM output, is it appropriate to use the simulated future climate output? For what scales, variables and types of questions is the model output able to provide “useful” information? A quick point to note that the evaluation informs how we can interpret the model output and how much confidence we can place in RCM projections. © Crown copyright Met Office
36
Stages of model evaluation:
Summary. Stages of model evaluation: Identify target and purpose; Obtain multiple sources of observed data; Assess errors and biases in the GCM; Evaluate RCM. Model evaluation is ESSENTIAL: it enables familiarisation with the model and its projected output; a simulation may be over an area where the model performance is untested; and an evaluation provides a baseline for assessing the credibility of future projections from RCMs, which has implications for how the output can and should be used. © Crown copyright Met Office
37
Thanks for listening. Questions?