
1 Process-based modelling of vegetation and uncertainty quantification. Marcel van Oijen (CEH-Edinburgh). Statistics for Environmental Evaluation, Glasgow, 2010-09-01

2 Contents
1. Process-based modelling
2. The Bayesian approach
3. Bayesian Calibration (BC) of process-based models
4. Bayesian Model Comparison (BMC)
5. Examples of BC & BMC in other sciences
6. The future of BC & BMC?
7. References, Summary, Discussion

3 1. Process-based modelling

4 1.1 Ecosystem PBMs simulate biogeochemistry. [Diagram: H2O, C and N fluxes between atmosphere, tree, soil and subsoil]

5 1.2 I/O of PBMs. Input: parameters & initial constants (vegetation and soil), atmospheric drivers, management & land use. Model → Output: simulation of time series of plant and soil variables.

6 1.3 I/O of empirical models. Two parameters: P1 = intercept, P2 = slope. Input → Model → Output: Y = P1 + P2 * t
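For concreteness, a minimal R sketch of this empirical model (the function and argument names are illustrative, not from the slides); the same straight-line model reappears in the MCMC example later on:

# Empirical model Y = P1 + P2 * t, with P1 = intercept and P2 = slope
empirical_model <- function(t, p1, p2) {
  p1 + p2 * t
}
empirical_model(t = c(10, 20, 30), p1 = 5, p2 = 0.5)   # returns 10, 15, 20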

7 1.4 Environmental evaluation: increasing use of PBMs. [Figures: C-sequestration (model output for 1920-2000); uncertainty of C-sequestration]

8 1.5 Forest models and uncertainty. [Figure from Levy et al., 2004]

9 1.6 Forest models and uncertainty. [Figure: UE (kg C kg⁻¹ N) vs N deposition for the models bgc, century and hybrid; Levy et al., 2004]

10 1.7 Many models! ECOBAS model registry (http://ecobas.org/www-server/index.html). Status: 680 models (21.05.10). A free-text search for the words "soil, carbon" (logical operator: and) finds 96 models, e.g.: ANIMO (Agricultural NItrogen MOdel), BETHY (Biosphere Energy-Transfer Hydrology scheme), BIOMASS (forest canopy carbon and water balance model), BIOME-BGC (Biome model - BioGeochemical Cycles), BIOME3, BLUEGRAMA (blue grama), CANDY (Carbon and Nitrogen Dynamics in soils), CARBON (Wageningen Carbon Cycle Model), CARBON_IN_SOILS (turnover of carbon in soil), CARDYN (CARbon DYNamics), CASA (Carnegie-Ames-Stanford Approach biosphere model), CENTURY (grassland and agroecosystem dynamics model), CERES-Canola 3.0, CHEMRANK (interactive model for ranking the potential of organic chemicals to contaminate groundwater), COUPMODEL (coupled heat and mass transfer model for the soil-plant-atmosphere system), (…)

11 1.8 Reality check! How reliable are these model studies? Sufficient data for model parameterization? Sufficient data for model input? Would other models have given different results? In every study using systems analysis and simulation, model parameters, inputs and structure are uncertain. How do we deal with these uncertainties optimally?

12 2. The Bayesian approach

13 Probability theory. Uncertainties are everywhere: in models (environmental inputs, parameters, structure) and in data. Uncertainties can be expressed as probability distributions (pdfs). We need methods that quantify all uncertainties, show how to reduce them, and efficiently transfer information from data to models to model application. Calculating with uncertainties (pdfs) = probability theory.

14 The Bayesian approach: reasoning using probability theory

15 2.1 Dealing with uncertainty: medical diagnostics. A flu epidemic occurs: 1% of people are ill. A diagnostic test is 99% reliable. Your test result is positive (bad news!). What is P(diseased|test positive)? (a) 0.50, (b) 0.98, (c) 0.99. Given: P(dis) = 0.01, P(pos|hlth) = 0.01, P(pos|dis) = 0.99. Bayes Theorem: P(dis|pos) = P(pos|dis) P(dis) / P(pos).

16 2.1 Dealing with uncertainty: medical diagnostics (continued). Expanding the denominator with the law of total probability: P(dis|pos) = P(pos|dis) P(dis) / [P(pos|dis) P(dis) + P(pos|hlth) P(hlth)].

17 2.1 Dealing with uncertainty: medical diagnostics (continued). Plugging in the numbers: P(dis|pos) = (0.99 × 0.01) / (0.99 × 0.01 + 0.01 × 0.99) = 0.50. So the correct answer is (a): even with a 99% reliable test, a positive result implies only a 50% probability of disease.
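A quick numerical check of this result, as a minimal R sketch (variable names are illustrative):

# Posterior probability of disease given a positive test, via Bayes Theorem
p_dis      <- 0.01   # prior: 1% of people are ill
p_pos_dis  <- 0.99   # P(positive | diseased)
p_pos_hlth <- 0.01   # P(positive | healthy), the false-positive rate
p_pos      <- p_pos_dis * p_dis + p_pos_hlth * (1 - p_dis)   # total probability of a positive test
p_pos_dis * p_dis / p_pos   # P(dis|pos) = 0.5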

18 2.2 Bayesian updating of probabilities. Bayes Theorem: prior probability → posterior probability.
Medical diagnostics: P(disease) → P(disease|test result)
Model parameterization: P(params) → P(params|data)
Model selection: P(models) → P(model|data)
SPAM-killer: P(SPAM) → P(SPAM|e-mail header)
Weather forecasting, climate change prediction, oil field discovery, GHG-emission estimation, jurisprudence: …

19

20

21 2.3 What and why? We want to use data and models to explain and predict ecosystem behaviour. Data as well as model inputs, parameters and outputs are uncertain. No prediction is complete without quantifying the uncertainty, and no explanation is complete without analysing the uncertainty. Uncertainties can be expressed as probability density functions (pdfs), and probability theory tells us how to work with pdfs. Bayes Theorem (BT) tells us how a pdf changes when new information arrives: prior pdf → posterior pdf. BT: Posterior = Prior × Likelihood / Evidence, i.e. P(θ|D) = P(θ) P(D|θ) / P(D), so P(θ|D) ∝ P(θ) P(D|θ).

22 3. Bayesian Calibration (BC) of process-based models

23 Bayesian updating of probabilities for process-based models. Bayes Theorem: prior probability → posterior probability. Model parameterization: P(params) → P(params|data). Model selection: P(models) → P(model|data).

24 3.1 Process-based forest models. [Diagram: environmental scenarios, initial values and parameters → model → outputs such as soil C, NPP and height]

25 3.2 Process-based forest model BASFOR: 40+ parameters, 12+ output variables.

26 3.3 BASFOR outputs: volume (standing), carbon in trees (standing + thinned), carbon in soil.

27 3.4 BASFOR: parameter uncertainty

28 3.5 BASFOR: prior output uncertainty for volume (standing), carbon in trees (standing + thinned) and carbon in soil.

29 3.6 Data from Dodd Wood (R. Matthews, Forest Research): volume (standing), carbon in trees (standing + thinned), carbon in soil.

30 3.7 Using data in Bayesian calibration of BASFOR. [Diagram: prior pdf + data → Bayesian calibration → posterior pdf]

31 3.8 Bayesian calibration: posterior uncertainty for volume (standing), carbon in trees (standing + thinned) and carbon in soil.

32 3.9 Calculating the posterior using MCMC. Goal: a sample of 10^4-10^5 parameter vectors from the posterior distribution P(θ|D) for the parameters, where P(θ|D) ∝ P(θ) P(D|f(θ)).
1. Start anywhere in parameter space: p_1..39(i=0).
2. Randomly choose p(i+1) = p(i) + δ.
3. IF [P(p(i+1)) P(D|f(p(i+1)))] / [P(p(i)) P(D|f(p(i)))] > Random[0,1] THEN accept p(i+1) ELSE reject p(i+1); i = i+1.
4. IF i < 10^4 GOTO 2.
(Metropolis et al., 1953.) [MCMC trace plots]

33 3.10 BC using MCMC: an example in EXCEL Click here for BC_MCMC1.xls

34 3.11 MCMC in R

install.packages("mvtnorm")                 # the proposal distribution needs the multivariate normal
require(mvtnorm)
chainLength <- 10000

# Data: three (time, value, sd) triplets
data <- matrix(c(10,  6.09, 1.83,
                 20,  8.81, 2.64,
                 30, 10.66, 3.27), nrow=3, ncol=3, byrow=TRUE)

# Parameters (intercept, slope): each row gives (min, initial, max) of a uniform prior
param   <- matrix(c(0, 5,   10,
                    0, 0.5,  1), nrow=2, ncol=3, byrow=TRUE)
pMinima <- c(param[1,1], param[2,1])
pMaxima <- c(param[1,3], param[2,3])
pValues <- c(param[1,2], param[2,2])

vcovProposal <- diag((0.05 * (pMaxima - pMinima))^2)   # variance of the Gaussian random-walk proposal
pChain  <- matrix(0, nrow=chainLength, ncol=length(pValues)+1)
logli   <- matrix(NA, nrow=3, ncol=1)

logPrior0 <- sum(log(dunif(pValues, min=pMinima, max=pMaxima)))

# The "process-based model" here is simply a straight line: y = intercept + slope * t
model <- function(times, intercept, slope) { intercept + slope * times }

# Gaussian log-likelihood of the starting parameter values
for (i in 1:3) {
  logli[i] <- -0.5*((model(data[i,1], pValues[1], pValues[2]) - data[i,2])/data[i,3])^2 - log(data[i,3])
}
logL0 <- sum(logli)
pChain[1,] <- c(pValues, logL0)   # keep first values

# Metropolis random walk
for (c in 2:chainLength) {
  candidatepValues <- rmvnorm(n=1, mean=pValues, sigma=vcovProposal)
  if (all(candidatepValues > pMinima) && all(candidatepValues < pMaxima)) {
    Prior1 <- prod(dunif(candidatepValues, pMinima, pMaxima))
  } else { Prior1 <- 0 }
  if (Prior1 > 0) {
    for (i in 1:3) {
      logli[i] <- -0.5*((model(data[i,1], candidatepValues[1], candidatepValues[2]) - data[i,2])/data[i,3])^2 - log(data[i,3])
    }
    logL1    <- sum(logli)
    logalpha <- (log(Prior1) + logL1) - (logPrior0 + logL0)   # log of the Metropolis ratio
    if (log(runif(1, min=0, max=1)) < logalpha) {             # accept or reject the candidate
      pValues   <- candidatepValues
      logPrior0 <- log(Prior1)
      logL0     <- logL1
    }
  }
  pChain[c, 1:2] <- pValues
  pChain[c, 3]   <- logL0
}

nAccepted  <- length(unique(pChain[,1]))
acceptance <- paste(nAccepted, "out of", chainLength, "candidates accepted ( =", round(100*nAccepted/chainLength), "%)")
print(acceptance)
mp <- apply(pChain, 2, mean)      # posterior means of intercept and slope (and mean log-likelihood)
print(mp)
pCovMatrix <- cov(pChain)         # posterior covariance matrix
print(pCovMatrix)
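The MCMC trace plots mentioned on slide 32 can then be drawn from the stored chain; a minimal sketch using base R graphics (column 1 of pChain holds the intercept, column 2 the slope, as in the code above):

# Inspect the chain: trace plots and marginal posterior histograms
par(mfrow = c(2, 2))
plot(pChain[, 1], type = "l", xlab = "MCMC step", ylab = "intercept")
plot(pChain[, 2], type = "l", xlab = "MCMC step", ylab = "slope")
hist(pChain[, 1], main = "", xlab = "intercept")
hist(pChain[, 2], main = "", xlab = "slope")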

35 3.12 Using data in Bayesian calibration of BASFOR. [Diagram: prior pdf + data → Bayesian calibration → posterior pdf]

36 3.13 Parameter correlations 39 parameters

37 3.14 Continued calibration when new data become available. [Diagram: prior pdf + new data → Bayesian calibration → posterior pdf, which serves as the prior for the next calibration]

38 3.14 Continued calibration when new data become available (continued). [Same diagram, with the posterior from the previous calibration acting as the new prior]

39 3.15 Bayesian projects at CEH-Edinburgh: selection of forest models (NitroEurope team); data assimilation of forest EC data (David Cameron, Mat Williams); risk of frost damage in grassland (Stig Morten Thorsen, Anne-Grete Roer, MvO); uncertainty in agricultural soil models (Lehuger, Reinds, MvO); uncertainty in UK C-sequestration (MvO, Jonty Rougier, Ron Smith, Tommy Brown, Amanda Thomson); parameterization and uncertainty quantification of the 3-PG model of forest growth & C-stock (Genevieve Patenaude, Ronnie Milne, M. van Oijen); uncertainty in earth system resilience (Clare Britton & David Cameron). [Plot: CO2 vs time]

40 3.16 BASFOR: forest C-sequestration 1920-2000. Uncertainty due to model parameters only, NOT uncertainty in inputs / upscaling. [Panels: soil N-content, C-sequestration, uncertainty of C-sequestration]

41 3.18 What kind of measurements would have reduced uncertainty the most?

42 3.19 Prior predictive uncertainty & height data. [Plots of height and biomass: prior predictive uncertainty and height data from Skogaby]

43 3.20 Prior & posterior uncertainty: use of height data Height Biomass Prior pred. uncertainty Posterior uncertainty (using height data) Height data Skogaby

44 3.20 Prior & posterior uncertainty: use of height data Height Biomass Prior pred. uncertainty Posterior uncertainty (using height data) Height data (hypothet.)

45 3.20 Prior & posterior uncertainty: use of height data Height Biomass Prior pred. uncertainty Posterior uncertainty (using height data) Posterior uncertainty (using precision height data)

46 3.22 Summary: BC vs tuning.
Model tuning: 1. Define parameter ranges (permitted values). 2. Select the parameter values that give model output closest (r², RMSE, …) to the data. 3. Do the model study with the tuned parameters (i.e. no model output uncertainty).
Bayesian calibration: 1. Define parameter pdfs. 2. Define data pdfs (probable measurement errors). 3. Use Bayes Theorem to calculate the posterior parameter pdf. 4. Do all future model runs with samples from the parameter pdf (i.e. quantify the uncertainty of model results).
BC can use data to reduce parameter uncertainty for any process-based model.
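As a minimal R sketch of step 4 of Bayesian calibration, assuming the chain pChain and the straight-line model from the MCMC example above (the burn-in length, sample size and prediction times t_new are illustrative choices):

# Propagate posterior parameter uncertainty to model predictions
burnIn      <- 1000
sample_rows <- sample((burnIn + 1):nrow(pChain), size = 500, replace = TRUE)
t_new       <- 0:40                                    # times at which predictions are wanted
predictions <- sapply(sample_rows, function(r)
  model(t_new, pChain[r, 1], pChain[r, 2]))            # one model run per sampled parameter vector
pred_mean <- rowMeans(predictions)                     # posterior predictive mean
pred_ci   <- apply(predictions, 1, quantile, probs = c(0.025, 0.975))   # 95% interval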

47 4. Bayesian Model Comparison (BMC)

48 4.1 Multiple models → structural uncertainty. [Figure: UE (kg C kg⁻¹ N) vs N deposition for the models bgc, century and hybrid; Levy et al., 2004]

49 4.2 Bayesian comparison of two models. Bayes Theorem for model probability: P(M|D) = P(M) P(D|M) / P(D). With equal prior probabilities P(M1) = P(M2) = ½, the posterior odds are P(M2|D) / P(M1|D) = P(D|M2) / P(D|M1). This Bayes Factor P(D|M2) / P(D|M1) quantifies how the data D change the odds of M2 over M1. The integrated likelihood P(D|Mi) can be approximated from the MCMC sample of outputs for model Mi, e.g. as the harmonic mean of the likelihoods in the MCMC sample (Kass & Raftery, 1995).
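A minimal R sketch of that approximation, assuming the log-likelihood of every MCMC step has been stored for each model (logL_M1 and logL_M2 are illustrative names; in the R example above these would be column 3 of each model's chain):

# Harmonic-mean estimate of the integrated likelihood P(D|M) from MCMC log-likelihoods
# (Kass & Raftery, 1995), computed on the log scale for numerical stability.
log_marginal_lik <- function(logL) {
  n <- length(logL)
  m <- max(-logL)
  log(n) - (m + log(sum(exp(-logL - m))))   # log of n / sum(1/L)
}
logP_D_M1 <- log_marginal_lik(logL_M1)
logP_D_M2 <- log_marginal_lik(logL_M2)
exp(logP_D_M2 - logP_D_M1)   # Bayes Factor; > 1 means the data favour M2 over M1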

50 4.3 BMC: Tuomi et al. 2007

51 4.4 Bayes Factor for two big forest models. MCMC (5000 steps) used to calculate P(D|BASFOR) and P(D|BASFOR+). Data for Rajec: Emil Klimo.

52 4.5 Bayes Factor for two big forest models. MCMC (5000 steps) used to calculate P(D|BASFOR) and P(D|BASFOR+). Data for Rajec: Emil Klimo. Result: P(D|M1) = 7.2e-16, P(D|M2) = 5.8e-15, Bayes Factor = 7.8, so BASFOR+ is supported by the data.

53 4.6 Summary of BMC: what do we need, what do we do? To carry out a BMC we need: 1. multiple models M1, …, Mn; 2. for each model, a list of its parameters θ1, …, θn; 3. data D. What we do with the models, parameters and data: 1. We express our uncertainty about the correctness of models, parameter values and data by means of probability distributions. 2. We apply the rules of probability theory to transfer the information from the data to the probability distributions for models and parameters. 3. The result tells us which model is the most plausible, and what its parameter values are likely to be.

54 5. Examples of BC & BMC in other sciences

55 5.1 Bayes in other disguises. Linear regression using least squares = BC with a straight-line model, a uniform prior and a Gaussian (iid) likelihood; likewise, e.g., spatiotemporal stochastic modelling = BC with spatial correlations included in the prior. Note: realising that LS-regression is a special case of BC opens up possibilities to improve on it, e.g. by putting more information in the prior or likelihood (Sivia 2005). All maximum-likelihood estimation methods can be seen as limited forms of BC where the prior is ignored (uniform) and only the maximum value of the likelihood is identified (ignoring uncertainty). Hierarchical modelling = BC, except that uncertainty is ignored.
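To illustrate the first point, a minimal R sketch comparing least squares with the MCMC posterior from the example above, using the same three data points (the weights 1/sd² mirror the Gaussian likelihood used there):

# Weighted least-squares fit to the (time, value, sd) data of the MCMC example
times  <- c(10, 20, 30)
values <- c(6.09, 8.81, 10.66)
sds    <- c(1.83, 2.64, 3.27)
ls_fit <- lm(values ~ times, weights = 1 / sds^2)
coef(ls_fit)                  # least-squares intercept and slope
# With the wide uniform prior, the posterior means from the chain should be close:
# apply(pChain[, 1:2], 2, mean)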

56 5.2 Bayes in other disguises (continued): inverse modelling (e.g. to estimate emission rates from concentrations); geostatistics, e.g. Bayesian kriging; data assimilation (KF, EnKF etc.).

57 5.3 Regional application of plot-scale models.
Upscaling method | Model structure | Modelling uncertainty
1. Stratify into homogeneous subregions & apply | Unchanged | P(θ) unchanged; upscaling uncertainty
2. Apply to selected points (plots) & interpolate | Unchanged (but extended with a geostatistical model) | P(θ) unchanged (Bayesian kriging only); interpolation uncertainty
3. Reinterpret the model as a regional one & apply | Unchanged | New BC using regional I-O data
4. Summarise model behaviour & apply exhaustively (deterministic metamodel) | E.g. multivariate regression model or simple mechanistic model | New BC of the metamodel needed, using plot data
5. As 4, but with a stochastic emulator | E.g. Gaussian process emulator | Code uncertainty (Kennedy & O'Hagan)
6. Summarise model behaviour & embed in regional model | Unrelated new model | New BC using regional I-O data

58 6. The future of BC & BMC?

59 6.1 Trends: more use of Bayesian approaches in all areas of environmental science; improvements in computational techniques for BC & BMC of slow process-based models; increasing use of hierarchical models (to represent complex prior pdfs, or to represent spatial relationships); replacement of informal methods (or methods that only approximate the full probability approach) by BMC.

60 Bayes in climate science

61 Improvements in Markov Chain Monte Carlo algorithms

62 Hierarchical Bayesian modelling in ecology See also: Ogle, K. and J.J. Barber (2008) "Bayesian data-model integration in plant physiological and ecosystem ecology." Progress in Botany 69:281-311

63 Using BC to make model spin-up unnecessary (subm.)

64 Bayes & space Van Oijen, Thomson & Ewert (2009)

65 7. Summary, References, Discussion

66 7.1 Summary of BC&BMC: What is the Bayesian approach? 1. Express all uncertainties probabilistically Assign probability distributions to (1) data, (2) the collection of models, (3) the parameter-set of each individual model 2. Use the rules of probability theory to transfer the information from the data to the probability distributions for models and parameters Main tool from probability theory to do this: Bayes Theorem P(α|D) P(α) P(D|α) Posterior is proportional to prior times likelihood α = parameter set parameterisation (Bayesian Calibration, BC) α = model set model evaluation (Bayesian Model Comparison, BMC)

67 7.2 Bayesian methods: references. Bayes, T. (1763): Bayes Theorem. Metropolis, N. et al. (1953): MCMC. Kass, R.E. & Raftery, A.E. (1995): BMC. Green, E.J., MacFarlane, D.W., Valentine, H.T. & Strawderman, W.E. (1996, 1998, 1999, 2000): forest models. Jansen, M. (1997): crop models. Jaynes, E.T. (2003): probability theory. Van Oijen et al. (2005): complex process-based models, MCMC.

68 Bayesian Calibration (BC) and Bayesian Model Comparison (BMC) of process-based models: Theory, implementation and guidelines. Freely downloadable from http://nora.nerc.ac.uk/6087/

69 7.4 Discussion statements / conclusions. Uncertainty (= incomplete information) is described by pdfs. 1. Plausible reasoning implies probability theory (PT) (Cox, Jaynes). 2. The main tool from PT for updating pdfs is Bayes Theorem. 3. Parameter estimation = quantifying the joint parameter pdf → BC. 4. Model evaluation = quantifying a pdf in model space (requires at least two models) → BMC.

70 7.4 Discussion statements / conclusions (continued). Practicalities: 1. When new data arrive, MCMC provides a universal method for calculating posterior pdfs. 2. Quantifying the prior is not a key issue in environmental science: (1) there are many data, and (2) the prior is the posterior from a previous calibration. 3. Defining the likelihood: a normal pdf for measurement error usually describes our prior state of knowledge adequately (Jaynes). 4. The Bayes Factor shows how new data change the odds of models, and is a by-product of Bayesian calibration (Kass & Raftery). Overall: uncertainty quantification often shows that our models are not very reliable.

71

72 Appendix A: How to do BC. The problem: you have (1) a prior pdf P(θ) for your model's parameters, and (2) new data D. You also know how to calculate the likelihood P(D|θ). How do you now use Bayes Theorem to calculate the posterior P(θ|D)? Methods: 1. Analytical: only works when the prior and likelihood are conjugate (family-related); for example, if the prior and likelihood are normal pdfs, then the posterior is normal too. 2. Numerical, using sampling, with three main approaches: (a) MCMC (e.g. Metropolis, Gibbs): sample directly from the posterior; best for high-dimensional problems. (b) Accept-Reject: sample from the prior, then reject some samples using the likelihood; best for low-dimensional problems. (c) Model emulation, followed by MCMC or Accept-Reject.
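A minimal R sketch of the Accept-Reject option, reusing data, model, pMinima and pMaxima from the MCMC example; the bound maxL is a conservative upper limit on the likelihood (the product of the maxima of the three normal densities):

# Accept-reject sampling: draw from the uniform prior, accept in proportion to the likelihood
likelihood <- function(p) {
  prod(dnorm(data[, 2], mean = model(data[, 1], p[1], p[2]), sd = data[, 3]))
}
maxL <- prod(1 / (sqrt(2 * pi) * data[, 3]))           # upper bound on the likelihood
posteriorSample <- matrix(NA, nrow = 0, ncol = 2)
for (i in 1:20000) {
  p <- runif(2, min = pMinima, max = pMaxima)          # sample from the uniform prior
  if (runif(1) < likelihood(p) / maxL) {               # accept with probability L(p)/maxL
    posteriorSample <- rbind(posteriorSample, p)
  }
}
colMeans(posteriorSample)                              # posterior means of intercept and slope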

73 Should we measure the sensitive parameters? Yes, because the sensitive parameters are obviously important for prediction? No, because model parameters are correlated with each other (and we do not measure those correlations), and cannot really be measured at all. So it may be better to measure output variables, because they are what we are interested in, are better defined (in models and in measurements), and help determine parameter correlations if used in Bayesian calibration. Key question: what data are most informative?

74 Data have information content, which is additive. [Figure: the information in a combined data set is the sum of the information in its parts]

