Case studies in Gaussian process modelling of computer codes for carbon accounting. Marc Kennedy, Clive Anderson, Stefano Conti, Tony O’Hagan.


Talk Outline
– Centre for Terrestrial Carbon Dynamics
– Computer Models in CTCD
– Bayesian emulators
– Case Study 1: SPA
– Case Study 2: SDGVM

Centre for Terrestrial Carbon Dynamics
The CTCD…
– is a NERC centre of excellence for Earth Observation
– is made up of groups from Sheffield, York, Edinburgh, UCL and Forest Research
– brings together experts in vegetation modelling, soil science, earth observation, carbon flux measurement and statistics

Net Ecosystem Production
– Photosynthesis: gain
– Plant respiration: loss
– Soil respiration: loss
– Terrestrial carbon source if NEP is negative
– Terrestrial carbon sink if NEP is positive
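The balance on this slide can be written as one expression. The symbols P, R_plant and R_soil are labels introduced here for illustration; they are not notation from the talk.

```latex
\mathrm{NEP} = P - R_{\text{plant}} - R_{\text{soil}},
\qquad
\mathrm{NEP} > 0 \;\Rightarrow\; \text{carbon sink},
\qquad
\mathrm{NEP} < 0 \;\Rightarrow\; \text{carbon source}
```

Here P is the carbon gained through photosynthesis, and R_plant and R_soil are the carbon lost through plant and soil respiration.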

Computer Models in CTCD
– SPA
    – Simulates plant processes at 30-minute time intervals
– ForestETP
    – Stand scale
    – Localised modelling
– SDGVM
    – Global scale
    – Coarse resolution

Statistical objectives within CTCD
– Contribute to the development of these models
    – through model testing using sensitivity analysis
– Identify the greatest sources of uncertainty
– Correctly reflect the uncertainty in predictions
    – Uncertainty analysis: propagating the parameter uncertainty through the model

Bayesian Emulation of Models
– Model output is an unknown function of its inputs
    – Convenient prior is a Gaussian process
    – Run code at a set of well-chosen input points
    – Obtain posterior distribution
– The emulator is the posterior distribution of the output
    – Fast approximation
    – Measure of uncertainty
    – Nice analytical form for further analysis
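The emulation steps above (Gaussian process prior, a few code runs, posterior) can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: it assumes a zero prior mean, a squared-exponential covariance with fixed hyperparameters, a single input, and a hypothetical `toy_simulator` standing in for an expensive code.

```python
import math

def sq_exp_kernel(x1, x2, variance=1.0, length_scale=0.2):
    """Squared-exponential covariance (hyperparameters fixed by assumption)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def solve_cholesky(l, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def fit_emulator(train_x, train_y, jitter=1e-8):
    """Factorise the training covariance and precompute K^{-1} y."""
    k = [[sq_exp_kernel(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    l = cholesky(k)
    return l, solve_cholesky(l, train_y)

def predict(train_x, l, alpha, x_new):
    """Posterior mean and variance of the output at a new input."""
    k_star = [sq_exp_kernel(x_new, x) for x in train_x]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_cholesky(l, k_star)
    var = sq_exp_kernel(x_new, x_new) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)

def toy_simulator(x):
    """Hypothetical stand-in for an expensive computer code."""
    return math.sin(3.0 * x) + x

train_x = [i / 7 for i in range(8)]            # 8 design points on [0, 1]
train_y = [toy_simulator(x) for x in train_x]  # the "code runs"
chol, alpha = fit_emulator(train_x, train_y)
mean_mid, var_mid = predict(train_x, chol, alpha, 0.5)
mean_at_design, var_at_design = predict(train_x, chol, alpha, train_x[3])
```

At a design point the posterior mean reproduces the code output and the variance collapses to nearly zero; between design points the variance is the emulator's measure of uncertainty. A practical emulator would also estimate the hyperparameters and usually include a regression mean.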

Case study 1: Soil Plant Atmosphere (SPA) Model
– SPA is a fine scale model created by Mat Williams
    – Aggregated SPA outputs were used to create the simpler up-scaled model (ACM: the Aggregated Canopy Model) by fitting a set of simple equations with 9 parameters
– Can an emulator do any better than ACM as an approximation to SPA?

ACM vs. Emulator for predicting SPA
– Bayesian emulator created using only 150 of the total 6561 points used to create ACM
– Predicted remaining 6411 SPA points using emulator and ACM
    – Compare Root Mean Square Errors (RMSE)
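The comparison criterion is simple to state in code. The sketch below computes RMSE on held-out points for two approximations; the numbers are made up purely to exercise the function and are not results from the talk.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between predictions and reference values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical held-out reference outputs and two sets of predictions
# (stand-ins for, e.g., an emulator and a simpler fitted model).
reference = [2.0, 4.0, 6.0, 8.0]
approx_a = [2.1, 3.9, 6.1, 7.9]
approx_b = [2.5, 3.5, 6.5, 7.5]

rmse_a = rmse(approx_a, reference)
rmse_b = rmse(approx_b, reference)
```

Both approximations are scored on points that neither was shown during fitting, which is why the 6411 remaining SPA points are the relevant test set for the emulator.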

[Figure: two panels, "Emulator Predictions" and "ACM Predictions", each compared with "SPA Predictions". RMSE = (value missing) using emulator; RMSE = (value missing) using ACM.]

Case Study 2: Sheffield Dynamic Global Vegetation Model
– SDGVM is a point model
    – each pixel represents an area, with an associated vegetation type / land use
– Vegetation type is described using 14 plant functional type parameters
– SDGVM is constantly being developed
    – To improve process modelling
    – To incorporate more detailed driving data

Plant Functional Type inputs
Examples:
– Leaf life span
– Leaf area
– Temperature when bud bursts
– Temperature when leaf falls
– Wood density
– Maximum carbon storage
– Xylem conductivity
The emulator will allow small groups of inputs to vary, with the others fixed at their original default values

Soil inputs
– Soil clay %
– Soil sand %
– Soil depth
– Bulk density

Emulator for SDGVM
– Built an emulator for the NEP output of SDGVM
    – 80 runs in the 5-dimensional input space were used as training data
    – A maximin Latin hypercube design was used to ensure even coverage of the input space. Plant scientists specified the ranges
[Diagram: Run code … …]
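One simple way to obtain a design of this kind is to generate many random Latin hypercubes and keep the one whose closest pair of points is furthest apart. The sketch below does that by random search; this is an assumed, crude approach and the talk does not say how its design was constructed. The function names and the placeholder ranges are hypothetical.

```python
import math
import random

def latin_hypercube(n_runs, n_inputs, rng):
    """One random Latin hypercube on [0, 1]^n_inputs: each input's range is
    split into n_runs equal strata and each stratum is used exactly once."""
    columns = []
    for _ in range(n_inputs):
        strata = list(range(n_runs))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_runs for s in strata])
    return [[columns[j][i] for j in range(n_inputs)] for i in range(n_runs)]

def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(a, b)
               for i, a in enumerate(design) for b in design[i + 1:])

def maximin_latin_hypercube(n_runs, n_inputs, n_candidates=50, seed=0):
    """Random-search maximin: best of n_candidates random Latin hypercubes."""
    rng = random.Random(seed)
    candidates = [latin_hypercube(n_runs, n_inputs, rng)
                  for _ in range(n_candidates)]
    return max(candidates, key=min_pairwise_distance)

def scale_to_ranges(design, ranges):
    """Map unit-cube points onto (low, high) ranges, one pair per input."""
    return [[lo + u * (hi - lo) for u, (lo, hi) in zip(point, ranges)]
            for point in design]

design = maximin_latin_hypercube(n_runs=80, n_inputs=5)
# Placeholder ranges; in practice these come from the subject experts.
scaled = scale_to_ranges(design, [(0.0, 10.0)] * 5)
```

The Latin hypercube property guarantees that every input is spread evenly over its own range, and the maximin criterion discourages runs from clustering in the joint space.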

Model testing: Sensitivity analysis
– We use sensitivity analysis for model checking and for model interpretation
– Calculate main effects of each code input
    – How does output change if we vary the input, averaged over other inputs?
– Building the emulator has uncovered bugs
    – simply by trying different combinations of input values
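A main effect, as described above, is the expected output with one input held at a chosen value and the rest averaged out. The sketch below estimates it by plain Monte Carlo, assuming independent uniform inputs on [0, 1] and a hypothetical cheap `toy_model` in place of an emulator; an emulator-based analysis can compute the same quantity analytically.

```python
import random

def toy_model(x):
    """Hypothetical cheap function standing in for an emulator mean.
    The third input deliberately has no influence on the output."""
    return 4.0 * x[0] + x[1] ** 2 + 0.0 * x[2]

def main_effect(model, input_index, value, n_inputs, n_samples=2000, seed=0):
    """Average output with one input fixed and the others sampled uniformly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(n_inputs)]
        x[input_index] = value  # hold the input of interest fixed
        total += model(x)
    return total / n_samples

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
effect_x0 = [main_effect(toy_model, 0, v, 3) for v in grid]
effect_x2 = [main_effect(toy_model, 2, v, 3) for v in grid]
```

Plotting `effect_x0` against `grid` gives a rising line, while `effect_x2` is flat. A main-effect curve with an unexpected shape is the kind of signal that prompts a closer look at the code.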

Main Effect: Leaf life span

Main Effect: Leaf life span (updated)

Main Effect: Senescence Temperature

Main Effects: Soil inputs
– Soil inputs had been fixed in SDGVM
– Output sensitive to sand content, but not clay content, over these ranges
– More detailed soil input data are now used

Error discovered in the soil module
[Figure: NEP against bulk density, panels "Before…" and "After…"]

SDGVM: new sensitivity analysis
– We initially analysed uncertainty in the NEP output at a single test site, using rough ranges for the 14 plant functional type parameters
– Assumed default (uniform) probability distributions for the parameters
– The aim here is to identify the greatest potential sources of uncertainty
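Propagating uniform parameter distributions through a model can be sketched with simple Monte Carlo sampling. Everything specific below is an assumption for illustration: `toy_output` is a hypothetical stand-in for a model output, and the parameter names and ranges are invented, not taken from SDGVM.

```python
import random
import statistics

def toy_output(params):
    """Hypothetical stand-in for a model output such as NEP (not SDGVM)."""
    return 100.0 + 2.0 * params["a"] - 0.5 * params["b"]

# Invented rough ranges, each treated as a uniform distribution.
ranges = {"a": (0.0, 10.0), "b": (0.0, 20.0)}

def propagate(model, ranges, n_samples=5000, seed=0):
    """Sample parameters from their ranges and summarise the output."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        outputs.append(model(params))
    return statistics.mean(outputs), statistics.stdev(outputs)

mean_out, sd_out = propagate(toy_output, ranges)
```

The returned mean and standard deviation summarise the output uncertainty induced by the parameter uncertainty. With an expensive simulator the samples would be drawn from the emulator rather than the code itself.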

[Figure: NEP (g/m²/y)]

– Leaf life span 69.1%
– Minimum growth rate 14.2%
– Water potential 3.4%
– Maximum age 1.0%

Plant Functional Type parameters
– Uncertainty is driven by just a few key parameters
    – Maximum age
    – Leaf life span
    – Water potential
    – Minimum growth rate
– The next step was to refine the rough probability distributions for these parameters
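One common way to attribute output uncertainty to individual parameters is the first-order variance-based index, Var(E[Y | X_i]) / Var(Y). The slides do not state which estimator was used, so the sketch below is an assumption: a Monte Carlo "pick and freeze" estimate on a hypothetical `toy_model` with independent uniform inputs, rather than the emulator-based calculation.

```python
import random

def toy_model(x):
    """Hypothetical additive model; exact indices are 0.9, 0.1 and 0.0."""
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]

def first_order_indices(model, n_inputs, n_samples=20000, seed=0):
    """Estimate Var(E[Y | X_i]) / Var(Y) for each input i."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(x) for x in a]
    y_b = [model(x) for x in b]
    pooled = y_a + y_b
    mean_y = sum(pooled) / len(pooled)
    var_y = sum((y - mean_y) ** 2 for y in pooled) / len(pooled)
    indices = []
    for i in range(n_inputs):
        # Rows of sample A with input i replaced by the value from sample B,
        # so these runs share only input i with the runs on B.
        y_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        v_i = sum((yb - mean_y) * (yab - ya)
                  for yb, yab, ya in zip(y_b, y_ab, y_a)) / n_samples
        indices.append(v_i / var_y)
    return indices

indices = first_order_indices(toy_model, n_inputs=3)
percentages = [100.0 * s for s in indices]
```

Expressed as percentages, indices like these show which few inputs dominate the output variance and are therefore worth characterising more carefully.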

Elicitation
– We elicited formal probability distributions for the key parameters
    – based on discussion with Ian Woodward
    – representing his uncertainty about their values within the UK
    – noting that each really applies as an average over the species actually present in a given pixel

[Figure: four panels: Leaf life span (days), Minimum growth rate (m), Maximum age (years), Water potential (MPa)]

– Leaf life span 69.1%
– Minimum growth rate 14.2%
– Water potential 3.4%
– Maximum age 1.0%
Uniform probability distributions: Mean NEP = 174 gC m⁻², Std deviation = (value missing) gC m⁻²
Refined probability distributions: Mean NEP = 163 gC m⁻², Std deviation = (value missing) gC m⁻²

Uncertainty analysis at sample sites
We computed uncertainty analyses on NEP outputs from SDGVM for 9 sites/pixels:
– Stockton on the Forest (Nr York)
– Milton Keynes
– Barnstaple (Devon)
– Keswick (Lake District)
– Lowland (Scotland)
– Dartmoor
– New Forest (Hampshire)
– Kielder
– S. Ballater (Scotland)

– Uncertainty is clearly substantial, even when we only take account of uncertainty in these parameters
– The most important parameter is minimum growth rate, which accounts for typically at least 60% of overall NEP uncertainty
    – This suggests targeting this parameter for research
    – Seeding density?

Ongoing work
– We need to estimate uncertainty in the overall UK carbon budget
    – Developing new theory for aggregating uncertainty over many pixels
– Windows software will be made available later this year