Towards quantifying the uncertainty in carbon fluxes
Tony O’Hagan, University of Sheffield
14 May 2008, RSS Oxford

Outline
The carbon flux problem; quantifying input uncertainties; propagating uncertainty; results.

Computer models
In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes, for understanding, prediction and control. There is a growing realisation of the importance of uncertainty in model predictions: can we trust them? Without any quantification of output uncertainty, it’s easy to dismiss them.

Examples
Climate prediction; molecular dynamics; nuclear waste disposal; oil fields; engineering design; hydrology.

Carbon flux
Vegetation can be a major factor in mitigating the increase of CO₂ in the atmosphere, and hence in reducing the greenhouse effect. Through photosynthesis, plants take in atmospheric CO₂; the carbon builds new plant material and O₂ is released. But some CO₂ is released again, through respiration, death and decay. The net reduction of CO₂ is called Net Biosphere Production (NBP); I will refer to it as the carbon flux. These complex processes are modelled in SDGVM, the Sheffield Dynamic Global Vegetation Model.

CTCD
The Centre for Terrestrial Carbon Dynamics (CTCD) was a NERC Centre of Excellence; it is now part of the National Centre for Earth Observation. One major exercise was to estimate the carbon flux from vegetation in England and Wales in 2000. SDGVM was run at each of 707 sites over England and Wales, for 4 plant functional types (PFTs). The principal output is NBP, and there are many inputs.

SDGVM C flux outputs for 2000
The map of SDGVM estimates shows positive flux (a C sink) in the North, but negative flux (a C source) in the Midlands. The total estimated flux is 9.06 Mt C. The flux is highly dependent on weather, so it will vary greatly between years.

Accounting for uncertainty
There are several sources of uncertainty.
Uncertain inputs: PFT parameters (defining plant growth etc.), soil structure, land cover types, weather.
Model structure: all models are wrong!
There are two main challenges: formally quantifying these uncertainties, and propagating input uncertainty through the model.

Progress to date
A paper dealing with uncertainty in plant functional inputs and soil inputs: Kennedy, O'Hagan, Anderson et al. (2008). Quantifying uncertainty in the biospheric carbon flux for England and Wales. J. Royal Statistical Society A 171.
A paper showing how to quantify uncertainty in land cover: Cripps, O'Hagan, Quaife and Anderson (2008). Modelling uncertainty in satellite-derived land cover maps.
Recent work combines these. We still need to account for uncertainty in weather and model structure.

Quantifying input uncertainties
Plant functional type parameters: expert elicitation.
Soil composition: simple analysis from extensive data.
Land cover: more complex analysis of ‘confusion matrix’ data.

Elicitation
We elicited the beliefs of an expert (the developer of SDGVMd) regarding plausible values of the PFT parameters. There are four PFTs: deciduous broadleaf (DBL), evergreen needleleaf (ENL), crops and grass. Each PFT has many parameters; the key ones were identified by a preliminary sensitivity analysis. It was important to allow for uncertainty about the mix of species in a site and about the role of each parameter in the model. For ENL leaf life span, this was more complex.

ENL leaf life span

Correlations
A PFT parameter's value at one site may differ from its value at another, because of variation in the species mix. Common uncertainty about the average over all species induces correlation between sites, so we elicited beliefs about the average over the whole UK. The ENL joint distributions are mixtures of 25 components, with correlation both between and within years.
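To make the correlation argument concrete, here is a minimal sketch under an assumed two-level normal model (not the elicited mixture distributions): the parameter at site s is θ_s = μ + δ_s, where μ is the uncertain UK-wide average and the site deviations δ_s are independent. The shared μ is the only source of covariance, so the correlation between two sites is Var(μ) / (Var(μ) + Var(δ)). The function names and variance values are hypothetical.

```python
import math
import random


def implied_correlation(var_common: float, var_site: float) -> float:
    """Correlation of theta_s and theta_t (s != t) when
    theta_s = mu + delta_s, Var(mu) = var_common and the site deviations
    delta_s are independent with variance var_site. The shared mu is the
    only source of covariance, so Cov(theta_s, theta_t) = var_common."""
    return var_common / (var_common + var_site)


def simulate_two_sites(mean, var_common, var_site, n, seed=0):
    """Draw n joint samples of the parameter at two sites sharing one mu."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n):
        mu = rng.gauss(mean, math.sqrt(var_common))  # uncertain UK-wide average
        a.append(mu + rng.gauss(0.0, math.sqrt(var_site)))
        b.append(mu + rng.gauss(0.0, math.sqrt(var_site)))
    return a, b


def pearson(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    # Hypothetical values: Var(mu) = 1, site-level variance = 3.
    print(implied_correlation(1.0, 3.0))  # 0.25
    a, b = simulate_two_sites(2.0, 1.0, 3.0, n=20000)
    print(round(pearson(a, b), 2))        # close to 0.25
```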

Soil composition
The inputs are the percentages of sand, clay and silt, plus bulk density. A soil map is available at high resolution, giving multiple values in each SDGVM site. These were used to form an average (the central estimate) and to assess uncertainty (the variance), augmented to allow for uncertainty in the original data (by expert judgement). Soil inputs were assumed independent between sites.
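A minimal sketch of the per-site soil summary, assuming that the within-site spread of map pixel values measures uncertainty about the single value SDGVM takes for the site, and that the expert-judged data-error variance simply adds to it. `soil_input_summary` and the numbers are hypothetical.

```python
import statistics


def soil_input_summary(pixel_values, data_error_var):
    """Central estimate and variance for one soil input (e.g. % sand) at
    one SDGVM site.

    pixel_values   -- high-resolution soil-map values inside the site
    data_error_var -- extra variance for errors in the map itself
                      (a hypothetical, expert-judged number)

    Assumes the within-site spread of pixel values measures uncertainty
    about the single value SDGVM takes for the site.
    """
    centre = statistics.mean(pixel_values)
    spread = statistics.pvariance(pixel_values, centre)
    return centre, spread + data_error_var


if __name__ == "__main__":
    # Hypothetical % sand values for the pixels in one site
    print(soil_input_summary([40, 45, 50, 55], data_error_var=4.0))  # (47.5, 35.25)
```

Because each site is summarised on its own, no covariance between sites is produced, matching the independence assumption above.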

Land cover map
LCM2000 is another high-resolution map, obtained from satellite images. The vegetation in each pixel is assigned to one of 26 classes, and these are aggregated to give the proportions of each PFT at each site. But the data are uncertain. Field data are available at a sample of pixels (Countryside Survey 2000, CS2000). The table of CS2000 class versus LCM2000 class is called the confusion matrix.

CS2000 versus LCM2000 matrix
The matrix is not symmetric, and the numbers are rather small. Bare is not a PFT and produces zero NBP.
[Table: counts cross-classifying CS2000 and LCM2000 classes DBL, ENL, Grass, Crop and Bare]

Modelling land cover
The matrix tells us about the probability distribution of the LCM2000 class given the true (CS2000) class, subject to sampling errors. But we need the probability distribution of the true PFT given the observed PFT: posterior probabilities as opposed to likelihoods. So we need a prior distribution for land cover. We used observations in a neighbourhood, implicitly assuming an underlying smooth random field. The confusion matrix also says nothing about spatial correlation of LCM2000 errors, so we again relied on expert judgement, using a notional equivalent number of independent pixels per site.
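Bayes' theorem turns the confusion-matrix likelihoods into the posterior probabilities needed. The sketch below assumes plug-in row proportions as likelihoods and a given prior for one site; it ignores the sampling error in the counts and the spatial correlation of errors, both of which the analysis above had to handle. The counts, prior and function name are hypothetical.

```python
def posterior_true_class(counts, prior, observed):
    """P(true class | observed LCM2000 class) by Bayes' theorem.

    counts[true][obs] -- confusion-matrix counts, rows = true (CS2000) class,
                         columns = observed (LCM2000) class
    prior[true]       -- prior probability of each true class at the site
                         (e.g. from neighbourhood observations)
    Row proportions are used as plug-in likelihoods P(obs | true), which
    ignores their sampling error.
    """
    unnormalised = {}
    for true_cls, row in counts.items():
        likelihood = row[observed] / sum(row.values())  # P(obs | true)
        unnormalised[true_cls] = likelihood * prior[true_cls]
    total = sum(unnormalised.values())
    return {cls: p / total for cls, p in unnormalised.items()}


if __name__ == "__main__":
    # Hypothetical counts and prior; not the CS2000/LCM2000 data.
    counts = {
        "Grass": {"Grass": 80, "Crop": 15, "DBL": 5},
        "Crop": {"Grass": 20, "Crop": 75, "DBL": 5},
        "DBL": {"Grass": 10, "Crop": 0, "DBL": 90},
    }
    prior = {"Grass": 0.5, "Crop": 0.4, "DBL": 0.1}
    post = posterior_true_class(counts, prior, observed="Grass")
    print({k: round(v, 3) for k, v in post.items()})
    # {'Grass': 0.816, 'Crop': 0.163, 'DBL': 0.02}
```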

Overall proportions
Red lines show LCM2000 proportions; there are clear overall biases. The analysis gives estimates for all PFTs in each SDGVM site, with variances and correlations.

Propagating uncertainty
Uncertainty analysis; problems with the simple Monte Carlo approach; emulation; Gaussian process emulation; the MUCM project.

Uncertainty analysis
We have a computer model that produces output y = f(x) when given input x. But for a particular application we do not know x precisely, so X is a random variable, and therefore so is Y = f(X). We are interested in the uncertainty distribution of Y. How can we compute it?

Monte Carlo
The usual approach is Monte Carlo: sample values x_i from the distribution of X, and run the model for all these values to produce sample values y_i = f(x_i). These are a sample from the uncertainty distribution of Y. This typically requires thousands of samples of the input parameters, and in this case we would need to run SDGVM 4 × 707 times for each sample! The approach is neat, but impractical if the model takes minutes or hours to run.
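A minimal sketch of the Monte Carlo recipe, with a cheap hypothetical `toy_model` standing in for SDGVM and a made-up input distribution. It also does the run-count arithmetic: 4 × 707 = 2,828 SDGVM runs per input sample, so a thousand samples would need 2,828,000 runs.

```python
import random
import statistics

# One SDGVM run per PFT per site for each input sample (4 x 707).
RUNS_PER_SAMPLE = 4 * 707


def toy_model(x):
    """Hypothetical cheap stand-in for the simulator f."""
    return x[0] ** 2 + 0.5 * x[1]


def sample_inputs(rng):
    """Hypothetical input distribution: X1 ~ N(1, 0.2^2), X2 ~ U(0, 2)."""
    return (rng.gauss(1.0, 0.2), rng.uniform(0.0, 2.0))


def monte_carlo_ua(f, draw, n, seed=1):
    """Sample x_i from the input distribution, run y_i = f(x_i), and
    summarise the resulting sample from the distribution of Y."""
    rng = random.Random(seed)
    ys = [f(draw(rng)) for _ in range(n)]
    return statistics.mean(ys), statistics.stdev(ys)


if __name__ == "__main__":
    mean_y, sd_y = monte_carlo_ua(toy_model, sample_inputs, n=20000)
    print(round(mean_y, 2))        # near the exact E(Y) = 1.04 + 0.5 = 1.54
    print(RUNS_PER_SAMPLE)         # 2828
    print(1000 * RUNS_PER_SAMPLE)  # 2828000 runs for 1000 samples
```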

Emulation
A computer model encodes a function that takes inputs and produces outputs. An emulator is a statistical approximation of that function: it estimates what outputs would be obtained from given inputs, with a statistical measure of estimation error. Given enough training data, the estimation error variance can be made small.

So what?
A good emulator estimates the model output accurately, with small uncertainty, and runs “instantly”. So we can do uncertainty analysis etc. fast and efficiently. Conceptually, we use model runs to learn about the function, then derive any desired properties of the model.

GP solution
Treat f(·) as an unknown function with a Gaussian process (GP) prior distribution. Use the available runs as observations without error, to derive the posterior distribution (also a GP). Then make inference about the uncertainty distribution: e.g. the mean of Y is the integral of f(x) with respect to the distribution of X, and its posterior distribution is normal, conditional on the GP parameters.
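A minimal one-input sketch of the GP solution, assuming a zero prior mean, a squared-exponential covariance with fixed variance and length-scale (so results are conditional on the GP parameters), a toy simulator sin(3x) and X ~ Uniform(0, 1). Integrating the posterior mean function against the distribution of X gives the posterior mean of E(Y), and integrating the posterior covariance function over both arguments gives its posterior variance; the midpoint rule on a grid approximates both integrals. All names and settings are illustrative.

```python
import numpy as np


def sq_exp(a, b, variance=1.0, length=0.3):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_train, y_train, x_new, jitter=1e-10):
    """Posterior mean vector and covariance matrix of f at x_new, treating
    the training runs as observations without error (a tiny jitter is
    added only for numerical stability) and a zero prior mean."""
    K = sq_exp(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = sq_exp(x_new, x_train)
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    cov = sq_exp(x_new, x_new) - v.T @ v
    return mean, cov


def posterior_of_mean_output(x_train, y_train, n_grid=400):
    """Posterior mean and variance of E(Y), the integral of f(x) over
    X ~ Uniform(0, 1), using midpoint-rule nodes on [0, 1]."""
    grid = (np.arange(n_grid) + 0.5) / n_grid
    m, c = gp_posterior(x_train, y_train, grid)
    return m.mean(), c.mean()   # single and double integrals over the grid


if __name__ == "__main__":
    x_train = np.linspace(0.0, 1.0, 6)   # 6 hypothetical simulator runs
    y_train = np.sin(3.0 * x_train)      # toy simulator output
    mu, var = posterior_of_mean_output(x_train, y_train)
    exact = (1.0 - np.cos(3.0)) / 3.0    # true E(Y) for the toy simulator
    print(mu, var, exact)
```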

Why GP emulation?
Simple regression models can be thought of as emulators, but their error estimates are invalid. We use Gaussian process emulation instead. It is nonparametric, so it can fit any function; its error measures can be validated; it is analytically tractable, so uncertainty analysis etc. can often be done analytically; it is highly efficient when there are many inputs; and it reproduces the training data correctly.

2 code runs
Consider one input and one output. The emulator estimate interpolates the data, and the emulator uncertainty grows between data points.

3 code runs
Adding another point changes the estimate and reduces the uncertainty.

5 code runs
And so on.
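The behaviour in these three slides can be reproduced with a short sketch, assuming a zero-mean GP with a fixed squared-exponential covariance and hypothetical nested designs of 2, 3 and 5 runs on [0, 1]. For fixed GP parameters the posterior variance does not depend on the outputs, so only the run locations are needed; the standard deviation at x = 0.35 falls as runs are added and is essentially zero at a run location.

```python
import numpy as np


def sq_exp(a, b, variance=1.0, length=0.2):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)


def posterior_sd(x_train, x_query, jitter=1e-10):
    """Posterior standard deviation of the emulator at x_query.
    For a GP with fixed parameters this does not depend on the outputs y."""
    K = sq_exp(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = sq_exp(x_query, x_train)
    # diag(Ks K^{-1} Ks^T): variance explained by the training runs
    reduction = np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    var = sq_exp(x_query, x_query).diagonal() - reduction
    return np.sqrt(np.clip(var, 0.0, None))  # clip tiny negative round-off


if __name__ == "__main__":
    x_query = np.array([0.35])  # a point between design points
    designs = {
        2: np.array([0.0, 1.0]),
        3: np.array([0.0, 0.5, 1.0]),
        5: np.array([0.0, 0.25, 0.5, 0.75, 1.0]),
    }
    for n_runs, x_train in designs.items():
        print(n_runs, posterior_sd(x_train, x_query)[0])
```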

BACCO
This has led to a wide-ranging body of tools for inference about all kinds of uncertainties in computer models, all based on building the GP emulator of the model from a set of training runs. This area is now known as BACCO: Bayesian Analysis of Computer Code Output.

BACCO includes
Uncertainty analysis; sensitivity analysis; calibration; data assimilation; model validation; optimisation; etc. All within a single coherent framework.

MUCM
Managing Uncertainty in Complex Models: a large 4-year research grant. Started in June …, with … postdoctoral research assistants and 4 PhD studentships. Based in Sheffield, Durham, Aston, Southampton and LSE. Objective: to develop BACCO methods into a robust technology that is widely applicable across the spectrum of modelling applications.

Emulation of SDGVM
We built GP emulators for all 4 PFTs at 30 of the 707 sites. The estimates (posterior means) and uncertainties (variances and covariances) were inter-/extrapolated to the other sites by kriging, with the uncertainty due to emulation and that due to kriging each accounted for separately.
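A minimal sketch of the kriging step, under assumptions not stated in the slides: simple kriging with a known mean, an exponential covariance over 2-D site coordinates, and emulator variances at the emulated sites treated as independent measurement error. Comparing runs with and without that error term is one simple way to split predictive variance into a kriging part and an emulation part; it is not necessarily the decomposition used in this work. Coordinates, estimates and variances are hypothetical.

```python
import numpy as np


def exp_cov(A, B, sill=1.0, range_km=50.0):
    """Exponential covariance between sets of 2-D site coordinates (km)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sill * np.exp(-d / range_km)


def simple_krige(coords, z, emu_var, new_coords, mean):
    """Simple kriging (known mean) of site-level emulator estimates z.

    emu_var -- emulator variances at the emulated sites, treated here as
               independent measurement error (a simplifying assumption).
    Returns predictions and predictive variances at new_coords.
    """
    K = exp_cov(coords, coords) + np.diag(emu_var)
    k = exp_cov(coords, new_coords)   # shape (n_sites, n_new)
    w = np.linalg.solve(K, k)         # kriging weights
    pred = mean + w.T @ (z - mean)
    var = exp_cov(new_coords, new_coords).diagonal() - np.sum(k * w, axis=0)
    return pred, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(0.0, 300.0, size=(30, 2))  # 30 hypothetical sites
    z = rng.normal(0.0, 1.0, size=30)               # hypothetical NBP estimates
    emu_var = np.full(30, 0.05)                     # hypothetical emulator variances
    new = np.array([[150.0, 150.0]])                # a site without an emulator
    _, v_krig = simple_krige(coords, z, np.zeros(30), new, mean=0.0)
    _, v_total = simple_krige(coords, z, emu_var, new, mean=0.0)
    print(v_krig, v_total - v_krig)  # kriging-only part, extra part from emulation
```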

Sensitivity analysis for one site/PFT
This was used to identify the most important inputs: these are the ones about which we needed to formulate uncertainty carefully.
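Variance-based main effects are one common way to rank inputs. The sketch below estimates first-order indices S_i = Var(E(Y | X_i)) / Var(Y) by plain Monte Carlo with a standard pick-freeze estimator, on a cheap hypothetical additive function with independent Uniform(0, 1) inputs. With an expensive simulator such as SDGVM this brute-force approach is impractical; in the BACCO approach these quantities are derived from the emulator instead.

```python
import numpy as np


def toy_model(X):
    """Hypothetical additive stand-in for the simulator (rows = input draws)."""
    return 1.0 * X[:, 0] + 2.0 * X[:, 1] + 0.5 * X[:, 2]


def first_order_indices(f, n_inputs, n=100_000, seed=0):
    """Monte Carlo estimates of first-order (main-effect) sensitivity
    indices for independent Uniform(0, 1) inputs, using a pick-freeze
    estimator with centred outputs to reduce its variance."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n, n_inputs))
    B = rng.uniform(size=(n, n_inputs))
    fA, fB = f(A), f(B)
    both = np.concatenate([fA, fB])
    var_y = np.var(both)
    fB_c = fB - both.mean()
    S = np.empty(n_inputs)
    for i in range(n_inputs):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # A with column i taken from B
        S[i] = np.mean(fB_c * (f(ABi) - fA)) / var_y
    return S


if __name__ == "__main__":
    S = first_order_indices(toy_model, n_inputs=3)
    # Exact values for this additive model are a_i^2 / sum(a_j^2) with
    # a = (1, 2, 0.5), i.e. about 0.190, 0.762 and 0.048.
    print(S.round(3))
```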

Results

Mean NBP corrections

NBP standard deviations

Aggregate across 4 PFTs
[Maps: mean NBP; standard deviation]

England & Wales aggregate
[Table: for each PFT (Grass, Crop, Deciduous, Evergreen), plus the covariance terms and the total: plug-in estimate (Mt C), mean (Mt C) and variance (Mt C²)]

Sources of uncertainty
The total variance of … is made up as follows: variance due to PFT and soil inputs = …; variance due to land cover uncertainty = …; variance due to interpolation/emulation = ….
Land cover uncertainty is much larger for the individual PFT contributions, and dominates for ENL, but overall it tends to cancel out. It also changes the estimates: larger mean corrections and smaller overall uncertainty.
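Why land cover uncertainty can be large for each PFT yet cancel in the total can be seen in a two-PFT sketch, assuming (hypothetically) that only the proportion p of one PFT is uncertain, the other PFT covers 1 - p, and the per-area fluxes are fixed. The two contributions are then negatively correlated, and the variance of their sum, which is the sum of all entries of their covariance matrix, reduces to Var(p) (flux_a - flux_b)². Names and numbers are illustrative.

```python
def total_variance(cov):
    """Variance of a sum of contributions = sum of every entry of their
    covariance matrix (all variances plus all covariances)."""
    return sum(sum(row) for row in cov)


def land_cover_cov(var_p, flux_a, flux_b):
    """Covariance matrix of two PFT contributions p * flux_a and
    (1 - p) * flux_b when only the proportion p is uncertain (a
    hypothetical two-PFT site with fixed per-area fluxes)."""
    var_a = flux_a ** 2 * var_p
    var_b = flux_b ** 2 * var_p
    cov_ab = -flux_a * flux_b * var_p  # more of one PFT means less of the other
    return [[var_a, cov_ab], [cov_ab, var_b]]


if __name__ == "__main__":
    cov = land_cover_cov(var_p=0.01, flux_a=3.0, flux_b=2.5)
    print(cov[0][0] + cov[1][1])  # sum of individual variances, about 0.1525
    print(total_variance(cov))    # about 0.0025 = var_p * (flux_a - flux_b) ** 2
```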

Conclusions
Bayesian methods offer a powerful basis for computing uncertainties in model predictions. The analysis of E&W aggregate NBP in 2000 is a good case study for uncertainty and sensitivity analyses, but we need to take account of the remaining sources of uncertainty. The analysis involved several technical extensions. It has important implications for our understanding of C fluxes, and policy implications.

Finally
This was joint work with many others.
Plant, soil and earth observation: Shaun Quegan, Ian Woodward, Mark Lomas, Tristan Quaife, Andreas Heinemeyer, Phil Ineson.
Statistics: Marc Kennedy, John Paul Gosling, Ed Cripps, Keith Harris, Clive Anderson.