Which level of model complexity is justified by your data

Slides:



Advertisements
Similar presentations
Pretty-Good Tomography Scott Aaronson MIT. Theres a problem… To do tomography on an entangled state of n qubits, we need exp(n) measurements Does this.
Advertisements

Costs and Benefits.
Approximate Analytical/Numerical Solutions to the Groundwater Transport Problem CWR 6536 Stochastic Subsurface Hydrology.
Combining hydraulic test data for building a site-scale model MACH 1.3 Modélisation des Aquifères Calcaires Hétérogènes Site Expérimental Hydrogéologique.
Bayesian Inference Chris Mathys Wellcome Trust Centre for Neuroimaging UCL SPM Course London, May 12, 2014 Thanks to Jean Daunizeau and Jérémie Mattout.
CS479/679 Pattern Recognition Dr. George Bebis
Biointelligence Laboratory, Seoul National University
Conceptualizing multi- criteria decisions: what’s wrong with weighted averages? Presentation to be given at EURO XXI 2006, Reykjavik, Iceland, July 2006.
Upscaling and effective properties in saturated zone transport Wolfgang Kinzelbach IHW, ETH Zürich.
Aspects of Conditional Simulation and estimation of hydraulic conductivity in coastal aquifers" Luit Jan Slooten.
Bayesian Learning Rong Jin. Outline MAP learning vs. ML learning Minimum description length principle Bayes optimal classifier Bagging.
I DENTIFICATION OF main flow structures for highly CHANNELED FLOW IN FRACTURED MEDIA by solving the inverse problem R. Le Goc (1)(2), J.-R. de Dreuzy (1)
Uncertainty analysis and Model Validation.
Results 2 (cont’d) c) Long term observational data on the duration of effective response Observational data on n=50 has EVSI = £867 d) Collect data on.
Interdisciplinary Modeling of Aquatic Ecosystems Curriculum Development Workshop July 18, 2005 Groundwater Flow and Transport Modeling Greg Pohll Division.
Statistical Background
Bayesian Learning Rong Jin.
Using ranking and DCE data to value health states on the QALY scale using conventional and Bayesian methods Theresa Cain.
Multi-Criteria Analysis – compromise programming.
The Calibration Process
Teaching Innovation Project Modelling in the environmental sciences - Enhancing employability for the environmental sector Stefan Krause, Zoe Robinson.
Robin McDougall, Ed Waller and Scott Nokleby Faculties of Engineering & Applied Science and Energy Systems & Nuclear Science 1.
TOPLHCWG. Introduction The ATLAS+CMS combination of single-top production cross-section measurements in the t channel was performed using the BLUE (Best.
A stepwise approximation for estimations of multilevel hydraulic tests in heterogeneous aquifers PRESENTER: YI-RU HUANG ADVISOR: CHUEN-FA NI DATE:
Quantify prediction uncertainty (Book, p ) Prediction standard deviations (Book, p. 180): A measure of prediction uncertainty Calculated by translating.
Bayesian Model Comparison and Occam’s Razor Lecture 2.
Calibration Guidelines 1. Start simple, add complexity carefully 2. Use a broad range of information 3. Be well-posed & be comprehensive 4. Include diverse.
Bayesian and Geostatistical Approaches to Inverse Problems Peter K. Kitanidis Civil and Environmental Engineering Stanford University.
Bringing Inverse Modeling to the Scientific Community Hydrologic Data and the Method of Anchored Distributions (MAD) Matthew Over 1, Daniel P. Ames 2,
Uncertainty Analysis and Model “Validation” or Confidence Building.
Bayesian parameter estimation in cosmology with Population Monte Carlo By Darell Moodley (UKZN) Supervisor: Prof. K Moodley (UKZN) SKA Postgraduate conference,
1 Institute of Engineering Mechanics Leopold-Franzens University Innsbruck, Austria, EU H.J. Pradlwarter and G.I. Schuëller Confidence.
Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 11: Bayesian learning continued Geoffrey Hinton.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
School of something FACULTY OF OTHER “Complementary parameterization and forward solution method” Robert G Aykroyd University of Leeds,
Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models Mike West Computing Science and Statistics, Vol. 24, pp , 1993.
Approximate Analytical Solutions to the Groundwater Flow Problem CWR 6536 Stochastic Subsurface Hydrology.
Calibration Guidelines 1. Start simple, add complexity carefully 2. Use a broad range of information 3. Be well-posed & be comprehensive 4. Include diverse.
Bayesian evidence for visualizing model selection uncertainty Gordon L. Kindlmann
Reserve Variability – Session II: Who Is Doing What? Mark R. Shapland, FCAS, ASA, MAAA Casualty Actuarial Society Spring Meeting San Juan, Puerto Rico.
Importance Resampling for Global Illumination Justin F. Talbot Master’s Thesis Defense Brigham Young University Provo, UT.
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.
Adaptive Hybrid EnKF-OI for State- Parameters Estimation in Contaminant Transport Models Mohamad E. Gharamti, Johan Valstar, Ibrahim Hoteit European Geoscience.
Chapter 2 Statistical Background. 2.3 Random Variables and Probability Distributions A variable X is said to be a random variable (rv) if for every real.
Eawag: Swiss Federal Institute of Aquatic Science and Technology Mechanism-Based Emulation of Dynamic Simulation Models – Concept and Application in Hydrology.
Indirect imaging of stellar non-radial pulsations Svetlana V. Berdyugina University of Oulu, Finland Institute of Astronomy, ETH Zurich, Switzerland.
Human and Optimal Exploration and Exploitation in Bandit Problems Department of Cognitive Sciences, University of California. A Bayesian analysis of human.
(Z&B) Steps in Transport Modeling Calibration step (calibrate flow & transport model) Adjust parameter values Design conceptual model Assess uncertainty.
Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor Martin Pelikan, David E. Goldberg, and Kumara Sastry IlliGAL Report No May.
Bayesian Modelling Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.
Statistical Methods. 2 Concepts and Notations Sample unit – the basic landscape unit at which we wish to establish the presence/absence of the species.
Goal of Stochastic Hydrology Develop analytical tools to systematically deal with uncertainty and spatial variability in hydrologic systems Examples of.
Global Illumination (3) Path Tracing. Overview Light Transport Notation Path Tracing Photon Mapping.
Approximate Analytical Solutions to the Groundwater Flow Problem CWR 6536 Stochastic Subsurface Hydrology.
NASSP Masters 5003F - Computational Astronomy Lecture 4: mostly about model fitting. The model is our estimate of the parent function. Let’s express.
8 Sept 2006, DEMA2006Slide 1 An Introduction to Computer Experiments and their Design Problems Tony O’Hagan University of Sheffield.
Hierarchical Models. Conceptual: What are we talking about? – What makes a statistical model hierarchical? – How does that fit into population analysis?
Bayesian model averaging approach for urban drainage water quality modelling Gabriele Freni Università degli Studi di Enna “Kore”
Ground Water Modeling Concepts
The Calibration Process
Bayesian Averaging of Classifiers and the Overfitting Problem
Bayesian Model Comparison and Occam’s Razor
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
John Lafferty, Chengxiang Zhai School of Computer Science
LECTURE 09: BAYESIAN LEARNING
CS639: Data Management for Data Science
Model selection/averaging for subjectivists
Presentation transcript:

Which level of model complexity is justified by your data Which level of model complexity is justified by your data? A Bayesian answer Anneli Schöniger, Walter A. Illman, Thomas Wöhling, Wolfgang Nowak 22/04/2016, EGU General Assembly 2016

Model Complexity in Subsurface Hydrology When building subsurface flow or transport models, there are abundant choices of conceptualizations, covering wide ranges of model complexity Which level of model complexity should we choose? In the light of data: Which level of model complexity is justified by the available data?

Authors & Motivation of This Study Anneli Schöniger Walter A. Illman Thomas (Eddy) Wöhling Wolfgang Nowak

Bayesian Model Selection / Averaging (BMA) Posterior model probabilities are used as weights for model ranking model selection model averaging uncertainty estimation BMA implicitly follows the principle of parsimony or „Occam‘s razor“: The weights reflect an optimal tradeoff between performance and complexity Calibration data set d Model M1 Model M2 Model M3 Model M4 Draper (1995), Hoeting et al. (1999), Schöniger et al. 2014 (WRR) Posterior model weights P Gull 1988 (Acta Appl. Math.)

Application: Groundwater Flow Modelling Inversion of hydraulic tomography data to infer distribution of aquifer parameters Which parameterization to choose? Too simple vs. too complex Illman et al. 2010 (WRR) Geostatistical Interpolated Zonated Homogeneous Increasing complexity Schöniger et al. 2015 (JoH)

Results of BMA Analysis BMA favours the zonation approach over the more flexible approaches Is it actually closest to the true structure, or just parsimonious (remember the tradeoff implicit in BMA)? Schöniger et al. 2015 (JoH)

How to Interpret Model Weights in Light of Data? For an infinite amount of data, there is no need for a tradeoff Enough information about the system to choose the right model BMA will identify the true complexity (as opposed to other model selection techniques ) For a finite amount of data, identification is harder Difficulty in distinguishing models When in doubt, BMA prefers simpler models (performance – parsimony tradeoff) Test whether all models in the set could be identified (justified) if they were actually the true one Burnham & Anderson 2004

Concept of a Model Confusion Matrix True model Candidate model Schöniger et al. 2015 (JoH)

Model Justifiability as a Function of Data Set Size Increasing complexity Homogeneous Zonated Increasing complexity Interpolated Geostatistical Increasing data set size Schöniger et al. 2015 (JoH)

Results of the Justifiability Analysis The most complex geostatistical model is not yet justifiable with the available data Schöniger et al. 2015 (JoH)

Really... A Zonated Model is Best?! The zonated model is actually a best-case scenario, because it benefits from (almost) perfect geological knowledge How would a geologically uninformed zonated model perform?

Results of Repeated BMA Analysis The performance depends greatly on the quality of prior geological information A combination of geological knowledge and a stochastic description of heterogeneity at smaller scales could be promising Schöniger et al. 2015 (JoH)

Take-Home Messages The proposed model justifiability analysis provides support for model evaluation and model selection in the light of limited data; It is a straightforward extension of Monte Carlo-based Bayesian model averaging (BMA) and helps to interpret BMA results; And it provides information about model (dis-)similarity. Special Issue in Journal of Hydrology: “Groundwater flow and transport in aquifers: Insights from modeling and characterization at the field scale”

Which level of model complexity is justified by your data Which level of model complexity is justified by your data? A Bayesian answer anneli.schoeniger@uni-tuebingen.de