Fast Simulators for Assessment and Propagation of Model Uncertainty*
Jim Berger, M.J. Bayarri, German Molina
June 20, 2001, SAMO 2001, Madrid
*Project of the National Institute of Statistical Sciences

Some activities requiring numerous runs of a complex computer model
– Output analysis: with random inputs, what is the distribution of the output variables? (sketched below)
– Optimization: finding the optimal settings of process control variables (e.g., signal timing).
– Design: of computer or field experiments.
– Bayesian inference: learning about unknown model parameters or inputs from field data (i.e., data from the process being modeled).
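For the first activity, here is a minimal sketch of brute-force Monte Carlo output analysis. The `run_model` function and the input distributions are invented stand-ins, not part of the CORSIM study; the point is only the repeated-run structure.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_model(demand_rate, turn_prob):
    """Stand-in for one run of an expensive computer model (toy response)."""
    return demand_rate * turn_prob + rng.normal(scale=0.1)

# Output analysis: draw inputs from their (assumed) distributions, run the
# model each time, and summarize the induced distribution of the output.
outputs = np.array([
    run_model(rng.gamma(shape=5.0, scale=2.0),   # assumed demand distribution
              rng.beta(2.0, 5.0))                # assumed turning probability
    for _ in range(1000)
])
print(f"output mean {outputs.mean():.2f}, sd {outputs.std():.2f}")
```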

The problem and solution
If runs of the computer model are too slow, the activity cannot be completed. The natural solution is to approximate the computer model; most common is approximation by a faster computer model:
– models of lower 'resolution'
– linearized versions of the model
– response surface (or Gaussian process) approximations (sketched below)
– probability networks of various types.
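As an illustration of the response-surface option, here is a minimal Gaussian-process emulator in plain NumPy. The squared-exponential kernel, the fixed hyperparameters, and the toy training design are assumptions for the sketch; a real emulator of a slow simulator would be built far more carefully.

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, var=1.0):
    """Squared-exponential covariance between input sets A (n, d) and B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_emulator(X_train, y_train, X_new, noise=1e-6):
    """Posterior mean and pointwise sd of a zero-mean GP fit to a few model runs."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_new, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = rbf_kernel(X_new, X_new) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Usage: fit to, say, 30 runs of the slow model, then predict cheaply elsewhere.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(30, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2          # placeholder for slow-model output
mu, sd = gp_emulator(X, y, rng.uniform(0, 1, size=(5, 2)))
print(mu, sd)
```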

An example: Bayesian input analysis for CORSIM
The microsimulator CORSIM is a computer model of street and highway traffic. It models individual vehicles entering the network and moving according to interaction rules. The traffic network studied is a 44-intersection neighborhood in Chicago. CORSIM was applied to model a one-hour period during rush hour.

Network (Chicago)
[Figure: map of the Chicago street network studied, with labeled streets Erie, Ohio, Ontario, Illinois, Grand, Kingsbury, Orleans, Franklin, Wells, LaSalle, Clark, Dearborn, Hubbard, and Huron, and the labels O'Hare and LOOP.]

Key unknown inputs
– Demands, λ: the means of the exponential inter-arrival time distributions that determine the (random) numbers of vehicles entering the system from the external streets. λ is 16-dimensional.
– Turning probabilities, P: the probabilities that vehicles turn right, turn left, or go through at each intersection. P is 84-dimensional.

Data: vehicle counts, C
– Demand counts: the numbers of vehicles entering the network on each street, recorded by observers placed on the external streets.
– Turning counts: made by observers over short time intervals at all intersections.
– Video counts: at central intersections, cameras were placed that produced an exact count of vehicles.

Problems with the data
– Demand counts are inaccurate, some by as much as 40%.
– Turning counts were made only over short time periods.
– Some of the turning counts are missing.
– The observer counts were incompatible with the video counts (reality), so they were 'tuned' to bring them into agreement.

Example of a tuning adjustment
[Figure: intersection diagram at Ontario, LaSalle, and Erie. The observer reported 1969 vehicles entering at one point; this was adjusted to 1790 vehicles to fit the video count observed at another point in the figure.]

Problems with tuning
– Often too few inputs are tuned, and those that are tuned are then over-tuned.
– The often considerable uncertainty in the tuned inputs is ignored, resulting in an overly optimistic assessment of output variance.
– Tuning can mask model biases that actually exist, making the model less accurate for prediction outside the range of the data (not applicable here).

A solution: Bayesian analysis
– Compute the posterior distribution of the true model inputs, given the data.
– But this typically requires Markov chain Monte Carlo (MCMC) methods, involving thousands of model runs, which is too time-consuming for CORSIM.
– Thus a fast simulator is needed: one that represents those features of CORSIM that allow the data to be related to the model inputs.

Structure of the fast simulator
It is a probability network
– with the same nodal structure as CORSIM's;
– with unknown inputs λ (vehicle inter-arrival rates) and P (turning probabilities) that mean the same as in CORSIM;
– but with 'instantaneous' vehicles that (i) enter the network, (ii) turn appropriately, and (iii) exit.
Note: fast simulators often have a limited purpose and are not general replacements for the computer model; here we ignore the key features of time, interactions, signals, etc.
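A toy sketch of such an instantaneous-vehicle probability network, on an invented three-intersection chain rather than CORSIM's 44-intersection grid: entries are Poisson, each intersection splits traffic multinomially, and through-traffic is passed on instantly.

```python
import numpy as np

rng = np.random.default_rng(2)

def fast_simulate(entry_rates, turn_probs):
    """
    Minimal 'instantaneous-vehicle' probability network (toy topology):
    vehicles enter with Poisson counts and split multinomially
    (right / left / through) at a chain of intersections, with
    through-traffic feeding the next intersection downstream.
    """
    counts, carry = [], 0
    for rate, probs in zip(entry_rates, turn_probs):
        n_in = rng.poisson(rate) + carry          # new arrivals plus upstream through-traffic
        n_r, n_l, n_t = rng.multinomial(n_in, probs)
        counts.append({'in': n_in, 'right': n_r, 'left': n_l, 'through': n_t})
        carry = n_t                               # through vehicles move on instantly
    return counts

# Example: three intersections, each with its own demand and turning probabilities.
print(fast_simulate([120.0, 40.0, 60.0],
                    [(0.2, 0.3, 0.5), (0.1, 0.1, 0.8), (0.25, 0.25, 0.5)]))
```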

Modeling the demand counts data
Each demand count, C_i^D, is modelled by a Poisson distribution with mean b_i N_i, where N_i is the true count and b_i − 1 is the unknown "observer bias." The b_i are modelled as i.i.d. Gamma(α, β), with α < 2β (so that the expected bias is less than 100%); (α, β) are otherwise unknown and assigned a uniform prior distribution.
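A sketch of the resulting likelihood and prior terms, using SciPy. The rate parameterization of the Gamma, the (α, β) values, and all the counts below are purely illustrative assumptions, not the Chicago data.

```python
import numpy as np
from scipy import stats

def demand_count_loglik(C_D, N, b):
    """log p(C^D | N, b): each observed demand count is Poisson with mean b_i * N_i."""
    return stats.poisson.logpmf(C_D, mu=b * N).sum()

def bias_logprior(b, alpha, beta):
    """log p(b | alpha, beta): the b_i are i.i.d. Gamma(alpha, beta), rate parameterization assumed."""
    return stats.gamma.logpdf(b, a=alpha, scale=1.0 / beta).sum()

# Toy example with three entry streets (numbers are invented).
C_D = np.array([1969, 845, 1203])      # observed demand counts
N   = np.array([1790, 900, 1100])      # candidate true counts
b   = np.array([1.10, 0.94, 1.09])     # candidate observer-bias factors
print(demand_count_loglik(C_D, N, b) + bias_logprior(b, alpha=20.0, beta=19.0))
```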

Modeling the turning counts data
If N_i vehicles arrive at an intersection from a given direction, the numbers turning right, turning left, and going through, (N_iR, N_iL, N_iT), are assumed to follow a multinomial distribution with probabilities (P_iR, P_iL, P_iT). The (P_iR, P_iL, P_iT) are assigned the Jeffreys prior distribution, proportional to (P_iR · P_iL · P_iT)^(−1/2). The observed turning counts, C_i^T, were assumed to be accurate.
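Because this Jeffreys prior is the Dirichlet(1/2, 1/2, 1/2) distribution and the turning counts are multinomial, the update for each approach is conjugate. A minimal sketch (the counts are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

def turning_posterior_draw(turn_counts, jeffreys=0.5):
    """
    Conjugate update for one approach to one intersection: with the Jeffreys
    Dirichlet(1/2, 1/2, 1/2) prior and multinomial counts (right, left, through),
    the posterior is Dirichlet(counts + 1/2).
    """
    return rng.dirichlet(np.asarray(turn_counts, dtype=float) + jeffreys)

# Example: observed 12 right turns, 7 left turns, 41 through movements.
print(turning_posterior_draw([12, 7, 41]))
```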

Latent variables and restrictions
– Introduce 'latent' counts N_i on all streets:
  – the total number of vehicles entering an intersection must equal the number leaving;
  – the video counts, assumed to be accurate, fix the values of certain sums of these N_i.
– Eliminate 'excess' N_i (from an initial ?? to 74), in such a way that the restrictions have a simple structure. (Poster by G. Molina.)
– Let N denote the constrained region of the N_i.
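A toy illustration of how the conservation restrictions eliminate 'excess' latent counts: each intersection contributes a linear "in = out" equation, and the rank of the constraint matrix is the number of counts removed. The five-segment topology below is invented, not the Chicago network.

```python
import numpy as np

# Latent counts N1..N5 on five toy street segments; each row enforces
# "vehicles entering an intersection = vehicles leaving it".
A = np.array([[1, 1, -1,  0,  0],    # intersection 1: N1 + N2 = N3
              [0, 0,  1, -1, -1]])   # intersection 2: N3 = N4 + N5

# Each independent restriction removes one degree of freedom, so only
# 5 - rank(A) of the latent counts are free (74 remain in the real network).
print(5 - np.linalg.matrix_rank(A), "free latent counts")
```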

The posterior distribution
By Bayes' theorem, the posterior distribution π(N, λ, P, b, α, β | C) of all unknowns, given the data C, is simply proportional to the product of the likelihood and the prior, i.e.,
f_Poisson(C^D | N^D, b) · f_multinomial(C^T | P) · π_multinomial(N | P) · π_Poisson(N^D | λ) · π_Jeffreys(P, λ) · π_Gamma(b | α, β) · 1_{α<2β} · 1_N.
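Written out as a display (a reconstruction of the slide's formula; the λ, α, β symbols and the two indicator terms are restored from the surrounding slides, and \mathcal{N} denotes the constrained region of the latent counts):

```latex
\pi(\mathbf{N}, \lambda, P, b, \alpha, \beta \mid C)
  \;\propto\;
  f_{\text{Poisson}}(C^{D} \mid N^{D}, b)\,
  f_{\text{mult}}(C^{T} \mid P)\,
  \pi_{\text{mult}}(\mathbf{N} \mid P)\,
  \pi_{\text{Poisson}}(N^{D} \mid \lambda)\,
  \pi_{\text{Jeffreys}}(P, \lambda)\,
  \pi_{\text{Gamma}}(b \mid \alpha, \beta)\,
  \mathbf{1}_{\{\alpha < 2\beta\}}\,
  \mathbf{1}_{\mathcal{N}}(\mathbf{N})
```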

Computation
The posterior has 192 unknown parameters, so computation must be done by MCMC. We utilize a Gibbs sampling scheme:
– The full conditional distributions for P, λ, b, and β are, respectively, Dirichlet, Gamma, Gamma, and restricted Gamma; these are easy to sample.
– α has a log-concave full conditional density and is sampled by rejection sampling.
– Each N_i is sampled directly from its discrete full conditional distribution (over a restricted range).
– Roughly 100,000 iterations are needed.
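A runnable sketch of the Gibbs structure for a stripped-down, single-street piece of the model: one observed demand count, fixed invented hyperparameters, and no turning counts, network restrictions, or (α, β) updates. It only illustrates the mechanics (conjugate Gamma draws plus a direct draw of N from its discrete full conditional), not the full 192-parameter sampler.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def gibbs_single_street(C_D, n_iter=2000, a0=2.0, b0=2.0, c0=1.0, d0=0.01, N_max=5000):
    """
    Gibbs sampler for a toy single-street reduction of the model:
        N ~ Poisson(lam),  C_D | N, b ~ Poisson(b * N),
        lam ~ Gamma(c0, rate d0),  b ~ Gamma(a0, rate b0)   (hyperparameters fixed).
    """
    lam, b, N = float(C_D), 1.0, int(C_D)
    grid = np.arange(1, N_max + 1)
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        lam = rng.gamma(c0 + N, 1.0 / (d0 + 1.0))        # Gamma full conditional for lam
        b = rng.gamma(a0 + C_D, 1.0 / (b0 + N))          # Gamma full conditional for b
        # Discrete full conditional of N over a restricted range:
        logp = stats.poisson.logpmf(grid, lam) + stats.poisson.logpmf(C_D, b * grid)
        p = np.exp(logp - logp.max())
        N = int(rng.choice(grid, p=p / p.sum()))
        draws[t] = (lam, b, N)
    return draws

draws = gibbs_single_street(C_D=1969)
print("posterior mean of the true count N:", draws[500:, 2].mean())
```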

Gridlock and model constraints
In CORSIM, gridlock (all vehicles stopped) can occur (in 20% of the runs in the last graph). This essentially defines an infeasibility region of the parameter space. It can be handled in CORSIM by simply ignoring runs that yield gridlock; in the Bayesian inference, this corresponds to multiplying the posterior by the indicator that the parameters lie outside the infeasibility region.
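One way to realize "multiply the posterior by the feasibility indicator" inside an MCMC step, sketched here as a generic random-walk Metropolis move. The `feasible` rule is a hypothetical stand-in for running CORSIM and checking for gridlock, and the Metropolis form is only for illustration (the talk's own sampler is the Gibbs scheme above).

```python
import numpy as np

rng = np.random.default_rng(5)

def feasible(theta):
    """Placeholder for 'run CORSIM at theta and check that no gridlock occurs'."""
    return theta.sum() < 10.0            # invented toy rule

def constrained_mh_step(theta, log_post, step=0.1):
    """Random-walk Metropolis step for a posterior multiplied by the feasibility
    indicator: infeasible (gridlock) proposals are rejected outright."""
    prop = theta + step * rng.normal(size=theta.shape)
    if not feasible(prop):
        return theta                     # the indicator is zero there
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        return prop
    return theta

# Toy usage with a standard-normal target restricted to the feasible set.
theta = np.array([1.0, 2.0])
log_post = lambda t: -0.5 * (t ** 2).sum()
for _ in range(1000):
    theta = constrained_mh_step(theta, log_post)
print(theta)
```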

Conclusions
– 'Tuning' should be replaced by Bayesian inference for the unknown parameters or inputs.
– It may be necessary to constrain the parameter space by ignoring model runs that fall in the infeasibility region.
– If evaluation of the computer model is too slow, fast simulators should be sought for which Bayesian inference is feasible.