MCMC for Stochastic Epidemic Models Philip D. O’Neill School of Mathematical Sciences University of Nottingham

This includes joint work with Tom Britton (Stockholm), Niels Becker (ANU, Canberra), Gareth Roberts (Lancaster) and Peter Marks (NHS, Derbyshire PCT).

Contents 1. MCMC: overview and basics 2. Example: Vaccine efficacy 3. Data augmentation 4. Example: SIR epidemic model 5. Model choice 6. Example: Norovirus outbreak 7. Other topics

1. Markov chain Monte Carlo (MCMC): overview and basics. The key problem is to explore a density function π that is known only up to proportionality. The output of an MCMC algorithm is a sequence of samples from the correctly normalised π. These samples can be used to estimate summaries of π, e.g. its mean and variance.

How MCMC works. The key idea is to construct a discrete-time Markov chain X_1, X_2, X_3, … on a state space S whose stationary distribution is π. If P(dx, dy) is the transition kernel of the chain, this means that π(dy) = ∫ π(dx) P(dx, dy), the integral being over the whole state space S.

How MCMC works (2). Subject to some technical conditions, the distribution of X_n → π as n → ∞. Thus to obtain samples from π we simulate the chain and sample from it after a "long time".

Example: π(x) ∝ x e^(−2x). Suppose the Markov chain output is …, X_N, X_{N+1}, X_{N+2}, …, X_{N+M} (i.e. the initial N values are discarded as burn-in); the remaining M values are treated as samples from π.

How to build the Markov chain. Surprisingly, there are many ways to construct a Markov chain with stationary distribution π. Perhaps the simplest is the Metropolis-Hastings algorithm.

Metropolis-Hastings algorithm. Set an initial value X_1. If the chain is currently at X_n = x, randomly propose a new position X_{n+1} = y according to a proposal density q(y | x). Accept the proposed jump with probability α(x, y) = min{ 1, [π(y) q(x | y)] / [π(x) q(y | x)] }. If the proposal is not accepted, set X_{n+1} = x.
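As a concrete illustration of the algorithm (a minimal sketch, not taken from the original slides), the following Python code runs a random-walk Metropolis-Hastings sampler for the earlier example target π(x) ∝ x e^(−2x); this is a Gamma(2, 2) density with mean 1, so the sample mean gives a quick sanity check.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # log pi(x) up to an additive constant; the density is zero for x <= 0
    return np.log(x) - 2.0 * x if x > 0 else -np.inf

def metropolis_hastings(n_iter=50_000, x0=1.0, step=1.0):
    x = x0
    samples = np.empty(n_iter)
    for n in range(n_iter):
        y = x + step * rng.normal()              # symmetric Gaussian proposal
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y                                # accept with prob min(1, pi(y)/pi(x))
        samples[n] = x
    return samples

chain = metropolis_hastings()
print(chain[10_000:].mean())   # after discarding burn-in, should be close to 1.0
```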

Why the M-H algorithm works Let P(dx,dy) denote the transition kernel of the chain. Then P(dx,dy) is approximately the probability that the chain jumps from a region dx to a region dy. We can calculate P(dx,dy) as follows:

Why M-H works (2). For y ≠ x the chain moves from dx to dy only if the jump is both proposed and accepted, so P(dx, dy) = q(y | x) α(x, y) dy. Hence π(dx) P(dx, dy) = π(x) q(y | x) min{ 1, [π(y) q(x | y)] / [π(x) q(y | x)] } dx dy = min{ π(x) q(y | x), π(y) q(x | y) } dx dy, which is symmetric in x and y. In other words the chain satisfies detailed balance: π(dx) P(dx, dy) = π(dy) P(dy, dx).

Why M-H works (3). Integrating this detailed balance equation over x shows that π is a stationary distribution for the Markov chain.

Comments on M-H algorithm (1). The choice of proposal q(y | x) is fairly arbitrary. Popular choices include: q(y | x) = q(y) (independence sampler); y | x ~ N(x, σ²) (Gaussian random-walk proposal); q(y | x) = q(|y − x|) (symmetric proposal).

Comments on M-H (2). In practice, MCMC is almost always used for multi-dimensional problems. Given a target density π(x_1, x_2, …, x_n) it is possible to update each component separately, or even in blocks, using different M-H schemes.

Comments on M-H (3). A popular multi-dimensional scheme is the Gibbs sampler, in which the proposal for a component x_i is its full conditional density π(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n). The M-H acceptance probability is equal to one in this case.
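To make the Gibbs sampler concrete, here is a minimal sketch using an illustrative example that is not on the slides: for a standard bivariate normal with correlation ρ, both full conditionals are known in closed form, so each component is drawn from its full conditional in turn and every "proposal" is accepted.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8
n_iter = 20_000
x1, x2 = 0.0, 0.0
samples = np.empty((n_iter, 2))

for n in range(n_iter):
    # full conditional: x1 | x2 ~ N(rho * x2, 1 - rho^2)
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    # full conditional: x2 | x1 ~ N(rho * x1, 1 - rho^2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples[n] = (x1, x2)

print(np.corrcoef(samples[5_000:].T))   # sample correlation should be close to rho
```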

General comments on MCMC. How to check convergence? There is no guaranteed way; in practice one relies on visual inspection of trace plots and diagnostic tools (e.g. looking at autocorrelation). Starting values: try a range. Acceptance rates: should be neither too large nor too small. Mixing: how fast does the chain move around the state space?
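A minimal sketch of the informal checks just mentioned, applied to a generic one-dimensional chain (a stand-in AR(1) series is used here in place of real MCMC output; any 1-D array of chain values could be substituted):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

def autocorrelation(x, max_lag=50):
    x = x - x.mean()
    return np.array([1.0] + [np.corrcoef(x[:-k], x[k:])[0, 1]
                             for k in range(1, max_lag)])

# stand-in for MCMC output: an autocorrelated AR(1) series
chain = np.zeros(20_000)
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1.plot(chain)                                # trace plot: look for drift or poor mixing
ax1.set_title("Trace plot")
ax2.bar(range(50), autocorrelation(chain))     # slow decay indicates slow mixing
ax2.set_title("Autocorrelation")
plt.show()
```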

Contents 1. MCMC: overview and basics 2. Example: Vaccine efficacy 3. Data augmentation 4. Example: SIR epidemic model 5. Model choice 6. Example: Norovirus outbreak 7. Other topics

2. Example: Vaccine Efficacy. Outbreak of Variola Minor, Brazil, 1956. Data on cases in households (sizes 1 to 12): 338 households, of which 126 had no cases; 1542 individuals, of whom 809 were vaccinated (85 cases) and 733 unvaccinated (425 cases). Objective: estimate vaccine efficacy.

Disease transmission model. The population is divided into separate households. Transmission is divided into community-acquired and within-household. q = P(an individual avoids outside infection); φ = P(one individual fails to infect another in the same household).

(Diagram: the disease transmission model, with outside infection avoided with probability q and within-household infection avoided with probability φ.)

Vaccine response model For a vaccinated individual, three responses can occur: complete protection; vaccine failure; or partial protection and infectivity reduction. c = P(complete protection) f = P(vaccine failure) a = proportionate susceptibility reduction b = proportionate infectivity reduction

Vaccine response: (A, B). A convenient way of summarising the random response is to suppose that an individual's susceptibility and infectivity reductions are given by a bivariate random variable (A, B). Thus P[(A, B) = (0, ·)] = c, P[(A, B) = (1, 1)] = f, and P[(A, B) = (a, b)] = 1 − c − f.

Efficacy measures. Furthermore, it is sensible to define measures of vaccine efficacy using (A, B). VE_S = 1 − E[A] = 1 − f − a(1 − f − c) is a protective measure. VE_I = 1 − E[AB] / E[A] = 1 − [f + ab(1 − f − c)] / [f + a(1 − f − c)] is a measure of infectivity reduction. Note that both are functions of the basic model parameters.

Bayesian inference. The object of inference is the posterior density π(θ | n) = π(a, b, c, f, q, φ | n), where n denotes the data set. By Bayes' Theorem, π(θ | n) ∝ π(n | θ) π(θ), where π(n | θ) is the likelihood and π(θ) is the prior density for θ.

MCMC details. There are six parameters: a, b, c, f, q, φ. Each parameter has range [0, 1]. Update each parameter separately using a Metropolis-Hastings step with a Gaussian proposal centered on the current value.

MCMC pseudocode:
Initialise parameters (e.g. a = 0.5, b = 0.5, …)
User inputs burn-in (B), sample size (S), thinning gap (T)
LOOP: counter from −B to (S × T)
  Update a, update b, …, update φ
  IF (counter > 0) AND (counter / T is an integer) THEN store current values
END LOOP

Updating details for a. Propose ã ~ N(a, σ²). Accept ã with probability min{ 1, [π(n | ã, b, c, f, q, φ) π(ã)] / [π(n | a, b, c, f, q, φ) π(a)] }, where π(a) denotes the prior density; note that the (symmetric) proposal density cancels out. The other parameters are updated similarly.
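A minimal Python sketch of this update and the surrounding loop. It assumes, purely for illustration, Uniform(0,1) priors (so the prior ratio is constant inside (0,1) and proposals outside the range are rejected); the household-outbreak likelihood is not given on the slides, so it is passed in as a user-supplied function `log_likelihood`.

```python
import numpy as np

rng = np.random.default_rng(4)
PARAM_NAMES = ("a", "b", "c", "f", "q", "phi")

def update_parameter(params, name, log_likelihood, data, step=0.05):
    proposal = dict(params)
    proposal[name] = params[name] + step * rng.normal()   # Gaussian proposal about current value
    if not 0.0 < proposal[name] < 1.0:
        return params                                     # zero prior density: reject
    log_alpha = log_likelihood(proposal, data) - log_likelihood(params, data)
    return proposal if np.log(rng.uniform()) < log_alpha else params

def run_chain(log_likelihood, data, burn_in=1000, n_samples=5000, thin=10):
    params = {name: 0.5 for name in PARAM_NAMES}          # initial values
    stored = []
    for counter in range(-burn_in, n_samples * thin + 1):
        for name in PARAM_NAMES:
            params = update_parameter(params, name, log_likelihood, data)
        if counter > 0 and counter % thin == 0:           # store thinned post-burn-in values
            stored.append(dict(params))
    return stored
```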

Trace plot for a

Density estimate for a

Scatterplot of a versus c

Results for VE_S. Posterior mean: VE_S = 1 − E(A) = 0.84. Posterior standard deviation: 0.03. These results are easily obtained using the raw Markov chain output for the model parameters.
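For instance, with `stored` holding the list of sampled parameter dictionaries produced by the sketch above, the posterior mean and standard deviation of VE_S follow by applying its formula to each sample (a minimal sketch, not the authors' code):

```python
import numpy as np

def ve_s_summary(stored):
    # VE_S = 1 - f - a(1 - f - c), evaluated at each posterior sample
    ve_s = np.array([1 - s["f"] - s["a"] * (1 - s["f"] - s["c"]) for s in stored])
    return ve_s.mean(), ve_s.std()
```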

Contents 1. MCMC: overview and basics 2. Example: Vaccine efficacy 3. Data augmentation 4. Example: SIR epidemic model 5. Model choice 6. Example: Norovirus outbreak 7. Other topics

3. Data augmentation. Suppose we have a model with unknown parameter vector θ = (θ_1, θ_2, …, θ_n). The available data are y = (y_1, y_2, …, y_m). If the likelihood π(y | θ) is intractable, one solution is to introduce extra parameters ("missing data") x = (x_1, x_2, …, x_p) such that π(y, x | θ) is tractable.

Data augmentation (2). The extra parameters x = (x_1, x_2, …, x_p) are simply treated as additional unknown model parameters, as before. To obtain samples from the posterior π(θ | y), take samples from the augmented posterior π(θ, x | y) and ignore the x components. Such a scheme is often easy to implement using MCMC.

Data augmentation (3) Can also add parameters to improve the mixing of the Markov chain (auxiliary variables). Choosing how to augment data is not always obvious!

Contents 1. MCMC: overview and basics 2. Example: Vaccine efficacy 3. Data augmentation 4. Example: SIR epidemic model 5. Model choice 6. Example: Norovirus outbreak 7. Other topics

4. SIR Epidemic Model. Suppose we observe daily numbers of cases during an epidemic outbreak in some fixed population. The objective is to learn about the infection rate and the infectious period duration of the disease.

Epidemic curve (SARS in Canada)

Model definition. Population of N individuals. At time t there are S_t susceptibles, I_t infectives and R_t recovered/immune individuals, so S_t + I_t + R_t = N for all t. Initially (S_0, I_0, R_0) = (N − 1, 1, 0).

Model definition (2). Each infectious individual remains so for a length of time T_I ~ Exponential(γ). During this time, infectious contacts occur with each susceptible according to a Poisson process of rate β/N. Thus the overall infection rate is β S_t I_t / N. There are two model parameters, β and γ.
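A minimal sketch (not part of the original analysis) showing how this model generates data: a Gillespie-style simulation of one outbreak, returning the removal times that form the observed data on the next slide.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_sir(N=200, beta=1.5, gamma=1.0):
    S, I, t = N - 1, 1, 0.0
    removal_times = []
    while I > 0:
        infection_rate = beta * S * I / N        # overall infection rate
        removal_rate = gamma * I                 # overall removal rate
        total_rate = infection_rate + removal_rate
        t += rng.exponential(1.0 / total_rate)   # time to the next event
        if rng.uniform() < infection_rate / total_rate:
            S, I = S - 1, I + 1                  # infection event
        else:
            I -= 1                               # removal event
            removal_times.append(t)
    return np.array(removal_times)

removals = simulate_sir()
print(len(removals))   # final size of the outbreak
```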

Data, likelihood, augmentation. Suppose we observe removals at times 0 ≤ r_1 ≤ r_2 ≤ … ≤ r_n ≤ T. Define r = (r_1, r_2, …, r_n). The likelihood of the data, π(r | β, γ), is practically intractable. However, given the (unknown) infection times i = (i_1, i_2, …, i_n), the augmented likelihood π(i, r | β, γ) is tractable.

MCMC algorithm. Specifically, the augmented likelihood factorises into a term in β of the form β^(n−1) exp(−β A), where A = (1/N) ∫ S_t I_t dt, multiplied by a term in γ of the form γ^n exp(−γ B), where B = Σ_j (r_j − i_j). It follows that if the prior π(β) is a Gamma distribution then the full conditional π(β | …) is also a Gamma distribution. The same is true for γ. So β and γ can each be updated using a Gibbs step.
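A minimal sketch of these Gibbs updates, assuming Gamma(ν_β, λ_β) and Gamma(ν_γ, λ_γ) priors, the factorisation above, and the usual convention that the initial infective's infection is conditioned on (so the infection process contributes n − 1 events); the hyperparameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(6)

def integral_SI(i, r, N):
    """(1/N) * integral of S_t * I_t dt, computed from infection times i and removal times r."""
    events = sorted([(t, "inf") for t in i] + [(t, "rem") for t in r])
    S, I = N, 0
    total, last_t = 0.0, events[0][0]
    for t, kind in events:
        total += S * I * (t - last_t)            # S_t * I_t is constant between events
        if kind == "inf":
            S, I = S - 1, I + 1
        else:
            I -= 1
        last_t = t
    return total / N

def gibbs_update_rates(i, r, N, nu_b=1.0, lam_b=1e-3, nu_g=1.0, lam_g=1e-3):
    n = len(r)
    beta = rng.gamma(nu_b + n - 1, 1.0 / (lam_b + integral_SI(i, r, N)))
    gamma = rng.gamma(nu_g + n, 1.0 / (lam_g + np.sum(np.asarray(r) - np.asarray(i))))
    return beta, gamma
```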

MCMC algorithm – infection times. It remains to update the infection times i = (i_1, i_2, …, i_n). There are various ways of doing this; a simple one is to use an M-H scheme to randomly move the times. For example, propose a new value for i_k by picking a time uniformly at random in (0, T).

(Diagram: updating infection times. One infection time, i_2, is proposed to move to a new value i_2*, with the other infection times (e.g. i_4, i_6) held fixed; the move is accepted with probability min{1, π(i*, r | β, γ) / π(i, r | β, γ)}.)
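A minimal sketch of one such update, where `log_aug_likelihood` stands for log π(i, r | β, γ) and is assumed to be available from the augmented-likelihood expression above (returning −∞ for inadmissible configurations, e.g. an infection time after its removal):

```python
import numpy as np

rng = np.random.default_rng(7)

def update_infection_time(i, r, beta, gamma, T, log_aug_likelihood):
    k = rng.integers(len(i))                    # pick one infection time at random
    i_prop = list(i)
    i_prop[k] = rng.uniform(0.0, T)             # propose a new time uniformly on (0, T)
    log_alpha = (log_aug_likelihood(i_prop, r, beta, gamma)
                 - log_aug_likelihood(i, r, beta, gamma))
    return i_prop if np.log(rng.uniform()) < log_alpha else list(i)
```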

Extensions: epidemic not known to be finished by time T; non-exponential infectious periods; multi-group models (e.g. age-stratified data); more sophisticated updates of the infection times; inclusion of latent periods.

Contents 1. MCMC: overview and basics 2. Example: Vaccine efficacy 3. Data augmentation 4. Example: SIR epidemic model 5. Model choice 6. Example: Norovirus outbreak 7. Other topics

5. Model Choice. Bayesian model choice problems can also be handled using MCMC, via so-called "transdimensional MCMC" (also known as reversible jump MCMC). The basic idea is to construct the Markov chain on the union of the different models' sample spaces and (essentially) use M-H.

Simple example. Model 1 has two parameters, θ_1 and θ_2; Model 2 has a single parameter, ψ. The Markov chain moves both between models and within models; e.g. X_n = (1, θ_1, θ_2, ψ) denotes being in Model 1, with ψ ignored. Practical question: how do we jump between models?

Contents 1. MCMC: overview and basics 2. Example: Vaccine efficacy 3. Data augmentation 4. Example: SIR epidemic model 5. Model choice 6. Example: Norovirus outbreak 7. Other topics

6. Example: Norovirus outbreak. An outbreak of gastroenteritis in summer 2001 at a school in Derbyshire, England. A single strain of Norovirus was found to be the causative agent. The outbreak was believed to be driven by person-to-person spread.

Outbreak data. 15 classrooms, with each child based in one. Absence records plus questionnaires. 492 children, of whom 186 were cases. Data include age, period of illness, and times of vomiting episodes in classrooms.

Question of interest Does vomiting play a significant role in transmission? Total of 15 vomiting episodes in classrooms.

Epidemic curve in Classroom 10

Stochastic transmission model. Assumption: a susceptible on weekday t remains susceptible on day t+1 if they avoid infection from every infective child. The per-infective daily avoidance probabilities are: classmate, qc; schoolmate, qs; vomiter in the same class, qv.

Transmission model (2). At weekends, a susceptible remains susceptible by avoiding infection from every infective in the community; the per-infective avoidance probability is q.
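A minimal sketch of the escape probabilities implied by this structure (the exact book-keeping, e.g. whether a vomiting classmate contributes qv in place of qc, is an assumption here rather than something stated on the slides):

```python
def prob_escape_weekday(n_classmates, n_schoolmates, n_vomiters_in_class, qc, qs, qv):
    # each infective is avoided independently, so the avoidance terms multiply
    return qc ** n_classmates * qs ** n_schoolmates * qv ** n_vomiters_in_class

def prob_escape_weekend(n_infectives_in_community, q):
    return q ** n_infectives_in_community
```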

Two models. M1 (full model): parameters qc, qv, qs, q, with vomiters treated separately. M2 (sub-model): parameters qc, qs, q with qv = qc, i.e. vomiters classed as normal infectives.

MCMC algorithm. Construct a Markov chain on the state space S = { (qc, qs, qv, q, M) }, where M = 1 or 2 indicates the current model. A model-switching step updates M; random-walk M-H steps update the q's.

Between-model jumps. Full model → sub-model: propose qv = qc. Sub-model → full model: propose qv = qc + N(0, σ²). The acceptance probabilities are straightforward to compute.
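A minimal sketch of these between-model moves (an assumed implementation, not the authors' code): `log_post_full` and `log_post_sub` stand for the unnormalised log posteriors of M1 and M2, including the prior model probabilities and returning −∞ outside (0,1). Because the extra coordinate is generated as qv = qc + u with u ~ N(0, σ²), the Jacobian of the dimension-matching map is 1 and only the Gaussian density of u enters the acceptance ratio.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
SIGMA = 0.1   # proposal standard deviation (illustrative value)

def jump_full_to_sub(qc, qs, qv, q, log_post_full, log_post_sub):
    # discard qv; the increment u = qv - qc would be re-drawn on the reverse move
    log_alpha = (log_post_sub(qc, qs, q)
                 + norm.logpdf(qv - qc, loc=0.0, scale=SIGMA)   # reverse-move proposal density
                 - log_post_full(qc, qs, qv, q))
    if np.log(rng.uniform()) < log_alpha:
        return ("M2", qc, qs, q)
    return ("M1", qc, qs, qv, q)

def jump_sub_to_full(qc, qs, q, log_post_full, log_post_sub):
    qv = qc + SIGMA * rng.normal()                              # propose qv = qc + N(0, sigma^2)
    log_alpha = (log_post_full(qc, qs, qv, q)
                 - norm.logpdf(qv - qc, loc=0.0, scale=SIGMA)   # forward proposal density
                 - log_post_sub(qc, qs, q))
    if np.log(rng.uniform()) < log_alpha:
        return ("M1", qc, qs, qv, q)
    return ("M2", qc, qs, q)
```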

Results. Uniform(0,1) priors for the q parameters; prior probability P(M1) = 0.5. The slide reports the posterior model probabilities P(M1 | data) (full model) and P(M2 | data) (sub-model), together with the posterior means and standard deviations of qc, qs, qv and q.

Contents 1. MCMC: overview and basics 2. Example: Vaccine efficacy 3. Data augmentation 4. Example: SIR epidemic model 5. Model choice 6. Example: Norovirus outbreak 7. Other topics

1. Improving algorithm performance 2. Perfect simulation 3. Some conclusions

Improving algorithm performance. Choose the parameterisation to reduce correlation; there is a trade-off between ease of computation and the mixing behaviour of the chain. Other choices: the M-H proposal distributions and the blocking scheme.

Perfect simulation Detecting convergence can be a real problem in practice. Perfect simulation is a method for constructing a chain that is known to have converged by a certain time. Unfortunately it is far less applicable than MCMC.

Some conclusions. MCMC methods are hugely powerful and enable the analysis of very complicated models. Sample-based methods easily permit exploration of both parameters and functions of parameters. Implementation is often relatively easy, and software is available (e.g. BUGS).