1 MCMC and SMC for Nonlinear Time Series Models Chiranjit Mukherjee STA395 Talk Department of Statistical Science, Duke University February 16, 2009.

2 Outline
1. Problem Statement
2. Markov Chain Monte Carlo: Dynamic Linear Models, Forward Filtering Backward Sampling, Nonlinear Models, Mixtures of Gaussians, Approximate FFBS as a proposal for Metropolis-Hastings
3. Sequential Monte Carlo: Importance Sampling, Sequential Importance Sampling, Optimal Proposal, Resampling, Auxiliary Particle Filters, Parameter Degeneracy, Marginal Likelihood Calculation, Issues with Resampling, Scalability of SMC Techniques
4. Minimal Quorum Sensing Model: Background, Differential Equations Model, Discretized Version, Features
5. Results
6. Summary
7. References

3 Problem Statement
 We will focus on Markovian, nonlinear, non-Gaussian State Space Models:
Priors: x_0 ~ p(x_0 | θ),  θ ~ p(θ)
System Evolution: x_t ~ f(x_t | x_{t-1}, θ)
Observation: y_t ~ g(y_t | x_t, θ)
 Given the data y_1, y_2, …, y_T, the objective is to find the posterior distribution p(x_{0:T}, θ | y_{1:T}).

4 MCMC Techniques for State Space Models
 All primary MCMC algorithm designs alternate between the following two Gibbs steps:
1. Sample θ ~ p(θ | x_{0:T}, y_{1:T})
2. Sample x_{0:T} ~ p(x_{0:T} | θ, y_{1:T})
 Usually θ can be sampled with a Gibbs step or a Metropolis-Hastings-within-Gibbs step.
 Sampling the latent states x_{0:T} from their joint conditional posterior is the primary challenge in this task.

5 Dynamic Linear Models [West and Harrison, 1997]
x_t = G_t x_{t-1} + w_t,  w_t ~ N(0, W_t)
y_t = F_t x_t + v_t,  v_t ~ N(0, V_t)
where G_t, F_t, W_t, V_t are all known. One can sample from the joint distribution of x_{0:T} given y_{1:T} using the Forward Filtering Backward Sampling algorithm. [Carter & Kohn, 1994]

6 Forward Filtering
Note that p(x_t | y_{1:t}) ∝ g(y_t | x_t) ∫ f(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}.
If the prior, evolution and observation densities are all Gaussian, then p(x_t | y_{1:t}) is also Gaussian.
Filtering:
1. Start with p(x_0) = N(m_0, C_0)
2. For t = 1, …, T update p(x_t | y_{1:t}) = N(m_t, C_t)

7 Backward Sampling
Note that p(x_{0:T} | y_{1:T}) = p(x_T | y_{1:T}) ∏_{t=0}^{T-1} p(x_t | x_{t+1}, y_{1:t}).
Since p(x_t | y_{1:t}) and f(x_{t+1} | x_t) are Gaussian, p(x_t | x_{t+1}, y_{1:t}) is also Gaussian.
 Sample x_T ~ p(x_T | y_{1:T})
 For t = T-1, …, 0 sample x_t ~ p(x_t | x_{t+1}, y_{1:t})
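As a sketch, the two recursions above can be written out for a scalar DLM. The model (x_t = φ·x_{t-1} + N(0, W), y_t = x_t + N(0, V)), the seed, and all function and variable names below are illustrative assumptions, not notation from the talk:

```python
import numpy as np

def ffbs(y, phi, W, V, m0=0.0, C0=1.0, rng=None):
    """Forward Filtering Backward Sampling for the scalar DLM
    x_t = phi*x_{t-1} + N(0, W),  y_t = x_t + N(0, V)."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    m, C = np.empty(T), np.empty(T)
    # Forward filtering: p(x_t | y_1:t) = N(m_t, C_t)
    for t in range(T):
        a = phi * (m[t - 1] if t > 0 else m0)          # prior mean
        R = phi**2 * (C[t - 1] if t > 0 else C0) + W   # prior variance
        K = R / (R + V)                                # Kalman gain
        m[t] = a + K * (y[t] - a)
        C[t] = (1.0 - K) * R
    # Backward sampling: draw x_T, then x_t | x_{t+1}, y_1:t for t = T-1,...,0
    x = np.empty(T)
    x[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(T - 2, -1, -1):
        B = C[t] * phi / (phi**2 * C[t] + W)
        h = m[t] + B * (x[t + 1] - phi * m[t])         # conditional mean
        H = C[t] - B * phi * C[t]                      # conditional variance
        x[t] = rng.normal(h, np.sqrt(H))
    return x, m, C
```

The backward pass uses only quantities already stored by the forward pass, which is what makes FFBS a single O(T) sweep in each direction.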

8 Nonlinear Dynamic Models
x_t = f_t(x_{t-1}) + w_t,  w_t ~ N(0, W_t)
y_t = g_t(x_t) + v_t,  v_t ~ N(0, V_t)
where f_t, g_t are known nonlinear functions and W_t, V_t are all known. An approximate FFBS is based on a first-order Taylor series expansion of these functions around suitable estimates of the states, e.g. f_t(x) ≈ f_t(x̂) + f_t'(x̂)(x − x̂).

9 Mixture Normal Approximation
Filtering: when the prior, evolution and observation densities are each Normal or mixtures of Normals, then p(x_t | y_{1:t}) is also a mixture of Normals.
Smoothing: p(x_t | y_{1:t}) being a mixture Normal and f(x_{t+1} | x_t) being Normal implies p(x_t | x_{t+1}, y_{1:t}) is a mixture Normal.

10 Metropolis Step
In order to sample from p(x_{0:T} | θ, y_{1:T}) for a general State Space model, we can:
 Use the approximate FFBS procedure to propose a sample and accept it with a Metropolis-Hastings step. Call this proposal q(x_{0:T} | θ, y_{1:T}).
 One can explicitly write an expression for the joint proposal density, as it is a product of Normal densities or mixtures of Normal densities.
 One accepts the proposed sample x*_{0:T} with probability
min{1, [p(x*_{0:T} | θ, y_{1:T}) q(x_{0:T} | θ, y_{1:T})] / [p(x_{0:T} | θ, y_{1:T}) q(x*_{0:T} | θ, y_{1:T})]}.
 The main problem is that as T increases the approximation deteriorates and the Metropolis acceptance rate falls quickly.
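The acceptance probability above is best evaluated in log space to avoid underflow when T is large. A minimal sketch (function and argument names are illustrative, not from the talk):

```python
import numpy as np

def mh_accept(log_post_prop, log_post_cur, log_q_prop, log_q_cur, rng):
    """Metropolis-Hastings accept/reject for a proposal q:
    alpha = min{1, [p(x*) q(x)] / [p(x) q(x*)]}, computed in log space."""
    log_alpha = (log_post_prop - log_post_cur) + (log_q_cur - log_q_prop)
    # Accept iff u < alpha for u ~ U(0, 1), i.e. log(u) < min(0, log_alpha)
    return np.log(rng.uniform()) < min(0.0, log_alpha)
```

All four inputs are log densities, so the products in the acceptance ratio become sums and the ratio never over- or underflows.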

11 Sequential Monte Carlo

12 Importance Sampling
 Objective: we want to sample from π(x), which is difficult. We use an approximate distribution q(x) which is easy to sample from.
 For any distribution q(·) such that π(x) > 0 implies q(x) > 0, we have
E_π[φ(X)] ≈ Σ_i W^(i) φ(x^(i)),  x^(i) ~ q(·),
where W^(i) ∝ π(x^(i)) / q(x^(i)) and Σ_i W^(i) = 1.
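A minimal self-normalized importance sampling sketch. The target (an unnormalized N(2, 1)), the N(0, 3²) proposal, the sample size and the seed are all illustrative choices, not from the talk; self-normalization means the target only needs to be known up to its normalizing constant:

```python
import numpy as np

def snis(target_logpdf, x, proposal_logpdf):
    """Self-normalized importance sampling weights W^(i) ∝ pi(x_i)/q(x_i),
    with the target known only up to a normalizing constant."""
    logw = target_logpdf(x) - proposal_logpdf(x)
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    return w / w.sum()

rng = np.random.default_rng(42)
M = 200_000
x = rng.normal(0.0, 3.0, M)                 # proposal q = N(0, 3^2)
W = snis(lambda x: -0.5 * (x - 2.0)**2,     # unnormalized N(2, 1) target
         x,
         lambda x: -0.5 * x**2 / 9.0)       # log q, up to a constant
est_mean = np.sum(W * x)                    # estimates E_pi[X] = 2
```

Both log densities drop their normalizing constants; the constants cancel in the normalized weights.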

13 A Sequential Importance Sampling Approach
Let π_n(x_{0:n}) denote the sequence of target distributions, each known up to a normalizing constant.
Key Idea: if π_{n-1} is not too different from π_n, then we should be able to reuse our estimate of π_{n-1} as an importance distribution for π_n.
ALGORITHM
 Start by sampling from the prior: x_0^(i) ~ q_0(·), i = 1, …, M.
 Suppose at time (n-1) we have the particle approximation {x_{0:n-1}^(i), W_{n-1}^(i)}.
 Update: extend each particle with x_n^(i) ~ q_n(x_n | x_{0:n-1}^(i)) and update the weights.

14 Updating the IS Approximation
 We want to reuse the samples from q_{n-1}(x_{0:n-1}) used to build the approximation of π_{n-1}.
 This only makes sense if the proposal factorizes over time.
 We select: q_n(x_{0:n}) = q_{n-1}(x_{0:n-1}) q_n(x_n | x_{0:n-1}).
 Unnormalized particle weights are then updated in the following way:
w_n(x_{0:n}) = w_{n-1}(x_{0:n-1}) · π_n(x_{0:n}) / [π_{n-1}(x_{0:n-1}) q_n(x_n | x_{0:n-1})].

15 A Simple SIS for State Space Models
For a State Space model, let π_n(x_{0:n}) = p(x_{0:n} | y_{1:n}) ∝ p(x_0) ∏_t f(x_t | x_{t-1}) ∏_t g(y_t | x_t).
If we use the prior transition as proposal, q_n(x_n | x_{0:n-1}) = f(x_n | x_{n-1}),
then the weight update reduces to w_n(x_{0:n}) = w_{n-1}(x_{0:n-1}) · g(y_n | x_n).
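This simple SIS scheme can be sketched on a toy scalar state space model. The model, its parameter values, the seed, and the variable names below are illustrative assumptions for demonstration, not the talk's quorum sensing model; tracking the effective sample size 1/Σ_i (W^(i))² shows how the weights degenerate over time when no resampling is performed:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 5000, 50
phi, W_var, V = 0.9, 0.5, 1.0

# Simulate data from x_t = phi*x_{t-1} + N(0, W_var), y_t = x_t + N(0, V)
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + rng.normal(0, np.sqrt(W_var))
y = x_true + rng.normal(0, np.sqrt(V), T)

# SIS with the prior transition f as proposal: weights multiply by g(y_n | x_n)
particles = rng.normal(0.0, 1.0, M)            # draws from the prior p(x_0)
logw = np.zeros(M)
ess = []
for t in range(T):
    if t > 0:
        particles = phi * particles + rng.normal(0, np.sqrt(W_var), M)
    logw += -0.5 * (y[t] - particles)**2 / V   # log g(y_t | x_t), up to a constant
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ess.append(1.0 / np.sum(w**2))             # effective sample size
```

By the final step the effective sample size is a small fraction of M, which is exactly the collapse the next slide describes.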

16 Optimal Importance Distribution
 The algorithm described above collapses as n increases, because after a few steps only a few particles have non-negligible weights.
 An optimal zero-variance proposal at time n is simply given by the target π_n(x_{0:n}) itself.
 To perform SIS in this optimal setting we would need to sample from π_n directly, which is not readily available in general.
 Instead one deploys a Locally Optimal Importance Distribution, which consists of selecting the q_n(x_n | x_{0:n-1}) at time n that minimizes the variance of the importance weights.

17 Locally Optimal Importance Distribution
It follows that q_n^opt(x_n | x_{0:n-1}) = π_n(x_n | x_{0:n-1}) and the weight update becomes
w_n(x_{0:n}) = w_{n-1}(x_{0:n-1}) · π_n(x_{0:n-1}) / π_{n-1}(x_{0:n-1}).
In the case of State Space Models:
q_n^opt(x_n | x_{0:n-1}) = p(x_n | x_{n-1}, y_n) ∝ f(x_n | x_{n-1}) g(y_n | x_n)
and w_n(x_{0:n}) = w_{n-1}(x_{0:n-1}) · p(y_n | x_{n-1}).

18 Resampling
 Even with the locally optimal proposal, as the time index n increases the variance of the unnormalized weights {w_n(x_{0:n})} tends to increase, and all the mass concentrates on a few particles/samples.
 We wish to focus our computational effort on high-density regions of the space.
 IS approximation: π̂_n(x_{0:n}) = Σ_i W_n^(i) δ(x_{0:n} − x_{0:n}^(i)).
 Resample M times with weights {W_n^(i)} to build the new, equally weighted approximation.
 The samples then become statistically dependent, which makes theoretical analysis harder. However, resampling is a necessary step to avoid particle degeneracy.
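The resample step above can be sketched as a multinomial draw with replacement (an illustrative helper built on numpy's `choice`; names are not from the talk):

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Draw M particle indices with replacement, probability proportional
    to weight, and return an equally weighted particle set."""
    M = len(particles)
    idx = rng.choice(M, size=M, p=weights)      # weights must sum to 1
    return particles[idx], np.full(M, 1.0 / M)
```

After resampling, high-weight particles appear many times and low-weight particles tend to disappear, which is why the resulting samples are statistically dependent.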

19 Resampling
With the locally optimal proposal, a standard SIS algorithm would be:
 Sample: x_n^(i) ~ p(x_n | x_{n-1}^(i), y_n)
 Compute weights: w_n^(i) ∝ w_{n-1}^(i) p(y_n | x_{n-1}^(i))
 Resample to obtain equally weighted samples
An alternative strategy is:
 Calculate weights: w_n^(i) ∝ w_{n-1}^(i) p(y_n | x_{n-1}^(i))
 Resample to obtain equally weighted samples
 Sample: x_n^(i) ~ p(x_n | x_{n-1}^(i), y_n)
 This algorithm ensures more diverse particles at time n.
 The change of order can be performed because the incremental weight p(y_n | x_{n-1}) is independent of x_n.

20 Auxiliary Particle Filter
 For a general State Space Model it is not always possible to explicitly sample from p(x_n | x_{n-1}, y_n) or to calculate the weights p(y_n | x_{n-1}).
 We can use an approximation to p(y_n | x_{n-1}), say p̂(y_n | x_{n-1}).
 In the literature it is often suggested to take p̂(y_n | x_{n-1}) = g(y_n | μ(x_{n-1})), where μ(x_{n-1}) is the mean, median or mode of the transition distribution f(x_n | x_{n-1}).
ALGORITHM
 Compute first-stage weights: λ_n^(i) ∝ w_{n-1}^(i) p̂(y_n | x_{n-1}^(i))
 Resample with these weights to obtain {x̃_{n-1}^(i)}
 Sample: x_n^(i) ~ q(x_n | x̃_{n-1}^(i), y_n)
 Calculate second-stage weights: w_n^(i) ∝ [f(x_n^(i) | x̃_{n-1}^(i)) g(y_n | x_n^(i))] / [p̂(y_n | x̃_{n-1}^(i)) q(x_n^(i) | x̃_{n-1}^(i), y_n)]

21 Degeneracy Issues
 The SMC strategy performs remarkably well in terms of estimating the marginals p(x_n | y_{1:n}).
 However, the joint distribution p(x_{0:n} | y_{1:n}) is poorly estimated when n is large.
 One cannot hope to estimate a target distribution of increasing dimension with fixed precision when the number of particles remains fixed.
 Since we are interested in the marginal, SMC serves our purpose well.
 For bounded functions φ and p > 1, if the model has nice forgetting/mixing properties, one can expect L^p error bounds on estimates over the last L time steps that are uniform in the time index n, with a constant M_L that is increasing in L.

22 Degeneracy in the Parameter Space
 All the algorithms we have described so far try to minimize degeneracy in the state space. Resampling is performed in order to maintain diverse particles for x_n.
 However, we sampled particles for θ ~ π(θ) right at the beginning, and the resampling step reduces the number of distinct θ particles as time n increases.
 [Liu & West, 2001] suggest using a smooth kernel density estimate of p(θ | y_{1:n}) and sampling θ particles from the smoothed density to break degeneracy.
 Let {θ_n^(i), w_n^(i)} denote samples from the time-n posterior (not that θ itself is time dependent). [Liu & West, 2001] suggest
p(θ | y_{1:n}) ≈ Σ_i w_n^(i) N(θ | m_n^(i), h² V_n),  with m_n^(i) = a θ_n^(i) + (1 − a) θ̄_n,
where a = √(1 − h²), and θ̄_n and V_n are the sample mean and variance of {θ_n^(i)}.

23 Liu & West, 2001
 The authors suggest shrinkage, m_n^(i) = a θ_n^(i) + (1 − a) θ̄_n with a = √(1 − h²), in order to avoid over-dispersion of the smooth kernel density.
 The choice of h comes from the choice of a discount factor δ, usually δ ∈ (0.95, 0.99).
 They also recommend using Auxiliary Particle Filtering to improve performance.
ALGORITHM
1. For each particle i calculate the point estimates μ_n^(i) of x_n and m_{n-1}^(i) of θ.
2. Resample with weights ∝ w_{n-1}^(i) g(y_n | μ_n^(i), m_{n-1}^(i)).
3. Sample: θ_n^(i) ~ N(m_{n-1}^(i), h² V_{n-1}) and x_n^(i) ~ f(x_n | x_{n-1}^(i), θ_n^(i)).
4. Evaluate the corresponding weights: w_n^(i) ∝ g(y_n | x_n^(i), θ_n^(i)) / g(y_n | μ_n^(i), m_{n-1}^(i)).
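The kernel rejuvenation step can be sketched for a scalar parameter, assuming the standard Liu & West choices a = (3δ − 1)/(2δ) and h² = 1 − a² (the function name and the equal-weight demo are illustrative):

```python
import numpy as np

def lw_kernel_step(theta, w, delta, rng):
    """Liu & West smooth-kernel rejuvenation of parameter particles.
    Shrinkage m_i = a*theta_i + (1-a)*theta_bar with a = (3*delta-1)/(2*delta)
    and h^2 = 1 - a^2 keeps the first two weighted moments of the particle
    approximation unchanged while breaking ties among duplicated particles."""
    a = (3.0 * delta - 1.0) / (2.0 * delta)
    h2 = 1.0 - a * a
    theta_bar = np.average(theta, weights=w)               # weighted mean
    V = np.average((theta - theta_bar)**2, weights=w)      # weighted variance
    m = a * theta + (1.0 - a) * theta_bar                  # shrunk kernel locations
    return m + rng.normal(0.0, np.sqrt(h2 * V), len(theta))

# Demo: rejuvenate 100,000 equally weighted particles
rng = np.random.default_rng(0)
theta = rng.normal(5.0, 2.0, 100_000)
w = np.full(100_000, 1.0 / 100_000)
theta_new = lw_kernel_step(theta, w, delta=0.95, rng=rng)
```

Without the shrinkage toward θ̄ the kernel mixture would have variance (1 + h²)V rather than V, which is the over-dispersion the slide refers to.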

24 Using Sufficient Statistics
 Another approach to breaking particle degeneracy in the parameter space is to use conditional sufficient statistics s_t for the parameters, so that p(θ | x_{0:t}, y_{1:t}) = p(θ | s_t).
 One can propagate the joint distribution p(x_t, s_t | y_{1:t}) over time.
 Usually the conditional sufficient statistic follows a recursive relationship: s_t = S(s_{t-1}, x_t, y_t).
 One can use any of the algorithms above for updating the conditional distribution of the states; with the locally optimal importance distribution, the incremental weight becomes p(y_t | x_{t-1}, s_{t-1}).
 Note that unlike the smooth kernel density approximation technique for avoiding degeneracy, this is an exact technique, so it should be used whenever possible.

25 Marginal Likelihood Calculation
 Often we need to compute the marginal likelihood for model comparison purposes.
 For a general State Space Model, the marginal likelihood of y_{1:n} is
p(y_{1:n}) = p(y_1) ∏_{t=2}^{n} p(y_t | y_{1:t-1}).
 Note that for the case of the vanilla filter (with the resampling step), each factor is estimated by
p̂(y_t | y_{1:t-1}) = (1/M) Σ_i w_t^(i),
where the w_t^(i) are the unnormalized incremental weights.
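The per-step factor (1/M) Σ_i w_t^(i) is usually accumulated in log space with the usual max-subtraction trick, since the unnormalized weights underflow for long series. A minimal sketch (helper name illustrative):

```python
import numpy as np

def log_marginal_increment(unnorm_logw):
    """One factor of the marginal likelihood from a particle filter:
    log p_hat(y_t | y_1:t-1) = log( (1/M) * sum_i w_t^(i) ),
    given the unnormalized log weights, computed stably."""
    c = unnorm_logw.max()
    return c + np.log(np.mean(np.exp(unnorm_logw - c)))

# Accumulate log p_hat(y_1:n) by summing the increments over t = 1, ..., n.
```

Summing these increments over time gives the log marginal likelihood estimate used for model comparison later in the talk.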

26 Issues with Resampling
 The most intuitive resampling scheme is Multinomial resampling: at time n we draw (N_1, …, N_M) ~ Multinomial(M; W_n^(1), …, W_n^(M)), where N_i = # times particle i is replicated.
 Generating the M uniform draws has complexity O(M).
 Naively locating each draw among the cumulative weight intervals has complexity O(M²).
 Resampling becomes the bottleneck computation in an SMC procedure if a naive Multinomial sampler is used.
[Figure: the unit interval [0, 1] partitioned into segments of lengths W_1, W_2, …, W_M.]

27 A Faster Resampling Scheme
Systematic Resampling:
 Like Multinomial, but uses only one random sample: u ~ U(0, 1/M), then selects the particles whose cumulative weight segments contain the points u + (i − 1)/M, i = 1, …, M.
 Complexity O(M).
[Figure: the unit interval [0, 1] partitioned into segments of lengths W_1, W_2, …, W_M, with M equally spaced sampling points.]
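A sketch of systematic resampling (helper name illustrative; the guard on the last cumulative weight is a defensive assumption against floating-point round-off):

```python
import numpy as np

def systematic_resample(weights, rng):
    """O(M) systematic resampling: one uniform u ~ U(0, 1/M), then pick the
    particle whose cumulative-weight segment contains u + i/M, i = 0..M-1.
    Returns the selected particle indices."""
    M = len(weights)
    positions = (rng.uniform() + np.arange(M)) / M   # M equally spaced points
    cum = np.cumsum(weights)
    cum[-1] = 1.0                                    # guard against round-off
    return np.searchsorted(cum, positions)
```

Because the M points are a single shifted grid, a particle with weight W_i is replicated either ⌊M·W_i⌋ or ⌈M·W_i⌉ times, which also gives lower resampling variance than the multinomial scheme.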

28 Scalability of SMC
 Every SMC algorithm has three essential steps:
(i) Sampling Step: generate x_n^(i) from the proposal
(ii) Importance Step: compute particle weights
(iii) Resampling Step: draw M particles with probability proportional to weight
 The sampling and importance steps are completely parallelizable without the need for any form of communication between the particles.
 The resampling step needs communication while normalizing the weights.
 Some algorithms need further communication; for example, Liu & West need to compute the sample mean θ̄_n and variance V_n.
 If we implement an SMC algorithm on a distributed architecture, we should transfer some particles from particle-surplus processors to particle-deficient processors after a resampling step in order to keep the computational load even.

29 Resampling on a Distributed Architecture
ALGORITHM (1 master, K slaves)
 Each slave processor k calculates its total weight and sends it to the master processor.
 The master processor performs Inter-Resampling: it draws the number of particles M_k to retain on each processor, with probability proportional to the processor totals.
 The master processor sends M_k back to processor k.
 Each slave processor performs Intra-Resampling among its own particles (in parallel).
 Particle Routing, to equalize the computational load of the processors: this depends on the architecture.

30 Minimal Quorum Sensing Model

31 Minimal Quorum Sensing Motif
Tanouchi Y, Tu D, Kim J, You L (2008): "Noise Reduction by Diffusional Dissipation in a Minimal Quorum Sensing Motif". PLoS Comput Biol 4(8).
 Two genes, encoding the proteins LuxI and LuxR
 LuxI is an AHL synthase
 AHL freely diffuses in and out of the cell
 As cell density increases, AHL concentration increases in the environment and in the cell
 At sufficiently high concentration, AHL binds to and activates LuxR
 This in turn activates downstream genes.
A_i: intracellular AHL level; A_e: extracellular AHL level; R: LuxR protein level; C: AHL-LuxR complex level

32 Stochastic Differential Equations Model

33 Discretized Model
The Stochastic Differential Equation dx = μ(x, θ) dt + σ(x, θ) dB_t, when discretized (Euler-Maruyama), yields the difference equation
x_{t+Δ} = x_t + μ(x_t, θ) Δ + σ(x_t, θ) √Δ ε_t,  ε_t ~ N(0, I).
For our Minimal QS Motif the discretized version takes this form, with drift and diffusion terms determined by the reaction rates of the model.
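The discretization step can be sketched generically. The drift and diffusion below (an Ornstein-Uhlenbeck example) are illustrative placeholders, not the quorum sensing model's actual rate equations:

```python
import numpy as np

def euler_maruyama(x0, drift, diffusion, dt, n_steps, rng):
    """Simulate x_{t+dt} = x_t + mu(x_t)*dt + sigma(x_t)*sqrt(dt)*eps,
    eps ~ N(0, 1): the Euler-Maruyama discretization of a scalar SDE."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        eps = rng.normal()
        x[k + 1] = x[k] + drift(x[k]) * dt + diffusion(x[k]) * np.sqrt(dt) * eps
    return x

# Illustrative example: dx = -x dt + 0.3 dB, started at x_0 = 1
rng = np.random.default_rng(7)
path = euler_maruyama(1.0, lambda x: -x, lambda x: 0.3, dt=0.01, n_steps=500, rng=rng)
```

The same one-step Gaussian transition is what makes the discretized model a Markovian state space model: each x_{t+Δ} given x_t is Normal with state-dependent mean and variance.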

34 Some Notations
[Equations introducing compact notation for the drift and variance terms of the discretized model.]

35 Some Notations

36 As a State Space Model
Systems Equation: the discretized dynamics above.
We assume that we can observe x_t = (A_{i,t}, A_{e,t}, R_t, C_t)' with some measurement error. Let y_t denote the observations made on the unknown states x_t, and represent them as
Observational Equation: y_t = x_t + ν_t,  ν_t ~ N(0, V),
where V is unknown. This places our model in the general category of State Space Models.

37 Features of this Model
 This model is nonlinear.
 The system evolution variance matrix is not fixed: it depends on the latent states and parameters.
 So the basic assumption for a DLM (that G_t, F_t, W_t, V_t are known) does not hold here.
 This causes no problem in Forward Filtering.
 Note that for Backward Sampling the key identity is p(x_t | x_{t+1}, y_{1:t}) ∝ p(x_t | y_{1:t}) f(x_{t+1} | x_t).
 Now f(x_{t+1} | x_t) has x_t appearing in the variance matrix, so p(x_t | x_{t+1}, y_{1:t}) is no longer a Gaussian density.

38 MCMC Algorithm
 Note that the system evolution and observation equations are linear in the mean.
 We used an approximation to run a FFBS that approximates the distributions p(x_t | y_{1:t}) and p(x_t | x_{t+1}, y_{1:t}).
 As mentioned before, proposed states are accepted with a Metropolis-Hastings acceptance step.
 The complete conditional distribution of V is Inverse-Wishart. It is updated using a Gibbs step.
 Component parameters of θ appear in the evolution variance. Therefore the complete conditionals for the θ parameters are NOT Gaussian. We update them using a Random-Walk Metropolis-Hastings step within Gibbs.

39 Synthetic Data
We do not have real data. For data simulation:
 We use values for the parameter θ as suggested in the literature.
 For V we made an arbitrary choice.
 The choices for A_{i,0}, A_{e,0}, R_0, C_0 are the expected values at steady state.
We generated synthetic observations y_1, y_2, …, y_999, y_1000.

40 Bayesian Analysis
Prior Distributions:
 Relatively flat Normal distributions truncated to (0, ∞) for the θ parameters.
 Relatively flat Normal distributions truncated to (0, ∞) for the initial states A_{i,0}, A_{e,0}, R_0, C_0.
 Inverse-Wishart distribution for the unknown variance matrix V.
An Identification Issue: since the parameters P, V_c, V_e enter the model only through two ratios, the three individual parameters are not identifiable; we can only learn about the two ratios. Therefore we use the ratios, rather than the individual parameters, as model parameters.

41 MCMC Results
 We ran the MCMC for 10^6 iterations; the following results are from the last 10^5 iterations of the generated Markov chain.

42 Trace Plots and Autocorrelation Functions

43 SMC Algorithm
 We used the Auxiliary Particle Filter to reduce particle degeneracy.
 For the observational equation variance matrix V, a sufficient statistics structure exists. We use the sufficient statistics to sample V exactly from its conditional posterior at each step.
 For the parameters in θ no sufficient statistics structure exists. We use the kernel smoothing technique to reduce particle degeneracy in the parameter space.
 We ran our particle filters with 10^6 particles.

44 Quantiles (RED: MCMC, GREEN: SMC)

45 (RED: MCMC, GREEN: SMC)

46 Box Plots of Posterior Samples at time T = 1000 (RED: MCMC, GREEN: SMC)

47 Smoothed Posteriors at time T = 1000 (RED: MCMC, GREEN: SMC, GREY: PRIOR)

48 Marginal Likelihood Plot – Model Comparison

49 Summary
 For nonlinear, non-Gaussian State Space models with long time series data, MCMC is slow and has issues with convergence.
 Sequential Monte Carlo techniques provide an alternative class of non-iterative algorithms for this class of problems.
 For long time series, SMC methods suffer from degeneracy issues, particularly when computing quantities like the marginal likelihood.
 SMC is scalable, so with enough resources one can imagine tackling big-data problems that would otherwise take an enormous amount of time with MCMC methods.
 Model comparison becomes convenient thanks to the easy computation of the marginal likelihood.

50 References
1. M West. Approximating Posterior Distributions by Mixtures. Journal of the Royal Statistical Society, Series B, 55:409–422, 1993a.
2. M West. Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models. In J H Newton (ed.), Computing Science and Statistics: Proceedings of the 24th Symposium on the Interface, pages 325–333, 1993b.
3. C K Carter and R Kohn. On Gibbs Sampling for State Space Models. Biometrika, 81(3):541–553, August 1994.
4. J Liu and M West. Combined Parameter and State Estimation in Simulation-based Filtering. In Sequential Monte Carlo Methods in Practice, pages 197–223, 2001.
5. P Fearnhead. MCMC, Sufficient Statistics, and Particle Filters. Journal of Computational and Graphical Statistics, 11:848–862, 2002.
6. G Storvik. Particle Filters for State Space Models with the Presence of Unknown Static Parameters. IEEE Transactions on Signal Processing, 50:281–289, 2002.

51 References
7. S J Godsill, A Doucet, and M West. Monte Carlo Smoothing for Nonlinear Time Series. Journal of the American Statistical Association, 99(465):156–168, March 2004.
8. M Bolić, P M Djurić, and S Hong. New Resampling Algorithms for Particle Filters. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) Proceedings, 2003.
9. M Bolić, P M Djurić, and S Hong. Resampling Algorithms for Particle Filters: A Computational Complexity Perspective. EURASIP Journal on Applied Signal Processing, (15):2267–2277, 2004.
10. M S Johannes and N Polson. Particle Filtering and Parameter Learning. Social Science Research Network, Working Paper.
11. C M Carvalho, M Johannes, H F Lopes, and N Polson. Particle Learning and Smoothing. Working Paper.
12. Y Tanouchi, D Tu, J Kim, and L You. Noise Reduction by Diffusional Dissipation in a Minimal Quorum Sensing Motif. PLoS Computational Biology, 4(8), August 2008.

52 THANK YOU