MCMC Methods in Harmonic Models Simon Godsill Signal Processing Laboratory Cambridge University Engineering Department www-sigproc.eng.cam.ac.uk/~sjg

Overview: MCMC Methods – Metropolis-Hastings and Gibbs Samplers – Design Considerations – Case Study: Gabor Regression Models

MCMC Methods MCMC methods are sophisticated and general methods for simulation from a complex probability distribution, say π(x), where x may be high dimensional and π highly non-Gaussian or multimodal. Given a set of samples from π(x), we can compute Monte Carlo expectations of any quantities of interest by ergodic averages:
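For a function h(x) of interest, the standard ergodic-average estimator based on samples x^{(1)}, …, x^{(N_s)} from π(x) is

$$ E_\pi[h(x)] \;\approx\; \frac{1}{N_s} \sum_{i=1}^{N_s} h\bigl(x^{(i)}\bigr). $$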

MCMC Contd. In a Bayesian setting π(x) will typically be the posterior distribution of the unknowns x given the observed data. The underlying concept is to construct an irreducible, aperiodic Markov chain having π(x) as its stationary distribution, with transition kernel K(dx'; x). Initialise the chain at an arbitrary state x^(0) (say, at random) and simulate repeatedly from K(dx'; x) until convergence is achieved. Convergence in distribution is guaranteed under mild conditions, easily verified for most models.

MCMC, contd. Rates of convergence are hard to compute – there is a large body of theory, but it is not typically applicable in practice. However, many models, e.g. many harmonic modelling cases, can be proven to have geometric convergence rates.

MCMC Algorithms MCMC schemes are constructed to satisfy the detailed balance condition. The most basic scheme satisfying detailed balance is the Metropolis-Hastings (M-H) method. At each iteration of M-H, propose a move from the current state x to x' using a proposal density q(x'|x); the proposal is accepted randomly with probability α(x, x'), otherwise remain at x and go on to the next iteration.
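In standard notation, detailed balance and the M-H acceptance probability are

$$ \pi(x)\,K(x';x) \;=\; \pi(x')\,K(x;x'), \qquad \alpha(x,x') \;=\; \min\!\left\{1,\; \frac{\pi(x')\,q(x\mid x')}{\pi(x)\,q(x'\mid x)}\right\}. $$

A rough Python sketch of a random-walk M-H sampler (not code from the talk; the target log-density log_pi and the step size are placeholders):

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_iter=10_000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings for a generic (unnormalised) log-density.

    Uses a symmetric Gaussian proposal q(x'|x) = N(x, step^2 I), so the q-ratio
    cancels and the acceptance probability reduces to min(1, pi(x')/pi(x)).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    log_p = log_pi(x)
    samples = np.empty((n_iter,) + x.shape)
    for i in range(n_iter):
        x_prop = x + step * rng.standard_normal(x.shape)  # propose x' ~ q(.|x)
        log_p_prop = log_pi(x_prop)
        if np.log(rng.random()) < log_p_prop - log_p:     # accept with probability alpha
            x, log_p = x_prop, log_p_prop
        samples[i] = x                                     # otherwise remain at x
    return samples

# Example: sample from a standard bivariate normal target.
draws = metropolis_hastings(lambda x: -0.5 * np.sum(x ** 2), x0=np.zeros(2))
```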

Componentwise M-H In most cases this won't be feasible, since x is high dimensional, leading to low acceptance rates and poor convergence. Instead, split x into components x_1, …, x_N. Then perform M-H on each component k = 1, …, N in turn: propose x_k' from a componentwise proposal q_k, and accept with the probability given below.
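With x_{-k} denoting all components of x other than x_k, the componentwise acceptance probability takes the standard form

$$ \alpha_k \;=\; \min\!\left\{1,\; \frac{\pi(x_k' \mid x_{-k})\, q_k(x_k \mid x_k', x_{-k})}{\pi(x_k \mid x_{-k})\, q_k(x_k' \mid x_k, x_{-k})}\right\}. $$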

Gibbs Sampler Possibly the simplest form of MCMC – choose the proposal for component k to be π(x_k' | x_{-k}), the `full conditional' distribution of x_k given all the other components. The acceptance probability is then 1 – i.e. all moves are accepted.
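Substituting this choice into the acceptance probability above, the ratio cancels exactly:

$$ q_k(x_k' \mid x_k, x_{-k}) = \pi(x_k' \mid x_{-k}) \;\;\Rightarrow\;\; \alpha_k = \min\!\left\{1,\; \frac{\pi(x_k' \mid x_{-k})\,\pi(x_k \mid x_{-k})}{\pi(x_k \mid x_{-k})\,\pi(x_k' \mid x_{-k})}\right\} = 1. $$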

Other types of MCMC Reversible Jump MCMC – an extension of M-H to cases where x can have varying dimension (e.g. in sparsity estimation) – see Green (1995), Biometrika. Perfect simulation – special MCMC schemes that achieve exact samples from π(x) – highly desirable, but slow and not yet practical for many cases.

Design Issues and Recommendations A basic understanding of MCMC is relatively easy, but it is not so easy to construct effective and efficient samplers. Some of the main considerations are: how to partition x into components (they need not be the same size, and usually aren't); and which algorithms to use – M-H, Gibbs, something else? In general, Gibbs should only be used if the full conditionals are straightforward to sample from (e.g. Gaussian, gamma, etc.); otherwise use M-H.

(Blocking) – it's nearly always best to group large numbers of components of x into single partitions x_k, provided efficient M-H or Gibbs steps can be constructed for those partitions. (Rao-Blackwellisation) – a related issue is marginalisation: it is better (in terms of estimator variance) to integrate out parameters analytically wherever possible – again, subject to being able to construct efficient samplers on the remaining space.
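For instance, marginalising out a block x_k leaves the lower-dimensional target

$$ \pi(x_{-k}) \;=\; \int \pi(x_k, x_{-k})\, dx_k, $$

on which the sampler is then run.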

References for MCMC
MCMC in Practice – Gilks et al. – Chapman and Hall (1996)
Monte Carlo Statistical Methods – Robert and Casella – Springer (1999)

MCMC Case study – Gabor Regression models Now consider the design of a sampler for harmonic models. Full details are forthcoming as Wolfe, Godsill and Ng (2004), Bayesian variable selection and regularisation for time-frequency surface estimation, Journal of the Royal Statistical Society (Series B – Methodological). (See also Wolfe and Godsill, NIPS 2002.)

Gabor Regression Models Consider models of the form shown below. G is a matrix of Gabor atoms – here we choose an overcomplete dictionary with 2× redundancy. We will seek sparse representations with time-frequency structure, encoded through prior distributions on the coefficients c_k. For the moment, consider the case of fixed, known σ_e and σ_ck.
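A sketch of the linear-Gaussian form assumed in what follows (y the observed signal, c the vector of Gabor coefficients):

$$ y = G\,c + e, \qquad e \sim \mathcal{N}(0, \sigma_e^2 I), \qquad c_k \sim \mathcal{N}(0, \sigma_{ck}^2). $$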

Gabor regression models The likelihood function is Gaussian, and the posterior probability density for c is…

Posterior for c: So, in fact no Monte Carlo is required for this case, since we have the full posterior mean and covariance matrix for c in closed form [conditioning on σ_e and σ_c is implicit].
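Under the model above, the standard conjugate results are (with Λ = diag(σ_c1², …, σ_cK²)):

$$ p(y \mid c, \sigma_e) = \mathcal{N}(y \mid Gc,\, \sigma_e^2 I), \qquad p(c \mid y) = \mathcal{N}(c \mid \mu, \Sigma), $$
$$ \Sigma = \left( \frac{G^\top G}{\sigma_e^2} + \Lambda^{-1} \right)^{-1}, \qquad \mu = \frac{1}{\sigma_e^2}\, \Sigma\, G^\top y. $$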

Gibbs Sampler – blocking structures However, for large Gabor models the matrix inversion will be very slow, and here we could look at reduced-dimension blocking structures. The Gibbs sampler would then proceed by sampling each block c_k from its full conditional in turn, for k = 1, …, K. It's instructive to look at the form of this conditional pdf:

Full conditional for c_k [G_k contains the columns of G corresponding to partition k, and G_{-k} the remaining columns.] The term y − G_{-k} c_{-k} is the residual error when c_k = 0. Note the relationship to Basis Pursuit residual terms.
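Under the conjugate Gaussian assumptions above (with Λ_k the prior covariance of block c_k), this full conditional is

$$ p(c_k \mid c_{-k}, y) \;\propto\; \exp\!\left\{ -\frac{1}{2\sigma_e^2} \bigl\| (y - G_{-k} c_{-k}) - G_k c_k \bigr\|^2 - \frac{1}{2}\, c_k^\top \Lambda_k^{-1} c_k \right\}, $$

i.e. Gaussian with covariance $\Sigma_k = (G_k^\top G_k/\sigma_e^2 + \Lambda_k^{-1})^{-1}$ and mean $\mu_k = \Sigma_k G_k^\top (y - G_{-k} c_{-k})/\sigma_e^2$.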

This form of Gibbs sampler can be very cheap computationally. The interest in this work is to extend the modelling capabilities provided by other algorithms – giving new forms of sparsity and structure. The extra steps are added in modular fashion, retaining the conditionally Gaussian structure of the coefficients and the efficient implementation.

Sampling σ_e First, we allow estimation of the noise floor by sampling σ_e, assuming an inverted-gamma (IG) prior p(σ_e²). Under this (conjugate) prior the full conditional is also inverted-gamma, which is easily sampled by standard methods (e.g. in MATLAB).
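A minimal Python sketch of this conjugate update (not code from the talk; the IG hyperparameters alpha_e, beta_e are placeholders):

```python
import numpy as np

def sample_sigma_e2(y, G, c, alpha_e=1e-3, beta_e=1e-3, rng=None):
    """Gibbs update for the noise variance under an IG(alpha_e, beta_e) prior.

    The full conditional is IG(alpha_e + N/2, beta_e + ||y - G c||^2 / 2),
    drawn here as the reciprocal of a gamma variate.
    """
    rng = np.random.default_rng() if rng is None else rng
    resid = y - G @ c
    shape = alpha_e + 0.5 * y.size
    rate = beta_e + 0.5 * resid @ resid
    return 1.0 / rng.gamma(shape, 1.0 / rate)  # 1/Gamma(shape, scale=1/rate) ~ IG(shape, rate)
```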

Sampling coefficient parameters Next, place a structured prior distribution on the Gabor coefficients. First make them heavy-tailed to match real audio signals. This is done using Scale Mixtures of Normals (see Godsill and Rayner (IEEE Trans. Speech and Audio Processing, 1998) for an audio restoration example). Simply assign a prior to the variance σ_ck² of each c_k; this implies a non-Gaussian, heavy-tailed distribution for c_k.

Priors for σ_ck The choice of p(σ_ck²) determines the implied heavy-tailed distribution p(c_k). In the simplest case, adopt the IG prior as this is conjugate; the implied p(c_k) is then Student's t-distributed. The IG prior has Jeffreys and exponential limiting cases, so the family can encompass many of the sparseness-inducing cases. Again, the IG prior is conjugate and leads to a simple Gibbs sampler step:
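A minimal Python sketch of that step for a scalar coefficient (not code from the talk; alpha_c, beta_c are placeholder IG hyperparameters):

```python
import numpy as np

def sample_sigma_ck2(c_k, alpha_c=1.0, beta_c=1e-3, rng=None):
    """Gibbs update for one coefficient variance under an IG(alpha_c, beta_c) prior.

    With c_k | sigma_ck^2 ~ N(0, sigma_ck^2), the full conditional is
    IG(alpha_c + 1/2, beta_c + c_k^2 / 2), again drawn via a gamma reciprocal.
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = alpha_c + 0.5
    rate = beta_c + 0.5 * float(c_k) ** 2
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```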

Direct Sparsity Modelling Other choices of p(σ_ck²) lead to other heavy-tailed distributions, e.g. it is possible to obtain α-stable or Generalised Gaussian coefficients with other choices. In these cases M-H would be used to do the sampling, see e.g. Godsill and Kuruoglu (1999 – CUED Tech. Rep.). A further addition that is easily incorporated into the MCMC is direct estimation of sparsity. This is an important addition to the models and does not compromise the guaranteed convergence properties of the methods. We can achieve this by allowing finite probability mass at zero in p(σ_ck²):

Direct Sparsity Modelling Prior with point mass at zero: here γ_k ∈ {0,1} is a binary indicator variable specifying whether coefficient c_k is active or inactive. Structure is introduced at this point, through priors on the time-frequency indicator field {γ_k}. We use Markov chain or Markov random field priors to encourage continuity across time (tones), frequency (transients), or both. The indicator field is also sampled using Gibbs sampling – details not given here – no time left…
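One common way of writing such a point-mass ("spike and slab") prior, consistent with the description above, is

$$ p(\sigma_{ck}^2 \mid \gamma_k) \;=\; \gamma_k\, \mathrm{IG}(\sigma_{ck}^2 \mid \alpha_c, \beta_c) \;+\; (1-\gamma_k)\, \delta_0(\sigma_{ck}^2), \qquad \gamma_k \in \{0,1\}, $$

where δ_0 denotes a point mass at zero.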

Final Details We also sample the hyperparameters of these prior distributions, requiring one Gibbs and one M-H step.

Interpreting the MCMC output Assume that the MCMC has converged and the initial `burn-in' samples have been deleted. The output can then be used for coefficient estimation, for noise reduction, and for estimating the sparsity coefficients: how many coefficients are active?
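A minimal sketch of these posterior summaries from stored draws (assuming placeholder arrays c_samples and gamma_samples of post-burn-in samples):

```python
import numpy as np

def summarise_mcmc(c_samples, gamma_samples, G):
    """Posterior-mean summaries from post-burn-in MCMC draws.

    c_samples:     (n_draws, K) array of coefficient draws
    gamma_samples: (n_draws, K) array of 0/1 indicator draws
    G:             (N, K) Gabor synthesis matrix
    """
    c_hat = c_samples.mean(axis=0)               # MMSE (posterior mean) coefficient estimate
    y_hat = G @ c_hat                            # denoised signal reconstruction
    p_active = gamma_samples.mean(axis=0)        # posterior probability each coefficient is active
    n_active = gamma_samples.sum(axis=1).mean()  # expected number of active coefficients
    return c_hat, y_hat, p_active, n_active
```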

Results

Results, contd.

Typical output from the program [figures: convergence of parameters; noisy data; final iteration; MMSE estimate]. See www-sigproc.eng.cam.ac.uk/~sjg for examples and Matlab code.

Conclusion Why use MCMC methods in harmonic models? They extend the range of models computable; convergence is guaranteed (in the limit); computations can be quite cheap; the code would contain the same building blocks as EM, IRLS or basis pursuit for similar models, so it is easy to modify such code to MCMC for a baseline comparison; and it's really not as complicated or slow as people think!! Why not use MCMC methods? It can be computationally expensive; convergence diagnostics are unreliable; and you may not want to explore new models.

References
C. P. Robert and G. Casella, Monte Carlo Statistical Methods, New York: Springer Verlag, 1999.
W. R. Gilks, S. Richardson and D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice, London: Chapman and Hall, 1996.
P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, 82(4), pp. 711–732, 1995.

Harmonic models and MCMC – SJG references – see www-sigproc.eng.cam.ac.uk/~sjg
P. J. Wolfe, S. J. Godsill, and W. J. Ng. Bayesian variable selection and regularisation for time-frequency surface estimation. Journal of the Royal Statistical Society, Series B. Read paper (with discussion). To appear.
M. Davy and S. J. Godsill. Bayesian harmonic models for musical signal analysis (with discussion). In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics VII. Oxford University Press, 2003.
P. J. Wolfe and S. J. Godsill. Bayesian modelling of time-frequency coefficients for audio signal enhancement. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, Cambridge, MA. MIT Press, 2002.
S. J. Godsill and P. J. W. Rayner. Digital Audio Restoration: A Statistical Model-Based Approach. Berlin: Springer, September 1998.
S. J. Godsill and P. J. W. Rayner. Robust reconstruction and analysis of autoregressive signals in impulsive noise using the Gibbs sampler. IEEE Trans. on Speech and Audio Processing, 6(4), July 1998.
S. J. Godsill and E. E. Kuruoglu. Bayesian inference for time series with heavy-tailed symmetric alpha-stable noise processes. In Proc. Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics, Washington DC, USA, June 1999. CUED Tech. Rep.