Introduction to Sampling based inference and MCMC

Introduction to Sampling based inference and MCMC Ata Kaban School of Computer Science The University of Birmingham

The problem
Until now we were solving search problems (searching for optima of functions, for NN structures, for solutions to various problems). Today we instead try to:
- compute volumes, averages, expectations, integrals
- simulate a sample from a distribution of a given shape
There are some analogies with EAs, in that we work with 'samples' or 'populations'.

The Monte Carlo principle
p(x): a target density defined over a high-dimensional space (e.g. the space of all possible configurations of a system under study). The idea of Monte Carlo techniques is to draw a set of i.i.d. samples {x1, ..., xN} from p in order to approximate p with the empirical distribution. Using these samples we can approximate integrals I(f) (or very large sums) with tractable sums that converge to I(f) as the number of samples grows.
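As a minimal sketch of the principle, assuming the simple case where we can sample from p directly, the integral I(f) is approximated by an average of f over i.i.d. samples (the choice of p = N(0,1) and f(x) = x^2 is purely illustrative):

```python
import random

def mc_expectation(f, sampler, n=100_000, seed=0):
    """Approximate I(f) = E_p[f(x)] by the average of f over n i.i.d. samples from p."""
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(n)) / n

# Example: E[x^2] under N(0, 1) is the variance, whose true value is 1.
estimate = mc_expectation(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0))
```

By the law of large numbers the estimate converges to I(f), with error shrinking like 1/sqrt(N) regardless of the dimension of x.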

Importance sampling
The target density p(x) is known only up to a constant. Task: compute I(f). Idea: introduce an arbitrary proposal density q that includes the support of p. Then:
- sample from q instead of p
- weight the samples according to their 'importance' w(x) proportional to p(x)/q(x)
This also implies that p(x) is approximated by the weighted empirical distribution of the samples. Efficiency depends on a 'good' choice of q.
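A self-normalised importance sampling sketch, assuming a Gaussian target known only up to a constant and a wider Gaussian proposal q that covers its support (both densities are chosen here purely for illustration):

```python
import math
import random

def snis_mean(p_tilde, q_sample, q_pdf, n=200_000, seed=0):
    """Self-normalised importance sampling estimate of E_p[x],
    where p is known only up to a constant via p_tilde."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = q_sample(rng)
        w = p_tilde(x) / q_pdf(x)   # importance weight, up to a constant
        num += w * x
        den += w
    return num / den               # the unknown constant cancels here

# Target: unnormalised N(3, 1); proposal: N(0, 3), wide enough to cover p.
p_tilde = lambda x: math.exp(-0.5 * (x - 3.0) ** 2)
q_pdf = lambda x: math.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * math.sqrt(2 * math.pi))
est = snis_mean(p_tilde, lambda rng: rng.gauss(0.0, 3.0), q_pdf)
```

Normalising by the sum of the weights is what removes the unknown constant of p; a proposal that misses p's high-density region would make the weights degenerate, which is the sense in which efficiency depends on a 'good' q.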

Sequential Monte Carlo
Motivation: real-time processing, dealing with non-stationarity, not having to store the data. Goal: estimate the distribution of the 'hidden' state trajectory. We observe yt at each time t, and we have a model consisting of: an initial distribution, a dynamic model, and a measurement model.

We can define a proposal distribution over trajectories; the importance weights then follow from the ratio of target to proposal. Obs.: a simplifying choice is to propose from the dynamic model itself, in which case the weight of each sample reduces to its measurement likelihood, which acts as a 'fitness'.

[Diagram: the particle filter cycle -- 'proposed' -> 'weighted' -> 're-sampled' -> 'proposed' -> 'weighted' -> ...]
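The propose/weight/re-sample cycle can be sketched as a bootstrap (SIR) particle filter for a toy one-dimensional random-walk model with Gaussian observation noise (the model and all parameters here are illustrative assumptions):

```python
import math
import random

def bootstrap_filter(ys, n_particles=2000, sigma_x=1.0, sigma_y=0.5, seed=0):
    """Bootstrap (SIR) particle filter for x_t = x_{t-1} + N(0, sigma_x^2),
    y_t = x_t + N(0, sigma_y^2). Returns posterior-mean estimates of each x_t."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # 1. propose: sample each particle from the dynamic model
        parts = [x + rng.gauss(0.0, sigma_x) for x in parts]
        # 2. weight: the measurement likelihood acts as the 'fitness'
        ws = [math.exp(-0.5 * ((y - x) / sigma_y) ** 2) for x in parts]
        total = sum(ws)
        means.append(sum(w * x for w, x in zip(ws, parts)) / total)
        # 3. re-sample: draw particles in proportion to their weights
        parts = rng.choices(parts, weights=ws, k=n_particles)
    return means

# Track a noiseless ramp observed with noise.
true_x = [0.5 * t for t in range(20)]
obs_rng = random.Random(1)
obs = [x + obs_rng.gauss(0.0, 0.5) for x in true_x]
est = bootstrap_filter(obs)
```

Resampling at every step keeps the particle set focused on plausible trajectories, at the cost of some loss of diversity.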

Applications
- Computer vision: object tracking demo [Blake & Isard]
- Speech & audio enhancement
- Web statistics estimation
- Regression & classification
- Global maximization of MLPs [Freitas et al]
- Bayesian networks
- Genetics & molecular biology
- Robotics, etc.
Details in the Gilks et al. book (in the School library).

References & resources
[1] M. Isard & A. Blake: CONDENSATION – conditional density propagation for visual tracking. Int. J. of Computer Vision, 1998. Associated demos & further papers: http://www.robots.ox.ac.uk/~misard/condensation.html
[2] C. Andrieu, N. de Freitas, A. Doucet, M. Jordan: An introduction to MCMC for machine learning. Machine Learning, vol. 50, pp. 5-43, Jan.-Feb. 2003. Nando de Freitas' MCMC papers & software: http://www.cs.ubc.ca/~nando/software.html
[3] MCMC preprint service: http://www.statslab.cam.ac.uk/~mcmc/pages/links.html
[4] W.R. Gilks, S. Richardson & D.J. Spiegelhalter: Markov Chain Monte Carlo in Practice. Chapman & Hall, 1996.

The Markov Chain Monte Carlo (MCMC) idea
Design a Markov chain on a finite state space such that, when simulating a trajectory of states from it, the chain explores the state space while spending more time in the most important regions (i.e. where p(x) is large).

Stationary distribution of a Markov chain
Suppose you browse the web for an infinitely long time: what is the probability of being at page xi? It does not matter where you started off. => PageRank (Google)
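A sketch of how such a stationary distribution can be found by power iteration on a tiny hypothetical 3-page 'web' (the transition matrix below is made up purely for illustration):

```python
def stationary(T, iters=200):
    """Power iteration: repeatedly push the state distribution through the
    transition matrix T (rows sum to 1) until it stops changing."""
    n = len(T)
    p = [1.0 / n] * n                       # arbitrary starting distribution
    for _ in range(iters):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p

# A tiny 3-page 'web'; the chain is irreducible and aperiodic,
# so the result is the same no matter where we start.
T = [[0.0, 0.5, 0.5],
     [0.3, 0.0, 0.7],
     [0.5, 0.5, 0.0]]
pi = stationary(T)
```

The returned pi satisfies pi = pi * T: pushing it through the transition matrix once more leaves it unchanged, which is exactly the defining property of a stationary distribution.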

Google vs. MCMC
Google is given T and finds p(x). MCMC is given p(x) and finds T; but it also needs a 'proposal (transition) probability distribution' to be specified.
Q: Do all Markov chains have a stationary distribution? A: No.

Conditions for the existence of a unique stationary distribution
- Irreducibility: the transition graph is connected (any state can be reached from any other)
- Aperiodicity: state trajectories drawn from the transition model don't get trapped in cycles
MCMC samplers are irreducible and aperiodic Markov chains that converge to the target distribution. These two conditions are not easy to impose directly.

Reversibility
Reversibility (also called 'detailed balance') is a sufficient (but not necessary) condition for p(x) to be the stationary distribution. It is easier to work with this condition than to impose irreducibility and aperiodicity directly.
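In symbols, detailed balance requires the probability flow between every pair of states x and x' to balance; summing both sides over x then shows that p is stationary:

```latex
p(x)\,T(x \to x') = p(x')\,T(x' \to x) \quad \forall\, x, x'
\;\Rightarrow\;
\sum_x p(x)\,T(x \to x') \;=\; p(x') \sum_x T(x' \to x) \;=\; p(x')
```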

MCMC algorithms
- Metropolis-Hastings algorithm
  - Metropolis algorithm (as a special case)
  - Mixtures and blocks
- Gibbs sampling
- Other: Sequential Monte Carlo & particle filters

The Metropolis-Hastings algorithm, and the Metropolis algorithm as a special case
Obs.: the target distribution p(x) is only needed up to normalisation.
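A minimal random-walk Metropolis sketch (the special case of M-H with a symmetric Gaussian proposal, for which the Hastings correction cancels); the unnormalised Gaussian target is an assumption chosen for illustration:

```python
import math
import random

def metropolis(log_p_tilde, x0=0.0, step=1.0, n=50_000, burn=5_000, seed=0):
    """Random-walk Metropolis: symmetric Gaussian proposal, so the acceptance
    ratio reduces to p~(x')/p~(x); the target is needed only up to its
    normalising constant."""
    rng = random.Random(seed)
    x, samples = x0, []
    for t in range(n):
        x_new = x + rng.gauss(0.0, step)
        # accept with probability min(1, p~(x') / p~(x)), done in log space
        if math.log(rng.random()) < log_p_tilde(x_new) - log_p_tilde(x):
            x = x_new
        if t >= burn:                  # discard the burn-in portion
            samples.append(x)
    return samples

# Unnormalised log-density of N(2, 1): the constant is never needed.
samples = metropolis(lambda x: -0.5 * (x - 2.0) ** 2)
mean = sum(samples) / len(samples)
```

Note that rejected proposals still count: the chain stays at x for another step, which is what makes the accept/reject rule leave p invariant.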

Examples of Metropolis-Hastings simulations, with q a Gaussian of variance sigma

Variations on M-H: using mixtures and blocks
Mixtures (e.g. of global & local proposal distributions):
- MC1 with transition T1 having p(x) as its stationary distribution
- MC2 with transition T2 also having p(x) as its stationary distribution
- New chains can be obtained as T1*T2, or a*T1 + (1-a)*T2, which also have stationary distribution p(x)
Blocks:
- Split the multivariate state vector into blocks or components that can be updated separately
- Tradeoff: small blocks give slow exploration of the target p; large blocks give a low acceptance rate

Gibbs sampling
Component-wise proposal q: each component xi is updated in turn by sampling from its full conditional p(xi | x-i), where the notation x-i means all components of x other than xi.
Homework: show that in this case the acceptance probability is = 1 (see [2], p. 21).

Gibbs sampling algorithm
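The component-wise updates can be sketched for a toy bivariate Gaussian target, where both full conditionals are known in closed form (a minimal illustration with made-up parameters, not the figure from the slides):

```python
import math
import random

def gibbs_bivariate_normal(rho=0.8, n=30_000, burn=3_000, seed=0):
    """Gibbs sampling for a zero-mean, unit-variance bivariate normal with
    correlation rho: each full conditional xi | xj is N(rho * xj, 1 - rho^2),
    and every component-wise update is accepted with probability 1."""
    rng = random.Random(seed)
    x1 = x2 = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for t in range(n):
        x1 = rng.gauss(rho * x2, sd)   # sample x1 | x2
        x2 = rng.gauss(rho * x1, sd)   # sample x2 | x1
        if t >= burn:
            samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal()
corr_est = sum(a * b for a, b in samples) / len(samples)
```

Because both variances are 1, the average of x1*x2 over the samples should recover the correlation rho, which is a quick sanity check that the sampler targets the right distribution.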

More advanced sampling techniques
- Auxiliary variable samplers
- Hybrid Monte Carlo: uses the gradient of p; tries to avoid 'random walk' behaviour, i.e. to speed up convergence
- Reversible jump MCMC: for comparing models of different dimensionalities (in 'model selection' problems)
- Adaptive MCMC: tries to automate the choice of q