Presentation on theme: "BAYESIAN INFERENCE Sampling techniques" — Presentation transcript:

1 BAYESIAN INFERENCE Sampling techniques
Andreas Steingötter

2 Motivation & Background
Exact inference is intractable, so we have to resort to some form of approximation.

3 Motivation & Background
Variational Bayes: a deterministic approximation, not exact even in principle.
Alternative approximation: perform inference by numerical sampling, also known as Monte Carlo techniques.

4 Motivation & Background
The posterior distribution p(z) is required (primarily) for the purpose of evaluating expectations
E[f] = ∫ f(z) p(z) dz
where f(z) are predictions made by the model with parameters z. If p(z) is the parameter prior and f(z) = p(y|z) is the likelihood, then E[f] is the marginal likelihood (evidence) for the model.

5 Motivation & Background
Classical Monte Carlo approximation:
E[f] ≈ (1/L) Σ_{l=1}^{L} f(z^(l))
where the z^(l) are random (not necessarily independent) draws from p(z). The estimator converges to the right answer in the limit of a large number of samples L.
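As a concrete illustration (a minimal R sketch, not from the slides; the target p(z) = N(0, 1) and f(z) = z², whose exact expectation is 1, are assumptions):

```r
# Monte Carlo estimate of E[f(z)] with f(z) = z^2 and p(z) = N(0, 1).
# The exact expectation is Var(z) = 1, so the estimate should approach 1.
set.seed(42)
L <- 10000
z <- rnorm(L)          # L independent draws from p(z)
f_hat <- mean(z^2)     # (1/L) * sum over f(z^(l))
f_hat                  # close to 1 for large L
```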

6 Motivation & Background
Problems:
- How can we obtain independent samples from p(z)?
- The expectation may be dominated by regions of small probability, so large sample sizes will be required to achieve sufficient accuracy.
However, if the z^(l) are independent draws from p(z), then a small number of samples suffices to estimate the expectation.

7 How to do sampling?
- Basic sampling algorithms: restricted mainly to 1-/2-dimensional problems
- Markov chain Monte Carlo: a very general and powerful framework

8 Basic sampling: special cases for a model with a directed graph
Ancestral sampling: easy sampling of the joint distribution by drawing each node in order from its conditional, given its already-sampled parents.
Logic sampling (with observed nodes): compare the sampled value for z_i with the observed value at node i. If they do NOT agree, discard the whole sample and start again with the first node. A sketch of ancestral sampling follows below.
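A minimal R sketch of ancestral sampling (the toy chain z1 → z2 → z3 and its conditional distributions are illustrative assumptions, not taken from the slides):

```r
# Ancestral sampling for a toy directed model z1 -> z2 -> z3:
# draw each node from its conditional given its already-sampled parent.
set.seed(1)
sample_joint <- function() {
  z1 <- rbinom(1, 1, 0.5)                          # root: p(z1)
  z2 <- rbinom(1, 1, if (z1 == 1) 0.8 else 0.2)    # p(z2 | z1)
  z3 <- rnorm(1, mean = 2 * z2)                    # p(z3 | z2)
  c(z1 = z1, z2 = z2, z3 = z3)
}
samples <- t(replicate(1000, sample_joint()))      # 1000 joint samples
head(samples)
```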

9 Random sampling
Computers can generate only pseudorandom numbers. Typical defects:
- correlation of successive values
- lack of uniformity of the distribution
- poor dimensional distribution of the output sequence
- distances between where certain values occur are distributed differently from those in a truly random sequence

10 Random sampling from the Uniform Distribution
Assumption: a good pseudo-random generator for uniformly distributed data is available.
Alternative: "true" random numbers, with randomness coming from atmospheric noise.

11 Random sampling from a standard non-uniform distribution
Goal: sample from a non-uniform distribution p(y) that is a standard distribution, i.e. given in analytical form.
Suppose: we have uniformly distributed random numbers z over (0, 1).
Solution: transform the numbers z using the function that is the inverse of the indefinite integral (the CDF) of the desired distribution.

12 Random sampling from a standard non-uniform distribution
Step 1: calculate the cumulative distribution function h(y) = ∫_{-∞}^{y} p(ŷ) dŷ.
Step 2: transform samples z ~ U(0, 1) by y = h⁻¹(z).
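For instance, a minimal R sketch using the exponential distribution (an assumed example; the slide does not name a distribution): for p(y) = λ exp(−λy), the CDF is h(y) = 1 − exp(−λy), so the inverse transform is y = −ln(1 − z)/λ.

```r
# Inverse-transform sampling from p(y) = lambda * exp(-lambda * y).
# CDF: h(y) = 1 - exp(-lambda * y)  =>  inverse: y = -log(1 - z) / lambda.
set.seed(7)
lambda <- 2
z <- runif(10000)                          # uniform samples on (0, 1)
y <- -log(1 - z) / lambda                  # transformed samples follow Exp(lambda)
c(mean = mean(y), target = 1 / lambda)     # sample mean should be near 1/lambda
```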

13 Rejection sampling
Suppose: direct sampling from p(z) is difficult, but p(z) can be evaluated for any given value of z up to some normalization constant Z_p; that is, p(z) = p̃(z)/Z_p, where Z_p is unknown but p̃(z) can be evaluated.
Approach: define a simple proposal distribution q(z) and a constant k such that k q(z) ≥ p̃(z) for all z.

14 Rejection sampling: simple visual example
The constant k should be as small as possible: the fraction of rejected points depends on the ratio of the area under the unnormalized distribution p̃(z) to the area under the envelope curve k q(z).

15 Rejection sampling: the rejection sampler
Generate two random numbers:
- a number z0 from the proposal distribution q(z)
- a number u0 from the uniform distribution over [0, k q(z0)]
If u0 > p̃(z0), reject the pair. The remaining pairs are uniformly distributed under the curve p̃(z), so the accepted z0 are distributed according to p(z).
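A minimal R sketch of this sampler (the target, an unnormalized Beta(3, 4) density, and the uniform proposal are illustrative assumptions):

```r
# Rejection sampling from an unnormalized target p_tilde(z) = z^2 * (1 - z)^3 on (0, 1)
# (a Beta(3, 4) density up to its normalizing constant), with uniform proposal q(z) = 1.
set.seed(123)
p_tilde <- function(z) z^2 * (1 - z)^3
k <- 0.035                       # k * q(z) >= p_tilde(z) everywhere (max of p_tilde ~ 0.0346)
n <- 20000
z0 <- runif(n)                   # draws from the proposal q(z)
u0 <- runif(n, 0, k)             # uniform on [0, k * q(z0)]; q(z0) = 1 here
accepted <- z0[u0 <= p_tilde(z0)]
c(acceptance_rate = length(accepted) / n,
  sample_mean = mean(accepted),
  beta_mean = 3 / 7)             # Beta(3, 4) has mean 3/7
```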

16 Adaptive rejection sampling
Suppose: it is difficult to determine a suitable analytic form for the proposal distribution q(z).
Approach: construct the envelope function "on the fly" based on observed values of the distribution p(z). If p(z) is log concave (ln p(z) has non-increasing derivatives), use the derivatives to construct the envelope.

17 Adaptive rejection sampling
Step 1: at an initial set of grid points z_1, …, z_M, evaluate ln p(z_i) and its gradient, and compute the tangent lines at each ln p(z_i), i = 1, …, M.
Step 2: sample from the envelope distribution; if the sample is accepted, use it to evaluate p(z), otherwise add it to the grid to refine the envelope. The envelope is a piecewise exponential distribution: each tangent line (slope and offset k) in log space exponentiates to an exponential segment.

18 Adaptive rejection sampling
Problem of rejection sampling: finding a proposal distribution q(z) close to the required distribution, so as to minimize the rejection rate. The rejection rate grows rapidly with dimensionality (curse of dimensionality), so rejection sampling is restricted mainly to univariate distributions. However, it remains useful as a potential subroutine within more general methods.

19 Importance sampling
A framework for approximating expectations E_p[f(z)] directly with respect to p(z); it does NOT provide samples from p(z) itself.
Suppose (again): direct sampling from p(z) is difficult, but p(z) can be evaluated for any given value of z up to some normalization constant Z.

20 Importance sampling
As in rejection sampling, use a proposal distribution q(z) from which it is easy to draw samples:
E[f] = ∫ f(z) (p(z)/q(z)) q(z) dz ≈ (1/L) Σ_{l=1}^{L} (p(z^(l))/q(z^(l))) f(z^(l))
with the samples z^(l) drawn from q(z).

21 Importance sampling
Expectation formula for unnormalized distributions, with importance weights r_l = p̃(z^(l)) / q(z^(l)):
E[f] ≈ Σ_{l=1}^{L} w_l f(z^(l)),  where w_l = r_l / Σ_m r_m
Key points (a sketch follows below):
- The importance weights correct the bias introduced by sampling from the proposal distribution.
- Accuracy depends on how well q(z) approximates p(z) (similar to rejection sampling).
- Choose sample points in input space where f(z) p(z) is large, or at least where p(z) is large.
- If p(z) > 0 in some region, then q(z) > 0 there is necessary.
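A minimal R sketch of self-normalized importance sampling (the standard-normal target, the wider Gaussian proposal, and f(z) = z² are illustrative assumptions):

```r
# Self-normalized importance sampling: estimate E[f(z)] with f(z) = z^2
# under an unnormalized target p_tilde(z) = exp(-z^2 / 2) (standard normal up to Z),
# using a wider Gaussian proposal q(z) = N(0, 2^2). Exact answer: 1.
set.seed(99)
L <- 20000
z <- rnorm(L, mean = 0, sd = 2)     # draws from the proposal q(z)
p_tilde <- exp(-z^2 / 2)            # unnormalized target density
q <- dnorm(z, mean = 0, sd = 2)     # proposal density
r <- p_tilde / q                    # importance weights r_l
w <- r / sum(r)                     # normalized weights w_l
sum(w * z^2)                        # estimate of E[z^2], close to 1
```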

22 Importance sampling: attention
Suppose none of the samples falls in the regions where f(z) p(z) is large. In that case, the apparent variances of r_l and r_l f(z^(l)) may be small even though the estimate of the expectation is severely wrong. Hence a major drawback of importance sampling is its potential to produce results that are arbitrarily in error, with no diagnostic indication.
q(z) should NOT be small where p(z) may be significant!

23 Markov Chain Monte Carlo (MCMC) sampling
MCMC is a general framework for sampling from a large class of distributions that scales well with the dimensionality of the sample space.
Goal: generate samples from a distribution p(z).
Idea: build a machine that uses the current sample to decide which sample to produce next, in such a way that the overall distribution of the samples is p(z).

24 Markov Chain Monte Carlo (MCMC) sampling
Approach:
- Generate a candidate sample z* from a proposal distribution q(z | z^(τ)) that depends on the current state z^(τ) and is sufficiently simple to draw samples from directly.
- The current sample z^(τ) is known (i.e. a record of the current state is maintained); the samples z^(1), z^(2), z^(3), … form a Markov chain.
- Accept or reject the candidate sample z* according to an appropriate criterion.

25 MCMC - Metropolis algorithm
Suppose: p(z) can be evaluated for any given value of z up to some normalization constant Z, i.e. the unnormalized p̃(z) is available.
Algorithm:
Step 1: choose a symmetric proposal distribution, q(z_A | z_B) = q(z_B | z_A).
Step 2: accept the candidate sample z* with probability
A(z*, z^(τ)) = min(1, p̃(z*) / p̃(z^(τ)))

26 MCMC - Metropolis algorithm
Algorithm (cont.):
Step 2.1: choose a random number u with uniform distribution in (0, 1).
Step 2.2: acceptance test: accept z* if u < A(z*, z^(τ)) = min(1, p̃(z*) / p̃(z^(τ))).
Step 3: if accepted, update the state, z^(τ+1) = z*; otherwise keep the old state, z^(τ+1) = z^(τ).

27 Metropolis algorithm
Notes:
- Rejection of a point leads to the previous sample being counted again (different from rejection sampling).
- If q(z_A | z_B) > 0 for all values z_A, z_B, then the distribution of z^(τ) tends to p(z) as τ → ∞.
- z^(1), z^(2), z^(3), … are not independent samples from p(z) (serial correlation). To obtain approximately independent samples, retain only every Mth sample.

28 Examples: Metropolis algorithm
Implementation in R: elliptical (correlated 2D Gaussian) distribution.
(Figure: flowchart of the sampler: test u < p̃(z*)/p̃(z^(τ)); if true, update the state, z^(τ+1) = z*, else keep the old state, z^(τ+1) = z^(τ).)
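A minimal R sketch of such a sampler (the slides' actual code is not reproduced here; the correlated 2D Gaussian target, the starting point z0 = (−2, 2), and the step size are assumptions mirroring the settings on the following slides):

```r
# Random-walk Metropolis for a 2D correlated ("elliptical") Gaussian target.
# log_p_tilde is the unnormalized log-density; the proposal is a symmetric Gaussian step.
set.seed(2024)
Sigma_inv <- solve(matrix(c(1, 0.9, 0.9, 1), 2, 2))   # inverse covariance, correlation 0.9
log_p_tilde <- function(z) -0.5 * sum(z * (Sigma_inv %*% z))

metropolis <- function(n, step = 0.3, z0 = c(-2, 2)) {
  samples <- matrix(NA_real_, n, 2)
  z <- z0
  for (tau in 1:n) {
    z_star <- z + rnorm(2, sd = step)                 # symmetric proposal q(z* | z)
    if (log(runif(1)) < log_p_tilde(z_star) - log_p_tilde(z)) {
      z <- z_star                                     # accept: update state
    }                                                 # else: keep old state
    samples[tau, ] <- z
  }
  samples
}

out <- metropolis(15000)
cor(out[, 1], out[, 2])   # should approach the target correlation (0.9 here)
```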

29 Examples: Metropolis algorithm
Implementation in R: initialization in [-2, 2], step size = 0.3. (Figures: sampled chains after n = 1500 and n = 15000 iterations.)

30 Examples: Metropolis algorithm
Implementation in R: initialization in [-2, 2], step size = 0.5. (Figures: sampled chains after n = 1500 and n = 15000 iterations.)

31 Examples: Metropolis algorithm
Implementation in R: initialization in [-2, 2], step size = 1. (Figures: sampled chains after n = 1500 and n = 15000 iterations.)

32 Validation of MCMC
Properties of Markov chains: transition probabilities T_m(z', z) = p(z^(m+1) = z | z^(m) = z') specify the chain z^(1) → z^(2) → … → z^(m) → z^(m+1).
Homogeneous: T_m is the same for all m.

33 Validation of MCMC
Properties of Markov chains (cont.): a distribution p*(z) is invariant (stationary) with respect to the chain if
p*(z) = Σ_{z'} T(z', z) p*(z')
A sufficient condition is detailed balance,
p*(z) T(z, z') = p*(z') T(z', z)
and a chain whose transitions satisfy detailed balance is called reversible. (Summing detailed balance over z' gives Σ_{z'} p*(z') T(z', z) = p*(z) Σ_{z'} T(z, z') = p*(z), i.e. invariance.)

34 Validation of MCMC: ergodicity
Goal: an invariant Markov chain that converges to the desired distribution p*(z).
Ergodicity: p*(z) = lim_{m→∞} p(z^(m)) for ANY initial distribution p(z^(0)). An ergodic Markov chain has only one equilibrium distribution, the invariant p*(z).

35 Properties and validation of MCMC
Approach: construct appropriate transition probabilities T(z', z) from a set of base transitions B_k, either in mixture form,
T(z', z) = Σ_k α_k B_k(z', z)
with mixing coefficients α_k ≥ 0, Σ_k α_k = 1, or through successive application,
T(z', z) = Σ_{z_1} … Σ_{z_{K−1}} B_1(z', z_1) … B_K(z_{K−1}, z)

36 Metropolis-Hastings algorithm
Generalization of the Metropolis algorithm: no symmetric proposal distribution q is required. A candidate z* drawn from q_k(z | z^(τ)) is accepted with probability
A_k(z*, z^(τ)) = min(1, [p̃(z*) q_k(z^(τ) | z*)] / [p̃(z^(τ)) q_k(z* | z^(τ))])
The choice of proposal distribution is critical. If q is symmetric, this reduces to the Metropolis criterion.

37 Metropolis-Hastings algorithm
Common choice: a Gaussian proposal centered on the current state.
- Small variance: high acceptance rate, but a slow random walk through the state space and strongly dependent samples.
- Large variance: high rejection rate.

38 Gibbs sampling
A special case of the Metropolis-Hastings algorithm in which the candidate value is always accepted.
Suppose: target p(z1, z2, z3).
Step 1: choose initial values z1^(0), z2^(0), z3^(0).
Step 2 (repeated): draw
z1^(τ+1) ~ p(z1 | z2^(τ), z3^(τ))
z2^(τ+1) ~ p(z2 | z1^(τ+1), z3^(τ))
z3^(τ+1) ~ p(z3 | z1^(τ+1), z2^(τ+1))
Repeat by cycling through the variables, or randomly choose the variable to be updated at each step. A sketch follows below.
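A minimal R sketch of this scheme (the bivariate Gaussian target with correlation ρ is an assumption, chosen because its full conditionals are known in closed form):

```r
# Gibbs sampling from a bivariate normal with correlation rho:
# the full conditionals are z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically for z2.
set.seed(11)
rho <- 0.8
n <- 10000
z <- matrix(NA_real_, n, 2)
z1 <- 0; z2 <- 0                                  # Step 1: initial values
for (tau in 1:n) {                                # Step 2: cycle through the variables
  z1 <- rnorm(1, rho * z2, sqrt(1 - rho^2))       # z1 ~ p(z1 | z2)
  z2 <- rnorm(1, rho * z1, sqrt(1 - rho^2))       # z2 ~ p(z2 | z1)
  z[tau, ] <- c(z1, z2)
}
cor(z[, 1], z[, 2])                               # should be close to rho
```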

39 Gibbs sampling
Why p(z) is invariant:
- The marginal p(z_\i) of the remaining variables is invariant (unchanged), because z_\i is fixed at each step.
- The univariate conditional p(z_i | z_\i) is invariant by definition, since each step samples from it exactly.
- Hence the joint distribution p(z) is invariant.

40 Gibbs sampling
Sufficient condition for ergodicity: none of the conditional distributions is anywhere zero, i.e. any point in z space can be reached from any other point in a finite number of steps.
(Figure: Gibbs moves alternating along the coordinate axes through states z^(1), z^(2), z^(3).)

41 Gibbs sampling
Obtaining m independent samples:
- Run the chain through a "burn-in" period to remove dependence on the initial values.
- Then sample at set intervals (e.g. keep every Mth sample).
The Gibbs sequence converges to a stationary (equilibrium) distribution that is independent of the starting values, and by construction this stationary distribution is the target distribution we are trying to simulate. (See the usage sketch below.)
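A short usage sketch (the burn-in length and thinning interval M are illustrative; `z` is assumed to be the n × 2 output matrix from the Gibbs sketch above):

```r
# Discard a burn-in period, then keep every Mth sample to reduce serial correlation.
burn_in <- 1000
M <- 10
z_kept <- z[seq(burn_in + 1, nrow(z), by = M), ]   # approximately independent samples
nrow(z_kept)
```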

42 Gibbs sampling
Practicability depends on the feasibility of drawing samples from the conditional distributions p(z_i | z_\i). Models specified by directed graphs often lead to conditional distributions for Gibbs sampling that are log concave, so adaptive rejection sampling methods can be used to sample from them.

