CS498-EA Reasoning in AI Lecture #19 Professor: Eyal Amir Fall Semester 2011
Last Time
1. Random Sampling
   1. Logic sampling
   2. Rejection sampling
   3. Likelihood weighting
   4. Importance sampling
2. Markov Chain Monte Carlo (MCMC)
   1. Gibbs sampling
   2. Metropolis-Hastings
Example
Estimate the probability of heads for an unbiased coin:
– Generate samples x[1],…,x[N] from P(coin) = (0.5, 0.5), like tossing a coin N times
– Finally, estimate by the fraction of samples that came up heads:
  P̂(heads) = (1/N) Σ_m 1{x[m] = heads}
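A minimal sketch of this estimator, assuming Python; the function name and sample count are illustrative, not from the slides:

```python
import random

def estimate_p_heads(num_samples: int) -> float:
    """Estimate P(heads) for a fair coin by direct sampling."""
    heads = sum(random.random() < 0.5 for _ in range(num_samples))
    return heads / num_samples

# By the law of large numbers, the estimate converges to 0.5.
print(estimate_p_heads(100_000))  # e.g. 0.5013
```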
Search Algorithms
Idea: search for high-probability instances
Suppose x[1],…,x[N] are instances with high mass. We can approximate (Bayes rule):
  P(y | e) ≈ Σ_m P(y | x[m], e) P(x[m] | e)
If x[m] is a complete instantiation, then P(y | x[m], e) is either 0 or 1
Search Algorithms (cont)
Instances that do not satisfy e play no role in the approximation
We need to focus the search on instances that do satisfy e
Clearly, in some cases this is hard (NP-hardness result)
Stochastic Sampling
Intuition: given a sufficient number of samples x[1],…,x[N], we can estimate
  P̂(y) = (1/N) Σ_m 1{x[m] = y}
The law of large numbers implies that as N grows, our estimate converges to p
The number of samples we need is potentially exponential in the dimension of the problem
Logic Sampling
Samples are joint assignments to (B, E, A, C, R) in the burglary network: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call, with CPTs
  P(b) = 0.03          P(e) = 0.001
  P(a | b, e) = 0.98   P(a | ¬b, e) = 0.4
  P(a | b, ¬e) = 0.7   P(a | ¬b, ¬e) = 0.01
  P(c | a) = 0.8       P(c | ¬a) = 0.05
  P(r | e) = 0.3       P(r | ¬e) = 0.001
To draw one sample, walk the network in topological order, sampling each variable from its CPT given the values already drawn for its parents (see the sketch below):
1. Sample B from P(b) = 0.03
2. Sample E from P(e) = 0.001
3. Sample A from P(a | B, E), e.g. P(a | ¬b, e) = 0.4
4. Sample C from P(c | A), e.g. P(c | a) = 0.8
5. Sample R from P(r | E), e.g. P(r | e) = 0.3
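A minimal sketch of logic (forward) sampling on this network, assuming Python; the orientation of the 2×2 CPT for A is our reading of the flattened slide table, and the query P(C = true) is an illustrative choice:

```python
import random

# CPTs read off the slide.
P_B = 0.03
P_E = 0.001
P_A = {(True, True): 0.98, (False, True): 0.4,
       (True, False): 0.7, (False, False): 0.01}  # P(a | B, E)
P_C = {True: 0.8, False: 0.05}                    # P(c | A)
P_R = {True: 0.3, False: 0.001}                   # P(r | E)

def bern(p: float) -> bool:
    return random.random() < p

def logic_sample():
    """Draw one joint sample in topological order (forward sampling)."""
    b = bern(P_B)
    e = bern(P_E)
    a = bern(P_A[(b, e)])
    c = bern(P_C[a])
    r = bern(P_R[e])
    return {"B": b, "E": e, "A": a, "C": c, "R": r}

# Estimate P(C = true) as the fraction of samples with C = true.
N = 100_000
samples = [logic_sample() for _ in range(N)]
print(sum(s["C"] for s in samples) / N)
```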
Likelihood Weighting
Same network and CPTs as above, but now some variables are observed as evidence (in the slides' trace, A and R). Evidence nodes are not sampled; instead, each sample carries a weight that accumulates the CPT probability of the observed values:
1. Sample B from P(b) = 0.03
2. Sample E from P(e) = 0.001
3. A is evidence: multiply the weight by P(A = ā | B, E), e.g. 1 − 0.4 = 0.6
4. Sample C from P(c | A), e.g. P(c | ¬a) = 0.05
5. R is evidence: multiply the weight by P(r | E), e.g. 0.3
Final weight for this sample: 0.6 × 0.3 = 0.18
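A matching sketch of likelihood weighting, under the same assumptions; the evidence {A = false, R = true} is our reading of the slide's weight factors (0.6 and 0.3), and the query P(B | A = ā, r) is an illustrative choice:

```python
import random

# Same CPTs as the logic-sampling sketch above.
P_B = 0.03
P_E = 0.001
P_A = {(True, True): 0.98, (False, True): 0.4,
       (True, False): 0.7, (False, False): 0.01}
P_C = {True: 0.8, False: 0.05}
P_R = {True: 0.3, False: 0.001}

def weighted_sample(evidence):
    """One pass in topological order: sample the free nodes, fix the
    evidence nodes, and multiply their CPT entries into the weight."""
    x, w = {}, 1.0
    # (node, probability of True given the values sampled so far)
    order = [("B", lambda x: P_B),
             ("E", lambda x: P_E),
             ("A", lambda x: P_A[(x["B"], x["E"])]),
             ("C", lambda x: P_C[x["A"]]),
             ("R", lambda x: P_R[x["E"]])]
    for node, prob in order:
        p = prob(x)
        if node in evidence:
            x[node] = evidence[node]
            w *= p if evidence[node] else 1 - p
        else:
            x[node] = random.random() < p
    return x, w

# Estimate P(B = true | A = false, R = true) from weighted samples.
evidence = {"A": False, "R": True}
num, den = 0.0, 0.0
for _ in range(100_000):
    x, w = weighted_sample(evidence)
    den += w
    num += w * x["B"]
print(num / den)
```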
Importance Sampling
A method for evaluating the expectation of f under P(X):
  Discrete:   E_P[f] = Σ_x f(x) P(x)
  Continuous: E_P[f] = ∫ f(x) P(x) dx
If we could sample from P, we could estimate E_P[f] ≈ (1/M) Σ_m f(x[m])
Importance Sampling
A general method for evaluating expectations under P(X) when we cannot sample from P(X).
Idea: choose an approximating distribution Q(X) and sample from it instead:
  E_P[f] = Σ_x f(x) P(x) = Σ_x f(x) (P(x)/Q(x)) Q(x) = E_Q[f(X) w(X)]
where w(X) = P(X)/Q(X) is the importance weight.
If we could generate samples from P(X):  E_P[f] ≈ (1/M) Σ_m f(x[m])
Now that we generate the samples from Q(X):  E_P[f] ≈ (1/M) Σ_m f(x[m]) w(x[m])
(Unnormalized) Importance Sampling
1. For m = 1:M
   Sample x[m] from Q(X)
   Calculate w[m] = P(x[m]) / Q(x[m])
2. Estimate the expectation of f(X) using
   Ê[f] = (1/M) Σ_m f(x[m]) w[m]
Requirements (see the sketch below):
– Wherever P(x) > 0, also Q(x) > 0 (don't ignore possible scenarios)
– It is possible to calculate P(x) and Q(x) for a specific x
– It is possible to sample from Q(X)
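A minimal sketch of unnormalized importance sampling, assuming Python; the target P (standard normal), proposal Q (a wider normal), and test function f(x) = x² are illustrative choices, not from the slides:

```python
import math, random

# Target P: standard normal. Proposal Q: N(1, 2^2), wider than P so its
# tails cover the target (Q > 0 wherever P > 0).
def p(x): return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
def q(x): return math.exp(-(x - 1) ** 2 / 8) / (2 * math.sqrt(2 * math.pi))
def sample_q(): return random.gauss(1.0, 2.0)

def f(x): return x * x  # estimate E_P[X^2], which equals 1

M = 200_000
est = sum(f(x) * p(x) / q(x) for x in (sample_q() for _ in range(M))) / M
print(est)  # should be close to 1
```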
Normalized Importance Sampling
Assume that we cannot evaluate P(X = x) but can evaluate P'(X = x) = α P(X = x) for some unknown constant α (for example, in a Bayesian network we can evaluate the joint P(X, e) but not P(X | e); there α = P(e))
We define w'(X) = P'(X)/Q(X). We can then evaluate α:
  E_Q[w'(X)] = Σ_x Q(x) P'(x)/Q(x) = Σ_x P'(x) = α
and then:
  E_P[f] = (1/α) E_Q[f(X) w'(X)] = E_Q[f(X) w'(X)] / E_Q[w'(X)]
In the last step we simply replaced α with the expectation above
Normalized Importance Sampling
We can now estimate the expectation of f(X) similarly to unnormalized importance sampling, by sampling x[m] from Q(X) and then
  Ê[f] = Σ_m f(x[m]) w'(x[m]) / Σ_m w'(x[m])
(hence the name "normalized": the weights are normalized to sum to one, so the unknown constant α cancels)
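A matching sketch of the self-normalized estimator, where only an unnormalized P' is available; the uniform proposal and f(x) = x² are illustrative choices:

```python
import math, random

# Unnormalized target P'(x) = exp(-x^2/2): the standard normal without
# its normalizing constant. Proposal Q: Uniform(-5, 5); the mass of P
# outside [-5, 5] is negligible for this sketch.
def p_unnorm(x): return math.exp(-x * x / 2)
def q(x): return 0.1  # density of Uniform(-5, 5)
def sample_q(): return random.uniform(-5, 5)

def f(x): return x * x  # E_P[X^2] = 1

M = 200_000
xs = [sample_q() for _ in range(M)]
ws = [p_unnorm(x) / q(x) for x in xs]
# Self-normalized estimator: the unknown constant cancels in the ratio.
print(sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws))
```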
Importance Sampling Weaknesses
It is important to choose a sampling distribution with heavy tails
– so as not to "miss" large values of f
High-dimensional importance sampling:
– the "typical set" of P may take a long time to find, unless Q is a good approximation to P
– weights vary by factors exponential in the dimension N
Similar issues arise for likelihood weighting
Today's Agenda
1. The Normal Distribution: pdf, cdf, estimation, sampling
2. Markov Chain Monte Carlo (MCMC)
   1. Gibbs sampling
   2. Metropolis-Hastings
Sampling from a Normal Distribution
[Figure: a Gaussian density curve ("Gaussian Distribution") alongside a histogram of samples drawn from it ("Samples")]
Normal Distribution
The pdf of N(µ, σ²):
  f(x) = (1 / (σ √(2π))) exp(−(x − µ)² / (2σ²))
The cdf has no closed form and is evaluated numerically
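A minimal sketch of sampling from a normal distribution via the Box-Muller transform, assuming Python; this is one standard construction, and the slides do not specify which sampler was shown:

```python
import math, random

def box_muller(mu=0.0, sigma=1.0):
    """Draw one N(mu, sigma^2) sample from two uniforms (Box-Muller)."""
    u1 = 1.0 - random.random()  # in (0, 1], so log(u1) is defined
    u2 = random.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z

samples = [box_muller() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # close to 0 and 1
```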
Stochastic Sampling
Previously: independent samples to estimate P(X = x | e)
Problem: it is difficult to sample from P(X1, …, Xn | e)
We had to use likelihood weighting
– Introduces bias into the estimate
In some cases, such as when the evidence is on leaves, these methods are inefficient
– Very low weights if e has low prior probability
– Very few samples carry high mass (high weight)
MCMC Methods
Sampling methods that are based on a Markov chain
– Markov Chain Monte Carlo (MCMC) methods
Key ideas:
– The sampling process is a Markov chain: the next sample depends on the previous one
– Can approximate any posterior distribution
Next: review the theory of Markov chains
MCMC Methods
Notes:
The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence
For simplicity, we will denote such a state using the vector of variables x1, …, xn
Gibbs Sampler
One of the simplest MCMC methods
Each transition changes the state of only one Xi
The transition probability is defined by P itself, as a stochastic procedure (see the sketch below):
– Input: a state x1, …, xn
– Choose i at random (uniform probability)
– Sample x'i from P(Xi | x1, …, xi−1, xi+1, …, xn, e)
– Let x'j = xj for all j ≠ i
– Return x'1, …, x'n
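A minimal sketch of this Gibbs procedure on a toy two-variable joint distribution, assuming Python; the joint table values are illustrative, not from the slides:

```python
import random

# Toy joint P(X, Y) over two binary variables, as an explicit table.
P = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def sample_conditional(var, state):
    """Resample variable `var` (0 for X, 1 for Y) from its full
    conditional given the other variable's current value."""
    s0, s1 = list(state), list(state)
    s0[var], s1[var] = 0, 1
    p0, p1 = P[tuple(s0)], P[tuple(s1)]
    return 1 if random.random() < p1 / (p0 + p1) else 0

def gibbs(steps):
    state = [0, 0]
    for _ in range(steps):
        i = random.randrange(2)          # choose a variable at random
        state[i] = sample_conditional(i, state)
        yield tuple(state)

# After discarding a burn-in prefix, samples approximate draws from P.
samples = list(gibbs(100_000))[10_000:]
print(sum(1 for s in samples if s == (1, 1)) / len(samples))  # ~0.4
```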
Sampling Strategy
How do we collect the samples?
Strategy I: run the chain M times, each for N steps
– Each run starts from a different initial state
– Return the last state in each of the M chains
Sampling Strategy
Strategy II: run one chain for a long time
After some "burn-in" period, collect a sample every fixed number of steps, yielding M samples from one chain
Comparing Strategies
Strategy I:
– Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity
– Have to perform "burn-in" steps for each chain
Strategy II:
– Perform "burn-in" only once
– Samples might be correlated (although only weakly)
Hybrid strategy (see the sketch below):
– Run several chains, sampling a few times from each
– Combines the benefits of both strategies
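A minimal sketch of the two collection strategies around a generic one-step transition, assuming Python; the toy Metropolis chain and its target weights are illustrative, not from the slides:

```python
import random

# Toy random-walk Metropolis chain targeting an unnormalized
# distribution on the states 0..9.
weights = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]

def step(x):
    y = max(0, min(9, x + random.choice([-1, 1])))  # propose a neighbor
    return y if random.random() < weights[y] / weights[x] else x

def strategy_one(M, N):
    """Strategy I: M independent chains, keep only each chain's last state."""
    out = []
    for _ in range(M):
        x = random.randrange(10)
        for _ in range(N):
            x = step(x)
        out.append(x)
    return out

def strategy_two(M, burn_in=1000, thin=10):
    """Strategy II: one long chain; discard a burn-in prefix, then
    keep every thin-th state until M samples are collected."""
    x, out = random.randrange(10), []
    for _ in range(burn_in):
        x = step(x)
    while len(out) < M:
        for _ in range(thin):
            x = step(x)
        out.append(x)
    return out

print(strategy_one(1000, 500)[:10])
print(strategy_two(1000)[:10])
```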
Summary: Sampling Inference
Monte Carlo methods (sampling; the estimate can err in either direction):
– Pro: simplicity of implementation and theoretical guarantees of convergence
– Con: can be slow to converge, and their convergence can be hard to diagnose
THE END