Published by August Mathews. Modified over 9 years ago.
1
Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk Undergrad TAs: Sam Johnson, Nikhil Johri CS 440 / ECE 448 Introduction to Artificial Intelligence Spring 2010 Lecture #22
2
Previously Bayesian Networks Exact reasoning –Variable elimination Machine Learning –Naïve Bayes –ID3: Decision-Tree Learning Today: approximate reasoning
3
Today’s Agenda
1. Random Sampling
   1. Logic sampling
   2. Rejection sampling
   3. Likelihood weighting
   4. Importance sampling
2. Will not cover: Markov Chain Monte Carlo (MCMC)
   1. Gibbs sampling
   2. Metropolis-Hastings
4
Random Sampling Sampling: Also called Monte Carlo techniques Sometimes hard to compute the posterior probability exactly Approximate Reasoning: –Approximate by sampling from a distribution Accuracy depends on the number of samples
5
Example
Estimate the probability of head for an unbiased coin
–Generate samples from P(coin) = (0.5, 0.5)
–Like tossing a coin
–Finally: $P(\text{head}) \approx \frac{\#\text{heads}}{\#\text{samples}}$
6
Generating Samples
Need: generate samples from probability P(“head”) = p
How? Find a biased coin …
Use a random-number generator
–Provides a uniform distribution on [0,1]
–When x is generated uniformly from [0,1], answer “head” if x < p (the interval from 0 to p has length p)
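The recipe on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `sample_head` and `estimate_head_probability` are made up for this example, and the seed and sample count are arbitrary choices.

```python
import random

def sample_head(p, rng):
    """Return True ("head") with probability p, using one uniform draw."""
    x = rng.random()  # uniform on [0, 1)
    return x < p

def estimate_head_probability(p, num_samples, rng):
    """Estimate P(head) as (#heads) / (#samples)."""
    heads = sum(sample_head(p, rng) for _ in range(num_samples))
    return heads / num_samples

rng = random.Random(0)  # fixed seed, chosen arbitrarily, for repeatability
estimate = estimate_head_probability(0.5, 100_000, rng)
print(estimate)  # close to 0.5; accuracy improves with more samples
```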
7
Sampling a Bayesian Network
If $P(X_1,\dots,X_n)$ is represented by a Bayesian network, can we efficiently sample from it?
Idea: sample according to the structure of the network
–Write the distribution using the chain rule, $P(X_1,\dots,X_n) = \prod_i P(X_i \mid Pa_i)$, and then sample each variable given its parents
8
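Logic sampling (worked example, step 1)
Network: Burglary (B), Earthquake (E), Alarm (A), Call (C), Radio (R); arcs B → A, E → A, A → C, E → R
CPTs: P(b) = 0.03; P(e) = 0.001; P(a | B, E) takes the values 0.98, 0.4, 0.7, 0.01 over the four settings of (B, E); P(c | a) = 0.8, P(c | ¬a) = 0.05; P(r | e) = 0.3, P(r | ¬e) = 0.001
Samples table columns: B E A C R
Step 1: sample B using P(b) = 0.03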
9
Logic sampling (worked example, step 2)
Step 2: sample E using P(e) = 0.001. Sampled so far: B, E
10
Logic sampling (worked example, step 3)
Step 3: sample A given the sampled values of B and E, using the matching entry of P(a | B, E), here 0.4. Sampled so far: B, E, A
11
Logic sampling (worked example, step 4)
Step 4: sample C given the sampled value of A, using P(c | a) = 0.8. Sampled so far: B, E, A, C
12
Logic sampling (worked example, step 5)
Step 5: sample R given the sampled value of E, using P(r | e) = 0.3. The sample (B, E, A, C, R) is now complete
13
Logic Sampling
Let $X_1, \dots, X_n$ be an order of the variables consistent with arc direction
for i = 1, …, n do
–sample $x_i$ from $P(X_i \mid pa_i)$
–(Note: since $Pa_i \subseteq \{X_1,\dots,X_{i-1}\}$, we have already assigned values to them)
return $x_1, \dots, x_n$
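A minimal Python sketch of this procedure on the Burglary network from the preceding slides. The dictionary layout, the names `NETWORK`, `ORDER` and `logic_sample`, and in particular the assignment of the four P(a | B, E) values to specific settings of (B, E) are assumptions made for illustration; the transcript does not preserve which value belongs to which row.

```python
import random

# variable -> (parents, table mapping parent values to P(variable = True))
NETWORK = {
    "B": ((), {(): 0.03}),
    "E": ((), {(): 0.001}),
    # ASSUMPTION: the row assignment of these four values is a guess.
    "A": (("B", "E"), {(True, True): 0.98, (True, False): 0.7,
                        (False, True): 0.4, (False, False): 0.01}),
    "C": (("A",), {(True,): 0.8, (False,): 0.05}),
    "R": (("E",), {(True,): 0.3, (False,): 0.001}),
}
ORDER = ["B", "E", "A", "C", "R"]  # every parent precedes its children

def logic_sample(network, order, rng):
    """Sample each variable in order, given its already-sampled parents."""
    sample = {}
    for var in order:
        parents, table = network[var]
        p_true = table[tuple(sample[p] for p in parents)]
        sample[var] = rng.random() < p_true
    return sample

rng = random.Random(0)
samples = [logic_sample(NETWORK, ORDER, rng) for _ in range(50_000)]
freq_b = sum(s["B"] for s in samples) / len(samples)
print(freq_b)  # should be close to P(b) = 0.03
```

Each call touches every variable exactly once, which matches the linear running time noted on the next slide.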
14
Logic Sampling Running time of sampling a complete instance –Linear in number of variables –Regardless of structure of the network However, can require a lot of samples to approximate the distribution
15
Can we sample from $P(X_1,\dots,X_n \mid e)$?
If evidence is in the roots of the network, easily
If evidence is in the leaves of the network, we have a problem
–Our sampling method proceeds according to the order of nodes in the graph
Note: we can use arc reversal to make evidence nodes roots.
–In some networks, however, this will create exponentially large tables...
16
Rejection Sampling
Employing evidence
–Idea: remove any sample that does not satisfy the evidence
–Problem: if P(e) is small, most samples are rejected, so we need many samples to get a decent estimate
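A small sketch of rejection sampling, assuming a hypothetical two-node network X → Y with made-up probabilities (`P_X`, `P_Y_GIVEN_X`) and evidence Y = true. The acceptance rate it reports is an estimate of P(e), which shows why a small P(e) wastes most of the samples.

```python
import random

# Hypothetical network X -> Y; these numbers are not from the slides.
P_X = 0.3                                # P(X = True)
P_Y_GIVEN_X = {True: 0.9, False: 0.2}    # P(Y = True | X)

def sample_joint(rng):
    """Draw (x, y) from the joint by sampling X, then Y given X."""
    x = rng.random() < P_X
    y = rng.random() < P_Y_GIVEN_X[x]
    return x, y

def rejection_sample(num_samples, rng):
    """Keep the X value of every sample that satisfies the evidence Y = True."""
    accepted = []
    for _ in range(num_samples):
        x, y = sample_joint(rng)
        if y:  # samples that contradict the evidence are discarded
            accepted.append(x)
    return accepted

rng = random.Random(1)
num_samples = 100_000
accepted = rejection_sample(num_samples, rng)
estimate = sum(accepted) / len(accepted)      # approximates P(X = True | Y = True)
acceptance_rate = len(accepted) / num_samples  # approximates P(Y = True)
print(estimate, acceptance_rate)
```

For these numbers the exact answers are P(X = true | Y = true) = 0.27 / 0.41 and P(Y = true) = 0.41.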
17
Likelihood Weighting
Can we ensure that all of our samples satisfy e?
One simple solution:
–When we need to sample a variable that is assigned a value by e, use the specified value
For example: in the network X → Y, we know Y = 1
–Sample X from P(X)
–Then take Y = 1
Is this a sample from P(X, Y | Y = 1)?
18
Likelihood Weighting
Problem: these samples of X are drawn from P(X), not from P(X | Y = 1)
Solution:
–Penalize samples in which P(Y = 1 | X) is small
We now sample as follows:
–Let x[i] be a sample from P(X)
–Let w[i] = P(Y = 1 | X = x[i])
By Bayes rule, $P(X \mid Y = 1) \propto P(Y = 1 \mid X)\,P(X)$ (network: X → Y)
19
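Likelihood Weighting
Why does this make sense?
When N is large, we expect to sample $N\,P(X = x)$ samples with x[i] = x
Thus,
$$\sum_{i:\,x[i]=x} w[i] \approx N\,P(X = x)\,P(Y = 1 \mid X = x) = N\,P(X = x, Y = 1)$$
When we normalize, we get an approximation of the conditional probability:
$$\frac{\sum_{i:\,x[i]=x} w[i]}{\sum_{i} w[i]} \approx \frac{P(X = x, Y = 1)}{P(Y = 1)} = P(X = x \mid Y = 1)$$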
20
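Likelihood Weighting (worked example, step 1)
Same Burglary network and CPTs as in the logic sampling example: P(b) = 0.03; P(e) = 0.001; P(a | B, E) takes the values 0.98, 0.4, 0.7, 0.01; P(c | a) = 0.8, P(c | ¬a) = 0.05; P(r | e) = 0.3, P(r | ¬e) = 0.001
Evidence: A = ¬a, R = r
Step 1: sample B using P(b) = 0.03. Weight = 1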
21
Likelihood Weighting (worked example, step 2)
Step 2: sample E using P(e) = 0.001. Weight = 1
22
Likelihood Weighting (worked example, step 3)
Step 3: A is observed (¬a), so it is not sampled; the weight is multiplied by P(¬a | B, E) = 1 − 0.4 = 0.6. Weight = 0.6
23
Likelihood Weighting (worked example, step 4)
Step 4: sample C given ¬a, using P(c | ¬a) = 0.05. Weight = 0.6
24
Likelihood Weighting (worked example, step 5)
Step 5: R is observed (r), so it is not sampled; the weight is multiplied by P(r | e) = 0.3. Weight = 0.6 × 0.3 = 0.18
25
Likelihood Weighting: Generation of Samples
Let $X_1, \dots, X_n$ be an order of the variables consistent with arc direction
w = 1
for i = 1, …, n do
–if $X_i = x_i$ has been observed
  $w \leftarrow w \cdot P(X_i = x_i \mid pa_i)$
–else
  sample $x_i$ from $P(X_i \mid pa_i)$
return $x_1, \dots, x_n$, and w
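The sample-generation loop above might look as follows in Python, again on the Burglary network. As before, the data layout, the function names (`prob`, `weighted_sample`, `likelihood_weighting`) and the row assignment of the P(a | B, E) values are assumptions for illustration, not something the slides specify.

```python
import random

# variable -> (parents, table mapping parent values to P(variable = True))
NETWORK = {
    "B": ((), {(): 0.03}),
    "E": ((), {(): 0.001}),
    # ASSUMPTION: the row assignment of these four values is a guess.
    "A": (("B", "E"), {(True, True): 0.98, (True, False): 0.7,
                        (False, True): 0.4, (False, False): 0.01}),
    "C": (("A",), {(True,): 0.8, (False,): 0.05}),
    "R": (("E",), {(True,): 0.3, (False,): 0.001}),
}
ORDER = ["B", "E", "A", "C", "R"]  # every parent precedes its children

def prob(network, var, value, sample):
    """P(var = value | parents of var as assigned in sample)."""
    parents, table = network[var]
    p_true = table[tuple(sample[p] for p in parents)]
    return p_true if value else 1.0 - p_true

def weighted_sample(network, order, evidence, rng):
    """Return one sample consistent with the evidence, and its weight w."""
    sample, weight = {}, 1.0
    for var in order:
        if var in evidence:
            sample[var] = evidence[var]  # observed: fix the value ...
            weight *= prob(network, var, evidence[var], sample)  # ... and reweight
        else:
            sample[var] = rng.random() < prob(network, var, True, sample)
    return sample, weight

def likelihood_weighting(network, order, query, evidence, num_samples, rng):
    """Normalized weighted estimate of P(query = True | evidence)."""
    total = hits = 0.0
    for _ in range(num_samples):
        sample, weight = weighted_sample(network, order, evidence, rng)
        total += weight
        if sample[query]:
            hits += weight
    return hits / total

print(likelihood_weighting(NETWORK, ORDER, "C", {"B": True}, 20_000, random.Random(0)))
```

With evidence only at a root, as in the usage line, every weight is the same and the estimate reduces to a plain frequency; weights start to vary once evidence sits below sampled variables.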
26
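Sampling Algorithms with Evidence: Another View
Idea: search for high probability instances
Suppose x[1], …, x[N] are instances with high mass
We can approximate (Bayes rule):
$$P(Y = y \mid e) \approx \frac{\sum_i P(Y = y, e \mid x[i])\,P(x[i])}{\sum_i P(e \mid x[i])\,P(x[i])}$$
If x[i] is a complete instantiation, then P(e | x[i]) is
–0 or 1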
27
Search Algorithms (cont)
Instances that do not satisfy e do not play a role in the approximation
We need to focus the search to find instances that do satisfy e
Clearly, in some cases this is hard (NP-hardness result)
28
Stochastic Simulation
Suppose we can sample instances according to $P(X_1,\dots,X_n)$
What is the probability that a random sample satisfies e?
–This is exactly P(e)
We can view each sample as tossing a biased coin with probability P(e) of “Heads”
29
Stochastic Sampling
Intuition: given a sufficient number of samples x[1],…,x[N], we can estimate
$$P(e) \approx \frac{\#\{i : x[i] \text{ satisfies } e\}}{N}$$
The law of large numbers implies that as N grows, our estimate will converge to p = P(e)
The number of samples that we need is potentially exponential in the dimension of P.
30
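Importance Sampling
A method for evaluating the expectation of f under P(X), $E_P[f(X)]$
Discrete: $E_P[f(X)] = \sum_x f(x)\,P(x)$
Continuous: $E_P[f(X)] = \int f(x)\,P(x)\,dx$
If we could sample from P: $E_P[f(X)] \approx \frac{1}{M}\sum_{m=1}^{M} f(x[m])$, with x[1], …, x[M] drawn from P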
31
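Importance Sampling
A general method for evaluating $E_P[f(X)]$ when we cannot sample from P(X).
Idea: choose an approximating distribution Q(X) and sample from it
$$E_P[f(X)] = \sum_x f(x)\,P(x) = \sum_x f(x)\,\frac{P(x)}{Q(x)}\,Q(x) = E_Q[f(X)\,W(X)], \qquad W(X) = \frac{P(X)}{Q(X)}$$
Using this we can now sample from Q and then weight each sample by W(X)
If we could generate samples from P(X): $E_P[f(X)] \approx \frac{1}{M}\sum_{m=1}^{M} f(x[m])$
Now that we generate the samples from Q(X): $E_P[f(X)] \approx \frac{1}{M}\sum_{m=1}^{M} f(x[m])\,W(x[m])$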
32
(Unnormalized) Importance Sampling
1. For m = 1:M
   Sample x[m] from Q(X)
   Calculate w[m] = P(x[m]) / Q(x[m])
2. Estimate the expectation of f(X) using $E_P[f(X)] \approx \frac{1}{M}\sum_{m=1}^{M} f(x[m])\,w[m]$
Requirements:
P(x) > 0 ⇒ Q(x) > 0 (don’t ignore possible scenarios)
Possible to calculate P(x), Q(x) for a specific X = x
It is possible to sample from Q(X)
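A short sketch of the unnormalized estimator on a toy discrete problem. The target `P`, the uniform proposal `Q`, the function f(x) = x and the helper names are all hypothetical choices for this example; Q is positive wherever P is, as the requirements demand.

```python
import random

# Hypothetical target P and proposal Q over the values 0..3.
P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
Q = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

def sample_from(dist, rng):
    """Draw one value from a discrete distribution given as {value: prob}."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]

def importance_estimate(f, p, q, num_samples, rng):
    """(1/M) * sum of f(x[m]) * w[m], with x[m] ~ Q and w[m] = P(x[m]) / Q(x[m])."""
    total = 0.0
    for _ in range(num_samples):
        x = sample_from(q, rng)
        total += f(x) * p[x] / q[x]
    return total / num_samples

rng = random.Random(0)
estimate = importance_estimate(lambda x: x, P, Q, 50_000, rng)
print(estimate)  # the exact value of E_P[X] is 2.0
```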
33
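Normalized Importance Sampling
Assume that we cannot evaluate P(X = x) but can evaluate P’(X = x) = α P(X = x) (for example, in a Bayesian network we can evaluate P(X, e) = P(e) P(X | e) but not P(X | e))
We define w’(X) = P’(X)/Q(X). We can then evaluate α:
$$E_Q[w'(X)] = \sum_x Q(x)\,\frac{P'(x)}{Q(x)} = \sum_x P'(x) = \alpha$$
and then:
$$E_P[f(X)] = \sum_x P(x)\,f(x) = \frac{1}{\alpha}\sum_x Q(x)\,f(x)\,\frac{P'(x)}{Q(x)} = \frac{E_Q[f(X)\,w'(X)]}{\alpha} = \frac{E_Q[f(X)\,w'(X)]}{E_Q[w'(X)]}$$
In the last step we simply replace α with the above equation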
34
Normalized Importance Sampling
We can now estimate the expectation of f(X) similarly to unnormalized importance sampling, by sampling x[m] from Q(X) and then
$$E_P[f(X)] \approx \frac{\sum_{m=1}^{M} f(x[m])\,w'(x[m])}{\sum_{m=1}^{M} w'(x[m])}$$
(hence the name “normalized”)
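The normalized estimator can be sketched the same way. Here `P_UNNORMALIZED` is a hypothetical unnormalized target with normalizing constant 10, `Q` is a uniform proposal, and the function name is made up; the sketch also returns the average weight, which estimates the constant.

```python
import random

# Hypothetical unnormalized target: P'(x) = 10 * P(x), with P = (0.1, 0.2, 0.3, 0.4).
P_UNNORMALIZED = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
Q = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

def normalized_importance_estimate(f, p_unnorm, q, num_samples, rng):
    """Return (sum f*w' / sum w', mean of w'), with x[m] ~ Q and w' = P'/Q."""
    values = list(q)
    weights = [q[v] for v in values]
    weighted_sum = weight_sum = 0.0
    for _ in range(num_samples):
        x = rng.choices(values, weights=weights, k=1)[0]
        w = p_unnorm[x] / q[x]
        weighted_sum += f(x) * w
        weight_sum += w
    return weighted_sum / weight_sum, weight_sum / num_samples

rng = random.Random(0)
estimate, alpha_hat = normalized_importance_estimate(lambda x: x, P_UNNORMALIZED, Q, 50_000, rng)
print(estimate, alpha_hat)  # exact values: E_P[X] = 2.0 and alpha = 10
```

Dividing by the sum of weights rather than by M is what removes the need to know the normalizing constant.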
35
Importance Sampling Weaknesses
Important to choose a sampling distribution with heavy tails
–So as not to “miss” large values of f
Many-dimensional importance sampling:
–The “typical set” of P may take a long time to find, unless Q is a good approximation to P
–Weights vary by factors exponential in N
Similar for Likelihood Weighting
36
Next Class
1. Approximation guarantees and hardness
2. Monte Carlo techniques
   1. Rejection sampling
   2. Likelihood weighting
   3. Importance sampling
3. Markov Chain Monte Carlo (MCMC)
   1. Gibbs sampling
   2. Metropolis-Hastings
37
Stochastic Sampling
Previously: independent samples to estimate P(X = x | e)
Problem: it is difficult to sample from $P(X_1, \dots, X_n \mid e)$
We had to use likelihood weighting
–Introduces bias in estimation
In some cases, such as when the evidence is on leaves, these methods are inefficient
–Very low weights if e has low prior probability
–Very few samples have high mass (high weight)
38
MCMC Methods Sampling methods that are based on Markov Chain –Markov Chain Monte Carlo (MCMC) methods Key ideas: –Sampling process as a Markov Chain Next sample depends on the previous one –Approximate any posterior distribution Next: review theory of Markov chains
39
MCMC Methods Notes: The Markov chain variable Y takes as value assignments to all variables that are consistent with evidence For simplicity, we will denote such a state using the vector of variables
40
Gibbs Sampler
One of the simplest MCMC methods
Each transition changes the state of one $X_i$
The transition probability is defined by P itself, as a stochastic procedure:
–Input: a state $x_1,\dots,x_n$
–Choose i at random (uniform probability)
–Sample $x'_i$ from $P(X_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n, e)$
–let $x'_j = x_j$ for all $j \neq i$
–return $x'_1,\dots,x'_n$
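A minimal sketch of this transition, assuming a hypothetical chain network X → Y → Z with made-up probabilities and evidence on Z. The conditional for the chosen variable is obtained by normalizing the joint over its two values, which is adequate for a toy model; the names `joint`, `gibbs_step` and `gibbs_estimate` are illustrative only.

```python
import random

def joint(x, y, z):
    """P(X=x, Y=y, Z=z) for a hypothetical chain X -> Y -> Z (made-up numbers)."""
    px = 0.3 if x else 0.7
    py = (0.9 if x else 0.2) if y else (0.1 if x else 0.8)
    pz = (0.8 if y else 0.1) if z else (0.2 if y else 0.9)
    return px * py * pz

def gibbs_step(state, z, rng):
    """Pick one unobserved variable uniformly and resample it given all the rest."""
    x, y = state
    if rng.randrange(2) == 0:
        p_true = joint(True, y, z) / (joint(True, y, z) + joint(False, y, z))
        x = rng.random() < p_true
    else:
        p_true = joint(x, True, z) / (joint(x, True, z) + joint(x, False, z))
        y = rng.random() < p_true
    return (x, y)

def gibbs_estimate(num_steps, burn_in, z, rng):
    """Estimate P(X = True | Z = z) from the states visited after burn-in."""
    state = (False, False)  # arbitrary starting state
    hits = kept = 0
    for step in range(num_steps):
        state = gibbs_step(state, z, rng)
        if step >= burn_in:
            kept += 1
            hits += state[0]
    return hits / kept

estimate = gibbs_estimate(60_000, 1_000, True, random.Random(0))
print(estimate)  # exact value for these numbers: 0.219 / 0.387
```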
41
Sampling Strategy
How do we collect the samples?
Strategy I: Run the chain M times, each for N steps
–each run starts from a different starting point
Return the last state in each run, giving M samples from M chains
42
Sampling Strategy
Strategy II: Run one chain for a long time
After some “burn in” period, sample points at a fixed interval of steps, giving M samples from one chain
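The two collection strategies can be compared on any Markov chain. The sketch below uses a hypothetical two-state chain whose stationary probability of state 1 is 0.4; the transition probabilities, function names, burn-in length and thinning interval are arbitrary choices for illustration.

```python
import random

def step(state, rng):
    """Hypothetical two-state chain: 0 -> 1 with prob. 0.2, 1 -> 0 with prob. 0.3."""
    if state == 0:
        return 1 if rng.random() < 0.2 else 0
    return 0 if rng.random() < 0.3 else 1

def many_short_chains(num_chains, num_steps, rng):
    """Strategy I: M independent runs of N steps, keeping only each last state."""
    samples = []
    for _ in range(num_chains):
        state = rng.randrange(2)  # each run gets its own starting point
        for _ in range(num_steps):
            state = step(state, rng)
        samples.append(state)
    return samples

def one_long_chain(num_samples, burn_in, thin, rng):
    """Strategy II: one run; discard burn-in, then keep every thin-th state."""
    state = 0
    for _ in range(burn_in):
        state = step(state, rng)
    samples = []
    for _ in range(num_samples):
        for _ in range(thin):
            state = step(state, rng)
        samples.append(state)
    return samples

rng = random.Random(0)
s1 = many_short_chains(5_000, 50, rng)
s2 = one_long_chain(5_000, 100, 10, rng)
print(sum(s1) / len(s1), sum(s2) / len(s2))  # both should be near 0.4
```

Strategy I spends N steps per sample, while Strategy II pays the burn-in cost once and then only the thinning interval per sample.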