1 Artificial Intelligence: Representation and Problem Solving
Probabilistic Reasoning (3): Sampling Methods / 681
Instructors: Fei Fang (This Lecture) and Dave Touretzky
Wean Hall 4126
12/8/2018

2 Recap
Probability Models and Probabilistic Inference
Basics of Bayes' Net
Independence
Exact Inference
Real problems: in very large networks, computing conditional probabilities exactly is computationally expensive
Today: sampling methods for approximate inference

3 Recap
Law of large numbers (LLN): the average of the results obtained from a large number of trials should be close to the expected value, and tends to become closer as more trials are performed
Let X_1, X_2, …, X_n be i.i.d. (independent and identically distributed) Lebesgue integrable random variables with expected value E[X_i] = μ for all i. Then the sample average X̄_n = (1/n)(X_1 + … + X_n) converges to the expected value, i.e., X̄_n → μ as n → ∞
Weak law: lim_{n→∞} Pr(|X̄_n − μ| > ε) = 0 for all ε > 0
Strong law: Pr(lim_{n→∞} X̄_n = μ) = 1
To use this: express the quantity we want to know as the expected value of a random variable X, i.e., μ = E[X]. Generate sample values X_1, …, X_n (which are i.i.d. random variables); the sample average converges to μ
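The recipe above can be sketched in a few lines of Python (a minimal illustration; the coin-flip random variable and function name are mine, not from the slides): express the quantity as E[X], draw i.i.d. samples, and take the sample average.

```python
import random

# Monte Carlo estimation via the LLN: the quantity of interest is
# written as mu = E[X]; here X is the indicator of a fair coin landing
# heads, so mu = 0.5. The sample average converges to mu as n grows.
def sample_average(n, seed=0):
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n

for n in (100, 10000, 1000000):
    print(n, sample_average(n))  # estimates tighten around 0.5
```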

4 Recap
Bag 1: two gold coins. Bag 2: two pennies. Bag 3: one of each.
A bag is chosen at random, and one coin from it is selected at random; the coin is gold. What is the probability that the other coin is gold given this observation?
The bag-and-coin setup defines the probability model; the observed gold coin is the evidence
Try 3000 times:
P(other coin is gold | first coin is gold) ≈ #(first coin is gold and other coin is gold) / #(first coin is gold)
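This experiment can be simulated directly; a sketch (function and variable names are illustrative), where the exact answer is 2/3:

```python
import random

# Simulate the three-bag problem: choose a bag uniformly at random,
# draw one of its two coins uniformly, and keep only the trials whose
# first coin is gold. The conditional frequency estimates
# P(other coin is gold | first coin is gold), whose exact value is 2/3.
def estimate_other_gold(trials=3000, seed=1):
    rng = random.Random(seed)
    bags = [["gold", "gold"], ["penny", "penny"], ["gold", "penny"]]
    first_gold = both_gold = 0
    for _ in range(trials):
        bag = list(rng.choice(bags))
        rng.shuffle(bag)                 # pick one of the two coins at random
        if bag[0] == "gold":
            first_gold += 1
            both_gold += (bag[1] == "gold")
    return both_gold / first_gold

print(estimate_other_gold())             # close to 2/3
```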

5 Outline
Approximate Inference
Direct Sampling
Markov Chain Simulation

6 Approximate Inference in BN
For inference in BN, we are often interested in:
Posterior probability of a variable taking on any value given some evidence: P(Y | e)
Most likely explanation given some evidence: argmax_y P(Y = y | e)
Exact inference (enumeration, variable elimination) is intractable in large networks
Approximate inference in BN through sampling: if we can get samples of Y from the posterior distribution P(Y | e), we can use these samples to approximate the posterior distribution and/or the most likely explanation (based on the LLN)

7 Approximate Inference in BN
Approximate inference in BN through sampling:
Assume we have some method for generating samples from a known probability distribution (e.g., uniform on [0,1])
A sample is an assignment of values to each variable in the network
Generally we are only interested in the query variables; queries can be issued after sampling finishes
Use the samples to approximately compute posterior probabilities

8 Example: Wet Grass
Variables: Cloudy (C), Sprinkler (S), Rain (R), Wet Grass (W)
Domain of each variable: true (+) or false (−)
What is the probability distribution of Cloudy if you know the sprinkler has been turned on and it rained, i.e., P(C | +s, +r)?
Samples of Cloudy given +s & +r: +, −, +, +, −, +, +, +, +, −
P(+c | +s, +r) ≈ 0.7, P(−c | +s, +r) ≈ 0.3
Network: Cloudy → {Sprinkler, Rain}; {Sprinkler, Rain} → Wet Grass. CPTs:
P(+c) = 0.5
P(+s | +c) = 0.1, P(+s | −c) = 0.5
P(+r | +c) = 0.8, P(+r | −c) = 0.2
P(+w | +s, +r) = 0.99, P(+w | +s, −r) = 0.90, P(+w | −s, +r) = 0.90, P(+w | −s, −r) = 0.00

9 Sampling for a Single Variable
Want to sample values of a random variable C whose domain is {true, false}, with probability distribution P(C) = ⟨0.5, 0.5⟩
Simple approach:
r = rand([0,1])
If (r < 0.5) Sample = +c (C = true)
Else Sample = −c (C = false)

10 Sampling with Condition
Want to sample values of a random variable S when −c (C = false)
Find the corresponding row in the conditional probability table (CPT): P(S | −c) = ⟨0.5, 0.5⟩
(CPT used here: P(+s | +c) = 0.90, P(−s | +c) = 0.10; P(+s | −c) = 0.5, P(−s | −c) = 0.5)
Simple approach:
r = rand([0,1])
If (r < 0.5) Sample = +s (S = true)
Else Sample = −s (S = false)
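Both sampling steps reduce to thresholding a uniform draw against the appropriate CPT row; a sketch in Python (names are illustrative):

```python
import random

# Sample S conditioned on C: look up the CPT row for the given value
# of C and threshold a uniform draw r ~ Uniform[0, 1).
P_S_given_C = {"+c": 0.90, "-c": 0.5}   # P(+s | C), per this slide's CPT

def sample_S(c_value, rng):
    r = rng.random()
    return "+s" if r < P_S_given_C[c_value] else "-s"

rng = random.Random(0)
samples = [sample_S("-c", rng) for _ in range(10000)]
print(samples.count("+s") / 10000)      # close to 0.5
```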

11 Direct Sampling (Forward Sampling)
Directly generate samples from the prior and conditional distributions specified by the Bayes' Net (i.e., without considering any evidence)
Create a topological ordering based on the DAG of the Bayes' Net: a node can only appear after all of its ancestors in the graph
Sample each variable in turn, conditioned on the values of its parents

12 Example: Wet Grass
Valid topological orderings: C,S,R,W and C,R,S,W
Use the ordering C,S,R,W and sample the variables in turn
(Wet Grass network and CPTs as on slide 8)
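Forward sampling for this network can be sketched as follows (names are mine; the P(+w | −s, −r) = 0 entry is assumed from the standard version of this example, since that cell is garbled in the transcript):

```python
import random

# Forward (prior) sampling: sample each variable in the topological
# ordering C, S, R, W, conditioned on its already-sampled parents.
P_C = 0.5                                          # P(+c)
P_S = {"+c": 0.1, "-c": 0.5}                       # P(+s | C)
P_R = {"+c": 0.8, "-c": 0.2}                       # P(+r | C)
P_W = {("+s", "+r"): 0.99, ("+s", "-r"): 0.90,     # P(+w | S, R)
       ("-s", "+r"): 0.90, ("-s", "-r"): 0.00}

def forward_sample(rng):
    c = "+c" if rng.random() < P_C else "-c"
    s = "+s" if rng.random() < P_S[c] else "-s"
    r = "+r" if rng.random() < P_R[c] else "-r"
    w = "+w" if rng.random() < P_W[(s, r)] else "-w"
    return (c, s, r, w)

rng = random.Random(0)
print(forward_sample(rng))   # one full assignment of (C, S, R, W)
```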

13 Estimate Probability
We can use the samples to estimate the probability of an event X_1 = x_1, …, X_n = x_n:
P(X_1 = x_1, …, X_n = x_n) ≈ N_PS(x_1, …, x_n) / N
where N_PS(x_1, …, x_n) is the number of the N prior samples matching the event. When #samples → ∞, the approximation becomes exact (such an estimate is called consistent)
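The counting estimator itself is one line; a sketch (the hand-made sample list is purely illustrative):

```python
# Estimate P(x1, ..., xn) as N_PS(x1, ..., xn) / N, the fraction of the
# N prior samples that match the event exactly.
def estimate(event, samples):
    return sum(s == event for s in samples) / len(samples)

# A tiny hand-made list standing in for N = 10 direct samples:
samples = [("+c", "-s", "+r", "+w")] * 3 + [("-c", "+s", "-r", "-w")] * 7
print(estimate(("+c", "-s", "+r", "+w"), samples))  # 0.3
```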

14 Example: Wet Grass
What is P(+c, −s, +r, +w)?
Sample 10000 times according to this sampling procedure; if you get (+c, −s, +r, +w) K times, then P(+c, −s, +r, +w) ≈ K / 10000
(Wet Grass network and CPTs as on slide 8)

15 Rejection Sampling
If the query is a conditional (posterior) probability P(Y | e), then we need to “reject” the direct samples that do not match the evidence e:
P(Y | e) ≈ N_PS(Y, e) / N_PS(e)
Inefficient: every rejected sample is wasted work, and the fraction of samples consistent with e shrinks as the number of evidence variables grows
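A sketch of rejection sampling for the query P(C | +s, +r) from slide 8 (names are mine; Wet Grass is omitted because it is downstream of both the query and the evidence, so it cannot affect the answer):

```python
import random

# Rejection sampling: draw direct samples of (C, S, R) and discard
# those that do not match the evidence +s, +r; estimate P(+c | +s, +r)
# from the surviving samples. Note the waste: most samples are rejected.
P_C = 0.5
P_S = {"+c": 0.1, "-c": 0.5}
P_R = {"+c": 0.8, "-c": 0.2}

def estimate_c_given_sr(n, rng):
    kept = []
    for _ in range(n):
        c = "+c" if rng.random() < P_C else "-c"
        s = "+s" if rng.random() < P_S[c] else "-s"
        r = "+r" if rng.random() < P_R[c] else "-r"
        if s == "+s" and r == "+r":   # keep only samples matching evidence
            kept.append(c)
    return kept.count("+c") / len(kept)

rng = random.Random(0)
print(estimate_c_given_sr(200000, rng))   # exact answer is 0.04/0.09 ≈ 0.44
```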

16 Quiz 1
You get 100 direct samples:
73 have −s, of which 12 have +r
27 have +s, of which 8 have +r
What is a reasonable estimate of P(+s | +r)?
A:  B:  C: 8/27  D: None of the above
(Wet Grass network and CPTs as on slide 8)

17 Likelihood Weighting
Generate only samples that agree with the evidence, and weight them according to the likelihood of the evidence
More efficient than rejection sampling
A particular instance of the general statistical technique of importance sampling
Likelihood Weighting:
1. Initialize a weight counter N(y) for the query variables Y to 0
2. Repeat N times: get a sample x = (y, e, ⋅) of all variables that is consistent with the evidence e, together with a weight w; set N(y) ← N(y) + w
3. Normalize the counter N to get the estimate P̂(Y | e)

18 Get a sample together with a weight in likelihood weighting
Select a topological ordering of the variables X_1, …, X_n
Set w = 1
For i = 1..n:
  If X_i is an evidence variable with value e_i: w ← w × P(X_i = e_i | Parents(X_i))
  Else: x_i ← sample from P(X_i | Parents(X_i))
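The loop above, specialized to the Wet Grass network with evidence +s, +w (an illustrative query; names are mine, and the P(+w | −s, −r) = 0 entry is assumed from the standard version of this example):

```python
import random

# Likelihood weighting: evidence variables (S = +s, W = +w) are fixed
# and multiply the weight by their conditional probability; nonevidence
# variables (C, R) are sampled from their CPTs given their parents.
P_C = 0.5
P_S = {"+c": 0.1, "-c": 0.5}
P_R = {"+c": 0.8, "-c": 0.2}
P_W = {("+s", "+r"): 0.99, ("+s", "-r"): 0.90,
       ("-s", "+r"): 0.90, ("-s", "-r"): 0.00}

def weighted_sample(rng):
    w = 1.0
    c = "+c" if rng.random() < P_C else "-c"   # C: nonevidence, sample it
    w *= P_S[c]                                # S: evidence, weight by P(+s | c)
    r = "+r" if rng.random() < P_R[c] else "-r"
    w *= P_W[("+s", r)]                        # W: evidence, weight by P(+w | +s, r)
    return c, w

def estimate_c_given_sw(n, rng):
    totals = {"+c": 0.0, "-c": 0.0}
    for _ in range(n):
        c, w = weighted_sample(rng)
        totals[c] += w
    return totals["+c"] / (totals["+c"] + totals["-c"])
```

With these CPTs, P(+c | +s, +w) works out exactly to 0.0486/0.2781 ≈ 0.17, which the weighted estimate approaches as n grows.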

19 Example: Wet Grass
Use the ordering C,S,R,W. Evidence: +c, +w
Get a sample: C is evidence, so w ← w × P(+c) = 0.5; sample S and R from their CPTs given +c; W is evidence, so w ← w × P(+w | s, r)
(Wet Grass network and CPTs as on slide 8)

20 Consistency of Likelihood Weighting
Likelihood weighting is consistent: as N → ∞, the estimate converges to P(Y | e) (see detailed proof in textbook)
A simple case: there is only one query variable Y. Let X be the set of all random variables, E the evidence variables, and Z = X \ E the set of nonevidence variables, with m = |E| and l = |Z|. Let U = Z \ {Y}.

21 Quiz 2
Using likelihood weighting, if we get 100 samples with +r and total weight 1, and 400 samples with −r and total weight 2, what is an estimate of P(R = +r | +w)?
A: 1/9  B: 1/3  C: 1/5  D: 1/6
(Wet Grass network and CPTs as on slide 8)

22 Markov Chain Simulation
Recap: direct sampling methods (including rejection sampling and likelihood weighting) generate each new sample from scratch
Markov chain Monte Carlo (MCMC): generate a new sample by making a random change to the preceding sample
Recall simulated annealing, which can also be seen as a member of the MCMC family

23 Gibbs Sampling
A particular form of MCMC suited for Bayes' Net
Given a sample x (consistent with evidence e), simulate a new sample x′ by randomly sampling a value for one of the nonevidence variables X_i
Sampling for X_i is conditioned on the current values of the variables in the Markov blanket of X_i (Markov blanket = parents + children + children's parents)
Start from an initial sample and iteratively generate new samples; after generating enough samples, answer the query
How to choose the nonevidence variable?
Option 1: go through all the nonevidence variables in turn, based on an arbitrary order (not necessarily a topological order)
Option 2: choose a nonevidence variable uniformly at random
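A sketch of Gibbs sampling on the Wet Grass network for P(R | +s, +w) (an illustrative query; names are mine, CPT numbers are from the slides with P(+w | −s, −r) = 0 assumed from the standard example). Each step resamples one nonevidence variable given its Markov blanket:

```python
import random

P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": 0.1, "-c": 0.5}   # P(+s | C)
P_R = {"+c": 0.8, "-c": 0.2}   # P(+r | C)
P_W = {("+s", "+r"): 0.99, ("+s", "-r"): 0.90,
       ("-s", "+r"): 0.90, ("-s", "-r"): 0.00}

def p_s(s, c):  # P(S = s | C = c)
    return P_S[c] if s == "+s" else 1 - P_S[c]

def p_r(r, c):  # P(R = r | C = c)
    return P_R[c] if r == "+r" else 1 - P_R[c]

def p_w(w, s, r):  # P(W = w | S = s, R = r)
    return P_W[(s, r)] if w == "+w" else 1 - P_W[(s, r)]

def gibbs_estimate_r(n, rng, s="+s", w="+w"):
    c, r = "+c", "-r"        # arbitrary initial sample consistent with evidence
    count_pr = 0
    for _ in range(n):
        # Resample C given its Markov blanket {S, R}:
        a = P_C["+c"] * p_s(s, "+c") * p_r(r, "+c")
        b = P_C["-c"] * p_s(s, "-c") * p_r(r, "-c")
        c = "+c" if rng.random() < a / (a + b) else "-c"
        # Resample R given its Markov blanket {C, S, W}:
        a = p_r("+r", c) * p_w(w, s, "+r")
        b = p_r("-r", c) * p_w(w, s, "-r")
        r = "+r" if rng.random() < a / (a + b) else "-r"
        count_pr += (r == "+r")
    return count_pr / n      # estimate of P(+r | +s, +w), exactly ≈ 0.32
```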

24 Example: Wet Grass
Evidence: +s, +w. Order of nonevidence variables: C, R. Initial sample: (+c, +s, −r, +w)
Sample C: get −c → new sample (−c, +s, −r, +w)
Sample R: get +r → new sample (−c, +s, +r, +w)
Sample C: get −c → sample (−c, +s, +r, +w)
Sample R: get −r → sample (−c, +s, −r, +w)
…
How to compute P(C | +s, −r) and P(R | −c, +w, +s)? Exact inference!

25 Example: Wet Grass
Compute P(C | +s, −r) through exact inference
(Wet Grass network and CPTs as on slide 8)
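This local exact computation can be checked numerically; a sketch using the slide's CPT numbers (C's Markov blanket here is {S, R}, so W drops out):

```python
# P(C | +s, -r) is proportional to P(C) P(+s | C) P(-r | C):
a = 0.5 * 0.1 * (1 - 0.8)   # +c: 0.5 * 0.1 * 0.2 = 0.01
b = 0.5 * 0.5 * (1 - 0.2)   # -c: 0.5 * 0.5 * 0.8 = 0.20
print(a / (a + b))          # P(+c | +s, -r) = 0.01 / 0.21 ≈ 0.0476
```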

26 Quiz 3
Evidence: +s, +w. Initial sample: (+c, +s, −r, +w). Current sample: (−c, +s, +r, +w)
What is NOT a possible next sample when using Gibbs sampling?
A: (−c, +s, +r, +w)  B: (+c, +s, +r, +w)  C: (−c, +s, −r, +w)  D: (−c, −s, +r, +w)
(Wet Grass network and CPTs as on slide 8)

27 Example: Wet Grass
Following this process, we get 100 samples
Assume you get 31 samples with +r and 69 with −r
What is the estimate of P(R | +s, +w)? The sample frequencies give P̂(R | +s, +w) = ⟨0.31, 0.69⟩

28 Gibbs Sampling
How many samples are generated in total?

29 Why Does Gibbs Sampling Work?
The sampling process can be seen as a Markov chain
State x: an assignment to all variables, which must be consistent with the evidence e
Transition probability P(x′ | x): determined by the ordering used to choose the nonevidence variable and by the conditional probabilities defined by the Bayes' Net
The chain can only transition to “neighboring” states, i.e., states in which at most one variable differs

30 Example: Wet Grass
(State-transition diagram figure; from G. Mori)

31 Why Does Gibbs Sampling Work?
The sampling process can be seen as a Markov chain that settles into a dynamic equilibrium: the long-run fraction of time spent in each state is exactly proportional to its posterior probability (see detailed proof in textbook)
Gibbs sampling is consistent: it converges to the true posterior distribution as #samples → ∞

32 Summary
Approximate Inference in Bayes Net
Direct (Forward) Sampling
Rejection Sampling
Likelihood Weighting
Markov chain simulation
Gibbs Sampling

33 Acknowledgment
Some slides are borrowed from previous slides made by Tai Sing Lee

