Artificial Intelligence: Representation and Problem Solving
15-381 / 681
Probabilistic Reasoning (3): Sampling Methods
Instructors: Fei Fang (This Lecture) and Dave Touretzky
feifang@cmu.edu, Wean Hall 4126

Recap
Probability models and probabilistic inference
Basics of Bayes' nets
Independence
Exact inference
Real problems involve very large networks, where computing conditional probabilities exactly is computationally expensive.
Today: sampling methods for approximate inference

Recap
Law of large numbers (LLN): the average of the results obtained from a large number of trials should be close to the expected value, and it tends to become closer as more trials are performed.
Let $X_1, X_2, \ldots, X_n$ be i.i.d. (independent and identically distributed) Lebesgue-integrable random variables with expected value $E[X_i] = \mu$ for all $i$. Then the sample average $\bar{X}_n = \frac{1}{n}(X_1 + \ldots + X_n)$ converges to the expected value, i.e., $\bar{X}_n \to \mu$ as $n \to \infty$.
Weak law: $\lim_{n \to \infty} \Pr(|\bar{X}_n - \mu| > \epsilon) = 0$ for all $\epsilon > 0$
Strong law: $\Pr(\lim_{n \to \infty} \bar{X}_n = \mu) = 1$
To apply this: express the quantity we want to know as the expected value of a random variable $X$, i.e., $\mu = E[X]$. Generate i.i.d. sample values $X_1, \ldots, X_n$; the sample average then converges to $\mu$.
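As a minimal illustration of using the LLN (this snippet is not from the slides; the fair-die example is our own assumption for demonstration), we can estimate $\mu = E[X]$ for a fair six-sided die (true value 3.5) by averaging i.i.d. samples:

    import random

    # Estimate E[X] for a fair die by a sample average; by the LLN the
    # estimate approaches 3.5 as the number of trials n grows.
    for n in [100, 10_000, 1_000_000]:
        samples = [random.randint(1, 6) for _ in range(n)]
        print(n, sum(samples) / n)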

Recap
Bag 1: two gold coins. Bag 2: two pennies. Bag 3: one of each. (This defines the probability model.)
A bag is chosen at random, and one coin from it is selected at random; the coin is gold. (This is the evidence.)
What is the probability that the other coin is gold, given the observation?
Try 3000 times:
P(other coin is gold | first coin is gold) ≈ #(first coin is gold and other coin is gold) / #(first coin is gold)
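A sketch of the 3000-trial simulation described above (not the slide's own code):

    import random

    # Bags: (gold, gold), (penny, penny), (gold, penny).
    bags = [("gold", "gold"), ("penny", "penny"), ("gold", "penny")]

    first_gold = 0   # trials where the drawn coin is gold
    both_gold = 0    # trials where the remaining coin is also gold
    for _ in range(3000):
        bag = list(random.choice(bags))
        random.shuffle(bag)
        drawn, other = bag
        if drawn == "gold":
            first_gold += 1
            if other == "gold":
                both_gold += 1

    print(both_gold / first_gold)   # should be close to the true value 2/3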

Outline
Approximate Inference
Direct Sampling
Markov Chain Simulation

Approximate Inference in BN
For inference in a Bayes' net, we are often interested in:
The posterior probability of a variable taking on any value given some evidence: P(Y | e)
The most likely explanation given some evidence: argmax_y P(Y = y | e)
Exact inference (enumeration, variable elimination) is intractable in large networks.
Approximate inference in BN through sampling: if we can get samples of Y from the posterior distribution P(Y | e), we can use these samples to approximate the posterior distribution and/or the most likely explanation (based on the LLN).

Approximate Inference in BN
Approximate inference in BN through sampling:
Assume we have some method for generating samples given a known probability distribution (e.g., uniform in [0,1])
A sample is an assignment of values to each variable in the network
Use the samples to approximately compute posterior probabilities; generally we are only interested in the query variables, and queries can be issued after sampling finishes

Example: Wet Grass
Variables: Cloudy (C), Sprinkler (S), Rain (R), Wet Grass (W)
Domain of each variable: true (+) or false (−)
What is the probability distribution of Cloudy if you know the sprinkler has been turned on and it rained, i.e., P(C | +s, +r)?
Samples of Cloudy given +s and +r: +, −, +, +, −, +, +, +, +, −
P(+c | +s, +r) ≈ 0.7
P(−c | +s, +r) ≈ 0.3
[Figure: the sprinkler Bayes' net. Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of Wet Grass. CPT entries recoverable from the transcript: P(+c) = 0.5; P(+s | +c) = 0.1, P(+s | −c) = 0.5; P(+r | +c) = 0.8, with the −c row truncated (0.2 in the standard version of this example); P(+w | +s, +r) = 0.99, P(+w | +s, −r) = 0.90, P(−w | −s, −r) = 1.0.]

Sampling for a Single Variable
Want to sample values of a random variable C whose domain is {true, false}, with probability distribution P(C) = ⟨0.5, 0.5⟩, i.e., P(+c) = 0.5, P(−c) = 0.5
Simple approach:
r = rand([0,1])
If (r < 0.5): Sample = +c (C = true)
Else: Sample = −c (C = false)
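The same idea in runnable form (a minimal sketch; the function name is our own):

    import random

    # Sample C = true with probability p_true using one uniform draw,
    # mirroring the r < 0.5 test on the slide.
    def sample_cloudy(p_true=0.5):
        r = random.random()   # r ~ Uniform[0, 1)
        return r < p_true     # True means +c, False means -c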

Sampling with a Condition
Want to sample values of a random variable S when −c (C = false)
Find the corresponding row in the conditional probability table (CPT): P(S | −c) = ⟨0.5, 0.5⟩
Simple approach:
r = rand([0,1])
If (r < 0.5): Sample = +s (S = true)
Else: Sample = −s (S = false)
[CPT P(S | C); the relevant row: P(+s | −c) = 0.5, P(−s | −c) = 0.5]

Direct Sampling (Forward Sampling)
Directly generate samples from the prior and conditional distributions specified by the Bayes' net (i.e., without considering any evidence):
Create a topological ordering based on the DAG of the Bayes' net (a node can only appear after all of its ancestors in the graph)
Sample each variable in turn, conditioned on the values of its parents

Example: Wet Grass
Valid topological orderings: C, S, R, W and C, R, S, W
Use the ordering C, S, R, W and sample the variables in turn (see the sketch below)
[Bayes' net figure with CPTs as in the earlier Wet Grass slide]
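A sketch of forward sampling on this network with the ordering C, S, R, W. CPT values follow the figure; the entries the transcript truncates, P(+r | −c) and P(+w | −s, +r), are filled in with the standard values 0.2 and 0.90 as an assumption:

    import random

    def forward_sample():
        # Sample each variable in topological order, conditioned on its parents.
        c = random.random() < 0.5                    # P(+c) = 0.5
        s = random.random() < (0.1 if c else 0.5)    # P(+s | C)
        r = random.random() < (0.8 if c else 0.2)    # P(+r | C); -c row assumed
        if s and r:
            p_w = 0.99
        elif s and not r:
            p_w = 0.90
        elif not s and r:
            p_w = 0.90                               # assumed standard value
        else:
            p_w = 0.0                                # so P(-w | -s, -r) = 1.0
        w = random.random() < p_w
        return c, s, r, w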

Estimate Probability
We can use the samples to estimate the probability of an event $X_1 = x_1, \ldots, X_n = x_n$:
$P(X_1 = x_1, \ldots, X_n = x_n) \approx \frac{N_{PS}(x_1, \ldots, x_n)}{N}$
where $N$ is the total number of samples and $N_{PS}(x_1, \ldots, x_n)$ counts the samples in which the event occurs. As the number of samples $\to \infty$, the approximation becomes exact (such an estimate is called consistent).

Example: Wet Grass
What is P(+c, −s, +r, +w)?
Sample 10000 times according to this sampling procedure; if you get (+c, −s, +r, +w) K times, then P(+c, −s, +r, +w) ≈ K / 10000
[Bayes' net figure with CPTs as in the earlier Wet Grass slide]
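In code, reusing forward_sample from the sketch above:

    # Estimate P(+c, -s, +r, +w) by counting matching samples.
    N = 10_000
    K = sum(forward_sample() == (True, False, True, True) for _ in range(N))
    print(K / N)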

Rejection Sampling
If the query is a conditional (posterior) probability P(Y | e), then we need to "reject" the direct samples that do not match the evidence:
$\mathbf{P}(\mathbf{Y} \mid \mathbf{e}) \approx \frac{N_{PS}(\mathbf{Y}, \mathbf{e})}{N_{PS}(\mathbf{e})}$
Inefficient: many samples are wasted when the evidence is unlikely
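A sketch of rejection sampling for the earlier query P(C | +s, +r), again reusing forward_sample:

    from collections import Counter

    # Keep only samples consistent with the evidence +s, +r,
    # then normalize the counts of the query variable C.
    counts = Counter()
    for _ in range(100_000):
        c, s, r, w = forward_sample()
        if s and r:              # reject samples that miss the evidence
            counts[c] += 1
    total = sum(counts.values())
    print({value: n / total for value, n in counts.items()})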

Quiz 1
You get 100 direct samples:
73 have −s, of which 12 have +r
27 have +s, of which 8 have +r
What is a reasonable estimate of P(+s | +r)? (A worked check follows below.)
A: 20/100
B: 12/73
C: 8/27
D: None of the above
[Bayes' net figure with CPTs as in the earlier Wet Grass slide]
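A hedged worked check (the transcript does not spell out the answer): rejection sampling conditions on the evidence +r, so
$P(+s \mid +r) \approx \frac{\#(+s, +r)}{\#(+r)} = \frac{8}{12 + 8} = 0.4$,
which matches none of the listed fractions, i.e., option D.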

Likelihood Weighting
Generate only samples that agree with the evidence and weight them according to the likelihood of the evidence
More efficient than rejection sampling
A particular instance of the general statistical technique of importance sampling
Likelihood Weighting:
1. Initialize a counter N for the query variables Y to 0
2. Run N iterations. In each iteration, get a sample x = (y, e, ⋅) of all variables that is consistent with the evidence e, together with a weight w. Set N[y] ← N[y] + w.
3. Normalize the counter N to get the estimate P̂(Y | e)

Get a sample together with a weight in likelihood weighting:
Select a topological ordering of the variables X_1, ..., X_n
Set w = 1
For i = 1..n:
  If X_i is an evidence variable with value e_i: w ← w × P(X_i = e_i | Parents(X_i))
  Else: x_i ← sample from P(X_i | Parents(X_i))

Example: Wet Grass
Use the ordering C, S, R, W. Evidence: +c, +w.
Get a sample (see the sketch below):
[Bayes' net figure with CPTs as in the earlier Wet Grass slide]
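A sketch of generating one weighted sample for evidence +c, +w with the ordering C, S, R, W (CPT values as in the forward-sampling sketch, including the assumed entries):

    import random

    def weighted_sample():
        w = 1.0
        c = True                     # evidence C = +c: fix it, don't sample
        w *= 0.5                     # multiply by P(+c)
        s = random.random() < 0.1    # sample S from P(S | +c)
        r = random.random() < 0.8    # sample R from P(R | +c)
        if s and r:
            p_w = 0.99
        elif s:
            p_w = 0.90
        elif r:
            p_w = 0.90               # assumed standard value
        else:
            p_w = 0.0
        w *= p_w                     # evidence W = +w: multiply by P(+w | s, r)
        return (c, s, r, True), w

Accumulating w into a counter keyed by the sampled value of the query variable and normalizing gives the estimate, as in the algorithm above.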

Consistency of Likelihood Weighting
Likelihood weighting is consistent: as N → ∞, the estimate converges to P(Y | e) (see the detailed proof in the textbook)
A simple case: there is only one query variable Y. Let X be the set of all random variables, Z = X \ E the set of non-evidence variables, and U = Z \ {Y} the non-evidence, non-query variables; let m = |E| and l = |Z|.

Quiz 2
Using likelihood weighting, if we get 100 samples with +r and total weight 1, and 400 samples with −r and total weight 2, what is an estimate of P(R = +r | +w)? (A worked check follows below.)
A: 1/9
B: 1/3
C: 1/5
D: 1/6
[Bayes' net figure with CPTs as in the earlier Wet Grass slide]
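A hedged worked check (the transcript omits the answer): likelihood weighting normalizes total weights, not raw sample counts, so
$\hat{P}(+r \mid +w) = \frac{w(+r)}{w(+r) + w(-r)} = \frac{1}{1 + 2} = \frac{1}{3}$,
i.e., option B.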

Markov Chain Simulation
Recap: direct sampling methods (including rejection sampling and likelihood weighting) generate each new sample from scratch
Markov chain Monte Carlo (MCMC): generate a new sample by making a random change to the preceding sample
Recall simulated annealing (it can also be seen as a member of the MCMC family)

Gibbs Sampling
A particular form of MCMC suited to Bayes' nets:
Given a sample x (consistent with evidence e), simulate a new sample x′ by randomly sampling a value for one of the non-evidence variables X_i
Sampling for X_i is conditioned on the current values of the variables in the Markov blanket of X_i (Markov blanket = parents + children + children's parents)
Start from an initial sample and iteratively generate new samples; after generating enough samples, answer the query
How to choose the non-evidence variable?
Option 1: Go through all the non-evidence variables in turn, based on an arbitrary fixed order (not necessarily a topological order)
Option 2: Choose a non-evidence variable uniformly at random

Example: Wet Grass
Evidence: +s, +w. Order of non-evidence variables: C, R. Initial sample: (+c, +s, −r, +w)
Sample C from P(C | +s, −r): get −c, giving the new sample (−c, +s, −r, +w)
Sample R from P(R | −c, +w, +s): get +r, giving the new sample (−c, +s, +r, +w)
Get −c and sample (−c, +s, +r, +w)
Get −r and sample (−c, +s, −r, +w)
...
How do we compute P(C | +s, −r) and P(R | −c, +w, +s)? Exact inference! (See the sketch below.)
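A sketch of this Gibbs sampler; the Markov-blanket conditionals are computed by exact inference, as the slide notes, and the truncated CPT entries again take the assumed standard values:

    import random

    P_S = {True: 0.1, False: 0.5}    # P(+s | C)
    P_R = {True: 0.8, False: 0.2}    # P(+r | C); -c row assumed
    P_W = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.0}   # P(+w | S, R)

    def gibbs(n_samples):
        c, s, r, w = True, True, False, True   # initial sample (+c, +s, -r, +w)
        samples = []
        for _ in range(n_samples):
            # Resample C given its Markov blanket {S, R}:
            # P(c | s, r) is proportional to P(c) P(s | c) P(r | c)
            def score_c(cv):
                ps = P_S[cv] if s else 1 - P_S[cv]
                pr = P_R[cv] if r else 1 - P_R[cv]
                return 0.5 * ps * pr             # P(+c) = P(-c) = 0.5
            a, b = score_c(True), score_c(False)
            c = random.random() < a / (a + b)
            # Resample R given its Markov blanket {C, S, W}:
            # P(r | c, s, w) is proportional to P(r | c) P(w | s, r)
            def score_r(rv):
                pr = P_R[c] if rv else 1 - P_R[c]
                pw = P_W[(s, rv)] if w else 1 - P_W[(s, rv)]
                return pr * pw
            a, b = score_r(True), score_r(False)
            r = random.random() < a / (a + b)
            samples.append((c, s, r, w))
        return samples

The fraction of returned samples with R = true then estimates P(+r | +s, +w).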

Example: Wet Grass
Compute P(C | +s, −r) through exact inference (see the worked version below)
[Bayes' net figure with CPTs as in the earlier Wet Grass slide]
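A hedged worked version of this computation (using the CPT values above, with the assumed P(+r | −c) = 0.2):
$P(+c \mid +s, -r) = \frac{P(+c)\,P(+s \mid +c)\,P(-r \mid +c)}{\sum_{c} P(c)\,P(+s \mid c)\,P(-r \mid c)} = \frac{0.5 \cdot 0.1 \cdot 0.2}{0.5 \cdot 0.1 \cdot 0.2 + 0.5 \cdot 0.5 \cdot 0.8} = \frac{0.01}{0.21} \approx 0.048$
W can be ignored here because, once S and R are fixed, it lies outside C's Markov blanket and sums out of the joint.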

Quiz 3
Evidence: +s, +w. Initial sample: (+c, +s, −r, +w). Current sample: (−c, +s, +r, +w)
What is NOT a possible next sample when using Gibbs sampling? (A worked check follows below.)
A: (−c, +s, +r, +w)
B: (+c, +s, +r, +w)
C: (−c, +s, −r, +w)
D: (−c, −s, +r, +w)
[Bayes' net figure with CPTs as in the earlier Wet Grass slide]
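A hedged worked check (the transcript omits the answer): Gibbs sampling resamples exactly one non-evidence variable per step, so the evidence variable S can never change; option D flips S to −s and is therefore not a possible next sample. Options A, B, and C each differ from the current sample in at most one non-evidence variable (resampling may also keep the current value, as in A), so all three are reachable.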

Example: Wet Grass
Following this process, get 100 samples. Assume you get 31 samples with +r and 69 with −r.
What is P(R | +s, +w)?
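A hedged worked answer: normalizing the counts gives $\hat{P}(+r \mid +s, +w) = 31/100 = 0.31$ and $\hat{P}(-r \mid +s, +w) = 0.69$.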

Gibbs Sampling
How many samples are generated in total?

Why Does Gibbs Sampling Work?
The sampling process can be seen as a Markov chain:
State x: an assignment to all variables, which must be consistent with the evidence e
Transition probability P(x′ | x): determined by the ordering used to choose the non-evidence variable and by the conditional probabilities defined by the Bayes' net
The chain can only transition to "neighboring" states, i.e., states in which at most one variable differs

Example: Wet Grass
[Figure from G. Mori; content not recoverable from the transcript]

Why Does Gibbs Sampling Work?
The sampling process can be seen as a Markov chain. It settles into a dynamic equilibrium in which the long-run fraction of time spent in each state is exactly proportional to its posterior probability (see the detailed proof in the textbook).
Gibbs sampling is consistent: it converges to the true posterior distribution as the number of samples → ∞

Summary
Approximate inference in Bayes' nets:
Direct (forward) sampling
Rejection sampling
Likelihood weighting
Markov chain simulation: Gibbs sampling

Acknowledgment
Some slides are borrowed from previous slides made by Tai Sing Lee