CPSC 422, Lecture 11, Slide 1. Intelligent Systems (AI-2), Computer Science CPSC 422, Lecture 11, Jan 29, 2014

CPSC 422, Lecture 11, Slide 2. Lecture Overview: Recap of BNs Representation and Exact Inference; Start Belief Networks Approximate Reasoning: Intro to Sampling; First Naïve Approximate Method: Forward Sampling; Second Method: Rejection Sampling

CPSC 322, Lecture 26, Slide 3. Realistic BNet: Liver Diagnosis. Source: Onisko et al., 1999

Revise (in)dependencies… CPSC 422, Lecture 11, Slide 4

Independence (Markov Blanket). CPSC 422, Lecture 11, Slide 5. What is the minimal set of nodes that must be observed in order to make node X independent from all the non-observed nodes in the network?

Independence (Markov Blanket). A node is conditionally independent of all the other nodes in the network given its parents, children, and children's parents (i.e., its Markov blanket). CPSC 422, Lecture 11, Slide 6

CPSC 322, Lecture 10, Slide 7. Variable elimination algorithm: Summary. To compute P(Z | Y1=v1, …, Yj=vj) from the joint P(Z, Y1, …, Yj, Z1, …, Zk), where the Zi are the unobserved non-query variables:
1. Construct a factor for each conditional probability.
2. Set the observed variables to their observed values.
3. Given an elimination ordering, simplify/decompose the sum of products.
4. Perform the products and sum out each Zi.
5. Multiply the remaining factors (all of which involve only Z).
6. Normalize: divide the resulting factor f(Z) by Σ_Z f(Z).
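To make the recap concrete, here is a minimal Python sketch of the six steps above. It is not the course's implementation: the factor representation (dictionaries keyed by sorted variable/value pairs), the function names, and the tiny two-node example network with its CPT numbers are all illustrative choices.

from itertools import product
from collections import defaultdict

# A factor is a pair (set of variable names, table); the table maps a tuple of
# (variable, value) pairs, sorted by variable name, to a number.

def restrict(factor, var, value):
    """Step 2: set an observed variable to its observed value."""
    fvars, table = factor
    if var not in fvars:
        return factor
    new_table = {}
    for key, p in table.items():
        if dict(key)[var] == value:
            new_table[tuple((v, x) for v, x in key if v != var)] = p
    return (fvars - {var}, new_table)

def multiply(f1, f2, domains):
    """Pointwise product of two factors."""
    vars1, t1 = f1
    vars2, t2 = f2
    new_vars = vars1 | vars2
    order = sorted(new_vars)
    new_table = {}
    for combo in product(*[domains[v] for v in order]):
        assign = dict(zip(order, combo))
        k1 = tuple(sorted((v, assign[v]) for v in vars1))
        k2 = tuple(sorted((v, assign[v]) for v in vars2))
        new_table[tuple(sorted(assign.items()))] = t1[k1] * t2[k2]
    return (new_vars, new_table)

def sum_out(factor, var):
    """Step 4: sum a variable out of a factor."""
    fvars, table = factor
    new_table = defaultdict(float)
    for key, p in table.items():
        new_table[tuple((v, x) for v, x in key if v != var)] += p
    return (fvars - {var}, dict(new_table))

def variable_elimination(factors, evidence, elim_order, domains):
    """Compute the posterior over the query variable(s) left after elimination."""
    # Step 2: incorporate the evidence.
    for var, val in evidence.items():
        factors = [restrict(f, var, val) for f in factors]
    # Steps 3-4: eliminate the hidden variables one at a time.
    for var in elim_order:
        related = [f for f in factors if var in f[0]]
        others = [f for f in factors if var not in f[0]]
        if not related:          # variable not mentioned by any factor
            continue
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f, domains)
        factors = others + [sum_out(prod, var)]
    # Step 5: multiply the remaining factors (they mention only the query variable).
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    # Step 6: normalize.
    total = sum(result[1].values())
    return {key: p / total for key, p in result[1].items()}

# Tiny illustrative network A -> B with made-up CPTs: P(A) and P(B | A).
domains = {"A": ["a0", "a1"], "B": ["b0", "b1"]}
fA = ({"A"}, {(("A", "a0"),): 0.7, (("A", "a1"),): 0.3})
fBA = ({"A", "B"}, {(("A", "a0"), ("B", "b0")): 0.9, (("A", "a0"), ("B", "b1")): 0.1,
                    (("A", "a1"), ("B", "b0")): 0.4, (("A", "a1"), ("B", "b1")): 0.6})
print(variable_elimination([fA, fBA], {}, ["A"], domains))
# -> {(('B', 'b0'),): 0.75, (('B', 'b1'),): 0.25}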

CPSC 322, Lecture 30, Slide 8. Variable elimination ordering.
P(G, D=t) = Σ_{A,B,C} f(A,G) f(B,A) f(C,G,A) f(B,C)
P(G, D=t) = Σ_A f(A,G) Σ_B f(B,A) Σ_C f(C,G,A) f(B,C)
P(G, D=t) = Σ_A f(A,G) Σ_C f(C,G,A) Σ_B f(B,C) f(B,A)
Is there only one way to simplify?

CPSC 322, Lecture 30, Slide 9. Complexity: Just Intuition. Treewidth of a network given an elimination ordering: the maximum number of variables in a factor created by summing out a variable. Treewidth of a belief network: the minimum treewidth over all elimination orderings (it depends only on the graph structure and is a measure of the sparseness of the graph). The complexity of VE is exponential in the treewidth and linear in the number of variables. Also, finding the elimination ordering with minimum treewidth is NP-hard (but there are good elimination-ordering heuristics).

CPSC 422, Lecture 11, Slide 10. Lecture Overview: Recap of BNs Representation and Exact Inference; Start Belief Networks Approximate Reasoning: Intro to Sampling; First Naïve Approximate Method: Forward Sampling; Second Method: Rejection Sampling

Approximate Inference. Basic idea: draw N samples from known probability distributions, and use those samples to estimate unknown probability distributions. Why sample? For inference, getting a sample is faster than computing the right answer (e.g., with variable elimination). CPSC 422, Lecture 11, Slide 11

Sampling: What is it? Idea: estimate probabilities from sample data (samples) of the (unknown) probability distribution; use the frequency of each event in the sample data to approximate its probability. Frequencies are good approximations only if they are based on large samples (we will see why, and what "large" means). But how do we get the samples?

We use Sampling. Sampling is a process to obtain samples adequate to estimate an unknown probability. (Diagram: known probability distribution(s) → samples → estimates for the unknown, hard-to-compute distribution(s).)

Sampling. The building block of any sampling algorithm is the generation of samples from a known distribution. We then use these samples to derive estimates of hard-to-compute probabilities.

Generating Samples from a Known Distribution. For a random variable X with values {x1, …, xk} and probability distribution P(X) = {P(x1), …, P(xk)}, partition the interval (0, 1] into k intervals pi, one for each xi, with length P(xi). To generate one sample: randomly generate a value y in (0, 1] (i.e., generate a value from a uniform distribution over (0, 1]), then select the value of the sample based on the interval pi that includes y. From probability theory, the probability that y falls in interval pi equals its length, i.e., P(y ∈ pi) = P(xi).
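A small Python sketch of this interval trick, assuming a finite-valued X; the example distribution {0.2, 0.5, 0.3} and the function name are made up for illustration.

import random

def sample_categorical(values, probs):
    """Sample one value of X by partitioning (0, 1] into k intervals p_i,
    one per x_i, with length P(x_i), and seeing where a uniform draw lands."""
    y = random.random()          # (approximately) uniform draw on the unit interval
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if y < cumulative:       # y falls inside the interval assigned to x
            return x
    return values[-1]            # guard against floating-point round-off

# Example: X with values {x1, x2, x3} and P(X) = {0.2, 0.5, 0.3}.
print(sample_categorical(["x1", "x2", "x3"], [0.2, 0.5, 0.3]))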

From Samples to Probabilities. Count the total number of samples, m. Count the number ni of samples with value xi. Compute the frequency of xi as ni / m. This frequency is your estimated probability of xi.
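Turning samples into estimated probabilities is just counting, as in the short Python sketch below; the distribution and sample size are arbitrary illustration values, and random.choices stands in for the interval-based sampler above.

import random
from collections import Counter

def estimate_distribution(samples):
    """Estimate P(x_i) as n_i / m, where n_i counts the samples equal to x_i
    and m is the total number of samples."""
    counts = Counter(samples)
    m = len(samples)
    return {x: n / m for x, n in counts.items()}

# Draw 10,000 samples from a known P(X) and check that the frequencies recover it.
samples = random.choices(["x1", "x2", "x3"], weights=[0.2, 0.5, 0.3], k=10_000)
print(estimate_distribution(samples))   # close to {'x1': 0.2, 'x2': 0.5, 'x3': 0.3}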

Sampling for Bayesian Networks. OK, but how can we use all this for probabilistic inference in Bayesian networks? As we said earlier, if we can't use exact algorithms to update the network, we need to resort to samples and frequencies to compute the probabilities we are interested in. We generate these samples by relying on the mechanism we just described.

Sampling for Bayesian Networks (N). Suppose we have the following BN with two binary variables, A → B. It corresponds to the (unknown) joint probability distribution P(A,B) = P(B|A) P(A). To sample from this distribution we first sample from P(A); suppose we get A = 0. In this case we then sample from P(B | A = 0). If we had sampled A = 1, then in the second step we would have sampled from P(B | A = 1). (Slide tables: P(A=1) = 0.3, and a CPT for P(B=1|A).)
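A sketch of this two-step sampling in Python. Only P(A=1) = 0.3 is given on the slide; the P(B=1|A) numbers below are placeholders, since the slide's table did not survive extraction, and the names are my own.

import random

P_A1 = 0.3                        # from the slide: P(A = 1) = 0.3
P_B1_GIVEN_A = {0: 0.2, 1: 0.7}   # hypothetical values for P(B = 1 | A)

def sample_ab():
    """Sample A from P(A), then sample B from P(B | A = sampled value)."""
    a = 1 if random.random() < P_A1 else 0
    b = 1 if random.random() < P_B1_GIVEN_A[a] else 0
    return a, b

print(sample_ab())   # e.g. (0, 1)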

Prior (Forward) Sampling. Network: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass.
P(C): +c 0.5, -c 0.5.
P(S|C): +c: +s 0.1, -s 0.9; -c: +s 0.5, -s 0.5.
P(R|C): +c: +r 0.8, -r 0.2; -c: +r 0.2, -r 0.8.
P(W|S,R): +s,+r: +w 0.99, -w 0.01; +s,-r: +w 0.90, -w 0.10; -s,+r: +w 0.90, -w 0.10; -s,-r: +w 0.01, -w 0.99.
Samples: +c, -s, +r, +w; -c, +s, -r, +w; … CPSC 422, Lecture 11, Slide 19
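A runnable sketch of prior (forward) sampling on this network, using the CPT numbers from the slide; the function and variable names are my own.

import random

def bernoulli(p):
    """Return True with probability p."""
    return random.random() < p

def prior_sample():
    """Draw one full assignment by sampling each variable, in topological order,
    from its CPT given the values already sampled for its parents."""
    c = bernoulli(0.5)                      # P(+c) = 0.5
    s = bernoulli(0.1 if c else 0.5)        # P(+s | +c) = 0.1, P(+s | -c) = 0.5
    r = bernoulli(0.8 if c else 0.2)        # P(+r | +c) = 0.8, P(+r | -c) = 0.2
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    w = bernoulli(p_w)                      # P(+w | s, r)
    return {"C": c, "S": s, "R": r, "W": w}

print(prior_sample())   # e.g. {'C': True, 'S': False, 'R': True, 'W': True}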

Example. We'll get a bunch of samples from the BN (Cloudy, Sprinkler, Rain, WetGrass): +c, -s, +r, +w; +c, +s, +r, +w; -c, +s, +r, -w; +c, -s, +r, +w; -c, -s, -r, +w. If we want to know P(W): we have counts <+w: 4, -w: 1>; normalize to get P(W) ≈ <+w: 0.8, -w: 0.2>. This will get closer to the true distribution with more samples. CPSC 422, Lecture 11, Slide 20
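A short Python check of the counting step, using exactly the five samples listed on the slide:

# The five samples from the slide, as (C, S, R, W) tuples.
samples = [("+c", "-s", "+r", "+w"),
           ("+c", "+s", "+r", "+w"),
           ("-c", "+s", "+r", "-w"),
           ("+c", "-s", "+r", "+w"),
           ("-c", "-s", "-r", "+w")]

n_plus_w = sum(1 for c, s, r, w in samples if w == "+w")
m = len(samples)
print({"+w": n_plus_w / m, "-w": (m - n_plus_w) / m})   # {'+w': 0.8, '-w': 0.2}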

Example. We can estimate anything else from the samples, besides P(W), P(R), etc.: +c, -s, +r, +w; +c, +s, +r, +w; -c, +s, +r, -w; +c, -s, +r, +w; -c, -s, -r, +w. What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)? Can we use/generate fewer samples when we want to estimate a probability conditioned on evidence? CPSC 422, Lecture 11, Slide 21

Rejection Sampling. Let's say we want P(S): there is no point keeping all samples around; just tally counts of S as we go. Let's say we want P(S | +w): same thing, tally S outcomes, but ignore (reject) samples which don't have W = +w. This is called rejection sampling. It is also consistent for conditional probabilities (i.e., correct in the limit). Samples: +c, -s, +r, +w; +c, +s, +r, +w; -c, +s, +r, -w; +c, -s, +r, +w; -c, -s, -r, +w. CPSC 422, Lecture 11, Slide 22
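A minimal sketch of rejection sampling on the sprinkler network (the prior sampler is repeated so the snippet is self-contained); the query P(S | +w), the sample count, and the function names are illustrative choices, not the course's code.

import random

def bern(p):
    return random.random() < p

def prior_sample():
    """Forward-sample one full assignment, using the slide's CPTs."""
    c = bern(0.5)
    s = bern(0.1 if c else 0.5)
    r = bern(0.8 if c else 0.2)
    w = bern({(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.90, (False, False): 0.01}[(s, r)])
    return {"C": c, "S": s, "R": r, "W": w}

def rejection_estimate(query_var, evidence, n):
    """Estimate P(query_var = True | evidence): tally the query outcome only
    over samples that agree with the evidence; reject (ignore) all others."""
    kept = true_count = 0
    for _ in range(n):
        x = prior_sample()
        if all(x[var] == val for var, val in evidence.items()):
            kept += 1
            true_count += x[query_var]
    return true_count / kept if kept else float("nan")

# Estimate P(S = +s | W = +w); note how many samples are wasted when the evidence is unlikely.
print(rejection_estimate("S", {"W": True}, 100_000))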

Hoeffding's inequality. Suppose p is the true probability and s is the sample average from n independent samples; then P(|s − p| > ε) ≤ 2e^(−2nε²). Here p can be the probability of any event for random variables X = {X1, …, Xn} described by a Bayesian network. If you want an infinitely small probability of having an error greater than ε, you need infinitely many samples. But if you settle on something less than infinitely small, say δ, then you just need to set 2e^(−2nε²) ≤ δ (1). So you pick the error ε you can tolerate and the frequency δ with which you can tolerate it, and solve (1) for n, i.e., the number of samples that can ensure this performance.

Hoeffding's inequality. Examples: suppose you can tolerate an error greater than 0.1 only in 5% of your cases; set ε = 0.1, δ = 0.05, and equation (1) gives you n > 184. If you can tolerate the same error (0.1) only in 1% of the cases, then you need 265 samples. If you want an error of 0.01 in no more than 5% of the cases, you need 18,445 samples.
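A tiny Python helper that reproduces these numbers, assuming the bound as reconstructed above (2·e^(−2nε²) ≤ δ, i.e., n ≥ ln(2/δ) / (2ε²)):

import math

def samples_needed(epsilon, delta):
    """Smallest integer n with 2 * exp(-2 * n * epsilon**2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

print(samples_needed(0.1, 0.05))    # 185   (slide: n > 184)
print(samples_needed(0.1, 0.01))    # 265
print(samples_needed(0.01, 0.05))   # 18445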

CPSC 422, Lecture 11, Slide 25. Learning Goals for today's class. You can: describe and compare sampling from a single random variable; describe and apply Forward Sampling in BNs; describe and apply Rejection Sampling; apply Hoeffding's inequality to compute the number of samples needed.

CPSC 422, Lecture 11, Slide 26. TODO for Fri: read the textbook; keep working on assignment 1; next research paper likely to be next Mon.

Rejection Sampling. Let's say we want P(C): there is no point keeping all samples around; just tally counts of C as we go. Let's say we want P(C | +s): same thing, tally C outcomes, but ignore (reject) samples which don't have S = +s. This is called rejection sampling. It is also consistent for conditional probabilities (i.e., correct in the limit). Samples: +c, -s, +r, +w; +c, +s, +r, +w; -c, +s, +r, -w; +c, -s, +r, +w; -c, -s, -r, +w. CPSC 422, Lecture 11, Slide 27

Prior Sampling. This process generates samples with probability S_PS(x1, …, xn) = ∏_i P(xi | parents(Xi)) = P(x1, …, xn), i.e., the BN's joint probability. Let the number of samples of an event (x1, …, xn) out of N total samples be N_PS(x1, …, xn). Then the estimate N_PS(x1, …, xn) / N converges to P(x1, …, xn) as N grows, i.e., the sampling procedure is consistent. CPSC 422, Lecture 11, Slide 28