CS 188: Artificial Intelligence Fall 2009 Lecture 17: Bayes Nets IV 10/27/2009 Dan Klein – UC Berkeley.

Slides:



Advertisements
Similar presentations
CPSC 422, Lecture 11Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Jan, 29, 2014.
Advertisements

Making Simple Decisions Chapter 16 Some material borrowed from Jean-Claude Latombe and Daphne Koller by way of Marie desJadines,
Making Simple Decisions
Lirong Xia Hidden Markov Models Tue, March 28, 2014.
Lirong Xia Approximate inference: Particle filter Tue, April 1, 2014.
Advanced Artificial Intelligence
QUIZ!!  T/F: Rejection Sampling without weighting is not consistent. FALSE  T/F: Rejection Sampling (often) converges faster than Forward Sampling. FALSE.
CS 188: Artificial Intelligence Fall 2009 Lecture 20: Particle Filtering 11/5/2009 Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.
Bayesian Networks. Graphical Models Bayesian networks Conditional random fields etc.
Decision Making Under Uncertainty Russell and Norvig: ch 16 CMSC421 – Fall 2006.
CS 188: Artificial Intelligence Spring 2009 Lecture 15: Bayes’ Nets II -- Independence 3/10/2009 John DeNero – UC Berkeley Slides adapted from Dan Klein.
CS 188: Artificial Intelligence Spring 2007 Lecture 14: Bayes Nets III 3/1/2007 Srini Narayanan – ICSI and UC Berkeley.
CS 188: Artificial Intelligence Fall 2006 Lecture 17: Bayes Nets III 10/26/2006 Dan Klein – UC Berkeley.
. Approximate Inference Slides by Nir Friedman. When can we hope to approximate? Two situations: u Highly stochastic distributions “Far” evidence is discarded.
Announcements Homework 8 is out Final Contest (Optional)
CS 188: Artificial Intelligence Fall 2009 Lecture 19: Hidden Markov Models 11/3/2009 Dan Klein – UC Berkeley.
CS 188: Artificial Intelligence Fall 2009 Lecture 14: Bayes’ Nets 10/13/2009 Dan Klein – UC Berkeley.
CS 188: Artificial Intelligence Fall 2009 Lecture 18: Decision Diagrams 10/29/2009 Dan Klein – UC Berkeley.
CMSC 671 Fall 2003 Class #26 – Wednesday, November 26 Russell & Norvig 16.1 – 16.5 Some material borrowed from Jean-Claude Latombe and Daphne Koller by.
Announcements  Midterm 1  Graded midterms available on pandagrader  See Piazza for post on how fire alarm is handled  Project 4: Ghostbusters  Due.
Bayesian Networks Tamara Berg CS Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
Estimation and Hypothesis Testing. The Investment Decision What would you like to know? What will be the return on my investment? Not possible PDF for.
CS 188: Artificial Intelligence Fall 2008 Lecture 18: Decision Diagrams 10/30/2008 Dan Klein – UC Berkeley 1.
QUIZ!!  T/F: The forward algorithm is really variable elimination, over time. TRUE  T/F: Particle Filtering is really sampling, over time. TRUE  T/F:
Recap: Reasoning Over Time  Stationary Markov models  Hidden Markov models X2X2 X1X1 X3X3 X4X4 rainsun X5X5 X2X2 E1E1 X1X1 X3X3 X4X4 E2E2 E3E3.
INC 551 Artificial Intelligence Lecture 8 Models of Uncertainty.
Making Simple Decisions
CS 188: Artificial Intelligence Fall 2008 Lecture 19: HMMs 11/4/2008 Dan Klein – UC Berkeley 1.
CS 188: Artificial Intelligence Fall 2006 Lecture 18: Decision Diagrams 10/31/2006 Dan Klein – UC Berkeley.
QUIZ!!  T/F: Forward sampling is consistent. True  T/F: Rejection sampling is faster, but inconsistent. False  T/F: Rejection sampling requires less.
UBC Department of Computer Science Undergraduate Events More
Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk Undergrad TAs: Sam Johnson, Nikhil Johri CS 440 / ECE 448 Introduction to Artificial Intelligence.
Bayes’ Nets: Sampling [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available.
Announcements Project 4: Ghostbusters Homework 7
The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)
CPSC 422, Lecture 11Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Oct, 2, 2015.
QUIZ!!  In HMMs...  T/F:... the emissions are hidden. FALSE  T/F:... observations are independent given no evidence. FALSE  T/F:... each variable X.
CS 188: Artificial Intelligence Bayes Nets: Approximate Inference Instructor: Stuart Russell--- University of California, Berkeley.
Advanced Artificial Intelligence Lecture 5: Probabilistic Inference.
Inference Algorithms for Bayes Networks
Quick Warm-Up Suppose we have a biased coin that comes up heads with some unknown probability p; how can we use it to produce random bits with probabilities.
CSE 473: Artificial Intelligence Autumn 2011 Bayesian Networks: Inference Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart.
CS 416 Artificial Intelligence Lecture 15 Uncertainty Chapter 14 Lecture 15 Uncertainty Chapter 14.
CS 188: Artificial Intelligence Spring 2009 Lecture 20: Decision Networks 4/2/2009 John DeNero – UC Berkeley Slides adapted from Dan Klein.
QUIZ!!  T/F: You can always (theoretically) do BNs inference by enumeration. TRUE  T/F: In VE, always first marginalize, then join. FALSE  T/F: VE is.
CS 541: Artificial Intelligence Lecture VII: Inference in Bayesian Networks.
CS 188: Artificial Intelligence Spring 2006
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12
CS b553: Algorithms for Optimization and Learning
From last time: on-policy vs off-policy Take an action Observe a reward Choose the next action Learn (using chosen action) Take the next action Off-policy.
Bayes Net Learning: Bayesian Approaches
Artificial Intelligence
CS 4/527: Artificial Intelligence
CS 4/527: Artificial Intelligence
CAP 5636 – Advanced Artificial Intelligence
Probability Topics Random Variables Joint and Marginal Distributions
Advanced Artificial Intelligence
Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation: B.
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12
Instructors: Fei Fang (This Lecture) and Dave Touretzky
CS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence
CS 188: Artificial Intelligence Fall 2007
CS 188: Artificial Intelligence Fall 2008
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11
CS 188: Artificial Intelligence Fall 2008
Hidden Markov Models Lirong Xia.
CS 188: Artificial Intelligence Fall 2007
Quick Warm-Up Suppose we have a biased coin that comes up heads with some unknown probability p; how can we use it to produce random bits with probabilities.
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12
Presentation transcript:

CS 188: Artificial Intelligence Fall 2009 Lecture 17: Bayes Nets IV 10/27/2009 Dan Klein – UC Berkeley

Announcements  Midterms  In glookup, back at end of lecture  Class did really well!  Assignments  W3 out later this week  P4 out next week  Contest! 2

Approximate Inference 3

 Simulation has a name: sampling  Sampling is a hot topic in machine learning, and it’s really simple  Basic idea:  Draw N samples from a sampling distribution S  Compute an approximate posterior probability  Show this converges to the true probability P  Why sample?  Learning: get samples from a distribution you don’t know  Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination) 4 S A F

Prior Sampling Cloudy Sprinkler Rain WetGrass Cloudy Sprinkler Rain WetGrass 5 +c0.5 -c0.5 +c +s0.1 -s0.9 -c +s0.5 -s0.5 +c +r0.8 -r0.2 -c +r0.2 -r0.8 +s +r +w0.99 -w0.01 -r +w0.90 -w0.10 -s +r +w0.90 -w0.10 -r +w0.01 -w0.99 Samples: +c, -s, +r, +w -c, +s, -r, +w …

Prior Sampling  This process generates samples with probability: …i.e. the BN’s joint probability  Let the number of samples of an event be  Then  I.e., the sampling procedure is consistent 6

Example  We’ll get a bunch of samples from the BN: +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w  If we want to know P(W)  We have counts  Normalize to get P(W) =  This will get closer to the true distribution with more samples  Can estimate anything else, too  What about P(C| +w)? P(C| +r, +w)? P(C| -r, -w)?  Fast: can use fewer samples if less time (what’s the drawback?) Cloudy Sprinkler Rain WetGrass C S R W 7

Rejection Sampling  Let’s say we want P(C)  No point keeping all samples around  Just tally counts of C as we go  Let’s say we want P(C| +s)  Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=+s  This is called rejection sampling  It is also consistent for conditional probabilities (i.e., correct in the limit) +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w Cloudy Sprinkler Rain WetGrass C S R W 8

Sampling Example  There are 2 cups.  The first contains 1 penny and 1 quarter  The second contains 2 quarters  Say I pick a cup uniformly at random, then pick a coin randomly from that cup. It's a quarter (yes!). What is the probability that the other coin in that cup is also a quarter?

Likelihood Weighting  Problem with rejection sampling:  If evidence is unlikely, you reject a lot of samples  You don’t exploit your evidence as you sample  Consider P(B|+a)  Idea: fix evidence variables and sample the rest  Problem: sample distribution not consistent!  Solution: weight by probability of evidence given parents BurglaryAlarm BurglaryAlarm 10 -b, -a +b, +a -b +a -b, +a +b, +a

Likelihood Weighting 11 +c0.5 -c0.5 +c +s0.1 -s0.9 -c +s0.5 -s0.5 +c +r0.8 -r0.2 -c +r0.2 -r0.8 +s +r +w0.99 -w0.01 -r +w0.90 -w0.10 -s +r +w0.90 -w0.10 -r +w0.01 -w0.99 Samples: +c, +s, +r, +w … Cloudy Sprinkler Rain WetGrass Cloudy Sprinkler Rain WetGrass

Likelihood Weighting  Sampling distribution if z sampled and e fixed evidence  Now, samples have weights  Together, weighted sampling distribution is consistent Cloudy R C S W 12

Likelihood Weighting  Likelihood weighting is good  We have taken evidence into account as we generate the sample  E.g. here, W’s value will get picked based on the evidence values of S, R  More of our samples will reflect the state of the world suggested by the evidence  Likelihood weighting doesn’t solve all our problems  Evidence influences the choice of downstream variables, but not upstream ones (C isn’t more likely to get a value matching the evidence)  We would like to consider evidence when we sample every variable 13 Cloudy Rain C S R W

Markov Chain Monte Carlo*  Idea: instead of sampling from scratch, create samples that are each like the last one.  Procedure: resample one variable at a time, conditioned on all the rest, but keep evidence fixed. E.g., for P(b|c):  Properties: Now samples are not independent (in fact they’re nearly identical), but sample averages are still consistent estimators!  What’s the point: both upstream and downstream variables condition on evidence. 14 +a+c+b +a+c-b-b-a-a -b-b

Recap: Inference Example  Find P(W|F=bad)  Restrict all factors  No hidden vars to eliminate (this time!)  Just join and normalize Weather Forecast WP(W) sun0.7 rain0.3 FP(F|rain) good0.1 bad0.9 FP(F|sun) good0.8 bad0.2 WP(W) sun0.7 rain0.3 WP(F=bad|W) sun0.2 rain0.9 WP(W,F=bad) sun0.14 rain0.27 WP(W | F=bad) sun0.34 rain

Decision Networks  MEU: choose the action which maximizes the expected utility given the evidence  Can directly operationalize this with decision networks  Bayes nets with nodes for utility and actions  Lets us calculate the expected utility for each action  New node types:  Chance nodes (just like BNs)  Actions (rectangles, cannot have parents, act as observed evidence)  Utility node (diamond, depends on action and chance nodes) Weather Forecast Umbrella U 16 [DEMO: Ghostbusters]

Decision Networks  Action selection:  Instantiate all evidence  Set action node(s) each possible way  Calculate posterior for all parents of utility node, given the evidence  Calculate expected utility for each action  Choose maximizing action Weather Forecast Umbrella U 17

Example: Decision Networks Weather Umbrella U WP(W) sun0.7 rain0.3 AWU(A,W) leavesun100 leaverain0 takesun20 takerain70 Umbrella = leave Umbrella = take Optimal decision = leave

Evidence in Decision Networks  Find P(W|F=bad)  Select for evidence  First we join P(W) and P(bad|W)  Then we normalize Weather Forecast WP(W) sun0.7 rain0.3 FP(F|rain) good0.1 bad0.9 FP(F|sun) good0.8 bad0.2 WP(W) sun0.7 rain0.3 WP(F=bad|W) sun0.2 rain0.9 WP(W,F=bad) sun0.14 rain0.27 WP(W | F=bad) sun0.34 rain Umbrella U

Example: Decision Networks Weather Forecast =bad Umbrella U AWU(A,W) leavesun100 leaverain0 takesun20 takerain70 WP(W|F=bad) sun0.34 rain0.66 Umbrella = leave Umbrella = take Optimal decision = take 20 [Demo]

Conditioning on Action Nodes  An action node can be a parent of a chance node  Chance node conditions on the outcome of the action  Action nodes are like observed variables in a Bayes’ net, except we max over their values 21 S’ A U S T(s,a,s’) R(s,a,s’)

Value of Information  Idea: compute value of acquiring evidence  Can be done directly from decision network  Example: buying oil drilling rights  Two blocks A and B, exactly one has oil, worth k  Prior probabilities 0.5 each, & mutually exclusive  Drilling in either A or B has MEU = k/2  Fair price of drilling rights: k/2  Question: what’s the value of information  Value of knowing which of A or B has oil  Value is expected gain in MEU from new info  Survey may say “oil in a” or “oil in b,” prob 0.5 each  If we know OilLoc, MEU is k (either way)  Gain in MEU from knowing OilLoc?  VPI(OilLoc) = k/2  Fair price of information: k/2 OilLoc DrillLoc U DOU aak ab0 ba0 bbk OP a1/2 b 22

Value of Perfect Information  Current evidence E=e, utility depends on S=s  Potential new evidence E’: suppose we knew E’ = e’  BUT E’ is a random variable whose value is currently unknown, so:  Must compute expected gain over all possible values  (VPI = value of perfect information) 23

VPI Example: Weather Weather Forecast Umbrella U AWU leavesun100 leaverain0 takesun20 takerain70 MEU with no evidence MEU if forecast is bad MEU if forecast is good FP(F) good0.59 bad0.41 Forecast distribution

VPI Example: Ghostbusters TBGP(T,B,G) +t+b+g0.16 +t+b gg t bb +g0.24 +t bb gg 0.04  t +b+g0.04 tt +b gg 0.24 tt bb +g0.06 tt bb gg  Reminder: ghost is hidden, sensors are noisy  T: Top square is red B: Bottom square is red G: Ghost is in the top  Sensor model: P( +t | +g ) = 0.8 P( +t |  g ) = 0.4 P( +b | +g) = 0.4 P( +b |  g ) = 0.8 Joint Distribution [Demo]

VPI Example: Ghostbusters TBGP(T,B,G) +t+b+g0.16 +t+b gg t bb +g0.24 +t bb gg 0.04  t +b+g0.04 tt +b gg 0.24 tt bb +g0.06 tt bb gg Utility of bust is 2, no bust is 0  Q1: What’s the value of knowing T if I know nothing?  Q1’: E P(T) [MEU(t) – MEU()]  Q2: What’s the value of knowing B if I already know that T is true (red)?  Q2’: E P(B|t) [MEU(t,b) – MEU(t)]  How low can the value of information ever be? Joint Distribution [Demo]

VPI Properties  Nonnegative in expectation  Nonadditive ---consider, e.g., obtaining E j twice  Order-independent 27

Quick VPI Questions  The soup of the day is either clam chowder or split pea, but you wouldn’t order either one. What’s the value of knowing which it is?  If you have $10 to bet and odds are 3 to 1 that Berkeley will beat Stanford, what’s the value of knowing the outcome in advance, assuming you can make a fair bet for either Cal or Stanford?  What if you are morally obligated not to bet against Cal, but you can refrain from betting? 28