CS 188: Artificial Intelligence Fall 2007

CS 188: Artificial Intelligence Fall 2007 Lecture 18: Bayes Nets III 10/30/2007 Dan Klein – UC Berkeley

Announcements Project shift: Project 4 moved back a little; instead, a mega-mini-homework, worth 3x, graded. Contest is live.

Inference Inference: calculating some statistic from a joint probability distribution. Examples: Posterior probability: P(Q | E1 = e1, …, Ek = ek) Most likely explanation: argmax_q P(Q = q | E1 = e1, …, Ek = ek)

Reminder: Alarm Network

Normalization Trick Select the joint entries consistent with the evidence, then normalize them so they sum to one.

Inference by Enumeration?

Nesting Sums Atomic inference is extremely slow! Slightly clever way to save work: move the sums as far right as possible. Example: P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
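
To make the sum-pushing concrete, here is a minimal sketch (mine, not the slides') that evaluates P(B | j, m) with the sums arranged exactly as in the nested expression above; the CPT values are the usual textbook numbers for the alarm network, assumed here for illustration.

```python
# Minimal sketch: inference by enumeration with the sums pushed inward,
# for the alarm-network query P(B | j, m).  CPT values are the usual
# textbook numbers, assumed here for illustration.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(+j | A)
P_M = {True: 0.70, False: 0.01}                      # P(+m | A)

def p_a(a, b, e):
    return P_A[(b, e)] if a else 1 - P_A[(b, e)]

def unnormalized(b):
    # P(b) * sum_e P(e) * sum_a P(a | b, e) P(j | a) P(m | a)
    total = 0.0
    for e in (True, False):
        inner = sum(p_a(a, b, e) * P_J[a] * P_M[a] for a in (True, False))
        total += P_E[e] * inner
    return P_B[b] * total

scores = {b: unnormalized(b) for b in (True, False)}
z = sum(scores.values())
print({b: v / z for b, v in scores.items()})   # P(+b | j, m) is roughly 0.284
```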

Evaluation Tree View the nested sums as a computation tree: Still repeated work: calculate P(m | a) P(j | a) twice, etc.

Variable Elimination: Idea Lots of redundant work in the computation tree We can save time if we cache all partial results Join on one hidden variable at a time Project out that variable immediately This is the basic idea behind variable elimination

Basic Objects Track objects called factors Initial factors are local CPTs During elimination, create new factors Anatomy of a factor: 4 numbers, one for each combination of values of D and E; its argument variables are always non-evidence variables, some introduced and some summed out during elimination

Basic Operations First basic operation: joining factors Combining two factors: just like a database join Build a new factor over the union of their variables Example:
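
As a concrete illustration (an assumed toy representation, not the course's code), the sketch below joins two factors stored as (scope, table) pairs; the Rain/Traffic numbers are made up for the example.

```python
from itertools import product

# Sketch of a factor join: a factor is (scope, table), where scope is a tuple of
# variable names and table maps a tuple of values (in scope order) to a number.
def join(f1, f2, domains):
    scope1, t1 = f1
    scope2, t2 = f2
    scope = tuple(dict.fromkeys(scope1 + scope2))          # union of the variables
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        row = dict(zip(scope, vals))
        v1 = t1[tuple(row[v] for v in scope1)]
        v2 = t2[tuple(row[v] for v in scope2)]
        table[vals] = v1 * v2                              # pointwise product, like a DB join
    return scope, table

# Example: join P(R) with P(T | R) to build a factor over (R, T).
domains = {"R": ["+r", "-r"], "T": ["+t", "-t"]}
P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})
P_T_given_R = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                            ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})
print(join(P_R, P_T_given_R, domains))                     # the joint factor P(R, T)
```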

Basic Operations Second basic operation: marginalization Take a factor and sum out a variable Shrinks a factor to a smaller one A projection operation Example:
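
A companion sketch in the same assumed (scope, table) representation as above: summing R out of the joined P(R, T) factor projects it down to P(T).

```python
# Sketch of marginalization: sum a variable out of a (scope, table) factor.
def sum_out(var, factor):
    scope, table = factor
    new_scope = tuple(v for v in scope if v != var)
    idx = scope.index(var)
    new_table = {}
    for vals, p in table.items():
        key = vals[:idx] + vals[idx + 1:]
        new_table[key] = new_table.get(key, 0.0) + p   # add rows that agree on the other variables
    return new_scope, new_table

# Example: sum R out of P(R, T) (the output of the join sketch) to get P(T).
P_RT = (("R", "T"), {("+r", "+t"): 0.08, ("+r", "-t"): 0.02,
                     ("-r", "+t"): 0.09, ("-r", "-t"): 0.81})
print(sum_out("R", P_RT))   # roughly (('T',), {('+t',): 0.17, ('-t',): 0.83})
```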

Example

Example

General Variable Elimination Query: P(Q | E1 = e1, …, Ek = ek) Start with initial factors: local CPTs (but instantiated by evidence) While there are still hidden variables (not Q or evidence): Pick a hidden variable H Join all factors mentioning H Project out H Join all remaining factors and normalize
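
Putting the two operations together, here is a compact, self-contained sketch of the elimination loop just described (condensed copies of the join and sum-out helpers are repeated so it runs on its own); the Rain -> Traffic -> Late chain and its CPT numbers are illustrative, not from the lecture.

```python
from itertools import product

def join(f1, f2, domains):
    s1, t1 = f1
    s2, t2 = f2
    scope = tuple(dict.fromkeys(s1 + s2))
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        row = dict(zip(scope, vals))
        table[vals] = t1[tuple(row[v] for v in s1)] * t2[tuple(row[v] for v in s2)]
    return scope, table

def sum_out(var, factor):
    scope, table = factor
    i = scope.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return tuple(v for v in scope if v != var), out

def variable_elimination(factors, hidden, domains):
    for h in hidden:                                   # pick a hidden variable H
        touching = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = touching[0]
        for f in touching[1:]:
            joined = join(joined, f, domains)          # join all factors mentioning H
        factors = rest + [sum_out(h, joined)]          # ...then project H out
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f, domains)              # join all remaining factors
    scope, table = result
    z = sum(table.values())
    return scope, {k: v / z for k, v in table.items()}  # ...and normalize

# Rain -> Traffic -> Late with illustrative CPTs; query P(L), hidden R and T.
domains = {"R": [0, 1], "T": [0, 1], "L": [0, 1]}
P_R = (("R",), {(1,): 0.1, (0,): 0.9})
P_T = (("R", "T"), {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.9})
P_L = (("T", "L"), {(1, 1): 0.3, (1, 0): 0.7, (0, 1): 0.1, (0, 0): 0.9})
print(variable_elimination([P_R, P_T, P_L], ["R", "T"], domains))
# roughly (('L',), {(1,): 0.134, (0,): 0.866})
```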

Example Choose A

Example Choose E, then finish by joining the remaining factors and normalizing

Variable Elimination What you need to know: VE caches intermediate computations Polynomial time for tree-structured graphs! Saves time by marginalizing variables as soon as possible rather than at the end We will see special cases of VE later You'll have to implement the special cases Approximations Exact inference is slow, especially when you have a lot of hidden nodes Approximate methods give you a (close) answer, faster

Sampling Basic idea: Draw N samples from a sampling distribution S Compute an approximate posterior probability Show this converges to the true probability P Outline: Sampling from an empty network Rejection sampling: reject samples disagreeing with evidence Likelihood weighting: use evidence to weight samples

Prior Sampling (figure: the Cloudy / Sprinkler / Rain / WetGrass network)

Prior Sampling This process generates samples with probability S_PS(x1, …, xn) = Π_i P(xi | Parents(Xi)), i.e. the BN's joint probability Let the number of samples of an event be N_PS(x1, …, xn) Then lim_{N→∞} N_PS(x1, …, xn) / N = S_PS(x1, …, xn) = P(x1, …, xn) I.e., the sampling procedure is consistent
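
A short sketch of the procedure (the CPT numbers are typical values used with this Cloudy/Sprinkler/Rain/WetGrass example, assumed here rather than taken from the slides): sample each variable in topological order from its CPT given the already-sampled parents, then estimate probabilities by counting.

```python
import random

# Prior (forward) sampling sketch for the Cloudy/Sprinkler/Rain/WetGrass net;
# the CPT values below are assumed, typical numbers for this example.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(+s | C)
P_R = {True: 0.8, False: 0.2}                       # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(+w | S, R)

def prior_sample():
    # Sample each variable in topological order, conditioning on sampled parents.
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

# Estimate P(+w) by counting; the estimate converges as N grows (consistency).
N = 100_000
count = sum(prior_sample()[3] for _ in range(N))
print("P(+w) estimate:", count / N)   # roughly 0.65 with these CPTs
```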

Example We'll get a bunch of samples from the BN, each a full assignment such as (+c, -s, +r, +w) If we want to know P(W): we have counts <+w: 4, -w: 1> Normalize to get P(W) = <+w: 0.8, -w: 0.2> This will get closer to the true distribution with more samples Can estimate anything else, too What about P(C | +r)? P(C | +r, +w)? (figure: the Cloudy / Sprinkler / Rain / WetGrass network)

Rejection Sampling Let's say we want P(C): no point keeping all samples around, just tally counts of C outcomes Let's say we want P(C | s): same thing, tally C outcomes, but ignore (reject) samples which don't have S = s This is rejection sampling It is also consistent (correct in the limit) (figure: the Cloudy / Sprinkler / Rain / WetGrass network with a list of samples)
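
The same sampler turned into rejection sampling for P(C | +s): samples whose Sprinkler value disagrees with the evidence are simply thrown away. This is a sketch with the same assumed CPTs as before, repeated so the snippet runs on its own.

```python
import random

# Rejection-sampling sketch for P(C | +s); CPTs are the same assumed values
# as in the prior-sampling sketch above.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}

def prior_sample():
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

counts = {True: 0, False: 0}
for _ in range(100_000):
    c, s, r, w = prior_sample()
    if not s:             # evidence is S = +s: reject samples that disagree
        continue
    counts[c] += 1
total = sum(counts.values())
print({c: n / total for c, n in counts.items()})
# true P(+c | +s) = 0.05 / 0.30, roughly 0.17, with these CPTs
```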

Likelihood Weighting Problem with rejection sampling: If evidence is unlikely, you reject a lot of samples You don't exploit your evidence as you sample Consider P(B | a) Idea: fix evidence variables and sample the rest Problem: sample distribution not consistent! Solution: weight by probability of evidence given parents (figure: the Burglary -> Alarm network)

Likelihood Sampling (figure: the Cloudy / Sprinkler / Rain / WetGrass network)

Likelihood Weighting Sampling distribution if z sampled and e fixed evidence: S_WS(z, e) = Π_i P(zi | Parents(Zi)) Now, samples have weights: w(z, e) = Π_j P(ej | Parents(Ej)) Together, the weighted sampling distribution is consistent: S_WS(z, e) · w(z, e) = Π_i P(zi | Parents(Zi)) · Π_j P(ej | Parents(Ej)) = P(z, e)
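
A sketch of likelihood weighting for the query P(C | +s, +w) on the same assumed network: evidence variables are fixed rather than sampled, and each sample's weight multiplies in P(evidence value | parents) exactly as in the weight formula above.

```python
import random

# Likelihood-weighting sketch for P(C | +s, +w); same assumed CPTs as above.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}

def weighted_sample():
    w = 1.0
    c = random.random() < P_C       # sample non-evidence variable C
    w *= P_S[c]                     # evidence S = +s: multiply in P(+s | c)
    r = random.random() < P_R[c]    # sample non-evidence variable R
    w *= P_W[(True, r)]             # evidence W = +w: multiply in P(+w | +s, r)
    return c, w

totals = {True: 0.0, False: 0.0}
for _ in range(100_000):
    c, w = weighted_sample()
    totals[c] += w
z = sum(totals.values())
print({c: t / z for c, t in totals.items()})
# true P(+c | +s, +w) is roughly 0.17 with these CPTs
```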

Likelihood Weighting Note that likelihood weighting doesn't solve all our problems Rare evidence is taken into account for downstream variables, but not upstream ones A better solution is Markov-chain Monte Carlo (MCMC), more advanced We'll return to sampling for robot localization and tracking in dynamic BNs

Decision Networks MEU: choose the action which maximizes the expected utility given the evidence Can directly operationalize this with decision diagrams Bayes nets with nodes for utility and actions Lets us calculate the expected utility for each action New node types: Chance nodes (just like BNs) Actions (rectangles, must be parents, act as observed evidence) Utilities (depend on action and chance nodes) (figure: decision network with Weather, Report, the Umbrella action, and utility U)

Decision Networks Action selection: Instantiate all evidence Calculate the posterior over the parents of the utility node Set the action node each possible way Calculate the expected utility for each action Choose the maximizing action (figure: the same Weather / Report / Umbrella / U network)

Example: Decision Networks (figure: decision network with the Umbrella action, chance node Weather, and utility U)
A      W     U(A, W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70
W     P(W)
sun   0.7
rain  0.3
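
A quick check of the expected utilities from the table above (the leave/rain utility of 0 is assumed, as in the usual version of this example):

```python
# Expected utility of each action under P(W), then the MEU action.
P_W = {"sun": 0.7, "rain": 0.3}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

def eu(action):
    return sum(P_W[w] * U[(action, w)] for w in P_W)

for a in ("leave", "take"):
    print(a, eu(a))                                     # EU(leave) = 70, EU(take) = 35
print("MEU action:", max(("leave", "take"), key=eu))    # leave
```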

Example: Decision Networks (figure: the same network with the Report node added)
A      W     U(A, W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70
W     P(W)
sun   0.7
rain  0.3
R       P(R | sun)
clear   0.5
cloudy  0.5
R       P(R | rain)
clear   0.2
cloudy  0.8
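
A sketch of the action-selection recipe from the earlier slide, run on these numbers (the flattened report rows are read as P(R | W), and the same assumed utility table is reused): instantiate the report, compute the posterior over Weather, then compare expected utilities.

```python
# Action selection with the Report observed, using the tables above.
P_W = {"sun": 0.7, "rain": 0.3}
P_R_given_W = {("clear", "sun"): 0.5, ("cloudy", "sun"): 0.5,
               ("clear", "rain"): 0.2, ("cloudy", "rain"): 0.8}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

def posterior_weather(report):
    unnorm = {w: P_W[w] * P_R_given_W[(report, w)] for w in P_W}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

def best_action(report):
    post = posterior_weather(report)
    eus = {a: sum(post[w] * U[(a, w)] for w in post) for a in ("leave", "take")}
    return max(eus, key=eus.get), eus

for r in ("clear", "cloudy"):
    print(r, posterior_weather(r), best_action(r))
# with these numbers, "leave" stays the best action for either report value
```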

Value of Information Idea: compute value of acquiring each possible piece of evidence Can be done directly from decision network Example: buying oil drilling rights Two blocks A and B, exactly one has oil, worth k Prior probabilities 0.5 each, mutually exclusive Current price of each block is k/2 Probe gives accurate survey of A. Fair price? Solution: compute value of information = expected value of best action given the information minus expected value of best action without information Survey may say "oil in A" or "no oil in A," prob 0.5 each = [0.5 * value of "buy A" given "oil in A"] + [0.5 * value of "buy B" given "no oil in A"] - 0 = [0.5 * k/2] + [0.5 * k/2] - 0 = k/2 (figure: decision network with the DrillLoc action, chance node OilLoc, and utility U)

General Formula Current evidence E = e, possible utility inputs s Potential new evidence E': suppose we knew E' = e' BUT E' is a random variable whose value is currently unknown, so: Must compute the expected gain over all possible values e': VPI_e(E') = [ Σ_{e'} P(e' | e) · MEU(e, e') ] - MEU(e) (VPI = value of perfect information)
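
A sketch of this formula applied to the umbrella network, reusing the same assumed tables as in the decision-network sketches above. With these particular numbers the report never changes the chosen action, so the computed VPI comes out to (essentially) zero, which is the first of the scenarios on the VPI Scenarios slide later on.

```python
# VPI(Report) = sum_r P(r) * MEU(given r)  -  MEU(no evidence), on the
# umbrella example with the assumed tables from the sketches above.
P_W = {"sun": 0.7, "rain": 0.3}
P_R_given_W = {("clear", "sun"): 0.5, ("cloudy", "sun"): 0.5,
               ("clear", "rain"): 0.2, ("cloudy", "rain"): 0.8}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}
ACTIONS = ("leave", "take")

def meu(belief):
    # Maximum expected utility under a belief over Weather.
    return max(sum(belief[w] * U[(a, w)] for w in belief) for a in ACTIONS)

def vpi_report():
    base = meu(P_W)                                    # MEU with no evidence
    expected = 0.0
    for r in ("clear", "cloudy"):
        p_r = sum(P_W[w] * P_R_given_W[(r, w)] for w in P_W)
        post = {w: P_W[w] * P_R_given_W[(r, w)] / p_r for w in P_W}
        expected += p_r * meu(post)                    # expected MEU after seeing r
    return expected - base

print(vpi_report())   # nonnegative; essentially 0 here (the best action never changes)
```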

VPI Properties Nonnegative in expectation Nonadditive (consider, e.g., obtaining Ej twice) Order-independent

VPI Example (figure: the Weather / Report / Umbrella decision network with utility U)

VPI Scenarios Imagine actions 1 and 2, for which U1 > U2 How much will information about Ej be worth? Little – we’re sure action 1 is better. A lot – either could be much better Little – info likely to change our action but not our utility