CS 188: Artificial Intelligence Fall 2009 Lecture 16: Bayes’ Nets III – Inference 10/20/2009 Dan Klein – UC Berkeley

Announcements
• Midterm THURSDAY EVENING
• 6-9pm in 155 Dwinelle
• One page (2 sides) of notes & basic calculator ok
• Review session tonight, 6pm, 145 Dwinelle
• No section this week
• GSI office hours on the midterm prep page
• Assignments:
• P2 in glookup (let us know if there are problems)
• W3, P4 not until after the midterm

Project 2 Mini-Contest!
• Honorable mentions:
• Arpad Kovacs, Alan Young (8 wins, average 2171)
• Eui Shin (7 wins, average 2053)
• Katie Bedrosian, Edward Lin (8 wins, average 2002)
• 2nd runner up: Eric Chang, Peter Currier (5 wins, average 2445)
• Depth-3 alpha-beta with a complex evaluation function that tried hard to eat ghosts. On games they won, their score was very good.
• 1st runner up: Nils Reimers (8 wins, average score 2471)
• Expectimax with adjustable depth and an extremely conservative evaluation function (losing had -inf utility and winning had +inf), so the agent was very good at consistently winning.
• Winner: James Ide, Xingyan Liu (7 wins, average score 3044)
• Proper treatment of the ghosts’ chance nodes, plus an elaborate evaluation function to detect whether Pacman was trapped by ghosts and whether to go for a power pellet or run. [DEMO]

Inference
• Inference: calculating some useful quantity from a joint probability distribution
• Examples:
• Posterior probability: P(Q | E1 = e1, …, Ek = ek)
• Most likely explanation: argmax_q P(Q = q | E1 = e1, …, Ek = ek)
[Figure: Bayes’ net with nodes B, E, A, J, M]

Inference by Enumeration
• Given unlimited time, inference in BNs is easy
• Recipe:
• State the marginal probabilities you need
• Figure out ALL the atomic probabilities you need
• Calculate and combine them
• Example:
[Figure: Bayes’ net with nodes B, E, A, J, M]

Example: Enumeration
• In this simple method, we only need the BN to synthesize the joint entries (a small worked sketch follows below)
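
For concreteness, here is a minimal, hedged sketch of inference by enumeration in Python. It uses the small rain/traffic/late network introduced on the traffic-domain slide below, so the CPT numbers are the ones from that slide; the function name p_L_by_enumeration is illustrative, not from the lecture.

```python
# Inference by enumeration for P(L) in the small rain/traffic/late net.
# CPT numbers taken from the traffic-domain slide later in this deck.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

def p_L_by_enumeration():
    """Build every atomic entry P(r, t, l), then sum out r and t."""
    answer = {'+l': 0.0, '-l': 0.0}
    for r in ('+r', '-r'):
        for t in ('+t', '-t'):
            for l in ('+l', '-l'):
                joint = P_R[r] * P_T_given_R[(r, t)] * P_L_given_T[(t, l)]
                answer[l] += joint
    return answer

print(p_L_by_enumeration())   # ≈ {'+l': 0.134, '-l': 0.866}
```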

Inference by Enumeration?

Variable Elimination
• Why is inference by enumeration so slow?
• You join up the whole joint distribution before you sum out the hidden variables
• You end up repeating a lot of work!
• Idea: interleave joining and marginalizing!
• Called “Variable Elimination”
• Still NP-hard, but usually much faster than inference by enumeration
• We’ll need some new notation to define VE

Factor Zoo I
• Joint distribution: P(X, Y)
• Entries P(x, y) for all x, y
• Sums to 1
• Example, P(T, W):
    T     W     P
    hot   sun   0.4
    hot   rain  0.1
    cold  sun   0.2
    cold  rain  0.3
• Selected joint: P(x, Y)
• A slice of the joint distribution
• Entries P(x, y) for fixed x, all y
• Sums to P(x)
• Example, P(cold, W):
    T     W     P
    cold  sun   0.2
    cold  rain  0.3

Factor Zoo II
• Family of conditionals: P(X | Y)
• Multiple conditionals
• Entries P(x | y) for all x, y
• Sums to |Y|
• Example, P(W | T):
    T     W     P
    hot   sun   0.8
    hot   rain  0.2
    cold  sun   0.4
    cold  rain  0.6
• Single conditional: P(Y | x)
• Entries P(y | x) for fixed x, all y
• Sums to 1
• Example, P(W | cold):
    T     W     P
    cold  sun   0.4
    cold  rain  0.6

Factor Zoo III
• Specified family: P(y | X)
• Entries P(y | x) for fixed y, but for all x
• Sums to … who knows!
• Example, P(rain | T):
    T     W     P
    hot   rain  0.2
    cold  rain  0.6
• In general, when we write P(Y1 … YN | X1 … XM)
• It is a “factor,” a multi-dimensional array
• Its values are all P(y1 … yN | x1 … xM)
• Any assigned X or Y is a dimension missing (selected) from the array
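
A quick numeric check of the “sums to” claims in the Factor Zoo slides, using the temperature/weather tables shown above (a standalone Python sketch; the dict names are mine):

```python
# The temperature/weather tables from the Factor Zoo slides,
# stored as plain dicts keyed by (t, w). Names are illustrative.
P_TW = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,          # joint P(T, W)
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}
P_W_given_T = {('hot', 'sun'): 0.8, ('hot', 'rain'): 0.2,   # family P(W | T)
               ('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6}

print(sum(P_TW.values()))                                   # joint sums to ≈ 1
print(sum(p for (t, w), p in P_TW.items() if t == 'cold'))  # selected joint sums to P(cold) = 0.5
print(sum(P_W_given_T.values()))                            # family of conditionals sums to |T| ≈ 2
print(sum(p for (t, w), p in P_W_given_T.items()
          if t == 'cold'))                                  # single conditional sums to ≈ 1
```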

Example: Traffic Domain
• Random Variables
• R: Raining
• T: Traffic
• L: Late for class!
• First query: P(L)
[Figure: chain-structured network R → T → L]
• P(R):        +r 0.1   -r 0.9
• P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
• P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9

Variable Elimination Outline
• Track objects called factors
• Initial factors are local CPTs (one per node):
    P(R):        +r 0.1   -r 0.9
    P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
• Any known values are selected
• E.g. if we know L = +l, the initial factors are:
    P(R):        +r 0.1   -r 0.9
    P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
    P(+l | T):   +t +l 0.3   -t +l 0.1
• VE: Alternately join factors and eliminate variables
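
One concrete (and unofficial) way to picture this bookkeeping in Python: a factor is a tuple of variable names plus a table from value-tuples to numbers. The helper names make_factor and restrict are mine, not part of the lecture; a minimal sketch:

```python
# A factor = (variables, table): variables is a tuple of names,
# table maps a tuple of values (one per variable) to a number.
def make_factor(variables, table):
    return (tuple(variables), dict(table))

# Initial factors: one local CPT per node of the traffic net.
f_R  = make_factor(('R',), {('+r',): 0.1, ('-r',): 0.9})
f_TR = make_factor(('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                                ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
f_LT = make_factor(('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                                ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})

def restrict(factor, var, value):
    """Select only the rows consistent with observed evidence var = value."""
    variables, table = factor
    i = variables.index(var)
    return (variables, {k: p for k, p in table.items() if k[i] == value})

# If we know L = +l, the last factor keeps only its +l rows:
print(restrict(f_LT, 'L', '+l')[1])   # {('+t', '+l'): 0.3, ('-t', '+l'): 0.1}
```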

Operation 1: Join Factors
• First basic operation: joining factors
• Combining factors:
• Just like a database join
• Get all factors over the joining variable
• Build a new factor over the union of the variables involved
• Example: Join on R
    P(R):        +r 0.1   -r 0.9
    P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
    → P(R, T):   +r +t 0.08   +r -t 0.02   -r +t 0.09   -r -t 0.81
• Computation for each entry: pointwise products, e.g. P(+r, +t) = P(+r) · P(+t | +r) = 0.1 × 0.8 = 0.08

Operation 1: Join Factors
• In general, we join on a variable
• Take all factors mentioning that variable
• Join them all together with pointwise products
• Result is P(all LHS vars | all non-LHS vars)
• Leave other factors alone
• Example: Join on T
    P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
    → P(T, L | R):
      +r +t +l 0.24   +r +t -l 0.56   +r -t +l 0.02   +r -t -l 0.18
      -r +t +l 0.03   -r +t -l 0.07   -r -t +l 0.09   -r -t -l 0.81
    P(R) does not mention T, so it is left alone:   +r 0.1   -r 0.9
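
A minimal Python sketch of the pointwise-product join described above, standalone, with the two relevant traffic-net CPTs re-declared inline (the function name join is illustrative):

```python
# Factors are (variables, table) pairs: a tuple of variable names and a
# dict from value-tuples to numbers, as in the earlier sketch.
f_R  = (('R',), {('+r',): 0.1, ('-r',): 0.9})
f_TR = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                     ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})

def join(f1, f2):
    """Database-style join: a new factor over the union of the variables,
    each entry a pointwise product of the matching rows of f1 and f2."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out = {}
    for k1, p1 in t1.items():
        a1 = dict(zip(vars1, k1))
        for k2, p2 in t2.items():
            a2 = dict(zip(vars2, k2))
            if all(a1[v] == a2[v] for v in a1 if v in a2):  # rows must agree on shared vars
                merged = {**a1, **a2}
                out[tuple(merged[v] for v in out_vars)] = p1 * p2
    return (out_vars, out)

# Join on R: P(R) and P(T | R) combine into a factor over (R, T).
print(join(f_R, f_TR)[1])
# ≈ {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```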

Example: Multiple Joins
• Start with the three local CPTs:
    P(R):        +r 0.1   -r 0.9
    P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
• Join R:
    P(R, T):     +r +t 0.08   +r -t 0.02   -r +t 0.09   -r -t 0.81
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9

Example: Multiple Joins
• Join T:
    P(R, T):     +r +t 0.08   +r -t 0.02   -r +t 0.09   -r -t 0.81
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
    → P(R, T, L):
      +r +t +l 0.024   +r +t -l 0.056   +r -t +l 0.002   +r -t -l 0.018
      -r +t +l 0.027   -r +t -l 0.063   -r -t +l 0.081   -r -t -l 0.729

Operation 2: Eliminate
• Second basic operation: marginalization
• Take a factor and sum out a variable
• Shrinks a factor to a smaller one
• A projection operation
• Example: sum R out of P(R, T)
    P(R, T):   +r +t 0.08   +r -t 0.02   -r +t 0.09   -r -t 0.81
    → P(T):    +t 0.17   -t 0.83
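
A matching sketch of the sum-out (marginalization) operation, standalone, with the P(R, T) factor from the example above declared inline (sum_out is an illustrative name):

```python
# A factor is (variables, table), as in the earlier sketches.
f_RT = (('R', 'T'), {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
                     ('-r', '+t'): 0.09, ('-r', '-t'): 0.81})

def sum_out(factor, var):
    """Marginalize var away: drop its column and add up the rows that
    agree on everything else (a projection of the table)."""
    variables, table = factor
    i = variables.index(var)
    out_vars = variables[:i] + variables[i + 1:]
    out = {}
    for key, p in table.items():
        new_key = key[:i] + key[i + 1:]
        out[new_key] = out.get(new_key, 0.0) + p
    return (out_vars, out)

# Summing R out of P(R, T) leaves P(T).
print(sum_out(f_RT, 'R')[1])   # ≈ {('+t',): 0.17, ('-t',): 0.83}
```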

Multiple Elimination
• Sum out R:
    P(R, T, L):
      +r +t +l 0.024   +r +t -l 0.056   +r -t +l 0.002   +r -t -l 0.018
      -r +t +l 0.027   -r +t -l 0.063   -r -t +l 0.081   -r -t -l 0.729
    → P(T, L):   +t +l 0.051   +t -l 0.119   -t +l 0.083   -t -l 0.747
• Sum out T:
    → P(L):      +l 0.134   -l 0.866

P(L): Marginalizing Early!
• Start with the local CPTs:
    P(R):        +r 0.1   -r 0.9
    P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
• Join R:
    P(R, T):     +r +t 0.08   +r -t 0.02   -r +t 0.09   -r -t 0.81
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
• Sum out R:
    P(T):        +t 0.17   -t 0.83
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9

Marginalizing Early (aka VE*)
• * VE is variable elimination
• Join T:
    P(T):        +t 0.17   -t 0.83
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
    → P(T, L):   +t +l 0.051   +t -l 0.119   -t +l 0.083   -t -l 0.747
• Sum out T:
    → P(L):      +l 0.134   -l 0.866
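
The same join-T / sum-out-T steps written out as a few lines of plain Python (the dict names are mine; the numbers are the ones on this slide):

```python
# Early marginalization: P(T) was obtained by joining on R and summing R out.
P_T = {'+t': 0.17, '-t': 0.83}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Join T: pointwise products give P(T, L) ...
P_TL = {(t, l): P_T[t] * P_L_given_T[(t, l)] for (t, l) in P_L_given_T}
# ... then sum out T to get P(L).
P_L = {}
for (t, l), p in P_TL.items():
    P_L[l] = P_L.get(l, 0.0) + p

print(P_TL)  # ≈ {('+t','+l'): 0.051, ('+t','-l'): 0.119, ('-t','+l'): 0.083, ('-t','-l'): 0.747}
print(P_L)   # ≈ {'+l': 0.134, '-l': 0.866}
```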

Evidence
• If evidence, start with factors that select that evidence
• No evidence uses these initial factors:
    P(R):        +r 0.1   -r 0.9
    P(T | R):    +r +t 0.8   +r -t 0.2   -r +t 0.1   -r -t 0.9
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
• Computing P(L | +r), the initial factors become:
    P(+r):       +r 0.1
    P(T | +r):   +r +t 0.8   +r -t 0.2
    P(L | T):    +t +l 0.3   +t -l 0.7   -t +l 0.1   -t -l 0.9
• We eliminate all vars other than query + evidence

Evidence II
• Result will be a selected joint of query and evidence
• E.g. for P(L | +r), we’d end up with:
    P(+r, L):    +r +l 0.026   +r -l 0.074
• To get our answer, just normalize this!
    Normalize →  P(L | +r):   +l 0.26   -l 0.74
• That’s it!
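
A tiny standalone sketch of that final normalization step (the names are mine; numbers from the slide above):

```python
# Unnormalized selected joint P(+r, L) from the slide above.
unnormalized = {'+l': 0.026, '-l': 0.074}

# Normalizing just divides by the total mass, which here equals P(+r) = 0.1.
z = sum(unnormalized.values())
posterior = {l: p / z for l, p in unnormalized.items()}
print(posterior)   # ≈ {'+l': 0.26, '-l': 0.74}
```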

General Variable Elimination
• Query: P(Q | E1 = e1, …, Ek = ek)
• Start with initial factors:
• Local CPTs (but instantiated by evidence)
• While there are still hidden variables (not Q or evidence):
• Pick a hidden variable H
• Join all factors mentioning H
• Eliminate (sum out) H
• Join all remaining factors and normalize
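
Putting the pieces together, here is a standalone, minimal sketch of the whole loop in Python, run on the traffic network with the query P(L | +r) from the evidence slides. All function names are illustrative, and the elimination order is simply "whatever hidden variable comes next", which is fine for this tiny net:

```python
# Factors are (variables, table): a tuple of variable names plus a dict
# mapping value-tuples to numbers.

def restrict(factor, var, value):
    """Keep only the rows consistent with evidence var = value."""
    variables, table = factor
    if var not in variables:
        return factor
    i = variables.index(var)
    return (variables, {k: p for k, p in table.items() if k[i] == value})

def join(f1, f2):
    """Pointwise-product join over the union of the two variable sets."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out = {}
    for k1, p1 in t1.items():
        a1 = dict(zip(vars1, k1))
        for k2, p2 in t2.items():
            a2 = dict(zip(vars2, k2))
            if all(a1[v] == a2[v] for v in a1 if v in a2):
                merged = {**a1, **a2}
                out[tuple(merged[v] for v in out_vars)] = p1 * p2
    return (out_vars, out)

def sum_out(factor, var):
    """Marginalize var out of a factor."""
    variables, table = factor
    i = variables.index(var)
    out_vars = variables[:i] + variables[i + 1:]
    out = {}
    for k, p in table.items():
        nk = k[:i] + k[i + 1:]
        out[nk] = out.get(nk, 0.0) + p
    return (out_vars, out)

def variable_elimination(factors, query, evidence):
    """General VE: instantiate evidence, repeatedly join-and-sum-out each
    hidden variable, then join what is left and normalize."""
    # Local CPTs, instantiated by evidence.
    for var, val in evidence.items():
        factors = [restrict(f, var, val) for f in factors]
    # Hidden variables: everything that is neither query nor evidence.
    all_vars = {v for f in factors for v in f[0]}
    hidden = all_vars - {query} - set(evidence)
    for h in hidden:
        mentioning = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)
        factors = rest + [sum_out(joined, h)]
    # Join all remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result[1].values())
    return (result[0], {k: p / z for k, p in result[1].items()})

# The traffic network R -> T -> L, with the CPTs from the earlier slides.
cpts = [
    (('R',), {('+r',): 0.1, ('-r',): 0.9}),
    (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                  ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}),
    (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                  ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}),
]

print(variable_elimination(cpts, 'L', {'R': '+r'}))
# ≈ (('R', 'L'), {('+r', '+l'): 0.26, ('+r', '-l'): 0.74})
```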

Variable Elimination Bayes’ Rule
• Start / Select: the factors P(B) and P(A | B), with evidence A = +a selected
    P(B):        +b 0.1   -b 0.9
    P(A | B):    +b +a 0.8   +b -a 0.2   -b +a 0.1   -b -a 0.9
• Join on B (using only the +a rows):
    P(+a, B):    +a +b 0.08   +a -b 0.09
• Normalize:
    P(B | +a):   +b 8/17   -b 9/17
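
The same two steps as a few lines of standalone Python (numbers from the slide; the names are mine):

```python
# Bayes' rule recovered as a join followed by a normalize.
P_B = {'+b': 0.1, '-b': 0.9}
P_a_given_B = {'+b': 0.8, '-b': 0.1}                # the +a rows of P(A | B)

joint = {b: P_B[b] * P_a_given_B[b] for b in P_B}   # join on B: P(+a, B)
z = sum(joint.values())                             # = P(+a) = 0.17
posterior = {b: p / z for b, p in joint.items()}    # P(B | +a)

print(joint)       # ≈ {'+b': 0.08, '-b': 0.09}
print(posterior)   # ≈ {'+b': 0.47, '-b': 0.53}  (i.e. 8/17 and 9/17)
```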

Example
• Choose A: join all factors mentioning A, then sum A out

Example
• Choose E: join all factors mentioning E, then sum E out
• Finish with B: join the remaining factors
• Normalize to get the posterior

Variable Elimination
• What you need to know:
• Should be able to run it on small examples, understand the factor creation / reduction flow
• Better than enumeration: saves time by marginalizing variables as soon as possible rather than at the end
• We will see special cases of VE later
• On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs
• You’ll have to implement a tree-structured special case to track invisible ghosts (Project 4)