CS 541: Artificial Intelligence Lecture VII: Inference in Bayesian Networks

Schedule
Week 1 (08/30/2011): Introduction & Intelligent Agent. Reading: Ch 1 & 2. Homework: N/A
Week 2 (09/06/2011): Search: search strategy and heuristic search. Reading: Ch 3 & 4s. Homework: HW1 (Search)
Week 3 (09/13/2011): Search: Constraint Satisfaction & Adversarial Search. Reading: Ch 4s, 5 & 6. Homework: Teaming due
Week 4 (09/20/2011): Logic: Logic Agent & First Order Logic. Reading: Ch 7 & 8s. Homework: HW1 due, Midterm Project (Game)
Week 5 (09/27/2011): Logic: Inference on First Order Logic. Reading: Ch 8s & 9
Week 6 (10/04/2011): Uncertainty and Bayesian Network. Reading: Ch 13 & 14s. Homework: HW2 (Logic)
10/11/2011: No class (functions as Monday)
Week 7 (10/18/2011): Midterm Presentation. Homework: Midterm Project due
Week 8 (10/25/2011): Inference in Bayesian Network. Reading: Ch 14s. Homework: HW2 due, HW3 (Probabilistic Reasoning)
Week 9 (11/01/2011): Probabilistic Reasoning over Time. Reading: Ch 15
Week 10 (11/08/2011): TA Session: Review of HW1 & 2. Homework: HW3 due, HW4 (MDP)
Week 11 (11/15/2011): Learning: Decision Tree & SVM. Reading: Ch 18
Week 12 (11/22/2011): Statistical Learning: Introduction, Naive Bayesian & Perceptron. Reading: Ch 20. Homework: HW4 due
Week 13 (11/29/2011): Markov Decision Process & Reinforcement Learning. Reading: Ch 16s & 21
Week 14 (12/06/2011): Final Project Presentation. Homework: Final Project due
TA: Vishakha Sharma. Email: vsharma1@stevens.edu

Re-cap of Lecture VI – Part I
Probability is a rigorous formalism for uncertain knowledge.
The joint probability distribution specifies the probability of every atomic event.
Queries can be answered by summing over atomic events.
For nontrivial domains, we must find a way to reduce the size of the joint.
Independence and conditional independence provide the tools.

Re-cap of Lecture VI – Part II
Bayes nets provide a natural representation for (causally induced) conditional independence.
Topology + CPTs = compact representation of the joint distribution.
Generally easy for (non)experts to construct.
Canonical distributions (e.g., noisy-OR) give a compact representation of CPTs.
Continuous variables can be handled with parameterized distributions (e.g., linear Gaussian).

Outline
Exact inference by enumeration
Exact inference by variable elimination
Approximate inference by stochastic simulation
Approximate inference by Markov chain Monte Carlo (MCMC)

Exact inference by enumeration

Inference tasks
Simple queries: compute the posterior marginal P(Xi | E=e), e.g., P(NoGas | Gauge=empty, Lights=on, Starts=false).
Conjunctive queries: P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e).
Optimal decisions: decision networks include utility information; probabilistic inference is required for P(outcome | action, evidence).
Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical?
Explanation: why do I need a new starter motor?

Inference by enumeration
A slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.
Simple query on the burglary network: P(B | j, m) = P(B, j, m) / P(j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m).
Rewrite full joint entries using products of CPT entries: P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a).
Recursive depth-first enumeration: O(n) space, O(d^n) time.

Enumeration algorithm
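The enumeration algorithm itself appears on this slide only as a figure that is not reproduced here. Below is a minimal Python sketch of the same idea, specialized to the burglary network; it simply sums the product of CPT entries over the hidden variables rather than using the recursive depth-first formulation, and the CPT values and function names are assumptions taken from the standard textbook example, not the course's own code.

```python
# Minimal enumeration-based inference on the burglary network (a sketch).
# CPTs are encoded as dictionaries keyed by the values of each node's parents.

P_B = {True: 0.001}                       # P(Burglary)
P_E = {True: 0.002}                       # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}           # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}           # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Product of CPT entries for one complete assignment."""
    p = P_B[True] if b else 1 - P_B[True]
    p *= P_E[True] if e else 1 - P_E[True]
    pa = P_A[(b, e)]
    p *= pa if a else 1 - pa
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

def enumerate_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m) by summing out E and A."""
    dist = {}
    for b in (True, False):
        dist[b] = sum(joint(b, e, a, j, m)
                      for e in (True, False) for a in (True, False))
    z = sum(dist.values())                # normalization constant (alpha)
    return {b: p / z for b, p in dist.items()}

print(enumerate_burglary(True, True))     # ~{True: 0.284, False: 0.716}
```

Running it gives roughly P(Burglary | j, m) ≈ ⟨0.284, 0.716⟩, the value usually quoted for this query.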

Evaluation tree
Enumeration is inefficient because of repeated computation: e.g., it recomputes P(j | a) P(m | a) for every value of e.

Inference by variable elimination

Inference by variable elimination Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation

Variable elimination: basic operations
Summing out a variable from a product of factors: move any constant factors outside the summation, then add up submatrices in the pointwise product of the remaining factors:
Σ_x f_1 × ... × f_k = f_1 × ... × f_i × (Σ_x f_{i+1} × ... × f_k) = f_1 × ... × f_i × f_X̄,
assuming f_1, ..., f_i do not depend on X.
Pointwise product of factors f_1 and f_2: f_1(x, y) × f_2(y, z) = f(x, y, z).
E.g., f_1(a, b) × f_2(b, c) = f(a, b, c).

Variable elimination algorithm
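As with the enumeration slide, the elimination pseudocode appears only as a figure. The sketch below implements the two basic operations from the previous slide (pointwise product and summing out a variable) over factors represented as dictionaries, plus a simple elimination loop; the factor encoding and helper names are illustrative assumptions, and no particular elimination-ordering heuristic is used.

```python
from itertools import product

# Factor-based variable elimination sketch (not the course's reference code).
# A factor is a pair (variables, table): variables is a tuple of names and
# table maps a tuple of True/False values, one per variable, to a probability.

def pointwise_product(f1, f2):
    """Multiply two factors, e.g. f1(A,B) x f2(B,C) -> f(A,B,C)."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for values in product((True, False), repeat=len(out_vars)):
        assignment = dict(zip(out_vars, values))
        k1 = tuple(assignment[v] for v in vars1)
        k2 = tuple(assignment[v] for v in vars2)
        table[values] = t1[k1] * t2[k2]
    return out_vars, table

def sum_out(var, factor):
    """Sum a variable out of a factor, producing a smaller factor."""
    vars_, t = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for key, p in t.items():
        new_key = key[:i] + key[i + 1:]
        table[new_key] = table.get(new_key, 0.0) + p
    return out_vars, table

def eliminate(factors, hidden_vars):
    """For each hidden variable, multiply the factors mentioning it,
    then sum it out; finally multiply whatever remains."""
    for var in hidden_vars:
        related = [f for f in factors if var in f[0]]
        if not related:
            continue
        rest = [f for f in factors if var not in f[0]]
        prod = related[0]
        for f in related[1:]:
            prod = pointwise_product(prod, f)
        factors = rest + [sum_out(var, prod)]
    result = factors[0]
    for f in factors[1:]:
        result = pointwise_product(result, f)
    return result
```

For a query such as P(B | j, m), one would build one factor per CPT with the evidence values already fixed, call eliminate over the hidden variables (here E and A), and normalize the resulting factor.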

Irrelevant variables
Consider the query P(JohnCalls | Burglary=true): P(J | b) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(J | a) Σ_m P(m | a).
The sum over m is identically 1, so MaryCalls is irrelevant to the query.
Theorem 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E).
Here X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake}, so MaryCalls is irrelevant.
(Compare this to backward chaining from the query in Horn clause KBs.)

Irrelevant variables
Definition: the moral graph of a Bayesian network is obtained by marrying all parents of each node and dropping the arrow directions.
Definition: A is m-separated from B by C iff A and B are separated by C in the moral graph.
Theorem 2: Y is irrelevant if it is m-separated from X by E.
For P(JohnCalls | Alarm=true), both Burglary and Earthquake are irrelevant.
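One way to make Theorem 2 concrete is to build the moral graph and run an ordinary graph search that is not allowed to pass through evidence nodes. The helper below is a hypothetical sketch (plain breadth-first search over an adjacency-set encoding; node names are illustrative), not part of the course material.

```python
from collections import deque

def moral_graph(parents):
    """parents: dict node -> list of parents; every node must appear as a key.
    Returns an undirected adjacency map of the moral graph."""
    adj = {n: set() for n in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p); adj[p].add(child)              # original edges, undirected
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])  # "marry" the parents
    return adj

def m_separated(adj, a, b, evidence):
    """True iff every path from a to b in the moral graph passes through evidence."""
    if a in evidence or b in evidence:
        return True
    seen, frontier = {a}, deque([a])
    while frontier:
        n = frontier.popleft()
        for nb in adj[n]:
            if nb == b:
                return False
            if nb not in seen and nb not in evidence:
                seen.add(nb)
                frontier.append(nb)
    return True

# Burglary network: JohnCalls is m-separated from Burglary by {Alarm},
# so Burglary is irrelevant to P(JohnCalls | Alarm=true).
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(m_separated(moral_graph(parents), "J", "B", {"A"}))   # True
```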

Complexity of exact inference
Singly connected networks (polytrees): any two nodes are connected by at most one (undirected) path; the time and space cost of variable elimination is O(d^k n).
Multiply connected networks: 3SAT can be reduced to exact inference, so it is NP-hard; it is equivalent to counting 3SAT models, so it is #P-complete.

Inference by stochastic simulation

Inference by stochastic simulation
Basic idea: draw N samples from a sampling distribution S, compute an approximate posterior probability, and show that this converges to the true probability P.
Outline:
Direct sampling from a Bayesian network
Rejection sampling: reject samples disagreeing with the evidence
Likelihood weighting: use the evidence to weight samples
Markov chain Monte Carlo (MCMC): sample from a stochastic process whose stationary distribution is the true posterior

Direct sampling

Example (a sequence of slides steps through direct sampling on the Cloudy / Sprinkler / Rain / WetGrass network, one variable at a time; the figures are not reproduced here)

Direct sampling
The probability that PRIOR-SAMPLE generates a particular event is S_PS(x_1, ..., x_n) = Π_{i=1..n} P(x_i | parents(X_i)) = P(x_1, ..., x_n), i.e., the true prior probability.
E.g., on the sprinkler network, S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t).
Let N_PS(x_1, ..., x_n) be the number of samples generated for the event x_1, ..., x_n.
Then lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n) / N = S_PS(x_1, ..., x_n) = P(x_1, ..., x_n).
That is, estimates from PRIOR-SAMPLE are consistent.
Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1, ..., x_n).
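A minimal sketch of PRIOR-SAMPLE on the Cloudy/Sprinkler/Rain/WetGrass network, assuming the standard textbook CPT values; it illustrates the consistency claim above by checking that the empirical frequency of Rain=true approaches its prior.

```python
import random

# PRIOR-SAMPLE sketch for the sprinkler network (CPT values are assumptions
# taken from the standard textbook example).

def prior_sample():
    c = random.random() < 0.5                       # P(Cloudy)
    s = random.random() < (0.1 if c else 0.5)       # P(Sprinkler=true | Cloudy)
    r = random.random() < (0.8 if c else 0.2)       # P(Rain=true | Cloudy)
    w_prob = {(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.90, (False, False): 0.0}[(s, r)]
    w = random.random() < w_prob                    # P(WetGrass=true | Sprinkler, Rain)
    return c, s, r, w

# Consistency check: the fraction of samples with Rain=true should approach
# P(Rain=true) = 0.5*0.8 + 0.5*0.2 = 0.5 as N grows.
N = 100_000
samples = [prior_sample() for _ in range(N)]
print(sum(1 for _, _, r, _ in samples if r) / N)    # approximately 0.5
```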

Rejection sampling

Rejection sampling
P̂(X | e) is estimated from the samples that agree with the evidence e.
E.g., estimate P(Rain | Sprinkler=true) using 100 samples: 27 samples have Sprinkler=true; of these, 8 have Rain=true and 19 have Rain=false, so P̂(Rain | Sprinkler=true) = Normalize(⟨8, 19⟩) ≈ ⟨0.296, 0.704⟩.
Similar to a basic real-world empirical estimation procedure.
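A sketch of the same estimate by rejection sampling, again assuming the textbook CPTs: draw prior samples and simply discard those that disagree with the evidence Sprinkler=true.

```python
import random

def prior_sample():  # same sampler as in the direct-sampling sketch above
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.0}[(s, r)]
    return c, s, r, w

def rejection_estimate_rain_given_sprinkler(n):
    """Estimate P(Rain=true | Sprinkler=true) by discarding samples with Sprinkler=false."""
    kept = [r for c, s, r, w in (prior_sample() for _ in range(n)) if s]
    if not kept:
        return None                       # every sample was rejected
    return sum(kept) / len(kept)

print(rejection_estimate_rain_given_sprinkler(100_000))   # roughly 0.3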

Analysis of rejection sampling
P̂(X | e) = α N_PS(X, e) = N_PS(X, e) / N_PS(e) ≈ P(X, e) / P(e) = P(X | e).
Hence rejection sampling returns consistent posterior estimates.
Problem: it is hopelessly expensive if P(e) is small, and P(e) drops off exponentially with the number of evidence variables!

Likelihood weighting

Likelihood weighting
Idea: fix the evidence variables, sample only the nonevidence variables, and weight each sample by the likelihood it accords the evidence.
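A sketch of WEIGHTED-SAMPLE for the query P(Rain | Sprinkler=true, WetGrass=true), assuming the textbook sprinkler-network CPTs: the evidence variables are fixed rather than sampled, and each sample is weighted by the probability the evidence would have had given the sampled ancestors.

```python
import random

# Likelihood-weighting sketch for P(Rain | Sprinkler=true, WetGrass=true).
# CPT values follow the standard textbook example (an assumption here).

P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}      # P(WetGrass=true | S, R)

def weighted_sample():
    weight = 1.0
    cloudy = random.random() < 0.5                     # sampled from P(Cloudy)
    weight *= 0.1 if cloudy else 0.5                   # evidence: P(Sprinkler=true | cloudy)
    rain = random.random() < (0.8 if cloudy else 0.2)  # sampled from P(Rain | cloudy)
    weight *= P_W[(True, rain)]                        # evidence: P(WetGrass=true | s=true, rain)
    return rain, weight

def lw_estimate(n=100_000):
    num = den = 0.0
    for _ in range(n):
        rain, w = weighted_sample()
        num += w * rain
        den += w
    return num / den

print(lw_estimate())   # should approach the exact posterior, about 0.32
```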

Likelihood weighting example (a sequence of slides steps through WEIGHTED-SAMPLE on the same network, fixing the evidence and accumulating the weight; the figures are not reproduced here)

Likelihood weighting analysis
The sampling probability for WEIGHTED-SAMPLE is S_WS(z, e) = Π_{i=1..l} P(z_i | parents(Z_i)).
Note: it pays attention to evidence in the ancestors of each z_i only, so S_WS lies somewhere "between" the prior and the posterior distribution.
The weight for a given sample z, e is w(z, e) = Π_{i=1..m} P(e_i | parents(E_i)).
The weighted sampling probability is S_WS(z, e) w(z, e) = Π_i P(z_i | parents(Z_i)) Π_i P(e_i | parents(E_i)) = P(z, e).
Hence likelihood weighting returns consistent estimates, but performance still degrades with many evidence variables, because a few samples carry nearly all the total weight.

Markov chain Monte Carlo

Approximate inference using MCMC
The "state" of the network is the current assignment to all variables.
Generate the next state by sampling one variable given its Markov blanket.
Sample each variable in turn, keeping the evidence fixed.
(One can also choose a variable to sample at random each time.)

The Markov chain
With Sprinkler=true and WetGrass=true, there are four states (Cloudy and Rain each true or false).
Wander about for a while, then average what you see.

MCMC example
Estimate P(Rain | Sprinkler=true, WetGrass=true).
Sample Cloudy or Rain given its Markov blanket, and repeat; count the number of times Rain is true and false in the samples.
E.g., visit 100 states: 31 have Rain=true and 69 have Rain=false, so P̂(Rain | Sprinkler=true, WetGrass=true) = Normalize(⟨31, 69⟩) = ⟨0.31, 0.69⟩.
Theorem: the chain approaches its stationary distribution, i.e., the long-run fraction of time spent in each state is exactly proportional to its posterior probability.

Markov blanket sampling
The Markov blanket of Cloudy is {Sprinkler, Rain}; the Markov blanket of Rain is {Cloudy, Sprinkler, WetGrass}.
The probability of a variable given its Markov blanket is calculated as follows: P(x_i' | mb(X_i)) = α P(x_i' | parents(X_i)) × Π_{Z_j ∈ Children(X_i)} P(z_j | parents(Z_j)).
Easily implemented in message-passing parallel systems, and in brains.
Main computational problems: it is difficult to tell whether convergence has been achieved, and it can be wasteful if the Markov blanket is large, because P(X_i | mb(X_i)) won't change much (law of large numbers).
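A Gibbs-sampling sketch of the MCMC procedure above, using the Markov-blanket formula to resample Cloudy and Rain while the evidence stays fixed; the CPT values are the standard textbook numbers and are assumptions here.

```python
import random

# Gibbs sampling sketch for P(Rain | Sprinkler=true, WetGrass=true).
# Only Cloudy and Rain are resampled; the evidence is held fixed.

P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}     # P(WetGrass=true | S, R)

def sample_cloudy(rain):
    # P(c | mb) ∝ P(c) * P(Sprinkler=true | c) * P(rain | c)
    weights = {}
    for c in (True, False):
        p = 0.5
        p *= 0.1 if c else 0.5
        p *= (0.8 if c else 0.2) if rain else (0.2 if c else 0.8)
        weights[c] = p
    return random.random() < weights[True] / (weights[True] + weights[False])

def sample_rain(cloudy):
    # P(r | mb) ∝ P(r | cloudy) * P(WetGrass=true | Sprinkler=true, r)
    weights = {}
    for r in (True, False):
        p = (0.8 if cloudy else 0.2) if r else (0.2 if cloudy else 0.8)
        p *= P_W[(True, r)]
        weights[r] = p
    return random.random() < weights[True] / (weights[True] + weights[False])

def gibbs_rain(n=100_000, burn_in=1_000):
    cloudy, rain = True, True          # arbitrary initial state
    count = 0
    for t in range(n + burn_in):
        cloudy = sample_cloudy(rain)
        rain = sample_rain(cloudy)
        if t >= burn_in and rain:
            count += 1
    return count / n

print(gibbs_rain())   # converges toward the exact posterior, about 0.32
```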

Summary
Exact inference by variable elimination: polytime on polytrees, NP-hard on general graphs; space cost can be as large as time cost, and it is very sensitive to topology.
Approximate inference by LW and MCMC:
LW does poorly when there is lots of (downstream) evidence.
LW and MCMC are generally insensitive to topology.
Convergence can be very slow with probabilities close to 1 or 0.
Both can handle arbitrary combinations of discrete and continuous variables.

HW#3: Probabilistic reasoning
Mandatory:
Problem 14.1 (1 point)
Problem 14.4 (2 points)
Problem 14.5 (2 points)
Problem 14.21 (5 points). Programming is needed for question (e). You need to submit your executable along with the source code and a detailed readme file explaining how we can run it. You lose 3 points if there is no program.
Bonus:
Problem 14.19 (2 points)