Bayesian network inference


Bayesian network inference

Given:
- Query variables: X
- Evidence (observed) variables: E = e
- Unobserved variables: Y

Goal: compute some useful information about the query variables, such as
- the posterior P(X | e)
- the MAP estimate argmax_x P(x | e)

Recall: inference can always be done via the full joint distribution. Since Bayesian networks can afford exponential savings in the storage of joint distributions, can they afford similar savings for inference?
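For reference, "inference via the full joint distribution" means summing out the unobserved variables Y (standard textbook form, not transcribed from the slides):

```latex
P(X \mid \mathbf{e})
  = \frac{P(X, \mathbf{e})}{P(\mathbf{e})}
  = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y}),
  \qquad \alpha = 1 / P(\mathbf{e})
```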

Bayesian network inference

- In full generality, inference is NP-hard.
- More precisely, it is #P-hard: equivalent to counting satisfying assignments.
- We can reduce satisfiability to Bayesian network inference. Decision problem: is P(Y) > 0?
- [Figure: network encoding a CNF formula, with one node per clause C1, C2, C3]
- Reference: G. Cooper, 1990.

Inference example

Query: P(B | j, m) — the probability of a burglary given that both John and Mary call, in the standard burglary/alarm network. If we expand this query by enumerating the full joint distribution, are we doing any unnecessary work?
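As a worked illustration (the standard expansion for this textbook network, not taken verbatim from the slides), enumeration expands the query as

```latex
P(B \mid j, m)
  = \alpha \sum_{e} \sum_{a} P(B)\,P(e)\,P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)
  = \alpha\, P(B) \sum_{e} P(e) \sum_{a} P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)
```

Plain enumeration recomputes the products P(j | a) P(m | a) once for every value of e; those repeated sub-expressions are the unnecessary work, which the caching idea on the next slide removes.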

Exact inference

- Basic idea: compute the results of sub-expressions in a bottom-up way and cache them for later use.
- This is a form of dynamic programming.
- It has polynomial time and space complexity for polytrees.
- Polytree: a network with at most one undirected path between any two nodes.
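A minimal sketch of this caching idea on a hypothetical three-node chain A -> B -> C (the chain and all CPT numbers are made-up assumptions for illustration); the intermediate factor produced by summing out A is the cached sub-expression that gets reused:

```python
# Variable elimination on a hypothetical chain A -> B -> C.
# All CPT numbers below are made up for illustration.
p_a = {True: 0.3, False: 0.7}                     # P(A)
p_b_given_a = {True: {True: 0.8, False: 0.2},     # P(B | A), indexed as [a][b]
               False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.5, False: 0.5},     # P(C | B), indexed as [b][c]
               False: {True: 0.2, False: 0.8}}

# Eliminate A: f1(b) = sum_a P(a) * P(b | a).
# f1 is the cached sub-expression: computed once, reused for every value of C.
f1 = {b: sum(p_a[a] * p_b_given_a[a][b] for a in (True, False))
      for b in (True, False)}

# Eliminate B: P(c) = sum_b f1(b) * P(c | b).
p_c = {c: sum(f1[b] * p_c_given_b[b][c] for b in (True, False))
       for c in (True, False)}

print(p_c)  # approximately {True: 0.293, False: 0.707}; the two values sum to 1
```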

Representing people

Review: Bayesian network inference

- In general, inference is harder than satisfiability.
- Efficient exact inference via dynamic programming is possible for polytrees.
- In other practical cases, we must resort to approximate methods.

Approximate inference: sampling

- A Bayesian network is a generative model: it allows us to efficiently generate samples from the joint distribution.
- Algorithm for sampling from the joint distribution (see the sketch below):
  While not all variables are sampled:
    Pick a variable that is not yet sampled but whose parents are all sampled.
    Draw its value from P(X | parents(X)).
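A minimal sketch of this ancestral-sampling loop on a hypothetical sprinkler-style network (the structure, variable names, and CPT numbers below are illustrative assumptions, not taken from the slides):

```python
import random

# Hypothetical network: Cloudy -> Sprinkler, Cloudy -> Rain, (Sprinkler, Rain) -> WetGrass.
# All structure and CPT numbers are illustrative assumptions.

def bernoulli(p):
    return random.random() < p

def sample_joint():
    """Sample every variable in topological order, each from P(X | parents(X))."""
    s = {}
    s['Cloudy'] = bernoulli(0.5)
    s['Sprinkler'] = bernoulli(0.1 if s['Cloudy'] else 0.5)
    s['Rain'] = bernoulli(0.8 if s['Cloudy'] else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.9,
             (False, True): 0.9, (False, False): 0.0}
    s['WetGrass'] = bernoulli(p_wet[(s['Sprinkler'], s['Rain'])])
    return s

samples = [sample_joint() for _ in range(10_000)]
# Empirical marginal P(Rain = T); should be near 0.5 * 0.8 + 0.5 * 0.2 = 0.5.
print(sum(s['Rain'] for s in samples) / len(samples))
```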

Example of sampling from the joint distribution

[Figure sequence: the sampling procedure illustrated one variable at a time, one slide per step]

Inference via sampling

- Suppose we drew N samples from the joint distribution. How do we compute P(X = x | e)?
- Rejection sampling: keep only the samples in which e happens, and find in what proportion of them x also happens.
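Written as an estimator (standard form, assumed here rather than copied from the slides), with N(·) counting the drawn samples consistent with an assignment:

```latex
\hat{P}(X = x \mid \mathbf{e}) \;=\; \frac{N(x, \mathbf{e})}{N(\mathbf{e})}
```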

Inference via sampling

- Problem: what if e is a rare event? Example: e = burglary ∧ earthquake. Rejection sampling ends up throwing away almost all of the samples.
- Likelihood weighting: fix the evidence variables to their observed values, sample only the non-evidence variables, and weight each sample by the likelihood of the evidence given its sampled parents (see the sketch below).
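A minimal likelihood-weighting sketch on a hypothetical two-node network Cloudy -> Rain with evidence Rain = true (the network, names, and CPT numbers are illustrative assumptions):

```python
import random

# Hypothetical two-node network Cloudy -> Rain; CPT numbers are made up for illustration.
P_CLOUDY = 0.5
P_RAIN_GIVEN = {True: 0.8, False: 0.2}   # P(Rain = T | Cloudy)

def weighted_sample(evidence_rain=True):
    """One likelihood-weighted sample with Rain fixed to the evidence value."""
    cloudy = random.random() < P_CLOUDY          # sample the non-evidence variable
    # The evidence variable is not sampled; it contributes its likelihood as the weight.
    p_e = P_RAIN_GIVEN[cloudy]
    weight = p_e if evidence_rain else (1.0 - p_e)
    return cloudy, weight

def estimate_p_cloudy_given_rain(n=100_000):
    num = den = 0.0
    for _ in range(n):
        cloudy, w = weighted_sample()
        den += w
        if cloudy:
            num += w
    return num / den

print(estimate_p_cloudy_given_rain())  # should approach 0.4 / 0.5 = 0.8
```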

Inference via sampling: summary

- Use the Bayesian network to generate samples from the joint distribution.
- Approximate any desired conditional or marginal probability by empirical frequencies.
- This approach is consistent: in the limit of infinitely many samples, the frequencies converge to the true probabilities.
- No free lunch: to get a good approximation of the desired probability, you may still need an exponential number of samples.

Example: solving the satisfiability problem by sampling

- Sample values of u1, …, u4 independently with P(ui = T) = 0.5.
- Estimate of P(Y): (# of satisfying assignments among the samples) / (# of sampled assignments).
- This is not guaranteed to correctly decide whether P(Y) > 0 unless you sample every possible assignment!
- [Figure: the clause nodes C1, C2, C3 from the earlier SAT-reduction network]
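A small sketch of this Monte Carlo estimate; the three clauses below are hypothetical stand-ins for C1, C2, C3, since the actual formula appears only in the figure:

```python
import random

# Hypothetical clauses standing in for C1, C2, C3 (the real formula is in the figure).
def satisfied(u):
    c1 = u[0] or not u[1]
    c2 = u[1] or u[2]
    c3 = not u[2] or u[3]
    return c1 and c2 and c3

n = 100_000
hits = sum(satisfied([random.random() < 0.5 for _ in range(4)]) for _ in range(n))
print(hits / n)  # Monte Carlo estimate of P(Y); an estimate of 0 does not prove P(Y) = 0
```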

Other approximate inference methods

- Variational methods: approximate the original network by a simpler one (e.g., a polytree) and try to minimize the divergence between the simplified and the exact model.
- Belief propagation: iterative message passing; each node computes a local estimate and shares it with its neighbors, and on the next iteration it uses the information from its neighbors to update its estimate.
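For concreteness, two standard formulations (assumed here, not taken from the slides): the variational objective over a tractable family Q, and the sum-product message for a pairwise-factor graph, which is one common way belief propagation is written:

```latex
q^{*} = \arg\min_{q \in Q} \mathrm{KL}\!\left(q \,\middle\|\, P(\cdot \mid \mathbf{e})\right)
\qquad
m_{i \to j}(x_j) = \sum_{x_i} \psi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i)
```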

Parameter learning

Suppose we know the network structure (but not the parameters) and have a training set of complete observations.

[Figure: a network with unknown CPT entries ("?") alongside a training set of complete samples over the variables C, S, R, W]

Parameter learning

- Suppose we know the network structure (but not the parameters) and have a training set of complete observations.
- P(X | Parents(X)) is then given by the observed frequencies of the different values of X for each combination of parent values.
- This is similar to sampling, except that the samples come from the training data rather than from the model (whose parameters are initially unknown). A sketch follows below.
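A minimal sketch of estimating one CPT from complete data by counting; the variable names, the data values, and the estimate_cpt helper are illustrative assumptions rather than anything defined in the slides:

```python
from collections import Counter, defaultdict

def estimate_cpt(samples, child, parents):
    """Observed frequencies of `child` for each combination of parent values."""
    counts = defaultdict(Counter)
    for s in samples:
        key = tuple(s[p] for p in parents)   # one table row per parent configuration
        counts[key][s[child]] += 1
    return {key: {value: n / sum(c.values()) for value, n in c.items()}
            for key, c in counts.items()}

# Hypothetical complete-data samples over variables C, S, R, W.
data = [{'C': True,  'S': False, 'R': True,  'W': True},
        {'C': True,  'S': False, 'R': True,  'W': True},
        {'C': False, 'S': True,  'R': False, 'W': True},
        {'C': False, 'S': False, 'R': False, 'W': False}]
print(estimate_cpt(data, 'W', ['S', 'R']))  # frequencies of W for each (S, R) seen in the data
```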

Parameter learning

- Incomplete observations: use the expectation-maximization (EM) algorithm to deal with missing data.
- Roughly speaking, alternate between doing inference on the missing data ("filling in" the missing values) and re-estimating the parameters of the model given the estimated missing data.

[Figure: the training set again, now with some entries missing ("?")]
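In equation form (the standard EM update for discrete Bayesian-network parameters, assumed here rather than taken from the slides), each iteration computes expected counts under the current parameters and then re-normalizes them into new CPT entries:

```latex
\text{E-step:}\quad \hat{N}(x, \mathbf{u}) = \sum_{n} P\big(X = x,\ \mathrm{Pa}(X) = \mathbf{u} \mid \mathbf{d}_n, \theta^{(t)}\big)
\qquad
\text{M-step:}\quad \theta^{(t+1)}_{x \mid \mathbf{u}} = \frac{\hat{N}(x, \mathbf{u})}{\sum_{x'} \hat{N}(x', \mathbf{u})}
```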

Parameter learning

- What if the network structure is also unknown?
- Structure learning algorithms exist, but they are pretty complicated…

[Figure: a training set over C, S, R, W next to a network whose edges are unknown]