A Differential Approach to Inference in Bayesian Networks - Adnan Darwiche. Presented by Jiangbo Dang and Yimin Huang, CSCE582 Bayesian Networks and Decision Graphs.

Resources. A 10-page version and a 20-page version of the paper (m-diff.pdf) are available. More material: Darwiche, A. A logical approach to factoring belief networks. In Proceedings of KR (2002).

Outline: Technical preliminaries; The network polynomial; Inference queries; Probabilistic semantics of partial derivatives; Compiling arithmetic circuits; Evaluating and differentiating arithmetic circuits; Summary

Technical preliminaries: Notational Convention. Variables are denoted by upper-case letters (A) and their values by lower-case letters (a). Sets of variables are denoted by boldface upper-case letters (A) and their instantiations by boldface lower-case letters (a). For variable A and value a, we often write a instead of A=a. For a variable A with values true and false, we use a to denote A=true and ā to denote A=false. Finally, let X be a variable and let U be its parents in a Bayesian network. The set XU is called the family of variable X, and the variable θ_{x|u} is called a network parameter; it represents the conditional probability Pr(x | u).

Technical preliminaries: Bayesian Network and Chain Rule. A Bayesian network over variables X is a directed acyclic graph over X, together with conditional probability values θ_{x|u} for each variable X in the network and its parents U. The semantics of a Bayesian network are given by the chain rule, which says that the probability of an instantiation x of all network variables X is simply the product of all network parameters consistent with x: Pr(x) = ∏_{xu ~ x} θ_{x|u}.
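For instance (a small illustrative two-variable network A → B with binary variables, added for these notes and not taken from the slides), the chain rule gives:

```latex
% Chain rule on the illustrative network A -> B: the probability of a full
% instantiation is the product of the network parameters consistent with it.
\Pr(a, b) = \theta_a\,\theta_{b|a}, \qquad
\Pr(a, \bar{b}) = \theta_a\,\theta_{\bar{b}|a}, \qquad
\Pr(\bar{a}, b) = \theta_{\bar{a}}\,\theta_{b|\bar{a}}, \qquad
\Pr(\bar{a}, \bar{b}) = \theta_{\bar{a}}\,\theta_{\bar{b}|\bar{a}}
```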

The network polynomial. Evidence indicators: for each network variable X, we have a set of evidence indicators λ_x, one for each value x of X. Network parameters: for each network family XU, we have a set of parameters θ_{x|u}, one for each instantiation xu. Let N be a Bayesian network over variables X, and let U denote the parents of variable X in the network. The polynomial of network N is defined as f = Σ_x ∏_{xu ~ x} λ_x θ_{x|u}, where the sum ranges over all instantiations x of the network variables and, for each instantiation, the product ranges over the family instantiations xu consistent with x.
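For the same illustrative network A → B (an example added here, not from the slides), the network polynomial has one term per instantiation of {A, B}:

```latex
% Network polynomial of the illustrative network A -> B: each term multiplies
% the evidence indicators and network parameters consistent with one instantiation.
f = \lambda_a \lambda_b\, \theta_a \theta_{b|a}
  + \lambda_a \lambda_{\bar{b}}\, \theta_a \theta_{\bar{b}|a}
  + \lambda_{\bar{a}} \lambda_b\, \theta_{\bar{a}} \theta_{b|\bar{a}}
  + \lambda_{\bar{a}} \lambda_{\bar{b}}\, \theta_{\bar{a}} \theta_{\bar{b}|\bar{a}}
```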

What can we get from the polynomial? The following queries can be answered in constant time once the partial derivatives are computed: the posterior marginal of any network variable X, Pr(x|e); the posterior marginal of any network family XU, Pr(x,u|e); the sensitivity of Pr(e) to a change in any network parameter θ_{x|u}; the probability of evidence e after having changed the value of some evidence variable E to e, Pr(e - E, e); the posterior marginal of an evidence variable E after having retracted its evidence, Pr(e | e - E); the posterior marginal of any pair of network variables X and Y, Pr(x,y|e); the posterior marginal of any pair of network families F1 and F2, Pr(f1,f2|e); and the sensitivity of a conditional probability Pr(y|e) to a change in a network parameter θ_{x|u}.

f(e) = Pr(e). The value of the network polynomial f at evidence e, denoted f(e), is the result of replacing each evidence indicator λ_x in f with 1 if x is consistent with e, and with 0 otherwise. Theorem 1: Let N be a Bayesian network representing probability distribution Pr and having polynomial f. For any evidence (instantiation of a subset of the network variables) e, we have f(e) = Pr(e).
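Continuing the illustrative A → B example (not from the slides), for evidence e = {A = a} we set λ_a = 1, λ_ā = 0, and λ_b = λ_b̄ = 1, and the polynomial evaluates to Pr(e):

```latex
% Evaluating the A -> B polynomial at evidence e = {A = a}.
f(e) = \theta_a \theta_{b|a} + \theta_a \theta_{\bar{b}|a}
     = \theta_a\,(\theta_{b|a} + \theta_{\bar{b}|a})
     = \theta_a = \Pr(a) = \Pr(e)
```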

Derivatives with respect to evidence indicators. Theorem 2. Let N be a Bayesian network representing probability distribution Pr and having polynomial f. For every variable X and evidence e, we have ∂f/∂λ_x (e) = Pr(x, e - X). That is, if we differentiate the polynomial f with respect to the indicator λ_x and evaluate the result at evidence e, we obtain the probability of the instantiation x, e - X. Corollary 1. For every variable X ∉ E and evidence e: Pr(x | e) = (∂f/∂λ_x)(e) / f(e). The partial derivatives therefore give us the posterior marginal of every variable. Corollary 2. For every variable X and evidence e, we have:
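Again on the illustrative A → B network with evidence e = {A = a} (an example added for these notes), Theorem 2 and Corollary 1 work out as follows:

```latex
% Differentiating the A -> B polynomial with respect to lambda_b:
\frac{\partial f}{\partial \lambda_b}
  = \lambda_a \theta_a \theta_{b|a} + \lambda_{\bar{a}} \theta_{\bar{a}} \theta_{b|\bar{a}}
% Evaluating at e = {A = a} (lambda_a = 1, lambda_abar = 0):
\frac{\partial f}{\partial \lambda_b}(e) = \theta_a \theta_{b|a} = \Pr(a, b) = \Pr(b, e)
% Corollary 1 then recovers the posterior marginal:
\Pr(b \mid e) = \frac{(\partial f / \partial \lambda_b)(e)}{f(e)}
  = \frac{\theta_a \theta_{b|a}}{\theta_a} = \theta_{b|a}
```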

Derivatives with respect to network parameters
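As a summary of the result this slide refers to (a paraphrase, not the slide's exact equations), the key identity relating f to a network parameter is:

```latex
% Derivative of the network polynomial with respect to a parameter,
% evaluated at evidence e (summary statement, not copied from the slide):
\theta_{x|u}\,\frac{\partial f}{\partial \theta_{x|u}}(e) = \Pr(x, \mathbf{u}, e)
% This yields the sensitivity of Pr(e) to any parameter, and the family
% marginals Pr(x, u | e) listed among the constant-time queries above.
```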

Second partial derivatives
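Only a paraphrased summary of the second-derivative results is given here (treat the exact statements as assumptions rather than the slide's wording): for distinct variables X and Y,

```latex
% Second partial derivatives of the network polynomial (paraphrased summary):
\frac{\partial^2 f}{\partial \lambda_x\,\partial \lambda_y}(e) = \Pr(x, y, e - X - Y)
\qquad\text{and}\qquad
\theta_{x|u}\,\frac{\partial^2 f}{\partial \theta_{x|u}\,\partial \lambda_y}(e)
  = \Pr(x, \mathbf{u}, y, e - Y)
% These support the pairwise marginals Pr(x, y | e) and the sensitivity of
% conditional probabilities Pr(y | e) to parameter changes.
```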

The problem. Theorems 2–4 show us how to compute answers to classical probabilistic queries by differentiating the polynomial representation of a Bayesian network. Therefore, if we have an efficient way to represent and differentiate the polynomial, we also have an efficient way to perform probabilistic reasoning. However, the network polynomial is exponential in the number of network variables: f has an exponential number of terms, one for each instantiation of the network variables. We therefore represent and compute the polynomial using an arithmetic circuit.

Arithmetic circuit. An arithmetic circuit is a graphical representation of a function f over variables E. Definition 3. An arithmetic circuit over variables E is a rooted, directed acyclic graph whose leaf nodes are labeled with numeric constants or variables in E and whose other nodes are labeled with multiplication and addition operations. The size of an arithmetic circuit is measured by the number of edges it contains. Q1: How do we obtain a compact arithmetic circuit that computes a given network polynomial? Q2: How can we efficiently evaluate and differentiate a circuit?
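To make the compactness point concrete (continuing the illustrative A → B example), an arithmetic circuit can compute the same polynomial in factored form, sharing subexpressions rather than enumerating every term:

```latex
% A factored, circuit-friendly form of the A -> B polynomial (illustrative):
f = \lambda_a \theta_a \left( \lambda_b \theta_{b|a} + \lambda_{\bar{b}} \theta_{\bar{b}|a} \right)
  + \lambda_{\bar{a}} \theta_{\bar{a}} \left( \lambda_b \theta_{b|\bar{a}} + \lambda_{\bar{b}} \theta_{\bar{b}|\bar{a}} \right)
% The circuit has one node per operation, so its size tracks the shared
% structure rather than the number of terms in the flat polynomial.
```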

How to compile the arithmetic circuit (example network over variables A, B, C).

Jointree of a BN. A jointree for a Bayesian network N is a labeled tree (T, L), where T is a tree and L is a function that assigns labels to nodes in T. A jointree must satisfy three properties: (1) each label L(i) is a set of variables in the BN; (2) each family XU in the network must appear in some label L(i); (3) if a variable appears in the labels of jointree nodes i and j, it must also appear in the label of each node k on the path connecting them. Nodes in a jointree, and their labels, are called clusters. Similarly, edges in a jointree, and their labels, are called separators, where the label of edge ij is defined as L(i) ∩ L(j).
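As a small illustration (a hypothetical chain network A → B → C, not taken from the slides), a jointree can consist of two clusters joined by a single edge:

```latex
% A jointree for the chain network A -> B -> C (illustrative):
\text{clusters: } \{A, B\} \text{ and } \{B, C\}, \qquad
\text{separator: } \{A, B\} \cap \{B, C\} = \{B\}
% Family AB is covered by cluster {A, B}, family BC by cluster {B, C},
% and B appears along the (single-edge) path between the two clusters.
```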

Get the Circuit from the Jointree. Given a root cluster and a particular assignment of CPTs and evidence tables to clusters, the arithmetic circuit embedded in a jointree is defined as follows. The circuit includes: one output addition node f; an addition node s for each instantiation of a separator S; a multiplication node c for each instantiation of a cluster C; an input node λ_x for each instantiation x of variable X; and an input node θ_{x|u} for each instantiation xu of family XU. The children of the output node f are the multiplication nodes c generated by the root cluster; the children of an addition node s are all compatible multiplication nodes c generated by the child cluster; and the children of a multiplication node c are all compatible addition nodes s generated by child separators, in addition to all compatible input nodes θ_{x|u} and λ_x for which the CPT θ_{x|u} and evidence table λ_x are assigned to cluster C.

Evaluating and Differentiating Arithmetic Circuits. Evaluating an arithmetic circuit: upward pass. Computing the circuit derivatives: downward pass. For each circuit node v, there are two registers vr(v) and dr(v). In the upward pass, we evaluate the circuit by setting the values of the vr(v) registers; in the downward pass, we differentiate the circuit by setting the values of the dr(v) registers. Initialization: dr(v) is initialized to zero, except for the root v, where dr(v) = 1. Upward pass: at node v, compute the value of v from its children and store it in vr(v). Downward pass: at node v and for each parent p, increment dr(v) by dr(p) if p is an addition node, or by dr(p) · ∏_{v′} vr(v′) if p is a multiplication node, where v′ ranges over the other children of p.
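A minimal Python sketch of this two-pass procedure (written for these notes; the node representation and names are invented, not code from the paper):

```python
# Minimal sketch of circuit evaluation (upward pass) and differentiation
# (downward pass), following the register scheme described above.
# Illustrative only: class and function names are invented for this sketch.

class Node:
    def __init__(self, kind, children=(), value=None):
        self.kind = kind          # 'add', 'mul', or 'leaf'
        self.children = list(children)
        self.value = value        # numeric value for leaf nodes
        self.vr = 0.0             # value register, set in the upward pass
        self.dr = 0.0             # derivative register, set in the downward pass

def topological_order(root):
    """Children-before-parents ordering of all nodes reachable from root."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for child in node.children:
            visit(child)
        order.append(node)
    visit(root)
    return order

def evaluate_and_differentiate(root):
    order = topological_order(root)
    # Upward pass: compute vr(v) for every node, children first.
    for v in order:
        if v.kind == 'leaf':
            v.vr = v.value
        elif v.kind == 'add':
            v.vr = sum(c.vr for c in v.children)
        else:  # 'mul'
            prod = 1.0
            for c in v.children:
                prod *= c.vr
            v.vr = prod
    # Downward pass: dr(root) = 1, all other dr start at 0; parents before children.
    for v in order:
        v.dr = 0.0
    root.dr = 1.0
    for p in reversed(order):
        for v in p.children:
            if p.kind == 'add':
                v.dr += p.dr
            else:  # 'mul': multiply by the values of p's other children
                prod = 1.0
                for other in p.children:
                    if other is not v:
                        prod *= other.vr
                v.dr += p.dr * prod
    return root.vr  # f(e); each leaf's dr now holds the corresponding partial derivative

# Example: the factored A -> B circuit from the earlier illustration, with evidence A = a.
if __name__ == "__main__":
    lam_a, lam_abar = Node('leaf', value=1.0), Node('leaf', value=0.0)
    lam_b, lam_bbar = Node('leaf', value=1.0), Node('leaf', value=1.0)
    th_a, th_abar = Node('leaf', value=0.3), Node('leaf', value=0.7)
    th_b_a, th_bbar_a = Node('leaf', value=0.9), Node('leaf', value=0.1)
    th_b_abar, th_bbar_abar = Node('leaf', value=0.2), Node('leaf', value=0.8)
    branch_a = Node('mul', [lam_a, th_a,
                            Node('add', [Node('mul', [lam_b, th_b_a]),
                                         Node('mul', [lam_bbar, th_bbar_a])])])
    branch_abar = Node('mul', [lam_abar, th_abar,
                               Node('add', [Node('mul', [lam_b, th_b_abar]),
                                            Node('mul', [lam_bbar, th_bbar_abar])])])
    root = Node('add', [branch_a, branch_abar])
    pr_e = evaluate_and_differentiate(root)
    print("Pr(e) =", pr_e)          # 0.3
    print("Pr(b, e) =", lam_b.dr)   # 0.27 = Pr(a, b)
```

The reversed children-before-parents ordering guarantees that each parent's dr register is final before it is propagated to its children, so the downward pass visits every edge exactly once.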

An Example

Summary. The paper addresses the efficient computation of answers to probabilistic queries posed to BNs. We can compile a BN into a multivariate polynomial and then compute the partial derivatives of this polynomial with respect to each variable. Once such derivatives are available, we can compute, in constant time, answers to a large class of probabilistic queries. The network polynomial itself is exponential in size, but the paper shows how it can be represented compactly by an arithmetic circuit that can be evaluated and differentiated in time and space linear in the circuit size.

Questions