A Variational Approach for Approximating Bayesian Networks by Edge Deletion
Arthur Choi and Adnan Darwiche, UCLA
{aychoi,darwiche}@cs.ucla.edu
Slides used for the plenary presentation at UAI-06. Updated 09/21/2006.
The Idea
[Figure: an original network and an approximate network obtained by deleting edges]
Approximate inference: exact inference in an approximate model.
Approximate model: obtained by deleting edges.
Specifying the auxiliary parameters: Method 1 (BP), Method 2 (KL).
The Idea
[Figure: the original network alongside the approximate network]
Deleting an Edge
[Figure: a directed edge U → X in the original network]
Deleting an Edge: The Clone
[Figure: edge U → X deleted; a clone U' of U becomes the new parent of X]
Deleting an Edge: The Soft Evidence
[Figure: soft evidence s' asserted on U; clone U' is the parent of X]
New edge parameters for each new query.
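A minimal sketch of the deletion transform just described, assuming a toy dict-based network representation; the structure of net and the helper name delete_edge are hypothetical (not from the talk), and the uniform placeholders stand in for the edge parameters that ED-BP/ED-KL will later set.

```python
import copy

def delete_edge(net, U, X):
    """Delete directed edge U -> X: add a clone U' that replaces U as X's
    parent, and attach a soft-evidence factor s' to U. The clone's CPT and
    the soft-evidence weights are the 'edge parameters' to be set later."""
    net = copy.deepcopy(net)
    clone = U + "_clone"                       # the clone U'
    card = net["card"][U]                      # number of states of U

    # The clone becomes a new root with the same states as U.
    net["card"][clone] = card
    net["parents"][clone] = []
    net["cpts"][clone] = [1.0 / card] * card   # theta_{u'}: uniform placeholder

    # X now points to the clone instead of U.
    net["parents"][X] = [clone if p == U else p for p in net["parents"][X]]

    # Soft evidence s' on U: one nonnegative weight per state of U.
    net["soft_evidence"][U] = [1.0] * card     # lambda_{s'|u}: uniform placeholder

    return net
```

Exact inference then runs in the (simpler) approximate network; how the placeholder parameters theta_{u'} and lambda_{s'|u} are actually set is the subject of the next slides.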
Specifying the Approximation
How do we parametrize deleted edges?
  - Compensate for the missing edge
  - Quality of approximation
Which edges do we delete?
  - Computational complexity
A First Approach: ED-BP (Edge Deletion-Belief Propagation)
Choose edge parameters that satisfy the ED-BP fixed-point conditions.
[Figure: deleted edge U → X with clone U' and soft evidence s', annotated with the fixed-point equations]
The conditions can be used as update equations:
  - Initialize parameters randomly.
  - Iterate until a fixed point is reached.
To be presented at AAAI-06.
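The slide's "use the conditions as update equations" recipe, as a generic sketch only: update_edge_parameters is a hypothetical placeholder for one pass of the actual ED-BP updates (given in the AAAI-06 paper), which require exact inference in the approximate network; the random initialization and the iterate-to-a-fixed-point loop follow the slide.

```python
import random

def fit_edge_parameters(net, deleted_edges, update_edge_parameters,
                        max_iters=100, tol=1e-8):
    """Generic fixed-point loop: start from random edge parameters and
    re-apply the update equations until they stop changing."""
    # One (theta_{u'}, lambda_{s'|u}) pair per deleted edge (U, X),
    # initialized randomly as on the slide.
    params = {
        (U, X): {
            "theta": [random.random() for _ in range(net["card"][U])],
            "lam": [random.random() for _ in range(net["card"][U])],
        }
        for (U, X) in deleted_edges
    }

    for _ in range(max_iters):
        # One pass of the (hypothetical) update equations.
        new_params = update_edge_parameters(net, params)
        # Largest change in any single parameter.
        diff = max(
            (abs(a - b)
             for e in deleted_edges
             for key in ("theta", "lam")
             for a, b in zip(params[e][key], new_params[e][key])),
            default=0.0,
        )
        params = new_params
        if diff < tol:  # reached a fixed point (to numerical tolerance)
            break
    return params
```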
Belief Propagation as Edge Deletion
Theorem: IBP corresponds to ED-BP.
[Figure: deleted edge U → X with clone U' and soft evidence s']
Belief Propagation as Edge Deletion
IBP in the original network corresponds to ED-BP in a disconnected approximation.
To be presented at AAAI-06.
Edge Recovery Using Mutual Information
Rank deleted edges by MI(U; U' | e'): the mutual information between U and its clone U' in the approximate network, given the evidence e'.
[Figure: deleted edge U → X with clone U' and soft evidence s']
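For reference, the edge score MI(U; U' | e') is the standard (conditional) mutual information between U and its clone U', written here with Pr' denoting the approximate network's distribution (notation assumed):

```latex
\[
\mathrm{MI}(U;U' \mid \mathbf{e}')
  \;=\; \sum_{u,u'} \Pr{}'(u,u' \mid \mathbf{e}')
        \log \frac{\Pr{}'(u,u' \mid \mathbf{e}')}
                  {\Pr{}'(u \mid \mathbf{e}')\,\Pr{}'(u' \mid \mathbf{e}')}.
\]
```

Intuitively, a large MI(U; U' | e') signals an edge whose deletion is poorly compensated, making it a strong candidate for recovery.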
A First Approach: ED-BP (Edge Deletion-Belief Propagation)
How do we parametrize edges? Subsumes BP as a degenerate case.
Which edges do we delete? Recover edges using mutual information.
A Second Approach Based on the KL-Divergence
A Simple Bound on the KL-Divergence
[Figure: a Bayesian network with edge U → X, and an approximation where the clone U' is X's parent]
A Simple Bound on the KL-Divergence
[Figure: the Bayesian network, an extended network, and the approximation]
The extended network adds the clone U' with CPT q_{u'|u} = 1 iff u' = u, i.e., U' is a deterministic copy of U.
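For reference, the divergence being bounded is the standard KL-divergence between the original and approximate posteriors; the notation Pr for the original network given evidence e, and Pr' for the approximate network given evidence e', is assumed here:

```latex
\[
\mathrm{KL}\bigl(\Pr(\mathbf{X}\mid \mathbf{e}),\ \Pr{}'(\mathbf{X}\mid \mathbf{e}')\bigr)
  \;=\; \sum_{\mathbf{x}} \Pr(\mathbf{x}\mid \mathbf{e})
        \log \frac{\Pr(\mathbf{x}\mid \mathbf{e})}{\Pr{}'(\mathbf{x}\mid \mathbf{e}')}.
\]
```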
Identifying Edge Parameters: ED-KL
Theorem 1: Edge parameters are a stationary point of the KL-divergence if and only if: [equation]
Theorem 2: Edge parameters are a stationary point of the KL-divergence if and only if: [equation]
[Figure: deleted edge U → X with clone U' and soft evidence s']
Deleting a Single Edge
When a single edge is deleted, we can:
  - compute the KL-divergence efficiently.
  - iterate efficiently.
Identifying Edges to Delete
[Plot: single-edge KL-divergence scores kl1–kl6 for candidate edges]
Comparing ED-BP & ED-KL ED-BP characterized by: ED-KL characterized by:
Quality of Approximation
[Plot: approximation quality as deleted edges are recovered, ranging from the disconnected approximation to exact inference]
Quality of Approximation
[Plot: approximation quality compared against belief propagation]
Quality of Approximation
Quality of Approximation, Extreme Cases
Approximating MAP
Consider the MAP explanation (defined below).
MAP is hard even when marginals are easy!
Computing P(e): complexity exponential in treewidth. Computing MAP: complexity exponential in constrained treewidth.
Delete edges to reduce the constrained treewidth!
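For reference, the MAP explanation in its standard form: given evidence e and a chosen set of MAP variables M (the symbols M and Z are the usual ones, assumed here), MAP maximizes over M after summing out the remaining variables Z:

```latex
\[
\mathrm{MAP}(\mathbf{M},\mathbf{e})
  \;=\; \operatorname*{argmax}_{\mathbf{m}} \Pr(\mathbf{m},\mathbf{e})
  \;=\; \operatorname*{argmax}_{\mathbf{m}} \sum_{\mathbf{z}} \Pr(\mathbf{m},\mathbf{z},\mathbf{e}).
\]
```

The constrained treewidth enters because exact MAP must eliminate the non-MAP variables Z before the MAP variables M, which can force much worse elimination orders than those available for P(e).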
Quality of MAP Approximations
Complexity of Approximation
Summary
Approximate Inference:
  - Exact inference in an approximate model.
  - Trade off approximation quality against computational resources by deleting edges.
Parametrizing Deleted Edges:
  - ED-BP: subsumes belief propagation (a new understanding of belief propagation).
  - ED-KL: a variational approach.
Choosing Which Edges to Delete:
  - ED-BP: edge recovery in terms of mutual information.
  - ED-KL: delete edges by (single-edge) KL.
  - ED-BP + delete edges by KL: surprisingly good!