Many-Pairs Mutual Information for Adding Structure to Belief Propagation Approximations
Arthur Choi and Adnan Darwiche
University of California, Los Angeles
Many-Pairs Mutual Information
[Figure: the mutual information between many pairs of variables X and Y]
d-Separation
If X and Y are d-separated by Z, then X and Y are independent given Z.
[Network: Earthquake? (E) → Alarm? (A) ← Burglary? (B); Alarm → Call? (C); Earthquake → Radio? (R)]
Are R and B d-separated by E?
d-Separation
Each path is a pipe. Each variable is a valve. A valve is either open or closed.
[Network: E → A ← B; A → C; E → R]
Are R and B d-separated by A?
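To make the pipes-and-valves view concrete, here is a minimal sketch (my own illustration, not the authors' code) that hard-codes the network above and decides d-separation by checking whether every undirected path contains a closed valve. The helper names (`paths`, `closed`, `d_separated`) are hypothetical.

```python
parents = {"E": (), "B": (), "A": ("E", "B"), "C": ("A",), "R": ("E",)}
children = {v: tuple(c for c, ps in parents.items() if v in ps) for v in parents}

def descendants(v):
    """All descendants of v in the DAG."""
    seen, stack = set(), list(children[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(children[u])
    return seen

def paths(x, y, path=None):
    """All simple undirected paths (pipes) from x to y."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for n in parents[path[-1]] + children[path[-1]]:
        if n not in path:
            yield from paths(x, y, path + [n])

def closed(prev, w, nxt, Z):
    """Is the valve at w closed on the segment prev - w - nxt, given Z?"""
    if prev in parents[w] and nxt in parents[w]:     # convergent: prev -> w <- nxt
        return w not in Z and not (descendants(w) & set(Z))
    return w in Z                                    # sequential or divergent

def d_separated(x, y, Z):
    """d-separated iff every path contains at least one closed valve."""
    return all(any(closed(p[i - 1], p[i], p[i + 1], Z) for i in range(1, len(p) - 1))
               for p in paths(x, y))

print(d_separated("R", "B", {"E"}))  # True:  E and A both close the path R-E-A-B
print(d_separated("R", "B", {"A"}))  # False: observing A opens the convergent valve
```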
d-Separation
Sequential valve W: X → W → Y. Closed iff W is given.
[Network figure highlighting a sequential valve, e.g., E → A → C]
d-Separation
Divergent valve W: X ← W → Y. Closed iff W is given.
[Network figure highlighting a divergent valve, e.g., A ← E → R]
d-Separation
Convergent valve W: X → W ← Y. Closed iff neither W nor any descendant of W is given.
[Network figure highlighting a convergent valve, e.g., E → A ← B]
d-Separation
Are R and B d-separated by E?
E is closed (divergent valve, E is given). A is closed (convergent valve, neither A nor its descendant C is given). So R and B are d-separated by E.
d-Separation
Are R and B d-separated by A?
E is open (divergent valve, E is not given). A is open (convergent valve, A is given). So R and B are not d-separated by A.
d-Separation
What if E or A is “nearly” closed? Are R and B “nearly” independent?
Mutual Information and Entropy
Mutual information:
MI(X;Y | z) = Σ_{x,y} Pr(x,y | z) log [ Pr(x,y | z) / ( Pr(x | z) Pr(y | z) ) ]
Non-negative; zero iff X and Y are independent given z.
d-Separation versus MI
d-Separation: hard outcomes; a graphical test; no inference needed; efficient.
Mutual information: soft outcomes; non-graphical; requires inference (joint marginals on pairs of variables); many-pairs MI is difficult.
Soft d-separation (in polytrees): combines the advantages of d-separation and MI, giving a graphical test with soft outcomes.
Mutual Information and Entropy
Mutual information:
MI(X;Y | z) = Σ_{x,y} Pr(x,y | z) log [ Pr(x,y | z) / ( Pr(x | z) Pr(y | z) ) ]
Non-negative; zero iff X and Y are independent given z.
Entropy:
ENT(X | z) = −Σ_x Pr(x | z) log Pr(x | z)
Non-negative; zero iff X is fixed; maximized by the uniform distribution.
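Both quantities are straightforward to compute from (joint) marginal tables. A minimal sketch, assuming the marginals are given as NumPy arrays and using log base 2; the function names are illustrative, not from the paper:

```python
import numpy as np

def entropy(p):
    """ENT(X | z) = -sum_x Pr(x|z) log Pr(x|z); zero iff X is fixed."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(pxy):
    """MI(X;Y | z) from a joint marginal table pxy[x, y] = Pr(x, y | z)."""
    px = pxy.sum(axis=1, keepdims=True)    # Pr(x | z)
    py = pxy.sum(axis=0, keepdims=True)    # Pr(y | z)
    prod = px * py                         # joint under independence
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / prod[mask])).sum())

# MI is zero iff X and Y are independent given z:
print(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])))  # 0.0
print(mutual_information(np.array([[0.40, 0.10], [0.10, 0.40]])))  # > 0
print(entropy(np.array([0.5, 0.5])))  # 1 bit: maximized by the uniform dist.
```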
Soft d-Separation in Polytrees
Sequential valve W on the path from X to Y: X → W → Y.
Theorem 1: MI(X;Y | z) ≤ ENT(W | z)
Soft d-Separation in Polytrees
Divergent valve W on the path from X to Y: X ← W → Y.
Theorem 1: MI(X;Y | z) ≤ ENT(W | z)
Soft d-Separation in Polytrees
Convergent valve W with path neighbors N1 and N2: X … N1 → W ← N2 … Y.
Theorem 2: MI(X;Y | z) ≤ MI(N1;N2 | z)
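As a numeric sanity check on Theorem 1 (my own sketch, not from the paper), one can sample a random chain X → W → Y and confirm MI(X;Y) ≤ ENT(W), which for the sequential valve is an instance of the data-processing inequality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random sequential chain X -> W -> Y over 3-state variables.
px  = rng.dirichlet(np.ones(3))             # Pr(X)
pwx = rng.dirichlet(np.ones(3), size=3)     # pwx[x, w] = Pr(w | x)
pyw = rng.dirichlet(np.ones(3), size=3)     # pyw[w, y] = Pr(y | w)

pxw = px[:, None] * pwx                     # Pr(x, w)
pw  = pxw.sum(axis=0)                       # Pr(w)
pxy = pxw @ pyw                             # Pr(x, y) = sum_w Pr(x, w) Pr(y | w)

def H(p):                                   # entropy of a distribution
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

mi_xy = H(pxy.sum(1)) + H(pxy.sum(0)) - H(pxy.ravel())   # MI(X;Y)
print(f"MI(X;Y) = {mi_xy:.4f} <= ENT(W) = {H(pw):.4f}")  # Theorem 1 holds
assert mi_xy <= H(pw) + 1e-9
```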
Soft d-Separation in Polytrees
[Path: X − W1 − W2 − W3 − W4 − W5 − W6 − Y]
Soft d-separation:
sd-sep(X,z,Y) = 0, if X and Y are disconnected
sd-sep(X,z,Y) = MI(X;Y | z), if X and Y are adjacent
sd-sep(X,z,Y) = the smallest valve bound on the path, otherwise
Theorem: MI(X;Y | z) ≤ sd-sep(X,z,Y)
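Once the per-valve bounds are in hand (ENT(W | z) for sequential and divergent valves, MI(N1;N2 | z) for convergent ones), the case definition reduces to taking a minimum. A hedged sketch with made-up bound values; the function signature is illustrative:

```python
def sd_sep(valve_bounds, connected=True, adjacent_mi=None):
    """sd-sep(X, z, Y) per the case definition above (illustrative names)."""
    if not connected:
        return 0.0                 # disconnected: X and Y are independent
    if adjacent_mi is not None:
        return adjacent_mi         # adjacent: use MI(X;Y | z) directly
    return min(valve_bounds)       # otherwise: the smallest valve bound

# Valves W1..W6 with bounds taken from a single run of BP (values made up):
print(sd_sep([0.72, 0.05, 0.31, 0.44, 0.09, 0.66]))   # -> 0.05
```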
d-Separation vs. MI vs. soft d-sep
d-Separation: hard outcomes; graphical test; no inference needed; efficient.
MI: soft outcomes; non-graphical; requires inference (joint marginals on pairs of variables); many-pairs MI is difficult.
Soft d-sep: soft outcomes; graphical test; requires inference (family and node marginals only); efficient in polytrees.
Many-Pairs Mutual Information
Mutual information can be expensive, even in polytrees.
Bayesian network: n variables, at most w parents per node, at most s states per variable.
One run of BP: O(n s^w) time.
Single pair, using Pr(X,Y | z) = Pr(X | Y,z) Pr(Y | z):
MI: O(s) runs of BP (one per state of Y), i.e., O(s · n s^w) time.
sd-sep: one run of BP, O(n + n s^w) time.
k pairs:
MI: O(ks) runs of BP, i.e., O(ks · n s^w) time.
sd-sep: one run of BP, O(kn + n s^w) time.
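Here is a back-of-the-envelope cost model of the counts above (an illustration of the asymptotics, not a benchmark; all numbers are made up):

```python
def bp_cost(n, s, w):
    """One run of belief propagation: O(n * s**w)."""
    return n * s**w

def mi_cost(n, s, w, k):
    """k pairs via MI: O(k*s) runs of BP."""
    return k * s * bp_cost(n, s, w)

def sd_sep_cost(n, s, w, k):
    """k pairs via sd-sep: one run of BP, then an O(n) graphical test per pair."""
    return bp_cost(n, s, w) + k * n

n, s, w, k = 100, 4, 3, 50
print(mi_cost(n, s, w, k) / sd_sep_cost(n, s, w, k))  # many-pairs speedup, ~112x
```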
Application: ED-BP
ED-BP networks [CD06]: a spectrum of approximations ranging from loopy BP marginals (all edges deleted) to exact inference (all edges recovered).
Recover edges, ranked by mutual information.
Empirical Analysis
Soft d-separation versus true MI:
Start with a polytree ED-BP approximation (equivalently, run loopy BP).
Score deleted edges by sd-sep and by true MI (efficiency is important here).
Recover the highest-ranking edges (approximation accuracy is important here).
Empirical Analysis
[Plot: alarm network; x-axis: edge rank (true MI); series: true-MI and sd-sep]
Empirical Analysis
[Plot: alarm network; x-axis: edges recovered; y-axis: average KL-error; series: random, true-MI, sd-sep]
Empirical Analysis
[Plot: pigs network; x-axis: edge rank (true MI); series: true-MI and sd-sep]
Empirical Analysis
[Plot: pigs network; x-axis: edges recovered; y-axis: average KL-error; series: random, true-MI, sd-sep]
Empirical Analysis
ED-BP inference time at 0%, 10%, and 20% of deleted edges recovered, and the time to rank the deleted edges (sd-sep's rank-time speedup over MI in parentheses):

network    method    0%      10%      20%      rank time
barley     random    115ms   120ms    141ms    0ms
           MI                111ms    93ms     2999ms
           sd-sep            110ms    125ms    46ms (65.84x)
diabetes   random    732ms   1103ms   1651ms   0ms
           MI                550ms    674ms    84604ms
           sd-sep            957ms    1639ms   132ms (641.99x)
mildew     random    238ms   241ms    243ms    0ms
           MI                233ms    263ms    6661ms
           sd-sep            245ms    323ms    42ms (157.26x)
munin1     random    13ms    14ms     22ms     0ms
           MI                12ms     10ms     680ms
           sd-sep            10ms              35ms (19.57x)
Alternative Proposals & Extensions
Extensions to general networks: convergent valves are problematic; look at node-disjoint paths.
Extensions to undirected models: entropy bounds on nodes; find a separating set with minimum aggregate bound; optimal solution via network flows (sketched below); easier to generalize, but the bounds are not as tight.
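The network-flow proposal for undirected models can be sketched as a standard vertex min-cut: split each node W into W_in → W_out with capacity ENT(W | z), so a minimum cut between X and Y selects a separating set with minimum aggregate entropy bound. A hedged sketch using networkx, on a made-up graph with made-up entropies (this is my reading of the slide, not the authors' implementation):

```python
import networkx as nx

def min_entropy_separator(edges, ent, x, y):
    """Separating set minimizing the aggregate entropy bound (illustrative)."""
    G = nx.DiGraph()
    for w, h in ent.items():                 # node capacity via node splitting
        G.add_edge((w, "in"), (w, "out"), capacity=h)
    for u, v in edges:                       # undirected edge u - v
        G.add_edge((u, "out"), (v, "in"), capacity=float("inf"))
        G.add_edge((v, "out"), (u, "in"), capacity=float("inf"))
    bound, (S, _) = nx.minimum_cut(G, (x, "out"), (y, "in"))
    # Cut nodes: reachable on the "in" side but not the "out" side.
    cut_nodes = {w for (w, side) in S if side == "in" and (w, "out") not in S}
    return bound, cut_nodes

# Made-up undirected model: X and Y separated by {C} (bound 0.7) or {A, B} (1.3).
edges = [("X", "A"), ("X", "B"), ("A", "C"), ("B", "C"), ("C", "Y")]
ent = {"X": float("inf"), "Y": float("inf"), "A": 0.9, "B": 0.4, "C": 0.7}
print(min_entropy_separator(edges, ent, "X", "Y"))    # (0.7, {'C'})
```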
Thanks!