1 Two Approximate Algorithms for Belief Updating
Mini-Clustering (MC): Robert Mateescu, Rina Dechter, Kalev Kask. "Tree Approximation for Belief Updating", AAAI-2002
Iterative Join-Graph Propagation (IJGP): Rina Dechter, Kalev Kask and Robert Mateescu. "Iterative Join-Graph Propagation", UAI-2002

2 What is Mini-Clustering?
Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks.
MC is an anytime version of join-tree clustering.
MC applies message passing along a cluster tree.
The complexity of MC is controlled by a user-adjustable parameter, the i-bound.
Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs sampling).

3 Motivation
Probabilistic reasoning using belief networks is known to be NP-hard.
Nevertheless, approximate inference can be a powerful tool for decision making under uncertainty.
We propose an anytime version of Cluster Tree Elimination.

4 Outline
Preliminaries (belief networks, tree decompositions)
Tree Clustering algorithm
Mini-Clustering algorithm
Experimental results

5 Belief networks
The belief updating problem is the task of computing the posterior probability P(Y|e) of query nodes Y ⊆ X given evidence e.
We focus on the basic case where Y is a single variable X_i.
[Figure: an example belief network over variables A, B, C, D, E, F, G.]

6 Tree decompositions

7 Belief network and tree decomposition
[Figure: the belief network over A-G and a tree decomposition of it. Clusters and their functions: {A,B,C}: p(a), p(b|a), p(c|a,b); {B,C,D,F}: p(d|b), p(f|c,d); {B,E,F}: p(e|b,f); {E,F,G}: p(g|e,f). Separators: BC, BF, EF.]

9 Cluster Tree Elimination
Cluster Tree Elimination (CTE) is an exact algorithm that works by passing messages along a tree decomposition.
Basic idea:
Each node sends only one message to each of its neighbors.
Node u sends a message to its neighbor v only when u has received messages from all its other neighbors.
A small sketch of this schedule appears below.
Previous work on tree clustering:
Lauritzen, Spiegelhalter - '88 (probabilities)
Jensen, Lauritzen, Olesen - '90 (probabilities)
Shenoy, Shafer - '90, Shenoy - '97 (general)
Dechter, Pearl - '89 (constraints)
Gottlob, Leone, Scarcello - '00 (constraints)
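
The following Python sketch (not from the paper; the function name and edge list are illustrative assumptions) orders the directed messages of a tree so that a node sends to a neighbor only after it has heard from all its other neighbors:

    from collections import defaultdict

    def cte_schedule(tree_edges):
        """Order the directed messages (u, v) of a tree decomposition so that
        u sends to v only after receiving from all its other neighbors.
        Assumes the edges form a tree."""
        neighbors = defaultdict(set)
        for u, v in tree_edges:
            neighbors[u].add(v)
            neighbors[v].add(u)
        received = defaultdict(set)   # received[u] = neighbors that already sent to u
        sent = set()                  # directed messages already scheduled
        order = []
        while len(order) < 2 * len(tree_edges):   # one message per direction per edge
            for u in list(neighbors):
                for v in neighbors[u]:
                    if (u, v) in sent:
                        continue
                    # u may send to v once every neighbor except v has sent to u
                    if neighbors[u] - {v} <= received[u]:
                        order.append((u, v))
                        sent.add((u, v))
                        received[v].add(u)
        return order

    # The four-cluster chain from the running example: 1-2, 2-3, 3-4
    print(cte_schedule([(1, 2), (2, 3), (3, 4)]))
    # one valid order: [(1, 2), (2, 3), (3, 4), (4, 3), (3, 2), (2, 1)]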

12 Belief Propagation
[Figure: cluster u has neighbors x_1, ..., x_n and v; u computes the message h_(u,v) from its own functions and the messages received from x_1, ..., x_n.]
The message that u sends to v multiplies all the functions in cluster(u) except the message h_(v,u), and sums out the variables that are not in the separator:
h_(u,v) = Σ_{elim(u,v)} Π_{f ∈ cluster(u), f ≠ h_(v,u)} f
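
A minimal Python sketch of this operation (an illustrative rendering, not the paper's code; each function is a pair of a scope tuple and a table keyed by value tuples):

    import itertools

    def compute_message(functions, cluster_vars, sep_vars, domains):
        """CTE message operation: multiply all functions of the cluster (CPTs
        plus incoming messages, excluding the one from the target neighbor)
        and sum out the cluster variables that are not in the separator.
        `functions` is a list of (scope_tuple, table_dict); returns such a pair."""
        sep = tuple(sep_vars)
        elim = [v for v in cluster_vars if v not in sep]
        table = {}
        for sep_vals in itertools.product(*(domains[v] for v in sep)):
            assignment = dict(zip(sep, sep_vals))
            total = 0.0
            for elim_vals in itertools.product(*(domains[v] for v in elim)):
                assignment.update(zip(elim, elim_vals))
                prod = 1.0
                for scope, tab in functions:
                    prod *= tab[tuple(assignment[v] for v in scope)]
                total += prod
            table[sep_vals] = total
        return sep, table

For instance, h_(1,2) of the running example would be compute_message over cluster 1's three CPTs with sep_vars = ('B', 'C').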

14 Cluster Tree Elimination - example
[Figure: the belief network over A-G and its tree decomposition, a chain of clusters 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G} with separators BC, BF, EF.]

15 Cluster Tree Elimination - the messages
Cluster 1 = {A,B,C}: p(a), p(b|a), p(c|a,b)
Cluster 2 = {B,C,D,F}: p(d|b), p(f|c,d), h_(1,2)(b,c)
Cluster 3 = {B,E,F}: p(e|b,f), h_(2,3)(b,f)
Cluster 4 = {E,F,G}: p(g|e,f)
Separators: BC, BF, EF. For edge (2,3): sep(2,3) = {B,F}, elim(2,3) = {C,D}, so h_(2,3)(b,f) = Σ_{c,d} p(d|b) · p(f|c,d) · h_(1,2)(b,c).
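
To make one such message concrete, here is a small sketch that evaluates h_(1,2)(b,c) = Σ_a p(a) · p(b|a) · p(c|a,b) over binary variables; the CPT numbers are made up purely for illustration:

    import itertools

    # Hypothetical CPTs for binary A, B, C; the numbers are illustrative only.
    p_a = {0: 0.6, 1: 0.4}
    p_b_a = {(0, 0): 0.7, (1, 0): 0.3,   # p(b|a=0), keys are (b, a)
             (0, 1): 0.2, (1, 1): 0.8}   # p(b|a=1)
    p_c_ab = {(c, a, b): (0.9 if c == 0 else 0.1) if a == b else 0.5
              for a in (0, 1) for b in (0, 1) for c in (0, 1)}  # keys are (c, a, b)

    # h_(1,2)(b,c) = sum_a p(a) * p(b|a) * p(c|a,b)
    h_12 = {(b, c): sum(p_a[a] * p_b_a[(b, a)] * p_c_ab[(c, a, b)] for a in (0, 1))
            for b, c in itertools.product((0, 1), repeat=2)}
    print(h_12)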

16 Cluster Tree Elimination - properties
Correctness and completeness: Algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.
Time complexity: O(deg · (n + N) · d^(w*+1))
Space complexity: O(N · d^sep)
where:
deg = the maximum degree of a node
n = number of variables (= number of CPTs)
N = number of nodes in the tree decomposition
d = the maximum domain size of a variable
w* = the induced width
sep = the separator size

19 Mini-Clustering
Motivation:
Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem.
When the induced width w* is big, the CTE algorithm becomes infeasible.
The basic idea:
Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables (see the partitioning sketch below).
Accuracy parameter i = maximum number of variables in a mini-cluster.
The idea was explored for variable elimination (Mini-Buckets).
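
As a rough illustration (a simple greedy heuristic of our own, not necessarily the paper's procedure), function scopes can be packed into mini-clusters of at most i variables like this:

    def partition_into_miniclusters(scopes, i_bound):
        """Greedily place each function scope into the first mini-cluster whose
        combined scope stays within i_bound variables; open a new mini-cluster
        otherwise. `scopes` is a list of sets of variable names."""
        miniclusters = []   # each entry is [combined_scope, list_of_member_scopes]
        # Placing larger scopes first tends to pack better (simple heuristic).
        for scope in sorted(scopes, key=len, reverse=True):
            for entry in miniclusters:
                if len(entry[0] | scope) <= i_bound:
                    entry[0] |= scope
                    entry[1].append(scope)
                    break
            else:
                miniclusters.append([set(scope), [scope]])
        return miniclusters

    # Cluster 2 = {B,C,D,F} with functions p(d|b), p(f|c,d), h_(1,2)(b,c); i = 3
    scopes = [{'B', 'D'}, {'C', 'D', 'F'}, {'B', 'C'}]
    for combined, members in partition_into_miniclusters(scopes, 3):
        print(sorted(combined), [sorted(s) for s in members])
    # ['C', 'D', 'F'] [['C', 'D', 'F']]
    # ['B', 'C', 'D'] [['B', 'D'], ['B', 'C']]

On this input the greedy split recovers the two mini-clusters used on slide 24.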

20 Mini-Clustering
Suppose cluster(u) is partitioned into p mini-clusters mc(1), ..., mc(p), each containing at most i variables.
CTE computes the 'exact' message:
h_(u,v) = Σ_{elim(u,v)} Π_{k=1..p} Π_{f ∈ mc(k)} f
We want to process each Π_{f ∈ mc(k)} f separately.

21 Mini-Clustering
Approximate each Π_{f ∈ mc(k)} f, k = 2, ..., p, and take it outside the summation.
How to process the mini-clusters to obtain approximations or bounds (see the sketch below):
Process all mini-clusters by summation - this gives an upper bound on the joint probability.
A tighter upper bound: process one mini-cluster by summation and the others by maximization.
Can also use the mean operator (average) - this gives an approximation of the joint probability.
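
A minimal numeric sketch of the three processing modes (made-up tables; each g_k stands for the product Π_{f ∈ mc(k)} f as a function of the eliminated variable x, for one fixed separator assignment):

    # Two mini-cluster products over the eliminated variable x in {0, 1}.
    # The numbers are illustrative only.
    domain = (0, 1)
    g1 = {0: 0.30, 1: 0.20}   # product of the functions in mc(1), as a table over x
    g2 = {0: 0.50, 1: 0.90}   # product of the functions in mc(2)

    exact = sum(g1[x] * g2[x] for x in domain)                 # what CTE computes: 0.33
    sum_bound = sum(g1.values()) * sum(g2.values())            # all by summation: 0.70
    max_bound = sum(g1.values()) * max(g2.values())            # one by sum, rest by max: 0.45
    mean_approx = sum(g1.values()) * (sum(g2.values()) / len(domain))  # mean operator: 0.35

    # exact <= max_bound <= sum_bound holds for nonnegative functions
    print(exact, max_bound, sum_bound, mean_approx)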

22 Idea of Mini-Clustering
Split a cluster into mini-clusters => bound complexity.
[Figure: the exact elimination Σ_{elim} Π_{k=1..p} Π_{f ∈ mc(k)} f is replaced by the upper bound (Σ_{elim} Π_{f ∈ mc(1)} f) · Π_{k=2..p} (max_{elim} Π_{f ∈ mc(k)} f).]

23 Mini-Clustering - example
[Figure: the same cluster tree (clusters 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G}; separators BC, BF, EF), now with clusters split into mini-clusters.]

24 Mini-Clustering - the messages, i = 3
Cluster 1 = {A,B,C}: p(a), p(b|a), p(c|a,b)
Cluster 2 = {B,C,D,F}, split into mini-clusters {B,C,D}: p(d|b), h_(1,2)(b,c) and {C,D,F}: p(f|c,d)
Cluster 3 = {B,E,F}: p(e|b,f), h^1_(2,3)(b), h^2_(2,3)(f)
Cluster 4 = {E,F,G}: p(g|e,f)
Separators: BC, BF, EF. For edge (2,3): sep(2,3) = {B,F}, elim(2,3) = {C,D}; cluster 2 now sends two smaller messages, h^1_(2,3)(b) and h^2_(2,3)(f), one per mini-cluster, instead of the single exact h_(2,3)(b,f).

25 Cluster Tree Elimination vs. Mini-Clustering
[Figure: the two schemes side by side on the same cluster tree (clusters ABC, BCDF, BEF, EFG; separators BC, BF, EF): CTE passes one exact message per edge direction, MC passes a set of smaller mini-cluster messages.]

26 Mini-Clustering
Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(X_i, e) of each variable and each of its values.
Time & space complexity: O(n · hw* · d^i), where hw* = max_u |{f | scope(f) ∩ χ(u) ≠ ∅}|

27 Normalization
Algorithms for the belief updating problem compute, in general, the joint probability P(X_i, e).
Computing the conditional probability P(X_i | e) = P(X_i, e) / P(e):
is easy to do if exact algorithms can be applied
becomes an important issue for approximate algorithms

28 Normalization
MC can compute an (upper) bound on the joint P(X_i, e).
Deriving a bound on the conditional P(X_i | e) is not easy when the exact P(e) is not available.
If a lower bound LB(e) on P(e) were available, we could use UB(X_i, e) / LB(e) as an upper bound on the posterior.
In our experiments we normalized the results and regarded them as approximations of the posterior P(X_i | e), as in the sketch below.
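
What "normalized the results" amounts to, as a minimal sketch with made-up numbers:

    def normalize_beliefs(joint):
        """Turn approximate joint values P(x_i, e), one per value x_i of a
        variable, into an approximate posterior P(x_i | e) by normalizing.
        `joint` maps each value of the variable to its joint probability."""
        z = sum(joint.values())          # plays the role of P(e)
        return {x: p / z for x, p in joint.items()}

    # Hypothetical MC output for one binary variable (illustrative numbers):
    print(normalize_beliefs({0: 0.12, 1: 0.24}))   # {0: 0.333..., 1: 0.666...}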

29 Experimental results
We tested MC with max and mean operators.
Algorithms: Exact, IBP, Gibbs sampling (GS), MC with normalization (approximate).
Networks (all variables are binary): coding networks; CPCS 54, 360, 422; grid networks (MxM); random noisy-OR networks; random networks.

30 Experimental results
Measures (the first and third are sketched below):
Normalized Hamming Distance (NHD): pick the most likely value (for exact and for approximate), take the ratio between the number of disagreements and the total number of variables, average over problems.
BER (Bit Error Rate) - for coding networks.
Absolute error: difference between the exact and the approximate posterior, averaged over all values, all variables, all problems.
Relative error: difference between the exact and the approximate, divided by the exact, averaged over all values, all variables, all problems.
Time.
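
A minimal sketch of NHD and absolute error for a single problem instance (illustrative data; the averaging over problems is omitted):

    def normalized_hamming_distance(exact_beliefs, approx_beliefs):
        """Fraction of variables whose most likely value differs between the
        exact and approximate posteriors. Each argument maps a variable to a
        dict {value: probability}."""
        disagreements = sum(
            max(exact_beliefs[v], key=exact_beliefs[v].get)
            != max(approx_beliefs[v], key=approx_beliefs[v].get)
            for v in exact_beliefs)
        return disagreements / len(exact_beliefs)

    def absolute_error(exact_beliefs, approx_beliefs):
        """Average |exact - approximate| over all values of all variables."""
        diffs = [abs(exact_beliefs[v][x] - approx_beliefs[v][x])
                 for v in exact_beliefs for x in exact_beliefs[v]]
        return sum(diffs) / len(diffs)

    # Hypothetical posteriors for two binary variables (illustrative numbers):
    exact = {'A': {0: 0.7, 1: 0.3}, 'B': {0: 0.4, 1: 0.6}}
    approx = {'A': {0: 0.6, 1: 0.4}, 'B': {0: 0.55, 1: 0.45}}
    print(normalized_hamming_distance(exact, approx))  # 0.5 (they disagree on B)
    print(absolute_error(exact, approx))               # 0.125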

32 Random networks - Absolute error
[Charts: evidence=0 and evidence=10]

33 Coding networks - Bit Error Rate
[Charts: sigma=0.22 and sigma=0.51]

34 Noisy-OR networks - Absolute error
[Charts: evidence=10 and evidence=20]

35 CPCS422 - Absolute error
[Charts: evidence=0 and evidence=10]

36 Grid 15x15 - 0 evidence

37 Grid 15x15 - 10 evidence

38 Grid 15x15 - 20 evidence

39 Coding Networks 1: N=100, P=3, w*=7

40 Coding Networks 2: N=100, P=4, w*=11

41 CPCS54: w*=15

42 Noisy-OR Networks 1: N=50, P=2, w*=10

43 Noisy-OR Networks 2: N=50, P=3, w*=16

44 Random Networks 1: N=50, P=2, w*=10

45 Random Networks 2: N=50, P=3, w*=16

46 Conclusion
MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating.
Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms.

