Slide 1: Bayesian Networks: Variable Elimination
Automated Planning and Decision Making 2007
Prof. Ronen Brafman
Based on Nir Friedman's course (Hebrew University)
Slide 2
In previous lessons we introduced compact representations of probability distributions:
○ Bayesian networks
A network describes a unique probability distribution P. How do we answer queries about P? The process of computing answers to these queries is called probabilistic inference.
Slide 3: Queries: Likelihood
There are many types of queries we might ask, and most of them involve evidence.
○ Evidence e is an assignment of values to a set E of variables in the domain
○ Without loss of generality, E = {X_{k+1}, …, X_n}
The simplest query is to compute the probability of the evidence:
P(e) = Σ_{x_1} … Σ_{x_k} P(x_1, …, x_k, e)
This is often referred to as computing the likelihood of the evidence.
Slide 4: Queries: A Posteriori Belief
Often we are interested in the conditional probability of a variable given the evidence:
P(X | e)
This is the a posteriori belief in X, given evidence e. A related task is computing the term P(X, e)
○ i.e., the likelihood of e and X = x for each value x of X
○ we can recover the a posteriori belief by normalizing:
P(x | e) = P(x, e) / Σ_{x'} P(x', e)
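As a minimal illustration of this normalization step (the dict representation is my own scaffolding for this sketch, not from the slides):

```python
def posterior_from_joint(joint_with_evidence):
    """Recover P(x | e) from the unnormalized terms P(x, e).

    joint_with_evidence: dict mapping each value x of X to P(x, e).
    """
    z = sum(joint_with_evidence.values())  # Z = sum over x' of P(x', e) = P(e)
    return {x: p / z for x, p in joint_with_evidence.items()}

# Example: P(X=0, e) = 0.03 and P(X=1, e) = 0.01 give P(X=0 | e) = 0.75.
print(posterior_from_joint({0: 0.03, 1: 0.01}))
```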
Slide 5: A Posteriori Belief
This query is useful in many cases:
Prediction: what is the probability of an outcome given the starting condition?
○ The target is a descendant of the evidence
Diagnosis: what is the probability of a disease/fault given the symptoms?
○ The target is an ancestor of the evidence
As we shall see, the direction of the edges between variables does not restrict the direction of the queries
○ Probabilistic inference can combine evidence from all parts of the network
Slide 6: Queries: A Posteriori Joint
In this query, we are interested in the conditional probability of several variables given the evidence: P(X, Y, … | e). Note that the size of the answer to this query is exponential in the number of variables in the joint.
Slide 7: Queries: MAP
In this query we want to find the maximum a posteriori assignment to some variables of interest (say X_1, …, X_l). That is, we seek x_1, …, x_l that maximize the probability P(x_1, …, x_l | e). Note that this is equivalent to maximizing P(x_1, …, x_l, e), since P(e) does not depend on the assignment.
Slide 8: Queries: MAP
We can use MAP for:
Classification
○ find the most likely label, given the evidence
Explanation
○ find the most likely scenario, given the evidence
Slide 9: Queries: MAP
Cautionary note: the MAP depends on the set of variables queried.
Example:
○ the MAP of X is 1,
○ the MAP of (X, Y) is (0, 0)
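The illustrating table on the original slide is missing from the transcript; one joint distribution with this behavior (the numbers are my assumption, not the original table) is P(X=0, Y=0) = 0.35, P(X=0, Y=1) = 0.05, P(X=1, Y=0) = 0.30, P(X=1, Y=1) = 0.30. Marginalizing gives P(X=1) = 0.60 > P(X=0) = 0.40, so the MAP of X alone is 1, while the single most probable joint assignment is (X, Y) = (0, 0), with probability 0.35.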
Slide 10: Complexity of Inference
Theorem: Computing P(X = x) in a Bayesian network is NP-hard.
This is not surprising, since Bayesian networks can simulate Boolean gates.
Slide 11: Proof
We reduce 3-SAT to Bayesian network computation. Assume we are given a 3-SAT instance: let q_1, …, q_n be propositions and φ_1, …, φ_k be clauses, where φ_i = ℓ_{i1} ∨ ℓ_{i2} ∨ ℓ_{i3} and each ℓ_{ij} is a literal over q_1, …, q_n. Let Φ = φ_1 ∧ … ∧ φ_k.
We will construct a network such that P(X = t) > 0 iff Φ is satisfiable.
Slide 12
P(Q_i = true) = 0.5
P(φ_i = true | Q_j, Q_k, Q_l) = 1 iff the values of Q_j, Q_k, Q_l satisfy the clause φ_i
A_1, A_2, … are simple binary AND gates.
[Figure: the propositions Q_1, …, Q_n are root nodes; each clause node φ_1, …, φ_k has the propositions of its three literals as parents; the clause nodes are combined by a cascade of AND gates A_1, A_2, … whose final output is X.]
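A minimal sketch of the deterministic clause-node CPT used in this reduction (the Python representation and function name are mine; the slides specify the CPT only mathematically):

```python
from itertools import product

def clause_cpt(literals):
    """CPT for a clause node phi_i = l1 v l2 v l3.

    literals: list of (proposition_index, positive) pairs, e.g.
              [(0, True), (2, False), (3, True)] for the clause (q0 v ~q2 v q3).
    Returns a dict: assignment tuple over the three parents -> P(phi_i = True | parents).
    """
    cpt = {}
    for vals in product([False, True], repeat=len(literals)):
        # The clause is satisfied if at least one literal matches its polarity.
        satisfied = any(v == pos for v, (_, pos) in zip(vals, literals))
        cpt[vals] = 1.0 if satisfied else 0.0  # deterministic CPT, 8 entries
    return cpt

# Clause (q0 v ~q2 v q3): only the assignment (F, T, F) falsifies it.
print(clause_cpt([(0, True), (2, False), (3, True)])[(False, True, False)])  # 0.0
```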
Slide 13
It is easy to check that:
○ the network has a polynomial number of variables
○ each CPD can be described by a small table (8 parameters at most)
○ P(X = true) > 0 if and only if there exists a satisfying assignment to Q_1, …, Q_n
Conclusion: this is a polynomial reduction of 3-SAT.
Slide 14
Note: this construction also shows that computing P(X = t) is harder than NP:
2^n · P(X = t) is the number of satisfying assignments to Φ
Thus, the problem is #P-hard (in fact, it is #P-complete).
Slide 15: Hardness - Notes
We used deterministic relations in our construction. The same construction works if we use (1-ε, ε) instead of (1, 0) in each gate, for any ε < 0.5.
Hardness does not mean we cannot solve inference:
○ It implies that we cannot find a general procedure that works efficiently for all networks
○ For particular families of networks, we can have provably efficient procedures
Slide 16: Inference in Simple Chains
Consider the simple chain X_1 → X_2. How do we compute P(X_2)?
P(x_2) = Σ_{x_1} P(x_1) P(x_2 | x_1)
Slide 17: Inference in Simple Chains (cont.)
In the chain X_1 → X_2 → X_3, how do we compute P(X_3)? We already know how to compute P(X_2), so:
P(x_3) = Σ_{x_2} P(x_2) P(x_3 | x_2)
Slide 18: Inference in Simple Chains (cont.)
In the chain X_1 → X_2 → … → X_n, how do we compute P(X_n)? Compute P(X_1), P(X_2), P(X_3), … in order, deriving each term from the previous one:
P(x_{i+1}) = Σ_{x_i} P(x_i) P(x_{i+1} | x_i)
Complexity: each step costs O(|Val(X_i)| · |Val(X_{i+1})|) operations. Compare this to naive evaluation, which requires summing over the joint values of n-1 variables. (A code sketch of this forward pass follows below.)
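A minimal sketch of the forward pass, assuming tabular CPTs stored as nested dicts (the representation and function name are mine, not the course's):

```python
def chain_marginal(prior, cpts):
    """Forward pass along a chain X1 -> X2 -> ... -> Xn.

    prior: dict value -> P(X1 = value)
    cpts:  list where cpts[i] maps x_i -> {x_{i+1}: P(x_{i+1} | x_i)}
    Returns dict value -> P(Xn = value).
    """
    current = prior
    for cpt in cpts:
        nxt = {}
        for x, p in current.items():      # sum out X_i
            for y, q in cpt[x].items():
                nxt[y] = nxt.get(y, 0.0) + p * q
        current = nxt                      # current now holds P(X_{i+1})
    return current

# Two-state example: P(X1=0) = 0.6, with the CPT P(X2 | X1) below.
prior = {0: 0.6, 1: 0.4}
cpt = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
print(chain_marginal(prior, [cpt]))  # {0: 0.62, 1: 0.38}
```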
Slide 19: Inference in Simple Chains (cont.)
Suppose that we observe the value X_2 = x_2 in the chain X_1 → X_2. How do we compute P(X_1 | x_2)?
○ Recall that it suffices to compute P(X_1, x_2) = P(X_1) P(x_2 | X_1)
Slide 20: Inference in Simple Chains (cont.)
Suppose that we observe the value X_3 = x_3 in the chain X_1 → X_2 → X_3. How do we compute P(X_1, x_3)? As before, P(X_1, x_3) = P(X_1) P(x_3 | X_1). And how do we compute P(x_3 | x_1)?
P(x_3 | x_1) = Σ_{x_2} P(x_2 | x_1) P(x_3 | x_2)
Slide 21: Inference in Simple Chains (cont.)
Suppose that we observe the value X_n = x_n in the chain X_1 → … → X_n. How do we compute P(X_1, x_n) = P(X_1) P(x_n | X_1)?
Slide 22: Inference in Simple Chains (cont.)
We compute P(x_n | x_{n-1}), P(x_n | x_{n-2}), … iteratively, working backwards along the chain:
P(x_n | x_i) = Σ_{x_{i+1}} P(x_{i+1} | x_i) P(x_n | x_{i+1})
Slide 23: Inference in Simple Chains (cont.)
Suppose that we observe the value X_n = x_n and want to find P(X_k | x_n) for some intermediate variable X_k. How do we compute P(X_k, x_n)?
P(X_k, x_n) = P(X_k) P(x_n | X_k)
We compute P(X_k) by forward iterations and P(x_n | X_k) by backward iterations.
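A matching backward pass under the same hypothetical dict representation as the forward sketch above:

```python
def backward_message(cpts, xn):
    """Compute P(Xn = xn | Xk = x) for every value x of X_k.

    cpts: CPTs along the chain from X_k to X_n; cpts[i] maps
          x_i -> {x_{i+1}: P(x_{i+1} | x_i)}
    xn:   observed value of X_n.
    Returns dict x -> P(xn | Xk = x).
    """
    # At X_n the message is the indicator of the observed value; fold it back.
    message = {xn: 1.0}
    for cpt in reversed(cpts):
        message = {x: sum(q * message.get(y, 0.0) for y, q in cpt[x].items())
                   for x in cpt}
    return message

cpt = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
print(backward_message([cpt], 1))  # {0: 0.1, 1: 0.8} = P(X2=1 | X1)
```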
Slide 24: Elimination in Chains
We now revisit the simple chain example from first principles. Consider the chain A → B → C → D → E and the query P(e). Using the definition of probability, we have
P(e) = Σ_d Σ_c Σ_b Σ_a P(a, b, c, d, e)
Slide 25: Elimination in Chains
By the chain decomposition of the network, we get
P(e) = Σ_d Σ_c Σ_b Σ_a P(a) P(b | a) P(c | b) P(d | c) P(e | d)
Slide 26: Elimination in Chains
Rearranging terms...
P(e) = Σ_d P(e | d) Σ_c P(d | c) Σ_b P(c | b) Σ_a P(a) P(b | a)
Slide 27: Elimination in Chains
Now we can perform the innermost summation:
Σ_a P(a) P(b | a) = P(b)
giving
P(e) = Σ_d P(e | d) Σ_c P(d | c) Σ_b P(c | b) P(b)
This summation is exactly the first step in the forward iteration we described before.
Slide 28: Elimination in Chains
Rearranging and then summing again, we get
P(e) = Σ_d P(e | d) Σ_c P(d | c) P(c)
and so on, until only the sum over d remains.
Slide 29: Elimination in Chains with Evidence
Similarly, we can understand the backward pass. We write the query in explicit form:
P(A, e) = Σ_b Σ_c Σ_d P(A) P(b | A) P(c | b) P(d | c) P(e | d)
Slide 30: Elimination in Chains with Evidence
Eliminating d, we get
P(A, e) = Σ_b Σ_c P(A) P(b | A) P(c | b) f_d(c)
where f_d(c) = Σ_d P(d | c) P(e | d).
Slide 31: Elimination in Chains with Evidence
Eliminating c, we get
P(A, e) = Σ_b P(A) P(b | A) f_c(b)
where f_c(b) = Σ_c P(c | b) f_d(c).
Slide 32: Elimination in Chains with Evidence
Finally, we eliminate b:
P(A, e) = P(A) f_b(A)
where f_b(A) = Σ_b P(b | A) f_c(b).
Slide 33: Variable Elimination
General idea: write the query in the form
P(X_1, e) = Σ_{x_n} … Σ_{x_2} Π_i P(x_i | pa_i)
(with the evidence variables instantiated to their observed values). Then, iteratively:
○ Move all irrelevant terms outside of the innermost sum
○ Perform the innermost sum, getting a new term
○ Insert the new term into the product
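A compact sketch of this loop, with factors stored as tables from assignment tuples to numbers. The Factor class and helper names are my own scaffolding, not the course's notation; this is a sketch assuming tabular factors over named variables with finite domains:

```python
from itertools import product

class Factor:
    """A factor over named variables; table maps value tuples to numbers."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = table  # dict: tuple of values -> float

def multiply(f, g, domains):
    """Pointwise product of two factors over the union of their scopes."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for vals in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, vals))
        fv = f.table[tuple(a[v] for v in f.variables)]
        gv = g.table[tuple(a[v] for v in g.variables)]
        table[vals] = fv * gv
    return Factor(variables, table)

def sum_out(f, var):
    """Marginalize variable var out of factor f."""
    rest = tuple(v for v in f.variables if v != var)
    table = {}
    for vals, p in f.table.items():
        key = tuple(v for v, name in zip(vals, f.variables) if name != var)
        table[key] = table.get(key, 0.0) + p
    return Factor(rest, table)

def variable_elimination(factors, order, domains):
    """Eliminate the variables in the given order; return the remaining product."""
    factors = list(factors)
    for var in order:
        related = [f for f in factors if var in f.variables]
        if not related:
            continue
        factors = [f for f in factors if var not in f.variables]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(f, prod, domains)
        factors.append(sum_out(prod, var))  # the new term re-enters the pool
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    return result

# P(B) on the chain A -> B with P(A=0)=0.6, P(B=0|A=0)=0.9, P(B=0|A=1)=0.2:
domains = {"A": [0, 1], "B": [0, 1]}
pa = Factor(("A",), {(0,): 0.6, (1,): 0.4})
pba = Factor(("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print(variable_elimination([pa, pba], ["A"], domains).table)  # {(0,): 0.62, (1,): 0.38}
```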
Slide 34: A More Complex Example
The "Asia" network:
Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D)
[Figure: V → T; S → L and S → B; T, L → A; A → X; A, B → D.]
Slide 35
We want to compute P(d).
Need to eliminate: v, s, x, t, l, a, b
Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Slide 36
We want to compute P(d).
Eliminate: v. Compute:
f_v(t) = Σ_v P(v) P(t|v)
Remaining factors: f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term.
Slide 37
We want to compute P(d).
Eliminate: s. Compute:
f_s(b, l) = Σ_s P(s) P(b|s) P(l|s)
Remaining factors: f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
Summing on s results in a factor with two arguments, f_s(b, l). In general, the result of elimination may be a function of several variables.
Slide 38
We want to compute P(d).
Eliminate: x. Compute:
f_x(a) = Σ_x P(x|a)
Remaining factors: f_v(t) f_s(b,l) P(a|t,l) f_x(a) P(d|a,b)
Note: f_x(a) = 1 for all values of a!
Slide 39
We want to compute P(d).
Eliminate: t. Compute:
f_t(a, l) = Σ_t f_v(t) P(a|t,l)
Remaining factors: f_s(b,l) f_t(a,l) f_x(a) P(d|a,b)
Slide 40
We want to compute P(d).
Eliminate: l. Compute:
f_l(a, b) = Σ_l f_s(b,l) f_t(a,l)
Remaining factors: f_l(a,b) f_x(a) P(d|a,b)
Slide 41
We want to compute P(d).
Eliminate: a, then b. Compute:
f_a(b, d) = Σ_a f_l(a,b) f_x(a) P(d|a,b)
f_b(d) = Σ_b f_a(b, d)
The result f_b(d) is exactly P(d).
Slide 42: Variable Elimination
We now understand variable elimination as a sequence of rewriting operations. The actual computation is done in the elimination steps, and its cost depends on the order of elimination
○ We will return to this issue in detail
Slide 43: Dealing with Evidence
How do we deal with evidence? Suppose we get the evidence V = t, S = f, D = t. We want to compute P(L, V = t, S = f, D = t).
Slide 44: Dealing with Evidence
We start by writing the factors. Since we know that V = t, we don't need to eliminate V. Instead, we can replace the factors P(V) and P(T|V) with
f_{P(V)} = P(V = t) and f_{P(T|V)}(T) = P(T | V = t)
These "select" the appropriate parts of the original factors given the evidence. Note that f_{P(V)} is a constant, and thus does not appear in the elimination of other variables.
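A sketch of this restriction step, reusing the hypothetical Factor representation from the variable-elimination sketch above:

```python
def restrict(f, var, value):
    """Restrict factor f to var = value, dropping var from its scope."""
    if var not in f.variables:
        return f
    idx = f.variables.index(var)
    rest = f.variables[:idx] + f.variables[idx + 1:]
    table = {vals[:idx] + vals[idx + 1:]: p
             for vals, p in f.table.items() if vals[idx] == value}
    return Factor(rest, table)

# Restricting P(B|A) (the pba factor from the earlier example) to A = 0
# leaves a factor over B alone: {(0,): 0.9, (1,): 0.1}.
print(restrict(pba, "A", 0).table)
```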
Slide 45: Dealing with Evidence
Given evidence V = t, S = f, D = t, compute P(L, V = t, S = f, D = t).
Initial factors, after setting the evidence:
f_{P(V)} f_{P(S)} f_{P(T|V)}(t) f_{P(L|S)}(l) f_{P(B|S)}(b) P(a|t,l) P(x|a) f_{P(D|A,B)}(a,b)
Slide 46: Dealing with Evidence
Given evidence V = t, S = f, D = t, compute P(L, V = t, S = f, D = t).
Eliminating x, we get
f_x(a) = Σ_x P(x|a)
Remaining factors: f_{P(V)} f_{P(S)} f_{P(T|V)}(t) f_{P(L|S)}(l) f_{P(B|S)}(b) P(a|t,l) f_x(a) f_{P(D|A,B)}(a,b)
Slide 47: Dealing with Evidence
Eliminating t, we get
f_t(a, l) = Σ_t f_{P(T|V)}(t) P(a|t,l)
Remaining factors: f_{P(V)} f_{P(S)} f_{P(L|S)}(l) f_{P(B|S)}(b) f_t(a,l) f_x(a) f_{P(D|A,B)}(a,b)
Slide 48: Dealing with Evidence
Eliminating a, we get
f_a(b, l) = Σ_a f_t(a,l) f_x(a) f_{P(D|A,B)}(a,b)
Remaining factors: f_{P(V)} f_{P(S)} f_{P(L|S)}(l) f_{P(B|S)}(b) f_a(b,l)
Slide 49: Dealing with Evidence
Eliminating b, we get
f_b(l) = Σ_b f_{P(B|S)}(b) f_a(b, l)
which leaves
P(L, V = t, S = f, D = t) = f_{P(V)} f_{P(S)} f_{P(L|S)}(L) f_b(L)
Slide 50: Complexity of Variable Elimination
Suppose in one elimination step we compute
f_X(y_1, …, y_k) = Σ_x Π_{i=1}^{m} f_i(x, Z_i)
where each Z_i is a subset of {Y_1, …, Y_k}. This requires:
m · |Val(X)| · Π_i |Val(Y_i)| multiplications
○ For each value of x, y_1, …, y_k, we do m multiplications
|Val(X)| · Π_i |Val(Y_i)| additions
○ For each value of y_1, …, y_k, we do |Val(X)| additions
Complexity is exponential in the number of variables in the intermediate factor.
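A quick worked instance (the numbers are mine, for illustration): with binary variables, m = 4 factors, and k = 3 neighbor variables Y_1, Y_2, Y_3, one elimination step costs m · |Val(X)| · Π_i |Val(Y_i)| = 4 · 2 · 2³ = 64 multiplications and |Val(X)| · Π_i |Val(Y_i)| = 2 · 2³ = 16 additions; each additional neighbor variable doubles both counts.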
Slide 51: Understanding Variable Elimination
We want to select "good" elimination orderings that reduce the complexity. We start by trying to understand variable elimination via the graph we are working with. This will reduce the problem of finding a good ordering to a well-understood graph-theoretic operation.
Slide 52: Undirected Graph Representation
At each stage of the procedure, we have an algebraic term that we need to evaluate. In general this term is of the form
Σ_{x_1} … Σ_{x_m} Π_i f_i(Z_i)
where the Z_i are sets of variables. We now draw a graph with an undirected edge X--Y if X and Y are arguments of some common factor
○ that is, if X, Y ∈ Z_i for some i
Slide 53: Undirected Graph Representation
Consider the "Asia" example. The initial factors are
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
thus, the undirected graph has the edges V--T, S--L, S--B, T--A, L--A, T--L, A--X, A--D, B--D, and A--B. In the first step, this graph is just the moralized graph of the network.
Slide 54: Undirected Graph Representation
Now we eliminate t, getting a new factor over t's neighbors:
f_t(v, a, l) = Σ_t P(t|v) P(a|t,l)
The corresponding change in the graph: t's neighbors v, a, and l become pairwise connected, and t and its edges are removed.
Slide 55: Example
Want to compute P(L, V = t, S = f, D = t)
Moralizing
[Figure: the moralized "Asia" graph.]
Slide 56: Example
Want to compute P(L, V = t, S = f, D = t)
Moralizing
Setting evidence
[Figure: the moralized graph with the evidence nodes V, S, D instantiated.]
Slide 57: Example
Want to compute P(L, V = t, S = f, D = t)
Moralizing
Setting evidence
Eliminating x
○ New factor f_x(A)
[Figure: the graph after eliminating X.]
Slide 58: Example
Want to compute P(L, V = t, S = f, D = t)
Moralizing
Setting evidence
Eliminating x
Eliminating a
○ New factor f_a(b, t, l)
[Figure: the graph after eliminating A; its neighbors B, T, L form a clique.]
Slide 59: Example
Want to compute P(L, V = t, S = f, D = t)
Moralizing
Setting evidence
Eliminating x
Eliminating a
Eliminating b
○ New factor f_b(t, l)
[Figure: the graph after eliminating B.]
Slide 60: Example
Want to compute P(L, V = t, S = f, D = t)
Moralizing
Setting evidence
Eliminating x
Eliminating a
Eliminating b
Eliminating t
○ New factor f_t(l)
[Figure: the graph after eliminating T; only L remains.]
Slide 61: Elimination in Undirected Graphs
Generalizing, we see that we can eliminate a variable X by:
1. For all Y, Z such that Y--X and Z--X, add an edge Y--Z
2. Remove X and all edges adjacent to it
This procedure creates a clique that contains all the neighbors of X. After step 1 we have a clique that corresponds to the intermediate factor (before marginalization). The cost of the step is exponential in the size of this clique.
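A sketch of this graph operation in Python (adjacency sets; the names are mine). It also records the clique created at each step, so the induced width of an ordering, discussed below, can be read off:

```python
def eliminate_order(adj, order):
    """Simulate node elimination on an undirected graph.

    adj:   dict node -> set of neighbor nodes
    order: elimination ordering (list of nodes)
    Returns the list of clique sizes created, one per eliminated node.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    clique_sizes = []
    for x in order:
        nbrs = adj[x]
        clique_sizes.append(len(nbrs) + 1)       # clique = X plus its neighbors
        for y in nbrs:                            # step 1: connect all neighbors
            for z in nbrs:
                if y != z:
                    adj[y].add(z)
        for y in nbrs:                            # step 2: remove X
            adj[y].discard(x)
        del adj[x]
    return clique_sizes

# Induced width of the ordering = largest clique size minus one.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(max(eliminate_order(chain, ["A", "B", "C"])) - 1)  # 1 (a chain is a tree)
```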
Slide 62: Undirected Graphs
The process of eliminating nodes from an undirected graph gives us a clue to the complexity of inference. To see this, we will examine the graph that contains all of the edges we added during the elimination. The resulting graph is always chordal.
Slide 63: Example
Want to compute P(d)
Moralizing
[Figure: the moralized "Asia" graph.]
Slide 64: Example
Want to compute P(d)
Moralizing
Eliminating v
○ Multiply to get f'_v(v, t)
○ Result: f_v(t)
[Figure: the graph after eliminating V.]
Slide 65: Example
Want to compute P(d)
Moralizing
Eliminating v
Eliminating x
○ Multiply to get f'_x(a, x)
○ Result: f_x(a)
[Figure: the graph after eliminating X.]
Slide 66: Example
Want to compute P(d)
Moralizing
Eliminating v
Eliminating x
Eliminating s
○ Multiply to get f'_s(l, b, s)
○ Result: f_s(l, b)
[Figure: the graph after eliminating S; L and B become connected.]
Slide 67: Example
Want to compute P(d)
Moralizing
Eliminating v
Eliminating x
Eliminating s
Eliminating t
○ Multiply to get f'_t(a, l, t)
○ Result: f_t(a, l)
[Figure: the graph after eliminating T.]
Slide 68: Example
Want to compute P(d)
Moralizing
Eliminating v
Eliminating x
Eliminating s
Eliminating t
Eliminating l
○ Multiply to get f'_l(a, b, l)
○ Result: f_l(a, b)
[Figure: the graph after eliminating L.]
Slide 69: Example
Want to compute P(d)
Moralizing
Eliminating v
Eliminating x
Eliminating s
Eliminating t
Eliminating l
Eliminating a, b
○ Multiply to get f'_a(a, b, d)
○ Result: f(d)
[Figure: only D remains.]
Slide 70: Expanded Graphs
The resulting graph is the induced graph (for this particular ordering).
Main properties:
○ Every maximal clique in the induced graph corresponds to an intermediate factor in the computation
○ Every factor stored during the process is a subset of some maximal clique in the graph
These facts hold for any variable elimination ordering on any network.
Slide 71: Induced Width (Treewidth)
The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination. This quantity, minus one, is called the induced width of the graph according to the specified ordering; its minimum over all orderings is the treewidth. Finding a good ordering for a graph is equivalent to finding an ordering of minimal induced width.
Slide 72: Consequence: Elimination on Trees
Suppose we have a tree
○ a network where each variable has at most one parent
Then all the factors involve at most two variables. Thus, the moralized graph is also a tree.
Slide 73: Elimination on Trees
We can maintain the tree structure by always eliminating extreme variables (leaves) of the tree.
[Figure: a tree shrinking as leaf variables are eliminated one by one.]
Slide 74: Elimination on Trees
Formally, for any tree there is an elimination ordering with induced width 1.
Theorem: Inference on trees is linear in the number of variables.
Slide 75: Polytrees
A polytree is a network in which there is at most one undirected path between any two variables.
Theorem: Inference in a polytree is linear in the representation size of the network
○ This assumes a tabular CPT representation
Can you see how the argument would work?
Slide 76: General Networks
What do we do when the network is not a polytree? If the network has a cycle, the induced width for any ordering is greater than 1.
Slide 77: Example
Eliminating A, B, C, D, E, …
The resulting induced graph is chordal, with treewidth 2.
[Figure: the graph over the nodes A, …, H after each elimination step.]
Slide 78: Example
Eliminating H, G, E, C, F, D, B, A
[Figure: the graph after each elimination step under this ordering.]
Slide 79: General Networks
From graph theory:
Theorem: Finding an ordering that minimizes the induced width (i.e., determining the treewidth) is NP-hard.
However:
○ There are reasonable heuristics for finding "relatively" good orderings (one such heuristic is sketched below)
○ There are provable approximations to the best treewidth
○ If the graph has a small treewidth, there are algorithms that find it in polynomial time
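One standard such heuristic is min-fill: repeatedly eliminate the node whose elimination adds the fewest new edges. This sketch is my own illustration, not from the slides:

```python
def min_fill_order(adj):
    """Greedy min-fill elimination ordering for an undirected graph.

    adj: dict node -> set of neighbor nodes.
    Returns a list of nodes: at each step, pick the node whose elimination
    would add the fewest fill-in edges among its neighbors.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        def fill_cost(x):
            nbrs = list(adj[x])
            return sum(1 for i, y in enumerate(nbrs) for z in nbrs[i + 1:]
                       if z not in adj[y])
        x = min(adj, key=fill_cost)
        for y in adj[x]:                 # add the fill-in edges
            for z in adj[x]:
                if y != z:
                    adj[y].add(z)
        for y in adj[x]:                 # remove x from the graph
            adj[y].discard(x)
        del adj[x]
        order.append(x)
    return order

# On a 4-cycle, every first choice costs one fill edge; after that, zero.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(min_fill_order(cycle))  # e.g. ['A', 'B', 'C', 'D']
```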