
1 Inference I: Introduction, Hardness, and Variable Elimination. Slides by Nir Friedman

2
- In previous lessons we introduced compact representations of probability distributions:
  - Bayesian networks
  - Markov networks
- A network describes a unique probability distribution P
- How do we answer queries about P?
- We use "inference" as a name for the process of computing answers to such queries

3 Queries: Likelihood
- There are many types of queries we might ask.
- Most of these involve evidence.
  - Evidence e is an assignment of values to a set E of variables in the domain
  - Without loss of generality, E = {X_{k+1}, …, X_n}
- Simplest query: compute the probability of the evidence,
  P(e) = Σ_{x_1} … Σ_{x_k} P(x_1, …, x_k, e)
- This is often referred to as computing the likelihood of the evidence

4 Queries: A posteriori belief
- Often we are interested in the conditional probability of a variable given the evidence:
  P(X = x | e)
- This is the a posteriori belief in X, given evidence e
- A related task is computing the term P(X, e), i.e., the likelihood of e and X = x for each value x of X
- We can recover the a posteriori belief by normalizing:
  P(x | e) = P(x, e) / Σ_{x'} P(x', e)

5 A posteriori belief
This query is useful in many cases:
- Prediction: what is the probability of an outcome given the starting condition?
  - The target is a descendant of the evidence
- Diagnosis: what is the probability of a disease/fault given the symptoms?
  - The target is an ancestor of the evidence
- As we shall see, the direction of the edges between variables does not restrict the direction of the queries
  - Probabilistic inference can combine evidence from all parts of the network

6 Queries: A posteriori joint
- In this query, we are interested in the conditional probability of several variables, given the evidence: P(X, Y, … | e)
- Note that the size of the answer to this query is exponential in the number of variables in the joint

7 Queries: MAP
- In this query we want to find the maximum a posteriori assignment for some variables of interest (say X_1, …, X_l)
- That is, x_1, …, x_l that maximize the probability P(x_1, …, x_l | e)
- Note that this is equivalent to maximizing P(x_1, …, x_l, e)

8 Queries: MAP
We can use MAP for:
- Classification: find the most likely label, given the evidence
- Explanation: what is the most likely scenario, given the evidence?

9 Queries: MAP
Cautionary note:
- The MAP depends on the set of variables
- Example: the MAP of X is 1, while the MAP of (X, Y) is (0, 0)

10 Complexity of Inference
- Thm: Computing P(X = x) in a Bayesian network is NP-hard
- Not surprising, since we can simulate Boolean gates

11 Proof
- We reduce 3-SAT to Bayesian network computation
- Assume we are given a 3-SAT problem:
  - q_1, …, q_n are propositions
  - φ_1, …, φ_k are clauses, such that φ_i = l_{i1} ∨ l_{i2} ∨ l_{i3}, where each l_{ij} is a literal over q_1, …, q_n
  - Φ = φ_1 ∧ … ∧ φ_k
- We will construct a network s.t. P(X = t) > 0 iff Φ is satisfiable

12
- P(Q_i = true) = 0.5 for each proposition
- P(φ_I = true | Q_i, Q_j, Q_l) = 1 iff the values of Q_i, Q_j, Q_l satisfy the clause φ_I
- A_1, A_2, …, are simple binary "and" gates, chained so that X is true iff all clauses are satisfied
[Figure: roots Q_1, …, Q_n; clause nodes φ_1, …, φ_k; a chain of gates A_1, A_2, …, A_{k/2-1} feeding X]

13
- It is easy to check:
  - Polynomial number of variables
  - Each CPD can be described by a small table (8 parameters at most)
  - P(X = true) > 0 if and only if there exists a satisfying assignment to Q_1, …, Q_n
- Conclusion: this is a polynomial-time reduction from 3-SAT
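To see the reduction in action, here is a small Python sketch (an illustration, not from the slides): since the roots are uniform and the clause and gate nodes are deterministic, P(X = true) equals the fraction of assignments that satisfy the formula, which we can verify by enumeration. The formula phi below is a hypothetical example.

```python
from itertools import product

# Hypothetical 3-CNF over q0, q1, q2: (q0 v ~q1 v q2) & (~q0 v q1 v ~q2).
# A literal is (variable index, polarity); a clause is a triple of literals.
phi = [((0, True), (1, False), (2, True)),
       ((0, False), (1, True), (2, False))]
n = 3

def satisfies(assignment, clause):
    """The deterministic clause CPT: true iff some literal holds."""
    return any(assignment[i] == pol for i, pol in clause)

def p_x_true(phi, n):
    """P(X = true) in the reduction network: the roots Q_i are uniform
    and all other nodes are deterministic, so P(X = true) is simply the
    fraction of the 2^n assignments that satisfy every clause."""
    sat = sum(all(satisfies(a, c) for c in phi)
              for a in product([False, True], repeat=n))
    return sat / 2 ** n

p = p_x_true(phi, n)
print(p > 0)         # formula is satisfiable iff P(X = true) > 0
print(p * 2 ** n)    # number of satisfying assignments (the #P connection)
```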

14
- Note: this construction also shows that computing P(X = t) is harder than NP
- 2^n · P(X = t) is the number of satisfying assignments to Φ
- Thus, it is #P-hard (in fact it is #P-complete)

15 Hardness - Notes
- We used deterministic relations in our construction
- The same construction works if we use (1−ε, ε) instead of (1, 0) in each gate, for any ε < 0.5
- Hardness does not mean we cannot solve inference:
  - It implies that we cannot find a general procedure that works efficiently for all networks
  - For particular families of networks, we can have provably efficient procedures
  - We will characterize such families in the next two classes

16 Inference in Simple Chains
- How do we compute P(X_2)?
  P(x_2) = Σ_{x_1} P(x_1) P(x_2 | x_1)
[Chain: X_1 → X_2]

17 Inference in Simple Chains (cont.)
- How do we compute P(X_3)?
- We already know how to compute P(X_2), so
  P(x_3) = Σ_{x_2} P(x_2) P(x_3 | x_2)
[Chain: X_1 → X_2 → X_3]

18 Inference in Simple Chains (cont.)
- How do we compute P(X_n)?
- Compute P(X_1), P(X_2), P(X_3), … in order; each term is computed from the previous one:
  P(x_{i+1}) = Σ_{x_i} P(x_i) P(x_{i+1} | x_i)
- Complexity:
  - Each step costs O(|Val(X_i)| · |Val(X_{i+1})|) operations
  - Compare to naïve evaluation, which requires summing over the joint values of n−1 variables
[Chain: X_1 → X_2 → X_3 → … → X_n]
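To make the forward pass concrete, here is a minimal Python sketch (our illustration, not the slides' code), assuming each CPD is a dict mapping (x_i, x_{i+1}) to P(x_{i+1} | x_i); the numbers in the usage lines are hypothetical.

```python
def chain_marginal(p_x1, cpds):
    """Forward pass: compute P(X_n) in the chain X_1 -> ... -> X_n.
    p_x1 maps each value of X_1 to its probability; cpds[i] maps
    (x_i, x_{i+1}) -> P(x_{i+1} | x_i). Each step costs
    O(|Val(X_i)| * |Val(X_{i+1})|) operations."""
    belief = p_x1
    for cpd in cpds:
        nxt = {}
        for (xi, xj), p in cpd.items():
            nxt[xj] = nxt.get(xj, 0.0) + belief[xi] * p
        belief = nxt
    return belief

# A one-step binary chain with hypothetical numbers:
p_x1 = {0: 0.6, 1: 0.4}
cpd_12 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
print(chain_marginal(p_x1, [cpd_12]))   # P(X_2) = {0: 0.62, 1: 0.38}
```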

19 Inference in Simple Chains (cont.)
- Suppose that we observe the value X_2 = x_2
- How do we compute P(X_1 | x_2)?
- Recall that it suffices to compute P(X_1, x_2) = P(X_1) P(x_2 | X_1); normalizing gives the conditional
[Chain: X_1 → X_2]

20 Inference in Simple Chains (cont.)
- Suppose that we observe the value X_3 = x_3
- How do we compute P(X_1, x_3)?
  P(X_1, x_3) = P(X_1) P(x_3 | X_1)
- How do we compute P(x_3 | x_1)?
  P(x_3 | x_1) = Σ_{x_2} P(x_2 | x_1) P(x_3 | x_2)
[Chain: X_1 → X_2 → X_3]

21 Inference in Simple Chains (cont.)
- Suppose that we observe the value X_n = x_n
- How do we compute P(X_1, x_n)?
- We compute P(x_n | x_{n-1}), P(x_n | x_{n-2}), … iteratively:
  P(x_n | x_i) = Σ_{x_{i+1}} P(x_{i+1} | x_i) P(x_n | x_{i+1})
[Chain: X_1 → X_2 → X_3 → … → X_n]

22 Inference in Simple Chains (cont.)
- Suppose that we observe the value X_n = x_n, and we want to find P(X_k | x_n)
- How do we compute P(X_k, x_n)?
  P(X_k, x_n) = P(X_k) P(x_n | X_k)
- We compute P(X_k) by forward iterations
- We compute P(x_n | X_k) by backward iterations
[Chain: X_1 → X_2 → … → X_k → … → X_n]
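A sketch of the backward iteration and the final combination, in the same dict-based representation as the forward pass above (our illustration, not the slides' code):

```python
def backward_message(cpds, x_n):
    """P(x_n | X_k), computed right to left along the chain.
    cpds[i] maps (x_i, x_{i+1}) -> P(x_{i+1} | x_i), for i = k, ..., n-1.
    Returns a dict x_k -> P(x_n | x_k)."""
    msg = {x_n: 1.0}              # base case: X_n is observed at x_n
    for cpd in reversed(cpds):
        prev = {}
        for (xi, xj), p in cpd.items():
            prev[xi] = prev.get(xi, 0.0) + p * msg.get(xj, 0.0)
        msg = prev
    return msg

def posterior(p_xk, back):
    """P(X_k | x_n): multiply P(X_k) (forward) by P(x_n | X_k)
    (backward) and normalize."""
    joint = {x: p * back.get(x, 0.0) for x, p in p_xk.items()}
    z = sum(joint.values())
    return {x: v / z for x, v in joint.items()}
```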

23 Elimination in Chains
- We now try to understand the simple chain example from first principles
- Using the definition of probability, we have
  P(e) = Σ_d Σ_c Σ_b Σ_a P(a, b, c, d, e)
[Chain: A → B → C → D → E]

24 Elimination in Chains
- By chain decomposition, we get
  P(e) = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(e|d)

25 Elimination in Chains
- Rearranging terms:
  P(e) = Σ_d P(e|d) Σ_c P(d|c) Σ_b P(c|b) Σ_a P(a) P(b|a)

26 Elimination in Chains
- Now we can perform the innermost summation:
  Σ_a P(a) P(b|a) = P(b), so P(e) = Σ_d P(e|d) Σ_c P(d|c) Σ_b P(c|b) P(b)
- This summation is exactly the first step in the forward iteration we described before

27 Elimination in Chains
- Rearranging and then summing again, we get
  Σ_b P(c|b) P(b) = P(c), so P(e) = Σ_d P(e|d) Σ_c P(d|c) P(c)

28 Elimination in Chains with Evidence
- Similarly, we can understand the backward pass
- We write the query in explicit form:
  P(a, e) = Σ_b Σ_c Σ_d P(a) P(b|a) P(c|b) P(d|c) P(e|d)

29 Elimination in Chains with Evidence
- Eliminating d, we get
  P(a, e) = Σ_b Σ_c P(a) P(b|a) P(c|b) Σ_d P(d|c) P(e|d) = Σ_b Σ_c P(a) P(b|a) P(c|b) P(e|c)

30 Elimination in Chains with Evidence
- Eliminating c, we get
  P(a, e) = Σ_b P(a) P(b|a) P(e|b)

31 Elimination in Chains with Evidence
- Finally, we eliminate b:
  P(a, e) = P(a) P(e|a)

32 Variable Elimination
General idea:
- Write the query in the form
  P(X_1, e) = Σ_{x_n} … Σ_{x_3} Σ_{x_2} ∏_i P(x_i | pa_i)
- Iteratively:
  - Move all irrelevant terms outside of the innermost sum
  - Perform the innermost sum, getting a new term
  - Insert the new term into the product
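Below is a compact Python sketch of this procedure (an illustration under our own conventions, not the slides' code), representing a factor as a pair (scope tuple, table dict) and domains as a dict from variable name to its list of values:

```python
from itertools import product as cartesian

def multiply(f, g, domains):
    """Pointwise product of two factors; the scope is the union of scopes.
    A factor is (scope, table): a tuple of variable names and a dict
    mapping value tuples (in scope order) to numbers."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))   # union, order-preserving
    table = {}
    for vals in cartesian(*(domains[v] for v in scope)):
        a = dict(zip(scope, vals))
        table[vals] = (f[1][tuple(a[v] for v in f[0])] *
                       g[1][tuple(a[v] for v in g[0])])
    return scope, table

def sum_out(f, var):
    """Marginalize var out of factor f (the innermost sum)."""
    scope, table = f
    i = scope.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], out

def variable_elimination(factors, order, domains):
    """For each variable in order: collect the factors that mention it,
    multiply them, sum the variable out, and put the new factor back
    into the pool. Finally multiply whatever is left."""
    for var in order:
        related = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f, domains)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    return result
```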

33 A More Complex Example
- The "Asia" network:
[Figure: Visit to Asia → Tuberculosis; Smoking → Lung Cancer, Bronchitis; Tuberculosis, Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest, Bronchitis → Dyspnea]

34
- We want to compute P(d)
- Need to eliminate: v, s, x, t, l, a, b
- Initial factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

35
- We want to compute P(d); need to eliminate: v, s, x, t, l, a, b
- Initial factors: P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
- Eliminate v. Compute: f_v(t) = Σ_v P(v) P(t|v)
- Remaining factors: f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
- Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term

36
- We want to compute P(d); need to eliminate: s, x, t, l, a, b
- Factors: f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
- Eliminate s. Compute: f_s(b,l) = Σ_s P(s) P(l|s) P(b|s)
- Remaining factors: f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
- Summing on s results in a factor with two arguments, f_s(b,l). In general, the result of elimination may be a function of several variables

37
- We want to compute P(d); need to eliminate: x, t, l, a, b
- Factors: f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
- Eliminate x. Compute: f_x(a) = Σ_x P(x|a)
- Remaining factors: f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
- Note: f_x(a) = 1 for all values of a!

38
- We want to compute P(d); need to eliminate: t, l, a, b
- Factors: f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
- Eliminate t. Compute: f_t(a,l) = Σ_t f_v(t) P(a|t,l)
- Remaining factors: f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)

39
- We want to compute P(d); need to eliminate: l, a, b
- Factors: f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)
- Eliminate l. Compute: f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)
- Remaining factors: f_x(a) f_l(a,b) P(d|a,b)

40
- We want to compute P(d); need to eliminate: a, b
- Factors: f_x(a) f_l(a,b) P(d|a,b)
- Eliminate a, then b. Compute:
  f_a(b,d) = Σ_a f_x(a) f_l(a,b) P(d|a,b),   f_b(d) = Σ_b f_a(b,d)
- The result f_b(d) is P(d)
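As a quick sanity check of the variable elimination sketch above, here it is run on a toy chain A → B → C with hypothetical numbers; the Asia computation works the same way with its eight CPTs and the elimination order v, s, x, t, l, a, b:

```python
# Toy chain A -> B -> C (hypothetical numbers); query P(C).
domains = {'A': [0, 1], 'B': [0, 1], 'C': [0, 1]}
pA  = (('A',), {(0,): 0.7, (1,): 0.3})
pBA = (('A', 'B'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6})
pCB = (('B', 'C'), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.5})
print(variable_elimination([pA, pBA, pCB], ['A', 'B'], domains))
# -> (('C',), {(0,): 0.725, (1,): 0.275})
```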

41 Variable Elimination
- We now understand variable elimination as a sequence of rewriting operations
- The actual computation is done in the elimination steps
- Exactly the same computation procedure applies to Markov networks
- The computation depends on the order of elimination
  - We will return to this issue in detail

42 Dealing with evidence
- How do we deal with evidence?
- Suppose we get evidence V = t, S = f, D = t
- We want to compute P(L, V = t, S = f, D = t)

43 Dealing with Evidence
- We start by writing the factors:
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
- Since we know that V = t, we don't need to eliminate V
- Instead, we can replace the factors P(V) and P(T|V) with
  f_{P(v)} = P(V = t),   f_{P(t|v)}(T) = P(T | V = t)
- These "select" the appropriate parts of the original factors given the evidence
- Note that f_{P(v)} is a constant, and thus does not appear in the elimination of other variables
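The "select" operation can be written as a small factor restriction, in the same (scope, table) representation as the variable elimination sketch above (our illustration; the numbers are hypothetical):

```python
def restrict(factor, var, value):
    """'Select' the rows of a factor consistent with var = value and
    drop var from the scope."""
    scope, table = factor
    if var not in scope:
        return factor
    i = scope.index(var)
    new_table = {vals[:i] + vals[i + 1:]: p
                 for vals, p in table.items() if vals[i] == value}
    return scope[:i] + scope[i + 1:], new_table

# e.g. restrict P(T | V) to the observed V = t (encoded here as V = 1):
pTV = (('V', 'T'), {(0, 0): 0.99, (0, 1): 0.01, (1, 0): 0.95, (1, 1): 0.05})
print(restrict(pTV, 'V', 1))   # -> (('T',), {(0,): 0.95, (1,): 0.05})
```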

44 Dealing with Evidence
- Given evidence V = t, S = f, D = t
- Compute P(L, V = t, S = f, D = t)
- Initial factors, after setting evidence:
  P(V=t) P(S=f) P(T|V=t) P(L|S=f) P(B|S=f) P(A|T,L) P(X|A) P(D=t|A,B)

45 Dealing with Evidence
- Given evidence V = t, S = f, D = t
- Compute P(L, V = t, S = f, D = t)
- Initial factors, after setting evidence:
  P(V=t) P(S=f) P(T|V=t) P(L|S=f) P(B|S=f) P(A|T,L) P(X|A) P(D=t|A,B)
- Eliminating x, we get f_x(a) = Σ_x P(x|a), leaving
  P(V=t) P(S=f) P(T|V=t) P(L|S=f) P(B|S=f) P(A|T,L) f_x(A) P(D=t|A,B)

46 Dealing with Evidence
- Given evidence V = t, S = f, D = t; compute P(L, V = t, S = f, D = t)
- Eliminating t, we get f_t(a,l) = Σ_t P(t|V=t) P(a|t,l), leaving
  P(V=t) P(S=f) P(L|S=f) P(B|S=f) f_t(A,L) f_x(A) P(D=t|A,B)

47 Dealing with Evidence
- Given evidence V = t, S = f, D = t; compute P(L, V = t, S = f, D = t)
- Eliminating a, we get f_a(b,l) = Σ_a f_x(a) f_t(a,l) P(D=t|a,b), leaving
  P(V=t) P(S=f) P(L|S=f) P(B|S=f) f_a(B,L)

48 Dealing with Evidence
- Given evidence V = t, S = f, D = t; compute P(L, V = t, S = f, D = t)
- Eliminating b, we get f_b(l) = Σ_b P(b|S=f) f_a(b,l), leaving
  P(L, V = t, S = f, D = t) = P(V=t) P(S=f) P(L|S=f) f_b(L)

49 Complexity of variable elimination
- Suppose in one elimination step we compute
  f_X(y_1, …, y_k) = Σ_x f'_1 · … · f'_m
  where each factor f'_i mentions x and some of the y's
- This requires:
  - m · |Val(X)| · ∏_i |Val(Y_i)| multiplications: for each value of x, y_1, …, y_k, we do m multiplications
  - |Val(X)| · ∏_i |Val(Y_i)| additions: for each value of y_1, …, y_k, we do |Val(X)| additions
- Complexity is exponential in the number of variables in the intermediate factor
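A tiny helper (our illustration) that counts these operations for one step, taking the slide's counts at face value:

```python
from math import prod

def step_cost(m, val_x, val_ys):
    """Operation counts for one elimination step, per the slide:
    m multiplications for each value of (x, y_1, ..., y_k), and
    |Val(X)| additions for each value of (y_1, ..., y_k)."""
    cells = val_x * prod(val_ys)   # size of the intermediate factor
    return m * cells, cells        # (multiplications, additions)

# e.g. 3 factors, all variables binary, k = 3:
print(step_cost(3, 2, [2, 2, 2]))  # -> (48, 16)
```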

50 Understanding Variable Elimination
- We want to select "good" elimination orderings that reduce complexity
- We start by attempting to understand variable elimination via the graph we are working with
- This reduces the problem of finding a good ordering to a graph-theoretic operation that is well understood

51 Undirected graph representation
- At each stage of the procedure, we have an algebraic term that we need to evaluate
- In general this term is of the form
  Σ_{x_1} … Σ_{x_m} ∏_i f_i(Z_i)
  where the Z_i are sets of variables
- We now plot a graph where there is an undirected edge X--Y if X and Y are arguments of some factor, that is, if X and Y are in some Z_i
- Note: this is the Markov network that describes the probability distribution on the variables we have not yet eliminated

52 Undirected Graph Representation
- Consider the "Asia" example. The initial factors are
  P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
- Thus, the undirected graph has an edge for every pair of variables sharing a factor
- In the first step, this graph is just the moralized graph
[Figures: the Asia network and its moral graph]

53 Undirected Graph Representation
- Now we eliminate t, getting f_t(v,a,l) = Σ_t P(t|v) P(a|t,l)
- The corresponding change in the graph: t's neighbors V, A, L become a clique (adding edges V--A and V--L), and t is removed along with its edges
[Figures: the graph before and after eliminating t]

54 Example
- Want to compute P(L, V = t, S = f, D = t)
- Moralizing
[Figures: the Asia network and the current undirected graph]

55 Example
- Want to compute P(L, V = t, S = f, D = t)
- Moralizing
- Setting evidence
[Figure: the undirected graph after this step]

56 Example
- Want to compute P(L, V = t, S = f, D = t)
- Moralizing
- Setting evidence
- Eliminating x. New factor f_x(A)
[Figure: the undirected graph after this step]

57 Example
- Want to compute P(L, V = t, S = f, D = t)
- Moralizing
- Setting evidence
- Eliminating x
- Eliminating a. New factor f_a(b,t,l)
[Figure: the undirected graph after this step]

58 Example
- Want to compute P(L, V = t, S = f, D = t)
- Moralizing
- Setting evidence
- Eliminating x
- Eliminating a
- Eliminating b. New factor f_b(t,l)
[Figure: the undirected graph after this step]

59 Example
- Want to compute P(L, V = t, S = f, D = t)
- Moralizing
- Setting evidence
- Eliminating x
- Eliminating a
- Eliminating b
- Eliminating t. New factor f_t(l)
[Figure: the undirected graph after this step]

60 Elimination in Undirected Graphs
- Generalizing, we can eliminate a variable X by:
  1. For all Y, Z s.t. Y--X and Z--X, add an edge Y--Z
  2. Remove X and all edges adjacent to it
- This procedure creates a clique that contains all the neighbors of X
- After step 1 we have a clique that corresponds to the intermediate factor (before marginalization)
- The cost of the step is exponential in the size of this clique
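A direct Python transcription of these two steps (our sketch), with the graph stored as a dict from node to a set of neighbors:

```python
def eliminate_node(adj, x):
    """Eliminate x from an undirected graph {node: set of neighbors}:
    (1) connect every pair of x's neighbors, (2) remove x and its edges.
    Returns the clique containing x and its neighbors, whose size
    determines the cost of the elimination step."""
    nbrs = adj.pop(x)
    for y in nbrs:
        adj[y].discard(x)
        adj[y] |= nbrs - {y}
    return nbrs | {x}

# e.g. eliminating T when its neighbors are V, A, L adds edges V--A and V--L:
g = {'T': {'V', 'A', 'L'}, 'V': {'T'}, 'A': {'T', 'L'}, 'L': {'T', 'A'}}
print(eliminate_node(g, 'T'))   # -> clique {'T', 'V', 'A', 'L'}
```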

61 Undirected Graphs
- The process of eliminating nodes from an undirected graph gives us a clue to the complexity of inference
- To see this, we will examine the graph that contains all of the edges we added during the elimination
- The resulting graph is always chordal

62 Example
- Want to compute P(D)
- Moralizing
[Figures: the Asia network and its moral graph]

63 Example
- Want to compute P(D)
- Moralizing
- Eliminating v: multiply to get f'_v(v,t); result f_v(t)
[Figure: the undirected graph after this step]

64 Example
- Want to compute P(D)
- Moralizing
- Eliminating v
- Eliminating x: multiply to get f'_x(a,x); result f_x(a)
[Figure: the undirected graph after this step]

65 Example
- Want to compute P(D)
- Moralizing
- Eliminating v
- Eliminating x
- Eliminating s: multiply to get f'_s(l,b,s); result f_s(l,b)
[Figure: the undirected graph after this step]

66 Example
- Want to compute P(D)
- Moralizing
- Eliminating v
- Eliminating x
- Eliminating s
- Eliminating t: multiply to get f'_t(a,l,t); result f_t(a,l)
[Figure: the undirected graph after this step]

67 Example
- Want to compute P(D)
- Moralizing
- Eliminating v
- Eliminating x
- Eliminating s
- Eliminating t
- Eliminating l: multiply to get f'_l(a,b,l); result f_l(a,b)
[Figure: the undirected graph after this step]

68 Example
- Want to compute P(D)
- Moralizing
- Eliminating v
- Eliminating x
- Eliminating s
- Eliminating t
- Eliminating l
- Eliminating a, b: multiply to get f'_a(a,b,d); result f(d)
[Figure: the undirected graph after this step]

69 Expanded Graphs
- The resulting graph is the induced graph (for this particular ordering)
- Main property:
  - Every maximal clique in the induced graph corresponds to an intermediate factor in the computation
  - Every factor stored during the process is a subset of some maximal clique in the graph
- These facts are true for any variable elimination ordering on any network
[Figure: the induced graph for the example]

70 Induced Width (Treewidth)
- The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination
- This quantity (minus one) is called the induced width (or treewidth) of the graph according to the specified ordering
- Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph
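Given the eliminate_node sketch above, the induced width of an ordering is just the largest clique it creates, minus one (our illustration; the 4-cycle in the usage lines is a hypothetical example):

```python
def induced_width(adj, order):
    """Induced width of the graph under an elimination ordering:
    the size of the largest clique created, minus one."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    return max(len(eliminate_node(adj, x)) - 1 for x in order)

# A 4-cycle: eliminating any node creates a triangle, so the width is 2.
square = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
print(induced_width(square, ['A', 'B', 'C', 'D']))   # -> 2
```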

71 Consequence: Elimination on Trees
- Suppose we have a tree:
  - a network where each variable has at most one parent
- All the factors involve at most two variables
- Thus, the moralized graph is also a tree
[Figures: a directed tree and its moral graph]

72 Elimination on Trees
- We can maintain the tree structure by eliminating extreme (leaf) variables of the tree
[Figures: the tree after successive leaf eliminations]

73 Elimination on Trees
- Formally, for any tree there is an elimination ordering with treewidth = 1
- Thm: Inference on trees is linear in the number of variables

74 PolyTrees
- A polytree is a network where there is at most one path from one variable to another
- Thm: Inference in a polytree is linear in the representation size of the network
  - This assumes tabular CPT representation
- Can you see how the argument would work?
[Figure: a polytree]

75 General Networks
- What do we do when the network is not a polytree?
- If the network has a cycle, the treewidth for any ordering is greater than 1

76 Example
- Eliminating A, B, C, D, E, …
- The resulting graph is chordal with treewidth 2
[Figures: the graph on A, …, H after each elimination step]

77 Example
- Eliminating H, G, E, C, F, D, B, A
[Figures: the graph after each elimination step]

78 General Networks
From graph theory:
- Thm: Finding an ordering that minimizes the treewidth is NP-hard
However:
- There are reasonable heuristics for finding "relatively" good orderings
- There are provable approximations to the best treewidth
- If the graph has a small treewidth, there are algorithms that find it in polynomial time
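As an example of such a heuristic, here is a sketch of greedy min-fill (a standard heuristic, not one the slides specify), reusing eliminate_node from the elimination sketch above:

```python
def min_fill_order(adj):
    """Greedy min-fill: repeatedly eliminate the node whose elimination
    would add the fewest new edges. A common heuristic; it is not
    guaranteed to find the minimum-treewidth ordering."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    order = []

    def fill_edges(x):
        # Number of missing edges among x's current neighbors.
        nbrs = list(adj[x])
        return sum(1 for i in range(len(nbrs))
                     for j in range(i + 1, len(nbrs))
                     if nbrs[j] not in adj[nbrs[i]])

    while adj:
        x = min(adj, key=fill_edges)
        eliminate_node(adj, x)
        order.append(x)
    return order
```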

