Axioms and Algorithms for Inferences Involving Probabilistic Independence
Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991
Presentation by Guy Moses & Omer Weissbrod for the course Bayesian Networks, Computer Science Faculty, Technion, winter 2009
Partially based on the presentation by Ilan Gronau
2 What’s ahead?
Introduction - some definitions, notations and reminders.
Proof of Completeness - “if it’s true - it can be proved”.
Preparations for the Membership Algorithm - more definitions, and some theoretical groundwork.
The Membership Algorithm - description, proof of correctness, complexity analysis.
3 Definitions
U (Universe) – set of random variables with probability distribution P.
X, Y – finite sets of random variables: X = {x_1,…,x_n}, Y = {y_1,…,y_m}.
P(X,Y) = P(X)·P(Y) - a short-hand notation for the equality:
Pr{x_1=a_1,…,x_n=a_n, y_1=b_1,…,y_m=b_m} = Pr{x_1=a_1,…,x_n=a_n} · Pr{y_1=b_1,…,y_m=b_m}
for every choice of a_1,…,a_n, b_1,…,b_m.
(X,Y) – short-hand for P(X,Y) = P(X)·P(Y). This is called an independence statement.
*Note that X, Y are disjoint sets of variables (X ∩ Y = ∅).
4 Notations
σ - a specific independence statement of the form (X,Y).
Σ - a set of independence statements of the form (X,Y): Σ = {σ_1, …, σ_k}.
XY - short-hand notation for the union X ∪ Y.
P satisfies σ = (X,Y) means: P(X,Y) = P(X)·P(Y) for that specific P.
5 Soundness and Completeness
Definitions:
Σ ⊨ σ iff every distribution that satisfies Σ also satisfies σ.
Σ ⊢ σ iff σ ∈ cl(Σ), i.e. there exists a derivation chain σ_1,…,σ_n = σ s.t. for each j, either σ_j ∈ Σ or σ_j is derived by an axiom from the previous statements.
For a set of axioms A:
Soundness: A is sound iff for every Σ and σ: Σ ⊢ σ ⟹ Σ ⊨ σ.
Completeness: A is complete iff for every Σ and σ: Σ ⊨ σ ⟹ Σ ⊢ σ.
Completeness - Alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.
6 Independence Axioms We saw (in 1st lecture) that axioms 1a-1d are sound (always infer correctly). Today we’ll show they are complete (can derive every true statement).
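The axiom list itself appears to have been an image on this slide and did not survive extraction. As an assumption based on how trivial independence, symmetry, decomposition and mixing are used in the proofs of this presentation, the four axioms can be written as below. Which of the labels 1a to 1d belongs to which axiom is a guess.

```latex
\begin{align*}
\text{(1a) Trivial independence:} \quad & (X, \emptyset) \\
\text{(1b) Symmetry:} \quad & (X, Y) \Rightarrow (Y, X) \\
\text{(1c) Decomposition:} \quad & (X, YZ) \Rightarrow (X, Y) \\
\text{(1d) Mixing:} \quad & (X, Y) \wedge (XY, Z) \Rightarrow (X, YZ)
\end{align*}
```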
8 Minimal Statement
Definition: σ=(X,Y) ∉ cl(Σ) is minimal if for every non-empty X′, Y′ s.t. X′ ⊆ X, Y′ ⊆ Y, X′Y′ ≠ XY we have (X′,Y′) ∈ cl(Σ).
For every σ=(X,Y) ∉ cl(Σ) we can find an appropriate minimal σ′=(X′,Y′) ∉ cl(Σ) through iterative decomposition.
Observation: P satisfies σ ⟹ P satisfies σ′ (decomposition soundness). Therefore: P doesn’t satisfy σ′ ⟹ P doesn’t satisfy σ.
Our plan: Given an arbitrary σ ∉ cl(Σ), we will find a distribution P that satisfies cl(Σ) but doesn’t satisfy σ′. This will prove completeness (using the alternative completeness definition and the observation above).
To simplify notation, we will assume WLOG that σ=(X,Y) is already minimal.
9 Completeness Proof
Let σ=(X,Y) ∉ cl(Σ) be a minimal statement where X={x_1,…,x_n}, Y={y_1,…,y_m}, and Z={z_1,z_2,…,z_k} stands for the rest of the variables in U.
We will construct P_σ as follows:
All variables, except x_1, are independent fair coins (probability ½ for each of their two values).
x_1 is defined thus: x_1 = (x_2 + … + x_n + y_1 + … + y_m) mod 2.
Part 1: P_σ does not satisfy σ
We will inspect the following scenario: x_1=1, all other variables are 0.
P_σ(x_1,…,x_n, y_1,…,y_m) = 0, while P_σ(x_1,…,x_n) = 0.5^n and P_σ(y_1,…,y_m) = 0.5^m.
Hence P_σ(x_1,…,x_n, y_1,…,y_m) ≠ P_σ(x_1,…,x_n)·P_σ(y_1,…,y_m).
Therefore, P_σ does not satisfy σ, as required.
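The construction can be checked numerically on a small universe. The sketch below assumes that x_1 is the parity (sum mod 2) of the other variables in XY, which is consistent with the values 0, 0.5^n and 0.5^m quoted on the slide. The function names `parity_distribution`, `marginal` and `independent` are illustrative and not part of the original presentation.

```python
from itertools import product

def parity_distribution(xs, ys, zs):
    """Joint distribution over xs + ys + zs as {assignment tuple: probability}.

    Assumption: every variable except xs[0] is an independent fair coin and
    xs[0] is the parity of the remaining variables of xs + ys.
    """
    names = xs + ys + zs
    free = names[1:]                      # all variables except x_1
    n_parity = len(xs) + len(ys) - 1      # variables of XY other than x_1
    dist = {}
    for bits in product((0, 1), repeat=len(free)):
        x1 = sum(bits[:n_parity]) % 2     # x_1 is determined by the rest of XY
        dist[(x1,) + bits] = 0.5 ** len(free)
    return names, dist

def marginal(names, dist, subset, values):
    """Probability that the variables in `subset` take the given `values`."""
    idx = [names.index(v) for v in subset]
    return sum(p for a, p in dist.items()
               if all(a[i] == v for i, v in zip(idx, values)))

def independent(names, dist, left, right):
    """Check P(left, right) = P(left) * P(right) for every assignment."""
    for lv in product((0, 1), repeat=len(left)):
        for rv in product((0, 1), repeat=len(right)):
            joint = marginal(names, dist, left + right, lv + rv)
            prod = marginal(names, dist, left, lv) * marginal(names, dist, right, rv)
            if abs(joint - prod) > 1e-12:
                return False
    return True
```

With X = {x1, x2}, Y = {y1} and Z = {z1}, `independent(names, dist, ["x1", "x2"], ["y1"])` is false, while proper sub-statements such as `(["x1"], ["y1"])` and statements against Z such as `(["x1", "x2"], ["z1"])` hold.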
10 Completeness Proof – cont’d
Part 2: P_σ satisfies cl(Σ)
Let (V,W) ∈ cl(Σ). We will show that P_σ(V,W) = P_σ(V)·P_σ(W). This is done by inspecting different scenarios:
Scenario 1: either V or W contains only elements of Z.
We will assume WLOG that W contains only elements of Z. All variables in Z are independent under P_σ, of each other and of the variables in XY, and therefore: P_σ(V,W) = P_σ(V)·P_σ(W).
[Figure: V and W marked over the variables of X, Y and Z]
11 Completeness Proof – cont’d
Part 2: P_σ satisfies cl(Σ)
Let (V,W) ∈ cl(Σ). We will show that P_σ(V,W) = P_σ(V)·P_σ(W). This is done by inspecting different scenarios:
Scenario 2: Both V and W contain elements of X ∪ Y, but V ∪ W doesn’t contain all elements of X ∪ Y.
Without full information about the assignments of the variables in X ∪ Y, x_1 could turn out to be 0 or 1 with probability ½, and therefore: P_σ(V,W) = P_σ(V)·P_σ(W).
[Figure: V and W marked over the variables of X, Y and Z]
12 Completeness Proof – cont’d
Part 2: P_σ satisfies cl(Σ) - continued
Scenario 3: Both V and W contain elements of X ∪ Y, and (X ∪ Y) ⊆ (V ∪ W).
We will show a derivation chain for σ=(X,Y), contradicting our original assumption that σ ∉ cl(Σ):
Mark: (V,W) = (X_V Y_V Z_V, X_W Y_W Z_W) ∈ cl(Σ), where Y = Y_V ∪ Y_W, X = X_V ∪ X_W, Z_V ∪ Z_W ⊆ Z, V = X_V Y_V Z_V, W = X_W Y_W Z_W.
Remove all z’s by decomposition: (X_V Y_V, X_W Y_W) ∈ cl(Σ).
Due to minimality of σ=(X,Y): (X_V, Y_V) ∈ cl(Σ) and (X_W, Y) ∈ cl(Σ).
(X_V, Y_V) ∧ (X_V Y_V, X_W Y_W) ⟹ (X_V, Y_V X_W Y_W) = (X_V, X_W Y) (mixing)
(X_W, Y) ∧ (X_W Y, X_V) ⟹ (Y, X_V X_W) = (Y, X) = σ (mixing, up to symmetry)
[Figure: V and W marked over the variables of X, Y and Z]
13 Completeness Proof – Summary
Reminder: Completeness - Alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.
We’ve shown: given a minimal σ ∉ cl(Σ), there exists a distribution P_σ that obeys:
1. P_σ does not satisfy σ.
2. P_σ satisfies cl(Σ).
Given a non-minimal σ ∉ cl(Σ), we will derive its minimal statement σ′, and devise a distribution P_σ′ that satisfies cl(Σ) but does not satisfy σ′. Due to soundness of decomposition, P_σ′ cannot satisfy σ either.
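14 Scope of Completeness
The proof uses P_σ - a binary p.d. (probability distribution function). Therefore the axioms are complete for every class of distributions that contains the binary ones:
all p.d.’s over U ⊇ discrete p.d.’s ⊇ binary p.d.’s
Normal p.d.’s are not covered by this argument: for normal p.d.’s, the axiom set 1a-1d is not complete, and a stronger axiom is required.
Replace: [formula missing from the extracted slide] with: [formula missing from the extracted slide]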
16 Some more Definitions and Tools
Definition: Span
span(σ): the set of elements represented in statement σ.
Example: span((x_1 x_2, x_3 x_4)) = {x_1, x_2, x_3, x_4}
span(Σ): the set of elements represented in all statements of Σ.
Example: span({(x_1,x_2), (x_1,x_3)}) = {x_1, x_2, x_3}
17 Some more Definitions and Tools
Definition: Projection
The projection of σ on X, denoted σ(X), is the statement derived from σ by removing all elements not in X from σ.
Example: if σ=(x_1 x_2 x_3, x_4 x_5) and X={x_2, x_3, x_4} then σ(X)=(x_2 x_3, x_4).
The projection of Σ on X, denoted Σ(X), is {σ(X) | σ ∈ Σ}.
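Span and projection translate directly into set operations. The following is a minimal sketch that represents a statement (X,Y) as a pair of frozensets of variable names; this representation and the function names are choices made here for illustration, not something the slides prescribe.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A statement (X, Y) is modelled as a pair of disjoint frozensets.
Statement = Tuple[FrozenSet[str], FrozenSet[str]]

def span(statement: Statement) -> FrozenSet[str]:
    """All variables that appear in a single statement."""
    left, right = statement
    return left | right

def span_of_set(statements: Iterable[Statement]) -> FrozenSet[str]:
    """All variables that appear in any statement of the set."""
    result: FrozenSet[str] = frozenset()
    for s in statements:
        result |= span(s)
    return result

def project(statement: Statement, keep: FrozenSet[str]) -> Statement:
    """Remove from the statement every variable that is not in `keep`."""
    left, right = statement
    return left & keep, right & keep

def project_set(statements: Iterable[Statement], keep: FrozenSet[str]) -> Set[Statement]:
    """Project every statement of the set; duplicates collapse automatically."""
    return {project(s, keep) for s in statements}
```

Projecting onto a small span can turn a statement into a trivial one with an empty side, which is why the membership test has to treat trivial statements separately.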
18 Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ′ ⊢ σ, where Σ′ = Σ(span(σ)).
(⇐) If Σ′ ⊢ σ then clearly Σ ⊢ σ, because all the statements in Σ′ can be derived from the statements in Σ by decomposition.
19 Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ′ ⊢ σ, where Σ′ = Σ(span(σ)), s = span(σ).
(⇒) If Σ ⊢ σ then there is a derivation chain for σ: σ_1, σ_2, …, σ_r = σ.
For each j: if σ_j is derived from an earlier σ_i (i<j) by symmetry or decomposition, then σ_j(s) is derived from σ_i(s) by symmetry or decomposition respectively.
Similarly, if σ_j is derived from σ_i and σ_l by mixing, then σ_j(s) is derived from σ_i(s), σ_l(s) by mixing.
Hence σ_1(s), …, σ_r(s) = σ is a derivation chain for σ from Σ′.
20 Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ′ ⊢ σ, where Σ′ = Σ(span(σ)), s = span(σ).
Observations from projection lemma:
Variables not in σ are unnecessary for determining whether Σ ⊢ σ.
The problem of verifying whether Σ ⊢ σ can be simplified to the problem of verifying whether Σ′ ⊢ σ, where Σ′ = Σ(span(σ)).
This problem can be solved with a possibly reduced time and space complexity.
21 Conditions for Inference of Independence
Main claim: for a given σ, we have Σ′ ⊢ σ iff:
1. σ is trivial: σ = (X, ∅) (up to symmetry) OR
2. σ is in Σ′: σ ∈ Σ′ (up to symmetry) OR
3. σ is derivable from Σ′: there exists σ′ ∈ Σ′ s.t. span(σ) = span(σ′) and for σ′ = (AP,BQ), σ = (AQ,BP) (A, B, Q, P may be empty): Σ′ ⊢ (A,P), Σ′ ⊢ (B,Q) (up to symmetry)
22 Proof of Main Claim
Main claim: for a given σ, we have Σ′ ⊢ σ iff:
1. σ is trivial*: σ = (X, ∅)
2. σ is in* Σ′: σ ∈ Σ′
3. σ is derivable* from Σ′: ∃ σ′ ∈ Σ′ s.t. span(σ) = span(σ′) and for σ′=(AP,BQ), σ=(AQ,BP): Σ′ ⊢ (A,P), Σ′ ⊢ (B,Q)
*up to symmetry
(⇐) If 1. σ is trivial* OR 2. σ is in* Σ′, then the proof is immediate.
Otherwise, 3. there exists σ′ ∈ Σ′ s.t. span(σ) = span(σ′) and for σ′=(AP,BQ), σ=(AQ,BP): Σ′ ⊢ (A,P), Σ′ ⊢ (B,Q). We will show a constructive proof under these conditions.
23 Proof of Main Claim
Main claim: for a given σ, we have Σ′ ⊢ σ iff:
1. σ is trivial*: σ = (X, ∅)
2. σ is in* Σ′: σ ∈ Σ′
3. σ is derivable* from Σ′: ∃ σ′ ∈ Σ′ s.t. span(σ) = span(σ′) and for σ′=(AP,BQ), σ=(AQ,BP): Σ′ ⊢ (A,P), Σ′ ⊢ (B,Q)
*up to symmetry
(⇐) (contd.) Given that Σ′ ⊢ (AP,BQ), Σ′ ⊢ (A,P), Σ′ ⊢ (B,Q):
1. (A,P) ∧ (AP,BQ) ⟹ (A,PBQ) (mixing)
2. (B,Q) ∧ (AP,BQ) ⟹ (APB,Q) (mixing, up to symmetry) ⟹ (PB,Q) (decomposition)
3. (PB,Q) ∧ (A,PBQ) ⟹ (AQ,PB) = (AQ,BP) = σ (mixing, up to symmetry)
We’ve proven this direction.
24 Proof of Main Claim
Main claim: for a given σ, we have Σ′ ⊢ σ iff:
1. σ is trivial*: σ = (X, ∅)
2. σ is in* Σ′: σ ∈ Σ′
3. σ is derivable* from Σ′: ∃ σ′ ∈ Σ′ s.t. span(σ) = span(σ′) and for σ′=(AP,BQ), σ=(AQ,BP): Σ′ ⊢ (A,P), Σ′ ⊢ (B,Q)
*up to symmetry
(⇒) Given Σ′ ⊢ σ, if 1. σ is trivial* OR 2. σ is in* Σ′, then the proof is immediate.
Otherwise, since no axiom can add new variables to a statement, there must exist σ′ ∈ Σ′ s.t. span(σ) = span(σ′) in the derivation chain of σ.
Also, by decomposition (up to symmetry):
σ = (AQ,BP) ⟹ (A,P)
σ = (AQ,BP) ⟹ (Q,B)
so Σ′ ⊢ (A,P) and Σ′ ⊢ (B,Q).
25 Conclusions from Claim
We’ve seen that, after discarding unneeded variables, it is possible to tell whether Σ′ ⊢ σ (when it’s not immediately obvious) by:
a. Finding another statement σ′ ∈ Σ′ for which span(σ) = span(σ′),
b. Verifying that Σ′ ⊢ (A,P), Σ′ ⊢ (B,Q) when σ′=(AP,BQ), σ=(AQ,BP).
This suggests using a recursive “divide and conquer” approach.
27 The Membership Algorithm
Procedure Find(Σ, σ):
1. set Σ′ := Σ(span(σ)).
2. if σ is trivial, or σ ∈ Σ′ (up to symmetry) then Find(Σ, σ) := TRUE.
3. elseif for all non-trivial σ′ ∈ Σ′: span(σ) ≠ span(σ′), then Find(Σ, σ) := FALSE.
4. else there exists σ′ ∈ Σ′: span(σ) = span(σ′), and σ′=(AP,BQ), σ=(AQ,BP); set σ_1 := (A,P), σ_2 := (B,Q).
   Find(Σ, σ) := ( Find(Σ′, σ_1) ∧ Find(Σ′, σ_2) )
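The four steps of Find can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' code: statements are modelled as pairs of frozensets, the helper `stmt` and the other names are introduced here, and step 4 simply takes the first matching σ′, which the main claim suggests is sufficient.

```python
from typing import FrozenSet, Iterable, Tuple

Statement = Tuple[FrozenSet[str], FrozenSet[str]]

def stmt(left: str, right: str) -> Statement:
    """Hypothetical helper: build a statement from space-separated variable names."""
    return frozenset(left.split()), frozenset(right.split())

def span(s: Statement) -> FrozenSet[str]:
    return s[0] | s[1]

def is_trivial(s: Statement) -> bool:
    return not s[0] or not s[1]

def find(sigma_set: Iterable[Statement], sigma: Statement) -> bool:
    x, y = sigma
    s = span(sigma)
    # Step 1: project every statement of the set onto span(sigma).
    projected = {(left & s, right & s) for left, right in sigma_set}
    # Step 2: trivial statements and members of the projected set (up to symmetry).
    if is_trivial(sigma) or (x, y) in projected or (y, x) in projected:
        return True
    # Step 3: without a non-trivial statement of the same span, sigma is not derivable.
    candidates = [t for t in projected if not is_trivial(t) and span(t) == s]
    if not candidates:
        return False
    # Step 4: write sigma' = (AP, BQ) and sigma = (AQ, BP), then recurse.
    ap, bq = candidates[0]
    sigma_1 = (ap & x, ap & y)   # (A, P)
    sigma_2 = (bq & y, bq & x)   # (B, Q)
    return find(projected, sigma_1) and find(projected, sigma_2)
```

For example, with Σ = {(x1, x2), (x1 x2, x3)} the call `find(sigma_set, stmt("x1", "x2 x3"))` follows the mixing pattern and returns True, whereas with Σ = {(x1, x2), (x1, x3)} the same query returns False because no statement of Σ spans all three variables. Each recursive call works on a strictly smaller span, since both sides of a non-trivial σ′ are non-empty.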
28 Algorithm Correctness Proof
We will prove that Find(Σ, σ) = TRUE ⟺ σ ∈ cl(Σ) by induction on k=|σ|.
Induction base: if k=1 then σ is trivial, therefore the algorithm will return TRUE in step 2 and σ ∈ cl(Σ).
29 Algorithm Correctness Proof
Induction assumption: Find(Σ, σ′) = TRUE ⟺ σ′ ∈ cl(Σ) for each σ′ with |σ′| < k.
Induction step: Find(Σ, σ) = TRUE iff either:
1. Step 2 returns TRUE ⟺ σ is trivial or σ ∈ Σ′ ⟹ σ ∈ cl(Σ).
2. Step 4 returns TRUE
   iff Find(Σ′, σ_1) = TRUE ∧ Find(Σ′, σ_2) = TRUE (according to algorithm’s definition)
   iff σ_1 ∈ cl(Σ′) ∧ σ_2 ∈ cl(Σ′) (according to induction assumption)
   iff σ ∈ cl(Σ′) (according to main claim)
   iff σ ∈ cl(Σ) (according to projection lemma)
30 Complexity Analysis
Definitions:
n = the number of distinct variables in Σ.
k = the number of distinct variables in σ.
First projection cost: O(|Σ|·n) – happens only once.
Recursive step: T(k) ≤ |Σ|·k + T(k_1) + T(k_2), where k_1 + k_2 = k, k_1 = |σ_1|, k_2 = |σ_2|.
Can be shown by induction: T(k) ≤ |Σ|·k·(depth of recursion).
Worst case analysis: T(k) ≤ |Σ|·k·k = |Σ|·k^2.
Total run time is bounded by O(|Σ|·n + |Σ|·k^2), which is also O(|Σ|·n^2) since k ≤ n.
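The induction behind the bound can be sketched as follows, reading the recurrence as an inequality. Here d(k) denotes the depth of the recursion on a statement with k variables; this symbol is introduced only for this sketch. Assume the bound holds for both sub-statements and that their recursion depth is at most d(k) − 1:

```latex
\begin{align*}
T(k) &\le |\Sigma|\,k + T(k_1) + T(k_2) \\
     &\le |\Sigma|\,k + |\Sigma|\,k_1\,\bigl(d(k)-1\bigr) + |\Sigma|\,k_2\,\bigl(d(k)-1\bigr) \\
     &= |\Sigma|\,k + |\Sigma|\,(k_1 + k_2)\,\bigl(d(k)-1\bigr) \\
     &= |\Sigma|\,k\,d(k) \qquad \text{since } k_1 + k_2 = k .
\end{align*}
```

Each recursive call works on a strictly smaller statement, so d(k) ≤ k, which gives the worst-case bound T(k) ≤ |Σ|·k^2.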
31 Improvements and Variations
Instead of arbitrarily choosing σ′, find one whose sub-statements {A, B, P, Q} have balanced size (can improve run-time complexity).
Using the derivation chain presented in the constructive proof, the algorithm can also return a derivation chain for σ with a length of O(k).
32 Variations (contd.)
The algorithm can be expanded into a polynomial algorithm for the following problems:
Given two sets Σ_1 and Σ_2, is cl(Σ_1) ⊆ cl(Σ_2)? Is cl(Σ_1) = cl(Σ_2)?
Minimize the size of Σ while preserving cl(Σ): start with a maximal-size statement and remove from Σ all statements derivable from it. Repeat with the next largest statement etc.
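These variations reduce to repeated membership queries. The sketch below takes the membership test as a parameter `derives(sigma_set, sigma)`, a hypothetical callable standing in for the Find procedure. It relies on the fact that cl(Σ_1) ⊆ cl(Σ_2) holds exactly when every statement of Σ_1 is in cl(Σ_2). The `minimize` function follows the greedy description on the slide literally and is not claimed to produce a smallest possible set.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Statement = Tuple[FrozenSet[str], FrozenSet[str]]
# Hypothetical oracle: derives(sigma_set, sigma) is True iff sigma is in cl(sigma_set).
Oracle = Callable[[Sequence[Statement], Statement], bool]

def closure_subset(sigma_1: Sequence[Statement], sigma_2: Sequence[Statement],
                   derives: Oracle) -> bool:
    """cl(sigma_1) is contained in cl(sigma_2) iff sigma_2 derives each statement of sigma_1."""
    return all(derives(sigma_2, s) for s in sigma_1)

def closures_equal(sigma_1: Sequence[Statement], sigma_2: Sequence[Statement],
                   derives: Oracle) -> bool:
    """Equality of closures is containment in both directions."""
    return (closure_subset(sigma_1, sigma_2, derives)
            and closure_subset(sigma_2, sigma_1, derives))

def minimize(sigma_set: Sequence[Statement], derives: Oracle) -> List[Statement]:
    """Greedy pass: keep the largest statement, drop what it derives, repeat."""
    remaining = sorted(sigma_set, key=lambda s: len(s[0]) + len(s[1]), reverse=True)
    kept: List[Statement] = []
    while remaining:
        largest = remaining.pop(0)
        kept.append(largest)
        remaining = [s for s in remaining if not derives([largest], s)]
    return kept
```

Each function issues at most |Σ_1| + |Σ_2| or |Σ|^2 oracle calls, so the overall cost stays polynomial whenever the membership test is polynomial.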
Thank you!